Hi all. We're using Hadoop 1.0.3. We need to pick up a set of large (4+ GB) files once another process has finished writing them to HDFS. There doesn't appear to be an API specifically for this, but we discovered through experimentation that FileSystem.append() can serve as a completion probe: the call fails while another process is still writing to the file.
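For reference, here is a minimal sketch of the probe we tried. The class and method names are just for illustration, and it assumes dfs.support.append is enabled in the cluster config (required for append() on the 1.0.x line):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendProbe {
    /**
     * Returns true if no other client appears to be writing the file.
     * HDFS rejects append() while another writer still holds the lease,
     * so a successful open-for-append suggests the writer has finished.
     * The probe stream is closed immediately without writing anything.
     */
    public static boolean isClosedForWriting(FileSystem fs, Path path) {
        try {
            fs.append(path).close();
            return true;
        } catch (IOException e) {
            // Another process is presumably still writing (lease held).
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path(args[0]);
        System.out.println(p + " finished? " + isClosedForWriting(fs, p));
    }
}
```

This is what exhibits the corruption described below, so treat it as a description of the problem, not a recommendation.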
However, when we run this on a multi-node cluster, the append() call actually corrupts the file. Is this a known issue? Searching the bug tracker turns up https://issues.apache.org/jira/browse/HDFS-265 and a number of similar-sounding reports. What's the right way to solve this problem? Thanks. --Pete
