Hi all. We're using Hadoop 1.0.3. We need to pick up a set of large (4+ GB) files once another process has finished writing them to HDFS. There doesn't appear to be an API specifically for this, but we discovered through experimentation that FileSystem.append() can serve as a completion probe: the call fails while another process is still writing to the file.
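For reference, here is a minimal sketch of the probe we tried. The class and method names are just for illustration, and it assumes dfs.support.append is enabled in the cluster config (required for append() on the 1.0.x line):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendProbe {
    /**
     * Returns true if no other client appears to be writing the file.
     * HDFS rejects append() while another writer still holds the lease,
     * so a successful open-for-append suggests the writer has finished.
     * The probe stream is closed immediately without writing anything.
     */
    public static boolean isClosedForWriting(FileSystem fs, Path path) {
        try {
            fs.append(path).close();
            return true;
        } catch (IOException e) {
            // Another process is presumably still writing (lease held).
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path(args[0]);
        System.out.println(p + " finished? " + isClosedForWriting(fs, p));
    }
}
```

This is what exhibits the corruption described below, so treat it as a description of the problem, not a recommendation.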
However, when we run this on a multi-node cluster, the append() call actually corrupts the file. Is this a known issue? Searching the bug tracker turns up https://issues.apache.org/jira/browse/HDFS-265 and a number of similar-sounding reports. What's the right way to solve this problem? Thanks. --Pete
