I'm finding that "hadoop fs -put" on a cluster is quite slow for me when I have large numbers of small files, much slower than native file operations. Note that I'm using RawLocalFileSystem as the underlying filesystem being written to in this case, so HDFS isn't the issue.
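One way to put a number on "much slower than native file operations" is to time a plain per-file local copy of the same kind of directory and compare it with `time hadoop fs -put <localsrc> <dst>`. The sketch below is a minimal local baseline only. The helper names (`make_small_files`, `time_local_copy`), the file count, and the file size are hypothetical choices, and it does not call Hadoop at all.

```python
import os
import shutil
import tempfile
import time


def make_small_files(directory: str, count: int, size: int) -> None:
    """Hypothetical helper: create `count` files of `size` bytes each in `directory`."""
    payload = b"x" * size
    for i in range(count):
        with open(os.path.join(directory, f"part-{i:05d}"), "wb") as f:
            f.write(payload)


def time_local_copy(src_dir: str, dst_dir: str) -> tuple[int, float]:
    """Hypothetical helper: copy each regular file in src_dir to dst_dir
    one at a time and return (files_copied, elapsed_seconds)."""
    start = time.perf_counter()
    copied = 0
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            shutil.copyfile(src, os.path.join(dst_dir, name))
            copied += 1
    return copied, time.perf_counter() - start


if __name__ == "__main__":
    # Assumed workload: 1,000 files of 1 KiB each; adjust to match the real data.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        make_small_files(src, count=1000, size=1024)
        n, secs = time_local_copy(src, dst)
        print(f"{n} files in {secs:.3f}s ({secs / n * 1e6:.1f} us/file)")
```

Reporting time per file rather than total throughput is deliberate: with many small files, fixed per-file costs dominate, so the per-file figure is the one to compare against the `hadoop fs -put` run on the same source directory.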
I see that the Put class creates a LinkedList whose size equals the number of elements in the path.

1) Is there a more performant way to run "fs -put"?
2) Has anyone else noticed that "fs -put" has extra overhead?

I'm going to trace some more, but I wanted to bounce this off the mailing list first; maybe others have also run into this issue.

** Is "hadoop fs -put" inherently slower than a Unix "cp" action, regardless of filesystem, and if so, why? **

--
Jay Vyas
http://jayunit100.blogspot.com
