No, I'm using a glob pattern; it's all done in one "put" statement.
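To put a number on "slower than native file ops", a baseline like the one below might help. It is only a rough sketch: the helper names (make_small_files, time_native_copy, time_single_put) are made up for illustration, the file count and size are arbitrary, and the put timing assumes the `hadoop` launcher is on PATH and that the destination URI (a placeholder here) is an existing directory.

```python
import glob
import os
import shutil
import subprocess
import tempfile
import time


def make_small_files(directory, count, size_bytes=128):
    """Create `count` small files in `directory` and return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"part-{i:05d}.txt")
        with open(path, "wb") as fh:
            fh.write(b"x" * size_bytes)
        paths.append(path)
    return paths


def time_native_copy(src_dir, dst_dir):
    """Copy every file in src_dir to dst_dir in-process; return (count, seconds)."""
    sources = sorted(glob.glob(os.path.join(src_dir, "*")))
    start = time.perf_counter()
    for src in sources:
        shutil.copy(src, dst_dir)
    return len(sources), time.perf_counter() - start


def time_single_put(src_dir, dst_uri):
    """Time ONE `hadoop fs -put` call covering all files (a single JVM start).

    Assumption: `hadoop` is on PATH and dst_uri is an existing directory.
    """
    sources = sorted(glob.glob(os.path.join(src_dir, "*")))
    start = time.perf_counter()
    subprocess.run(["hadoop", "fs", "-put", *sources, dst_uri], check=True)
    return len(sources), time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        make_small_files(src, 1000)
        n, secs = time_native_copy(src, dst)
        print(f"native copy: {n} files in {secs:.3f}s")
        # On a machine with Hadoop installed (destination is hypothetical):
        # n, secs = time_single_put(src, "file:///tmp/put-test")
```

Comparing the two timings on the same set of files should show how much of the gap remains once the JVM is started only once.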

On Tue, Jan 28, 2014 at 9:22 PM, Harsh J <ha...@cloudera.com> wrote:

> Are you calling one command per file? That's bound to be slow as it
> invokes a new JVM each time.
>
> On Jan 29, 2014 7:15 AM, "Jay Vyas" <jayunit...@gmail.com> wrote:
>
>> I'm finding that "hadoop fs -put" on a cluster is quite slow for me when I
>> have large numbers of small files... much slower than native file ops.
>> Note that I'm using the RawLocalFileSystem as the underlying backing
>> filesystem that is being written to in this case, so HDFS isn't the issue.
>>
>> I see that the Put class creates a linked list whose length is the number
>> of elements in the path.
>>
>> 1) Is there a more performant way to run "fs -put"?
>>
>> 2) Has anyone else noted that "fs -put" has extra overhead?
>>
>> I'm going to trace some more, but just wanted to bounce this off the
>> mailing list... maybe others have also run into this issue.
>>
>> ** Is "hadoop fs -put" inherently slower than a unix "cp" action,
>> regardless of filesystem -- and if so, why? **
>>
>> --
>> Jay Vyas
>> http://jayunit100.blogspot.com
>

--
Jay Vyas
http://jayunit100.blogspot.com