Hello

I have a use case that requires transferring input files from remote storage
using the SCP protocol (via the JSch jar). To optimize this use case, I
pre-loaded all my input files into HDFS and modified the use case so that it
copies the required files from HDFS instead. So, when the tasktrackers run,
each one copies the required input files from HDFS to its local directory.
All my tasktrackers are also datanodes. I can see that my use case now runs
faster. The only change in my application is that files are copied from HDFS
instead of being transferred via SCP. My use case also involves parallel
operations (run in the tasktrackers) that do a lot of file transfers, and all
of these transfers have now been replaced with HDFS copies.

Can anyone tell me whether HDFS transfer really is faster, as I observed? Is
it because it uses TCP/IP? Can anyone give me reasonable explanations for the
decrease in time?


With thanks and regards,
rab
