On 11 July 2013 06:27, Hao Ren <h....@claravista.fr> wrote:
> Hi,
>
> I am running HDFS on Amazon EC2.
>
> Say I have an FTP server that stores some data.
>
> I just want to copy this data directly to HDFS in a parallel way (which
> may be more efficient).
>
> I think hadoop distcp is what I need.
> http://hadoop.apache.org/docs/stable/distcp.html

DistCp (distributed copy) is a tool used for large inter/intra-cluster copying. It uses MapReduce to effect its distribution, error handling and recovery, and reporting.

I doubt this is going to help. Are there a lot of files? If yes, how about multiple copy jobs to HDFS?

-balaji
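
The "multiple copy jobs" idea could be sketched roughly as below. This is only an illustration, not a tested recipe: the helper names (split_into_batches, build_copy_commands), the FTP base URL, and the HDFS destination are all hypothetical. It also assumes that `hadoop fs -cp` can read `ftp://` URIs on your cluster, which depends on the Hadoop version and configuration. The script only builds the command lines, one per job; it does not run them.

```python
from typing import List


def split_into_batches(paths: List[str], n_jobs: int) -> List[List[str]]:
    """Distribute paths round-robin over at most n_jobs non-empty batches."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    batches = [paths[i::n_jobs] for i in range(n_jobs)]
    # Drop empty batches when there are fewer files than jobs.
    return [batch for batch in batches if batch]


def build_copy_commands(paths: List[str], n_jobs: int,
                        ftp_base: str, hdfs_dest: str) -> List[List[str]]:
    """Build one `hadoop fs -cp` argument list per batch (nothing is executed)."""
    commands = []
    for batch in split_into_batches(paths, n_jobs):
        sources = [f"{ftp_base}/{path}" for path in batch]
        # Assumption: ftp:// sources are readable by the Hadoop FS shell here.
        commands.append(["hadoop", "fs", "-cp", *sources, hdfs_dest])
    return commands


if __name__ == "__main__":
    # Hypothetical file names, FTP host, and HDFS directory.
    files = ["part1.csv", "part2.csv", "part3.csv", "part4.csv", "part5.csv"]
    for cmd in build_copy_commands(files, 2,
                                   "ftp://example.invalid/data",
                                   "hdfs:///incoming/"):
        print(" ".join(cmd))
```

Each printed command could then be started as its own process (for example from separate shells, or with subprocess.Popen) so the copies run side by side. Whether that beats a single copy depends on the FTP server and network, so it is worth measuring with a small batch first.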