how to copy data between two hdfs cluster fastly?

2014-10-17 Thread ch huang
Hi, mailing list: I am using distcp to migrate data from CDH4.4 to CDH5.1. It works well when copying small files, but it is very slow when transferring large amounts of data. Can anyone recommend a better method? Thanks.

Re: how to copy data between two hdfs cluster fastly?

2014-10-17 Thread Azuryy Yu
Did you specify how many map tasks to use?
On Fri, Oct 17, 2014 at 4:58 PM, ch huang justlo...@gmail.com wrote: Hi, mailing list: I am using distcp to migrate data from CDH4.4 to CDH5.1. It works well when copying small files, but it is very slow when transferring large amounts of data. Any good method

Re: how to copy data between two hdfs cluster fastly?

2014-10-17 Thread ch huang
No, everything is at the defaults.
On Fri, Oct 17, 2014 at 5:46 PM, Azuryy Yu azury...@gmail.com wrote: Did you specify how many map tasks to use?

Re: how to copy data between two hdfs cluster fastly?

2014-10-17 Thread Shivram Mani
What is your approximate input size? Do you have multiple files, or is this one large file? What is your block size (on the source and destination clusters)?
On Fri, Oct 17, 2014 at 4:19 AM, ch huang justlo...@gmail.com wrote: No, everything is at the defaults.

Re: how to copy data between two hdfs cluster fastly?

2014-10-17 Thread Jakub Stransky
Distcp?
On 17 Oct 2014 20:51, Alexander Pivovarov apivova...@gmail.com wrote: Try running this on a datanode of the destination cluster:
$ hadoop fs -cp hdfs://from_cluster/ hdfs://to_cluster/

Re: how to copy data between two hdfs cluster fastly?

2014-10-17 Thread ch huang
There are several files; the total size is 2 TB, and the block size is 128 MB.
On Sat, Oct 18, 2014 at 2:26 AM, Shivram Mani sm...@pivotal.io wrote: What is your approximate input size? Do you have multiple files, or is this one large file? What is your block size (on the source and destination clusters)?

Re: how to copy data between two hdfs cluster fastly?

2014-10-17 Thread ch huang
Yes.
On Sat, Oct 18, 2014 at 3:53 AM, Jakub Stransky stransky...@gmail.com wrote: Distcp?

Re: how to copy data between two hdfs cluster fastly?

2014-10-17 Thread Shivram Mani
Distcp is pretty restrictive w.r.t. parallelizing data copy. If all you are copying is one large file, distcp won't make it any faster. In distcp, files are the lowest level of granularity, so increasing the number of maps may not necessarily increase the overall throughput. The default number of
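
The granularity point above can be illustrated with a small model. This is only a sketch, not distcp's actual split logic: it assumes a file is never divided between map tasks and packs files onto maps greedily, largest first. The file layouts used in the example (one 2 TB file versus sixteen 128 GB files) are hypothetical; the thread only states that the total is about 2 TB.

```python
TB = 1024 ** 4  # bytes in one tebibyte


def busiest_map_bytes(file_sizes, num_maps):
    """Bytes copied by the most heavily loaded map task.

    Simplified model (an assumption, not distcp's real algorithm):
    whole files are assigned to maps, largest file first, each going
    to the map that currently has the least data.
    """
    if not file_sizes or num_maps < 1:
        return 0
    # More maps than files cannot help when files are never split.
    loads = [0] * min(len(file_sizes), num_maps)
    for size in sorted(file_sizes, reverse=True):
        lightest = loads.index(min(loads))
        loads[lightest] += size
    return max(loads)


if __name__ == "__main__":
    one_big_file = [2 * TB]            # hypothetical layout
    many_files = [TB // 8] * 16        # hypothetical layout, same 2 TB total
    for maps in (1, 20):
        print(maps, busiest_map_bytes(one_big_file, maps),
              busiest_map_bytes(many_files, maps))
```

In this model the single large file keeps one map busy with the full 2 TB no matter how many maps are requested, while the multi-file layout lets the busiest map shrink as maps are added.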

Re: how to copy data between two hdfs cluster fastly?

2014-10-17 Thread Shivram Mani
If you still want to use distcp:
1. Break the file into smaller files (only if you have the luxury of doing this).
2. Use the -m option to set the number of mappers. (Each map task will aim to copy (total bytes across all files) / numSplits. It uses the UniformSizeInputFormat by default.)
3.
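
The per-map target in point 2 can be worked through as follows. This is a rough sketch: the map count of 40 is an arbitrary assumption, the 2 TB total comes from earlier in the thread, and the cluster URIs are the placeholders used in the thread rather than real addresses. The command is only assembled and printed, not executed.

```python
def bytes_per_map(total_bytes, num_splits):
    """Approximate target per map task: total bytes / numSplits."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    return total_bytes // num_splits


def distcp_command(src, dst, num_maps):
    """Build a distcp invocation that sets the mapper count with -m."""
    return ["hadoop", "distcp", "-m", str(num_maps), src, dst]


if __name__ == "__main__":
    total = 2 * 1024 ** 4   # about 2 TB, as reported in the thread
    maps = 40               # assumed value, tune for your cluster
    print("target bytes per map:", bytes_per_map(total, maps))
    # Placeholder URIs taken from the thread; substitute real namenode paths.
    print(" ".join(distcp_command("hdfs://from_cluster/",
                                  "hdfs://to_cluster/", maps)))
```

Because files are the smallest unit of work, the per-map target is only approached when the input consists of enough files of moderate size.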