Hi, mailing list:
I am using distcp to migrate data from CDH4.4 to CDH5.1. Copying
small files works well, but transferring large data is very slow.
Can anyone recommend a better method? Thanks.
Did you specify the number of map tasks?
On Fri, Oct 17, 2014 at 4:58 PM, ch huang justlo...@gmail.com wrote:
Hi, mailing list:
I am using distcp to migrate data from CDH4.4 to CDH5.1. Copying
small files works well, but transferring large data is very slow.
Can anyone recommend a better method? Thanks.
No, everything is at the defaults.
On Fri, Oct 17, 2014 at 5:46 PM, Azuryy Yu azury...@gmail.com wrote:
Did you specify the number of map tasks?
What is your approximate input size?
Do you have multiple files, or is this one large file?
What is your block size (on the source and destination clusters)?
On Fri, Oct 17, 2014 at 4:19 AM, ch huang justlo...@gmail.com wrote:
No, everything is at the defaults.
Distcp?
On 17 Oct 2014 20:51, Alexander Pivovarov apivova...@gmail.com wrote:
Try running this on a datanode in the destination cluster:
$ hadoop fs -cp hdfs://from_cluster/ hdfs://to_cluster/
There are several files; the total size is 2 TB and the block size is 128 MB.
On Sat, Oct 18, 2014 at 2:26 AM, Shivram Mani sm...@pivotal.io wrote:
What is your approximate input size?
Do you have multiple files, or is this one large file?
What is your block size (on the source and destination clusters)?
yes
On Sat, Oct 18, 2014 at 3:53 AM, Jakub Stransky stransky...@gmail.com
wrote:
Distcp?
Distcp is fairly restrictive when it comes to parallelizing a data copy. If
all you are copying is one large file, distcp won't make it any faster.
In distcp, a file is the lowest level of granularity, so increasing the
number of maps may not increase the overall throughput.
The default number of
If you still want to use distcp:
1. Break the file into smaller files (only if you have the luxury of doing
this).
2. Use the -m option to set the number of mappers.
(Each map task will aim to copy (total bytes across all files) /
numSplits. It uses UniformSizeInputFormat by default.)
3.
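To make the -m arithmetic concrete, here is a rough sketch, not a tuned recommendation. It assumes the roughly 2 TB total mentioned earlier in the thread and a hypothetical mapper count of 40. The src_dir and dest_dir paths are placeholders, and the right URI scheme for copying between two CDH versions may differ from the hdfs:// shown here. The script only prints the distcp command instead of running it.

```shell
#!/usr/bin/env bash
# Rough per-mapper load for a distcp run. All values are illustrative assumptions.
total_bytes=$((2 * 1024 * 1024 * 1024 * 1024))   # ~2 TB, the size reported in this thread
num_maps=40                                       # hypothetical value passed to -m

# Each map aims to copy about total_bytes / num_maps (integer division here).
bytes_per_map=$((total_bytes / num_maps))
gib_per_map=$((bytes_per_map / 1024 / 1024 / 1024))
echo "~${gib_per_map} GiB per map with -m ${num_maps}"

# Dry run: build and print the command rather than executing it.
# src_dir and dest_dir are placeholder paths, not taken from the thread.
distcp_cmd="hadoop distcp -m ${num_maps} hdfs://from_cluster/src_dir hdfs://to_cluster/dest_dir"
echo "${distcp_cmd}"
```

Because a file is the smallest unit distcp hands to a map, this even split is only a target: if the data is a handful of very large files, some maps will carry far more than the average and raising -m further will not help.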