I'm trying to copy data between two clusters. On the source cluster, hadoop version reports:

    Hadoop 2.0.0-cdh4.1.3
    Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2.0.0-cdh4.1.3/src/hadoop-common-project/hadoop-common -r dbc7a60f9a798ef63afb7f5b723dc9c02d5321e1
    Compiled by jenkins on Sat Jan 26 16:46:14 PST 2013
    From source with checksum ad1ed6a3ede2e0e9c39b052bbc76c189
and on the destination cluster:

    Hadoop 2.5.0-cdh5.3.0
    Subversion http://github.com/cloudera/hadoop -r f19097cda2536da1df41ff6713556c8f7284174d
    Compiled by jenkins on 2014-12-17T03:05Z
    Compiled with protoc 2.5.0
    From source with checksum 9c4267e6915cf5bbd4c6e08be54d54e0
    This command was run using /usr/lib/hadoop/hadoop-common-2.5.0-cdh5.3.0.jar

The command I'm using to do the copy is:

    hadoop distcp -D mapreduce.job.queuename=search \
        -D mapreduce.job.maxtaskfailures.per.tracker=1 -pb \
        hftp://cdh4source-cluster:50070/backups/HbaseTableCopy \
        hdfs://cdh5dest-cluster/user/colin.williams/hbase/

I've also tried it without the -pb and -D mapreduce.job.maxtaskfailures.per.tracker=1 options. Every attempt fails, with one of two kinds of errors. The first is a checksum mismatch:

    Error: java.io.IOException: File copy failed: hftp://cdh4source-cluster:50070/backups/HbaseTableCopy/part-m-00018 --> hdfs://cdh5dest-cluster/user/colin.williams/hbase/HbaseTableCopy/part-m-00018
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
    Caused by: java.io.IOException: Couldn't run retriable-command: Copying hftp://cdh4source-cluster:50070/backups/HbaseTableCopy/part-m-00018 to hdfs://cdh5dest-cluster/user/colin.williams/hbase/HbaseTableCopy/part-m-00018
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
        ... 10 more
    Caused by: java.io.IOException: Check-sum mismatch between hftp://cdh4source-cluster:50070/backups/HbaseTableCopy/part-m-00018 and hdfs://cdh5dest-cluster/user/colin.williams/hbase/.distcp.tmp.attempt_1453754997414_337405_m_000007_0.
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareCheckSums(RetriableFileCopyCommand.java:211)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:131)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
        ... 11 more
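For the checksum-mismatch case, one workaround I've seen suggested for copies between different HDFS versions (where the file checksums may not be comparable across the hftp boundary) is to skip the CRC comparison. I haven't verified that this is safe or that it helps here, so treat it as an untested variant; -update and -skipcrccheck are standard DistCp options, and -skipcrccheck is only honored together with -update:

    # Untested variant: skip the source/target CRC comparison.
    hadoop distcp -D mapreduce.job.queuename=search \
        -update -skipcrccheck \
        hftp://cdh4source-cluster:50070/backups/HbaseTableCopy \
        hdfs://cdh5dest-cluster/user/colin.williams/hbase/

(Note that -update also changes how the source directory maps onto the target: the source contents go directly under the target path rather than under a new HbaseTableCopy subdirectory, so the destination path might need adjusting.)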
The other failure mode is an EOF partway through a large file:

    16/03/21 17:30:47 INFO mapreduce.Job: Task Id : attempt_1453754997414_337405_m_000001_0, Status : FAILED
    Error: java.io.IOException: File copy failed: hftp://cdh4source-cluster:50070/backups/HbaseTableCopy/part-m-00004 --> hdfs://cdh5dest-cluster/user/colin.williams/hbase/HbaseTableCopy/part-m-00004
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
    Caused by: java.io.IOException: Couldn't run retriable-command: Copying hftp://cdh4source-cluster:50070/backups/HbaseTableCopy/part-m-00004 to hdfs://cdh5dest-cluster/user/colin.williams/hbase/HbaseTableCopy/part-m-00004
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
        ... 10 more
    Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.io.IOException: Got EOF but currentPos = 916783104 < filelength = 21615406422
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:289)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:257)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:184)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:124)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
        ... 11 more
    Caused by: java.io.IOException: Got EOF but currentPos = 916783104 < filelength = 21615406422
        at org.apache.hadoop.hdfs.web.ByteRangeInputStream.update(ByteRangeInputStream.java:173)
        at org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:188)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.tools.util.ThrottledInputStream.read(ThrottledInputStream.java:80)
        at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:284)
        ... 16 more

So the job fails with either a checksum mismatch or a premature EOF, depending on the attempt. I've also run hadoop fsck on the source files (rough invocation at the end of this mail), and it doesn't report any errors. I see many JIRA issues and questions regarding DistCp. Can I get some help with this?
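For reference, the fsck check was along these lines, run on the CDH4 source cluster (exact flags reconstructed from memory; -files and -blocks are standard fsck options):

    # Verify the source files on the CDH4 cluster; came back healthy.
    hadoop fsck /backups/HbaseTableCopy -files -blocks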

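P.S. One more variant I'm considering but haven't tried yet: reading over webhdfs instead of hftp, in case the EOF is an hftp transport problem with large files. This assumes WebHDFS is enabled (dfs.webhdfs.enabled=true) on the CDH4 NameNode, which I haven't confirmed:

    # Untested variant: use the webhdfs scheme on the same NameNode HTTP port.
    hadoop distcp -D mapreduce.job.queuename=search \
        webhdfs://cdh4source-cluster:50070/backups/HbaseTableCopy \
        hdfs://cdh5dest-cluster/user/colin.williams/hbase/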