I am using distcp to migrate data directly between a directory with EC (erasure
coding) enabled and a directory without EC enabled.

Error found:

ERROR mapred.CopyMapper: Failure in copying hdfs://HACluster-test/ec/NOTICE.txt to hdfs://HACluster-test/notec/NOTICE.txt
java.io.IOException: File copy failed: hdfs://HACluster-test/ec/NOTICE.txt --> hdfs://HACluster-test/notec2/NOTICE.txt
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:262)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:219)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:48)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:800)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://HACluster-test/ec/NOTICE.txt to hdfs://HACluster-test/notec/NOTICE.txt
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:258)
    ... 11 more
Caused by: java.io.IOException: Checksum mismatch between hdfs://HACluster-test/ec/NOTICE.txt and hdfs://HACluster-test/notec/.distcp.tmp.attempt_local1233806810_0001_m_000000_0.1693272752196.
    at org.apache.hadoop.tools.util.DistCpUtils.compareFileLengthsAndChecksums(DistCpUtils.java:646)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:146)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:115)
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
    ... 12 more

According to the error message, the checksum of the same file differs between
the EC directory and the non-EC directory. Adding -skipcrccheck to the distcp
command lets the migration succeed, but I don't want to skip the check in case
the data really is inconsistent. Is this a bug, or is this not supported?
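
For context, here is a toy sketch of why such a mismatch might appear even when the bytes are identical. It assumes the file checksum being compared is of the MD5-of-MD5-of-CRC kind, which depends on how the file is cut into blocks; an EC (striped) file and a replicated file are laid out differently. The function names, block sizes, and the use of CRC32 from zlib are illustrative assumptions, not HDFS code.

```python
import hashlib
import zlib


def layout_dependent_checksum(data: bytes, block_size: int, chunk_size: int = 512) -> str:
    """Toy MD5-of-MD5-of-CRC: the result depends on where block boundaries fall."""
    block_digests = b""
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        # One CRC per fixed-size chunk inside the block.
        crcs = b"".join(
            zlib.crc32(block[c:c + chunk_size]).to_bytes(4, "big")
            for c in range(0, len(block), chunk_size)
        )
        # One MD5 per block, computed over that block's chunk CRCs.
        block_digests += hashlib.md5(crcs).digest()
    # Final MD5 over the concatenated per-block MD5s.
    return hashlib.md5(block_digests).hexdigest()


def layout_independent_crc(data: bytes, block_size: int) -> int:
    """Toy whole-file CRC built block by block: independent of block boundaries."""
    crc = 0
    for start in range(0, len(data), block_size):
        crc = zlib.crc32(data[start:start + block_size], crc)
    return crc


# Same bytes, two hypothetical block layouts.
data = bytes(range(256)) * 64  # 16 KiB of sample content
md5_layout_a = layout_dependent_checksum(data, block_size=4096)
md5_layout_b = layout_dependent_checksum(data, block_size=8192)
crc_layout_a = layout_independent_crc(data, block_size=4096)
crc_layout_b = layout_independent_crc(data, block_size=8192)

print("layout-dependent checksums equal:  ", md5_layout_a == md5_layout_b)
print("layout-independent checksums equal:", crc_layout_a == crc_layout_b)
```

If that is what is happening here, the mismatch would not by itself mean the data is corrupt. As far as I can tell, newer Hadoop 3.x releases have a dfs.checksum.combine.mode setting with a COMPOSITE_CRC value intended to make file checksums comparable across layouts; whether the version in use here supports it is something I have not verified.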
