Hi all,
I am using distcp to copy data from Hadoop 1.0.3 to Hadoop 2.0.1.
When the file path (or file name) contains Chinese characters, an
exception is thrown, as shown below. I would appreciate any help with this.
Thanks.
[hdfs@host ~]$ hadoop distcp -i -prbugp -m 14 -overwrite -log /tmp/distcp.log hftp://10.xx.xx.aa:50070/tmp/中文路径测试 hdfs://10.xx.xx.bb:54310/tmp/distcp_test14
12/08/28 23:32:31 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=true, maxMaps=14, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hftp://10.xx.xx.aa:50070/tmp/中文路径测试], targetPath=hdfs://10.xx.xx.bb:54310/tmp/distcp_test14}
12/08/28 23:32:33 INFO tools.DistCp: DistCp job log path: /tmp/distcp.log
12/08/28 23:32:34 WARN conf.Configuration: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
12/08/28 23:32:34 WARN conf.Configuration: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
12/08/28 23:32:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/08/28 23:32:36 INFO mapreduce.JobSubmitter: number of splits:1
12/08/28 23:32:36 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
12/08/28 23:32:36 WARN conf.Configuration: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
12/08/28 23:32:36 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
12/08/28 23:32:36 WARN conf.Configuration: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
12/08/28 23:32:36 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
12/08/28 23:32:36 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
12/08/28 23:32:36 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
12/08/28 23:32:36 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
12/08/28 23:32:36 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
12/08/28 23:32:36 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
12/08/28 23:32:36 WARN conf.Configuration: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
12/08/28 23:32:36 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
12/08/28 23:32:37 INFO mapred.ResourceMgrDelegate: Submitted application application_1345831938927_0039 to ResourceManager at baby20/10.1.1.40:8040
12/08/28 23:32:37 INFO mapreduce.Job: The url to track the job: http://baby20:8088/proxy/application_1345831938927_0039/
12/08/28 23:32:37 INFO tools.DistCp: DistCp job-id: job_1345831938927_0039
12/08/28 23:32:37 INFO mapreduce.Job: Running job: job_1345831938927_0039
12/08/28 23:32:50 INFO mapreduce.Job: Job job_1345831938927_0039 running in uber mode : false
12/08/28 23:32:50 INFO mapreduce.Job: map 0% reduce 0%
12/08/28 23:33:00 INFO mapreduce.Job: map 100% reduce 0%
12/08/28 23:33:00 INFO mapreduce.Job: Task Id : attempt_1345831938927_0039_m_000000_0, Status : FAILED
Error: java.io.IOException: File copy failed: hftp://10.1.1.26:50070/tmp/中文路径测试/part-r-00017 --> hdfs://10.1.1.40:54310/tmp/distcp_test14/part-r-00017
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:262)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:229)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hftp://10.1.1.26:50070/tmp/中文路径测试/part-r-00017 to hdfs://10.1.1.40:54310/tmp/distcp_test14/part-r-00017
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:258)
    ... 10 more
Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.io.IOException: HTTP_OK expected, received 500
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:201)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:167)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToTmpFile(RetriableFileCopyCommand.java:112)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:90)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:71)
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
    ... 11 more
Caused by: java.io.IOException: HTTP_OK expected, received 500
    at org.apache.hadoop.hdfs.HftpFileSystem$RangeHeaderInputStream.checkResponseCode(HftpFileSystem.java:381)
    at org.apache.hadoop.hdfs.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:121)
    at org.apache.hadoop.hdfs.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103)
    at org.apache.hadoop.hdfs.ByteRangeInputStream.read(ByteRangeInputStream.java:158)
    at java.io.DataInputStream.read(DataInputStream.java:132)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    at java.io.FilterInputStream.read(FilterInputStream.java:90)
    at org.apache.hadoop.tools.util.ThrottledInputStream.read(ThrottledInputStream.java:70)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:198)
    ... 16 more
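
The innermost exception is an HTTP 500 returned while reading over hftp, so one thing that may be worth checking (this is only an assumption, the log does not confirm the cause) is how the Chinese path looks once it is percent-encoded as UTF-8 for an HTTP request. The sketch below uses only java.net.URI from the JDK. The class name HftpPathEncoding and the helper encodePath are hypothetical names for illustration and are not part of Hadoop or DistCp.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of Hadoop.
class HftpPathEncoding {

    // Returns the path with every non-ASCII character percent-encoded
    // as UTF-8 bytes, which is what URI.toASCIIString() produces.
    static String encodePath(String path) {
        try {
            // The 4-argument constructor takes (scheme, host, path, fragment);
            // only the path component is set here.
            return new URI(null, null, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI path: " + path, e);
        }
    }

    public static void main(String[] args) {
        // Same path as in the failing copy above.
        String path = "/tmp/中文路径测试/part-r-00017";
        System.out.println(encodePath(path));
    }
}
```

Comparing the printed form with the request path recorded on the source cluster's HTTP side might show whether the path is being sent unencoded or in a different charset. That comparison is a suggestion for narrowing the problem down, not a confirmed diagnosis.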