For the IIS ftp server on Windows, it seems that the distcp tool always fails at the line 'client.setFileTransferMode(FTP.BLOCK_TRANSFER_MODE)' in hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ftp/FTPFileSystem.java#connect()
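To illustrate the kind of workaround I have in mind (this is only a sketch, not the fix attached to the jira and not the current FTPFileSystem code): try block mode first and fall back to stream mode, which is the FTP default per RFC 959, if the server refuses it. The class TransferModeFallback, the interface ModeClient and the constants STREAM and BLOCK are hypothetical names for illustration. ModeClient stands in for commons-net's FTPClient.setFileTransferMode(int), which returns a boolean in the same way but additionally throws IOException (omitted here to keep the sketch small). That IIS refuses block mode, rather than dropping the connection, is an assumption.

```java
/** Hypothetical sketch: prefer block transfer mode, fall back to stream mode. */
class TransferModeFallback {
    // Hypothetical stand-ins for FTP.STREAM_TRANSFER_MODE and FTP.BLOCK_TRANSFER_MODE.
    static final int STREAM = 0;
    static final int BLOCK = 1;

    /** Hypothetical seam shaped like FTPClient.setFileTransferMode(int). */
    interface ModeClient {
        boolean setFileTransferMode(int mode);
    }

    /** Returns the transfer mode that the server accepted. */
    static int negotiate(ModeClient client) {
        if (client.setFileTransferMode(BLOCK)) {
            return BLOCK;
        }
        // Block mode was refused; retry with stream mode, the FTP default.
        if (client.setFileTransferMode(STREAM)) {
            return STREAM;
        }
        throw new IllegalStateException(
            "Server accepted neither block nor stream transfer mode");
    }

    public static void main(String[] args) {
        ModeClient blockCapable = mode -> true;          // accepts any mode
        ModeClient streamOnly = mode -> mode == STREAM;  // assumed IIS-like behaviour
        System.out.println("block-capable server -> mode " + negotiate(blockCapable));
        System.out.println("stream-only server   -> mode " + negotiate(streamOnly));
    }
}
```

If the server closes the connection instead of returning a negative reply, a fallback like this would not help, and connect() would need to reconnect before retrying.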
Opened a jira for this issue: HADOOP-11886

2015-04-27 16:36 GMT+08:00 sam liu <[email protected]>:

> Hi Experts,
>
> It is really weird that DistCp could successfully get the file from
> FileZilla ftp server on Windows7, but failed from the IIS ftp server on the
> same Windows7 OS(but I can get file using wget directly: 'wget
> ftp://Viewer:[email protected]:21/ftp_file1.txt' ). I tried several
> times, but all failed and encountered different error messages as below.
>
> Any comments?
>
> *[Success on FileZilla ftp server on Windows7]:*
> [[email protected] ~]$ hadoop distcp
> ftp://ftp:[email protected]:121/ftp_test.txt /tmp/
> 15/04/26 22:56:20 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://ftp:[email protected]:121/ftp_test.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
> 15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1429858372957_0002
> 15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application
> application_1429858372957_0002
> 15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job:
> http://hostname2.com:8088/proxy/application_1429858372957_0002/
> 15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
> 15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
> 15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running
> in uber mode : false
> 15/04/26 22:56:51 INFO mapreduce.Job: map 0% reduce 0%
>
> *[Failure 1 on IIS ftp server on the same Windows7 OS] :*
> [[email protected] ~]$ hadoop distcp
> ftp://Viewer:[email protected]:21/ftp_file1.txt /tmp/
> 15/04/27 00:02:45 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:[email protected]:21/ftp_file1.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/27 00:02:47 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/27 00:02:47 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/27 00:03:50 ERROR tools.DistCp: Invalid input:
> org.apache.hadoop.tools.CopyListing$InvalidInputException:
> ftp://Viewer:[email protected]:21/ftp_file1.txt doesn't exist
>     at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
>     at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>     at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>
> *[Failure 2 on IIS ftp server on the same Windows7 OS] :*
> [[email protected] ~]$ hadoop distcp
> ftp://Viewer:[email protected]/ftp-win.txt /tmp/
> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:[email protected]/ftp-win.txt], targetPath=/tmp,
> targetPathExists=true}
> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8032
> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed
> without indication.
>     at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>     at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>     at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>     at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>     at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>     at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>     at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>     at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>     at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>     at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>     at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>     at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>
> *[Failure 3 on IIS ftp server on the same Windows7 OS] :*
> [[email protected] ~]$ hadoop distcp
> ftp://Viewer:[email protected]:21/ftp_file1.txt /tmp/
> 15/04/27 00:08:18 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:[email protected]:21/ftp_file1.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
> java.net.SocketException: Connection reset
>     at java.net.SocketInputStream.read(SocketInputStream.java:196)
>     at java.net.SocketInputStream.read(SocketInputStream.java:122)
>     at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
>     at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
>     at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
>     at java.io.InputStreamReader.read(InputStreamReader.java:184)
>     at java.io.BufferedReader.fill(BufferedReader.java:154)
>     at java.io.BufferedReader.read(BufferedReader.java:175)
>     at org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
>     at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
>     at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>     at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>     at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>     at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
>     at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
>     at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>     at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
>     at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
>     at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>     at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>     at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>
> Thanks!
>
>
> 2015-02-02 15:41 GMT+08:00 sam liu <[email protected]>:
>
>> Hi Experts,
>>
>> I could run distcp against ftp server installed on Linux, but could NOT
>> run distcp against ftp server installed on Windows. Below are the steps.
>>
>> Is this a DistCp bug? Any comments?
>>
>> [Scenario 1]
>> I installed a BI cluster using trunk build on HadoopNode1, and then could
>> copy file from a ftp installed on Linux to hdfs using command:
>> hadoop distcp ftp://user1:[email protected]/home/user1/ftp.txt
>> hdfs://HadoopNode1:9000/tmp/
>>
>> [Scenario 2]
>> On the same hadoop node, I can copy file from a remote ftp server
>> installed on Windows7 using command:
>> wget ftp://Viewer:[email protected]/ftp-win.txt.
>>
>> But I failed to copy file from a ftp installed on Windows7 to hdfs using
>> command:
>> [user1@HadoopNode1 ~]$ hadoop distcp
>> ftp://Viewer:[email protected]/ftp-win.txt /tmp/
>> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:[email protected]/ftp-win.txt], targetPath=/tmp,
>> targetPathExists=true}
>> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
>> HadoopNode1/9.30.239.166:8032
>> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
>> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection
>> closed without indication.
>>     at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>>     at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>>     at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>>     at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>>     at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>>     at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>>     at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>>     at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>>     at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>>     at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>>     at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>>     at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>>     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>>
>> Thanks!
>>
>
>
