Hi Experts,

It is really weird: DistCp can successfully get a file from a FileZilla FTP server on Windows 7, but it fails against the IIS FTP server on the same Windows 7 machine, even though I can fetch the file from that IIS server directly with wget ('wget ftp://Viewer:[email protected]:21/ftp_file1.txt'). I tried several times; every attempt failed, each with a different error message, as shown below.
Any comments?

*[Success on FileZilla ftp server on Windows7]:*

[[email protected] ~]$ hadoop distcp ftp://ftp:[email protected]:121/ftp_test.txt /tmp/
15/04/26 22:56:20 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ ftp://ftp:[email protected]:121/ftp_test.txt], targetPath=/tmp, targetPathExists=true, preserveRawXattrs=false}
15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address: http://hostname2.com:8188/ws/v1/timeline/
15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8050
15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address: http://hostname2.com:8188/ws/v1/timeline/
15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8050
15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1429858372957_0002
15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application application_1429858372957_0002
15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job: http://hostname2.com:8088/proxy/application_1429858372957_0002/
15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running in uber mode : false
15/04/26 22:56:51 INFO mapreduce.Job: map 0% reduce 0%

*[Failure 1 on IIS ftp server on the same Windows7 OS] :*

[[email protected] ~]$ hadoop distcp ftp://Viewer:[email protected]:21/ftp_file1.txt /tmp/
15/04/27 00:02:45 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ ftp://Viewer:[email protected]:21/ftp_file1.txt], targetPath=/tmp, targetPathExists=true, preserveRawXattrs=false}
15/04/27 00:02:47 INFO impl.TimelineClientImpl: Timeline service address: http://hostname2.com:8188/ws/v1/timeline/
15/04/27 00:02:47 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8050
15/04/27 00:03:50 ERROR tools.DistCp: Invalid input:
org.apache.hadoop.tools.CopyListing$InvalidInputException: ftp://Viewer:[email protected]:21/ftp_file1.txt doesn't exist
    at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
    at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
    at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)

*[Failure 2 on IIS ftp server on the same Windows7 OS] :*

[[email protected] ~]$ hadoop distcp ftp://Viewer:[email protected]/ftp-win.txt /tmp/
15/02/01 23:03:37 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ ftp://Viewer:[email protected]/ftp-win.txt], targetPath=/tmp, targetPathExists=true}
15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8032
15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed without indication.
    at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
    at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
    at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
    at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
    at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
    at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
    at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
    at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
    at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
    at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
    at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)

*[Failure 3 on IIS ftp server on the same Windows7 OS] :*

[[email protected] ~]$ hadoop distcp ftp://Viewer:[email protected]:21/ftp_file1.txt /tmp/
15/04/27 00:08:18 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[ ftp://Viewer:[email protected]:21/ftp_file1.txt], targetPath=/tmp, targetPathExists=true, preserveRawXattrs=false}
15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address: http://hostname2.com:8188/ws/v1/timeline/
15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at hostname2.com/9.32.249.181:8050
15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:196)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:154)
    at java.io.BufferedReader.read(BufferedReader.java:175)
    at org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
    at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
    at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
    at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
    at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
    at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
    at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
    at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
    at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
    at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
    at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
    at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
    at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)

Thanks!
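
One way to narrow this down outside Hadoop is to fetch the same file from the Hadoop node with a plain FTP client in both data-connection modes. The sketch below is only a diagnostic aid, not a fix. It rests on an unverified assumption: that the difference between wget (which defaults to passive mode) and the Commons Net client seen in the stack traces (which starts in active mode unless the caller switches it) may matter for the IIS server. Whether FTPFileSystem in this build changes the mode has not been checked here. The function names and the example host are hypothetical.

```python
import sys
from ftplib import FTP, all_errors
from urllib.parse import urlsplit, unquote


def parse_ftp_url(url):
    """Split an ftp:// URL into (host, port, user, password, path)."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp:// URL: %r" % url)
    return (
        parts.hostname,
        parts.port or 21,  # the FTP control port defaults to 21
        unquote(parts.username or "anonymous"),
        unquote(parts.password or ""),
        parts.path,
    )


def probe(url, passive, timeout=30):
    """Download the file once in the given mode and report what happened."""
    host, port, user, password, path = parse_ftp_url(url)
    chunks = []
    try:
        with FTP() as ftp:
            ftp.connect(host, port, timeout=timeout)
            ftp.login(user, password)
            ftp.set_pasv(passive)  # True = passive (PASV), False = active (PORT)
            ftp.retrbinary("RETR " + path, chunks.append)
        return "ok, %d bytes" % sum(len(c) for c in chunks)
    except all_errors as exc:
        return "failed: %r" % (exc,)


# Hypothetical usage: python ftp_probe.py ftp://user:password@ftp.example.com:21/ftp_file1.txt
if __name__ == "__main__" and len(sys.argv) == 2:
    for mode_name, passive in (("passive", True), ("active", False)):
        print(mode_name, "->", probe(sys.argv[1], passive))
```

If passive mode succeeds and active mode fails or hangs against the IIS server but both succeed against FileZilla, that would point at the data-connection mode or a firewall rule rather than at DistCp itself. If both modes succeed, the cause lies elsewhere.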
2015-02-02 15:41 GMT+08:00 sam liu <[email protected]>:

> Hi Experts,
>
> I could run distcp against ftp server installed on Linux, but could NOT
> run distcp against ftp server installed on Windows. Below are the steps.
>
> Is this a DistCp bug? Any comments?
>
> [Scenario 1]
> I installed a BI cluster using trunk build on HadoopNode1, and then could
> copy file from a ftp installed on Linux to hdfs using command:
> hadoop distcp ftp://user1:[email protected]/home/user1/ftp.txt
> hdfs://HadoopNode1:9000/tmp/
>
> [Scenario 2]
> On the same hadoop node, I can copy file from a remote ftp server
> installed on Windows7 using command:
> wget ftp://Viewer:[email protected]/ftp-win.txt.
>
> But I failed to copy file from a ftp installed on Windows7 to hdfs using
> command:
> [user1@HadoopNode1 ~]$ hadoop distcp
> ftp://Viewer:[email protected]/ftp-win.txt /tmp/
> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:[email protected]/ftp-win.txt], targetPath=/tmp,
> targetPathExists=true}
> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
> HadoopNode1/9.30.239.166:8032
> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed
> without indication.
>     at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>     at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>     at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>     at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>     at
> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>     at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>     at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>     at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>     at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>     at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>     at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>     at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>
> Thanks!
>
