Hi Experts,

Strangely, DistCp can fetch a file from a FileZilla FTP server on Windows 7,
but fails against the IIS FTP server on the same Windows 7 machine, even
though wget retrieves the file directly: 'wget
ftp://Viewer:[email protected]:21/ftp_file1.txt'. I tried several times;
every attempt failed, with different error messages each time, as shown below.

Any comments?
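
One difference that may be worth ruling out outside Hadoop is the FTP data-connection mode: recent wget versions default to passive mode, while the commons-net FTPClient that Hadoop's FTPFileSystem uses starts in active mode unless the caller switches it. Whether that is what separates the two servers here is only an assumption. The sketch below logs in and lists one file in each mode using Python's standard ftplib; the host name, password and helper names (parse_ftp_url, probe) are hypothetical placeholders.

```python
from ftplib import FTP, all_errors
from urllib.parse import urlsplit, unquote


def parse_ftp_url(url):
    """Split an ftp:// URL into (host, port, user, password, path)."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp:// URL: %r" % url)
    return (
        parts.hostname,
        parts.port or 21,                       # FTP control port defaults to 21
        unquote(parts.username or "anonymous"),
        unquote(parts.password or ""),
        parts.path or "/",
    )


def probe(url, passive, timeout=30):
    """Log in and LIST the path over a data connection in the given mode.

    Returns the listing lines on success, or the exception that was raised.
    """
    host, port, user, password, path = parse_ftp_url(url)
    ftp = FTP()
    try:
        ftp.connect(host, port, timeout=timeout)
        ftp.login(user, password)
        ftp.set_pasv(passive)                   # True = passive, False = active
        lines = []
        ftp.retrlines("LIST " + path, lines.append)
        return lines
    except all_errors as exc:                   # FTP errors, socket errors, EOF
        return exc
    finally:
        ftp.close()                             # drop the socket without sending QUIT


# Hypothetical usage (placeholder host and credentials, not from this thread):
#   url = "ftp://Viewer:secret@ftp.example.com:21/ftp_file1.txt"
#   print("passive ->", probe(url, passive=True))
#   print("active  ->", probe(url, passive=False))
```

If the passive probe returns a listing while the active one times out or is reset, a firewall or the IIS data-channel settings would be a reasonable next place to look; if both behave the same, the mode is probably not the cause.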

*[Success on FileZilla FTP server on Windows 7]:*
[[email protected] ~]$ hadoop distcp
ftp://ftp:[email protected]:121/ftp_test.txt /tmp/
15/04/26 22:56:20 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://ftp:[email protected]:121/ftp_test.txt], targetPath=/tmp,
targetPathExists=true, preserveRawXattrs=false}
15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1429858372957_0002
15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application
application_1429858372957_0002
15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job:
http://hostname2.com:8088/proxy/application_1429858372957_0002/
15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running in
uber mode : false
15/04/26 22:56:51 INFO mapreduce.Job:  map 0% reduce 0%

*[Failure 1 on IIS FTP server on the same Windows 7 OS]:*
[[email protected] ~]$ hadoop distcp
ftp://Viewer:[email protected]:21/ftp_file1.txt /tmp/
15/04/27 00:02:45 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://Viewer:[email protected]:21/ftp_file1.txt], targetPath=/tmp,
targetPathExists=true, preserveRawXattrs=false}
15/04/27 00:02:47 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/27 00:02:47 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/27 00:03:50 ERROR tools.DistCp: Invalid input:
org.apache.hadoop.tools.CopyListing$InvalidInputException:
ftp://Viewer:[email protected]:21/ftp_file1.txt doesn't exist
        at
org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
        at
org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
        at
org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)

*[Failure 2 on IIS FTP server on the same Windows 7 OS]:*
[[email protected] ~]$ hadoop distcp
ftp://Viewer:[email protected]/ftp-win.txt /tmp/
15/02/01 23:03:37 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://Viewer:[email protected]/ftp-win.txt], targetPath=/tmp,
targetPathExists=true}
15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8032
15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed
without indication.
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
        at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
        at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
        at
org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
        at
org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
        at
org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
        at
org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
        at
org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)

*[Failure 3 on IIS FTP server on the same Windows 7 OS]:*
[[email protected] ~]$ hadoop distcp
ftp://Viewer:[email protected]:21/ftp_file1.txt /tmp/
15/04/27 00:08:18 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://Viewer:[email protected]:21/ftp_file1.txt], targetPath=/tmp,
targetPathExists=true, preserveRawXattrs=false}
15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:196)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:154)
        at java.io.BufferedReader.read(BufferedReader.java:175)
        at
org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
        at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
        at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
        at
org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
        at
org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
        at
org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
        at
org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
        at
org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
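
The three failing runs all stall between the "Connecting to ResourceManager" line and the ERROR line, which suggests a timeout rather than an immediate rejection. As a rough check, the gaps can be computed from the timestamps in the logs above; the snippet assumes the log prefix is in yy/MM/dd HH:mm:ss form, and the helper name stall_seconds is made up for illustration.

```python
from datetime import datetime

# (label, last INFO timestamp before the stall, ERROR timestamp),
# copied from the three failure logs above.
RUNS = [
    ("Failure 1", "15/04/27 00:02:47", "15/04/27 00:03:50"),
    ("Failure 2", "15/02/01 23:03:38", "15/02/01 23:05:50"),
    ("Failure 3", "15/04/27 00:08:19", "15/04/27 00:10:29"),
]


def stall_seconds(start, end, fmt="%y/%m/%d %H:%M:%S"):
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


for label, start, end in RUNS:
    print(label, stall_seconds(start, end), "s")
# Failure 1 63 s
# Failure 2 132 s
# Failure 3 130 s
```

The two runs that end in a closed or reset connection both wait a little over two minutes, while the "doesn't exist" run gives up after about one minute.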

Thanks!


2015-02-02 15:41 GMT+08:00 sam liu <[email protected]>:

> Hi Experts,
>
> I can run DistCp against an FTP server installed on Linux, but NOT
> against an FTP server installed on Windows. The steps are below.
>
> Is this a DistCp bug? Any comments?
>
> [Scenario 1]
> I installed a BI cluster from a trunk build on HadoopNode1 and could then
> copy a file from an FTP server installed on Linux to HDFS with this command:
> hadoop distcp ftp://user1:[email protected]/home/user1/ftp.txt
> hdfs://HadoopNode1:9000/tmp/
>
> [Scenario 2]
> On the same Hadoop node, I can copy a file from a remote FTP server
> installed on Windows 7 with this command:
> wget ftp://Viewer:[email protected]/ftp-win.txt
>
> But I failed to copy a file from an FTP server installed on Windows 7 to
> HDFS with this command:
> [user1@HadoopNode1 ~]$ hadoop distcp
> ftp://Viewer:[email protected]/ftp-win.txt /tmp/
> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:[email protected]/ftp-win.txt], targetPath=/tmp,
> targetPathExists=true}
> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
> HadoopNode1/9.30.239.166:8032
> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed
> without indication.
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>
> Thanks!
>
