Hi all,
We're trying to use the (pretty much never used) FTPFileSystem in Hadoop, to
send data from a Hadoop cluster to an FTP server.
The first challenge is that the FTPFileSystem configures FTPClient to run in
active mode, which didn't work with our FTP server/firewall configuration.
So we created a PassiveFTPFileSystem that uses passive mode.
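For context, a custom FileSystem like this gets wired in through Hadoop configuration. A rough sketch of the core-site.xml entry, assuming `fs.ftp.impl` maps the ftp:// scheme (the class/package name below is a placeholder, not our actual one):

```xml
<!-- core-site.xml: map the ftp:// scheme to the passive-mode implementation.
     fs.ftp.impl is the standard Hadoop property for overriding the FileSystem
     class bound to a scheme; the class name here is illustrative only. -->
<property>
  <name>fs.ftp.impl</name>
  <value>com.example.fs.PassiveFTPFileSystem</value>
</property>
```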
This connects to the FTP server and manages to send some files, but the copy
ultimately always fails.
On the server side we see nothing in the logs (it's running vsftpd 2.2.2), even
with debug logging enabled.
On the Hadoop (client) side, we see a mix of errors in the logs. Most look like
the following:
org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed
without indication.
at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:267)
at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:460)
at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:520)
at org.apache.commons.net.ftp.FTP.cwd(FTP.java:745)
I'm wondering if there's any issue with running 14 parallel FTPClient sessions
from a single server - e.g. port-number collisions - though from my reading of
the code that doesn't seem possible.
Thanks for any input.
Regards,
-- Ken
--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr