What is the best way to bring such a huge file from an FTP server into Hadoop and
persist it in HDFS? Since a single JVM process might run out of memory, I was
wondering if I could use Spark or Flume to do this. Any help on this matter is
appreciated.
I would prefer an application/process running inside the cluster.
Why do you need to use Spark or Flume for this?
You can just use curl and hdfs:
curl ftp://blah | hdfs dfs -put - /blah
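That pipe streams: curl writes the file to stdout and hdfs dfs -put - reads it from
stdin, so no single process ever holds the whole file in memory. A minimal sketch,
with a placeholder host and placeholder paths:

# stream straight from FTP into HDFS without staging the file locally
curl -sS "ftp://user:pass@ftp.example.com/exports/bigfile.dat" \
  | hdfs dfs -put - /data/incoming/bigfile.dat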
Well, what do you do in case of failure?
I think one should use a professional ingestion tool that ideally does not need to
reload everything after a failure and that verifies, via checksums, that the file
has been transferred correctly.
I am not sure whether Flume supports FTP, but SSH/SCP should be supported.
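If you want resume-on-failure and checksum verification without pulling in a full
ingestion tool, a rough shell sketch is below. The host, paths, and the assumption
that the server publishes an .md5 sidecar file next to the data are all hypothetical,
not anything confirmed in this thread:

cd /tmp
# -C - tells curl to resume a partial download instead of starting over
curl -C - -O "ftp://ftp.example.com/exports/bigfile.dat"
curl -O "ftp://ftp.example.com/exports/bigfile.dat.md5"
# only push to HDFS if the checksum matches
md5sum -c bigfile.dat.md5 && hdfs dfs -put bigfile.dat /data/incoming/bigfile.dat

The tradeoff versus the curl pipe above: staging locally buys you resumability and
a verification step, at the cost of needing local disk for the whole file.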