There's a DistCp utility for this kind of purpose.
There's also Spring XD, but I am not sure whether you want to use it.
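For the copy-into-HDFS step, a minimal DistCp sketch looks like the following; the source and target URIs are hypothetical placeholders, and the actual filesystem schemes available depend on the cluster configuration:

```shell
# Parallel, MapReduce-backed copy of a staged directory into HDFS.
# ftp://... and hdfs://namenode:8020/... are placeholder URIs;
# DistCp accepts any Hadoop-supported filesystem URI (hdfs, ftp, s3, ...).
hadoop distcp ftp://ftp.example.com/pub/data hdfs://namenode:8020/ingest/data
```

DistCp runs the copy as a distributed job, so it scales with the cluster rather than with a single transfer node.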
Regards,
*Stanley Shi,*
On Mon, Jul 7, 2014 at 10:02 PM, Mohan Radhakrishnan
radhakrishnan.mo...@gmail.com wrote:
Hi,

We used a commercial FT and scheduler tool in clustered mode. This was a
traditional active-active cluster that supported multiple protocols like
FTPS.

Now I am interested in evaluating a distributed way of crawling FTP sites
and downloading files using Hadoop. I thought Spark Streaming can read data
from HDFS
(http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html),
Flume (http://flume.apache.org/), and Kafka.

I am a beginner, but this seems to be similar to what I intend. The data
source will be external FTP or S3 storage.
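One hedged sketch of the crawling step described above, using only Python's standard ftplib: a driver lists the remote directory once, then shards the file list round-robin across workers (for example, one Hadoop map task per shard, each downloading its slice into HDFS). The host, path, and credentials here are hypothetical placeholders, not part of the original setup:

```python
from ftplib import FTP

def list_ftp_files(host, path, user="anonymous", passwd=""):
    """Return the names of entries under `path` on an FTP server.

    Connects, lists, and disconnects; host/path are placeholders."""
    ftp = FTP(host)
    ftp.login(user, passwd)
    ftp.cwd(path)
    names = ftp.nlst()
    ftp.quit()
    return names

def partition(files, n_workers):
    """Round-robin the download list across n_workers workers,
    so each worker fetches an independent slice of the files."""
    return [files[i::n_workers] for i in range(n_workers)]
```

Each shard could then drive one download task that writes its files into HDFS, after which tools mentioned in this thread (DistCp, Spark Streaming) operate on the data inside the cluster.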