There is a DistCp utility for this kind of purpose. There is also "Spring XD", but I am not sure whether you want to use it.
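As a rough sketch of the DistCp route: DistCp copies between any two Hadoop filesystem URIs, so an `ftp://` source can in principle be copied into HDFS as a MapReduce job. The small Python wrapper below only builds and runs the `hadoop distcp` command line. The host names, paths, and function names are made up for illustration. It assumes plain FTP (FTPS is not addressed) and that the `hadoop` launcher is on the PATH.

```python
import shlex
import subprocess


def build_distcp_command(source_uri, target_uri, max_maps=None):
    """Return the argument list for a DistCp run from source_uri to target_uri."""
    command = ["hadoop", "distcp"]
    if max_maps is not None:
        # -m caps the number of simultaneous map tasks, i.e. parallel copies.
        command += ["-m", str(max_maps)]
    command += [source_uri, target_uri]
    return command


def run_distcp(source_uri, target_uri, max_maps=None):
    """Run DistCp; raises CalledProcessError if the job exits non-zero."""
    subprocess.run(build_distcp_command(source_uri, target_uri, max_maps), check=True)


if __name__ == "__main__":
    # Hypothetical FTP server and HDFS target, for illustration only.
    cmd = build_distcp_command(
        "ftp://ftp.example.com/incoming",
        "hdfs://namenode.example.com:8020/data/incoming",
        max_maps=10,
    )
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(cmd))
```

Calling `run_distcp(...)` with the same arguments would actually submit the job; printing the command first makes it easy to check the URIs before anything is copied.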
Regards,
*Stanley Shi,*

On Mon, Jul 7, 2014 at 10:02 PM, Mohan Radhakrishnan <radhakrishnan.mo...@gmail.com> wrote:

> Hi,
> We used a commercial FT and scheduler tool in clustered mode.
> This was a traditional active-active cluster that supported multiple
> protocols like FTPS etc.
>
> Now I am interested in evaluating a Distributed way of crawling FTP
> sites and downloading files using Hadoop. I thought since we have to
> process thousands of files Hadoop jobs can do it.
>
> Are Hadoop jobs used for this type of file transfers ?
>
> Moreover there is a requirement for a scheduler also. What is the
> recommendation of the forum ?
>
> Thanks,
> Mohan