Re: hadoop to ftp files into hdfs

2009-02-03 Thread Tom White
NLineInputFormat is ideal for this purpose. Each split will be N lines
of input (where N is configurable), so each mapper can retrieve N
files for insertion into HDFS. Since the mappers write the files
themselves, you can set the number of reducers to zero.

Tom
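
A minimal sketch of the map side, assuming the job is run through Hadoop
Streaming with a Python mapper (the thread does not say which API is used).
Each mapper reads the URLs in its split, fetches each one, and pipes the bytes
into "hadoop fs -put -". HDFS_DIR, the file naming scheme, and the function
names are hypothetical. As far as I know, Streaming passes records to the
mapper as "offset<TAB>line" when NLineInputFormat is used, so the sketch keeps
only the text after the last tab.

```python
import posixpath
import subprocess
import sys
from urllib.parse import urlparse
from urllib.request import urlopen

HDFS_DIR = "/user/steve/ftp"  # hypothetical target directory in HDFS


def url_from_record(record):
    """Return the URL from one input record, dropping an optional 'offset<TAB>' key."""
    return record.rstrip("\n").split("\t")[-1].strip()


def target_name(url):
    """Name the copy after the last path component of the URL (assumed scheme)."""
    return posixpath.basename(urlparse(url).path) or "index"


def fetch_ftp(url):
    """Download one ftp:// URL into memory; fine for a sketch, not for huge files."""
    with urlopen(url) as response:
        return response.read()


def store_in_hdfs(name, data):
    """Write bytes to HDFS by piping them to 'hadoop fs -put - <dest>'."""
    dest = HDFS_DIR + "/" + name
    subprocess.run(["hadoop", "fs", "-put", "-", dest], input=data, check=True)


def run_mapper(records, fetch, store):
    """Fetch and store every URL in this split, yielding each URL once it is done."""
    for record in records:
        url = url_from_record(record)
        if not url:
            continue  # skip blank lines
        store(target_name(url), fetch(url))
        yield url


def main():
    # Emit each finished URL so the (map-only) job output records what was copied.
    for url in run_mapper(sys.stdin, fetch_ftp, store_in_hdfs):
        print(url)
```

To use this as a streaming mapper, call main() at the bottom of the script.
Passing fetch and store in as arguments keeps the loop testable without an FTP
server or a cluster.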

On Tue, Feb 3, 2009 at 4:23 AM, jason hadoop jason.had...@gmail.com wrote:
 If you have a large number of FTP URLs spread across many sites, list them
 in a file, set that file as your Hadoop job input, and force the input split
 to a size that gives you good distribution across your cluster.


 On Mon, Feb 2, 2009 at 3:23 PM, Steve Morin steve.mo...@gmail.com wrote:

 Does anyone have a good suggestion on how to submit a Hadoop job that
 will split the FTP retrieval of a number of files for insertion into
 HDFS?  I have been searching Google for suggestions on this matter.
 Steve




hadoop to ftp files into hdfs

2009-02-02 Thread Steve Morin
Does anyone have a good suggestion on how to submit a Hadoop job that
will split the FTP retrieval of a number of files for insertion into
HDFS?  I have been searching Google for suggestions on this matter.
Steve


Re: hadoop to ftp files into hdfs

2009-02-02 Thread jason hadoop
If you have a large number of FTP URLs spread across many sites, list them
in a file, set that file as your Hadoop job input, and force the input split
to a size that gives you good distribution across your cluster.
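
One way to pick that split size, sketched in Python under the assumption that
the input file holds one URL per line and that the split size is expressed in
lines. The function names and the numbers in the usage note are made up for
illustration. The idea is simply to divide the URL count by the number of map
tasks you want running and round up.

```python
def lines_per_split(num_urls, map_tasks):
    """Smallest number of lines per split so num_urls fit in at most map_tasks splits."""
    if num_urls <= 0 or map_tasks <= 0:
        raise ValueError("num_urls and map_tasks must be positive")
    return (num_urls + map_tasks - 1) // map_tasks  # integer ceiling division


def split_lines(lines, n):
    """Group lines into consecutive chunks of n, mimicking a split every n lines."""
    return [lines[i:i + n] for i in range(0, len(lines), n)]
```

For example, 1000 URLs spread over 40 map tasks gives 25 lines per split. If
the URLs for one slow site are clustered together in the file, shuffling the
lines first may spread the load more evenly.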


On Mon, Feb 2, 2009 at 3:23 PM, Steve Morin steve.mo...@gmail.com wrote:

 Does anyone have a good suggestion on how to submit a Hadoop job that
 will split the FTP retrieval of a number of files for insertion into
 HDFS?  I have been searching Google for suggestions on this matter.
 Steve