Input/Output Formatters and FileTypes

2008-06-20 Thread Mathos Marcer
Presumably, like most people, I started off mainly using the "Text"-based
input and output formatters, with keys and values as Text or
IntWritable.  I've been looking into the other formatters and
Writable classes and wondering what they would do for me.  To help
spur a conversation about best practices and lessons learned:  What are
the benefits of the other formatters?  What are the benefits of MapFiles
and SequenceFiles?  What are people out there using, and what have you
found gave the greatest benefit?

==
MM


Re: Appending to Input Path after mapping has begun

2008-04-23 Thread Mathos Marcer
Though I have only spent a couple of days reviewing the code, it seems
the crux of the problem is in the InputFormat interface: getSplits is
only called once, when the map/reduce job is initiated.  If this method
were more "iterable" in implementation, like a "getNextSplits", you
would have a way to add more files to the pipeline while the job is in
progress.
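
To make the idea concrete, here is a minimal sketch in plain Java.  It
is not Hadoop code: IncrementalInputFormat, QueueBackedInput, and run
are hypothetical names invented for illustration, splits are reduced to
file-name strings, and everything runs in a single thread.  It only
shows the shape of a driver that keeps polling for new splits instead
of asking once at job start.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: none of these types exist in Hadoop.
class SplitPollingSketch {

    /** Hypothetical "iterable" counterpart to InputFormat.getSplits. */
    interface IncrementalInputFormat {
        /** Returns the splits added since the previous call; empty if none. */
        List<String> getNextSplits();
    }

    /** Toy input source: a queue of file names that running tasks may append to. */
    static class QueueBackedInput implements IncrementalInputFormat {
        private final Queue<String> pending = new ArrayDeque<>();

        void addInputPath(String path) {
            pending.add(path);
        }

        @Override
        public List<String> getNextSplits() {
            // Hand out everything queued so far, then start a fresh batch.
            List<String> batch = new ArrayList<>(pending);
            pending.clear();
            return batch;
        }
    }

    /** Driver loop: keeps asking for splits until a poll comes back empty. */
    static List<String> run(QueueBackedInput input) {
        List<String> processed = new ArrayList<>();
        List<String> batch;
        while (!(batch = input.getNextSplits()).isEmpty()) {
            for (String split : batch) {
                processed.add(split);
                // Stand-in for a map task that emits one new file per original input.
                if (!split.endsWith(".derived")) {
                    input.addInputPath(split + ".derived");
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        QueueBackedInput input = new QueueBackedInput();
        input.addInputPath("a.txt");
        input.addInputPath("b.txt");
        System.out.println(run(input));
    }
}
```

In this toy version the files produced while processing the first batch
are picked up by the second poll, so one "job" handles both rounds.  A
real implementation would also have to decide when the job is actually
finished and how late splits get scheduled, which the sketch ignores.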

==
mathos

On Wed, Apr 23, 2008 at 1:32 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
>
>  On Apr 22, 2008, at 11:01 AM, Thomas Cramer wrote:
>
>
> > Is it possible to add to the input path after mapping has begun, and
> > if so, how?  More specifically, say my Map process creates more files
> > that need to be mapped, and I don't want to keep re-initiating
> > Map/Reduce processes.  I tried simply creating files in the InputPath
> > directory.  I have also pulled the JobConf object into my map process
> > and issued an addInputPath, but apparently it doesn't affect the
> > process once it is running.  Any thoughts or options?
> >
>
>  No, it isn't currently possible. I can imagine an extension to the
> framework that lets you add new input splits to a job after it has started,
> but it would be a lot of work to get it right. The primary advantage of such
> a system would be that you could increase the efficiency of a pipeline of
> map/reduce jobs.
>
>  -- Owen
>