Though I have only spent a couple of days reviewing the code, the crux
of the problem seems to be in the InputFormat interface: getSplits is
called only once, when the map/reduce job is initiated. If that method
were more "iterable" in implementation, something like a
"getNextSplits", you would have a way to add more files to the pipeline
while the job is in progress.
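
As a rough sketch of what I mean (plain Java, not Hadoop code): the
IncrementalSplitSource interface, its getNextSplits method, and the
QueueSplitSource class below are all hypothetical names of my own, and
splits are reduced to plain path strings. The idea is only that the
framework polls for new splits repeatedly instead of asking once up
front.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: none of these types exist in Hadoop.
public class IncrementalSplitDemo {

    /** Hypothetical counterpart to InputFormat that is polled repeatedly, not once. */
    interface IncrementalSplitSource {
        /** Returns the splits added since the last call; empty when none are pending. */
        List<String> getNextSplits();
    }

    /** Toy source backed by a queue of paths that running tasks may append to. */
    static class QueueSplitSource implements IncrementalSplitSource {
        private final Queue<String> pending = new ArrayDeque<>();

        synchronized void addInputPath(String path) {
            pending.add(path);
        }

        @Override
        public synchronized List<String> getNextSplits() {
            List<String> batch = new ArrayList<>(pending);
            pending.clear();
            return batch;
        }
    }

    /** Polls the source until a poll returns nothing; returns paths in processing order. */
    static List<String> run(QueueSplitSource source) {
        List<String> processed = new ArrayList<>();
        List<String> batch;
        while (!(batch = source.getNextSplits()).isEmpty()) {
            for (String split : batch) {
                processed.add(split);
                // Stand-in for a map task that writes one follow-up file per "seed" input.
                if (split.startsWith("seed")) {
                    source.addInputPath("derived-from-" + split);
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        QueueSplitSource source = new QueueSplitSource();
        source.addInputPath("seed-1");
        source.addInputPath("seed-2");
        // Prints [seed-1, seed-2, derived-from-seed-1, derived-from-seed-2]
        System.out.println(run(source));
    }
}
```

A real version would also have to decide when the job is actually
finished, since an empty poll does not prove that no running task will
add more input later. That is probably a large part of the hard work.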
==
mathos
On Wed, Apr 23, 2008 at 1:32 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
>
> On Apr 22, 2008, at 11:01 AM, Thomas Cramer wrote:
>
>
> > Is it possible to add to the input path after mapping has begun, and
> > if so, how? More specifically, say my Map process creates more files
> > that need to be mapped, and I don't want to keep re-initiating
> > Map/Reduce processes. I tried simply creating files in the InputPath
> > directory. I have also pulled the JobConf object into my map process
> > and issued an addInputPath, but apparently it doesn't affect the
> > process once it is running. Any thoughts or options?
> >
>
> No, it isn't currently possible. I can imagine an extension to the
> framework that lets you add new input splits to a job after it has started,
> but it would be a lot of work to get it right. The primary advantage of such
> a system would be that you could increase the efficiency of a pipeline of
> map/reduce jobs.
>
> -- Owen
>