Thanks - that helps a lot - I believe I can figure it out from there.
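For anyone who finds this thread later: the pattern Cheolsoo describes below is to buffer rows in putNext() and flush them once per task in OutputCommitter#commitTask(). Here is a minimal, dependency-free sketch of that batching idea. Note that BatchedDbWriter and its method names are hypothetical stand-ins that mirror the StoreFunc/OutputCommitter call pattern; they are not the real Hadoop or Pig API, and the actual commitTask() would receive a TaskAttemptContext.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in illustrating the batching pattern: accumulate
// rows in memory during the task, then issue one batched write when
// the task commits (as Hadoop's OutputCommitter#commitTask would).
class BatchedDbWriter {
    private final List<String> buffer = new ArrayList<>();
    private int flushCount = 0;

    // Analogous to StoreFunc#putNext(Tuple): just accumulate, no DB call.
    public void putNext(String row) {
        buffer.add(row);
    }

    // Analogous to OutputCommitter#commitTask(TaskAttemptContext):
    // one batched call per task, then clear the buffer.
    public int commitTask() {
        int written = buffer.size();
        // Here you would issue a single batched INSERT to the DB.
        buffer.clear();
        flushCount++;
        return written;
    }

    public int getFlushCount() {
        return flushCount;
    }
}
```

In a real StoreFunc you would wire this up by returning a custom OutputFormat from getOutputFormat() whose getOutputCommitter() performs the flush.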

Patrick


On Sun, Dec 15, 2013 at 6:12 PM, Cheolsoo Park <[email protected]> wrote:

> Hi Patrick,
>
> I think what you need is OutputCommitter#commitTask()
> <http://hadoop.apache.org/docs/r1.1.2/api/org/apache/hadoop/mapreduce/OutputCommitter.html#commitTask(org.apache.hadoop.mapreduce.TaskAttemptContext)>.
> This is called by Hadoop in each task process, so you can write your own
> OutputCommitter class and associate it with your StoreFunc. Then you can
> make a single call to your DB for the batched output per task.
>
> If you're looking for a way to do some final work per job, you will have to
> rely on either commitJob()
> <http://hadoop.apache.org/docs/r1.1.2/api/org/apache/hadoop/mapreduce/OutputCommitter.html#commitJob(org.apache.hadoop.mapreduce.JobContext)>
> or cleanUpOnSuccess(). But again, these are not called by the task process.
> I am not sure what context you want to share between putNext() and
> cleanUpOnSuccess(). The JobConf object is constructed on the frontend
> before the MR jobs are launched, and its properties are available
> everywhere. However, you won't be able to update properties in putNext()
> and then see those updates in cleanUpOnSuccess(). Hope this is clear.
>
> Thanks,
> Cheolsoo
>
>
>
> On Sun, Dec 15, 2013 at 7:11 AM, Patrick Thompson <
> [email protected]> wrote:
>
> > So is there a good way to flush a buffer accumulated by putNext()? I was
> > hoping it was possible in cleanUpOnSuccess(), but that apparently isn't
> > going to work. This is horrible for something talking to a store such as
> > MySQL, as it means you have to do updates one at a time.
> >
> > Patrick
> >
> >
> > On Sun, Dec 15, 2013 at 12:41 AM, Cheolsoo Park <[email protected]
> > >wrote:
> >
> > > >> putNext and cleanUpOnSuccess will be called in the same execution
> > > >> context?
> > >
> > > putNext() is called on the backend during the job execution, whereas
> > > cleanUpOnSuccess() is called on the frontend after the job is finished.
> > > So they won't be executed by the same object. From the comment, I also
> > > doubt that you can share properties between them via JobConf.
> > >
> > > See MapReduceLauncher.java for how cleanUpOnSuccess() is used.
> > >
> > > On Thu, Dec 5, 2013 at 11:10 AM, Patrick Thompson <
> > > [email protected]> wrote:
> > >
> > > > It's not clear from the docs where the various StoreFuncInterface
> > > > functions get called. There are some hints in the API docs
> > > > <http://pig.apache.org/docs/r0.12.0/api/>, but I am left wondering:
> > > > does Pig guarantee that, for example, putNext and cleanUpOnSuccess
> > > > will be called in the same execution context?
> > > >
> > > > Is this documented somewhere? Maybe someone can provide an answer? It
> > > > would save me a lot of time experimenting and spelunking in the code.
> > > >
> > > > Thanks
> > > >
> > > > Patrick
> > > >
> > >
> >
> >
> >
>



-- 
fun and games - a blog <http://funazonki.blogspot.com/>, a word game
<http://1.whatwouldwho.appspot.com/wwws.html> and CanCan
<http://www.standingwaiting.com/CanCan/Game.html>
