Hi Patrick,

I think what you need is OutputCommitter#commitTask()<http://hadoop.apache.org/docs/r1.1.2/api/org/apache/hadoop/mapreduce/OutputCommitter.html#commitTask(org.apache.hadoop.mapreduce.TaskAttemptContext)>. This is called by Hadoop in each task process, so you can write your own OutputCommitter class and associate it with your StoreFunc. Then you can make a single call to your DB for the batched output per task.
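
To make the buffer-then-flush idea concrete, here is a minimal sketch in plain Java. It does not use the real Hadoop or Pig classes: BatchSink, TaskBuffer and CountingSink are hypothetical stand-ins I made up for illustration, and the methods only mirror the roles of putNext(), needsTaskCommit(), commitTask() and abortTask(). It also assumes the record writer and the committer can reach the same buffer inside one task process, which you would have to arrange yourself in a real OutputFormat.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a DB client that accepts one batched write.
interface BatchSink {
    void writeBatch(List<String> rows);
}

// Hypothetical per-task buffer shared by the writer side and the committer side.
class TaskBuffer {
    private final List<String> rows = new ArrayList<>();
    private final BatchSink sink;

    TaskBuffer(BatchSink sink) {
        this.sink = sink;
    }

    // Plays the role of StoreFunc.putNext(): buffer only, no DB call.
    void putNext(String row) {
        rows.add(row);
    }

    // Plays the role of OutputCommitter.needsTaskCommit():
    // Hadoop only calls commitTask() when this returns true.
    boolean needsTaskCommit() {
        return !rows.isEmpty();
    }

    // Plays the role of OutputCommitter.commitTask(): one call per task.
    void commitTask() {
        if (rows.isEmpty()) {
            return;
        }
        sink.writeBatch(new ArrayList<>(rows));
        rows.clear();
    }

    // Plays the role of OutputCommitter.abortTask(): drop unflushed rows.
    void abortTask() {
        rows.clear();
    }
}

// Test double that counts calls instead of talking to a database.
class CountingSink implements BatchSink {
    int calls = 0;
    int rowsWritten = 0;

    @Override
    public void writeBatch(List<String> rows) {
        calls++;
        rowsWritten += rows.size();
    }
}

class CommitTaskSketch {
    public static void main(String[] args) {
        CountingSink sink = new CountingSink();
        TaskBuffer buffer = new TaskBuffer(sink);
        buffer.putNext("row-1");
        buffer.putNext("row-2");
        buffer.putNext("row-3");
        if (buffer.needsTaskCommit()) {
            buffer.commitTask();
        }
        // prints "1 call(s), 3 row(s)"
        System.out.println(sink.calls + " call(s), " + sink.rowsWritten + " row(s)");
    }
}
```

In a real implementation the same roles would be filled by your StoreFunc, its OutputFormat and RecordWriter, and an OutputCommitter subclass; the point of the sketch is only that putNext() never touches the DB and the batch goes out once per task.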

If you're looking for a way to do some final work per job, you will have to rely on either commitJob()<http://hadoop.apache.org/docs/r1.1.2/api/org/apache/hadoop/mapreduce/OutputCommitter.html#commitJob(org.apache.hadoop.mapreduce.JobContext)> or cleanUpOnSuccess(). But again, these are not called by the task process.

I am not sure what context you want to share between putNext() and cleanUpOnSuccess(). The JobConf object is constructed on the frontend before the MR jobs are launched, and properties in this JobConf object will be available everywhere. However, you won't be able to update properties in putNext() and see them in cleanUpOnSuccess().

Hope this is clear.

Thanks,
Cheolsoo

On Sun, Dec 15, 2013 at 7:11 AM, Patrick Thompson <[email protected]> wrote:

> So is there a good way to flush a buffer accumulated by putNext? I was
> hoping it was possible in cleanUpOnSuccess, but that apparently isn't going
> to work. This is horrible for something talking to a store such as MySql,
> as it means you have to do updates one-at-a-time.
>
> Patrick
>
>
> On Sun, Dec 15, 2013 at 12:41 AM, Cheolsoo Park <[email protected]> wrote:
>
> > >> putNext and cleanUpOnSuccess will be called in the same execution
> > >> context?
> >
> > putNext() is called on the backend during the job execution, whereas
> > cleanUpOnSuccess() is called on the frontend after the job is finished. So
> > they won't be executed by the same object. From the comment, I also doubt
> > that you can share properties between them via JobConf.
> >
> > See MapReduceLauncher.java as for how cleanUpOnSuccess() is used.
> >
> > On Thu, Dec 5, 2013 at 11:10 AM, Patrick Thompson <[email protected]> wrote:
> >
> > > It's not clear from the docs where the various StoreFuncInterface functions
> > > get called. There are some hints in the API
> > > docs<http://pig.apache.org/docs/r0.12.0/api/>,
> > > but I am left wondering, does pig guarantee that, for example, putNext and
> > > cleanUpOnSuccess will be called in the same execution context?
> > >
> > > Is this documented somewhere? Maybe someone can provide an answer? It would
> > > save me a lot of time experimenting and spelunking in the code.
> > >
> > > Thanks
> > >
> > > Patrick
>
> --
> fun and games - a blog <http://funazonki.blogspot.com/>, a word
> game<http://1.whatwouldwho.appspot.com/wwws.html>and
> CanCan <http://www.standingwaiting.com/CanCan/Game.html>
