Thanks - that helps a lot - I believe I can figure it out from there.

Patrick
On Sun, Dec 15, 2013 at 6:12 PM, Cheolsoo Park <[email protected]> wrote:

> Hi Patrick,
>
> I think what you need is OutputCommitter#commitTask()
> <http://hadoop.apache.org/docs/r1.1.2/api/org/apache/hadoop/mapreduce/OutputCommitter.html#commitTask(org.apache.hadoop.mapreduce.TaskAttemptContext)>.
> This is called by Hadoop in each task process, so you can write your own
> OutputCommitter class and associate it with your StoreFunc. Then you can
> make a single call to your DB for the batched output per task.
>
> If you're looking for a way to do some final work per job, you will have
> to rely on either commitJob()
> <http://hadoop.apache.org/docs/r1.1.2/api/org/apache/hadoop/mapreduce/OutputCommitter.html#commitJob(org.apache.hadoop.mapreduce.JobContext)>
> or cleanUpOnSuccess(). But again, these are not called by the task
> process. I am not sure what context you want to share between putNext()
> and cleanUpOnSuccess(). But the JobConf object will be constructed on the
> frontend before launching MR jobs, and properties in this JobConf object
> will be available everywhere. However, you won't be able to update
> properties in putNext() and see the changes in cleanUpOnSuccess(). Hope
> this is clear.
>
> Thanks,
> Cheolsoo
>
>
> On Sun, Dec 15, 2013 at 7:11 AM, Patrick Thompson <[email protected]> wrote:
>
>> So is there a good way to flush a buffer accumulated by putNext? I was
>> hoping it was possible in cleanUpOnSuccess, but that apparently isn't
>> going to work. This is horrible for something talking to a store such as
>> MySQL, as it means you have to do updates one at a time.
>>
>> Patrick
>>
>>
>> On Sun, Dec 15, 2013 at 12:41 AM, Cheolsoo Park <[email protected]> wrote:
>>
>>>> putNext and cleanUpOnSuccess will be called in the same execution
>>>> context?
>>>
>>> putNext() is called on the backend during the job execution, whereas
>>> cleanUpOnSuccess() is called on the frontend after the job is finished.
>>> So they won't be executed by the same object. From the comment, I also
>>> doubt that you can share properties between them via JobConf.
>>>
>>> See MapReduceLauncher.java for how cleanUpOnSuccess() is used.
>>>
>>> On Thu, Dec 5, 2013 at 11:10 AM, Patrick Thompson <[email protected]> wrote:
>>>
>>>> It's not clear from the docs where the various StoreFuncInterface
>>>> functions get called. There are some hints in the API docs
>>>> <http://pig.apache.org/docs/r0.12.0/api/>, but I am left wondering:
>>>> does Pig guarantee that, for example, putNext and cleanUpOnSuccess
>>>> will be called in the same execution context?
>>>>
>>>> Is this documented somewhere? Maybe someone can provide an answer? It
>>>> would save me a lot of time experimenting and spelunking in the code.
>>>>
>>>> Thanks
>>>>
>>>> Patrick

--
fun and games - a blog <http://funazonki.blogspot.com/>, a word game
<http://1.whatwouldwho.appspot.com/wwws.html> and CanCan
<http://www.standingwaiting.com/CanCan/Game.html>
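[Editor's note: the pattern Cheolsoo describes - accumulate rows in putNext() and issue one batched call to the store when the task commits - can be sketched roughly as below. The class and method names (BatchCommitDemo, BatchingWriter) are hypothetical, and the Hadoop types are replaced with minimal stand-ins so the sketch compiles and runs on its own; a real implementation would extend org.apache.hadoop.mapreduce.OutputCommitter, override commitTask(TaskAttemptContext), and wire the committer in through the StoreFunc's OutputFormat.]

```java
import java.util.ArrayList;
import java.util.List;

public class BatchCommitDemo {

    // Hypothetical stand-in for a record writer plus its task committer.
    // write() plays the role of StoreFunc#putNext(); commitTask() plays the
    // role of OutputCommitter#commitTask(), which Hadoop invokes once per
    // successful task attempt, in the task process.
    static class BatchingWriter {
        final List<String> buffer = new ArrayList<>();    // rows seen so far
        final List<String> committed = new ArrayList<>(); // stands in for the DB

        // Called once per tuple: just buffer, no per-row DB round trip.
        void write(String row) {
            buffer.add(row);
        }

        // Called once when the task attempt commits: a single batched
        // operation against the store instead of one update per row.
        void commitTask() {
            committed.addAll(buffer); // one batched INSERT in a real store
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        BatchingWriter w = new BatchingWriter();
        w.write("row1");
        w.write("row2");
        w.write("row3");
        // Nothing has reached the store yet.
        System.out.println("before commit: " + w.committed.size());
        w.commitTask();
        System.out.println("after commit: " + w.committed.size());
    }
}
```

Because commitTask() only runs for task attempts that succeed, speculative or failed attempts never flush their buffers, which is exactly why it is a safer flush point than per-row writes from putNext().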
