Still confusing. So the entire execution (from B = LOAD ..... onwards) will be blocked until 'out1' is stored?
> STORE A INTO 'out1';
> EXEC;
> B = LOAD 'data2';
> C = FOREACH B GENERATE MYUDF($0,'out1');
> STORE C INTO 'out2';

Any way to restrict it to a particular block?

-Prasen

On Tue, Feb 16, 2010 at 2:35 PM, Dmitriy Ryaboy <[email protected]> wrote:
> It's been in the docs since 0.3
>
> http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html
> Implicit Dependencies
>
> If a script has dependencies on the execution order outside of what Pig
> knows about, execution may fail. For instance, in this script MYUDF might
> try to read from out1, a file that A was just stored into. However, Pig does
> not know that MYUDF depends on the out1 file and might submit the jobs
> producing the out2 and out1 files at the same time.
>
> ...
> STORE A INTO 'out1';
> B = LOAD 'data2';
> C = FOREACH B GENERATE MYUDF($0,'out1');
> STORE C INTO 'out2';
>
> To make the script work (to ensure that the right execution order is
> enforced) add the exec statement. The exec statement will trigger the
> execution of the statements that produce the out1 file.
>
> ...
> STORE A INTO 'out1';
> EXEC;
> B = LOAD 'data2';
> C = FOREACH B GENERATE MYUDF($0,'out1');
> STORE C INTO 'out2';
>
>
> On Tue, Feb 16, 2010 at 12:46 AM, Mridul Muralidharan <[email protected]>
> wrote:
>>
>> Is this documented behavior or current impl detail ?
>> A lot of scripts broke when multi-query optimization was committed to trunk
>> because of the implicit ordering assumption (based on STORE) in earlier pig
>> - which was, iirc, documented.
>>
>> Regards,
>> Mridul
>>
>> On Thursday 11 February 2010 10:52 PM, Dmitriy Ryaboy wrote:
>>>
>>> EXEC will trigger execution of the code that precedes it.
>>>
>>> On Thu, Feb 11, 2010 at 9:12 AM, prasenjit mukherjee
>>> <[email protected]> wrote:
>>>>
>>>> Is there any way I can have a pig statement wait for a condition? This
>>>> is what I am trying to do: I am first creating and storing a
>>>> relation in pig, and then I want to upload that relation via a
>>>> STREAM/DEFINE command. Here is the pig script I am trying to write:
>>>>
>>>> .........
>>>> STORE r1 INTO 'myoutput.data';
>>>> STREAM 'myfile_containing_output_dat.txt' THROUGH `upload.py`;
>>>>
>>>> Any way I can achieve this?
>>>>
>>>> -Prasen
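
[Editor's note: putting Dmitriy's two answers together, the original script can be made to work by inserting EXEC between the STORE and the STREAM, so that everything up to and including the store is forced to run before the streaming statement is submitted. This is a sketch using the file names from the original mail, not a tested script:]

```pig
.........
STORE r1 INTO 'myoutput.data';
EXEC; -- triggers execution of all preceding statements, so 'myoutput.data' exists on disk
STREAM 'myfile_containing_output_dat.txt' THROUGH `upload.py`;
```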
