It's documented behavior; it has been in the docs since 0.3:

http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html
Implicit Dependencies

If a script has dependencies on the execution order outside of what Pig
knows about, execution may fail. For instance, in this script MYUDF might
try to read from out1, a file that A was just stored into. However, Pig does
not know that MYUDF depends on the out1 file and might submit the jobs
producing the out2 and out1 files at the same time.

...
STORE A INTO 'out1';
B = LOAD 'data2';
C = FOREACH B GENERATE MYUDF($0,'out1');
STORE C INTO 'out2';

To make the script work (to ensure that the right execution order is
enforced) add the exec statement. The exec statement will trigger the
execution of the statements that produce the out1 file.

...
STORE A INTO 'out1';
EXEC;
B = LOAD 'data2';
C = FOREACH B GENERATE MYUDF($0,'out1');
STORE C INTO 'out2';
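
Applied to the script from the original question (quoted at the bottom), the same pattern might look roughly like the sketch below. This is untested. r1, 'myoutput.data' and upload.py come from that question; to_upload, uploaded and 'upload_result' are names I made up for illustration. I am also assuming upload.py reads its input on stdin, is available on the task nodes, and that whatever it writes to stdout is worth storing (some STORE or DUMP of the streamed relation is needed to make the STREAM actually run).

```
-- Sketch only: to_upload, uploaded and 'upload_result' are hypothetical names.
STORE r1 INTO 'myoutput.data';
-- EXEC forces the STORE above to run before the statements below.
EXEC;
to_upload = LOAD 'myoutput.data';
uploaded = STREAM to_upload THROUGH `upload.py`;
STORE uploaded INTO 'upload_result';
```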



On Tue, Feb 16, 2010 at 12:46 AM, Mridul Muralidharan <[email protected]>
wrote:
>
> Is this documented behavior or current impl detail ?
> A lot of scripts broke when multi-query optimization was committed to trunk
> because of the implicit ordering assumption (based on STORE) in earlier pig
> - which was, iirc, documented.
>
>
> Regards,
> Mridul
>
> On Thursday 11 February 2010 10:52 PM, Dmitriy Ryaboy wrote:
>>
>> EXEC will trigger execution of the code that precedes it.
>>
>>
>>
>> On Thu, Feb 11, 2010 at 9:12 AM, prasenjit mukherjee
>> <[email protected]>  wrote:
>>>
>>> Is there any way I can have a Pig statement wait for a condition? This
>>> is what I am trying to do: I first create and store a
>>> relation in Pig, and then I want to upload that relation via the
>>> STREAM/DEFINE command. Here is the Pig script I am trying to write:
>>>
>>> .........
>>> STORE r1 INTO 'myoutput.data';
>>> STREAM 'myfile_containing_output_dat.txt' THROUGH `upload.py`;
>>>
>>> Any way I can achieve this?
>>>
>>> -Prasen
>>>
>
>
