Still confusing. So the entire execution (from B = LOAD ..... onwards) will be blocked until 'out1' is stored?
> STORE A INTO 'out1';
> EXEC;
> B = LOAD 'data2';
> C = FOREACH B GENERATE MYUDF($0,'out1');
> STORE C INTO 'out2';

Any way to restrict it to a particular block?

-Prasen

On Tue, Feb 16, 2010 at 2:35 PM, Dmitriy Ryaboy <[email protected]> wrote:
> It's been in the docs since 0.3
>
> http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html
> Implicit Dependencies
>
> If a script has dependencies on the execution order outside of what Pig
> knows about, execution may fail. For instance, in this script MYUDF might
> try to read from out1, a file that A was just stored into. However, Pig does
> not know that MYUDF depends on the out1 file and might submit the jobs
> producing the out2 and out1 files at the same time.
>
> ...
> STORE A INTO 'out1';
> B = LOAD 'data2';
> C = FOREACH B GENERATE MYUDF($0,'out1');
> STORE C INTO 'out2';
>
> To make the script work (to ensure that the right execution order is
> enforced) add the exec statement. The exec statement will trigger the
> execution of the statements that produce the out1 file.
>
> ...
> STORE A INTO 'out1';
> EXEC;
> B = LOAD 'data2';
> C = FOREACH B GENERATE MYUDF($0,'out1');
> STORE C INTO 'out2';
>
>
> On Tue, Feb 16, 2010 at 12:46 AM, Mridul Muralidharan <[email protected]>
> wrote:
>>
>> Is this documented behavior or current impl detail ?
>> A lot of scripts broke when multi-query optimization was committed to trunk
>> because of the implicit ordering assumption (based on STORE) in earlier pig
>> - which was, iirc, documented.
>>
>> Regards,
>> Mridul
>>
>> On Thursday 11 February 2010 10:52 PM, Dmitriy Ryaboy wrote:
>>>
>>> EXEC will trigger execution of the code that precedes it.
>>>
>>> On Thu, Feb 11, 2010 at 9:12 AM, prasenjit mukherjee
>>> <[email protected]> wrote:
>>>>
>>>> Is there any way I can have a pig statement wait for a condition? This
>>>> is what I am trying to do: I am first creating and storing a
>>>> relation in pig, and then I want to upload that relation via a
>>>> STREAM/DEFINE command. Here is the pig script I am trying to write:
>>>>
>>>> .........
>>>> STORE r1 INTO 'myoutput.data';
>>>> STREAM 'myfile_containing_output_dat.txt' THROUGH `upload.py`;
>>>>
>>>> Any way I can achieve this?
>>>>
>>>> -Prasen
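
[Editor's note: putting Dmitriy's two answers together, the original script can be made to work by inserting EXEC between the STORE and the STREAM, so that everything up to and including the store is forced to run before the streaming statement is submitted. This is a sketch using the file names from the original mail, not a tested script:]

```pig
.........
STORE r1 INTO 'myoutput.data';
EXEC; -- triggers execution of all preceding statements, so 'myoutput.data' exists on disk
STREAM 'myfile_containing_output_dat.txt' THROUGH `upload.py`;
```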
