Hadoop QA commented on PIG-978:

+1 overall.  Here are the results of testing the latest attachment 
  against trunk revision 885465.

    +1 @author.  The patch does not contain any @author tags.

    +0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: 
Findbugs warnings: 
Console output: 

This message is automatically generated.

> ERROR 2100 (hdfs://localhost/tmp/temp175740929/tmp-1126214010 does not exist) 
> and ERROR 2999: (Unexpected internal error. null) when using Multi-Query 
> optimization
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                 Key: PIG-978
>                 URL: https://issues.apache.org/jira/browse/PIG-978
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Corinne Chandel
>             Fix For: 0.6.0
>         Attachments: pig-latin-users-guide.patch
> I have a Pig script of the following form, which I execute using multi-query 
> optimization.
> {code}
> A = load '/user/viraj/firstinput' using PigStorage();
> B = group ....
> C = ..aggregation function
> store C into '/user/viraj/firstinputtempresult/days1';
> ..
> Atab = load '/user/viraj/secondinput' using PigStorage();
> Btab = group ....
> Ctab = ..aggregation function
> store Ctab into '/user/viraj/secondinputtempresult/days1';
> ..
> E = load '/user/viraj/firstinputtempresult/' using PigStorage();
> F = group 
> G = aggregation function
> store G into '/user/viraj/finalresult1';
> Etab = load '/user/viraj/secondinputtempresult/' using PigStorage();
> Ftab = group 
> Gtab = aggregation function
> store Gtab into '/user/viraj/finalresult2';
> {code}
> 2009-07-20 22:05:44,507 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
> ERROR 2100: hdfs://localhost/tmp/temp175740929/tmp-1126214010 does not exist. 
> Details at logfile: /homes/viraj/pigscripts/pig_1248127173601.log
> This error is due to a mismatch between the store and load commands. The script 
> first stores files into the 'days1' directory (store C into 
> '/user/viraj/firstinputtempresult/days1' using PigStorage();), but later 
> loads from the top-level directory (E = load 
> '/user/viraj/firstinputtempresult/' using PigStorage()) instead of the 
> directory it stored to (/user/viraj/firstinputtempresult/days1).
> The current multi-query optimizer cannot resolve the dependency between these 
> two commands because the store path and the load path differ. As a result, the 
> jobs run concurrently and fail with the errors above.
> The solution is to add an 'exec' or 'run' command after the first two stores. 
> This forces the first two store commands to run before the remaining commands.
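> A minimal sketch of that workaround, reusing the paths from the script above. 
> The grouping key ($0) and the COUNT/SUM aggregations are illustrative 
> assumptions, since the original statements are elided. It uses 'exec' with no 
> arguments, on the understanding that it triggers execution of the statements 
> that precede it.
> {code}
> -- Stage 1: write both intermediate results.
> A = load '/user/viraj/firstinput' using PigStorage();
> B = group A by $0;                          -- hypothetical grouping key
> C = foreach B generate group, COUNT(A);     -- hypothetical aggregation
> store C into '/user/viraj/firstinputtempresult/days1';
> Atab = load '/user/viraj/secondinput' using PigStorage();
> Btab = group Atab by $0;                    -- hypothetical grouping key
> Ctab = foreach Btab generate group, COUNT(Atab);
> store Ctab into '/user/viraj/secondinputtempresult/days1';
> -- Force the two stores above to run before anything below is planned.
> exec;
> -- Stage 2: the intermediate directories now exist and can be loaded.
> E = load '/user/viraj/firstinputtempresult/' using PigStorage();
> F = group E by $0;                          -- hypothetical grouping key
> G = foreach F generate group, SUM(E.$1);    -- hypothetical aggregation
> store G into '/user/viraj/finalresult1';
> Etab = load '/user/viraj/secondinputtempresult/' using PigStorage();
> Ftab = group Etab by $0;
> Gtab = foreach Ftab generate group, SUM(Etab.$1);
> store Gtab into '/user/viraj/finalresult2';
> {code}
> Placing a single 'exec' after both stores keeps the two first-stage pipelines 
> in one multi-query batch while still separating them from the second stage.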
> It would be nice to see this fixed as part of an enhancement to multi-query 
> optimization. Pig could either disable multi-query in this case or throw a 
> warning/error message, so that the user can correct the load/store statements.
> Viraj

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
