[jira] Commented: (PIG-978) ERROR 2100 (hdfs://localhost/tmp/temp175740929/tmp-1126214010 does not exist) and ERROR 2999: (Unexpected internal error. null) when using Multi-Query optimization
[ https://issues.apache.org/jira/browse/PIG-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784415#action_12784415 ]

Olga Natkovich commented on PIG-978:

Corinne,

Richard's comment is that exec or run should go after the second store, but in your example it is after the third one.

ERROR 2100 (hdfs://localhost/tmp/temp175740929/tmp-1126214010 does not exist) and ERROR 2999: (Unexpected internal error. null) when using Multi-Query optimization
---

Key: PIG-978
URL: https://issues.apache.org/jira/browse/PIG-978
Project: Pig
Issue Type: Bug
Components: documentation
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Corinne Chandel
Fix For: 0.6.0
Attachments: pig-latin-users-guide.patch

I have a Pig script of this form, which I execute using Multi-query optimization.

{code}
A = load '/user/viraj/firstinput' using PigStorage();
B = group
C = ..aggregation function
store C into '/user/viraj/firstinputtempresult/days1';
..
Atab = load '/user/viraj/secondinput' using PigStorage();
Btab = group
Ctab = ..aggregation function
store Ctab into '/user/viraj/secondinputtempresult/days1';
..
E = load '/user/viraj/firstinputtempresult/' using PigStorage();
F = group
G = aggregation function
store G into '/user/viraj/finalresult1';
Etab = load '/user/viraj/secondinputtempresult/' using PigStorage();
Ftab = group
Gtab = aggregation function
store Gtab into '/user/viraj/finalresult2';
{code}

2009-07-20 22:05:44,507 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2100: hdfs://localhost/tmp/temp175740929/tmp-1126214010 does not exist.
Details at logfile: /homes/viraj/pigscripts/pig_1248127173601.log

This error is due to the mismatch of store/load commands. The script first stores files into the 'days1' directory (store C into '/user/viraj/firstinputtempresult/days1' using PigStorage();), but it later loads from the top-level directory (E = load '/user/viraj/firstinputtempresult/' using PigStorage()) instead of the original directory (/user/viraj/firstinputtempresult/days1).

The current multi-query optimizer cannot resolve the dependency between these two commands because the store path and the load path differ. The jobs therefore run concurrently, which results in the errors.

The solution is to add an 'exec' or 'run' command after the first two stores. This forces the first two store commands to run before the remaining commands.

It would be nice to see this fixed as part of an enhancement to Multi-query. We could either disable Multi-query or throw a warning/error message, so that users can correct their load/store statements.

Viraj

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
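
For illustration, the workaround described in the issue (an 'exec' after the second store) might look like the sketch below. The field names, types, and the use of SUM are assumptions added only to make the script complete; the original report elides the group and aggregation steps. The paths are the ones from the report.

```pig
-- Hypothetical schema: each input row is (k, v). The report does not give one.
A = LOAD '/user/viraj/firstinput' USING PigStorage() AS (k:chararray, v:long);
B = GROUP A BY k;
C = FOREACH B GENERATE group, SUM(A.v);   -- stand-in for the elided aggregation
STORE C INTO '/user/viraj/firstinputtempresult/days1';

Atab = LOAD '/user/viraj/secondinput' USING PigStorage() AS (k:chararray, v:long);
Btab = GROUP Atab BY k;
Ctab = FOREACH Btab GENERATE group, SUM(Atab.v);
STORE Ctab INTO '/user/viraj/secondinputtempresult/days1';

-- Placed after the second store: the two stores above are executed
-- before the statements below are processed.
exec;

E = LOAD '/user/viraj/firstinputtempresult/' USING PigStorage() AS (k:chararray, total:long);
F = GROUP E BY k;
G = FOREACH F GENERATE group, SUM(E.total);
STORE G INTO '/user/viraj/finalresult1';

Etab = LOAD '/user/viraj/secondinputtempresult/' USING PigStorage() AS (k:chararray, total:long);
Ftab = GROUP Etab BY k;
Gtab = FOREACH Ftab GENERATE group, SUM(Etab.total);
STORE Gtab INTO '/user/viraj/finalresult2';
```

Without the exec, the optimizer sees a store into '.../days1' and a load from the parent directory as unrelated, which is the situation the report describes.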
[jira] Commented: (PIG-978) ERROR 2100 (hdfs://localhost/tmp/temp175740929/tmp-1126214010 does not exist) and ERROR 2999: (Unexpected internal error. null) when using Multi-Query optimization
[ https://issues.apache.org/jira/browse/PIG-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784087#action_12784087 ]

Hadoop QA commented on PIG-978:
---

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12426454/pig-latin-users-guide.patch
against trunk revision 885465.

+1 @author. The patch does not contain any @author tags.

+0 tests included. The patch appears to be a documentation patch that doesn't require tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/69/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/69/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/69/console

This message is automatically generated.