[jira] Commented: (PIG-1113) Diamond query optimization throws error in JOIN
[ https://issues.apache.org/jira/browse/PIG-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784515#action_12784515 ]

Hadoop QA commented on PIG-1113:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12426566/PIG-1113.patch
against trunk revision 885858.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/71/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/71/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/71/console

This message is automatically generated.

Diamond query optimization throws error in JOIN
---
    Key: PIG-1113
    URL: https://issues.apache.org/jira/browse/PIG-1113
    Project: Pig
    Issue Type: Bug
    Reporter: Ankur
    Assignee: Richard Ding
    Fix For: 0.6.0
    Attachments: PIG-1113.patch

The following script results in 1 M/R job as a result of diamond query optimization but the script fails.

    set1 = LOAD 'set1' USING PigStorage as (a:chararray, b:chararray, c:chararray);
    set2 = LOAD 'set2' USING PigStorage as (a: chararray, b:chararray, c:bag{});
    set2_1 = FOREACH set2 GENERATE a as f1, b as f2, (chararray) 0 as f3;
    set2_2 = FOREACH set2 GENERATE a as f1, FLATTEN((IsEmpty(c) ? null : c)) as f2, (chararray) 1 as f3;
    all_set2 = UNION set2_1, set2_2;
    joined_sets = JOIN set1 BY (a,b), all_set2 BY (f2,f3);
    dump joined_sets;

And here is the error

    org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot convert a bag to a String
        at org.apache.pig.data.DataType.toString(DataType.java:739)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:625)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:247)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:159)

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1113) Diamond query optimization throws error in JOIN
[ https://issues.apache.org/jira/browse/PIG-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783984#action_12783984 ]

Richard Ding commented on PIG-1113:
---

The problem here is that the diamond query optimization didn't take into account that the diamond tail may also load files other than the file stored by the diamond head.

The diamond query optimization should check the file specs (make sure the load file of the diamond tail is the same as the store file of the diamond head) before removing the store/load combination.
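The file-spec check described in Richard Ding's comment can be illustrated with a small toy model. This is only a sketch in Python, not Pig's actual Java code: `FileSpec`, `Job`, `can_merge_diamond`, and the temporary path are hypothetical names made up for illustration, and the rule "every file the tail loads must be the file the head stores" is an assumed reading of the comment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FileSpec:
    """Hypothetical stand-in for a file spec: a path plus a load/store function name."""
    path: str
    func: str = "PigStorage"


@dataclass
class Job:
    """Hypothetical stand-in for one map-reduce job in a compiled plan."""
    name: str
    loads: List[FileSpec]
    store: Optional[FileSpec] = None


def can_merge_diamond(head: Job, tail: Job) -> bool:
    """Allow removing the store/load pair only if the tail reads nothing
    except the file that the head stores (assumed rule, see lead-in)."""
    if head.store is None or not tail.loads:
        return False
    return all(spec == head.store for spec in tail.loads)


# Hypothetical intermediate file written by the diamond head.
tmp = FileSpec("/tmp/hypothetical-intermediate", "BinStorage")
head = Job("head", loads=[FileSpec("set2")], store=tmp)

# Tail that only reads the head's output: safe to merge in this model.
pure_tail = Job("pure_tail", loads=[tmp, tmp])

# Tail that also reads another file, as the JOIN with set1 does in the script.
mixed_tail = Job("mixed_tail", loads=[tmp, FileSpec("set1")])

merge_pure = can_merge_diamond(head, pure_tail)
merge_mixed = can_merge_diamond(head, mixed_tail)
```

In this model the tail that also loads 'set1' is rejected, which mirrors the case in the reported script where the joined relation reads a file that the diamond head never stored.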
[jira] Commented: (PIG-1113) Diamond query optimization throws error in JOIN
[ https://issues.apache.org/jira/browse/PIG-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782877#action_12782877 ]

Ankur commented on PIG-1113:

The script fails even if the correct schema is specified for c:bag{}. So the following change does not alleviate the problem:

    set2 = LOAD 'set2' USING PigStorage as (a: chararray, b:chararray, c:bag{T:tuple(l:chararray)});