[ 
https://issues.apache.org/jira/browse/PIG-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783984#action_12783984
 ] 

Richard Ding commented on PIG-1113:
-----------------------------------

The problem here is that the diamond query optimization didn't take into 
account that the diamond "tail" may also load files other than the file stored 
by the diamond "head". The diamond query optimization should check the file 
specs (make sure the load file of the diamond "tail" is the same as the store 
file of the diamon "head") before removing store/load combination.

> Diamond query optimization throws error in JOIN
> -----------------------------------------------
>
>                 Key: PIG-1113
>                 URL: https://issues.apache.org/jira/browse/PIG-1113
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ankur
>            Assignee: Richard Ding
>             Fix For: 0.6.0
>
>
> The following script results in 1 M/R job as a result of diamond query 
> optimization but the script fails.
> set1 = LOAD 'set1' USING PigStorage as (a:chararray, b:chararray, 
> c:chararray);
> set2 = LOAD 'set2' USING PigStorage as (a: chararray, b:chararray, c:bag{});
> set2_1 = FOREACH set2 GENERATE a as f1, b as f2, (chararray) 0 as f3;
> set2_2 = FOREACH set2 GENERATE a as f1, FLATTEN((IsEmpty(c) ? null : c)) as 
> f2, (chararray) 1 as f3;
> all_set2 = UNION set2_1, set2_2;
> joined_sets = JOIN set1 BY (a,b), all_set2 BY (f2,f3);
> dump joined_sets;
> And here is the error
> org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot 
> convert a bag to a String
>       at org.apache.pig.data.DataType.toString(DataType.java:739)
>       at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:625)
>       at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
>       at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
>       at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
>       at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
>       at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:247)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>       at org.apache.hadoop.mapred.Child.main(Child.java:159)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to