[ 
https://issues.apache.org/jira/browse/PIG-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-537:
-------------------------------

    Status: Patch Available  (was: Open)

The issue was in the ImplicitSplitInserter. In this query, the same load 
provides input to two cogroups, so an implicit split needs to be introduced. 
However, the ImplicitSplitInserter was changing the order of the inputs to the 
first cogroup while rewiring the plan with the new Split and SplitOutput 
operators. The patch changes the algorithm for introducing these new operators 
so that the order of the inputs for the successors of the load is maintained.
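
To make the ordering problem concrete, here is a minimal toy model in Python. It 
is not the Pig code and not the patch: the plan representation (a dict mapping 
each operator to its ordered list of inputs), the function names and the 
operator names are all hypothetical. It assumes one plausible way the order 
can be lost (disconnecting the load and appending the new SplitOutput at the 
end of the successor's input list) and contrasts it with replacing the load in 
place.

```python
# Toy model only: a "plan" maps each operator to its ORDERED list of inputs.
# All names here are hypothetical and do not correspond to Pig classes.

def insert_split_naive(inputs, load):
    """Disconnect-then-append rewiring: the input position is lost."""
    plan = {node: list(preds) for node, preds in inputs.items()}
    successors = [n for n, preds in plan.items() if load in preds]
    plan["split"] = [load]
    for i, succ in enumerate(successors):
        out = f"split_output_{i}"
        plan[out] = ["split"]
        plan[succ].remove(load)   # disconnect load -> successor
        plan[succ].append(out)    # reconnect at the end of the input list
    return plan


def insert_split_order_preserving(inputs, load):
    """Replace the load in place so each successor keeps its input order."""
    plan = {node: list(preds) for node, preds in inputs.items()}
    successors = [n for n, preds in plan.items() if load in preds]
    plan["split"] = [load]
    for i, succ in enumerate(successors):
        out = f"split_output_{i}"
        plan[out] = ["split"]
        pos = plan[succ].index(load)  # remember where the load was
        plan[succ][pos] = out         # put the SplitOutput in the same slot
    return plan


# Load A feeds two cogroups, as in the query from the report.
PLAN = {
    "cogroup_1": ["A", "B"],
    "cogroup_2": ["A", "ABtemptable"],
}

naive = insert_split_naive(PLAN, "A")
fixed = insert_split_order_preserving(PLAN, "A")
print(naive["cogroup_1"])  # the SplitOutput has moved behind B
print(fixed["cogroup_1"])  # the SplitOutput sits where A used to be
```

Since a cogroup pairs its i-th input with its i-th "by" expression, swapping 
the inputs in this model would pair each key expression with the wrong 
relation, which is consistent with the key type mismatch seen at runtime.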

> Failure in Hadoop map collect stage due to type mismatch in the keys used in 
> cogroup
> ------------------------------------------------------------------------------------
>
>                 Key: PIG-537
>                 URL: https://issues.apache.org/jira/browse/PIG-537
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Viraj Bhat
>            Assignee: Pradeep Kamath
>            Priority: Critical
>             Fix For: types_branch
>
>         Attachments: explain_aliasC.log, mygrades.txt, mymarks.txt
>
>
> Consider the following Pig query, which demonstrates several problems during 
> Logical Plan creation and the subsequent execution of the M/R job. The query 
> does two cogroups. The first cogroups A and B by username, and its result is 
> used to generate the alias ABtemptable. The second cogroups A with 
> ABtemptable on marks, which was read in as an int. 
> ==================================================================================
> {code}
> A = load 'mymarks.txt' as (marks:int, username:chararray);
> B = load 'mygrades.txt' as (username:chararray,grade:chararray);
> ABtemp = cogroup A by username, B  by username;
> ABtemptable = foreach ABtemp generate
>            group as username,
>            flatten(A.marks) as newmarks;
> --describe ABtemptable;
> C = cogroup A by marks, ABtemptable by newmarks;
> --describe C;
> explain C;
> dump C;
> {code}
> ==================================================================================
> The schemas that Pig reports for ABtemptable and C:
> ==================================================================================
> {code}describe ABtemptable;{code} ABtemptable: {username: chararray,newmarks: 
> int}
> {code}describe C;{code} C: {group: int,A: {username: chararray,marks: 
> int},ABtemptable: {username: chararray,newmarks: int}}
> ==================================================================================
> If you run the above query you get the following error:
> ==================================================================================
> 2008-11-18 03:57:14,372 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (map) task_200810152105_0156_m_000000java.io.IOException: 
> Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, 
> recieved org.apache.pig.impl.io.NullableIntWritable
>         at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:97)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:172)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:82)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>         at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> ==================================================================================
> Looking at the {code}explain C;{code} output, you see that newmarks has 
> become a chararray (surprising!!)
> ==================================================================================
> ---CoGroup viraj-Tue Nov 18 03:49:42 UTC 2008-25 Schema: {group: 
> Unknown,{username: bytearray,marks: int},ABtemptable: {username: 
> chararray,newmarks: chararray}} Type: bag
>       Project viraj-Tue Nov 18 03:49:42 UTC 2008-23 Projections: [1] 
> Overloaded: false FieldSchema: marks: int Type: int
>       Input: SplitOutput[null] viraj-Tue Nov 18 03:49:42 UTC 2008-29
>       Project viraj-Tue Nov 18 03:49:42 UTC 2008-24 Projections: [1] 
> Overloaded: false FieldSchema: newmarks: chararray Type: chararray
>        Input: ForEach viraj-Tue Nov 18 03:49:42 UTC 2008-22
>     ---ForEach viraj-Tue Nov 18 03:49:42 UTC 2008-22 Schema: {username: 
> chararray,newmarks: chararray} Type: bag
> ==================================================================================
> In summary, this script demonstrates the following problems:
> 1) Logical Plan creation is incorrect (in the explain output, newmarks has 
> become a chararray).
> 2) Cogrouping on fields of different types, which results in a group of type 
> Unknown, is not caught during the compile phase.
> Additionally, I am attaching to this jira the explain output of alias C and 
> the test files needed to run the script.
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
