[ 
https://issues.apache.org/jira/browse/PIG-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-537:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed; thanks, Pradeep.

> Failure in Hadoop map collect stage due to type mismatch in the keys used in 
> cogroup
> ------------------------------------------------------------------------------------
>
>                 Key: PIG-537
>                 URL: https://issues.apache.org/jira/browse/PIG-537
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Viraj Bhat
>            Assignee: Pradeep Kamath
>            Priority: Critical
>             Fix For: types_branch
>
>         Attachments: explain_aliasC.log, mygrades.txt, mymarks.txt, 
> PIG-537.patch
>
>
> Consider the following Pig query, which exposes problems both in Logical 
> Plan creation and in the subsequent execution of the M/R job. The query 
> performs two cogroups: the first cogroups A and B by username, and 
> ABtemptable is derived from the result. The second cogroups A with 
> ABtemptable on marks, which was read in as an int.
> ==================================================================================
> {code}
> A = load 'mymarks.txt' as (marks:int, username:chararray);
> B = load 'mygrades.txt' as (username:chararray,grade:chararray);
> ABtemp = cogroup A by username, B  by username;
> ABtemptable = foreach ABtemp generate
>            group as username,
>            flatten(A.marks) as newmarks;
> --describe ABtemptable;
> C = cogroup A by marks, ABtemptable by newmarks;
> --describe C;
> explain C;
> dump C;
> {code}
> ==================================================================================
> The schemas that Pig reports for ABtemptable and C:
> ==================================================================================
> {code}describe ABtemptable;{code}
> ABtemptable: {username: chararray,newmarks: int}
> {code}describe C;{code}
> C: {group: int,A: {username: chararray,marks: int},ABtemptable: {username: chararray,newmarks: int}}
> ==================================================================================
> If you run the above query you get the following error:
> ==================================================================================
> 2008-11-18 03:57:14,372 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error message from task (map) task_200810152105_0156_m_000000java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableIntWritable
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:97)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:172)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:82)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> ==================================================================================
> Looking at the {code}explain C;{code} output, you see that newmarks has 
> become a chararray (surprising!!)
> ==================================================================================
> ---CoGroup viraj-Tue Nov 18 03:49:42 UTC 2008-25 Schema: {group: Unknown,{username: bytearray,marks: int},ABtemptable: {username: chararray,newmarks: chararray}} Type: bag
>       Project viraj-Tue Nov 18 03:49:42 UTC 2008-23 Projections: [1] Overloaded: false FieldSchema: marks: int Type: int
>       Input: SplitOutput[null] viraj-Tue Nov 18 03:49:42 UTC 2008-29
>       Project viraj-Tue Nov 18 03:49:42 UTC 2008-24 Projections: [1] Overloaded: false FieldSchema: newmarks: chararray Type: chararray
>        Input: ForEach viraj-Tue Nov 18 03:49:42 UTC 2008-22
>     ---ForEach viraj-Tue Nov 18 03:49:42 UTC 2008-22 Schema: {username: chararray,newmarks: chararray} Type: bag
> ==================================================================================
> In summary, this script demonstrates the following problems:
> 1) Logical Plan creation assigns the wrong type to newmarks (chararray 
> instead of int).
> 2) Cogrouping on fields of different types, which leaves the group type 
> Unknown, is not caught during the compile phase.
> I am also attaching to this JIRA the explain output for alias C and the 
> test files needed to run the script.
> Viraj
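
Problem 2 in the summary above concerns a check that the compile phase could make before any M/R job is launched. As a rough illustration only, the following Python sketch compares the key type of each cogroup input and fails early when they disagree. It is not Pig's type checker: `check_cogroup_keys`, `CogroupTypeError`, and the dictionary-based schemas are hypothetical names invented for this example, and the type names are copied from the describe and explain output in the report.

```python
from typing import Dict, List, Tuple

# Hypothetical schema representation: field name -> Pig type name.
Schema = Dict[str, str]


class CogroupTypeError(TypeError):
    """Raised when cogroup inputs are keyed on fields of different types."""


def check_cogroup_keys(inputs: List[Tuple[str, Schema, str]]) -> str:
    """Return the shared key type, or raise if the inputs disagree.

    Each input is (alias, schema, key_field). This is an illustrative
    stand-in for a compile-time check, not Pig's actual implementation.
    """
    key_types = {alias: schema[key] for alias, schema, key in inputs}
    distinct = set(key_types.values())
    if len(distinct) != 1:
        raise CogroupTypeError(f"cogroup key types differ: {key_types}")
    return distinct.pop()


# Schema of A as declared in the load statement.
A_SCHEMA = {"marks": "int", "username": "chararray"}
# ABtemptable as reported by describe.
DESCRIBED = {"username": "chararray", "newmarks": "int"}
# ABtemptable as it appears in the explain output.
PLANNED = {"username": "chararray", "newmarks": "chararray"}
```

With the schema that describe reports, `check_cogroup_keys([("A", A_SCHEMA, "marks"), ("ABtemptable", DESCRIBED, "newmarks")])` returns `"int"`. With the schema from the explain output (`PLANNED`), the same call raises `CogroupTypeError`, which is the kind of early failure the report asks for instead of a map-side IOException.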

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
