[ https://issues.apache.org/jira/browse/PIG-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-537:
---------------------------

    Description: 
Consider the following Pig query, which demonstrates problems both during 
Logical Plan creation and during the subsequent execution of the M/R job. The 
query performs two cogroups. The first cogroups A and B and is used to generate 
the alias ABtemptable. The second cogroups A with ABtemptable on marks, which 
was read in as an int. 
==================================================================================
{code}
A = load 'mymarks.txt' as (username:chararray,marks:int);
B = load 'mygrades.txt' as (username:chararray,grade:chararray);
ABtemp = cogroup A by username, B  by username;
ABtemptable = foreach ABtemp generate
           group as username,
           flatten(A.marks) as newmarks;
--describe ABtemptable;
C = cogroup A by marks, ABtemptable by newmarks;
--describe C;
explain C;
dump C;
{code}
==================================================================================
The schemas that Pig reports for ABtemptable and C:
==================================================================================
{code}describe ABtemptable;{code} ABtemptable: {username: chararray,newmarks: int}
{code}describe C;{code} C: {group: int,A: {username: chararray,marks: int},ABtemptable: {username: chararray,newmarks: int}}
==================================================================================
If you run the above query you get the following error:
==================================================================================
2008-11-18 03:57:14,372 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error message from task (map) task_200810152105_0156_m_000000 java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableIntWritable
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:97)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:172)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:82)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
==================================================================================
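To illustrate why the map task fails, here is a minimal Python sketch of a map output collector that rejects keys whose class differs from the one declared for the job. It is a simplified illustration only, not Hadoop's or Pig's actual code; the class names MapOutputCollector and KeyTypeMismatch are hypothetical, and the two key classes are stand-ins named after the Pig types in the error message.

```python
# Simplified illustration, NOT Hadoop's implementation: the collector is
# configured with one key class and rejects keys of any other class.

class NullableText:
    """Stand-in for a chararray key wrapper."""
    def __init__(self, value):
        self.value = value


class NullableIntWritable:
    """Stand-in for an int key wrapper."""
    def __init__(self, value):
        self.value = value


class KeyTypeMismatch(Exception):
    """Hypothetical error type raised when a key has the wrong class."""


class MapOutputCollector:
    """Hypothetical collector that enforces a single declared key class."""

    def __init__(self, key_class):
        self.key_class = key_class
        self.records = []

    def collect(self, key, value):
        # The declared key class comes from the plan. If the plan decided the
        # key is text but the data yields an int key, the record is rejected.
        if type(key) is not self.key_class:
            raise KeyTypeMismatch(
                "Type mismatch in key from map: expected "
                f"{self.key_class.__name__}, received {type(key).__name__}"
            )
        self.records.append((key, value))


if __name__ == "__main__":
    collector = MapOutputCollector(NullableText)
    try:
        collector.collect(NullableIntWritable(90), ("alice", 90))
    except KeyTypeMismatch as err:
        print(err)
```

In this sketch the declared key class plays the role of the type chosen in the plan (chararray), while the key actually produced from marks is an int, which matches the mismatch reported above.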
The {code}explain C;{code} output shows that newmarks has become a chararray 
(surprising!!), even though describe reports it as an int, and that the type of 
the group key is Unknown:
==================================================================================
|---CoGroup viraj-Tue Nov 18 03:49:42 UTC 2008-25 Schema: {group: Unknown,{username: bytearray,marks: int},ABtemptable: {username: chararray,newmarks: chararray}} Type: bag
    |   |
    |   Project viraj-Tue Nov 18 03:49:42 UTC 2008-23 Projections: [1] Overloaded: false FieldSchema: marks: int Type: int
    |   Input: SplitOutput[null] viraj-Tue Nov 18 03:49:42 UTC 2008-29
    |   |
    |   Project viraj-Tue Nov 18 03:49:42 UTC 2008-24 Projections: [1] Overloaded: false FieldSchema: newmarks: chararray Type: chararray
    |   Input: ForEach viraj-Tue Nov 18 03:49:42 UTC 2008-22
    |---ForEach viraj-Tue Nov 18 03:49:42 UTC 2008-22 Schema: {username: chararray,newmarks: chararray} Type: bag
==================================================================================
In summary, this script demonstrates the following problems:
1) Logical Plan creation assigns the wrong type to newmarks (chararray instead 
of int).
2) Cogrouping on fields of different types, which results in a group key of 
type Unknown, is not caught during the compile phase.
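As a rough illustration of the compile-time check that point 2 asks for, here is a minimal Python sketch that compares the key types of all cogroup inputs before any job is launched. It is one possible policy, not Pig's actual type checker; the function name check_cogroup_key_types and the (alias, field, type) tuples are assumptions made for this example.

```python
# Hypothetical plan-time validation, NOT Pig's type checker: every input to a
# cogroup must use the same key type, otherwise fail before launching the job.

def check_cogroup_key_types(inputs):
    """Return the common key type of the cogroup inputs.

    `inputs` is a list of (alias, key_field, key_type) tuples, where key_type
    is the type name the plan has inferred for that key field.
    Raises ValueError if the list is empty or the key types differ.
    """
    if not inputs:
        raise ValueError("cogroup needs at least one input")

    key_types = {key_type for _, _, key_type in inputs}
    if len(key_types) > 1:
        detail = ", ".join(
            f"{alias}.{field}: {key_type}" for alias, field, key_type in inputs
        )
        raise ValueError(f"cogroup keys have different types ({detail})")

    return key_types.pop()


if __name__ == "__main__":
    # Types as they appear in the explain output for alias C.
    plan_inputs = [
        ("A", "marks", "int"),
        ("ABtemptable", "newmarks", "chararray"),
    ]
    try:
        check_cogroup_key_types(plan_inputs)
    except ValueError as err:
        print(err)
```

With the types taken from the explain output, such a check would reject the script at compile time instead of letting it fail in the map task. A real checker could instead insert a cast to a common type; that choice is outside the scope of this sketch.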
Additionally, I am attaching to this JIRA the explain output for alias C and 
the test files needed to run the script.
Viraj


> Failure in Hadoop map collect stage due to type mismatch in the keys used in 
> cogroup
> ------------------------------------------------------------------------------------
>
>                 Key: PIG-537
>                 URL: https://issues.apache.org/jira/browse/PIG-537
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Viraj Bhat
>            Priority: Critical
>             Fix For: types_branch
>
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.