[ 
https://issues.apache.org/jira/browse/PIG-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783978#action_12783978
 ] 

Richard Ding commented on PIG-1068:
-----------------------------------

The cause of this bug is as follows. On the one hand, the value (as in
key/value pairs) received by a reducer may not be the complete "value": portions
of it may be carried in the key. In this case, the real "value" is stitched
together by the packager. On the other hand, the MultiQuery optimizer merges
jobs with different map key types by wrapping the keys in tuples (so that the
resulting job has tuple as its common map key type). Unfortunately, the key is
unwrapped in the demuxer (after the packager), so the packager stitches the
value from the still-wrapped key and the "stitched up" value isn't the
expected value.

The solution will be to move the MultiQuery unwrapping logic from the demuxer
to the packager.
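
To illustrate the ordering problem, here is a toy Python model. It is not Pig's
actual Java code: wrap_key, unwrap_key, stitch, reduce_buggy, reduce_fixed and
the record layout (ts, id, command) are all hypothetical names chosen for this
sketch. It assumes a job grouped by a scalar id key whose packager was planned
for an unwrapped key.

```python
def wrap_key(key):
    """MultiQuery-style wrapping (assumed model): the key becomes a 1-field tuple."""
    return (key,)


def unwrap_key(wrapped_key):
    """Undo wrap_key."""
    return wrapped_key[0]


def stitch(key, partial_value, key_positions, width, key_is_tuple):
    """Rebuild the full value tuple from the key and the partial value.

    key_positions maps a column of the full value to a field of the key;
    every other column is taken, in order, from partial_value.
    key_is_tuple reflects the key type the job was originally planned with.
    """
    rest = iter(partial_value)
    full = []
    for col in range(width):
        if col in key_positions:
            full.append(key[key_positions[col]] if key_is_tuple else key)
        else:
            full.append(next(rest))
    return tuple(full)


# Hypothetical layout: the full value is (ts, id, command), grouped by id,
# so column 1 of the value lives in the (scalar) key.
KEY_POSITIONS = {1: 0}
WIDTH = 3


def reduce_buggy(wrapped_key, partial_values):
    # The packager stitches first, using the still-wrapped key ...
    values = [stitch(wrapped_key, v, KEY_POSITIONS, WIDTH, False)
              for v in partial_values]
    # ... and the demuxer unwraps the key afterwards, which is too late.
    return unwrap_key(wrapped_key), values


def reduce_fixed(wrapped_key, partial_values):
    # Unwrapping moved into the packager: unwrap first, then stitch.
    key = unwrap_key(wrapped_key)
    values = [stitch(key, v, KEY_POSITIONS, WIDTH, False)
              for v in partial_values]
    return key, values


map_key = wrap_key("alice")
map_values = [(10, "quit")]  # the id column was moved into the key

print(reduce_buggy(map_key, map_values))  # id column holds the wrapper tuple
print(reduce_fixed(map_key, map_values))  # id column holds the plain id
```

In this model the buggy order leaves the wrapper tuple ('alice',) in the id
column of the stitched value, while unwrapping before stitching restores the
original record.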

> COGROUP fails with 'Type mismatch in key from map: expected 
> org.apache.pig.impl.io.NullableText, recieved 
> org.apache.pig.impl.io.NullableTuple'
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1068
>                 URL: https://issues.apache.org/jira/browse/PIG-1068
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Vikram Oberoi
>            Assignee: Richard Ding
>             Fix For: 0.6.0
>
>         Attachments: cogroup-bug.pig, log
>
>
> The COGROUP in the following script fails in its map:
> {code}
> logs = LOAD '$LOGS' USING PigStorage() AS (ts:int, id:chararray, command:chararray, comments:chararray);
>
> SPLIT logs INTO logins IF command == 'login', all_quits IF command == 'quit';
>
> -- Project login clients and count them by ID.
> login_info = FOREACH logins {
>     GENERATE id as id,
>     comments AS client;
> };
>
> logins_grouped = GROUP login_info BY (id, client);
>
> count_logins_by_client = FOREACH logins_grouped {
>     generate group.id AS id, group.client AS client, COUNT($1) AS count;
> }
>
> -- Get the first quit.
> all_quits_grouped = GROUP all_quits BY id;
>
> quits = FOREACH all_quits_grouped {
>     ordered = ORDER all_quits BY ts ASC;
>     last_quit = LIMIT ordered 1;
>     GENERATE FLATTEN(last_quit);
> }
>
> -- Now, group all the info together.
> joined_session_info = COGROUP quits BY id, count_logins_by_client BY id;
>
> DUMP joined_session_info;
> {code}
> Here's the stack trace:
> {code}
> java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableTuple
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:229)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:157)
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.