COGROUP fails with 'Type mismatch in key from map: expected 
org.apache.pig.impl.io.NullableText, recieved 
org.apache.pig.impl.io.NullableTuple'
-----------------------------------------------------------------------------------------------------------------------------------------------

                 Key: PIG-1068
                 URL: https://issues.apache.org/jira/browse/PIG-1068
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.4.0
            Reporter: Vikram Oberoi


The COGROUP in the following script fails in its map:

{code}
logs = LOAD '$LOGS' USING PigStorage() AS (ts:int, id:chararray, 
command:chararray, comments:chararray);                                         
                                                              
                                                                                
                                                                                
                                               
SPLIT logs INTO logins IF command == 'login', all_quits IF command == 'quit';   
                                                                                
                                               
                                                                                
                                                                                
                                               
-- Project login clients and count them by ID.                                  
                                                                                
                                               
login_info = FOREACH logins {                                                   
                                                                                
                                               
    GENERATE id as id,                                                          
                                                                                
                                               
    comments AS client;                                                         
                                                                                
                                               
};                                                                              
                                                                                
                                               
                                                                                
                                                                                
                                               
logins_grouped = GROUP login_info BY (id, client);                              
                                                                                
                                               
                                                                                
                                                                                
                                               
count_logins_by_client = FOREACH logins_grouped {                               
                                                                                
                                               
    generate group.id AS id, group.client AS client, COUNT($1) AS count;        
                                                                                
                                               
}                                                                               
                                                                                
                                               
                                                                                
                                                                                
                                               
-- Get the first quit.                                                          
                                                                                
                                               
all_quits_grouped = GROUP all_quits BY id;                                      
                                                                                
                                               
                                                                                
                                                                                
                                               
quits = FOREACH all_quits_grouped {                                             
                                                                                
                                               
    ordered = ORDER all_quits BY ts ASC;                                        
                                                                                
                                               
    last_quit = LIMIT ordered 1;                                                
                                                                                
                                               
    GENERATE FLATTEN(last_quit);                                                
                                                                                
                                               
}                                                                               
                                                                                
                                               
                                                                                
                                                                                
                                               
-- Now, group all the info together.                                            
                                                                                
                                               
joined_session_info = COGROUP quits BY id, count_logins_by_client BY id;        
                                                                                
                                               
                                                                                
                                                                                
                                               
DUMP joined_session_info;
{code}

Here's the stack trace:

{code}
java.io.IOException: Type mismatch in key from map: expected 
org.apache.pig.impl.io.NullableText, recieved 
org.apache.pig.impl.io.NullableTuple
        at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:229)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
        at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:157)
{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to