POProject assumes that when "overloaded" field is true, the type of input is BAG
--------------------------------------------------------------------------------

                 Key: PIG-582
                 URL: https://issues.apache.org/jira/browse/PIG-582
             Project: Pig
          Issue Type: Bug
    Affects Versions: types_branch
            Reporter: Pradeep Kamath
             Fix For: types_branch


POProject.getNext(Tuple) checks whether the "overloaded" field is true and, if 
so, casts the input it obtained to a Bag. The "overloaded" field is set in 
LogToPhyTranslator from the value of the overloaded field in LOProject. LOProject 
sets overloaded to true when it has a non-ExpressionOperator successor. This 
does not guarantee that the input received by POProject is a Bag, and 
POProject should not assume so. In most real-world scripts, a Project whose 
overloaded field is true does have input of type Bag (since the input to the 
project is typically a relation, which is a BAG). However, here is a 
contrived example where things go wrong:
{code}
a = load 'distinct.input' as (name:chararray, age:int, gpa:double);
b = group a by name;
c = foreach b  {
        l = distinct group;
        generate l;};
explain c;
dump c;
{code}

When executed, this script throws the following exception because POProject 
assumes that its input is a Bag, as described above:
{noformat}
2008-12-29 13:45:55,537 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error message from task (reduce) task_200812151518_4209_r_000000java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.DataBag
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:272)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:226)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODistinct.getNext(PODistinct.java:77)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:276)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:368)

{noformat}
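
The failure mode can be illustrated outside of Pig with a minimal, 
self-contained sketch. This is not the actual POProject code: the class and 
method names (OverloadedProjectSketch, castBlindly, castChecked) are 
hypothetical, java.util.List stands in for DataBag, and wrapping a scalar in a 
single-element collection is only an assumed fallback, not necessarily what the 
eventual fix does.

```java
import java.util.Collections;
import java.util.List;

// Illustrative stand-in only; not Pig's POProject. List<?> plays the role of DataBag.
final class OverloadedProjectSketch {

    // Mirrors the pattern described in this issue: trust the "overloaded"
    // flag and cast the input without checking its runtime type.
    static List<?> castBlindly(Object input, boolean overloaded) {
        if (overloaded) {
            // Throws ClassCastException when input is, e.g., a String group key.
            return (List<?>) input;
        }
        return Collections.singletonList(input);
    }

    // Defensive variant: treat the input as a bag only if it really is one.
    static List<?> castChecked(Object input, boolean overloaded) {
        if (overloaded && input instanceof List) {
            return (List<?>) input;
        }
        // Assumed fallback for this sketch: wrap the scalar in a one-element list.
        return Collections.singletonList(input);
    }

    public static void main(String[] args) {
        Object groupKey = "alice"; // hypothetical chararray group key

        System.out.println("checked: " + castChecked(groupKey, true));
        try {
            castBlindly(groupKey, true);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that the flag and the runtime type are 
independent facts: the flag records where the operator sits in the plan, while 
the type of the input depends on what is being projected (here, the group key).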
