So when I do something like this:
---------
my_data = LOAD 'test.txt' using PigStorage(',')
as (name:chararray, age:int, eye_color:chararray, height:int);
one = foreach my_data generate TOTUPLE(name,age) as groupz,
TOTUPLE(eye_color, height) as second;
two = filter one by groupz.age is null;
--- two = filter one by groupz.age > 33; -- this works also
dump two;
---------------
then I CAN project into a tuple, so I would consider this a bug. Even if
'group' is arrived at in a different way than 'groupz' (i.e. via the GROUP
operator rather than an explicit tuple creation), the FILTER operator should
treat both the same way. I will file a JIRA ticket for this.
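
Until that is resolved, one possible workaround (a sketch only, not verified
against every Pig version) is to flatten the group key in a FOREACH first,
since projecting group.age works there, and then filter on the resulting
plain column. The alias 'flat' is just an illustrative name:

```pig
my_data = LOAD 'test.txt' USING PigStorage(',')
    AS (name:chararray, age:int, eye_color:chararray, height:int);
by_age_and_color = GROUP my_data BY (age, eye_color);
-- FLATTEN turns the 'group' tuple into ordinary top-level columns
flat = FOREACH by_age_and_color
    GENERATE FLATTEN(group) AS (age, eye_color), my_data;
-- filter on the plain column instead of projecting into 'group'
OUT2 = FILTER flat BY age IS NOT NULL;
DUMP OUT2;
```

When the condition involves only the grouping key, as it does here, filtering
my_data before the GROUP should give the same groups and avoids the
projection entirely.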
> here is a more basic script that reproduces what I am talking about... you
> will see that dumping OUT works fine, but dumping OUT2 gives me a
>
> java.lang.ClassCastException: java.lang.Integer cannot be cast to
> org.apache.pig.data.Tuple
>
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:392)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:138)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:276)
>
> -----------
>
> my_data = LOAD 'test.txt' using PigStorage(',')
> as (name:chararray, age:int, eye_color:chararray, height:int);
>
> by_age_and_color = GROUP my_data BY (age, eye_color);
> -- dump by_age_and_color;
> OUT = FOREACH by_age_and_color generate group.age;
> dump OUT;
>
> OUT2 = FILTER by_age_and_color by group.age is not null;
> dump OUT2;
> -----------
>
> I get a similar problem even if I do something like:
>
> OUT2 = FILTER by_age_and_color by group.age > 9;
> dump OUT2;
>
> --------- sample test.txt ---------
> ravi,33,blue,43
> brendan,33,green,53
> ravichandra,15,blue,43
> leonor,15,brown,46
> caeser,18,blue,23
> JCVD,,blue,23
> anthony,33,blue,46
> xavier,23,blue,13
> patrick,18,blue,33
> sang,33,brown,44
>
> On Fri, May 20, 2011 at 3:28 PM, Daniel Dai wrote:
>
>> It seems the stack does not match your statement. Do you have another
>> filter which uses "not" and "is null" in your script?
>>
>> Daniel
>>
>>
>> On 05/20/2011 12:22 PM, Daniel Eklund wrote:
>>
>>> If I can access the implicit 'group' column from within FOREACH like
>>> this:
>>>
>>> GROUPED = GROUP InputRelVar by (firstDim,secondDim);
>>> B = FOREACH GROUPED GENERATE group.firstDim;
>>>
>>> ... then should I not be able to do something like this?
>>>
>>> B1 = FILTER GROUPED by group.firstDim == 'something';
>>>
>>> I get messages like this:
>>> java.lang.ClassCastException: java.lang.String cannot be cast to
>>> org.apache.pig.data.Tuple
>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:392)
>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:138)
>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:276)
>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POIsNull.getNext(POIsNull.java:72)
>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
>>>
>>> Interestingly, I can use the 'group' alias as a whole, like
>>> B2 = FILTER GROUPED by group is not null;
>>>
>>>
>>> Any explanation of what I am doing incorrectly here?
>>>
>>> thanks,
>>> daniel
>>>
>>
>>
>