Shravan Matthur Narayanamurthy commented on PIG-597:

The exception is thrown from ARITY, which tries to cast the first field of its
input tuple to a tuple. However, since the argument is a star, the row is not
wrapped inside another tuple, so the first field is a DataByteArray and the cast
fails.
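To make the failure concrete, here is a minimal Java sketch that does not use Pig's classes. It assumes a tuple can be modeled as a List<Object> and a DataByteArray field as a byte[]. The class ArityAsWritten and its helpers are hypothetical, and arity() only mirrors the behavior described above (casting the first field to a tuple), not the actual ARITY source.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical model: a tuple is a List<Object>; a byte[] stands in for the
// DataByteArray fields that PigStorage produces when there is no schema.
class ArityAsWritten {

    // Mirrors the described ARITY behavior: treat the FIRST field of the
    // input as the tuple to measure. Throws ClassCastException otherwise.
    static int arity(List<?> input) {
        List<?> first = (List<?>) input.get(0);
        return first.size();
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A loaded row with three untyped fields.
        List<Object> row = List.of(bytes("a"), bytes("b"), bytes("c"));

        // Wrapped: ARITY receives (row), so input.get(0) is the row itself.
        System.out.println(arity(List.of(row))); // 3

        // Star-unwrapped: ARITY receives the row's fields directly, so
        // input.get(0) is a byte[] and the cast fails, as in PIG-597.
        try {
            arity(row);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: first field is not a tuple");
        }
    }
}
```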

Unwrapping the star argument was done to model the trunk behavior, which is
that there is an implicit flatten in front of a *. If we want to retain this
behavior, then we need to change ARITY and the other functions that were
written assuming that POUserFunc wraps everything inside a tuple. Even so, most
of these functions will be useless when a UDF outputs a tuple. For example, say
we have a function that returns a tuple and we want to find its arity. If ARITY
is changed to return the size of its input tuple rather than the size of its
first field, ARITY(TupleRetUDF(*)) will always return one, because POUserFunc
wraps the output of TupleRetUDF in another tuple.
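The dilemma can be sketched the same way. Assumptions: tuples are List<Object>, field values are plain strings, and the class StarUnwrapping, its Arg holder, aritySizeOfInput() and tupleRetUdf() are hypothetical stand-ins for POUserFunc's argument handling, a changed ARITY, and TupleRetUDF. buildInput() keeps the star special case from the POUserFunc code quoted in the issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of POUserFunc argument handling with the star special
// case kept. Tuples are List<Object>.
class StarUnwrapping {

    // One UDF argument: its value plus whether it came from a '*' projection.
    static final class Arg {
        final Object value;
        final boolean fromStar;

        Arg(Object value, boolean fromStar) {
            this.value = value;
            this.fromStar = fromStar;
        }
    }

    // Builds the tuple handed to a UDF. A '*' projection is flattened into
    // the argument tuple; any other argument (including another UDF's tuple
    // result) is appended as a single field, i.e. wrapped.
    static List<Object> buildInput(List<Arg> args) {
        List<Object> input = new ArrayList<>();
        for (Arg a : args) {
            if (a.fromStar) {
                input.addAll((List<?>) a.value);
            } else {
                input.add(a.value);
            }
        }
        return input;
    }

    // Hypothetical ARITY changed to count the fields of its input tuple,
    // so that it copes with the unwrapped '*' case.
    static int aritySizeOfInput(List<Object> input) {
        return input.size();
    }

    // Hypothetical TupleRetUDF: returns its input fields as a new tuple.
    static List<Object> tupleRetUdf(List<Object> input) {
        return new ArrayList<>(input);
    }

    public static void main(String[] args) {
        List<Object> row = List.of("a", "b", "c");

        // ARITY(*): the row is flattened, so the count is 3.
        int direct = aritySizeOfInput(buildInput(List.of(new Arg(row, true))));

        // ARITY(TupleRetUDF(*)): the UDF's tuple result is wrapped, so the
        // count is 1 no matter how many fields the returned tuple has.
        List<Object> returned = tupleRetUdf(buildInput(List.of(new Arg(row, true))));
        int nested = aritySizeOfInput(buildInput(List.of(new Arg(returned, false))));

        System.out.println(direct + " " + nested); // 3 1
    }
}
```

The returned tuple has three fields, yet the changed ARITY reports one, which is the problem with keeping the star special case.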

However, if we comment out this code, then we need to modify FindQuantiles to
account for the fact that everything will be wrapped inside a tuple, and the
behavior will no longer depend on whether a star is used. I think this is
better, and Olga seems to agree as per her previous comment. Any other thoughts?
Retain the trunk behavior or change it?
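For comparison, here is a sketch of the proposed alternative under the same modeling assumptions (tuples as List<Object>, strings as field values, and a hypothetical AlwaysWrap class). With the star special case removed, every argument becomes exactly one field, so ARITY can keep measuring its first field. FindQuantiles is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposal: no '*' special case in POUserFunc.
class AlwaysWrap {

    // Every argument value becomes exactly one field of the UDF's input
    // tuple; a '*' projection arrives as one nested tuple, never flattened.
    static List<Object> buildInput(List<?> argValues) {
        return new ArrayList<>(argValues);
    }

    // ARITY measuring the tuple held in its first field.
    static int arity(List<Object> input) {
        return ((List<?>) input.get(0)).size();
    }

    public static void main(String[] args) {
        List<Object> row = List.of("a", "b", "c");

        // ARITY(*): the row is wrapped, so input.get(0) is the row.
        int direct = arity(buildInput(List.of(row)));

        // ARITY(TupleRetUDF(*)) with a hypothetical UDF that returned a
        // two-field tuple: input.get(0) is that tuple, so its size is reported.
        List<Object> returned = List.of("x", "y");
        int nested = arity(buildInput(List.of(returned)));

        System.out.println(direct + " " + nested); // 3 2
    }
}
```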

> Pig does not handle correctly the case where "*" is passed to UDF
> ------------------------------------------------------------------
>                 Key: PIG-597
>                 URL: https://issues.apache.org/jira/browse/PIG-597
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Shravan Matthur Narayanamurthy
> Script:
> ======
> A = LOAD 'foo' USING PigStorage('\t');
> B = FILTER A BY ARITY(*) < 5;
> Error:
> =====
> 2009-01-05 21:46:56,355 [main] ERROR
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc
> - Caught error from UDF
> org.apache.pig.builtin.ARITY[org.apache.pig.data.DataByteArray cannot be cast 
> to org.apache.pig.data.Tuple [org.apache.pig.data.DataByteArray cannot be 
> cast to org.apache.pig.data.Tuple]
> Problem:
> =======
> Santhosh tracked this to the following code in POUserFunc.java:
> if(op instanceof POProject &&
>                         op.getResultType() == DataType.TUPLE){
>                     POProject projOp = (POProject)op;
>                     if(projOp.isStar()){
>                         Tuple trslt = (Tuple) temp.result;
>                         Tuple rslt = (Tuple) res.result;
>                         for(int i=0;i<trslt.size();i++)
>                             rslt.append(trslt.get(i));
>                         continue;
>                     }
>                 }
> It seems to be unwrapping the tuple before passing it to the function. There
> are no comments, so we are not sure why it is there; we will need to run tests
> to see whether removing it solves this issue without creating others.

