[
https://issues.apache.org/jira/browse/PIG-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663306#action_12663306
]
Shravan Matthur Narayanamurthy commented on PIG-597:
The exception is thrown from ARITY, which tries to cast the first field of
its input tuple to a tuple. Because the argument is a star, the tuple is not
wrapped inside another tuple, so the first field is a DataByteArray and the
cast fails.
This was done to model the trunk behavior, in which there is an implicit
flatten in front of a *. If we want to retain this behavior, we need to change
ARITY and the other functions that were written on the assumption that
POUserFunc wraps every argument inside a tuple. Most of these functions would
then be useless with a UDF that outputs a tuple. For example, say we have a
function that returns a tuple and we want to find its arity:
ARITY(TupleRetUDF(*)) will always return one, since POUserFunc wraps the
output of TupleRetUDF in another tuple and ARITY has been changed to return
the size of the input tuple rather than the size of its first field.
However, if we comment out this code, we need to modify FindQuantiles to
account for the fact that everything is wrapped inside a tuple and that the
behavior is not conditional on the use of a star. I think this is better, and
Olga seems to agree as per her previous comment. Any other thoughts? Retain
the trunk behavior or change it?
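To make the trade-off above concrete, here is a minimal, self-contained Java
sketch. It is not Pig's code: a tuple is modelled as a plain List<Object>, and
the names ArityTradeoffSketch, arityOfFirstField and arityOfInput are
hypothetical, chosen only to mirror the two ARITY definitions discussed in the
comment.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative model only: a "tuple" is a List<Object>; none of these names exist in Pig.
class ArityTradeoffSketch {

    // ARITY as described in the comment: cast the first field to a tuple, return its size.
    static int arityOfFirstField(List<Object> input) {
        @SuppressWarnings("unchecked")
        List<Object> first = (List<Object>) input.get(0); // ClassCastException if field 0 is not a tuple
        return first.size();
    }

    // The proposed alternative: return the size of the input tuple itself.
    static int arityOfInput(List<Object> input) {
        return input.size();
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.<Object>asList("a", "b", "c");

        // Star with implicit flatten: the UDF sees the row's fields directly.
        List<Object> flattened = row;
        // Always wrap: the UDF sees a one-field tuple whose only field is the row.
        List<Object> wrapped = Collections.<Object>singletonList(row);

        System.out.println(arityOfInput(flattened));     // 3
        System.out.println(arityOfFirstField(wrapped));  // 3
        System.out.println(arityOfInput(wrapped));       // 1, the ARITY(TupleRetUDF(*)) case
        try {
            arityOfFirstField(flattened);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, analogous to the reported error");
        }
    }
}
```

In this model, neither ARITY definition gives the expected answer for both
input shapes, which is why the choice between "always wrap" and "flatten on
star" has to be made once and applied consistently.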
Pig does not correctly handle the case where * is passed to a UDF
--
Key: PIG-597
URL: https://issues.apache.org/jira/browse/PIG-597
Project: Pig
Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Shravan Matthur Narayanamurthy
Script:
==
A = LOAD 'foo' USING PigStorage('\t');
B = FILTER A BY ARITY(*) < 5;
DUMP B;
Error:
=
2009-01-05 21:46:56,355 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc
- Caught error from UDF
org.apache.pig.builtin.ARITY[org.apache.pig.data.DataByteArray cannot be cast
to org.apache.pig.data.Tuple [org.apache.pig.data.DataByteArray cannot be
cast to org.apache.pig.data.Tuple]
Problem:
===
Santhosh tracked this to the following code in POUserFunc.java:
if(op instanceof POProject &&
        op.getResultType() == DataType.TUPLE){
    POProject projOp = (POProject)op;
    if(projOp.isStar()){
        Tuple trslt = (Tuple) temp.result;
        Tuple rslt = (Tuple) res.result;
        for(int i=0;i<trslt.size();i++)
            rslt.append(trslt.get(i));
        continue;
    }
}
It seems to be unwrapping the tuple before passing it to the function. There
are no comments, so we are not sure why it is there; we will need to run tests
to see whether removing it would solve this issue without creating others.
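As a rough illustration of what the quoted block does, here is a
self-contained Java sketch. It is not the POUserFunc implementation: tuples
are modelled as List<Object>, and Operand, buildArgs and the unwrapStar flag
are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only; Operand and buildArgs are hypothetical, not Pig classes.
class StarUnwrapSketch {

    // One UDF argument: its evaluated value and whether it came from a star projection.
    static final class Operand {
        final Object value;
        final boolean star;
        Operand(Object value, boolean star) { this.value = value; this.star = star; }
    }

    // Builds the tuple handed to the UDF. With unwrapStar set, the fields of a
    // star operand are appended one by one (the quoted behavior); otherwise
    // every operand is appended as a single field.
    static List<Object> buildArgs(List<Operand> operands, boolean unwrapStar) {
        List<Object> result = new ArrayList<>();
        for (Operand op : operands) {
            if (unwrapStar && op.star && op.value instanceof List) {
                result.addAll((List<?>) op.value);
                continue;
            }
            result.add(op.value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.<Object>asList("a", "b", "c");
        List<Operand> star = Arrays.asList(new Operand(row, true));

        System.out.println(buildArgs(star, true));   // [a, b, c]
        System.out.println(buildArgs(star, false));  // [[a, b, c]]
    }
}
```

With unwrapping enabled, a UDF that casts its first argument to a tuple
receives a scalar field instead, which matches the cast failure in the error
above.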