I believe this is a current limitation of Pig: you can't have a
function that uses both getArgToFuncMapping and a variable number of
arguments. In this case it kind of makes sense that you can't, though.
For example:

What if * is in fact just a single tuple of something? Then you have

TOTUPLE(tuple), 'chararray'
tuple, 'chararray'

Which one should they match? The one intended for TOTUPLE(*), or the one
intended for just *? Both would match a plain tuple.
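
To make the ambiguity concrete, here is a minimal sketch in plain Java. It
does not use the Pig API: nested java.util.List values stand in for Pig
tuples, and ArgShapeSketch and signature() are hypothetical names. It
assumes that matching only looks at the top-level field types, and that *
expands to a single field which is itself a tuple.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in model only: a List plays the role of a Pig tuple.
class ArgShapeSketch {
    // Describe the top-level field types of an argument list.
    static String signature(List<?> args) {
        StringBuilder sb = new StringBuilder();
        for (Object a : args) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            if (a instanceof List) {
                sb.append("tuple");
            } else if (a instanceof String) {
                sb.append("chararray");
            } else {
                sb.append("other");
            }
        }
        return sb.toString();
    }

    public static void main(String[] argv) {
        // Assumption: the row has exactly one field, and it is a tuple.
        List<Object> t = Arrays.<Object>asList(1, 2);
        // Case 1: TOTUPLE(*) wraps the row, so the first argument is (t).
        List<Object> case1 = Arrays.<Object>asList(Arrays.<Object>asList(t), "input");
        // Case 2: * expands in place, so the first argument is t itself.
        List<Object> case2 = Arrays.<Object>asList(t, "input");
        System.out.println(signature(case1));
        System.out.println(signature(case2));
    }
}
```

Under those assumptions both calls report the same top-level types, even
though the nesting differs, so a type-only match cannot tell them apart.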

One more thing, which is also important: you're re-wrapping the argument
in a Tuple. It is implicit that the input to your EvalFunc will come in
the form of a Tuple. In the UDF example, note that they don't rewrap in a
tuple:

funcList.add(new FuncSpec(FloatAbs.class.getName(),
    new Schema(new Schema.FieldSchema(null, DataType.FLOAT))));

So unless your argument will be explicitly rewrapped in a tuple, you don't
need that piece.

But yeah, someone else can chime in on whether getArgToFuncMapping can do
what you want it to do, but I don't think it can. My suggestion would be
to a) choose one form of input and stick to it, instead of trying to
support two forms, or b) have an initializer in your EvalFunc that, on the
first input, inspects the types and figures out which function to use to
process the input.
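
Option b) could look roughly like the sketch below. This is plain Java
rather than a real EvalFunc: java.util.List stands in for Pig's Tuple, and
LazyDispatchSketch, Mode, and detect() are hypothetical names. In an actual
UDF the same check would go inside exec(Tuple) and inspect the field types
of the Tuple instead of using instanceof.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in model only: a List plays the role of a Pig tuple.
class LazyDispatchSketch {
    // WRAPPED: called as UDF(TOTUPLE(*), 'input'); FLAT: called as UDF(*, 'input').
    enum Mode { WRAPPED, FLAT }

    private Mode mode; // decided once, on the first input

    // Guess the calling convention from the shape of the first input.
    // Caveat: a row whose only field is a tuple looks WRAPPED either way.
    static Mode detect(List<?> input) {
        boolean wrapped = input.size() == 2
                && input.get(0) instanceof List
                && input.get(1) instanceof String;
        return wrapped ? Mode.WRAPPED : Mode.FLAT;
    }

    // Returns "<label>:<number of data fields>" as a placeholder result.
    String exec(List<?> input) {
        if (input == null || input.isEmpty()) {
            return null;
        }
        if (mode == null) {
            mode = detect(input);
        }
        List<?> fields;
        if (mode == Mode.WRAPPED) {
            fields = (List<?>) input.get(0);
        } else {
            // Everything except the trailing label is data.
            fields = input.subList(0, input.size() - 1);
        }
        Object label = input.get(input.size() - 1);
        return label + ":" + fields.size();
    }
}
```

The mode is fixed after the first record, so later records are not
re-inspected. The single-tuple case described above stays ambiguous with
this approach too, which is another argument for option a).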

We do need to make funcspecs play nice with variable numbers of arguments,
though, especially now that more schema info is available.

2011/11/25 Prashant Kommireddi <[email protected]>

> Thanks Jonathan.
>
> What do I check for as the input type, because DataType.TUPLE does not seem
> to work. I would like to use "getArgToFuncMapping()" to be able to invoke
> different functions based on input type, and I am not sure how to check for
> Case 2.
>
> In my implementation, Case 1 could be checked for (DataType.TUPLE,
> DataType.CHARARRAY) but for Case 2 I would assume it should be
> (DataType.TUPLE) but that does not work. Pig cannot infer a matching
> function.
>
>  @Override
>    public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
>        List<FuncSpec> funcList = new ArrayList<FuncSpec>();
>        Schema s = new Schema();
>        s.add(new Schema.FieldSchema(null, DataType.TUPLE));
>        s.add(new Schema.FieldSchema(null, DataType.CHARARRAY));
>        funcList.add(new FuncSpec(this.getClass().getName(), s));
>
>        s = new Schema();
>        s.add(new Schema.FieldSchema(null, DataType.TUPLE));
>        funcList.add(new FuncSpec(CustomUDF.class.getName(), s));
>
>        return funcList;
>    }
>
>
>
> On Fri, Nov 25, 2011 at 12:52 AM, Jonathan Coveney <[email protected]> wrote:
>
> > The first case will give you a tuple which contains, as its first
> > element, a tuple of all of the stuff in *, and as its second element,
> > 'input'.
> >
> > The second will give you a tuple which contains all of the elements of *,
> > and then as its last element, 'input'.
> >
> > This is what I thought, but to be sure I ran this UDF:
> >
> > import org.apache.pig.EvalFunc;
> > import java.io.IOException;
> > import org.apache.pig.data.Tuple;
> >
> > public class ATHING extends EvalFunc<String> {
> >  public String exec(Tuple input) throws IOException {
> >    System.out.println(input.toString());
> >    return null;
> >   }
> > }
> >
> > 2011/11/24 Prashant Kommireddi <[email protected]>
> >
> > > I have a question regarding the pig data types.
> > >
> > > If I have a UDF, say 'CustomUDF' and I do something like this:
> > >
> > > REGISTER 'foo.jar';
> > >
> > > A = LOAD '/shared/a.dat';
> > >
> > > What would be the difference in the data types for UDF arguments
> > > between -->
> > >
> > > Case 1 : B = FOREACH A GENERATE CustomUDF(TOTUPLE(*), 'input'); AND
> > > Case 2 : B = FOREACH A GENERATE CustomUDF(*, 'input');
> > >
> > > I am sure Case 1 is (tuple, chararray). Can anyone let me know the data
> > > type for Case 2 arguments?
> > >
> > > Thanks,
> > > Prashant
> > >
> >
>
