Re: OutputSchema for EvalFunc

Jonathan Coveney Wed, 18 Apr 2012 23:38:14 -0700

Dmitriy's suggestion is spot on, but just to be pedantic, you'd do:

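public Schema outputSchema(Schema input) {
   try {
      List<FieldSchema> list = new ArrayList<FieldSchema>();
      list.add(new FieldSchema("one", DataType.CHARARRAY));
      list.add(new FieldSchema("two", DataType.CHARARRAY));

      // Wrap the two fields in a single tuple-typed field named "t".
      return new Schema(new Schema.FieldSchema("t", new Schema(list),
            DataType.TUPLE));
   } catch (FrontendException e) {
      // The FieldSchema constructor that takes a nested Schema declares
      // FrontendException, which outputSchema itself can't throw.
      throw new RuntimeException(e);
   }
}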

That said, in your question you asked how to get the schema "without the
parenthesis." The short answer is that you can't. A UDF can't return multiple
columns -- it can only return a single tuple, which you then flatten into
columns.
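
To make the tuple-versus-columns distinction concrete, here's a rough plain-Java model. It is not Pig code and doesn't use Pig's classes: FlattenSketch, func, generate and generateFlattened are made-up names for illustration, and a row is just modeled as a List<Object>. It only shows that the function hands back one value (a tuple), and that flattening is a separate step that splices the tuple's fields into the row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlattenSketch {

    // Hypothetical stand-in for the UDF: it returns ONE value,
    // and that value happens to be a two-field tuple.
    public static List<Object> func() {
        return Arrays.<Object>asList("a", "b");
    }

    // Models "generate FUNC()": the row has a single column holding the tuple.
    public static List<Object> generate() {
        List<Object> row = new ArrayList<Object>();
        row.add(func());
        return row;
    }

    // Models "generate flatten(FUNC())": the tuple's fields become row columns.
    public static List<Object> generateFlattened() {
        List<Object> row = new ArrayList<Object>();
        row.addAll(func());
        return row;
    }

    public static void main(String[] args) {
        System.out.println(generate().size());          // number of columns without flatten
        System.out.println(generateFlattened().size()); // number of columns with flatten
    }
}
```

In this model the un-flattened row has one column (the nested tuple) and the flattened row has two, which is the same shape difference as the parentheses in the describe output.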

2012/4/18 Dmitriy Ryaboy <[email protected]>

> It's messy. Easier to use the schema parser:
>
> org.apache.pig.impl.util.Utils.getSchemaFromString("t:tuple(len:int,word:chararray)");
>
> Even easier to use the @OutputSchema annotation (coming in 0.11 I believe)
>
> -D
>
>
> On Wed, Apr 18, 2012 at 7:02 PM, Rajgopal Vaithiyanathan
> <[email protected]> wrote:
> > Hey all,
> >
> > Sorry if I sound naive, but how should one implement outputSchema for an
> > EvalFunc that returns a tuple?
> > The way I do it is:
> >
> > public Schema outputSchema(Schema input) {
> >    List<FieldSchema> list = new ArrayList<FieldSchema>();
> >    list.add(new FieldSchema("one", DataType.CHARARRAY));
> >    list.add(new FieldSchema("two", DataType.CHARARRAY));
> >
> >    return new Schema(list);
> > }
> >
> > but in the front end, if I use
> >  B = foreach A generate FUNC();
> >  describe B
> > I get the schema like this:
> >    { ( one:chararray, two:chararray ) }
> > Now I use a flatten on this like:
> >    B = foreach A generate flatten(FUNC());
> >  and I get { null::one : chararray, null::two : chararray }
> >
> > The question is,
> > How should I implement the outputSchema so that I get the schema like
> > { one : chararray, two : chararray }  // NOTE: without the parenthesis
>
