Ah, man, this is jank city, but it works (just in case anyone has to deal
with this, though pig9 seems to deal with it fine on its own)

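import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.schema.Schema;

// Returns a null that is typed as a bag: the bag you pass in is ignored
// at runtime and only lends its schema to the output.
public class FakeBagActuallyNull extends EvalFunc<DataBag> {
    @Override
    public DataBag exec(Tuple input) throws IOException {
        // The argument is never read; the result is always null.
        return null;
    }

    @Override
    public Schema outputSchema(Schema input) {
        // Echo the input schema so the null carries the bag's schema.
        return input;
    }
}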
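
For anyone wondering why the null matters at all, here is a toy model of the row-dropping behavior discussed in this thread. It is a sketch only, not Pig's implementation: FlattenSketch, flatten, and nullIfEmpty are made-up names, and the bag is stood in for by a plain List<Integer>. It assumes flatten emits one row per element, no rows for an empty bag, and a single null row for a null bag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: this is NOT Pig's implementation of FLATTEN.
class FlattenSketch {

    // Models FLATTEN over one bag-valued field holding ints.
    // Returns the output rows, each row being the single flattened value.
    static List<Integer> flatten(List<Integer> bag) {
        List<Integer> rows = new ArrayList<>();
        if (bag == null) {
            // Assumed behavior: a null bag still yields one row, with a null field.
            rows.add(null);
            return rows;
        }
        // An empty bag contributes no rows, so the record is washed out.
        rows.addAll(bag);
        return rows;
    }

    // Models the bincond from the script: IsEmpty(bag) ? null : bag.
    static List<Integer> nullIfEmpty(List<Integer> bag) {
        return (bag == null || bag.isEmpty()) ? null : bag;
    }

    public static void main(String[] args) {
        List<Integer> empty = Collections.emptyList();
        System.out.println(flatten(empty).size());                              // 0
        System.out.println(flatten(nullIfEmpty(empty)).size());                 // 1
        System.out.println(flatten(nullIfEmpty(Arrays.asList(1, 2, 3))).size()); // 3
    }
}
```

The UDF above plays the role of the null branch in nullIfEmpty, except that its null also carries the bag's schema.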



2011/12/6 Dmitriy Ryaboy <[email protected]>

> Ah but you can have the output schema be an echo of the input schema, and
> pass your bag in as an (ignored) argument.
>
> On Dec 5, 2011, at 5:52 PM, Jonathan Coveney <[email protected]> wrote:
>
> > 1) There is an error in the above. In pig8, the *following* worked (the
> > two snippets above are the same):
> >
> > bag_of_stuff = load 'thing' as (x:int);
> > a = group bag_of_stuff all;
> > b = foreach a generate FLATTEN((IsEmpty(bag_of_stuff) ? null :
> > bag_of_stuff)) as stuff; --no :int
> > dump b;
> >
> > 2) Dmitriy, I thought about doing something like that, but I don't know
> > that it would work? If the UDF just outputs a single null, then its
> > schema is going to be "null," and I imagine you'd see the same error
> > (though I can of course test that). To avoid the error, it'd have to be a
> > bag with a null element, but then it'd have the same issue the code is
> > trying to avoid: if you flatten a bag with a null, the row disappears.
> >
> > 2011/12/5 Dmitriy Ryaboy <[email protected]>
> >
> >> s/null/UdfThatContainsASingleNull/ ?
> >>
> >> On Mon, Dec 5, 2011 at 5:04 PM, Jonathan Coveney <[email protected]>
> >> wrote:
> >>> In pig8, the following worked:
> >>>
> >>> bag_of_stuff = load 'thing' as (x:int);
> >>> a = group bag_of_stuff all;
> >>> b = foreach a generate FLATTEN((IsEmpty(bag_of_stuff) ? null :
> >>> bag_of_stuff)) as stuff:int;
> >>> dump b;
> >>>
> >>> In pig9, however, in some cases, this could lead to an error, because
> >>> you need to explicitly set the type of "stuff," which leads to:
> >>>
> >>> bag_of_stuff = load 'thing' as (x:int);
> >>> a = group bag_of_stuff all;
> >>> b = foreach a generate FLATTEN((IsEmpty(bag_of_stuff) ? null :
> >>> bag_of_stuff)) as stuff:int;
> >>> dump b;
> >>>
> >>> However, this doesn't work in pig8.
> >>>
> >>> 2011-12-06 00:50:54,949 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> >>> ERROR 1022: Type mismatch merging schema prefix. Field Schema:
> >>> bytearray. Other Field Schema: stuff: int
> >>>
> >>> I'm not sure what the best way around this is. You can't explicitly
> >>> cast (int)null, because then you get:
> >>>
> >>> 2011-12-06 01:02:11,962 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> >>> ERROR 1050: Unsupported input type for BinCond: left hand side: int;
> >>> right hand side: bag
> >>>
> >>> Any suggestions would be welcome. Maybe it'd be worth making a flatten
> >>> that, in the case of an empty bag, returns a null row instead of
> >>> getting washed out? I know it's sort of annoying given I know how to
> >>> make it work in pig9, but I'd like for the script that uses this to
> >>> work in both pig8 and pig9, ideally...
> >>
>
