FLATTEN is kind of quirky. If you FLATTEN(null), it returns null, so the row
survives (that is the () row in your DUMP). If you FLATTEN an empty bag
(i.e. size = 0), it throws the row away. I would have your UDF return an empty
bag instead of null and let FLATTEN drop the bad rows for you.
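
To make the difference concrete, here is a rough Python model of the
behaviour described above. It is only a sketch, not Pig code: a bag is
modelled as a list of tuples, a null bag as None, and the names flatten_rows
and my_udf are hypothetical, invented for this illustration.

```python
# Toy model of the FLATTEN behaviour described above; this is not Pig code.
# A "bag" is modelled as a list of tuples, and a null bag as None.

def flatten_rows(bags):
    """Yield the output rows of FOREACH ... GENERATE FLATTEN(bag)."""
    for bag in bags:
        if bag is None:
            # FLATTEN(null) passes the null through, so the row survives.
            yield (None,)
        else:
            # One output row per tuple; an empty bag yields no rows at all.
            for tup in bag:
                yield tup


def my_udf(line):
    """Hypothetical stand-in for MyUDF: parse 'id,name' into a one-tuple bag."""
    parts = line.split(",")
    if len(parts) != 2:
        return []  # bad line: return an empty bag instead of None
    return [(parts[0], parts[1])]


lines = ["id1,name1", "id2,name2", "garbage", "id3,name3"]
parsed = list(flatten_rows(my_udf(line) for line in lines))
print(parsed)
```

With the empty bag, the bad line produces no output row at all, so no
follow-up FILTER is needed.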

2012/3/1 Dexin Wang <[email protected]>

> Hi,
>
> I have a UDF that parses a line and then returns a bag. Sometimes the
> line is bad, so I return null from the UDF. In my Pig script, I'd like to
> filter out those nulls like this:
>
> raw = LOAD 'raw_input' AS (line:chararray);
> -- get two fields in the tuple: id and name
> parsed = FOREACH raw GENERATE FLATTEN(MyUDF(line));
> DUMP parsed;
>
>   (id1,name1)
>   (id2,name2)
>   ()
>   (id3,name3)
>
> parsed_no_nulls = FILTER parsed BY id IS NOT NULL;
> DUMP parsed_no_nulls;
>
>   (id1,name1)
>   (id2,name2)
>   (id3,name3)
>
> This works, but I'm getting this warning:
>
>  WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger -
>  org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject:
>  Attempt to access field which was not found in the input
>
> When I try to use IsEmpty to filter, I get this error "Cannot test a NULL
> for emptiness".
>
> What's the correct way to filter out these null bags returned from my UDF?
>
> Thanks.
> Dexin
>
