Ah, but you can have the output schema be an echo of the input schema, and pass your bag in as an (ignored) argument.
On Dec 5, 2011, at 5:52 PM, Jonathan Coveney <[email protected]> wrote:

> 1) There is an error in the above. In pig8, the *following* worked (the
> two snippets above are the same):
>
> bag_of_stuff = load 'thing' as (x:int);
> a = group bag_of_stuff all;
> b = foreach a generate FLATTEN((IsEmpty(bag_of_stuff) ? null :
> bag_of_stuff)) as stuff; --no :int
> dump b;
>
> 2) Dmitriy, I thought about doing something like that, but I don't know
> that it would work? if the UDF just outputs a single null, then its schema
> is going to be "null," and I imagine you'd see the same error (though I can
> of course test that). To avoid the error, it'd have to be a bag with a null
> element, but then it'd have the same issue the code is trying to avoid: if
> you flatten a bag with a null, the row disappears
>
> 2011/12/5 Dmitriy Ryaboy <[email protected]>
>
>> s/null/UdfThatContainsASingleNull/ ?
>>
>> On Mon, Dec 5, 2011 at 5:04 PM, Jonathan Coveney <[email protected]>
>> wrote:
>>> In pig8, the following worked:
>>>
>>> bag_of_stuff = load 'thing' as (x:int);
>>> a = group bag_of_stuff all;
>>> b = foreach a generate FLATTEN((IsEmpty(bag_of_stuff) ? null :
>>> bag_of_stuff)) as stuff:int;
>>> dump b;
>>>
>>> in pig9, however, in some cases, this could lead to an error, because you
>>> need to explicitly set the type of "stuff," which leads to:
>>>
>>> bag_of_stuff = load 'thing' as (x:int);
>>> a = group bag_of_stuff all;
>>> b = foreach a generate FLATTEN((IsEmpty(bag_of_stuff) ? null :
>>> bag_of_stuff)) as stuff:int;
>>> dump b;
>>>
>>> However, this doesn't work in pig8.
>>>
>>> 2011-12-06 00:50:54,949 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>> ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray.
>>> Other Field Schema: stuff: int
>>>
>>> I'm not sure what the best way around this is. You can't explicitly cast
>>> (int)null, because then you get:
>>>
>>> 2011-12-06 01:02:11,962 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>> ERROR 1050: Unsupported input type for BinCond: left hand side: int;
>>> right hand side: bag
>>>
>>> Any suggestions would be welcome. Maybe it'd be worth making a flatten
>>> that, in the case of an empty bag, returns a null row instead of getting
>>> washed out? I know it's sort of annoying given I know how to make it work
>>> in pig9, but I'd like for the script that uses this to work in both pig8
>>> and pig9, ideally...
>>
