1) There is an error in the above. In pig8, the *following* worked (the two snippets above are the same):

bag_of_stuff = load 'thing' as (x:int);
a = group bag_of_stuff all;
b = foreach a generate FLATTEN((IsEmpty(bag_of_stuff) ? null : bag_of_stuff)) as stuff; --no :int
dump b;

2) Dmitriy, I thought about doing something like that, but I don't know that it would work. If the UDF just outputs a single null, then its schema is going to be "null," and I imagine you'd see the same error (though I can of course test that). To avoid the error, it'd have to be a bag with a null element, but then it'd have the same issue the code is trying to avoid: if you flatten a bag with a null, the row disappears.

2011/12/5 Dmitriy Ryaboy <[email protected]>

> s/null/UdfThatContainsASingleNull/ ?
>
> On Mon, Dec 5, 2011 at 5:04 PM, Jonathan Coveney <[email protected]>
> wrote:
> > In pig8, the following worked:
> >
> > bag_of_stuff = load 'thing' as (x:int);
> > a = group bag_of_stuff all;
> > b = foreach a generate FLATTEN((IsEmpty(bag_of_stuff) ? null :
> > bag_of_stuff)) as stuff:int;
> > dump b;
> >
> > in pig9, however, in some cases, this could lead to an error, because you
> > need to explicitly set the type of "stuff," which leads to:
> >
> > bag_of_stuff = load 'thing' as (x:int);
> > a = group bag_of_stuff all;
> > b = foreach a generate FLATTEN((IsEmpty(bag_of_stuff) ? null :
> > bag_of_stuff)) as stuff:int;
> > dump b;
> >
> > However, this doesn't work in pig8.
> >
> > 2011-12-06 00:50:54,949 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray.
> > Other Field Schema: stuff: int
> >
> > I'm not sure what the best way around this is. You can't explicitly cast
> > (int)null, because then you get:
> >
> > 2011-12-06 01:02:11,962 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > ERROR 1050: Unsupported input type for BinCond: left hand side: int; right
> > hand side: bag
> >
> > Any suggestions would be welcome. Maybe it'd be worth making a flatten
> > that, in the case of an empty bag, returns a null row instead of getting
> > washed out? I know it's sort of annoying given I know how to make it work
> > in pig9, but I'd like for the script that uses this to work in both pig8
> > and pig9, ideally...
>
