B = foreach A generate item, d, flatten(things);
C = foreach B generate item, d, thing, d1, flatten(values);
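
To see why two steps are needed, here is a rough model of what each FLATTEN does to the sample row, written in plain Python rather than Pig. It is only a sketch of the semantics: the helper name flatten_last is made up for illustration, and the row is typed in by hand from the input data quoted below.

```python
# Plain-Python model of the two FLATTEN steps (illustration only, not Pig).
# One record shaped like the sample row: item, d, things (a bag of tuples).
A = [("item1", 111, [("thing1", 222, [("value1",), ("value2",)])])]


def flatten_last(rows):
    """Model FLATTEN on the last field (hypothetical helper).

    Emits one output row per tuple in that bag, with the tuple's
    fields appended after the remaining columns.
    """
    return [row[:-1] + tup for row in rows for tup in row[-1]]


# B = foreach A generate item, d, flatten(things);
B = flatten_last(A)  # columns: item, d, thing, d1, values

# C = foreach B generate item, d, thing, d1, flatten(values);
C = flatten_last(B)  # columns: item, d, thing, d1, v

for row in C:
    print(row)
```

The first pass un-nests 'things' and leaves 'values' as an ordinary bag column in B, which is why the second statement can flatten it by name.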

Sent from my iPhone

On Jun 4, 2013, at 5:46 PM, "David Parks" <[email protected]> wrote:

> We've been at our first real use case with pig for quite some time now, and
> still not successful. I wonder if someone can provide an answer to this very
> much simplified version of our problem:
>
> Input data:
> ---------------
> 'item1' 111 { ('thing1', 222, {('value1'),('value2')}) }
>
> Load statement for above data:
> ----------------------------------------
> A = load 'data6' as ( item:chararray, d:int, things:bag{(thing:chararray,
> d1:int, values:bag{(v:chararray)})} );
>
> Desired result:
> ------------------
> ('item1' 111 thing1 222 value1)
> ('item1' 111 thing1 222 value2)
>
> Questions:
> ----------------
> - Is there a single step I can use to flatten this? Or will it require
> doing 2 steps: first flatten 'things', and then take those results and
> flatten 'values'?
> - We're really looking for the syntax to get this right. I've posted a
> number of questions here and on Stack Overflow with lots of good
> suggestions, and read through the O'Reilly book online, none of which,
> though, have gotten me past constant errors like "Cannot find field v in
> values:bag{:tuple(v:chararray)}"
> - Should I be working on converting our data to SQL-like table formats
> rather than this more Object-Oriented format with nested collections?
>
> Pseudo-code attempt (I've tried 50+ versions of this in every form I can
> glean from examples out on the internet with no success):
> ----------------------------------------------------
> B = FOREACH A GENERATE item, d, things.thing as thing, d1,
> FLATTEN(things.values.v) as v;
