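-- Step 1: flatten the outer bag, one row per tuple in things.
B = foreach A generate item, d, flatten(things);
-- Step 2: flatten the inner bag, one row per value.
C = foreach B generate item, d, thing, d1, flatten(values);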
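
If the short field names don't resolve for you (after a flatten, the schema carries them with a things:: prefix), naming the flattened fields explicitly may help. A rough sketch, untested, reusing your load statement; the aliases vals and v are just names I picked:

```pig
-- Load statement copied from the original question.
A = LOAD 'data6' AS (item:chararray, d:int,
    things:bag{(thing:chararray, d1:int, values:bag{(v:chararray)})});

-- Flatten the outer bag and name its three fields explicitly.
-- 'vals' is an assumed alias for the inner bag.
B = FOREACH A GENERATE item, d, FLATTEN(things) AS (thing, d1, vals);

-- Flatten the inner bag; each value becomes its own row.
C = FOREACH B GENERATE item, d, thing, d1, FLATTEN(vals) AS v;

DUMP C;
```

As far as I know there is no single FLATTEN that unnests both levels at once, so two FOREACH steps are the usual route.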

On Jun 4, 2013, at 5:46 PM, "David Parks" <[email protected]> wrote:

> We've been at our first real use case with Pig for quite some time now and are
> still not successful. I wonder if someone can provide an answer to this very
> much simplified version of our problem:
> 
> Input data:
> ---------------
> 'item1' 111     { ('thing1', 222, {('value1'),('value2')}) }
> 
> Load statement for above data:
> ----------------------------------------
> A = load 'data6' as ( item:chararray, d:int, things:bag{(thing:chararray,
> d1:int, values:bag{(v:chararray)})} );
> 
> Desired result:
> ------------------
> ('item1'        111    thing1    222    value1)
> ('item1'        111    thing1    222    value2)
> 
> Questions:
> ----------------
> - Is there a single step I can use to flatten this? Or will it require
> doing 2 steps: first flatten 'things', and then take those results and
> flatten 'values'?
> - We're really looking for the syntax to get this right. I've posted a
> number of questions here and on Stack Overflow with lots of good
> suggestions, and read through the O'Reilly book online, none of which,
> though, have gotten me past constant errors like "Cannot find field v in
> values:bag{:tuple(v:chararray)}"
> - Should I be working on converting our data to SQL-like table formats
> rather than this more Object-Oriented format with nested collections?
> 
> Pseudo-code attempt (I've tried 50+ versions of this in every form I can
> glean from examples out on the internet with no success):
> ----------------------------------------------------
> B = FOREACH A GENERATE item, d, things.thing as thing, d1,
> FLATTEN(things.values.v) as v;
> 
> 
> 
