Flattening nested bags

David Parks Tue, 04 Jun 2013 17:47:19 -0700

We've been working on our first real use case with Pig for quite some time
now and still haven't succeeded. I wonder if someone can provide an answer to
this much-simplified version of our problem:


Input data:
---------------
'item1' 111     { ('thing1', 222, {('value1'),('value2')}) }

Load statement for above data:
----------------------------------------
A = load 'data6' as ( item:chararray, d:int, things:bag{(thing:chararray,
d1:int, values:bag{(v:chararray)})} );

Desired result:
------------------
('item1'                111     thing1  222     value1)
('item1'                111     thing1  222     value2)

Questions:
----------------
 - Is there a single step I can use to flatten this? Or will it require
doing 2 steps: first flatten 'things', and then take those results and
flatten 'values'?
 - We're really looking for the syntax to get this right. I've posted a
number of questions here and on Stack Overflow with lots of good
suggestions, and read through the O'Reilly book online, none of which,
though, have gotten me past constant errors like "Cannot find field v in
values:bag{:tuple(v:chararray)}"
 - Should I be working on converting our data to SQL-like table formats
rather than this more Object-Oriented format with nested collections?

Pseudo-code attempt (I've tried 50+ versions of this in every form I can
glean from examples out on the internet with no success):
----------------------------------------------------
B = FOREACH A GENERATE item, d, things.thing as thing, d1,
FLATTEN(things.values.v) as v;
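
Sketch of the two-step variant from the first question (untested against our
real data; it assumes the schema in the load statement above, and that the
short names thing, d1 and values stay unambiguous after the first FLATTEN,
otherwise they would need the things:: prefix):
----------------------------------------------------
-- Step 1: flatten the outer bag, one row per (thing, d1, values) tuple
B = FOREACH A GENERATE item, d, FLATTEN(things);
-- Step 2: flatten the inner bag, one row per value
C = FOREACH B GENERATE item, d, thing, d1, FLATTEN(values) AS v;
DUMP C;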
