Hello

I am working on a dataset whose relations have the following schema:

data: {a: (a1: chararray, a2_bag: {a2_tuple: (a21: chararray, a22: chararray)}, a3_bag: {a3_tuple: (a3: long)})}

This means that each row of data has one 'a1' field and two bags. The bag
'a2_bag' can hold any number (n) of 'a2_tuple' tuples, each with the fields
'a21' and 'a22'. The bag 'a3_bag' can hold any number (m) of 'a3_tuple'
tuples, each with a single field 'a3'.

I want to get rid of all the bags and flatten the data into the following
format (of course, this creates multiple rows of dataNew from each row of data):

dataNew: {a21: chararray, a22: chararray, a3: long}
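To make the intended expansion concrete, here is a minimal Python sketch (not Pig) that models one row of data as nested tuples and lists and expands it into dataNew rows. It assumes the desired output is the cross product of 'a2_bag' and 'a3_bag', which is what Pig produces when two bags are FLATTENed in the same GENERATE. The helper name flatten_row and the sample values are hypothetical and exist only for illustration.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical model of one row of `data`: (a1, a2_bag, a3_bag), where
# a2_bag is a list of (a21, a22) tuples and a3_bag is a list of 1-tuples (a3,).
Row = Tuple[str, List[Tuple[str, str]], List[Tuple[int]]]


def flatten_row(row: Row) -> List[Tuple[str, str, int]]:
    """Expand one `data` row into n * m `dataNew` rows of (a21, a22, a3).

    Pairs every a2_tuple with every a3_tuple (a cross product);
    a1 is dropped because dataNew does not include it.
    """
    _a1, a2_bag, a3_bag = row
    return [(a21, a22, a3) for (a21, a22), (a3,) in product(a2_bag, a3_bag)]


if __name__ == "__main__":
    # Hypothetical sample row with n = 2 a2_tuples and m = 3 a3_tuples.
    sample: Row = ("x", [("p", "q"), ("r", "s")], [(1,), (2,), (3,)])
    for new_row in flatten_row(sample):
        print(new_row)  # six rows in total: 2 * 3
```

In this model, an empty bag on either side yields no dataNew rows at all, because a cross product with an empty set is empty.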

I tried using FLATTEN on 'a2_bag' and 'a3_bag', which gave me:

temp: {a2_bag::a2_tuple(a21:chararray, a22:chararray), a3_bag::a3_tuple(a3:long)}

Then I flattened it again:

temp1 = FOREACH temp GENERATE FLATTEN(a2_bag::a2_tuple) AS (a21:chararray, a22:chararray),
    FLATTEN(a3_bag::a3_tuple) AS (a3:long);

When I DESCRIBE temp1, I get the desired structure, but when I try to execute
it (with DUMP, say), I get an error saying it cannot convert String to Tuple.

Please let me know where I am going wrong (I surely am somewhere) and what the
best way to solve this problem is.

P.S. I am using Pig 0.6. I use elephant-bird to get data out of HBase, and
Twitter's code to convert protocol-buffer data into a Pig-readable format.

Thanks
Sparsh Gupta
