Resolving Complex Data Type

Sparsh Gupta Thu, 08 Jul 2010 02:58:11 -0700

Hello

I am working on a dataset which has relations of the type:


data: {a: (a1: chararray,
           a2_bag: {a2_tuple: (a21: chararray, a22: chararray)},
           a3_bag: {a3_tuple: (a3: long)})}

This means that each data row has one 'a1' field, an 'a2_bag' bag that can
hold n 'a2_tuple' tuples, each with 'a21' and 'a22' fields, and another bag,
'a3_bag', that can hold m 'a3_tuple' tuples, each with a single 'a3' field.

I want to get rid of all the bags and have all the data flattened into the
following format (of course, creating multiple rows of dataNew from each row
of data):

dataNew: {a21:chararray , a22: chararray, a3:long}
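
A small Python model (not Pig) may make the intended expansion concrete. It
is only a sketch: it assumes that every 'a2_tuple' in a row should be paired
with every 'a3_tuple' in the same row (a per-row cross product), and the
function name flatten_row and the sample values are made up for illustration.

```python
from itertools import product


def flatten_row(row):
    """Expand one nested row into flat (a21, a22, a3) rows.

    `row` models the tuple (a1, a2_bag, a3_bag): a2_bag is a list of
    (a21, a22) pairs and a3_bag is a list of one-element (a3,) tuples.
    """
    a1, a2_bag, a3_bag = row
    # Assumption: pair every a2 tuple with every a3 tuple of the same row.
    return [(a21, a22, a3) for (a21, a22), (a3,) in product(a2_bag, a3_bag)]


# Hypothetical sample row: one a1 value, two a2 tuples, two a3 tuples.
row = ("k1", [("x1", "y1"), ("x2", "y2")], [(10,), (20,)])
flat_rows = flatten_row(row)
for flat in flat_rows:
    print(flat)
```

Under that assumption, a row with n 'a2_tuple' entries and m 'a3_tuple'
entries produces n * m rows of dataNew.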

I tried using FLATTEN on 'a2_bag' and 'a3_bag', which gives:
temp: {a2_bag::a2_tuple(a21:chararray, a22:chararray) ,
a3_bag::a3_tuple(a3:long)}

Then I FLATTEN it again:
temp1 = FOREACH temp GENERATE FLATTEN(a2_bag::a2_tuple) AS (a21:chararray,
a22:chararray), FLATTEN(a3_bag::a3_tuple) AS (a3:long);

When I DESCRIBE temp1, I get the desired structure, but when I try to execute
it (with DUMP, say), I get an error that it cannot convert String to Tuple.

Please let me know where I am going wrong (I clearly am somewhere) and what
the best way to solve this problem is.

P.S. I am using Pig 0.6 with elephant-bird to read the data from HBase, and
Twitter's code to convert the protocol-buffer data into a Pig-readable format.

Thanks
Sparsh Gupta
