As a work-around, put some other non-tuple field inside the bag if you can.
The logic seems to be something like:
if it's a bag, then look inside it; for each item inside:
if there is exactly one field inside, and it is a tuple, flatten that too;
recurse until what is inside is not a tuple or has more than one field.
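
To make that guess concrete, here is a small Python model of the rule as I
described it. This is only a sketch of the behaviour I think I am seeing, not
Pig's actual implementation: bags are modelled as lists, tuples as Python
tuples, and flatten_bag is a made-up name.

```python
def flatten_bag(bag):
    """Model of the guessed rule: for each item in the bag, keep peeling
    while the item is a tuple with exactly one field and that field is
    itself a tuple. Hypothetical helper, not a Pig API."""
    rows = []
    for item in bag:
        while (
            isinstance(item, tuple)
            and len(item) == 1
            and isinstance(item[0], tuple)
        ):
            item = item[0]  # unwrap one extra layer
        rows.append(item)
    return rows


# A bag whose items each hold a single tuple field: the wrapper is peeled,
# so the inner fields come out directly.
nested = [(("x", "y"),), (("p", "q"),)]
print(flatten_bag(nested))

# Same data with an extra non-tuple field next to the inner tuple (the
# workaround): the item has two fields, so nothing more is unpacked.
padded = [(("x", "y"), 0)]
print(flatten_bag(padded))
```

In this model the second bag keeps its inner tuple intact, which is why adding
another non-tuple field should stop the extra unpacking.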
On Jul 9, 2010, at 3:50 PM, Scott Carey wrote:
> There is a .. 'feature' (or bug, but with no resolution) in FLATTEN that I
> have run into, as have others on this list.
>
> Pig will usually flatten nested elements multiple levels deep rather than
> just one. Your problem is the bag with a tuple in it; it's probably doing
> extra layers of unpacking at execution time -- one FLATTEN takes the bag of a
> tuple and converts it directly into the two inner fields.
>
> The planner and the execution engine have different ideas of what FLATTEN
> does when you nest things like that. For example, a Bag with a Tuple and a
> Bag in it, when flattened, tends to unpack the inner tuple too.
>
> I saw most of that behavior on 0.5 and 0.6. I haven't tried again on 0.7.
> But I think it would be wise for Pig to consider making an operator that
> unpacks tuples and does NOT touch bags, and one that does the reverse.
>
>
> On Jul 8, 2010, at 2:56 AM, Sparsh Gupta wrote:
>
>> Hello
>>
>> I am working on a dataset which has relations of the type:
>>
>> data: {a: (a1: chararray,a2_bag: {a2_tuple: (a21: chararray,a22:
>> chararray)}, a3_bag: {a3_tuple: (a3: long)})}
>>
>> This means that each data row will have one 'a1' field and an 'a2_bag' bag,
>> which can hold n 'a2_tuple' tuples, each having 'a21' and 'a22' fields. It
>> also has another bag, 'a3_bag', with m 'a3_tuple' tuples, each having an
>> 'a3' field.
>>
>> I want to get rid of all the bags and have all the data flattened into the
>> format (of course creating multiple rows of dataNew from each row of data):
>>
>> dataNew: {a21:chararray , a22: chararray, a3:long}
>>
>> I tried using FLATTEN on 'a2_bag' and 'a3_bag' to get
>> temp: {a2_bag::a2_tuple(a21:chararray, a22:chararray) ,
>> a3_bag::a3_tuple(a3:long)}
>>
>> then I FLATTEN it again as
>> temp1 = FOREACH temp GENERATE FLATTEN(a2_bag::a2_tuple) AS (a21:chararray,
>> a22:chararray), FLATTEN(a3_bag::a3_tuple) AS (a3:long);
>>
>> When I describe temp1, I get the desired structure, but when I try to
>> execute it (dump it, say), I get an error: cannot convert String to Tuple.
>>
>> Please let me know if I am wrong somewhere (well, I am) and what's the best
>> way to solve this problem.
>>
>> P.S. I am using Pig 0.6 and use elephant-bird to get data out of HBase and
>> use Twitter's code to get protocol buffered data into Pig-readable format.
>>
>> Thanks
>> Sparsh Gupta
>