As a work-around, put some other non-tuple field inside the bag if you can. 

The logic seems to be something like:
if it's a bag, then look inside it. For each item inside:
  if there is exactly one field inside, and it is a tuple, flatten that too;
recurse until what is inside is not a tuple or has more than one field.
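
To make that guess concrete, here is a small Python model of the behavior
described above. It is only a sketch of what seems to happen at execution
time, not Pig's actual code: bags are modeled as lists, Pig tuples as Python
tuples, and the names unwrap and flatten_bag are made up for illustration.

```python
def unwrap(item):
    """Peel off single-field tuples whose only field is itself a tuple.

    Hypothetical helper modeling the extra unpacking; not a Pig function.
    """
    while isinstance(item, tuple) and len(item) == 1 and isinstance(item[0], tuple):
        item = item[0]
    return item


def flatten_bag(bag):
    """Model the apparent runtime result of FLATTEN on a bag (a list of tuples)."""
    rows = []
    for item in bag:
        item = unwrap(item)
        # A tuple contributes its fields as columns; anything else is one column.
        rows.append(item if isinstance(item, tuple) else (item,))
    return rows


# A bag whose tuples hold exactly one field, itself a tuple: both layers go.
nested = [(("x", "y"),), (("p", "q"),)]
print(flatten_bag(nested))

# The work-around: a second, non-tuple field stops the extra unpacking.
padded = [(("x", "y"), 1), (("p", "q"), 2)]
print(flatten_bag(padded))
```

In this model the first bag comes out as rows of two plain fields, so a second
FLATTEN would be handed a string where it expects a tuple. The second bag
keeps its inner tuples intact because each item has more than one field.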

 
On Jul 9, 2010, at 3:50 PM, Scott Carey wrote:

> There is a .. 'feature' (or bug, but no resolution) with FLATTEN that I have 
> run into, and others on this list have as well.
> 
> Pig will usually flatten nested elements multiple times rather than just 
> once.  Your problem is the bag with a tuple in it; it's probably doing 
> extra layers of unpacking at execution time -- one FLATTEN takes the bag of a 
> tuple and converts it directly into the two inner fields. 
> 
> The planner and the execution engine have different ideas of what FLATTEN 
> does when you nest things like that.  For example, a Bag with a Tuple and a 
> Bag in it, when flattened, tends to unpack the inner tuple too.
> 
> I saw most of that behavior on 0.5 and 0.6.  I haven't tried again on 0.7.    
> But I think it would be wise for Pig to consider making an operator that 
> unpacks tuples and does NOT touch bags, and one that is vice-versa.
> 
> 
> On Jul 8, 2010, at 2:56 AM, Sparsh Gupta wrote:
> 
>> Hello
>> 
>> I am working on a dataset which has relations of the type:
>> 
>> data: {a: (a1: chararray,a2_bag: {a2_tuple: (a21: chararray,a22:
>> chararray)}, a3_bag: {a3_tuple: (a3: long)})}
>> 
>> What this means is that each data row will have one 'a1' field and an
>> 'a2_bag' bag which can have n 'a2_tuple' tuples, each having 'a21'
>> and 'a22' fields. It also has another bag 'a3_bag' with m
>> 'a3_tuple' tuples, each having one 'a3' field.
>> 
>> I want to get rid of all the bags and want all data flattened into the
>> format (of course creating multiple rows of dataNew from each row of data):
>> 
>> dataNew: {a21:chararray , a22: chararray, a3:long}
>> 
>> I tried using FLATTEN on 'a2_bag' and 'a3_bag'  to get
>> temp: {a2_bag::a2_tuple(a21:chararray, a22:chararray) ,
>> a3_bag::a3_tuple(a3:long)}
>> 
>> then I FLATTEN it again as
>> temp1 = FOREACH temp GENERATE FLATTEN(a2_bag::a2_tuple) AS (a21:chararray,
>> a22:chararray), FLATTEN(a3_bag::a3_tuple) AS (a3:long);
>> 
>> When I DESCRIBE temp1, I get the desired structure, but when I try to execute
>> it (DUMP it, say), I get an error saying it cannot convert String to Tuple.
>> 
>> Please let me know if I am wrong somewhere (well I am) and what's the best
>> way to solve this problem.
>> 
>> P.S. I am using Pig 0.6 and use elephant-bird to get data out from HBase and
>> use twitter's code to get protocol buffered data into pig readable format.
>> 
>> Thanks
>> Sparsh Gupta
> 
