i take it all back
 generate group as uid,
  flatten((IsEmpty(fil_height) ? {('')} : fil_height.value)) as height;

does work
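
For anyone reading this later, here is a minimal sketch of how that generate clause might sit inside a complete script. It assumes the same three-column input ('blah', with uid, key, value) as the original question below, plus rows whose key is 'height'. The aliases kv, by_uid and pivoted are made up for illustration. Note that the {('')} fallback produces an empty string, not a null, for a uid with no height row.

```pig
-- Sketch only: the file name and schema come from the original question;
-- the 'height' key and the aliases kv / by_uid / pivoted are assumptions.
kv = load 'blah' as (uid:chararray, key:chararray, value:chararray);
by_uid = group kv by uid;
pivoted = foreach by_uid {
    -- keep only this uid's 'height' rows; the bag is empty if there are none
    fil_height = filter kv by key == 'height';
    generate group as uid,
      -- swap an empty bag for a one-tuple bag so flatten does not drop the row
      flatten((IsEmpty(fil_height) ? {('')} : fil_height.value)) as height;
}
dump pivoted;
```

If a null is preferred over an empty string for a missing key, the MAX approach quoted below is probably the simpler route.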

thanks for the help
mat

On 11 July 2011 15:44, Mat Kelcey <[email protected]> wrote:

> Thanks Thejas,
> I was using Pig 0.9 (last night's trunk) and couldn't get the bincond +
> flatten combo to work...
> I'll reproduce tonight (if I get time) and reply with the exact messages...
> Cheers,
> Mat
>
> On 11 July 2011 12:21, Thejas Nair <[email protected]> wrote:
>
>> The nested-foreach statement is your friend!
>>
>> l = load 'b.pig' as (uid:chararray, key:chararray, value:chararray);
>> g = group l by uid;
>> f = foreach g {
>>            fil_age = filter l by key == 'age';
>>            fil_colour = filter l by key == 'colour' ;
>>            fil_food = filter l by key == 'food';
>>
>>    generate group as uid,
>>                   MAX(fil_age.value) as age,
>>                   MAX(fil_colour.value) as colour,
>>                   MAX(fil_food.value) as food;
>> }
>>
>> I have used Jacob's idea of using MAX; I think that's cleaner than
>> flatten + bincond for this use case.
>>
>> The flatten + bincond syntax in your example should work in 0.9, which has
>> some fixes for schema merging issues.
>>
>> -Thejas
>>
>>
>>
>>
>> On 7/10/11 10:47 PM, Mat Kelcey wrote:
>>
>>> hi,
>>>
>>> i've got a pretty simple transform of data i need to do and i can't for the
>>> life of me work it out.
>>> i feel like i'm missing something trivial...
>>>
>>> i want to go from this...
>>> person key    value
>>> bob    age    25
>>> bob    colour red
>>> fred   age    30
>>> fred   food   bagels
>>>
>>> to this...
>>> person age colour food
>>> bob    25  red    null
>>> fred   30  null   bagels
>>>
>>> here's the best i can do....
>>>
>>>  data = load 'blah' as (uid:chararray, key:chararray, value:chararray);
>>> -- data: {uid: chararray,key: chararray,value: chararray}
>>> (bob,age,25)
>>> (bob,colour,red)
>>> (fred,age,30)
>>> (fred,food,bagels)
>>>
>>>  split data into
>>>     by_age    if key=='age',
>>>     by_colour if key=='colour',
>>>     by_food   if key=='food';
>>>
>>>  cogrouped = cogroup by_age by uid, by_colour by uid, by_food by uid;
>>> -- cogrouped: {group: chararray,by_age: {(uid: chararray,key:
>>> chararray,value: chararray)},by_colour: {(uid: chararray,key:
>>> chararray,value: chararray)},by_food: {(uid: chararray,key:
>>> chararray,value:
>>> chararray)}}
>>> (bob,{(bob,age,25)},{(bob,colour,red)},{})
>>> (fred,{(fred,age,30)},{},{(fred,food,bagels)})
>>>
>>>  flattened = foreach cogrouped generate group as uid, by_age.value as age,
>>>    by_colour.value as colour, by_food.value as food;
>>> -- flattened: {uid: chararray,age: {(value: chararray)},colour: {(value:
>>> chararray)},food: {(value: chararray)}}
>>> (bob,{(25)},{(red)},{})
>>> (fred,{(30)},{},{(bagels)})
>>>
>>> any attempt to call flatten on the bags, eg
>>>
>>>  flattened = foreach cogrouped generate group as uid,
>>>    flatten(by_food.value) as food;
>>>
>>> and i lose the entries that had an empty bag for food (eg bob in this case)
>>>
>>> i've got a feeling isempty might get me somewhere and
>>>
>>>  flattened = foreach cogrouped generate
>>>    group as uid,
>>>    (IsEmpty(by_food.value) ? 0 : 1);
>>> (bob,0)
>>> (fred,1)
>>>
>>> but any attempt to use a real value in there fails, i can't get the syntax
>>> correct.
>>>
>>>  flattened = foreach cogrouped generate
>>>        group as uid,
>>>        (IsEmpty(by_food.value) ? {} : by_food.value);
>>>
>>> not sure how to define an empty bag for the left hand side of the bincond?
>>>
>>> i must be missing something fundamental somewhere.
>>> help me obi-wan kenobi, you're my only hope.
>>>
>>> cheers,
>>> mat
>>>
>>>
>>
>
