hi,
i've got a pretty simple transform of data i need to do and i can't for the
life of me work it out.
i feel like i'm missing something trivial...
i want to go from this...
person  key     value
bob     age     25
bob     colour  red
fred    age     30
fred    food    bagels
to this...
person  age  colour  food
bob     25   red     null
fred    30   null    bagels
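to be precise about what i'm after, here's the same pivot as a small python sketch. this is plain python, not pig, and the function name `pivot` and the hard-coded column list are just my own illustration. missing keys come out as None, which stands in for null.

```python
def pivot(rows, columns):
    """Turn (person, key, value) rows into one wide row per person."""
    wide = {}
    for person, key, value in rows:
        # start every person with all columns set to None (null)
        wide.setdefault(person, dict.fromkeys(columns))[key] = value
    # emit one tuple per person, in first-seen order: (person, col1, col2, ...)
    return [(person, *(vals[c] for c in columns)) for person, vals in wide.items()]


rows = [
    ("bob", "age", "25"),
    ("bob", "colour", "red"),
    ("fred", "age", "30"),
    ("fred", "food", "bagels"),
]
result = pivot(rows, ["age", "colour", "food"])
```

so `result` should hold one tuple for bob and one for fred, each with a None in the column that person has no value for.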
here's the best i can do....
data = load 'blah' as (uid:chararray, key:chararray, value:chararray);
-- data: {uid: chararray,key: chararray,value: chararray}
(bob,age,25)
(bob,colour,red)
(fred,age,30)
(fred,food,bagels)
split data into
by_age if key=='age',
by_colour if key=='colour',
by_food if key=='food';
cogrouped = cogroup by_age by uid, by_colour by uid, by_food by uid;
-- cogrouped: {group: chararray,
--   by_age: {(uid: chararray,key: chararray,value: chararray)},
--   by_colour: {(uid: chararray,key: chararray,value: chararray)},
--   by_food: {(uid: chararray,key: chararray,value: chararray)}}
(bob,{(bob,age,25)},{(bob,colour,red)},{})
(fred,{(fred,age,30)},{},{(fred,food,bagels)})
flattened = foreach cogrouped generate group as uid, by_age.value as age,
by_colour.value as colour, by_food.value as food;
-- flattened: {uid: chararray,age: {(value: chararray)},
--   colour: {(value: chararray)},food: {(value: chararray)}}
(bob,{(25)},{(red)},{})
(fred,{(30)},{},{(bagels)})
any attempt to call flatten on the bags, eg
flattened = foreach cogrouped generate group as uid,
flatten(by_food.value) as food;
and i lose the entries that had an empty bag for food (eg bob in this case)
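my understanding of why that happens, modelled in plain python rather than pig: flattening a bag seems to cross its tuples with the rest of the row, so an empty bag yields zero rows. this is my assumption about the behaviour, and `flatten_row` is a made-up helper, not anything from pig.

```python
from itertools import product


def flatten_row(uid, *bags):
    """Model flattening the bags in one row as a cross product of their items."""
    # product() over an empty bag produces no combinations, so the row vanishes
    return [(uid, *combo) for combo in product(*bags)]


bob = flatten_row("bob", [])            # bob has an empty food bag
fred = flatten_row("fred", ["bagels"])  # fred has one food value
```

in this model bob produces no rows at all while fred keeps his, which matches what i'm seeing.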
i've got a feeling IsEmpty might get me somewhere, and this much works:
flattened = foreach cogrouped generate
group as uid,
(IsEmpty(by_food.value) ? 0 : 1);
(bob,0)
(fred,1)
but any attempt to use a real value in there fails; i can't get the syntax
right.
flattened = foreach cogrouped generate
group as uid,
(IsEmpty(by_food.value) ? {} : by_food.value);
how do i define an empty bag for the left hand side of the bincond?
i must be missing something fundamental somewhere.
help me obi-wan kenobi, you're my only hope.
cheers,
mat