Let's say I have this dataset:

1,undefined,text1
1,,text2
1,event1,text3
1,undefined,text4
1,event2,text5
1,event3,text6
I would like to group by the 1st value, but not with an ordinary grouping. I would like every line that has either an empty value or 'undefined' in the 2nd position to be rolled up into the next line that has a proper value in the 2nd position. So basically I'd like to obtain this relation:

(1,event1,3)
(1,event2,2)
(1,event3,1)

Here the 3rd value is the number of lines rolled up into each proper 'event' line, counting the event line itself.

Is this possible with Pig?

Thanks!

Grig
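P.S. In case the description is ambiguous, here is a rough sketch of the roll-up I mean, written in plain Python rather than Pig. The function name `roll_up` and the tuple layout are just made up for illustration. It assumes the lines are processed in the order shown above, and that trailing lines with no proper event after them are simply dropped (that last part is my guess at reasonable behavior, not a requirement).

```python
def roll_up(rows):
    """Roll 'empty'/'undefined' lines into the next proper event line.

    rows: iterable of (key, event, text) tuples, in input order.
    Returns a list of (key, event, count) tuples, where count includes
    the event line itself plus the placeholder lines seen before it.
    """
    pending = {}  # key -> lines seen since the last proper event for that key
    result = []
    for key, event, _text in rows:
        pending[key] = pending.get(key, 0) + 1
        if event not in ("", "undefined"):
            # A proper event closes the current run for this key.
            result.append((key, event, pending[key]))
            pending[key] = 0
    # Assumption: lines left pending at the end (no event follows) are dropped.
    return result


rows = [
    ("1", "undefined", "text1"),
    ("1", "", "text2"),
    ("1", "event1", "text3"),
    ("1", "undefined", "text4"),
    ("1", "event2", "text5"),
    ("1", "event3", "text6"),
]

for row in roll_up(rows):
    print(row)
```

What I'm after is the Pig equivalent of this, ideally without pulling the whole group into memory.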
