Let's say I have this dataset:

1,undefined,text1
1,,text2
1,event1,text3
1,undefined,text4
1,event2,text5
1,event3,text6

I would like to group by the 1st field, but not with an ordinary
grouping. Every line whose 2nd field is either empty or 'undefined'
should be rolled up into the next line that has a proper value in the
2nd field. So basically I'd like to obtain this relation:

(1,event1,3)
(1,event2,2)
(1,event3,1)

(where the 3rd value is the number of lines rolled up into that event:
the 'event' line itself plus the empty/'undefined' lines seen just
before it).
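
To make the intended semantics concrete, here is a minimal sketch of the roll-up in plain Python rather than Pig. The function name roll_up is made up for illustration, and the sketch assumes the lines are processed in their original order, which is what the counts depend on. It also assumes that trailing lines with no following 'event' line are simply dropped.

```python
import csv
from io import StringIO

# Values in the 2nd field that do not count as a proper event.
PLACEHOLDERS = ("", "undefined")

def roll_up(lines):
    """Return (key, event, count) tuples, one per proper event line.

    count is the number of lines seen for that key since the previous
    proper event, including the event line itself.
    """
    results = []
    pending = {}  # key -> lines seen since the last proper event
    for key, event, _text in csv.reader(lines):
        pending[key] = pending.get(key, 0) + 1
        if event not in PLACEHOLDERS:
            results.append((key, event, pending[key]))
            pending[key] = 0
    return results

data = """1,undefined,text1
1,,text2
1,event1,text3
1,undefined,text4
1,event2,text5
1,event3,text6
"""

print(roll_up(StringIO(data)))
# [('1', 'event1', 3), ('1', 'event2', 2), ('1', 'event3', 1)]
```

The result depends on line order, which a plain GROUP does not preserve, so I suspect a Pig version would need an explicit ordering field plus some per-group sequential logic.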

Is this possible with Pig?

Thanks!

Grig
