Hello,
I'm not sure whether this is a bug, but NULL fields don't seem to be
handled correctly:
My data (events):
0,,jawi
,0,juug
,,lfou
0,0,caro
My script:
events = load 'events' using PigStorage(',') AS (sessionid:chararray,
jobid:chararray, user:chararray);
user_events = group events by user;
dump user_events;
event_count_by_user = foreach user_events generate group, COUNT(events);
dump event_count_by_user;
The results:
user_events (correct):
(caro,{(0,0,caro)})
(jawi,{(0,,jawi)})
(juug,{(,0,juug)})
(lfou,{(,,lfou)})
event_count_by_user (incorrect):
(caro,1L)
(jawi,1L)
(juug,0L)
(lfou,0L)
event_count_by_user should be:
(caro,1L)
(jawi,1L)
(juug,1L)
(lfou,1L)
It seems that tuples whose first field (sessionid) is empty, i.e. those printed as (,...), are not counted.
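To make the pattern explicit, here is a small Python sketch (not Pig) that replays the four input lines. It rests on two assumptions: an empty field is loaded as NULL (None here), as the output suggests, and a tuple is counted only when its first field is non-NULL. Under those assumptions it reproduces the observed counts, while counting every tuple gives the expected ones. The helper names are hypothetical.

```python
from collections import defaultdict

# The four input lines from the 'events' file above.
RAW = ["0,,jawi", ",0,juug", ",,lfou", "0,0,caro"]

def parse(line):
    # Assumption: an empty field between ',' delimiters becomes NULL (None).
    return tuple(f if f != "" else None for f in line.split(","))

def group_by_user(rows):
    # Group tuples by the third field (user), like 'group events by user'.
    groups = defaultdict(list)
    for row in rows:
        groups[row[2]].append(row)
    return dict(groups)

def count_first_field_non_null(bag):
    # Hypothesis: a tuple is counted only if its first field is not NULL.
    return sum(1 for t in bag if t[0] is not None)

def count_all(bag):
    # What the post expects: every tuple in the bag is counted.
    return len(bag)

groups = group_by_user(parse(line) for line in RAW)
observed = {u: count_first_field_non_null(b) for u, b in sorted(groups.items())}
expected = {u: count_all(b) for u, b in sorted(groups.items())}
print(observed)
print(expected)
```

The only difference between the two results is whether a NULL in the first field excludes the tuple, which matches the rows for juug and lfou.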
Any suggestions?
Thanks a lot