Currently, COUNT on a bag ignores tuples whose first field is null (this stems from the fact that COUNT of a column counts only non-null values, for SQL compatibility). You may want to try using COUNT_STAR instead. This behavior is currently being reconsidered: https://issues.apache.org/jira/browse/PIG-1014 (please provide input!)
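
As a rough, untested sketch, this is the script from the quoted message below with COUNT swapped for COUNT_STAR and nothing else changed. It assumes your Pig release includes the COUNT_STAR builtin and that the input is still the 'events' file described in the original message.

```pig
-- Load the comma-separated events exactly as in the original script.
events = load 'events' using PigStorage(',') AS (sessionid:chararray, jobid:chararray, user:chararray);

-- Group by user; each group carries a bag of the matching tuples.
user_events = group events by user;

-- COUNT_STAR counts every tuple in the bag, including those whose
-- first field (sessionid) is null, which COUNT would skip.
event_count_by_user = foreach user_events generate group, COUNT_STAR(events);
dump event_count_by_user;
```

With this change the tuples for juug and lfou, whose sessionid is null, should be included in the count rather than dropped.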
-Dmitriy

On Thu, Oct 15, 2009 at 8:51 AM, Vincent BARAT <[email protected]> wrote:
> Hello,
>
> I'm not sure if it's a bug, but the handling of NULL fields seems not to
> work correctly:
>
> My data (events):
>
> 0,,jawi
> ,0,juug
> ,,lfou
> 0,0,caro
>
> My script:
>
> events = load 'events' using PigStorage(',') AS (sessionid:chararray,
> jobid:chararray, user:chararray);
> user_events = group events by user;
> dump user_events;
> event_count_by_user = foreach user_events generate group, COUNT(events);
> dump event_count_by_user;
>
> The results:
>
> user_events (correct):
> (caro,{(0,0,caro)})
> (jawi,{(0,,jawi)})
> (juug,{(,0,juug)})
> (lfou,{(,,lfou)})
>
> event_count_by_user (incorrect):
> (caro,1L)
> (jawi,1L)
> (juug,0L)
> (lfou,0L)
>
> event_count_by_user should be:
>
> (caro,1L)
> (jawi,1L)
> (juug,1L)
> (lfou,1L)
>
> It seems that tuples starting with (, are not counted correctly.
>
> Any suggestion?
>
> Thanks a lot
>
