The issue here is that COUNT only adds 1 for a tuple in the bag when the tuple's first field is not null. Tuples whose first field is null are skipped.
I've found this behavior strange as well, though, so I'd like to hear others' take on why this is a feature and not a bug (if in fact that's the case).

On Thu, Mar 8, 2012 at 8:55 AM, Kevin Lion <[email protected]> wrote:
> Hello,
>
> I think there is a bug in Pig when using COUNT on a bag of tuples with empty
> elements. Here is a minimal script to reproduce it:
>
> I have this CSV file:
> ,a
> 1,a
> 2,a
> ,a
> 3,b
> 4,b
> 5,b
>
> I use this script:
> test = LOAD 'test.csv' USING org.apache.pig.builtin.PigStorage(',') AS
> (key:chararray, value:chararray);
> test = GROUP test BY value;
> DUMP test;
> test = FOREACH test GENERATE group, COUNT(test);
> DUMP test;
>
> And the output is:
> (a,{(,a),(1,a),(2,a),(,a)})
> (b,{(3,b),(4,b),(5,b)})
> (a,2)
> (b,3)
>
> Does this seem normal? I was expecting:
> (a,{(,a),(1,a),(2,a),(,a)})
> (b,{(3,b),(4,b),(5,b)})
> (a,*4*)
> (b,3)
>
> Regards,
>
> Kevin Lion
> Capptain.com - Pilot your Apps

--
*Note that I'm no longer using my Yahoo! email address. Please email me at [email protected] going forward.*
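To make the rule concrete, here is a minimal Python sketch. It is not Pig itself but a simulation of the documented semantics. Pig's COUNT skips a tuple whose first field is null, while the separate COUNT_STAR builtin counts every tuple, nulls included. The empty first column in the CSV shows up as null in the bags above, and the sketch models that null as `None`. The helper names (`parse_csv_line`, `group_by_value`, `count_like_pig`, `count_star_like_pig`) are hypothetical and used only for illustration. Under these assumptions, replacing `COUNT(test)` with `COUNT_STAR(test)` in the script above should produce the expected `(a,4)`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (key, value); None stands in for a Pig null (assumption for this sketch)
Row = Tuple[Optional[str], str]


def parse_csv_line(line: str) -> Row:
    """Split on ',' and map an empty field to None, mirroring how the
    empty first column appears as null in the thread's DUMP output."""
    key, value = line.split(",", 1)
    return (key if key != "" else None, value)


def group_by_value(rows: Iterable[Row]) -> Dict[str, List[Row]]:
    """Rough equivalent of GROUP test BY value: one bag per value."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return dict(groups)


def count_like_pig(bag: List[Row]) -> int:
    # COUNT semantics: a tuple is skipped when its FIRST field is null.
    return sum(1 for t in bag if t[0] is not None)


def count_star_like_pig(bag: List[Row]) -> int:
    # COUNT_STAR semantics: every tuple is counted, nulls included.
    return len(bag)


data = [",a", "1,a", "2,a", ",a", "3,b", "4,b", "5,b"]
groups = group_by_value(parse_csv_line(line) for line in data)

for group, bag in sorted(groups.items()):
    print((group, count_like_pig(bag)), (group, count_star_like_pig(bag)))
```

Group `a` comes out as 2 under the COUNT rule and 4 under the COUNT_STAR rule. Group `b` has no null keys, so both give 3.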
