Use COUNT_STAR if you want to count all rows. COUNT skips any tuple whose first field is null, which is why the two empty keys are not counted. It's one of those "nulls are really strange beasts" things.
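
For comparison, here is a minimal sketch that computes both counts side by side on the test.csv from the quoted message. The alias names (rows, grouped, counts) and the two output field names are only illustrative, and I have not run this against your setup.

```pig
-- Load the same two-column CSV as in the quoted script; empty keys load as null.
rows = LOAD 'test.csv' USING PigStorage(',') AS (key:chararray, value:chararray);

-- Group by the second column, as in the original script.
grouped = GROUP rows BY value;

-- COUNT ignores tuples whose first field is null; COUNT_STAR counts every tuple.
counts = FOREACH grouped GENERATE
    group,
    COUNT(rows) AS non_null_first_field,
    COUNT_STAR(rows) AS all_rows;

DUMP counts;
```

With the sample file, group a should come out as 2 from COUNT and 4 from COUNT_STAR, and group b as 3 from both.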

On Mar 8, 2012, at 9:02 AM, Bill Graham <[email protected]> wrote:

> The issue here is that COUNT will increment a +1 for all tuples in the bag
> where the item at the first position is not null.
>
> I've found this behavior to be strange as well though, so I'd like to hear
> others take on why this is a feature and not a bug (if in fact that's the
> case).
>
> On Thu, Mar 8, 2012 at 8:55 AM, Kevin Lion <[email protected]> wrote:
>
>> Hello,
>>
>> I think there is a bug in PIG when using COUNT on Bag of Tuple with empty
>> element. Here is a minimal script to reproduce this bug :
>>
>> I've this CSV file :
>> ,a
>> 1,a
>> 2,a
>> ,a
>> 3,b
>> 4,b
>> 5,b
>>
>> I use that script :
>> test = LOAD 'test.csv' USING org.apache.pig.builtin.PigStorage(',') AS
>> (key:chararray, value:chararray);
>> test = GROUP test BY value;
>> DUMP test;
>> test = FOREACH test GENERATE group, COUNT(test);
>> DUMP test;
>>
>> And the output is :
>> (a,{(,a),(1,a),(2,a),(,a)})
>> (b,{(3,b),(4,b),(5,b)})
>> (a,2)
>> (b,3)
>>
>> Does it seem to be normal ? I was expecting to :
>> (a,{(,a),(1,a),(2,a),(,a)})
>> (b,{(3,b),(4,b),(5,b)})
>> (a,*4*)
>> (b,3)
>>
>> Regards,
>>
>> Kevin Lion
>> Capptain.com - Pilot your Apps
>>
>
> --
> *Note that I'm no longer using my Yahoo! email address. Please email me at
> [email protected] going forward.*
