Use COUNT_STAR to count all rows; COUNT skips tuples whose first field is
null. It's one of those "nulls are really strange beasts" things.
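
A minimal sketch of the difference, assuming the same test.csv and schema as
the script quoted below. The relation and alias names (rows, grouped, counts,
non_null_first_field, all_rows) are just illustrative:

```pig
-- Load the same two-column CSV used in the quoted script.
rows = LOAD 'test.csv' USING PigStorage(',') AS (key:chararray, value:chararray);

-- Group by the second column, keeping the full tuples in each bag.
grouped = GROUP rows BY value;

counts = FOREACH grouped GENERATE
    group,
    COUNT(rows) AS non_null_first_field,  -- skips tuples whose first field is null
    COUNT_STAR(rows) AS all_rows;         -- counts every tuple in the bag

DUMP counts;
```

For the "a" group, COUNT gives the 2 reported below, while COUNT_STAR should
count all four tuples, including the two with an empty key.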

On Mar 8, 2012, at 9:02 AM, Bill Graham <[email protected]> wrote:

> The issue here is that COUNT adds 1 for each tuple in the bag whose
> first field is not null.
> 
> I've found this behavior strange as well, though, so I'd like to hear
> others' take on why this is a feature and not a bug (if in fact that's the
> case).
> 
> On Thu, Mar 8, 2012 at 8:55 AM, Kevin Lion <[email protected]> wrote:
> 
>> Hello,
>> 
>> I think there is a bug in Pig when using COUNT on a bag of tuples with an
>> empty element. Here is a minimal script to reproduce it:
>> 
>> I have this CSV file:
>> ,a
>> 1,a
>> 2,a
>> ,a
>> 3,b
>> 4,b
>> 5,b
>> 
>> I use this script:
>> test = LOAD 'test.csv' USING org.apache.pig.builtin.PigStorage(',') AS
>> (key:chararray, value:chararray);
>> test = GROUP test BY value;
>> DUMP test;
>> test = FOREACH test GENERATE group, COUNT(test);
>> DUMP test;
>> 
>> And the output is:
>> (a,{(,a),(1,a),(2,a),(,a)})
>> (b,{(3,b),(4,b),(5,b)})
>> (a,2)
>> (b,3)
>> 
>> Does this seem normal? I was expecting:
>> (a,{(,a),(1,a),(2,a),(,a)})
>> (b,{(3,b),(4,b),(5,b)})
>> (a,*4*)
>> (b,3)
>> 
>> Regards,
>> 
>> Kevin Lion
>> Capptain.com - Pilot your Apps
>> 
> 
> 
> 
> -- 
> *Note that I'm no longer using my Yahoo! email address. Please email me at
> [email protected] going forward.*
