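The issue here is that COUNT only counts the tuples in the bag whose first
field is not null. PigStorage loads the empty key in each ",a" row as null,
so those two tuples are skipped and the count for group a comes out as 2.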

I've found this behavior strange as well, though, so I'd like to hear
others' take on why this is a feature and not a bug (if in fact that's the
case).
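If you just need the total number of rows per group, Pig also has a
COUNT_STAR builtin that counts every tuple in the bag, nulls included. Below
is a minimal sketch against the same test.csv, assuming a Pig release that
ships COUNT_STAR. The aliases grouped and counts and the field names
non_null_keys and all_rows are only illustrative:

```pig
-- Same input as the original script; PigStorage turns empty fields into nulls.
test = LOAD 'test.csv' USING org.apache.pig.builtin.PigStorage(',')
       AS (key:chararray, value:chararray);
grouped = GROUP test BY value;
-- COUNT skips tuples whose first field (key) is null;
-- COUNT_STAR counts every tuple in the bag.
counts = FOREACH grouped GENERATE group,
         COUNT(test) AS non_null_keys,
         COUNT_STAR(test) AS all_rows;
DUMP counts;
```

On the sample data this should give (a,2,4) and (b,3,3): the two rows with an
empty key are dropped by COUNT but kept by COUNT_STAR.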

On Thu, Mar 8, 2012 at 8:55 AM, Kevin Lion <[email protected]> wrote:

> Hello,
>
> I think there is a bug in Pig when using COUNT on a bag of tuples with empty
> elements. Here is a minimal script to reproduce it:
>
> I have this CSV file:
> ,a
> 1,a
> 2,a
> ,a
> 3,b
> 4,b
> 5,b
>
> I use this script:
> test = LOAD 'test.csv' USING org.apache.pig.builtin.PigStorage(',') AS
> (key:chararray, value:chararray);
> test = GROUP test BY value;
> DUMP test;
> test = FOREACH test GENERATE group, COUNT(test);
> DUMP test;
>
> And the output is:
> (a,{(,a),(1,a),(2,a),(,a)})
> (b,{(3,b),(4,b),(5,b)})
> (a,2)
> (b,3)
>
> Does this seem normal? I was expecting:
> (a,{(,a),(1,a),(2,a),(,a)})
> (b,{(3,b),(4,b),(5,b)})
> (a,*4*)
> (b,3)
>
> Regards,
>
> Kevin Lion
> Capptain.com - Pilot your Apps
>



-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
[email protected] going forward.*
