Re: COUNT, AVG and nulls

2009-07-07 Thread Amr Awadallah

+1 (count shouldn't count null)

Yiping Han wrote:

> +1.
>
> --Yiping
>
> On 7/6/09 10:58 AM, "Dmitriy Ryaboy" wrote:
>
>> +1 for standard semantics.
>>
>> We need a COALESCE function to go along with this.
>>
>> -D
>>
>> On Mon, Jul 6, 2009 at 10:46 AM, Olga Natkovich wrote:
>>
>>> Hi,
>>>
>>> The current implementation of COUNT and AVG in Pig counts null values.
>>> This is inconsistent with SQL semantics and also with the semantics of
>>> other aggregate functions such as SUM, MIN, and MAX. Originally we chose
>>> this implementation for performance reasons; however, we re-implemented
>>> both functions to support the multi-step combiner, and now the cost of
>>> checking for null when the combiner is invoked is trivial. (I ran some
>>> tests with COUNT and they showed no performance difference.) We will pay
>>> a penalty for the non-combinable case, including local mode, but I think
>>> it is worth the price to have consistent semantics. Also, as we are
>>> working on SQL support, having SQL-compliant semantics becomes very
>>> desirable.
>>>
>>> Please let us know if you have any concerns. I am planning to make the
>>> change later this week.
>>>
>>> Olga
>
> --
> Yiping Han
> F-3140
> (408)349-4403
> y...@yahoo-inc.com
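
For readers following the thread, the semantics being voted on can be illustrated with a small sketch. This is plain Python, not Pig code; the helper names `count_non_null`, `avg_non_null`, and `coalesce` are hypothetical and chosen only for this illustration. It assumes the SQL convention that COUNT(col) and AVG(col) skip nulls, that AVG over no non-null values yields null, and that COALESCE returns its first non-null argument.

```python
from typing import Iterable, Optional


def count_non_null(values: Iterable[Optional[float]]) -> int:
    """Count only the non-null entries, as SQL's COUNT(col) does."""
    return sum(1 for v in values if v is not None)


def avg_non_null(values: Iterable[Optional[float]]) -> Optional[float]:
    """Average the non-null entries; return None if there are none."""
    total = 0.0
    n = 0
    for v in values:
        if v is not None:
            total += v
            n += 1
    return total / n if n else None


def coalesce(*args):
    """Return the first argument that is not None, or None if all are."""
    for a in args:
        if a is not None:
            return a
    return None


ages = [20, None, 40]

print(count_non_null(ages))                          # nulls are skipped
print(avg_non_null(ages))                            # averaged over 2 values
print(avg_non_null(coalesce(a, 0) for a in ages))    # null treated as 0
```

The last line shows why a COALESCE-style function pairs naturally with the change: a caller who does want nulls to contribute to the aggregate can substitute a default value explicitly instead of relying on the aggregate to count them.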

  


Re: COUNT, AVG and nulls

2009-07-06 Thread Yiping Han

+1.

--Yiping



--
Yiping Han
F-3140 
(408)349-4403
y...@yahoo-inc.com



Re: COUNT, AVG and nulls

2009-07-06 Thread Dmitriy Ryaboy

+1 for standard semantics.

We need a COALESCE function to go along with this.

-D
