Re: COUNT, AVG and nulls
+1 (count shouldn't count null)

Yiping Han wrote:
> +1.
>
> --Yiping
Re: COUNT, AVG and nulls
+1.

--Yiping

On 7/6/09 10:58 AM, "Dmitriy Ryaboy" wrote:
> +1 for standard semantics.
>
> We need a COALESCE function to go along with this.
>
> -D

--
Yiping Han
F-3140
(408)349-4403
y...@yahoo-inc.com
Re: COUNT, AVG and nulls
+1 for standard semantics.

We need a COALESCE function to go along with this.

-D

On Mon, Jul 6, 2009 at 10:46 AM, Olga Natkovich wrote:
> Hi,
>
> The current implementation of COUNT and AVG in Pig counts null values.
> This is inconsistent with SQL semantics and also with the semantics of other
> aggregate functions such as SUM, MIN, and MAX. Originally we chose this
> implementation for performance reasons; however, we re-implemented both
> functions to support a multi-step combiner, and now the cost of checking
> for null when the combiner is invoked is trivial. (I ran some tests
> with COUNT and they showed no performance difference.) We will pay a
> penalty in the non-combinable case, including local mode, but I think it
> is worth the price to have consistent semantics. Also, as we are working
> on SQL support, having SQL-compliant semantics becomes very desirable.
>
> Please let us know if you have any concerns. I am planning to make the
> change later this week.
>
> Olga
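The proposed semantics can be sketched as follows, assuming a three-phase (initial / intermediate / final) aggregation like the multi-step combiner described above. This is a Python illustration of the idea, not Pig's Java implementation: all function names are hypothetical, and `coalesce` models Dmitriy's COALESCE as in SQL (return the first non-null argument).

```python
from typing import Iterable, Optional, Tuple

# Hypothetical sketch of null-skipping COUNT and AVG split into
# combiner-style phases. Not Pig's actual code.

def count_initial(values: Iterable[Optional[float]]) -> int:
    """Initial phase: count only the non-null values in one chunk."""
    return sum(1 for v in values if v is not None)

def count_merge(partials: Iterable[int]) -> int:
    """Intermediate/final phase: partial counts are already null-free."""
    return sum(partials)

def avg_initial(values: Iterable[Optional[float]]) -> Tuple[float, int]:
    """Initial phase: (sum, count) over the non-null values in one chunk."""
    total, n = 0.0, 0
    for v in values:
        if v is not None:
            total += v
            n += 1
    return total, n

def avg_merge(partials: Iterable[Tuple[float, int]]) -> Tuple[float, int]:
    """Intermediate phase: add up partial (sum, count) pairs; may run repeatedly."""
    total, n = 0.0, 0
    for s, c in partials:
        total += s
        n += c
    return total, n

def avg_final(partials: Iterable[Tuple[float, int]]) -> Optional[float]:
    """Final phase: divide; AVG over no non-null values is null, as in SQL."""
    total, n = avg_merge(partials)
    return total / n if n > 0 else None

def coalesce(*args):
    """Return the first non-null argument, or None if every argument is null."""
    for a in args:
        if a is not None:
            return a
    return None

if __name__ == "__main__":
    chunks = [[1.0, None, 3.0], [None, None], [5.0]]
    print(count_merge(count_initial(c) for c in chunks))  # 3
    print(avg_final([avg_initial(c) for c in chunks]))    # 3.0
    # Substituting 0 for null before counting restores count-everything behavior.
    print(count_merge(count_initial([coalesce(v, 0.0) for v in c]) for c in chunks))  # 6
```

In this sketch the null check lives only in the initial phase; every later phase merges partial results that are already null-free, so the intermediate step can run any number of times without re-checking.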