Hi,

The current implementation of COUNT and AVG in Pig counts null values.
This is inconsistent with SQL semantics and also with the semantics of
other aggregate functions such as SUM, MIN, and MAX. Originally we chose
this implementation for performance reasons; however, we have since
re-implemented both functions to support the multi-step combiner, and now
the cost of checking for nulls when the combiner is invoked is trivial. (I
ran some tests with COUNT and they showed no performance difference.) We
will pay a penalty in the non-combinable case, including local mode, but I
think it is worth the price to have consistent semantics. Also, as we are
working on SQL support, having SQL-compliant semantics becomes very
desirable.
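
To make the difference concrete, here is a minimal Java sketch of the two
semantics applied to the same values. It is a hypothetical illustration,
not Pig's actual COUNT/AVG code, and the class and method names are made
up. It also assumes that the current AVG adds only non-null values to the
sum but divides by the total number of values, nulls included.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not Pig's COUNT/AVG implementation.
public class NullAwareAggregates {

    // Current behavior as described: every value is counted, nulls included.
    static long countAll(List<Integer> values) {
        return values.size();
    }

    // SQL semantics: COUNT(expr) ignores null values.
    static long countNonNull(List<Integer> values) {
        long n = 0;
        for (Integer v : values) {
            if (v != null) {
                n++;
            }
        }
        return n;
    }

    // Assumed current AVG: nulls add nothing to the sum but still add to the divisor.
    static Double avgCountingNulls(List<Integer> values) {
        if (values.isEmpty()) {
            return null;
        }
        long sum = 0;
        for (Integer v : values) {
            if (v != null) {
                sum += v;
            }
        }
        return (double) sum / values.size();
    }

    // SQL semantics: AVG over non-null values only; null when none remain.
    static Double avgNonNull(List<Integer> values) {
        long sum = 0;
        long n = 0;
        for (Integer v : values) {
            if (v != null) {
                sum += v;
                n++;
            }
        }
        return n == 0 ? null : (double) sum / n;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(10, null, 20, null);
        System.out.println(countAll(values));         // 4
        System.out.println(countNonNull(values));     // 2
        System.out.println(avgCountingNulls(values)); // 7.5
        System.out.println(avgNonNull(values));       // 15.0
    }
}
```

With the SQL semantics, COUNT and AVG agree with SUM, which already skips
nulls: the sum is 30 over 2 non-null values, so the average is 15.0 rather
than 7.5.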

Please let us know if you have any concerns. I am planning to make the
change later this week.

Olga

Reply via email to