Hi,
The current implementation of COUNT and AVG in Pig counts null values. This is inconsistent with SQL semantics and also with the semantics of other aggregate functions such as SUM, MIN, and MAX.

Originally we chose this implementation for performance reasons. However, we have since re-implemented both functions to support the multi-step combiner, and now, when the combiner is invoked, the cost of checking for nulls is trivial. (I ran some tests with COUNT, and they showed no performance difference.) We will pay a penalty in the non-combinable case, including local mode, but I think it is worth the price to have consistent semantics. Also, as we are working on SQL support, having SQL-compliant semantics becomes very desirable.

Please let us know if you have any concerns. I am planning to make the change later this week.

Olga
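To make the proposed semantics concrete, here is a minimal Python sketch. It is not Pig's code. All function names are hypothetical, and `None` stands in for a null value. It splits COUNT and AVG into an initial step and a final step, mirroring a multi-step combiner. Nulls are skipped only in the initial step. The partial results passed to later steps are never null, so those steps need no null checks.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical illustration of SQL-style COUNT/AVG; not Pig's implementation.
# None stands in for a null value.


def count_initial(values: Iterable[Optional[float]]) -> int:
    """Initial step: count only the non-null values."""
    return sum(1 for v in values if v is not None)


def count_final(partials: Iterable[int]) -> int:
    """Final step: partial counts are never null, so just add them up."""
    return sum(partials)


def avg_initial(values: Iterable[Optional[float]]) -> Tuple[float, int]:
    """Initial step: (sum, count) over the non-null values only."""
    non_null = [v for v in values if v is not None]
    return (sum(non_null), len(non_null))


def avg_final(partials: Iterable[Tuple[float, int]]) -> Optional[float]:
    """Final step: combine the partial (sum, count) pairs.

    Returns None (null) when there were no non-null inputs, as SQL AVG does.
    """
    total, n = 0.0, 0
    for s, c in partials:
        total += s
        n += c
    return total / n if n else None


# Two "map-side" chunks of input, each containing a null.
chunks = [[1, None, 3], [None, 5]]
print(count_final(count_initial(c) for c in chunks))  # 3
print(avg_final(avg_initial(c) for c in chunks))      # 3.0
```

With nulls skipped, COUNT over the five inputs is 3 and AVG is 9 / 3 = 3.0. A version that counted nulls would report 5 and divide the sum by 5 instead.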