This behavior is discussed in the count docs: http://pig.apache.org/docs/r0.10.0/func.html#count
The COUNT function follows syntax semantics and ignores nulls. What this
means is that a tuple in the bag will not be counted if the FIRST FIELD in
this tuple is NULL. If you want to include NULL values in the count
computation, use COUNT_STAR
<http://pig.apache.org/docs/r0.10.0/func.html#COUNT-STAR>.

There is a proposal to change this though, which provides more context:
https://issues.apache.org/jira/browse/PIG-1014

On Tue, Feb 5, 2013 at 2:30 PM, Adair Kovac <[email protected]> wrote:

> Sorry, correcting an imprecision here--field1 is the first field of the
> records that have been grouped; that it's the first field in the key is
> nonessential. So basically *any* group/count that I have done in the past
> could have been dropping records because the first field happened to be
> something I didn't care about at the time that could be null. I am
> distressed by this realization.
>
> Thanks again,
>
> Adair
>
> On Tue, Feb 5, 2013 at 3:14 PM, Adair Kovac <[email protected]> wrote:
>
> > Hi, guys, was wondering what's going on with this.
> >
> > In pig 0.9 if I do something like this:
> >
> >     grouped = group data by (field1, field2);
> >     count = foreach grouped generate COUNT(data);
> >
> > That count is 0 wherever field1 is null regardless of what comes after.
> >
> > I can use COUNT_STAR() instead (data fresh from a group won't have any
> > null records, right?), but it seems like that should be the expected
> > behavior of COUNT().
> >
> > This was obviously intended behavior, since it's right there in the
> > function:
> >
> >     if (t != null && t.size() > 0 && t.get(0) != null )
> >         cnt++;
> >
> > but it just seems bizarre and inconvenient to me. Nor is it mentioned in
> > the documentation, unless the bit written for people who are good at SQL
> > implies it. Now I'm wondering which of my past scripts might be buggy
> > because I didn't expect this behavior.
> >
> > Anyone have an explanation?
> >
> > Thanks,
> >
> > Adair
> >
>

--
*Note that I'm no longer using my Yahoo! email address. Please email me at
[email protected] going forward.*
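
P.S. For anyone skimming the thread, here is a rough Python model of the two counting rules discussed above. It is not Pig code and not the actual implementation: tuples are modeled as Python tuples, NULL as None, and the names pig_count and pig_count_star are made up for illustration. The condition in pig_count mirrors the Java check quoted in the original message.

```python
# Hypothetical model of Pig's COUNT vs COUNT_STAR on a bag of tuples.
# NULL is modeled as None; function names are illustrative, not a Pig API.

def pig_count(bag):
    """Count tuples that are non-null, non-empty, and whose FIRST field is non-null."""
    return sum(
        1
        for t in bag
        if t is not None and len(t) > 0 and t[0] is not None
    )


def pig_count_star(bag):
    """Count every tuple in the bag, whatever its fields contain."""
    return len(bag)


# A group whose key has field1 == NULL: every record in it has a null first field.
null_group = [(None, "a", 1), (None, "a", 2)]
# A group whose key has a non-null field1.
normal_group = [("x", "a", 3)]

print(pig_count(null_group), pig_count_star(null_group))
print(pig_count(normal_group), pig_count_star(normal_group))
```

Under this model the two functions disagree only when the first field is null (or the tuple itself is null or empty), which is the case described in the original question.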
