That's actually the documented behavior: https://pig.apache.org/docs/r0.10.0/func.html#count
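To make the quoted loop concrete, here is a minimal Java sketch that models the counting rule outside Pig. It is not Pig's API. Tuples are modeled as List<Object>, a Java null stands in for a Pig null, and the class and method names are hypothetical. It assumes, as the output suggests, that PigStorage loads the empty platform field as null. countStar models COUNT_STAR, the builtin the linked docs point to when nulls should be counted. It shows why taobao2 gets PV 0 but UV 1: COUNT(RR) checks the first field (platform, null), while DISTINCT RR.machineID yields single-field tuples whose first field is the non-null machineID.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-alone model of COUNT vs COUNT_STAR; not Pig's actual classes. */
public class CountSemanticsDemo {

    // Mirrors the quoted COUNT loop: a tuple counts only if its first field is non-null.
    static long count(List<List<Object>> bag) {
        long cnt = 0;
        for (List<Object> t : bag) {
            if (t != null && !t.isEmpty() && t.get(0) != null) {
                cnt++;
            }
        }
        return cnt;
    }

    // Models COUNT_STAR: every tuple counts, regardless of null fields.
    static long countStar(List<List<Object>> bag) {
        return bag.size();
    }

    public static void main(String[] args) {
        // Group taobao2 of RR: (platform, machineID, productID), platform loaded as null.
        List<List<Object>> taobao2 = Arrays.asList(
                Arrays.<Object>asList(null, "u2", "taobao2"));
        // DISTINCT RR.machineID for taobao2: single-field tuples, first field is machineID.
        List<List<Object>> itemUv = Arrays.asList(
                Arrays.<Object>asList("u2"));

        System.out.println("UV = " + count(itemUv));                    // 1
        System.out.println("PV (COUNT) = " + count(taobao2));           // 0
        System.out.println("PV (COUNT_STAR) = " + countStar(taobao2));  // 1
    }
}
```

In the Pig script itself, the corresponding change would be to compute PV with COUNT_STAR(RR) instead of COUNT(RR).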
There was some discussion about changing this: https://issues.apache.org/jira/browse/PIG-1014
Patches gratefully accepted.

D

On Sat, Sep 14, 2013 at 12:01 AM, centerqi hu <[email protected]> wrote:
> The sample.txt file content:
>
> android,u1,taobao1
> android,u1,taobao1
> ,u2,taobao2
>
> RR = LOAD '/user/www/udc/output/bugfind/sample.txt' USING PigStorage(',')
>     as (platform, machineID, productID);
> RB = GROUP RR BY (productID);
> RES = FOREACH RB {
>     ITEMUV = DISTINCT RR.machineID;
>     GENERATE flatten(group), COUNT(ITEMUV) AS UV, COUNT(RR) AS PV;
> };
> DUMP RES;
>
> OUTPUT:
>
> (taobao1,1,2)
> (taobao2,1,0)
>
> Why is PV 0 for taobao2 while UV is 1?
>
> I viewed the source code of the COUNT function. If the first column is
> null, cnt is not incremented:
>
>     while (it.hasNext()) {
>         Tuple t = (Tuple) it.next();
>         if (t != null && t.size() > 0 && t.get(0) != null)
>             cnt++;
>     }
>
> --
> [email protected]|齐忠
