Contents of sample.txt:
android,u1,taobao1
android,u1,taobao1
,u2,taobao2
RR = LOAD '/user/www/udc/output/bugfind/sample.txt' USING PigStorage(',')
as (platform, machineID, productID);
RB = GROUP RR BY (productID);
RES = FOREACH RB {
    ITEMUV = DISTINCT RR.machineID;
    GENERATE flatten(group), COUNT(ITEMUV) AS UV, COUNT(RR) AS PV;
};
DUMP RES;
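As a rough illustration of the LOAD and GROUP steps, here is a small Java model (not Pig code; the class and method names are hypothetical). It assumes, as the PV of 0 for taobao2 suggests, that PigStorage(',') loads the empty platform field of ",u2,taobao2" as null, so the taobao2 group is a bag holding a single tuple whose first field is null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LoadAndGroupSketch {
    // Split on ',' keeping empty fields (limit -1) and map each empty field
    // to null. Assumption: this mirrors how PigStorage(',') treated the
    // empty platform field in the sample data.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        for (String f : line.split(",", -1)) {
            fields.add(f.isEmpty() ? null : f);
        }
        return fields;
    }

    // Models GROUP RR BY (productID): productID is field index 2.
    // LinkedHashMap keeps the groups in first-seen order.
    static Map<String, List<List<String>>> groupByProduct(List<String> lines) {
        Map<String, List<List<String>>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            List<String> t = parse(line);
            groups.computeIfAbsent(t.get(2), k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "android,u1,taobao1",
            "android,u1,taobao1",
            ",u2,taobao2");
        groupByProduct(sample).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

For the sample data this prints "taobao1 -> [[android, u1, taobao1], [android, u1, taobao1]]" and "taobao2 -> [[null, u2, taobao2]]".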
Output:
(taobao1,1,2)
(taobao2,1,0)
Why is the PV for taobao2 0 while its UV is 1?
I looked at the source code of the COUNT function.
If the first field of a tuple is null, cnt is not incremented:
while (it.hasNext()) {
    Tuple t = (Tuple) it.next();
    // Only tuples whose first field is non-null are counted
    if (t != null && t.size() > 0 && t.get(0) != null)
        cnt++;
}
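Applying that check to the two groups accounts for the output. In COUNT(ITEMUV) the tuples come from RR.machineID, so their first field is machineID, which is u2 for taobao2 and is counted. In COUNT(RR) the first field is platform, which is null for that row, so the tuple is skipped. The Java sketch below models this with each tuple as a List<String> of (platform, machineID, productID); the class and method names are hypothetical and this is not Pig's actual implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CountSketch {
    // Mirrors the loop in COUNT: a tuple is counted only if it is non-null,
    // non-empty and its first field is non-null.
    static long pigStyleCount(List<List<String>> bag) {
        long cnt = 0;
        for (List<String> t : bag) {
            if (t != null && t.size() > 0 && t.get(0) != null) {
                cnt++;
            }
        }
        return cnt;
    }

    // Models COUNT(DISTINCT RR.machineID): project machineID (index 1),
    // deduplicate, and drop a null id, since a one-field tuple holding
    // null would fail COUNT's first-field check.
    static long distinctMachineIds(List<List<String>> bag) {
        Set<String> ids = new HashSet<>();
        for (List<String> t : bag) {
            ids.add(t.get(1));
        }
        ids.remove(null);
        return ids.size();
    }

    public static void main(String[] args) {
        List<List<String>> taobao1 = Arrays.asList(
            Arrays.asList("android", "u1", "taobao1"),
            Arrays.asList("android", "u1", "taobao1"));
        // platform is null: the empty leading field of ",u2,taobao2"
        List<List<String>> taobao2 = Arrays.asList(
            Arrays.asList(null, "u2", "taobao2"));

        System.out.println("(taobao1," + distinctMachineIds(taobao1) + "," + pigStyleCount(taobao1) + ")");
        System.out.println("(taobao2," + distinctMachineIds(taobao2) + "," + pigStyleCount(taobao2) + ")");
        System.out.println("tuples in taobao2 bag: " + taobao2.size());
    }
}
```

Running main prints (taobao1,1,2) and (taobao2,1,0), matching the DUMP output, while the taobao2 bag still contains one tuple: the 0 comes only from the first-field check, not from the bag being empty.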
--
齐忠