On 3/5/12 7:19 PM, guoyun wrote:
Dear All:
This is the wiki's example for distinct:
grunt> A = load 'mydata' using PigStorage() as (a, b, c);
grunt> B = group A by a;
grunt> C = foreach B {
    D = distinct A.b;
    generate flatten(group), COUNT(D);
}
But what if field b has sub-fields? For example:
A = load 'mydata' using PigStorage() as (a, b(b1,b2,b3), c);
I want to run D = distinct A.b.b1; how can I do that? Pig does not
allow D = distinct A.b.b1;
Thank you!
You need to use another nested foreach statement:
C = foreach B {
    B1BAG = foreach A generate b.b1;
    D = distinct B1BAG;
    generate flatten(group), COUNT(D);
}
-Thejas
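
If the Pig version in use does not accept a foreach inside the nested block, one possible alternative is to project b.b1 before grouping, so that the nested block only needs the plain distinct from the wiki example. This is a minimal sketch, assuming b is loaded as a tuple; the column types and the alias A1 are illustrative assumptions and do not come from the original post:

```pig
-- Assumed schema: b is a tuple with fields b1, b2, b3 (types are illustrative).
A = load 'mydata' using PigStorage()
    as (a:chararray, b:tuple(b1:chararray, b2:chararray, b3:chararray), c:chararray);

-- Pull b1 out of the tuple before grouping (A1 is a hypothetical alias).
A1 = foreach A generate a, b.b1 as b1;

B = group A1 by a;

-- Count the distinct b1 values within each group of a.
C = foreach B {
    D = distinct A1.b1;
    generate flatten(group), COUNT(D);
}
```

Both forms should give one row per value of a with the number of distinct b1 values; the difference is only whether the projection happens before the group or inside the nested block.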