Hi,
Hope this helps:
Example:
A = LOAD 'test.txt' as(p,q);
DUMP A;
(a 1)
(b 2)
(a 1)
(a 3)
(b 2)
(a 1)
B = GROUP A BY (p,q);
DUMP B;
((a 1,),{(a 1,),(a 1,),(a 1,)})
((a 3,),{(a 3,)})
((b 2,),{(b 2,),(b 2,)})
C = FOREACH B {
    -- LIMIT 1 keeps a single tuple from each group's bag; you can
    -- flatten it afterwards to get the actual fields from the tuple.
    oneTuple = LIMIT A 1;
    GENERATE group.p, group.q, FLATTEN(oneTuple);
}
DUMP C;
(a 1,,a 1,)
(a 3,,a 3,)
(b 2,,b 2,)
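A variant of the script above, as a sketch only: the trailing commas in the output suggest that each whole line ended up in p with q left empty, since PigStorage splits on tabs by default. Assuming test.txt is actually space-delimited (an assumption on my part, as are the declared types), naming the delimiter gives two real fields, and generating only the flattened tuple avoids repeating the group keys:

```pig
-- Assumption: test.txt is space-delimited, with one "p q" pair per line.
A = LOAD 'test.txt' USING PigStorage(' ') AS (p:chararray, q:int);
B = GROUP A BY (p, q);
C = FOREACH B {
    -- keep a single tuple from each group's bag
    oneTuple = LIMIT A 1;
    -- the flattened tuple already carries p and q,
    -- so group.p and group.q are not generated again
    GENERATE FLATTEN(oneTuple);
}
DUMP C;
```

Like the version above, this keeps one tuple per distinct (p, q) group.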
Cheers,
Gerrit
-----Original Message-----
From: felix gao [mailto:[email protected]]
Sent: Monday, February 08, 2010 7:44 PM
To: [email protected]
Subject: Dropping one tuple from the bag purposefully
OK, this might sound a little weird.
My schema is f1, f2, f3, f4, f5, f6.
When I group by f1, f2, f3, I need to drop exactly one tuple from every
group that contains more than one tuple. The same tuple values can also
occur more than once within a group.
Is there a way to do this in Pig?
Thanks,
Felix