Felix,

If I understand correctly, what you need is to output a bag that has n-1 tuples if n > 1, where n is the original number of tuples in the group's bag. At the moment you will have to implement a custom UDF: pass in your bag and output a bag as per that criterion. Refer to http://wiki.apache.org/pig/UDFManual for help on implementing the UDF.
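A minimal sketch of the rule such a UDF would apply, written in plain Java so it runs without the Pig jars. It is only an illustration under assumptions: the class name DropOneTuple and the method dropOne are hypothetical, and a java.util.List stands in for Pig's DataBag. In a real UDF the same logic would presumably sit inside the exec(Tuple) method of a class extending org.apache.pig.EvalFunc<DataBag>. Since bags are unordered, which tuple gets dropped is arbitrary; here it is simply the first one encountered.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Pig): applies the
 * "n-1 tuples if n > 1" rule to a list standing in for a bag.
 */
final class DropOneTuple {
    private DropOneTuple() {}

    /**
     * Returns a new list holding every element of {@code bag} except the
     * first one when the bag has more than one element; a bag with zero
     * or one element is copied unchanged.
     */
    static <T> List<T> dropOne(List<T> bag) {
        if (bag.size() <= 1) {
            // Nothing to drop: groups with a single tuple are kept as they are.
            return new ArrayList<>(bag);
        }
        // Skip exactly one element (the first) and keep the remaining n-1.
        return new ArrayList<>(bag.subList(1, bag.size()));
    }
}
```

Duplicate values inside a group are not a problem for this approach, because it removes one element by position rather than by value.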
Regards
-...@nkur

On 2/9/10 4:00 PM, "Gerrit Jansen van Vuuren" <[email protected]> wrote:

Hi,

Hope this helps:

Example:

A = LOAD 'test.txt' as(p,q);
DUMP A;
(a 1)
(b 2)
(a 1)
(a 3)
(b 2)
(a 1)

B = GROUP A BY (p,q);
DUMP B;
((a 1,),{(a 1,),(a 1,),(a 1,)})
((a 3,),{(a 3,)})
((b 2,),{(b 2,),(b 2,)})

C = FOREACH B {
    -- LIMIT produces only one output; you can flatten afterwards
    -- to get the actual fields from the tuple.
    oneTuple = LIMIT A 1;
    GENERATE group.p, group.q, FLATTEN(oneTuple);
}
DUMP C;
(a 1,,a 1,)
(a 3,,a 3,)
(b 2,,b 2,)

Cheers,
Gerrit

-----Original Message-----
From: felix gao [mailto:[email protected]]
Sent: Monday, February 08, 2010 7:44 PM
To: [email protected]
Subject: Dropping one tuple from the bag purposefully

OK, this might sound a little weird. My schema is f1, f2, f3, f4, f5, f6. When I group by f1, f2, f3, I need to drop exactly one tuple from every group that contains more than one tuple. Also, the values of the tuples in each group could occur more than once. Is there a way to do this in Pig?

Thanks,
Felix
