Re: Dropping one tuple from the bag purposefully

Ankur C. Goel Tue, 09 Feb 2010 04:49:44 -0800

Felix,
If I understand correctly, what you need is to output a bag that has n-1 tuples 
if n > 1, where n is the original number of tuples in the group's bag.
At the moment you will have to implement a custom UDF that takes your bag and 
outputs a bag as per the criteria. Refer to http://wiki.apache.org/pig/UDFManual
for help on implementing the UDF.
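
To make the criteria concrete, here is a minimal sketch of the logic such a UDF would carry out. It is plain Python, not written against Pig's UDF API, and the name drop_one is hypothetical. It also assumes that it does not matter which tuple is dropped, so it simply drops the first one; a real UDF would need the same logic ported to Pig's bag and tuple types.

```python
def drop_one(bag):
    """Return the tuples of `bag`, minus one tuple if the bag holds more than one.

    Assumption: which tuple gets dropped is irrelevant, so the first is dropped.
    """
    tuples = list(bag)
    if len(tuples) > 1:
        # n > 1: keep n-1 tuples
        return tuples[1:]
    # n <= 1: leave the bag untouched
    return tuples


if __name__ == "__main__":
    # Bags keyed by group, mirroring what a GROUP statement would produce.
    groups = {
        ("a", 1): [("a", 1), ("a", 1), ("a", 1)],
        ("a", 3): [("a", 3)],
        ("b", 2): [("b", 2), ("b", 2)],
    }
    for key, bag in groups.items():
        print(key, drop_one(bag))
```

Duplicate tuples inside a bag are kept as they are; only the count goes down by one.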


Regards
-...@nkur

On 2/9/10 4:00 PM, "Gerrit Jansen van Vuuren" <[email protected]> 
wrote:

Hi,

Hope this helps:

Example:
A = LOAD 'test.txt' as(p,q);

DUMP A;

(a 1)
(b 2)
(a 1)
(a 3)
(b 2)
(a 1)

B = GROUP A BY (p,q);

DUMP B;

((a 1,),{(a 1,),(a 1,),(a 1,)})
((a 3,),{(a 3,)})
((b 2,),{(b 2,),(b 2,)})

C = FOREACH B {
       -- LIMIT keeps at most one tuple from the bag; you can FLATTEN
       -- afterwards to get the actual fields from the tuple.
       oneTuple = LIMIT A 1;
       GENERATE group.p, group.q, FLATTEN(oneTuple);
}

DUMP C;
(a 1,,a 1,)
(a 3,,a 3,)
(b 2,,b 2,)


Cheers,
 Gerrit

-----Original Message-----
From: felix gao [mailto:[email protected]]
Sent: Monday, February 08, 2010 7:44 PM
To: [email protected]
Subject: Dropping one tuple from the bag purposefully

OK, this might sound a little weird.

My schema is f1, f2, f3, f4, f5, f6.
When I group by f1, f2, f3, I need to drop exactly one tuple from every group
that has more than one tuple. Also, the values of the tuples in each
group could occur more than once.

Is there a way to do it in pig?

Thanks,

Felix
