Thanks!

A = LOAD 'file.tsv' USING PigStorage() AS (id:chararray, mname:chararray);
B = FILTER A BY mname IS NOT NULL;
C = FOREACH B GENERATE id, mname;
D = DISTINCT C;
E = GROUP D BY id;
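
For reference, the same result can probably also be had with a nested FOREACH, which applies DISTINCT to each bag after the GROUP rather than before it. This is only a minimal sketch that reuses relation C from above; the aliases D2, E2 and uniq are illustrative names, not part of the original script, and the order of tuples inside each bag is not guaranteed.

```pig
-- Sketch only: assumes C has the schema (id:chararray, mname:chararray) as loaded above.
-- D2, E2 and uniq are hypothetical aliases chosen for this example.
D2 = GROUP C BY id;
E2 = FOREACH D2 {
    -- C here refers to the bag of tuples grouped under the current id
    uniq = DISTINCT C;
    GENERATE group, uniq;
};
DUMP E2;
```

Running DISTINCT before the GROUP, as above, removes duplicates across the whole relation in one pass; the nested form is mainly useful when the grouped relation is also needed with its duplicates intact.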

-----Original Message-----
From: Brian Adams [mailto:[email protected]] 
Sent: Wednesday, June 16, 2010 1:18 PM
To: [email protected]
Subject: Re: DISTINCT BAG

http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html#DISTINCT

?

On Wed, 2010-06-16 at 13:14 -0700, Scott Wine wrote:
> Hi,
> 
> I am trying to find a way to return only DISTINCT values within a bag.  Any 
> ideas?
> 
> Thanks
> Scott
> 
> A = LOAD 'file.tsv' USING PigStorage() AS (id:chararray, mname:chararray);
> B = FILTER A BY mname IS NOT NULL;
> C = FOREACH B GENERATE id, mname;
> D = GROUP C BY id;
> DUMP D;
> (1,{(1,M),(1,M),(1,N)})
> (2,{(2,I),(2,I)})
> (3,{(3,T),(3,T),(3,T)})
> (4,{(4,R),(4,I)})
> 
> E =   **NEED TO DISTINCT BAG***
> 
> DESIRED E OUTPUT
> (1,{(1,M),(1,N)})
> (2,{(2,I)})
> (3,{(3,T)})
> (4,{(4,R),(4,I)})
> 
> 
> 
> 
