Thanks! Applying DISTINCT before the GROUP does the trick:

A = LOAD 'file.tsv' USING PigStorage() AS (id:chararray, mname:chararray);
B = FILTER A BY mname IS NOT NULL;
C = FOREACH B GENERATE id, mname;
D = DISTINCT C;
E = GROUP D BY id;
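An alternative sketch keeps the GROUP from the original question and removes duplicates inside each bag with a nested FOREACH block. It assumes the same file.tsv layout (tab-separated id and mname) and a Pig release whose FOREACH supports DISTINCT as a nested operator, which the Pig Latin reference lists among the allowed nested operations (check the docs for your version). The aliases G and uniq are illustrative names, not from the thread.

```pig
-- Sketch: same input as above; assumes nested DISTINCT is supported.
A = LOAD 'file.tsv' USING PigStorage() AS (id:chararray, mname:chararray);
B = FILTER A BY mname IS NOT NULL;
C = FOREACH B GENERATE id, mname;
G = GROUP C BY id;
-- For each group, DISTINCT runs on that group's bag only,
-- dropping duplicate (id, mname) tuples within the bag.
E = FOREACH G {
    uniq = DISTINCT C;
    GENERATE group, uniq;
}
DUMP E;
```

This should yield one tuple per id with duplicate tuples removed from its bag, though the order of tuples inside each bag is not guaranteed. Roughly, the DISTINCT-then-GROUP script above deduplicates the whole relation as a separate step before grouping, while this form deduplicates each bag after grouping; which is cheaper depends on the data.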
-----Original Message-----
From: Brian Adams [mailto:[email protected]]
Sent: Wednesday, June 16, 2010 1:18 PM
To: [email protected]
Subject: Re: DISTINCT BAG

http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html#DISTINCT ?

On Wed, 2010-06-16 at 13:14 -0700, Scott Wine wrote:
> Hi,
>
> I am trying to find a way to return only DISTINCT values within a bag. Any
> ideas?
>
> Thanks
> Scott
>
> A = LOAD 'file.tsv' USING PigStorage() AS (id:chararray, mname:chararray);
> B = FILTER A BY mname IS NOT NULL;
> C = FOREACH B GENERATE id, mname;
> D = GROUP C BY id;
> DUMP D;
> (1,{(1,M),(1,M),(1,N)})
> (2,{(2,I),(2,I)})
> (3,{(3,T),(3,T),(3,T)})
> (4,{(4,R),(4,I)})
>
> E = **NEED TO DISTINCT BAG***
>
> DESIRED E OUTPUT
> (1,{(1,M),(1,N)})
> (2,{(2,I)})
> (3,{(3,T)})
> (4,{(4,R),(4,I)})
