Hi,

I am trying to find a way to return only the DISTINCT tuples within each bag
after a GROUP. Any ideas?

Thanks
Scott

A = LOAD 'file.tsv' USING PigStorage() AS (id:chararray, mname:chararray);
B = FILTER A BY mname IS NOT NULL;
C = FOREACH B GENERATE id, mname;
D = GROUP C BY id;
DUMP D;
(1,{(1,M),(1,M),(1,N)})
(2,{(2,I),(2,I)})
(3,{(3,T),(3,T),(3,T)})
(4,{(4,R),(4,I)})

-- E = ???   (need to DISTINCT the tuples within each bag of D)

Desired output of DUMP E:
(1,{(1,M),(1,N)})
(2,{(2,I)})
(3,{(3,T)})
(4,{(4,R),(4,I)})
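
One approach that may work here, offered as an untested sketch rather than a confirmed answer: Pig allows DISTINCT inside a nested FOREACH block, so each group's bag can be de-duplicated in place. This assumes the relations above, where the bag produced by GROUP C BY id is named C. The alias uniq is only an illustrative name.

```pig
-- Sketch: de-duplicate the tuples inside each group's bag.
-- Assumes D = GROUP C BY id, so the bag field in D is named C.
E = FOREACH D {
    uniq = DISTINCT C;      -- 'uniq' is an illustrative alias
    GENERATE group, uniq;
};
DUMP E;
```

If the duplicate rows are not needed anywhere else, applying DISTINCT to C before the GROUP should produce the same bags. The order of tuples within a bag is not guaranteed either way.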



