Hi, I am trying to find a way to keep only the DISTINCT tuples within each bag after a GROUP. Any ideas?
Thanks
Scott
A = LOAD 'file.tsv' USING PigStorage() AS (id:chararray, mname:chararray);
B = FILTER A BY mname IS NOT NULL;
C = FOREACH B GENERATE id, mname;
D = GROUP C BY id;
DUMP D;
(1,{(1,M),(1,M),(1,N)})
(2,{(2,I),(2,I)})
(3,{(3,T),(3,T),(3,T)})
(4,{(4,R),(4,I)})
-- E = ??? (this is where I need to apply DISTINCT to the bag in each group)
Desired output of DUMP E:
(1,{(1,M),(1,N)})
(2,{(2,I)})
(3,{(3,T)})
(4,{(4,R),(4,I)})
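One possible approach, sketched here and not tested against this data, is a nested FOREACH that applies DISTINCT to the bag in each group. It assumes the relations C and D defined above; the alias `uniq` is just an illustrative name.

```pig
-- For each group in D, de-duplicate the tuples in the bag named C.
-- 'uniq' is a hypothetical alias chosen for this sketch.
E = FOREACH D {
    uniq = DISTINCT C;
    GENERATE group, uniq;
};
DUMP E;
```

The order of tuples inside each resulting bag is not guaranteed. If the duplicate rows are not needed anywhere else, applying DISTINCT to C before the GROUP should give equivalent bags.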
