I would say something along these lines:

B = group A by *;
C = foreach B generate group, COUNT(A) as count;
D = filter C by count > 1;
E = foreach D generate group;
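
Note that E holds each duplicated tuple only once. If the duplicate rows themselves are wanted, as in the expected output of the question, one option is to keep the grouped bag and flatten it back out. A minimal sketch, assuming the schema from the question (userid, foo, bar); the alias names are arbitrary and this is equally untested:

```pig
-- Group identical tuples together; the key is the whole (userid, foo, bar) tuple.
B = GROUP A BY (userid, foo, bar);
-- Keep only the groups whose bag holds more than one tuple.
C = FILTER B BY COUNT(A) > 1;
-- Flatten the bag so each duplicate row appears as many times as it did in A.
D = FOREACH C GENERATE FLATTEN(A);
```

For the sample data this should leave (1,2,3) twice. COUNT skips tuples whose first field is null; COUNT_STAR counts those as well, if that matters for the data.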

Disclaimer: untested code.

Cheers,
--
Gianmarco

On Fri, Aug 24, 2012 at 11:35 AM, Marco Cadetg <[email protected]> wrote:
> Hi there,
>
> What is the best way to retrieve duplicates from a bag. I basically would
> like to do something like the opposite of DISTINCT.
>
> A: {userid: long,foo: long,bar: long}
>
> dump A
> (1,2,3)
> (1,2,3)
> (1,3,2)
> (2,3,1)
>
> Now I would like to have a bag which contains
> (1,2,3)
> (1,2,3)
>
> Thanks,
> -Marco
