Thanks Gianmarco, that is what I was looking for!

-Marco

On Fri, Aug 24, 2012 at 12:19 PM, Gianmarco De Francisci Morales <[email protected]> wrote:
> I would say something along these lines:
>
> B = group A by *;
> C = foreach B generate group, COUNT(A) as count;
> D = filter C by count > 1;
> E = foreach D generate group;
>
> Disclaimer: untested code.
>
> Cheers,
> --
> Gianmarco
>
>
> On Fri, Aug 24, 2012 at 11:35 AM, Marco Cadetg <[email protected]> wrote:
>
> > Hi there,
> >
> > What is the best way to retrieve duplicates from a bag. I basically would
> > like to do something like the opposite of DISTINCT.
> >
> > A: {userid: long,foo: long,bar: long}
> >
> > dump A
> > (1,2,3)
> > (1,2,3)
> > (1,3,2)
> > (2,3,1)
> >
> > Now I would like to have a bag which contains
> > (1,2,3)
> > (1,2,3)
> >
> > Thanks,
> > -Marco
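
One thing to watch with the quoted snippet: its last step generates only the group key, so E would hold one row per duplicated tuple rather than every copy. If all copies are wanted, as in the example output from the question, a minimal and equally untested sketch could flatten the grouped bag instead. It assumes the schema A: {userid, foo, bar} from the question and groups on the three fields explicitly; the aliases B, C and D are just illustrative names.

```pig
-- Assumes A: {userid: long, foo: long, bar: long}, as in the question.
B = group A by (userid, foo, bar);   -- one group per distinct tuple
C = filter B by COUNT(A) > 1;        -- keep only tuples that occur more than once
D = foreach C generate flatten(A);   -- emit every original copy again
```

With the sample data from the question, only the (1,2,3) group has more than one member, so D would contain the two (1,2,3) tuples.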
