Thanks Gianmarco, that is what I was looking for!
-Marco

On Fri, Aug 24, 2012 at 12:19 PM, Gianmarco De Francisci Morales <
[email protected]> wrote:

> I would say something along these lines:
>
> B = group A by *;
> C = foreach B generate group, COUNT(A) as count;
> D = filter C by count > 1;
> E = foreach D generate group;
>
> Disclaimer: untested code.
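>
> Note that E above holds each duplicated tuple only once. If every copy
> should be kept, as in the expected output quoted below, a variation along
> these lines might work. It is an equally untested sketch; the alias cnt
> and the explicit key list (userid, foo, bar) are assumptions taken from
> the schema in the original question:
>
> -- group identical tuples together, keyed on all three fields
> B = group A by (userid, foo, bar);
> -- keep the bag of original tuples next to its size
> C = foreach B generate COUNT(A) as cnt, A;
> -- keep only the groups that occur more than once
> D = filter C by cnt > 1;
> -- unnest the bags so that every copy becomes its own tuple again
> E = foreach D generate FLATTEN(A);
>
> For the sample data this should leave (1,2,3) twice in E.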
>
> Cheers,
> --
> Gianmarco
>
>
>
> On Fri, Aug 24, 2012 at 11:35 AM, Marco Cadetg <[email protected]> wrote:
>
> > Hi there,
> >
> > What is the best way to retrieve duplicates from a bag? I basically would
> > like to do something like the opposite of DISTINCT.
> >
> > A: {userid: long,foo: long,bar: long}
> >
> > dump A
> > (1,2,3)
> > (1,2,3)
> > (1,3,2)
> > (2,3,1)
> >
> > Now I would like to have a bag which contains
> > (1,2,3)
> > (1,2,3)
> >
> > Thanks,
> > -Marco
> >
>
