Re: Does pig support in clause?

Gianmarco De Francisci Morales Mon, 25 Jun 2012 22:57:04 -0700

Bloom filters would help efficiency here.
A bloom join or semi-join would be a nice addition to Pig.
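
To make the idea concrete, here is a rough sketch of a Bloom-filter semi-join, written in Python rather than Pig Latin so it can be run standalone. Everything in it is an assumption for illustration: the names BloomFilter and bloom_semi_join, the bit-array size, and the number of hashes are made up and are not part of Pig. The filter is built from the keys of C, used to cheaply discard rows of A that cannot match, and an exact check then removes any false positives.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: num_hashes probes into a num_bits-wide bit array.

    Sizes are arbitrary illustrative defaults, not tuned values.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive one bit position per probe by salting the key with the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def bloom_semi_join(a_rows, c_keys, key_of=lambda row: row[0]):
    """Return the rows of A whose key appears in C, preserving A's order."""
    exact_keys = set(c_keys)

    # Build the filter from the (usually much smaller) key side.
    bloom = BloomFilter()
    for key in exact_keys:
        bloom.add(key)

    # Step 1: cheap pre-filter. In a distributed job this would run on the
    # map side, so non-matching rows of A are never shuffled.
    candidates = [row for row in a_rows if bloom.might_contain(key_of(row))]

    # Step 2: exact check to drop Bloom false positives. In a real job this
    # would be the join itself; a local set stands in for it here.
    return [row for row in candidates if key_of(row) in exact_keys]
```

With A = [(1, 'a'), (2, 'b'), (3, 'c'), (2, 'd')] and C keys [2, 3, 3, 9], bloom_semi_join returns [(2, 'b'), (3, 'c'), (2, 'd')]: duplicates in C do not multiply rows of A, which is what distinguishes a semi-join from a plain join.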


Cheers,
--
Gianmarco




On Mon, Jun 25, 2012 at 7:50 PM, Alan Gates <[email protected]> wrote:

> Agreed.  And with some optimization we could make semi-join more efficient
> than this since it only needs to keep one record per key per map instead of
> all the records for a key.
>
> Alan.
>
> On Jun 25, 2012, at 10:17 AM, Russell Jurney wrote:
>
> > This could be a cool rewrite feature like CUBE/SAMPLE.
> >
> > Russell Jurney http://datasyndrome.com
> >
> > On Jun 25, 2012, at 9:39 AM, Alan Gates <[email protected]> wrote:
> >
> >> This type of in is really a semi-join.  So you could rewrite this as:
> >>
> >> B1 = cogroup A by A1, C by A1;
> >> B2 = filter B1 by SIZE(C) > 0;
> >> B = foreach B2 generate flatten(A);
> >>
> >> Alan.
> >>
> >> On Jun 25, 2012, at 2:50 AM, yonghu wrote:
> >>
> >>> Dear all,
> >>>
> >>> In SQL, there is an IN clause which is used to check whether a value
> >>> is in a set. Does Pig have the same IN clause? For example:
> >>>
> >>> B = filter A by A1 in C;
> >>>
> >>> A, B, and C are relation names, and A1 is a column name of A.
> >>>
> >>> Thanks!
> >>>
> >>> Yong
> >>
>
>
