Agreed.  And with some optimization we could make a semi-join more efficient
than this rewrite, since it only needs to keep one record per key per map
instead of all the records for a key.
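
Until something like that exists, a rough approximation is to reduce C to its
distinct join keys before joining, so at most one C record per key reaches the
join. This is only a sketch: the file paths, the schemas, and the aliases
C_keys and C_dist are assumptions for illustration, not from this thread. It
also relies on DISTINCT using the combiner, which is what would keep each map
down to one record per key.

```pig
-- Hypothetical inputs: paths and schemas are assumed for illustration.
A = load 'a.txt' as (A1:chararray, A2:chararray);
C = load 'c.txt' as (A1:chararray);

-- Keep only the join key from C, then drop duplicate keys.
C_keys = foreach C generate A1;
C_dist = distinct C_keys;

-- Each A record now matches at most one C_dist record, so the inner join
-- does not duplicate A records or carry all of C's records per key.
B1 = join A by A1, C_dist by A1;

-- Project A's columns back out; the join prefixes them with "A::".
B = foreach B1 generate A::A1 as A1, A::A2 as A2;
```

If C_dist is small enough to fit in memory, adding using 'replicated' to the
join should let it run map-side.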

Alan.

On Jun 25, 2012, at 10:17 AM, Russell Jurney wrote:

> This could be a cool rewrite feature like CUBE/SAMPLE.
> 
> Russell Jurney http://datasyndrome.com
> 
> On Jun 25, 2012, at 9:39 AM, Alan Gates wrote:
> 
>> This type of IN is really a semi-join.  So you could rewrite this as:
>> 
>> B1 = cogroup A by A1, C by A1;
>> B2 = filter B1 by SIZE(C) > 0;
>> B = foreach B2 generate flatten(A);
>> 
>> Alan.
>> 
>> On Jun 25, 2012, at 2:50 AM, yonghu wrote:
>> 
>>> Dear all,
>>> 
>>> In SQL there is an IN clause, which checks whether a value is in a
>>> set. Does Pig have an equivalent IN clause? For example:
>>> 
>>> B = filter A by A1 in C;
>>> 
>>> A, B, and C are relation names, and A1 is a column name of A.
>>> 
>>> Thanks!
>>> 
>>> Yong
>> 
