Re: Pig facility analogous to SQL's IN?

BalaSundaraRaman Tue, 01 Jun 2010 00:46:02 -0700

Thanks Ankur. But in my actual case it's a COGROUP, not a join.
"replicated" can't be used with COGROUP, can it?
Any workaround?
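One possible workaround, assuming B can simply be added as an extra input to the existing COGROUP: group both relations on the key and keep only the groups whose B bag is non-empty. This is a semi-join built from COGROUP and the built-in IsEmpty. Unlike a replicated join, it still goes through the reduce phase. The paths and schemas come from the original question quoted further down; the aliases C, inB and D are illustrative.

```pig
-- Paths and schemas taken from the original question; aliases C, inB, D are illustrative.
A = LOAD '/tmp/a.txt' USING PigStorage(',') AS (a1:chararray, a2:chararray);
B = LOAD '/tmp/b.txt' USING PigStorage(',') AS (b1:chararray);

-- One tuple per key value: (group, bag of matching A tuples, bag of matching B tuples).
C = COGROUP A BY a2, B BY b1;

-- "a2 IN B": keep only keys that also occur in B.
inB = FILTER C BY NOT IsEmpty(B);

-- Unnest back to A's rows. Keys found only in B have an empty A bag and
-- produce no output; a value repeated in B does not duplicate A rows.
D = FOREACH inB GENERATE FLATTEN(A);
```

If the real COGROUP has further inputs, the same `NOT IsEmpty(B)` condition can be applied to its output before the existing FOREACH.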


- Sundar

"That language is an instrument of human reason, and not merely a medium for the expression of thought, is a truth generally admitted."
- George Boole, quoted in Iverson's Turing Award Lecture



----- Original Message ----
From: Ankur C. Goel <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Tue, June 1, 2010 12:39:56 PM
Subject: Re: Pig facility analogous to SQL's IN?

If the data represented by relation B can fit in memory, then you can simply use a "replicated" join, which is inexpensive because it is performed map-side.

C = JOIN A by a2, B by b1 USING "replicated";
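A fuller version of the replicated-join approach, written against the schemas from the original question, might look like the sketch below. It handles two ways a join differs from SQL's IN: the join output also carries B's column, and a value repeated in B would repeat the matching A rows. In a fragment-replicate join the relations after the first are the ones loaded into memory, so the small relation is listed last. The sketch uses the single-quoted 'replicated' form shown in current Pig documentation; Bd, C and D are illustrative aliases.

```pig
-- Schemas from the original question; aliases Bd, C, D are illustrative.
A = LOAD '/tmp/a.txt' USING PigStorage(',') AS (a1:chararray, a2:chararray);
B = LOAD '/tmp/b.txt' USING PigStorage(',') AS (b1:chararray);

-- Drop repeated keys so each A row matches at most once, as with IN.
Bd = DISTINCT B;

-- Fragment-replicate join: Bd, listed last, is held in memory by each map task.
C = JOIN A BY a2, Bd BY b1 USING 'replicated';

-- Keep only A's columns, like SELECT a1, a2 FROM A WHERE a2 IN (SELECT b1 FROM B).
D = FOREACH C GENERATE A::a1 AS a1, A::a2 AS a2;
```

DISTINCT needs a reduce step of its own; if B is already known to hold unique values, it can be skipped and B joined directly.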

-...@nkur


On 5/31/10 3:32 PM, "BalaSundaraRaman" <[email protected]> wrote:

Hi,

Is there any operator or UDF in Pig similar to SQL's IN operator?
Specifically, given a large bag A and a very small single-column bag B, I want to select the tuples in A whose field a2 has its value in B.
My current method, using a JOIN (below), seems very expensive.
grunt> A = LOAD '/tmp/a.txt' USING PigStorage(',') AS (a1:chararray,a2:chararray);
grunt> B = LOAD '/tmp/b.txt' USING PigStorage(',') AS (b1:chararray);
grunt> C = JOIN A by a2, B by b1;

It would be very useful if such an operator were available in FILTER and SPLIT as well.
For example, if I need to substitute '0' when a2 is NOT IN B::b1, there is currently no easy way, I guess.
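A minimal sketch of both requests using COGROUP and IsEmpty, assuming the same schemas as in the question above: SPLIT routes the groups into "in B" and "not in B" branches, and a bincond substitutes '0' for a2 when its value has no match in B. All aliases (C, inB, notInB, inRows, notInRows, E) are hypothetical.

```pig
-- Schemas as in the question above; every alias below is hypothetical.
A = LOAD '/tmp/a.txt' USING PigStorage(',') AS (a1:chararray, a2:chararray);
B = LOAD '/tmp/b.txt' USING PigStorage(',') AS (b1:chararray);

C = COGROUP A BY a2, B BY b1;

-- SPLIT: route each key to the IN branch or the NOT IN branch.
SPLIT C INTO inB IF NOT IsEmpty(B), notInB IF IsEmpty(B);
inRows    = FOREACH inB    GENERATE FLATTEN(A);
notInRows = FOREACH notInB GENERATE FLATTEN(A);

-- Substitution: one row per A tuple, with a2 replaced by '0' when it is NOT IN B.
-- Both bincond branches are chararray, so their types match.
E = FOREACH C GENERATE FLATTEN(A.a1) AS a1, (IsEmpty(B) ? '0' : group) AS a2;
```

Keys that occur only in B have an empty A bag, so FLATTEN emits nothing for them and they do not appear in inRows or E.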


Thanks,
Sundar (a Pig n00b)

