Will try, Ankur. Thanks.

- Sundar

 "That language is an instrument of human reason, and not merely a medium for 
the expression of thought, is a truth generally admitted."
- George Boole, quoted in Iverson's Turing Award Lecture



----- Original Message ----
> From: Ankur C. Goel <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Wed, June 2, 2010 4:58:26 PM
> Subject: Re: Pig facility analogous to SQL's IN?
> 
> For the case you described, you can do a right outer replicated join followed by
> a projection to substitute '0' for missing values.

-...@nkur
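
A minimal sketch of that suggestion, reusing the relations A and B from the original question further down the thread. Two things here are assumptions rather than part of the message above. First, the sketch writes the join as LEFT OUTER with A on the left, since that is the direction that keeps every tuple of A, and the Pig reference manual lists only left outer joins as supported with the replicated hint. Second, it assumes a Pig release that accepts an outer replicated join at all. The aliases J and C are illustrative names, and older releases write the hint in double quotes as "replicated".

```pig
A = LOAD '/tmp/a.txt' USING PigStorage(',') AS (a1:chararray, a2:chararray);
B = LOAD '/tmp/b.txt' USING PigStorage(',') AS (b1:chararray);

-- Keep every tuple of A. B is listed last, so it is the relation
-- that gets loaded into memory on each map task.
J = JOIN A BY a2 LEFT OUTER, B BY b1 USING 'replicated';

-- B::b1 is null wherever a2 had no match in B; substitute '0' there.
C = FOREACH J GENERATE A::a1 AS a1,
                       (B::b1 IS NULL ? '0' : A::a2) AS a2;
```

This presumes B holds distinct values; a value repeated in B would repeat the matching tuples of A in the output.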


On 6/1/10 1:15 PM, "BalaSundaraRaman" <[email protected]> wrote:

Thanks Ankur. But, in my actual case, it's a COGROUP and not a join.
"replicated" can't be used with COGROUP, no?
Any workaround?

- Sundar




----- Original Message ----
> From: Ankur C. Goel <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Tue, June 1, 2010 12:39:56 PM
> Subject: Re: Pig facility analogous to SQL's IN?
>
> If data represented by relation B can fit in memory, then you can simply use a
> "replicated" join, which is inexpensive and is a map-side join.

C = JOIN A by a2, B by b1 USING "replicated";

-...@nkur


On 5/31/10 3:32 PM, "BalaSundaraRaman" <[email protected]> wrote:

Hi,

Is there any operator or UDF in Pig similar to the IN operator of SQL?
Specifically, given a large bag A and a very small single-column bag B, I want
to select tuples in A with a field a2 that has its value in B.
My current method of doing it using a JOIN (below) seems very expensive.
grunt> A = LOAD '/tmp/a.txt' USING PigStorage(',') AS (a1:chararray,a2:chararray);
grunt> B = LOAD '/tmp/b.txt' USING PigStorage(',') AS (b1:chararray);
grunt> C = JOIN A by a2, B by b1;

It'll be very useful if such an operator is available for use in FILTER and
SPLIT as well.
For example, if I need to substitute '0' when a2 is NOT IN B::b1, currently,
there's no easy way, I guess.


Thanks,
Sundar (a Pig n00b)

