Thanks Ankur. But in my actual case, it's a COGROUP, not a JOIN. "replicated" can't be used with COGROUP, can it? Is there any workaround?

- Sundar

"That language is an instrument of human reason, and not merely a medium for the expression of thought, is a truth generally admitted."
- George Boole, quoted in Iverson's Turing Award Lecture

----- Original Message ----
> From: Ankur C. Goel <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Tue, June 1, 2010 12:39:56 PM
> Subject: Re: Pig facility analogous to SQL's IN?
>
> If data represented by relation B can fit in memory then you can simply use a
> "replicated" join, which is inexpensive and is a map-side join.
>
> C = JOIN A by a2, B by b1 USING "replicated";
>
> -...@nkur
>
> On 5/31/10 3:32 PM, "BalaSundaraRaman" <[email protected]> wrote:
>
> Hi,
>
> Is there any operator or UDF in Pig similar to the IN operator of SQL?
> Specifically, given a large bag A and a very small single-column bag B, I want
> to select tuples in A with a field a1 that has its value in B.
> My current method of doing it using a JOIN (below) seems very expensive.
>
> grunt> A = LOAD '/tmp/a.txt' USING PigStorage(',') AS (a1:chararray,a2:chararray);
> grunt> B = LOAD '/tmp/b.txt' USING PigStorage(',') AS (b1:chararray);
> grunt> C = JOIN A by a2, B by b1;
>
> It'll be very useful if such an operator is available for use in FILTER and
> SPLIT as well. For example, if I need to substitute '0' when a2 is NOT IN
> B::b1, currently, there's no easy way, I guess.
>
> Thanks,
> Sundar (a Pig n00b)
