Will try, Ankur. Thanks.

- Sundar

"That language is an instrument of human reason, and not merely a medium for the expression of thought, is a truth generally admitted."
- George Boole, quoted in Iverson's Turing Award Lecture

----- Original Message ----
> From: Ankur C. Goel <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Wed, June 2, 2010 4:58:26 PM
> Subject: Re: Pig facility analogous to SQL's IN?
>
> For the case you described, you can do a right outer replicated join
> followed by a projection to substitute '0' for missing values.
>
> -...@nkur
>
> On 6/1/10 1:15 PM, "BalaSundaraRaman" <[email protected]> wrote:
>
> Thanks Ankur. But, in my actual case, it's a COGROUP and not a join.
> "replicated" can't be used with COGROUP, no? Any workaround?
>
> - Sundar
>
> ----- Original Message ----
> From: Ankur C. Goel <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Tue, June 1, 2010 12:39:56 PM
> Subject: Re: Pig facility analogous to SQL's IN?
>
> If data represented by relation B can fit in memory then you can simply
> use a "replicated" join which is inexpensive and is a map-side join.
>
> C = JOIN A by a2, B by b1 USING "replicated";
>
> -...@nkur
>
> On 5/31/10 3:32 PM, "BalaSundaraRaman" <[email protected]> wrote:
>
> Hi,
>
> Is there any operator or UDF in Pig similar to the IN operator of SQL?
> Specifically, given a large bag A and a very small single-column bag B,
> I want to select tuples in A with a field a1 that has its value in B.
> My current method of doing it using a JOIN (below) seems very expensive.
>
> grunt> A = LOAD '/tmp/a.txt' USING PigStorage(',') AS (a1:chararray,a2:chararray);
> grunt> B = LOAD '/tmp/b.txt' USING PigStorage(',') AS (b1:chararray);
> grunt> C = JOIN A by a2, B by b1;
>
> It'll be very useful if such an operator is available for use in FILTER
> and SPLIT as well. For example, if I need to substitute '0' when a2 is
> NOT IN B::b1, currently, there's no easy way, I guess.
>
> Thanks,
> Sundar (a Pig n00b)
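
For readers following the thread, here is a rough sketch of the join-plus-projection idea suggested above, applied to the A and B relations from the original question. It is not taken from any message in the thread and rests on several assumptions: a Pig release whose replicated join accepts LEFT OUTER with the small relation listed last, the single-quoted 'replicated' spelling used by later releases (the thread uses double quotes), B small enough to fit in memory, and distinct values in B. The aliases J, hits and D are made up for illustration.

```pig
-- Sketch only, under the assumptions stated above.
A = LOAD '/tmp/a.txt' USING PigStorage(',') AS (a1:chararray, a2:chararray);
B = LOAD '/tmp/b.txt' USING PigStorage(',') AS (b1:chararray);

-- The large relation comes first; the small one (B) is listed last and is
-- the one held in memory, so the join runs map-side.
J = JOIN A BY a2 LEFT OUTER, B BY b1 USING 'replicated';

-- Rows of A with no match in B come back with a null B::b1.
-- "a2 IN B": keep only the matched rows.
hits = FILTER J BY B::b1 IS NOT NULL;

-- "a2 NOT IN B": substitute '0' for a2 where there was no match.
D = FOREACH J GENERATE A::a1 AS a1,
                       (B::b1 IS NULL ? '0' : A::a2) AS a2;
```

If B contained duplicate values, each matching row of A would appear once per duplicate, so running DISTINCT on B first may be needed. This sketch does not address the COGROUP case raised later in the thread.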
