Hi,

Is there any operator or UDF in Pig similar to the IN operator of SQL?
Specifically, given a large bag A and a very small single-column bag B, I want 
to select the tuples in A whose field a2 has a value that appears in B.
My current approach, a JOIN (below), seems very expensive.
grunt> A = LOAD '/tmp/a.txt' USING PigStorage(',') AS (a1:chararray,a2:chararray);
grunt> B = LOAD '/tmp/b.txt' USING PigStorage(',') AS (b1:chararray);
grunt> C = JOIN A by a2, B by b1;

It would be very useful if such an operator were available for use in FILTER 
and SPLIT as well. 
For example, if I need to substitute '0' when a2 is NOT IN B::b1, there is 
currently no easy way to do it, I guess.
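
For illustration, here is a rough sketch of how both cases might be emulated 
with joins. It assumes the Pig version in use supports the 'replicated' join 
hint and LEFT OUTER joins, that B fits in memory, and that B holds no duplicate 
values (duplicates would repeat matching tuples of A). The aliases C, D and E 
are placeholders only.

```pig
-- IN: keep only the tuples of A whose a2 appears in B.
-- 'replicated' asks Pig to load the small relation (listed last) into memory.
C = JOIN A BY a2, B BY b1 USING 'replicated';

-- NOT IN substitution: keep every tuple of A; unmatched rows get a null b1.
D = JOIN A BY a2 LEFT OUTER, B BY b1 USING 'replicated';

-- Replace a2 with '0' wherever no match was found in B.
E = FOREACH D GENERATE A::a1 AS a1, (B::b1 IS NULL ? '0' : A::a2) AS a2;
```

This is still a join rather than a true IN operator, so it does not remove 
the need for one inside FILTER or SPLIT.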
 

Thanks,
Sundar (a Pig n00b)
 
"That language is an instrument of human reason, and not merely a medium for 
the expression of thought, is a truth generally admitted."
- George Boole, quoted in Iverson's Turing Award Lecture
