You could specify the condition using the RegexMatch or RegexExtract UDF in piggybank:
http://svn.apache.org/repos/asf/hadoop/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/string/RegexMatch.java
http://svn.apache.org/repos/asf/hadoop/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/string/RegexExtract.java

On Thu, Feb 25, 2010 at 10:17 AM, Jan Zimmek <[email protected]> wrote:
> hi,
>
> i recently found pig, really like it and want to use it for one of our
> actual projects.
>
> getting the basics running was easy, but now i am struggling on a problem.
>
> i am trying to get customers whose email is not blacklisted.
>
> blacklist entries can be specified as:
>
> [email protected]
>
> or wildcarded
>
> @domain.de
>
> in sql i would solve this by:
>
> ----
>
> select
> *
> from
> customer c
> left join blacklist b
> on
> c.email like concat("%",b.email)
> where
> b.email is null
>
> ----
>
> this is the structure of my input files:
>
> raw_customer = LOAD 'customer.csv' USING PigStorage('\t') AS (id: long,
> email: chararray);
> raw_blacklist = LOAD 'blacklist.csv' USING PigStorage('\t') AS (email:
> chararray);
>
>
> how would i solve this using pig ? - especially handling the "like %"
> condition.
>
> i already looked into udf, but need some advice how to implement this.
>
>
> any help would be really appreciated.
>
> regards,
> jan
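
If you end up writing your own UDF instead, the core of the "like %" check from your SQL is just a suffix match. Below is a minimal sketch of that predicate in plain Java, assuming the blacklist is small enough to hold in memory. The class name BlacklistFilter and the method isBlacklisted are hypothetical names for illustration, not part of Pig or piggybank, and the sketch deliberately does not extend any Pig class; in a real UDF this logic would typically live in the exec method of a FilterFunc.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: not part of Pig or piggybank.
class BlacklistFilter {

    // Returns true if the email ends with any blacklist entry.
    // This mirrors: c.email LIKE CONCAT('%', b.email)
    // An entry like "@domain.de" matches every address at that domain;
    // a full address matches only itself (or anything ending in it).
    static boolean isBlacklisted(String email, List<String> blacklist) {
        if (email == null) {
            return false; // assumption: a missing email is not treated as blacklisted
        }
        for (String entry : blacklist) {
            // Skip empty entries, which would otherwise match every email.
            if (entry != null && !entry.isEmpty() && email.endsWith(entry)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<String> blacklist = Arrays.asList("spam@example.org", "@domain.de");
        String[] customers = {"anna@domain.de", "spam@example.org", "bob@example.org"};
        for (String email : customers) {
            System.out.println(email + " blacklisted=" + isBlacklisted(email, blacklist));
        }
    }
}
```

Keeping the customers for which this returns false gives the same result as the "where b.email is null" part of the left join. The match is case-sensitive as written; whether that is what you want depends on your data.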
