You could specify a condition using the RegexMatch or RegexExtract UDF
in piggybank:

http://svn.apache.org/repos/asf/hadoop/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/string/RegexMatch.java

http://svn.apache.org/repos/asf/hadoop/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/string/RegexExtract.java
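
If you end up writing your own UDF instead, the core check is small. Below
is a rough sketch in plain Java of the matching logic only, with no Pig
classes involved. The class name BlacklistFilter, its method names, and the
sample addresses are all made up for illustration; nothing here comes from
piggybank. It assumes that "like concat('%', b.email)" means "the customer
email ends with the blacklist entry", and that the comparison is
case-sensitive (lower-case both sides first if that is not what you want).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig or piggybank.
public class BlacklistFilter {

    // Suffix match: the counterpart of SQL's  email LIKE CONCAT('%', entry).
    public static boolean isBlacklisted(String email, List<String> blacklist) {
        if (email == null) {
            return false;
        }
        for (String entry : blacklist) {
            // Skip null/empty entries; an empty entry would match every email.
            if (entry != null && !entry.isEmpty() && email.endsWith(entry)) {
                return true;
            }
        }
        return false;
    }

    // The same check expressed as a regex. Pattern.quote keeps characters
    // such as '.' in the entry from being read as regex metacharacters.
    public static boolean matchesAsRegex(String email, String entry) {
        return Pattern.matches(".*" + Pattern.quote(entry), email);
    }

    // Keep only the emails that match no blacklist entry.
    public static List<String> keepAllowed(List<String> emails, List<String> blacklist) {
        List<String> allowed = new ArrayList<String>();
        for (String email : emails) {
            if (!isBlacklisted(email, blacklist)) {
                allowed.add(email);
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        // Made-up sample data: one wildcard entry and one full address.
        List<String> blacklist = Arrays.asList("@domain.de", "carol@example.org");
        List<String> customers = Arrays.asList(
                "alice@example.org", "bob@domain.de", "carol@example.org");
        System.out.println(keepAllowed(customers, blacklist)); // prints [alice@example.org]
    }
}
```

Inside a Pig UDF the same test would presumably go into the exec method,
with the blacklist loaded once rather than per tuple. If you build regexes
from the blacklist entries for RegexMatch instead, keep in mind that an
unescaped '.' in an entry matches any character.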

On Thu, Feb 25, 2010 at 10:17 AM, Jan Zimmek <[email protected]> wrote:

> hi,
>
> i recently found pig, really like it and want to use it for one of our
> actual projects.
>
> getting the basics running was easy, but now i am struggling on a problem.
>
> i am trying to get customers whose email is not blacklisted.
>
> blacklist entries can be specified as:
>
> [email protected]
>
> or wildcarded
>
> @domain.de
>
> in sql i would solve this by:
>
> ----
>
> select
>  *
> from
>  customer c
> left join blacklist b
> on
>  c.email like concat("%",b.email)
> where
>  b.email is null
>
> ----
>
> this is the structure of my input files:
>
> raw_customer = LOAD 'customer.csv' USING PigStorage('\t') AS (id: long,
> email: chararray);
> raw_blacklist = LOAD 'blacklist.csv' USING PigStorage('\t') AS (email:
> chararray);
>
>
> how would i solve this using pig ? - especially handling the "like %"
> condition.
>
> i already looked into udf, but need some advice how to implement this.
>
>
> any help would be really appreciated.
>
> regards,
> jan
>
>
