Basically you want to join on a regular expression, correct?
Unfortunately, MapReduce (and thus Pig) is spectacularly bad at
non-equijoins. Is 'prefixes' small enough to fit in memory? If so, you
could write a UDF that loads it into memory and does the comparison.
This way the join would be done in the map phase.
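As a rough sketch of the comparison such a UDF would do (plain Python rather than the Pig UDF API, and assuming 'prefixes' holds one prefix per line), the logic might look like the following. The function names load_prefixes and filter_by_prefix are made up for illustration:

```python
from typing import Iterable, Iterator, Optional, Tuple


def load_prefixes(lines: Iterable[str]) -> Tuple[str, ...]:
    """Read one prefix per line into memory, skipping blank lines."""
    stripped = (line.strip() for line in lines)
    return tuple(p for p in stripped if p)


def filter_by_prefix(
    records: Iterable[Tuple[int, Optional[str]]],
    prefixes: Tuple[str, ...],
) -> Iterator[Tuple[int, str]]:
    """Yield the (id, name) records whose name starts with any prefix.

    Each record is checked on its own against the in-memory prefixes,
    so no shuffle or reduce step is needed.
    """
    for rec_id, name in records:
        # str.startswith accepts a tuple and tests every prefix in it.
        if name is not None and name.startswith(prefixes):
            yield rec_id, name


# Hypothetical sample data standing in for 'prefixes' and 'data'.
prefixes = load_prefixes(["ab\n", "xyz\n", "\n"])
records = [(1, "abacus"), (2, "banana"), (3, "xyzzy"), (4, "xy")]
matches = list(filter_by_prefix(records, prefixes))
```

Because every record is tested independently, the same check can run inside each map task once the prefix list has been loaded there.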
Alan.
On Nov 2, 2010, at 10:19 AM, Joe Ciaramitaro wrote:
Hi all,
I have 2 data files. One contains a number of records, and
the other contains a number of prefixes.
A = load 'data' AS (id, name);
B = load 'prefixes' AS (prefix);
I'd like to pull the records in A whose name begins with any prefix in B.
The prefixes are of varying lengths.
I've been scouring the documentation, but haven't figured out what
the best approach could be.
Thanks for any help,
Joe