Greetings.

I have two sets of data; let's call them QUERY and TARGETS. What I am 
trying to do is the following:

1. For each row in QUERY, extract a 'query' property.
2. For each 'query' extracted, locate all TARGETS rows whose 'value' property 
"matches" the 'query' property.

Note: Determining whether a value "matches" requires executing a custom UDF 
(essentially implementing a SQL LIKE-style comparison). As a result, there 
doesn't appear to be any built-in Pig functionality to perform this 
comparison.

I have tried multiple methods, including a FOREACH with a FILTER command, 
convoluted COGROUPing, and countless others, to no avail. The only method 
I've found that works is to compute a full CROSS between QUERY and TARGETS 
and perform the FILTER on the result. However, the execution time of this 
single task is on the order of 30 minutes and would only get much worse once 
operational data is introduced, since the CROSS output scales with the 
product of the two row counts.
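
To make the intent concrete, here is a rough Python sketch (not Pig) of what 
the CROSS-then-FILTER approach computes. Everything named here is 
hypothetical: like_to_regex, matches and cross_and_filter are illustrative 
names, and the '%' / '_' wildcard rules are only an assumed stand-in for what 
my actual UDF does.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE-style pattern into a compiled regex.

    Assumed semantics (a stand-in for the real UDF):
    '%' matches any run of characters, '_' matches exactly one character.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # treat everything else literally
    return re.compile("".join(parts), re.DOTALL)


def matches(value, query):
    """Return True if the whole of 'value' matches the LIKE-style 'query'."""
    return like_to_regex(query).fullmatch(value) is not None


def cross_and_filter(query_rows, target_rows):
    """Pair every QUERY row with every TARGETS row, keep the matching pairs.

    Also returns the number of comparisons made, which is always
    len(query_rows) * len(target_rows).
    """
    results = []
    comparisons = 0
    for q in query_rows:          # step 1: each 'query' property
        for t in target_rows:     # step 2: test it against every 'value'
            comparisons += 1
            if matches(t["value"], q["query"]):
                results.append((q["query"], t["value"]))
    return results, comparisons


if __name__ == "__main__":
    QUERY = [{"query": "foo%"}, {"query": "%bar"}]
    TARGETS = [{"value": "foobar"}, {"value": "food"}, {"value": "rebar"}]
    pairs, n = cross_and_filter(QUERY, TARGETS)
    print(pairs)
    print(n, "comparisons")
```

The comparison count is the part that worries me: every 'query' is tested 
against every 'value', so the work is the product of the two row counts.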

So, am I missing something obvious or is there some standard method to 
implement this functionality?

(Please be kind, for as embarrassingly long as I have been on the internet I 
have never before submitted information to a mailing list.)
