Greetings. I have two sets of data; let's call them QUERY and TARGETS. What I am trying to do is the following:
1. For each row in QUERY, extract a 'query' property.
2. For each 'query' extracted, locate all TARGETS rows whose 'value' property "matches" the 'query' property.

Note: Determining the "matches" state involves executing a custom UDF that decides whether the two values should be treated as equal (essentially implementing a SQL LIKE-style comparison). As a result, there doesn't appear to be any built-in Pig functionality to perform this comparison.

I have tried multiple approaches, including a FOREACH with a nested FILTER, convoluted COGROUPing, and countless others, to no avail. The only approach I have found that works is to compute a full CROSS between QUERY and TARGETS and then apply the FILTER to the result. However, this single task takes on the order of 30 minutes to run, and that will only get worse once operational data is introduced, since the size of a CROSS grows with the product of the two inputs.

So, am I missing something obvious, or is there some standard method to implement this functionality?

(Please be kind; for as embarrassingly long as I have been on the internet, I have never before submitted anything to a mailing list.)
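To make the intended semantics concrete, here is a minimal sketch in plain Python rather than Pig Latin. It only models what the CROSS-then-FILTER approach computes; it is not my actual script. The `like_match` function is a hypothetical stand-in for the custom UDF (I am assuming the usual SQL LIKE wildcards `%` and `_`), and the sample rows are made up for illustration.

```python
import re


def like_match(value: str, pattern: str) -> bool:
    """Hypothetical stand-in for the custom UDF.

    Assumes SQL LIKE semantics: % matches any run of characters,
    _ matches exactly one character, everything else is literal.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def cross_then_filter(query_rows, target_rows):
    """Model of CROSS followed by FILTER.

    Every (query, target) pair is generated and tested, so the match
    function is called len(query_rows) * len(target_rows) times.
    """
    return [
        (q["query"], t["value"])
        for q in query_rows
        for t in target_rows
        if like_match(t["value"], q["query"])
    ]


# Made-up sample data, for illustration only.
QUERY = [{"query": "app%"}, {"query": "_at"}]
TARGETS = [
    {"value": "apple"},
    {"value": "cat"},
    {"value": "application"},
    {"value": "dog"},
]

result = cross_then_filter(QUERY, TARGETS)
```

Because the match is not a plain equality, there is no key to join on, which is why every pair ends up being tested.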