Join on a dummy key or use CROSS, then pass the token and the text to a UDF that tests whether one contains the other.

Russell Jurney
twitter.com/rjurney
datasyndrome.com
On Aug 29, 2012, at 4:56 PM, Mat Kelcey wrote:

> Hello!
>
> Considering the following two relations...
>
> grunt> querys = load 'query' as (id:int, token:chararray);
> grunt> dump querys;
> (11,foo)
> (12,bar)
> (13,frog)
>
> and
>
> grunt> documents = load 'document' as (id:int, text:chararray);
> grunt> dump documents;
> (21,foo bar frog)
> (22,hello frog)
>
> Is it possible to do a join where querys::token is not equal to, but
> contained in, documents::text?
>
> e.g.
> (11,foo,21,foo bar frog)
> (12,bar,21,foo bar frog)
> (13,frog,21,foo bar frog)
> (13,frog,22,hello frog)
>
> I can certainly do this in Java map/reduce (as we all had to in the
> dark days before Pig), but is there a way to hack this together
> with a custom UDF or some other weird join backdoor (a custom
> partitioner for a group, or something wacky)?
>
> It's been a long day; maybe I'm just missing something super obvious...
>
> Cheers!
> Mat
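The suggested approach (CROSS the two relations, then filter the pairs with a UDF) can be sketched with a Jython UDF, which Pig can register directly. This is a minimal sketch, not a tested production UDF: the file name `contains.py`, the namespace `myudfs`, and the function `contains_token` are hypothetical, and it assumes "contained in" means a whole whitespace-separated word (so `fro` does not match `frog`).

```python
# contains.py: hypothetical Jython UDF sketch for Pig.
# Pig provides the outputSchema decorator when it registers a Jython script;
# the fallback below exists only so this file can also be run and tested
# with a plain Python interpreter outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("contained:int")
def contains_token(token, text):
    """Return 1 if token is one of the whitespace-separated words of text, else 0.

    Assumption: whole-word matching. For a raw substring match, test
    `token in text` instead. Pig nulls arrive here as None.
    """
    if token is None or text is None:
        return 0
    return 1 if token in text.split() else 0
```

In Pig, the script would be registered with `REGISTER 'contains.py' USING jython AS myudfs;`, the relations paired with `pairs = CROSS querys, documents;`, and the matches kept with `matched = FILTER pairs BY myudfs.contains_token(token, text) == 1;`. Returning an int and comparing it to 1 avoids relying on a boolean return type. Note that CROSS produces every (query, document) pair, so its output grows as the product of the two relation sizes; this is practical only when one side (here, the queries) is small.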
