Hello!

Considering the following two relations...

grunt> querys = load 'query' as (id:int, token:chararray);
grunt> dump querys;
(11,foo)
(12,bar)
(13,frog)

and

grunt> documents = load 'document' as (id:int, text:chararray);
grunt> dump documents;
(21,foo bar frog)
(22,hello frog)

Is it possible to do a join where the query:token is not equal to but
contained in documents:text?

e.g.
(11,foo,21,foo bar frog)
(12,bar,21,foo bar frog)
(13,frog,21,foo bar frog)
(13,frog,22,hello frog)

I can certainly do this in Java map/reduce (as we all had to in the
dark days before pig), but is there a way to hack this together
with a custom udf or some other weird join backdoor (a custom
partitioner for a group or something wacky)?

It's been a long day, maybe I'm just missing something super obvious...
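
One untested idea, offered only as a sketch: if matching whole
whitespace-separated words is good enough (an assumption, since it will
not catch substrings inside a word), the documents could be tokenized
with the builtin TOKENIZE and then equi-joined on the token. The alias
names doc_tokens, joined and result, and the field name word, are made
up for illustration.

```pig
-- Sketch only: assumes "contained in" means "appears as a whole word".
querys    = LOAD 'query' AS (id:int, token:chararray);
documents = LOAD 'document' AS (id:int, text:chararray);

-- Emit one row per (document, word) pair; TOKENIZE returns a bag of words.
doc_tokens = FOREACH documents GENERATE id, text, FLATTEN(TOKENIZE(text)) AS word;

-- An ordinary equi-join now stands in for the containment test.
joined = JOIN querys BY token, doc_tokens BY word;

-- Keep only the four columns from the desired output.
result = FOREACH joined GENERATE querys::id, querys::token,
                                 doc_tokens::id, doc_tokens::text;
DUMP result;
```

On the sample data this is intended to give the four rows listed above,
in no particular order. A word repeated within one document would yield
duplicate rows, so a DISTINCT on result might be needed. True substring
matching would probably need a CROSS followed by a FILTER instead, which
gets expensive on large inputs.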

Cheers!
Mat
