I'm not sure whether this can be done at the UDF level or whether it would
have to be done at a lower level. Imagine you have a good candidate for a
replicated join, but you also know more about the structure of one of the
inputs you are joining (for example, that you could build a binary search
tree from it and do your comparisons very quickly). Is there a way to write
your own join, or to extend the one in Pig? I could imagine a UDF that takes
two bags, the left side and the right side, and performs the join itself,
but I don't know whether that would be as fast.
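
To make the idea concrete, here is a rough sketch of the kind of lookup I
have in mind. It is plain Python, not a real Pig UDF: the function names
(build_index, lookup, custom_join) are made up for illustration, rows are
assumed to be tuples, and a sorted list with binary search stands in for the
binary search tree.

```python
from bisect import bisect_left

def build_index(small_side, key):
    """Sort the small (replicated) side once so each lookup is a binary search."""
    rows = sorted(small_side, key=key)
    keys = [key(r) for r in rows]
    return keys, rows

def lookup(index, k):
    """Return every row of the small side whose key equals k."""
    keys, rows = index
    i = bisect_left(keys, k)  # first position where k could appear
    matches = []
    while i < len(keys) and keys[i] == k:
        matches.append(rows[i])
        i += 1
    return matches

def custom_join(big_side, small_side, big_key, small_key):
    """Inner join: stream the big side and probe the index built on the small side."""
    index = build_index(small_side, small_key)
    for left in big_side:
        for right in lookup(index, big_key(left)):
            yield left + right  # concatenate the two tuples
```

Inside Pig, the two arguments would presumably arrive as bags, which is the
part I'm unsure about performance-wise.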

Any thoughts?
