> > left.join(right, my_fuzzy_udf(left("cola"), right("cola")))
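This could work, but the problem is that we would have to run your UDF on every possible pair of rows from left and right, which is effectively a Cartesian product: n rows on one side and m on the other means n * m UDF calls. It would be best if you could somehow partition the problem so that we can reduce the number of comparisons. For example, if you had a fuzzy hash that maps similar values to the same key, you could add an equality check on that hash in addition to the UDF. The UDF would then only be evaluated on pairs whose hashes match, which would greatly speed up the computation.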
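To make the idea concrete, here is a minimal sketch in plain Python (standard library only, not Spark) of blocking on a fuzzy hash before running the expensive comparison. Everything in it is an assumption made for illustration: `fuzzy_hash` is a deliberately crude key (first letter plus the next two consonants), `similarity` stands in for your UDF, and the 0.8 threshold is arbitrary.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def fuzzy_hash(value):
    """Hypothetical blocking key: first letter plus the next two consonants.

    Similar strings should usually share a key, but this is not guaranteed.
    """
    letters = [c for c in value.lower() if c.isalpha()]
    if not letters:
        return ""
    consonants = [c for c in letters[1:] if c not in "aeiou"]
    return letters[0] + "".join(consonants[:2])


def similarity(a, b):
    """Stand-in for the expensive fuzzy UDF: a ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def blocked_fuzzy_join(left, right, threshold=0.8):
    """Compare only the pairs whose fuzzy hashes are equal.

    Returns the matching pairs and the number of similarity calls made.
    """
    # Group the right side by hash so each left value sees only its own block.
    blocks = defaultdict(list)
    for r in right:
        blocks[fuzzy_hash(r)].append(r)

    matches = []
    comparisons = 0
    for l in left:
        for r in blocks.get(fuzzy_hash(l), []):
            comparisons += 1
            if similarity(l, r) >= threshold:
                matches.append((l, r))
    return matches, comparisons


if __name__ == "__main__":
    left = ["Jonathan Smith", "Maria Garcia", "Wei Chen"]
    right = ["Jonathon Smith", "Maria Gracia", "Wei Chan", "Bob Stone"]
    matches, comparisons = blocked_fuzzy_join(left, right)
    print(matches)
    print(comparisons, "comparisons instead of", len(left) * len(right))
```

In Spark the same shape would be an equality condition on the hash column combined with the UDF, so the join can be planned as an equi-join with the UDF applied as an extra filter. The trade-off is that two values which hash differently are never compared, so the hash has to be forgiving enough not to separate true matches.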