>
> left.join(right, my_fuzzy_udf(left("cola"), right("cola")))
>

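While this could work, the problem is that we'll have to check every
possible combination of tuples from left and right using your UDF. A join
condition that is only an opaque UDF gives Spark nothing to partition on,
so it ends up comparing every pair of rows. It would be best if you could
somehow partition the problem so that we could reduce the number of
comparisons. For example, if you had a fuzzy hash that you could do an
equality check on in addition to the UDF, Spark could use that equality to
group rows with the same hash, and your UDF would then only run on those
candidate pairs. That would greatly speed up the computation.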
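To illustrate the partitioning idea outside of Spark, here is a minimal plain-Scala sketch. It assumes a hypothetical fuzzy hash (`blockKey`, the lower-cased first character) and a hypothetical stand-in for `my_fuzzy_udf` (`fuzzyMatch`, true when the Levenshtein distance is at most 1). It counts how many times the predicate runs with and without the equality step.

```scala
object BlockedFuzzyJoin {

  // Hypothetical stand-in for my_fuzzy_udf: true when the
  // Levenshtein edit distance between a and b is at most maxDist.
  def fuzzyMatch(a: String, b: String, maxDist: Int = 1): Boolean =
    levenshtein(a, b) <= maxDist

  // Dynamic-programming edit distance, keeping only one previous row.
  def levenshtein(a: String, b: String): Int = {
    var prev: Array[Int] = (0 to b.length).toArray
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(math.min(curr(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      prev = curr
    }
    prev(b.length)
  }

  // Hypothetical "fuzzy hash": the lower-cased first character.
  // Strings that should match are assumed to share this key.
  def blockKey(s: String): String = s.take(1).toLowerCase

  // Naive approach: every (left, right) pair goes through the predicate.
  // Returns the matches and the number of predicate calls.
  def naiveJoin(left: Seq[String], right: Seq[String]): (Seq[(String, String)], Int) = {
    val pairs = for (l <- left; r <- right) yield (l, r)
    (pairs.filter { case (l, r) => fuzzyMatch(l, r) }, pairs.size)
  }

  // Blocked approach: an equality match on blockKey first, then the
  // predicate only runs on pairs that share a key.
  def blockedJoin(left: Seq[String], right: Seq[String]): (Seq[(String, String)], Int) = {
    val rightByKey: Map[String, Seq[String]] = right.groupBy(blockKey)
    val candidates = for {
      l <- left
      r <- rightByKey.getOrElse(blockKey(l), Seq.empty[String])
    } yield (l, r)
    (candidates.filter { case (l, r) => fuzzyMatch(l, r) }, candidates.size)
  }

  def main(args: Array[String]): Unit = {
    val left = Seq("apple", "banana", "cherry")
    val right = Seq("aple", "bananna", "chery", "grape")
    val (naive, naiveCount) = naiveJoin(left, right)
    val (blocked, blockedCount) = blockedJoin(left, right)
    println(s"naive:   $naive after $naiveCount comparisons")
    println(s"blocked: $blocked after $blockedCount comparisons")
  }
}
```

With this sample data the naive version runs the predicate on all 12 pairs, while the blocked version runs it on only 3 and finds the same matches. The trade-off is that pairs whose keys differ (e.g. "cat" vs "kat") are never compared, so the hash has to be loose enough to keep true matches together. In a DataFrame join, the same idea would look something like `left.join(right, left("cola_hash") === right("cola_hash") && my_fuzzy_udf(left("cola"), right("cola")))`, where `cola_hash` is a hypothetical precomputed column.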
