Chris Santerre wrote:
> I remember that paper. I was impressed and sceptical at the same time. I
> could see it FPing a lot. One person in the crowd brought up Niagara vs. the
> V-drug word :) 
> 
> Cialis vs. Dial-Lisa 
> etc.

That was MailFrontier, using the term "lexigraphical distancing" rather
than edit distance.  The presenter mentioned that, in addition to the
words the algorithm matches against being chosen by humans, the
stopwords were hand-picked to avoid false positives.
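
To make the Niagara case concrete, here is a minimal Python sketch of
plain Levenshtein edit distance with a hand-picked stopword list. It is
not MailFrontier's algorithm; the word lists, the threshold and the
function names are all assumptions for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance, keeping only two rows in memory."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical, hand-picked lists (assumptions, not taken from the talk).
TARGETS = {"viagra", "cialis"}
STOPWORDS = {"niagara"}


def is_suspicious(token: str, max_distance: int = 2) -> bool:
    """Flag a token that is close to a target word unless it is a stopword."""
    word = token.lower()
    if word in STOPWORDS:
        return False
    return any(edit_distance(word, t) <= max_distance for t in TARGETS)
```

"niagara" is only two edits away from "viagra", so without the stopword
entry a threshold of 2 would flag it.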

A quick Google search for 'edit distance' led me to a talk on string matching
algorithms with links to several edit distance implementations on CPAN:
http://cs.haifa.ac.il/~shlomo/talks/edit_distance/slides/all_in_one.html

A plugin to catch text substitutions for SA would need to be fast and/or
only get invoked for strings likely to produce a match.  The current
version of String::Approx might be a good starting point.
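
As a rough sketch of the "only get invoked for strings likely to
produce a match" idea, the Python below runs a cheap character-count
check before the edit distance itself, and abandons the distance
computation once it cannot stay under the limit. This is not the
SpamAssassin plugin API and does not use String::Approx; the function
names, target words and limit are assumptions.

```python
from collections import Counter


def bag_distance(a: str, b: str) -> int:
    """Cheap lower bound on edit distance from character counts alone."""
    ca, cb = Counter(a), Counter(b)
    return max(sum((ca - cb).values()), sum((cb - ca).values()))


def bounded_distance(a: str, b: str, limit: int) -> int:
    """Levenshtein distance, or limit + 1 once it must exceed limit."""
    if abs(len(a) - len(b)) > limit:
        return limit + 1
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        if min(curr) > limit:  # no cell in later rows can get back under limit
            return limit + 1
        prev = curr
    return min(prev[-1], limit + 1)


def scan(text: str, targets=("viagra", "cialis"), limit: int = 1):
    """Return (token, target) pairs that are within `limit` edits."""
    hits = []
    for token in text.lower().split():
        for target in targets:
            if bag_distance(token, target) > limit:
                continue  # prefilter: skip the quadratic comparison
            if bounded_distance(token, target, limit) <= limit:
                hits.append((token, target))
                break
    return hits
```

Most ordinary words fail the character-count check, so the quadratic
comparison only runs on the few tokens that could plausibly match.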

--d
