On Mon, 28 Feb 2005 15:34:13 +0000, [EMAIL PROTECTED] (Justin Mason) writes:

> A paper at the spam conference suggested using an Edit Distance algorithm
> with very good results; the idea being, the edit distance from "cialis" to
> "C 1 a l | s" isn't as far as it is to "specialized" or so on.
>
> if I recall correctly, someone submitted an implementation quite a while
> ago on our BZ, but I think the FP rates were too high.   Given the
> recent paper's published results, though, it may be there are good ways
> to tweak it to get FPs at a tolerable rate.

I did an implementation of it some time ago, but I didn't get a chance
to take it far enough to test its effectiveness. I have heard remarks
that naively applying edit distance is too slow. To keep the FP rate
from getting too high, the edit-distance costs are parameterized, so
some edits are much cheaper than others. E.g.:

# Cost of replacing a character with punctuation in the obfuscated string.
setreps ("bcdfghijklmnpqrstvwxyz","*?.-",.08);
setreps ("aeiou","*?.-",.03);

# Cost to insert these into the obfuscated string is cheap
setins ("/\|()=-'!*`;:?+[]\"^",.01);
setins ("_,.",.01);

So 'v.agr.' and 'v..ia...gra' both cost less than .10.
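
To make the idea concrete, here is a minimal Python sketch of how cost
tables like the ones above could plug into an ordinary dynamic-programming
edit distance. It is an illustration only, not the prototype mentioned in
this message. The names CostTable and obfu_distance, the default cost of
1.0 for any edit not listed, and the cost of 1.0 for dropping a character
of the plain word are all assumptions made for the example.

```python
class CostTable:
    """Per-character edit costs; any edit not listed costs DEFAULT (assumed 1.0)."""
    DEFAULT = 1.0

    def __init__(self):
        self.rep = {}   # (plain char, obfuscated char) -> cost
        self.ins = {}   # obfuscated char -> cost

    def setreps(self, plain_chars, obfu_chars, cost):
        # Later calls overwrite earlier ones for the same pair.
        for p in plain_chars:
            for o in obfu_chars:
                self.rep[(p, o)] = cost

    def setins(self, obfu_chars, cost):
        for o in obfu_chars:
            self.ins[o] = cost

    def rep_cost(self, p, o):
        return 0.0 if p == o else self.rep.get((p, o), self.DEFAULT)

    def ins_cost(self, o):
        return self.ins.get(o, self.DEFAULT)


def obfu_distance(word, text, costs):
    """Weighted cost of turning the plain `word` into the obfuscated `text`."""
    word, text = word.lower(), text.lower()
    # Row 0: build `text` from an empty word using insertions only.
    prev = [0.0]
    for ch in text:
        prev.append(prev[-1] + costs.ins_cost(ch))
    for p in word:
        # Column 0: every plain character so far has been dropped.
        cur = [prev[0] + costs.DEFAULT]
        for j, ch in enumerate(text, start=1):
            cur.append(min(
                prev[j] + costs.DEFAULT,              # drop plain char p
                cur[j - 1] + costs.ins_cost(ch),      # insert obfuscating char
                prev[j - 1] + costs.rep_cost(p, ch),  # match or replace
            ))
        prev = cur
    return prev[-1]


costs = CostTable()
costs.setreps("bcdfghijklmnpqrstvwxyz", "*?.-", .08)
costs.setreps("aeiou", "*?.-", .03)   # overrides 'i' from the line above
costs.setins("/|()=-'!*`;:?+[]\"^", .01)
costs.setins("_,.", .01)

print(obfu_distance("viagra", "v.agr.", costs))
print(obfu_distance("viagra", "v..ia...gra", costs))
print(obfu_distance("cialis", "specialized", costs))
```

Under these assumed defaults, 'v.agr.' is two vowel replacements and
'v..ia...gra' is five cheap insertions, so both stay under .10 against
'viagra', while "specialized" needs several full-cost letter insertions
to be reached from "cialis" and scores far higher.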


Got a bugzilla# that I can attach the prototype code to?  (Also, is it
possible to report a bug/attach the code without creating a bugzilla
account?)

Scott
