-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Scott A Crosby writes:
> On Mon, 28 Feb 2005 15:34:13 +0000, [EMAIL PROTECTED] (Justin Mason) writes:
>
> > A paper at the spam conference suggested using an Edit Distance algorithm
> > with very good results; the idea being, the edit distance from "cialis" to
> > "C 1 a l | s" isn't as far as it is to "specialized" or so on.
> >
> > if I recall correctly, someone submitted an implementation quite a while
> > ago on our BZ, but I think the FP rates were too high. Given the
> > recent paper's published results, though, it may be there are good ways
> > to tweak it to get FPs at a tolerable rate.
>
> I did an implementation of it some time ago, but I didn't get a chance
> to take it far enough to test out its effectiveness. I heard remarks
> that naively applying edit distance is too slow. To avoid having a FP
> rate that was too high, the edit-distance costs are parameterized, so
> some edits are much cheaper than others. Eg.
>
>   # Cost of replacing a character with a punctuation in the obfu.
>   setreps ("bcdfghijklmnpqrstvwxyz","*?.-",.08);
>   setreps ("aeiou","*?.-",.03);
>
>   # Cost to insert these into the obfuscated string is cheap
>   setins ("/\|()=-'!*`;:?+[]\"^",.01);
>   setins ("_,.",.01);
>
> So, 'v.agr.' and 'v..ia...gra' both cost <.10
>
> Got a bugzilla# that I can attach the prototype code to? (Also, is it
> possible to report a bug/attach the code without creating a bugzilla
> account?)

Hi Scott -- looks like there isn't a BZ entry that'd be suitable (at a
glance). Also, it sounds like it's "new" enough that throwing it into an
existing patch might be confusing anyway. So I'd say go ahead and create
one.

btw, you don't have a BZ account? how did you manage *that*! ;)

We do prefer that stuff go through BZ, since our CLA-tracking is
implemented in there, and it's the de-facto-standard way for us to find
old conversations, assertions, assumptions, and so on regarding
implementation details -- we can simply say "see bug XXXX" in the
changelog, and the entire thread of discussion can then be found with one
click from that.

But in an emergency you could always mail it to dev and hopefully someone
will pick it up and create the bug for you.

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFCJY6aMJF5cimLx9ARAg1VAKCbDVxK/02yiEm1ZUFjjU7plLD1TwCfUw8v
K6U9f9DS1SJn6KXgLkidx7s=
=L0Yv
-----END PGP SIGNATURE-----
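
For illustration only, here is a minimal Python sketch of the
parameterized-cost edit distance described in the quoted message. It is
not Scott's prototype, which is not shown in this thread. The names
make_costs and weighted_distance are hypothetical, and two things are
assumptions rather than anything stated above: every edit not listed in
the cost tables costs 1.0, and dropping a character of the plain word
always costs 1.0. The cost tables themselves copy the setreps/setins
calls quoted above, applied in the same order.

```python
def make_costs():
    """Build cost tables mirroring the quoted setreps/setins calls."""
    sub = {}
    # Replacing a plain letter with punctuation in the obfuscated text.
    for plain in "bcdfghijklmnpqrstvwxyz":
        for punct in "*?.-":
            sub[(plain, punct)] = 0.08
    # Applied second, as in the quoted snippet, so vowels end up cheaper.
    for plain in "aeiou":
        for punct in "*?.-":
            sub[(plain, punct)] = 0.03
    # Filler characters that are cheap to insert into the obfuscated text.
    ins = {ch: 0.01 for ch in "/\\|()=-'!*`;:?+[]\"^" + "_,."}
    return sub, ins


def weighted_distance(word, obfuscated, sub, ins, default=1.0):
    """Cheapest cost of turning `word` into `obfuscated`.

    Standard dynamic-programming edit distance, except that substitution
    and insertion costs are looked up in `sub` and `ins`. Anything not
    listed there costs `default` (an assumption of this sketch).
    """
    rows, cols = len(word) + 1, len(obfuscated) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        # Dropping a character of the plain word.
        dist[i][0] = dist[i - 1][0] + default
    for j in range(1, cols):
        # Inserting a character of the obfuscated text.
        dist[0][j] = dist[0][j - 1] + ins.get(obfuscated[j - 1], default)
    for i in range(1, rows):
        for j in range(1, cols):
            w, o = word[i - 1], obfuscated[j - 1]
            replace = 0.0 if w == o else sub.get((w, o), default)
            dist[i][j] = min(
                dist[i - 1][j - 1] + replace,                # match or replace
                dist[i][j - 1] + ins.get(o, default),        # insert o
                dist[i - 1][j] + default,                    # drop w
            )
    return dist[rows - 1][cols - 1]
```

With these tables, weighted_distance("viagra", "v.agr.", *make_costs())
is two vowel replacements and weighted_distance("viagra", "v..ia...gra",
*make_costs()) is five cheap insertions, so both stay under .10 as in the
quoted message, while "cialis" against "specialized" needs five ordinary
letter insertions and stays far away.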