You can move the md5 generation into the SQL server. Of course,
you'd want to be mindful of the communications channel between
SpamAssassin and the SQL server.
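
For illustration only, here is a minimal sketch of hashing on the database side, using Python's built-in sqlite3 module as a stand-in for the SQL server. The table name "leaked" and the helper names are hypothetical, not anything SpamAssassin provides. SQLite has no native md5(), so one is registered here; MySQL and PostgreSQL ship their own. Note that the word still reaches the database in the clear before it is hashed, which is exactly the channel concern mentioned above.

```python
import hashlib
import sqlite3


def md5_hex(text):
    """Return the hex MD5 digest of a string (UTF-8 encoded)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def open_db():
    """Create an in-memory database that exposes md5() to SQL statements."""
    conn = sqlite3.connect(":memory:")
    # Register md5() as a one-argument SQL function (hypothetical setup).
    conn.create_function("md5", 1, md5_hex)
    conn.execute("CREATE TABLE leaked (digest TEXT PRIMARY KEY)")
    return conn


def add_leaked(conn, password):
    """Store only the digest; the hashing happens inside the SQL statement."""
    conn.execute(
        "INSERT OR IGNORE INTO leaked (digest) VALUES (md5(?))", (password,)
    )


def is_leaked(conn, word):
    """Ask the database whether the digest of this word is on file."""
    row = conn.execute(
        "SELECT 1 FROM leaked WHERE digest = md5(?)", (word,)
    ).fetchone()
    return row is not None
```

With a networked server the parameter passed to md5(?) crosses the wire as plaintext, so the connection itself would need protecting.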
I was thinking that the database/whatever would be populated by feeding
in lists of stolen passwords, since they seem to be more or less
freely and at least semi-legally available.
How about modifying the interface to Bayes to MD5 all of the words in the
email and match against that?
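
A rough sketch of what that proposal amounts to, assuming whitespace-separated tokens and an in-memory set of digests; both are assumptions made for illustration, and the function names are hypothetical rather than part of any SpamAssassin interface.

```python
import hashlib
import re


def md5_hex(text):
    """Return the hex MD5 digest of a string (UTF-8 encoded)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_digest_set(passwords):
    """Hash each known password once; only the digests are kept."""
    return {md5_hex(p) for p in passwords}


def hashed_hits(body, digests):
    """Return the digests of words in the message that appear in the set."""
    # Assumption: a "word" is any run of non-whitespace characters.
    words = re.findall(r"\S+", body)
    return {md5_hex(w) for w in words} & digests
```

A message then "hits" if hashed_hits() returns a non-empty set. Punctuation stuck to a word changes its digest, so real tokenization would need more thought than this.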
There seems to be a basic conceptual flaw here with the current thinking.
It is bad to store an unencrypted password.
*How do you KNOW that any random word in any random email is NOT a
password?*
You don't. Any word or phrase can be a password.
Putting ANY word or phrase in a rule or database is potentially storing an
unencrypted password,
even if the database used is named "shopping list" or the rule is
KNOWN_SPAMMER_NAMES.
Just because the extortion email claims to know a password does not mean
that it IS (or ever was) a password. The email address it is sent to might
be the actual password, and the putative password might be a MacGuffin. Or
some phrase in the extortion mail might be the actual password.
What if some major account is compromised because the password "I knOw
YOur passwOrd" is discovered and you had a rule that contained =~ /i know
your password/i ? That wouldn't look very good for you, would it?
Clearly you are best off modifying the input stream to SA to MD5 all of the
input text, and rewriting all local rules to use MD5 rule names and MD5
lookup text. You can probably write a simple preprocessor for the existing
SA rule base, so you don't have to take chances there.
Loren