On the minus side, no matter how many times I send some messages to my "Learn Spam" folder (which is processed and emptied nightly), certain messages I get many times a day are still not marked as spam. These are mostly Rolex watch spams, but there are others as well.
Have you trained at least 200 spam and 200 ham messages? Until you hit that point, SpamAssassin does not use the Bayes classifier at all and only operates in "rules" mode.
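One way to check where you stand is to look at the nspam and nham counters that `sa-learn --dump magic` prints. The Python sketch below parses that kind of output; the sample text and its numbers are made up for illustration, the column layout is assumed from typical output, and the helper names `bayes_counts` and `bayes_active` are hypothetical.

```python
import re

MIN_TRAINED = 200  # threshold mentioned above, for both spam and ham

def bayes_counts(dump_text: str) -> dict[str, int]:
    """Pull the nspam/nham counters out of `sa-learn --dump magic` style text."""
    counts = {}
    for line in dump_text.splitlines():
        # Assumed line shape: "0.000  0  <value>  0  non-token data: nspam"
        m = re.match(
            r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (nspam|nham)\s*$",
            line,
        )
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts

def bayes_active(counts: dict[str, int]) -> bool:
    """True once both counters have reached the training threshold."""
    return (counts.get("nspam", 0) >= MIN_TRAINED
            and counts.get("nham", 0) >= MIN_TRAINED)

# Made-up sample output, for illustration only.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        412          0  non-token data: nspam
0.000          0        187          0  non-token data: nham
0.000          0      98765          0  non-token data: ntokens
"""

if __name__ == "__main__":
    counts = bayes_counts(SAMPLE_DUMP)
    print(counts, "bayes active:", bayes_active(counts))
```

In this made-up sample the ham count is still below 200, so the check reports that Bayes would not be in use yet.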
On all of these messages, I've noticed rules like BAYES_00, BAYES_20, etc., which I'm assuming are "score droppers" that reduce the spam score of an email.
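Right. If you train a message as ham, SA thinks that future messages which contain similar words are more likely to be good. The low-numbered BAYES_* rules are what fires when the classifier rates a message as probably ham.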
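To see which Bayes rule fired on a particular message, you can pull the rule names out of its X-Spam-Status header. A minimal Python sketch, assuming the usual `tests=...` list in that header; the sample header is invented and `bayes_hits` is a hypothetical helper name:

```python
import re

def bayes_hits(status_header: str) -> list[str]:
    """Return every BAYES_nn rule name found in an X-Spam-Status header."""
    # Matching the rule names directly also works when the header is
    # folded across several lines.
    return re.findall(r"\bBAYES_\d+\b", status_header)

# Invented sample header, folded the way mail headers often are.
SAMPLE_HEADER = (
    "X-Spam-Status: No, score=1.3 required=5.0 tests=BAYES_00,HTML_MESSAGE,\n"
    "\tMIME_HTML_ONLY autolearn=no"
)

if __name__ == "__main__":
    print(bayes_hits(SAMPLE_HEADER))
```

Running this over the spam that keeps getting through shows which of those messages the classifier currently considers ham-like.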
How can I find out what triggers these rules, and stop it from happening on these emails? Where is the bayes database even stored by default? (I certainly haven't changed it, so it should be there).
The Bayes db is in $HOME/.spamassassin/bayes_* by default. You can remove those files (if you want to start training from scratch) or use sa-learn to manipulate them.
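If you want to confirm the files are where you expect before removing anything, a short Python sketch like this lists them. It assumes the default location given above, and `bayes_files` is a hypothetical helper name; it only reads the directory and deletes nothing.

```python
from pathlib import Path

def bayes_files(sa_dir=None):
    """List bayes_* files in the given directory (default: ~/.spamassassin)."""
    base = Path(sa_dir) if sa_dir is not None else Path.home() / ".spamassassin"
    if not base.is_dir():
        return []  # no per-user SpamAssassin directory found
    return sorted(p for p in base.glob("bayes_*") if p.is_file())

if __name__ == "__main__":
    for path in bayes_files():
        # Show each database file with its size in bytes.
        print(path, path.stat().st_size)
```

An empty result suggests the database lives somewhere else, for example under a different user account or a custom bayes_path setting.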
-jsd-