Alan <spamassassin.tw...@ambitonline.com> writes:

> I've got someone who posted text from MS Office into an email (wish I
> could ban that). The text contained a numbered list. The fourth list
> item started with "Date & Time". The 4 and following period were in a
> span element with a margin to separate it from the text but no actual
> whitespace, so the plain text version comes up as (I've used {dot} to
> avoid another trigger) "4{dot}Date & Time". This then triggered :

Wow, that's funny. But agreed it's ham...

> 2.0 PDS_OTHER_BAD_TLD Untrustworthy TLDs [URI: 4{dot}date (date)]

This seems reasonable. 2 points is not a killer rule and that probably would not have messed up delivery.

> 5.0 KAM_SOMETLD_ARE_BAD_TLD .stream, .trade, .pw, .top, .press, .bid &
> .date TLD Abuse

That's the KAM ruleset, not base, and given that it's an add-on rule I see that as effectively "the base rule should be scored 7" (at least for the domains that overlap). I suspect though that the rule and score are almost entirely right in terms of probability, for uses of those TLDs as domains. They all sound sketchy.

> Thus consigning a meeting agenda to the trash. I suspect this is an
> uncommon but not rare false positive.
>
> These rules would benefit from excluding single character domain
> matches (which IIRC would be invalid domains anyway). A this sort of
> FP would be avoided. For bonus points excluding three-character roman
> numerals under 10 (iii, vii, etc.) would be useful too.

My own view is that no rule should be scored above about 3 unless it is vanishingly unlikely that the rule will fire on legit mail (even if the legit mail is messed up in ways that actually happen to legit mail). That's a different opinion from the one encoded in the KAM ruleset scores, which I interpret as saying that it's ok to have a few FPs if that's the price of getting rid of some nasty phishing/malware and a lot of spam.

You need to think about your own needs when tuning the tradeoff between FPs and effectiveness, and if you're not willing to live what I consider a little dangerously on FP risk then the KAM ruleset is not for you.

I run it personally, and I find rules with very high scores hitting ham maybe once a month or every few months, so I'm accumulating downscoring config. But it saves me from a vast amount of spam, I think.
I would be very nervous if I were configuring it for lots of others, but I have the luxury of not having to admin mail for more than myself and family.

My current config follows, in case you want to look at these rules and see what you think. Beware that it is tuned to my personal ham; I'm on mailing lists where people occasionally discuss voicemail and watches. I no longer remember the reason for each change, but in every case it was surely that the rule fired on ham.

score KAM_UNIV 2 # was 4.5
score KAM_SOMETLD_ARE_BAD_TLD 2 # was 5
score KAM_FAKE_DELIVER 3 # was 6.25
score KAM_SHORT 0.5 # was 2, can't figure out why it fires
score KAM_LIST3_1 3.8 # was 5.8
score KAM_TIME 0.1 # was 3.0, FP on time-nuts
score KAM_SENDGRID 0.3 # was 1.5, but now URIBL_GREY
score KAM_ASCII_DIVIDERS 0.1 # can't figure out why it fires
score KAM_MARKADV 5 # was 10
score KAM_VM 3 # was 5
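
To make the arithmetic concrete, here is a minimal Python sketch of how the two rules quoted above stack up, before and after the KAM_SOMETLD_ARE_BAD_TLD downscore. It is not SpamAssassin code: the helper names are hypothetical, the threshold of 5.0 assumes the stock required_score is in use, and it ignores whatever other rules (positive or negative) hit the real message.

```python
# Illustrative only: hypothetical helpers, not part of SpamAssassin.
# Assumption: the site uses the stock required_score of 5.0.
REQUIRED_SCORE = 5.0


def total_score(hits, scores):
    """Sum the configured scores of the rules that fired."""
    return sum(scores[name] for name in hits)


def is_spam(hits, scores, required=REQUIRED_SCORE):
    """A message is tagged as spam once its total reaches the threshold."""
    return total_score(hits, scores) >= required


# The two rules reported in the quoted message.
hits = ["PDS_OTHER_BAD_TLD", "KAM_SOMETLD_ARE_BAD_TLD"]

# Scores as reported in the quoted message.
stock = {"PDS_OTHER_BAD_TLD": 2.0, "KAM_SOMETLD_ARE_BAD_TLD": 5.0}

# Same, with the KAM rule downscored to 2 as in the config above.
tuned = dict(stock, KAM_SOMETLD_ARE_BAD_TLD=2.0)

print(total_score(hits, stock), is_spam(hits, stock))  # 7.0 True
print(total_score(hits, tuned), is_spam(hits, tuned))  # 4.0 False
```

With the original scores the pair alone is enough to cross the threshold; with the downscore, the same two hits leave the message 1 point short, so it would take some further evidence to tag it.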