Re: DCC/pyzor questions
On 14.03.22 20:15, Alex wrote:
> I'm seeing a lot of DCC/pyzor mail being marked as spam that shouldn't be, and want to see what can be done to prevent that.

DCC contains fuzzy checksums of bulk messages, which means they have been seen on the internet multiple times. This includes common notifications from big sites such as social networks. It is also possible to report messages to DCC as bulk.

pyzor contains fuzzy checksums of messages that have been reported multiple times.

Neither of these means messages are spam, but both indicate they might be. Unfortunately, short messages often hit, since the fuzzy checksums of short messages often match.

> For example, many emails with just an image attachment and an empty body are hitting DCC. I thought I recalled a way to create a checksum of these empty messages and add them to an allow list, but it seems it is specific to the sender, based on /var/lib/dcc/testmsg-whitelist:
>
> # empty Exchange
> ok hex fuz1 e038b933 6003e07e 8e990536 110cfa90
>
> How do I generate that signature? I've been unable to find any instructions on how to do it. Same with pyzor?
>
> Another example is an email I received from Pizza Hut. Their marketing emails hit DCC and pyzor and sendgrid, making it very difficult for that email to be delivered unless it also hits some negative bayes or is allowlisted. Do people add them to the welcomelist? Do you train marketing emails for bayes?

I usually train many kinds of marketing messages so they don't hit BAYES_00 (BAYES_50 is usually OK). Marketing messages are very similar to typical spam, and letting them hit BAYES_00 may lower the Bayes score of real spam too.
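To illustrate why short and empty messages collide, here is a toy checksum that discards case, digits, punctuation and whitespace before hashing. `toy_fuzzy_checksum` is a hypothetical stand-in invented for this sketch, not DCC's real Fuz1/Fuz2 or pyzor's digest; it only shows the general effect that the less text survives normalization, the more unrelated messages end up sharing a checksum.

```python
import hashlib
import re


def toy_fuzzy_checksum(body: str) -> str:
    """Hypothetical toy checksum, NOT DCC's Fuz1/Fuz2 or pyzor's digest.

    Keeps only ASCII letters (lowercased), so case, digits, punctuation
    and whitespace differences all vanish before hashing.
    """
    normalized = re.sub(r"[^a-z]", "", body.lower())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Two unrelated "image only" messages: empty vs. blank text body.
print(toy_fuzzy_checksum("") == toy_fuzzy_checksum("\r\n  \r\n"))  # True

# Two short notes that differ only in digits and punctuation.
print(toy_fuzzy_checksum("Meeting at 10:30?") == toy_fuzzy_checksum("meeting at 11.45!"))  # True

# Longer, genuinely different bodies still get different checksums.
print(toy_fuzzy_checksum("Your invoice is attached") == toy_fuzzy_checksum("Your order has shipped"))  # False
```

Every empty-bodied message normalizes to the same empty string, which is why image-only mail from unrelated senders can look "bulk" to a checksum network.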
>  * 1.5 KAM_SENDGRID Sendgrid being exploited by scammers
>  * 0.3 DIGEST_MULTIPLE Message hits more than one network digest check
>  * 1.0 DCC_REPUT_95_98 DCC reputation between 95 and 98 % (mostly spam)
>  * 0.5 KAM_REALLYHUGEIMGSRC RAW: Spam with image tags with ridiculously
>  *      huge http urls
>  * 1.4 PYZOR_CHECK Listed in Pyzor
>  * 3.0 BAYES_95 BODY: Bayes spam probability is 95 to 99%
>  *      [score: 0.9668]
>  * 0.1 POISEN_SPAM_PILL_3 BODY: random spam to be learned in bayes
>
> Is sendgrid still as big of a problem as it was a year ago?

If your wanted marketing messages hit BAYES_[89]*, simply train them as ham.

> There are a few negative rules, like TXREP and DKIMWL_WL and RCVD_IN_SENDERSCORE_90_100, but someone really doesn't want Pizza Hut email to be delivered.

BTW, I configured DKIMWL to be ignored when training, because it hits a lot of Outlook/Gmail spam.

> Separately, is ExtractText broken? I have legitimate invoices that are hitting multiple money rules. Is this the expected behavior? Any advice on how to deal with it?

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Re: DCC/pyzor questions
On Mon, Mar 14, 2022 at 08:15:49PM -0400, Alex wrote:
> How do I generate that signature? I've been unable to find any
> instructions on how to do it.

https://www.dcc-servers.net/dcc/dcc-tree/dcc.html

dccproc -CQ < message

Add to /var/dcc/whiteclnt:

"Hex ctype cksum starts with the string Hex followed a checksum type, and a string of four hexadecimal numbers obtained from a DCC log file or the dccproc(8) command using -CQ. The checksum type is body, Fuz1, or Fuz2 or one of the preceding checksum types such as env_From."

> Same with pyzor?

pyzor local_whitelist < message

(which updates .pyzor/whitelist)

> Do you train marketing emails for bayes?

You teach Bayes either ham or spam. It makes no difference whether it's "marketing" or not. Just feed it.

> Separately, is ExtractText broken? I have legitimate invoices that are
> hitting multiple money rules. Is this the expected behavior? Any
> advice on how to deal with it?

Invoices contain money. ExtractText feeds the content to body rules. What are you expecting to happen? Don't use it if it doesn't fit your profile.

Personally, I don't think the concept of the plugin is good: body rules are written with the expectation of hitting stuff from the email body, not some random attachments (which might even decode to garbage). But it's put out there for you to decide.
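A minimal Python sketch for turning a checksum into an entry in the style of the testmsg-whitelist line (`ok hex fuz1 <four hex words>`). Both helpers are hypothetical, and the parser assumes dccproc -CQ prints each checksum on a line like `Fuz1: e038b933 6003e07e 8e990536 110cfa90`; that layout is an assumption, so compare it with what your dccproc actually prints. The sketch only handles body, Fuz1 and Fuz2 checksums and writes the lowercase `hex` keyword as seen in testmsg-whitelist.

```python
import re

# Four 8-digit hex words, e.g. "e038b933 6003e07e 8e990536 110cfa90".
HEX_WORDS = re.compile(r"^[0-9a-f]{8}( [0-9a-f]{8}){3}$")

# ASSUMPTION: dccproc -CQ prints each checksum on a line such as
#   "Fuz1: e038b933 6003e07e 8e990536 110cfa90"
# Check this against your own dccproc output before relying on it.
CKSUM_LINE = re.compile(r"^\s*(Body|Fuz1|Fuz2):\s+((?:[0-9a-fA-F]{8}\s*){4})$")


def parse_checksums(dccproc_output: str) -> dict:
    """Hypothetical helper: map 'body'/'fuz1'/'fuz2' to their hex words."""
    found = {}
    for line in dccproc_output.splitlines():
        m = CKSUM_LINE.match(line)
        if m:
            found[m.group(1).lower()] = " ".join(m.group(2).split())
    return found


def whiteclnt_entry(ctype: str, cksum: str) -> str:
    """Hypothetical helper: build 'ok hex <ctype> <w1 w2 w3 w4>'."""
    ctype = ctype.lower()
    if ctype not in ("body", "fuz1", "fuz2"):
        raise ValueError(f"unsupported checksum type: {ctype}")
    # Collapse repeated whitespace and normalize hex digits to lowercase.
    cksum = " ".join(cksum.split()).lower()
    if not HEX_WORDS.match(cksum):
        raise ValueError(f"expected four 8-digit hex words, got {cksum!r}")
    return f"ok hex {ctype} {cksum}"


# Sample line in the assumed format, using the checksum from testmsg-whitelist.
sample = "      Fuz1: e038b933 6003e07e 8e990536 110cfa90\n"
for ctype, cksum in parse_checksums(sample).items():
    print(whiteclnt_entry(ctype, cksum))
# ok hex fuz1 e038b933 6003e07e 8e990536 110cfa90
```

Validating the four hex words before appending to whiteclnt keeps a malformed line from ending up in the file.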
DCC/pyzor questions
Hi,

I'm seeing a lot of DCC/pyzor mail being marked as spam that shouldn't be, and want to see what can be done to prevent that.

For example, many emails with just an image attachment and an empty body are hitting DCC. I thought I recalled a way to create a checksum of these empty messages and add them to an allow list, but it seems it is specific to the sender, based on /var/lib/dcc/testmsg-whitelist:

# empty Exchange
ok hex fuz1 e038b933 6003e07e 8e990536 110cfa90

How do I generate that signature? I've been unable to find any instructions on how to do it. Same with pyzor?

Another example is an email I received from Pizza Hut. Their marketing emails hit DCC and pyzor and sendgrid, making it very difficult for that email to be delivered unless it also hits some negative bayes or is allowlisted. Do people add them to the welcomelist? Do you train marketing emails for bayes?

 * 1.5 KAM_SENDGRID Sendgrid being exploited by scammers
 * 0.3 DIGEST_MULTIPLE Message hits more than one network digest check
 * 1.0 DCC_REPUT_95_98 DCC reputation between 95 and 98 % (mostly spam)
 * 0.5 KAM_REALLYHUGEIMGSRC RAW: Spam with image tags with ridiculously
 *      huge http urls
 * 1.4 PYZOR_CHECK Listed in Pyzor
 * 3.0 BAYES_95 BODY: Bayes spam probability is 95 to 99%
 *      [score: 0.9668]
 * 0.1 POISEN_SPAM_PILL_3 BODY: random spam to be learned in bayes

Is sendgrid still as big of a problem as it was a year ago?

There are a few negative rules, like TXREP and DKIMWL_WL and RCVD_IN_SENDERSCORE_90_100, but someone really doesn't want Pizza Hut email to be delivered.

Separately, is ExtractText broken? I have legitimate invoices that are hitting multiple money rules. Is this the expected behavior? Any advice on how to deal with it?
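The score lines in the Pizza Hut report can be tallied mechanically. This sketch (`parse_hits` is a hypothetical helper) sums the listed scores and picks out rules matching the BAYES_[89]* pattern mentioned in the thread. The listed rules add up to 7.8, above SpamAssassin's default required_score of 5.0, with BAYES_95 alone contributing 3.0. This assumes the excerpt shows every rule that fired; the real report may include other hits, negative ones among them.

```python
import re

# Score lines copied from the report above. Continuation lines
# ("huge http urls", "[score: 0.9668]") carry no score of their own.
REPORT = """\
 * 1.5 KAM_SENDGRID Sendgrid being exploited by scammers
 * 0.3 DIGEST_MULTIPLE Message hits more than one network digest check
 * 1.0 DCC_REPUT_95_98 DCC reputation between 95 and 98 % (mostly spam)
 * 0.5 KAM_REALLYHUGEIMGSRC RAW: Spam with image tags with ridiculously
 *      huge http urls
 * 1.4 PYZOR_CHECK Listed in Pyzor
 * 3.0 BAYES_95 BODY: Bayes spam probability is 95 to 99%
 *      [score: 0.9668]
 * 0.1 POISEN_SPAM_PILL_3 BODY: random spam to be learned in bayes
"""

# "* <score> <RULE_NAME> ..."; scores may also be negative, e.g. -1.9.
HIT = re.compile(r"^\s*\*\s+(-?\d+(?:\.\d+)?)\s+([A-Z0-9_]+)\b")
# Rule names covered by BAYES_[89]* (BAYES_80, BAYES_95, BAYES_99, ...).
HIGH_BAYES = re.compile(r"^BAYES_[89]")


def parse_hits(report: str) -> list:
    """Hypothetical helper: return [(rule_name, score), ...]."""
    hits = []
    for line in report.splitlines():
        m = HIT.match(line)
        if m:
            hits.append((m.group(2), float(m.group(1))))
    return hits


hits = parse_hits(REPORT)
total = round(sum(score for _, score in hits), 1)
high_bayes = [name for name, _ in hits if HIGH_BAYES.match(name)]

print(total)       # 7.8
print(high_bayes)  # ['BAYES_95']
```

A non-empty `high_bayes` list on a message you actually want is the case where the advice in the thread is to train it as ham.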