Re: Malformed spam email gets through.
On 4 Jan 2018, at 21:13 (-0500), @lbutlr wrote: On 4 Jan 2018, at 11:47, Bill Colewrote: On 3 Jan 2018, at 15:42, @lbutlr wrote: There is no requirement that the right side be globally unique, just that the entire message ID is globally unique. Right. And any software that can use localhost (or any other unqualified name whose meaning is contextually variable) as the right hand side is likely to be doing so on multiple machines that don't know about each other and so generally cannot know that they are not generating duplicate MIDs. Sure, but depending on how the MID is generated it can certainly be statistically unique. As I said earlier, it only takes 256 bits to get an ID within spitting distance of the number of atoms in the universe. Should be unique enough. Not even that. A standard UUID has 122 bits of entropy. To *probably* have *one* collision in that space, you'd need to generate 1 billion UUIDs per second for about 85 years. That should be good enough for naming unique things including email messages for as long as any one person cares, but if you want it more solid you could put a UUID of one of the node-specific versions on one side of the @ and a random UUID on the other: 244 bits, won't collide in any space-time region visible to one observer. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Currently Seeking Steady Work: https://linkedin.com/in/billcole
Re: Malformed spam email gets through.
On 4 Jan 2018, at 11:47, Bill Colewrote: > On 3 Jan 2018, at 15:42, @lbutlr wrote: >> There is no requirement that the right side be globally unique, just that >> the entire message ID is globally unique. > > Right. And any software that can use localhost (or any other unqualified name > whose meaning is contextually variable) as the right hand side is likely to > be doing so on multiple machines that don't know about each other and so > generally cannot know that they are not generating duplicate MIDs. Sure, but depending on how the MID is generated it can certainly be statistically unique. As I said earlier, it only takes 256 bits to get an ID within spitting distance of the number of atoms in the universe. Should be unique enough. > The reason for the RHS=FQDN tradition is to establish a namespace for each > domain whereby global uniqueness can be guaranteed deterministically. OH, I absolutely agree that using the domain for the RHS is a great idea, and there's really no reason not to. But there are other ways. >>> An additional ~1% has a MID header with either no dots or no '@'. >> >> Dots are irrelevant, but the way I read the RFC, ‘@‘ is required. > > See the message I was responding too, which asked about the feasibility of > enforcing a "valid domain" rule. For that, dots are absolutely relevant. My > point, in short, is that doing so may result in 2 orders of magnitude more > rejection of wanted mail than most sites would deem tolerable. Yep. Requiring MIDs to conform to out-of-spec requirements is sure to cause trouble. -- You and me Sunday driving Not arriving
Re: SURBL false positives ratio
On 01/04/2018 02:12 PM, Pedro David Marco wrote: Out of curiosity... how is SUBRL in terms of false positives?? is it a worthy IOC DDBB?? Thanks. --- PedroD My mail filtering volume is high enough that I would have to pay for a feed subscription. I tried out a trial feed about a year ago for a few days and it was horrible for my US-based mail flow. I saw way too many FPs against solid RBLs. I don't even think I would want to use it even if it was free for my mail flow. Disclaimer: Each mail flow will see much different spam so it may work for others. I get good value out of the Invaluement RBL. Combine it with Spamhaus ZEN and that will block the majority of junk. -- David Jones
SURBL false positives ratio
Out of curiosity... how is SUBRL in terms of false positives?? is it a worthy IOC DDBB?? Thanks. ---PedroD
Re: Malformed spam email gets through.
On 3 Jan 2018, at 15:42, @lbutlr wrote: [...] On 03 Jan 2018, at 12:36, Bill Colewrote: About 1.5% of my personal non-spam email over the past 20 years has had "localhost" as the right hand side of the MID. This implies a de facto RFC violation because it poses a real risk of duplication. There is no requirement that the right side be globally unique, just that the entire message ID is globally unique. Right. And any software that can use localhost (or any other unqualified name whose meaning is contextually variable) as the right hand side is likely to be doing so on multiple machines that don't know about each other and so generally cannot know that they are not generating duplicate MIDs. The reason for the RHS=FQDN tradition is to establish a namespace for each domain whereby global uniqueness can be guaranteed deterministically. An additional ~1% has a MID header with either no dots or no '@'. Dots are irrelevant, but the way I read the RFC, ‘@‘ is required. See the message I was responding too, which asked about the feasibility of enforcing a "valid domain" rule. For that, dots are absolutely relevant. My point, in short, is that doing so may result in 2 orders of magnitude more rejection of wanted mail than most sites would deem tolerable.
Re: Question about BAYES_999
>> # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|wc >> 6508 247134 16925929 >> # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep >> BAYES_999|grep BAYES_99\"|wc >> 6508 247134 16925929 >> > > You need that last grep for BAYES_99 to be a "grep -v" and it needs some > delimiter after the "99" to disinguish it from "999" like an equals sign > since that is how amavis outputs it's rule hits and score. > > Jan 4 06:41:59 mail02 amavis[15124]: (15124-14) Passed SPAM > {RelayedTaggedInbound}, [203.246.167.14]:63669 [203.246.167.14] >-> , Queue-ID: C193E4A5F78C, > Message-ID: <9d14f53b-e8f9-186d-339d-aece00029...@zeilcar.net>, mail_id: > pDEMud2MEZKg, Hits: 55.731, size: 9691, queued_as: 9C5CD4A5F795, 1328 ms, > Tests: [BAYES_999=0.2,... > > Note the "BAYES_999=0.2" above would make your grep look like this: > > # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|grep -v > BAYES_99=|wc Ugh, yes, sorry. This was the result of pasting the wrong line while experimenting. My separator is a quote. This is actually more precise now, as the logging separates the rules into tests, tests_ham and tests_spam: # cat /var/log/maillog|grep timestamp|grep BAYES_99|perl -p -e 's|.*tests\":\[(.*)\],\"tests_ham.*|$1|'|grep BAYES_999\"|grep -v BAYES_99\" results with nothing printed. ... ing in your area","subject_rot13":"EBPXL, n ybfg QBT, vf zvffvat va lbhe nern","tests":["BAYES_99","BAYES_999","DCC_CHECK","DKIM_SIGNED","DKIM_VALID","DKIM_VALID_AU","HTML_IMAGE_RATIO_04","HTML_MESSAGE","MIME_HTML_ONLY","RCVD_IN_DNSWL_NONE","RCVD_IN_HOSTKARMA_W","RCVD_IN_SENDERSCORE_90_100","RELAYCOUNTRY_US","SPF_HELO_PASS","SPF_PASS","TXREP","T_DMARC_TESTS_PASS","T_REMOTE_IMAGE","T_RP_MATCHES_RCVD"],"tests_ham":["RCVD_IN_HOSTKARMA_W","RCVD_IN_SENDERSCORE_90_100","DKIM_VALID_AU","DKIM_VALID","T_RP_MATCHES_RCVD","TXREP","SPF_HELO_PASS","SPF_PASS","RCVD_IN_DNSWL_NONE"],"tests_spam":["BAYES_99","MIME_HTML_ONLY","HTML_IMAGE_RATIO_04","DCC_CHECK","BAYES_999","DKIM_SIGNED","T_DMARC_TESTS_PASS","RELAYCOUNTRY_US","T_REMOTE_IMAGE","HTML_MESSAGE"],"time_iso_week_date":"2018-W01-4","time_unix":1515042197.974,"to_addr":["mor...@example.com"],"type":"amavis"}
Re: Question about BAYES_999
On 01/04/2018 11:20 AM, RW wrote: On Thu, 4 Jan 2018 10:40:49 -0600 David Jones wrote: On 01/04/2018 10:04 AM, RW wrote: Are you sure that's right? It's a radically different frequency from 0.5% and 0.8%. IIWY I'd look at the 4 and check they are what you think they are and not something like My production MailScanner instance has a highly tuned MTA in front of it so SA doesn't see as much spam. The amavis instance is intentionally open to more spam to collect for the nightly masscheck processing. That's not obviously relevant since I was referring to the frequency of mails missing BAYES_99 within emails hitting BAYES_999. ... rules: meta test FOO has dependency 'BAYES_999' with a zero score If I had BAYES_99 set to a zero score, it would never show up in my logs. As I said, they are bogus warnings. I think it's a known issue. I have BAYES_999 scored and I get 3 such matches per spamd restart using your grep patterns. Your 4 seem highly suspicious. Until you manually check those 4, or retry with better grep patterns, you don't really know what's happening. I understand what you are saying now. I will track down those 4 and report back. -- David Jones
Re: Question about BAYES_999
On Thu, 4 Jan 2018 10:40:49 -0600 David Jones wrote: > On 01/04/2018 10:04 AM, RW wrote: > > Are you sure that's right? It's a radically different frequency from > > 0.5% and 0.8%. IIWY I'd look at the 4 and check they are what you > > think they are and not something like > > > > My production MailScanner instance has a highly tuned MTA in front of > it so SA doesn't see as much spam. The amavis instance is > intentionally open to more spam to collect for the nightly masscheck > processing. That's not obviously relevant since I was referring to the frequency of mails missing BAYES_99 within emails hitting BAYES_999. > > ... rules: meta test FOO has dependency 'BAYES_999' with a zero > > score > > If I had BAYES_99 set to a zero score, it would never show up in my > logs. As I said, they are bogus warnings. I think it's a known issue. I have BAYES_999 scored and I get 3 such matches per spamd restart using your grep patterns. Your 4 seem highly suspicious. Until you manually check those 4, or retry with better grep patterns, you don't really know what's happening.
Re: Question about BAYES_999
On 01/04/2018 10:58 AM, Alex wrote: Hi, I am seeing this problem on my MailScanner filters as well: # grep BAYES_999 maillog-20171231 | wc -l 9172 # grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l 4 # rpm -q amavisd-new amavisd-new-2.11.0-4.fc25.noarch # rpm -q perl perl-5.24.3-389.fc25.x86_64 This is with the JSON logging enabled so my grep is a bit different. This is also with an SVN spamassassin snapshot from about two weeks ago. This is also with bayes stored in mysql. # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|wc 6508 247134 16925929 # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|grep BAYES_99\"|wc 6508 247134 16925929 You need that last grep for BAYES_99 to be a "grep -v" and it needs some delimiter after the "99" to disinguish it from "999" like an equals sign since that is how amavis outputs it's rule hits and score. Jan 4 06:41:59 mail02 amavis[15124]: (15124-14) Passed SPAM {RelayedTaggedInbound}, [203.246.167.14]:63669 [203.246.167.14]-> , Queue-ID: C193E4A5F78C, Message-ID: <9d14f53b-e8f9-186d-339d-aece00029...@zeilcar.net>, mail_id: pDEMud2MEZKg, Hits: 55.731, size: 9691, queued_as: 9C5CD4A5F795, 1328 ms, Tests: [BAYES_999=0.2,... Note the "BAYES_999=0.2" above would make your grep look like this: # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|grep -v BAYES_99=|wc Please let me know if there's anything further I can do to help. -- David Jones
Re: Question about BAYES_999
Hi, >>> I am seeing this problem on my MailScanner filters as well: >>> >>> # grep BAYES_999 maillog-20171231 | wc -l >>> 9172 >>> # grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l >>> 4 # rpm -q amavisd-new amavisd-new-2.11.0-4.fc25.noarch # rpm -q perl perl-5.24.3-389.fc25.x86_64 This is with the JSON logging enabled so my grep is a bit different. This is also with an SVN spamassassin snapshot from about two weeks ago. This is also with bayes stored in mysql. # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|wc 6508 247134 16925929 # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|grep BAYES_99\"|wc 6508 247134 16925929 Please let me know if there's anything further I can do to help.
Re: Question about BAYES_999
On 01/04/2018 10:04 AM, RW wrote: On Thu, 4 Jan 2018 08:02:48 -0600 David Jones wrote: On 01/04/2018 04:46 AM, Matus UHLAR - fantomas wrote: On 2 Jan 2018, at 07:17, David Jones djo...@ena.com> wrote: I haven't redefined these rules from what I can tell by searching my local rules. I would think that if I had done this, then there would be consistent non-hits of BAYES_99 with BAYES_999 all of the time. This is only happening a small percentage of the time. On 02.01.18 15:39, @lbutlr wrote: Checking my mail I see an incidence rate of this of about 0.5%, which matches the rate you posted earlier. amavis? I am seeing this problem on my MailScanner filters as well: # grep BAYES_999 maillog-20171231 | wc -l 9172 # grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l 4 Are you sure that's right? It's a radically different frequency from 0.5% and 0.8%. IIWY I'd look at the 4 and check they are what you think they are and not something like My production MailScanner instance has a highly tuned MTA in front of it so SA doesn't see as much spam. The amavis instance is intentionally open to more spam to collect for the nightly masscheck processing. ... rules: meta test FOO has dependency 'BAYES_999' with a zero score If I had BAYES_99 set to a zero score, it would never show up in my logs. I get some bogus warnings like this. You need something to make sure it's a result line and some boundary checks like \bBAYES_99\b might help too. MailScanner log output looks like this: Dec 31 07:30:46 smtp2i.ena.net MailScanner[26137]: Message 8902A148068E.ACC23 from 106.10.241.143 (novak5...@att.net) to k12tn.net is spam, SpamAssassin (not cached, score=35.679, required 4, autolearn=spam, BAYES_99 5.20, BAYES_999 0.20, DCC_CHECK 2.20, DIGEST_MULTIPLE 0.29, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10, DMARC_NONE 0.01, ... Pretty sure my grep'ing above is good. I can't reproduce the problem using spamc/spamd from 3.4.1 with perl 5.24.3 on FreeBSD 11.1 with Berkeley DB. I don't have any missing BAYES_99 hits in my corpus headers and to check it's not a recent bug I rescanned ~5k spams and checked the logs. I appreciate you checking to help us determine how widespread this issue is and possibly narrow it down. Here is my SA setup that is common to both my amavis and MailScanner: SA 3.4.1 Shared Bayes DB in Redis Share the same local configs for custom rules Since we have at least one other person on this list reporting the same problem, I don't think my local custom rules are the problem. -- David Jones
Re: Question about BAYES_999
On 01/04/2018 04:46 AM, Matus UHLAR - fantomas wrote: On 2 Jan 2018, at 07:17, David Jones djo...@ena.com> wrote: I haven't redefined these rules from what I can tell by searching my local rules. I would think that if I had done this, then there would be consistent non-hits of BAYES_99 with BAYES_999 all of the time. This is only happening a small percentage of the time. On 02.01.18 15:39, @lbutlr wrote: Checking my mail I see an incidence rate of this of about 0.5%, which matches the rate you posted earlier. amavis? I am seeing this problem on my MailScanner filters as well: # grep BAYES_999 maillog-20171231 | wc -l 9172 # grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l 4 I have a temporary fix in place with a meta rule: metaMISSING_BAYES_99BAYES_999 && !BAYES_99 describeMISSING_BAYES_99BAYES_99 should always hit with BAYES_999 but sometimes it doesn't. score MISSING_BAYES_994.2 -- David Jones
Re: Question about BAYES_999
On 2 Jan 2018, at 07:17, David Jones djo...@ena.com> wrote: I haven't redefined these rules from what I can tell by searching my local rules. I would think that if I had done this, then there would be consistent non-hits of BAYES_99 with BAYES_999 all of the time. This is only happening a small percentage of the time. On 02.01.18 15:39, @lbutlr wrote: Checking my mail I see an incidence rate of this of about 0.5%, which matches the rate you posted earlier. amavis? -- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address. Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu. The early bird may get the worm, but the second mouse gets the cheese.