Re: Open source (WAS: Spam rule for HTTP/HTTPS request to sender's root domain)
Here ya go ;) https://github.com/mikernet/HttpCheckDnsServer

On 3/21/2019 5:42 AM, Tom Hendrikx wrote:
On 20-03-19 19:56, Mike Marynowski wrote:
A couple people asked about me posting the code/service so they could run it on their own systems, but I'm currently leaning away from that. I don't think there is any benefit to doing that instead of just utilizing the centralized service. The whole thing works better if everyone using it queries a central service, and it helps avoid people easily making bad mistakes like the one above and then spending hours scrambling to find non-existent botnet infections on their network while mail bounces because they are on a blocklist :( If someone has a good reason for making the service locally installable, let me know though, haha.

When people are interested in seeing the code, their main incentive for such a request is probably not that they want to run it themselves. They might, in no particular order:
- want to learn from what you're doing
- want to see how you're treating their contributed data
- want to verify the listing policy that you're proposing
- want to study whether there could be better criteria for listing/unlisting than the ones currently available
- change things in the software and contribute them back for the benefit of everyone
- squash bugs that you might currently be missing
- help out with further development of the service if or when your time is limited
- not want to depend on a single person to maintain a service they like

This is called open source, and it's a good thing. For details on the philosophy behind it, http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ is a good read. In short: if you want your project to prosper, put it on GitHub for everyone to see.

Kind regards, Tom
Re: Open source (WAS: Spam rule for HTTP/HTTPS request to sender's root domain)
Perhaps I should have been clearer - I'm not against posting the code for any reason, and I am planning to do that anyway in case anyone wants to look at it or chip in improvements and whatnot. I'm an active contributor on many open source projects and I have fully embraced OSS :)

I was more asking if there is a good reason to build packages intended for local installation by email server operators, and I don't think there really is. There's a fundamental difference in how the project would be set up if it were intended to be installed by all email server operators, i.e. writing a config file loader instead of hardcoding values, allowing more flexibility, building packages for different operating systems, etc. What I'm saying is I don't think I will be officially supporting that route, as it seems more beneficial to collaborate on a central database, though people are obviously free to do with the code as they wish.

Cheers! Mike

On 3/21/2019 5:42 AM, Tom Hendrikx wrote:
On 20-03-19 19:56, Mike Marynowski wrote:
A couple people asked about me posting the code/service so they could run it on their own systems, but I'm currently leaning away from that. I don't think there is any benefit to doing that instead of just utilizing the centralized service. The whole thing works better if everyone using it queries a central service, and it helps avoid people easily making bad mistakes like the one above and then spending hours scrambling to find non-existent botnet infections on their network while mail bounces because they are on a blocklist :( If someone has a good reason for making the service locally installable, let me know though, haha.

When people are interested in seeing the code, their main incentive for such a request is probably not that they want to run it themselves. They might, in no particular order:
- want to learn from what you're doing
- want to see how you're treating their contributed data
- want to verify the listing policy that you're proposing
- want to study whether there could be better criteria for listing/unlisting than the ones currently available
- change things in the software and contribute them back for the benefit of everyone
- squash bugs that you might currently be missing
- help out with further development of the service if or when your time is limited
- not want to depend on a single person to maintain a service they like

This is called open source, and it's a good thing. For details on the philosophy behind it, http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ is a good read. In short: if you want your project to prosper, put it on GitHub for everyone to see.

Kind regards, Tom
Re: Spam rule for HTTP/HTTPS request to sender's root domain
Continuing to fine-tune this service - thank you to everyone testing it. Some updates were pushed out yesterday:

* Initial new domain "grace period" reduced to 8 minutes (down from 15 mins) - 4 attempts are made within this time to get a valid HTTP response
* Mozilla browser spoofing is implemented to avoid problems with websites that block HttpClient requests
* Fixes to NXDOMAIN negative result caching appear to be working well now

Some lessons learned in the meantime as well. It turns out that letting the HTTP test run through an email server IP is a terrible idea, as it will put the IP on some blocklists for attempting to make HTTP connections to botnet command & control honeypot servers if someone happens to query one of those domains, LOL.

A couple people asked about me posting the code/service so they could run it on their own systems, but I'm currently leaning away from that. I don't think there is any benefit to doing that instead of just utilizing the centralized service. The whole thing works better if everyone using it queries a central service, and it helps avoid people easily making bad mistakes like the one above and then spending hours scrambling to find non-existent botnet infections on their network while mail bounces because they are on a blocklist :( If someone has a good reason for making the service locally installable, let me know though, haha.
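The browser-spoofing change mentioned above amounts to sending a browser-like User-Agent with the probe request. A minimal sketch in Python (the service itself uses .NET's HttpClient; the UA string and helper name here are illustrative only, not what the service actually sends):

```python
import urllib.request

# Example browser-style User-Agent string (illustrative, not the real one)
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0"

def make_check_request(domain):
    """Build the HTTP probe with a browser-like User-Agent so sites that
    block default library clients still respond."""
    return urllib.request.Request(f"http://{domain}/",
                                  headers={"User-Agent": BROWSER_UA})
```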
Re: Spam rule for HTTP/HTTPS request to sender's root domain
Thank you! I have no idea how I missed that...

On 3/13/2019 7:11 PM, RW wrote:
On Wed, 13 Mar 2019 17:40:57 -0400 Mike Marynowski wrote:
Can someone help me form the correct SOA record in my DNS responses to ensure the NXDOMAIN responses get cached properly? Based on the logs I don't think downstream DNS servers are caching it, as requests for the same valid HTTP domains keep hitting the service instead of being cached for 4 days. ... Based on random sampling of responses from other DNS servers this seems correct to me. Nothing I'm reading indicates that TTL factors into the negative caching, but is it possible servers are only caching the negative response for 15 mins because of the TTL on the SOA record, using the smaller value between that and the default TTL?

I believe so, from RFC 2308:

3 - Negative Answers from Authoritative Servers

Name servers authoritative for a zone MUST include the SOA record of the zone in the authority section of the response when reporting an NXDOMAIN or indicating that no data of the requested type exists. This is required so that the response may be cached. The TTL of this record is set from the minimum of the MINIMUM field of the SOA record and the TTL of the SOA itself, and indicates how long a resolver may cache the negative answer.
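The quoted RFC rule reduces to a single computation. A minimal sketch (the helper name is mine, not from the service):

```python
def negative_cache_ttl(soa_record_ttl, soa_minimum):
    """Per RFC 2308 section 3, a resolver may cache an NXDOMAIN answer for
    the minimum of the SOA record's own TTL and its MINIMUM field."""
    return min(soa_record_ttl, soa_minimum)

# With the record described in this thread (SOA TTL = 900,
# MINIMUM/default TTL = 345600), negative answers are cached for
# only 900 seconds - the 15 minutes observed in the logs.
```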
Re: Spam rule for HTTP/HTTPS request to sender's root domain
Can someone help me form the correct SOA record in my DNS responses to ensure the NXDOMAIN responses get cached properly? Based on the logs I don't think downstream DNS servers are caching it, as requests for the same valid HTTP domains keep hitting the service instead of being cached for 4 days.

From what I understand, if you want to cache an NXDOMAIN response then you need to include an SOA record with the response, and DNS servers should use the min/default TTL value as a negative cache hint. My NXDOMAIN responses currently look like this:

HEADER:
  opcode = QUERY, id = 27, rcode = NXDOMAIN
  header flags: response, want recursion, recursion avail.
  questions = 1, answers = 0, authority records = 1, additional = 0

QUESTIONS:
  www.singulink.com.httpcheck.singulink.com, type = A, class = IN

AUTHORITY RECORDS:
  -> httpcheck.singulink.com
     ttl = 900 (15 mins)
     primary name server = httpcheck.singulink.com
     responsible mail addr = admin.singulink.com
     serial = 4212294798
     refresh = 172800 (2 days)
     retry = 86400 (1 day)
     expire = 2592000 (30 days)
     default TTL = 345600 (4 days)

Based on random sampling of responses from other DNS servers this seems correct to me. Nothing I'm reading indicates that TTL factors into the negative caching, but is it possible servers are only caching the negative response for 15 mins because of the TTL on the SOA record, using the smaller value between that and the default TTL?
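Since RFC 2308 (quoted upthread) caps the negative-cache TTL at the minimum of the SOA's own TTL and its MINIMUM field, the 900-second TTL on the SOA record above limits negative caching to 15 minutes. A hypothetical zone-file sketch where the SOA's own TTL matches the intended 4-day negative cache:

```
; The TTL on the SOA record itself must be >= the MINIMUM field
; for the full 4-day negative cache to apply (RFC 2308)
httpcheck.singulink.com. 345600 IN SOA httpcheck.singulink.com. admin.singulink.com. (
        4212294798 ; serial
        172800     ; refresh (2 days)
        86400      ; retry (1 day)
        2592000    ; expire (30 days)
        345600 )   ; minimum (4 days) - also the negative-cache cap
```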
Re: Spam rule for HTTP/HTTPS request to sender's root domain
Any HTTP status code of 400 or higher is treated as no valid website on the domain. I see a considerable amount of spam that returns 5xx codes, so at this point I don't plan on changing that behavior. 503 is supposed to indicate a temporary condition, so this seems like an abuse of the error code.

On 3/13/2019 2:21 PM, Jari Fredriksson wrote:
What would it result in for this: I have a couple of domains that do not have any services at the root domain name. However, the server the A record points to does have a web server that acts as a reverse proxy for many subdomains, each of which is served a web page. An HTTP 503 is returned by the pound reverse proxy for the root domains.
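As a sketch of that status-code policy (function names are hypothetical, and the real service's request details such as browser spoofing and retries are omitted):

```python
import urllib.error
import urllib.request

def status_indicates_website(status):
    """Any HTTP status code of 400 or higher counts as 'no valid website'."""
    return status < 400

def check_domain(domain, timeout=15.0):
    """Probe http://<domain>/ and classify the result."""
    try:
        with urllib.request.urlopen(f"http://{domain}/", timeout=timeout) as resp:
            return status_indicates_website(resp.status)
    except urllib.error.HTTPError as e:
        # 403/503 and friends land here - treated as no valid website
        return status_indicates_website(e.code)
    except (urllib.error.URLError, OSError):
        return False  # no HTTP response at all
```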
Re: Spam rule for HTTP/HTTPS request to sender's root domain
Back up after some extensive modifications. Setting the DNS request timeout to 30 seconds is no longer necessary - the service instantly responds to queries.

In order to prevent mail delivery issues if the website is having technical issues the first time a domain is seen by the service, it will instantly return a response that it is a valid domain (NXDOMAIN) with a 15 minute TTL. It will then queue up testing of this domain in the background and automatically keep retrying every few minutes if HTTP contact fails. After 15 minutes of failed HTTP contact, the DNS service will begin responding with an invalid domain response (127.0.0.1), exponentially increasing TTLs and time between background checks until it reaches about 17 hours between checks.

The service automatically runs checks in the background for all domains queried within the last 30 days and instantly responds to DNS queries with the cached result. If a web server goes down, has technical issues, etc., it will still be reported as a valid domain for approximately 4 days after the last successful HTTP contact while being continually checked in the background, so temporary issues won't affect mail delivery.

On 3/11/2019 7:18 PM, RW wrote:
It doesn't seem to be working. Is it gone?

$ dig +norecurse @ns1.singulink.com hwvyuprmjpdrws.com.httpcheck.singulink.com
; <<>> DiG 9.11.0-P5 <<>> +norecurse @ns1.singulink.com hwvyuprmjpdrws.com.httpcheck.singulink.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 57443
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
...
Re: Spam rule for HTTP/HTTPS request to sender's root domain
Does SpamAssassin even have facilities to do that? Don't all rules run all the time? SpamAssassin still needs to run all the rules because MTAs might have different spam mark / spam delete / etc. thresholds than the one set in SA. The number of cycles you're talking about is the same as an RBL lookup, so I really don't see it as being significant. The DNS service does all the heavy lifting and I'm planning to make it public.

On 3/1/2019 5:09 PM, Rupert Gallagher wrote:
Case study: example.com bans any e-mail sent from its third levels up, and does it by SPF. spf-banned.example.com sent mail, and my SA at server.com adds a big fat penalty, high enough to bounce it. Suppose I do not bounce it, and use your filter to check for its websites. It turns out that both example.com and spf-banned.example.com have a website. Was it worth it to spend cycles on it? I guess not. SPF is an accepted RFC and it should have priority. So, I recommend the website test first read the result of the SPF test, quit when positive, and continue otherwise. --- ruga

On Fri, Mar 1, 2019 at 22:31, Grant Taylor <mailto:gtay...@tnetconsulting.net> wrote:
On 02/28/2019 09:39 PM, Mike Marynowski wrote:
> I modified it so it checks the root domain and all subdomains up to the
> email domain.
:-)

> As for your question - if afraid.org has a website then you are correct,
> all subdomains of afraid.org will not flag this rule, but if lots of
> afraid.org subdomains are sending spam then I imagine other spam
> detection methods will have a good chance of catching it.

ACK

afraid.org is much like DynDNS in that one entity (afraid.org themselves or DynDNS) provides DNS services for other entities. I don't see a good way to differentiate between the sets of entities.

> I'm not sure what you mean by "working up the tree" - if afraid.org has
> a website and I work my way up the tree then either way eventually I'll
> hit afraid.org and get a valid website, no?

True. I wonder if there is any value in detecting zone boundaries via not going any higher up the tree than the zone containing the email domain(s). Perhaps something like that would enable differentiation between Afraid & DynDNS and the entities that they are hosting DNS services for. (Assuming that there are separate zones.)

> My current implementation fires off concurrent HTTP requests to the root
> domain and all subdomains up to the email domain and waits for a valid
> answer from any of them.

ACK s/up to/down to/

I don't grok the value of doing this as well as you do. But I think your use case is enough different than mine that I can't make an objective value estimate. That being said, I do find the idea technically interesting, even if I think I'll not utilize it. -- Grant. . . . unix || die
Re: Spam rule for HTTP/HTTPS request to sender's root domain
On 3/1/2019 4:31 PM, Grant Taylor wrote:
afraid.org is much like DynDNS in that one entity (afraid.org themselves or DynDNS) provides DNS services for other entities. I don't see a good way to differentiate between the sets of entities.

I haven't come across any notable amount of spam that's punched through all the other detection methods in place with a reply-to/from email address subdomain on a service like that. I'm sure it happens though, and in that case this filter simply won't add any value.
Re: Spam rule for HTTP/HTTPS request to sender's root domain
On 3/1/2019 1:07 PM, RW wrote:
Sure, but had it turned out that most of these domains didn't have the A record necessary for your HTTP test, it wouldn't have been worth doing anything more complicated.

I've noticed a lot of the spam domains appear to point to actual web servers but throw 403 or 503 errors, which A records wouldn't help with and which has been taken into account here. As for being "more complicated" - it's basically done and running in my test environment for final tweaking haha, so it's a bit late now :P It was only a day's work to put everything together including the DNS service and caching layer, so meh. Unless you mean complicated in the sense that it's more technically complicated as opposed to effort-wise.

You don't need an A record for email. The last time I looked it just tests that there's enough DNS for a bounce to be received, so an A or MX for the sender domain.

I'm confusing different tests here, you can disregard my previous message.
Re: Spam rule for HTTP/HTTPS request to sender's root domain
Sorry, I meant I thought it was doing those checks because I know I was playing with checking A records before and figured the rules would have it enabled by default...I tried to find the rules after I sent that message and realized that was related to sender domain A record checks done in my MTA. On 3/1/2019 2:26 PM, Antony Stone wrote: On Friday 01 March 2019 at 17:37:18, Mike Marynowski wrote: Quick sampling of 10 emails: 8 of them have valid A records on the email domain. I presumed SpamAssassin was already doing simple checks like that. That doesn't sound like a good idea to me (presuming, I mean). Antony.
Re: Spam rule for HTTP/HTTPS request to sender's root domain
Looking for an A record on what - just the email address domain, or the chain of parent domains as well? If the latter, a lack of an A record will cause this to fail anyway, so it's kind of embedded in. Quick sampling of 10 emails: 8 of them have valid A records on the email domain. I presumed SpamAssassin was already doing simple checks like that.

On 3/1/2019 10:23 AM, RW wrote:
On Wed, 27 Feb 2019 12:16:20 -0500 Mike Marynowski wrote:
Almost all of the spam emails that are coming through do not have a working website at the root domain of the sender.

Did you establish what fraction of this spam could be caught just by looking for an A record?
Re: Spam rule for HTTP/HTTPS request to sender's root domain
Changing up the algorithm a bit. Once a domain has been added to the cache, the DNS service will automatically perform HTTP checks in the background, on a much more aggressive schedule for invalid domains. This means temporary website problems are much less of an issue, and invalid domains don't delay mail delivery threads for up to 15s after TTL expirations during the initial test period with progressively increasing TTLs - queries can always return instantly after the first one, as long as the domain has been queried in the last 30 days and is still in cache. Domains deemed to have "invalid" websites will be rechecked much more aggressively in the background so that newly queried domains with temporary website issues stop tripping this filter as soon as possible. There will be a "sliding window" of a few days during which temporary website issues won't cause the filter to trip; a domain just needs to provide a valid response sometime during the sliding window to stay in good standing.
Re: Spam rule for HTTP/HTTPS request to sender's root domain
For anyone who wants to play around with this, the DNS service has been posted. You can test the existence of a website on a domain or any of its parent domains by making DNS queries as follows:

subdomain.domain.com.httpcheck.singulink.com

So, if you wanted to check if mail1.mx.google.com or any of its parent domains have a website, you would do a DNS query with a 30 second timeout for:

mail1.mx.google.com.httpcheck.singulink.com

This will check the following domains for a valid HTTP response within 15 seconds:

mail1.mx.google.com
mx.google.com
google.com

If a valid HTTP response comes back then the DNS query will return NXDOMAIN with a 7 day TTL. If no valid HTTP response comes back then the DNS query will return 127.0.0.1 with progressively increasing TTLs:

#1: 2 mins
#2: 4 mins
#3: 6 mins
#4: 8 mins
#5: 10 mins
#6: 20 mins
#7: 30 mins
#8: 40 mins
#9: 50 mins
#10: 1 hour
#11: 2 hours
#12+: add 2 hours extra for each attempt up to 24h max

As long as an invalid domain has been queried in the last 7 days, it will remain cached and any further invalid attempts will continue to progressively increase the TTL according to the rules above. If a domain doesn't get queried for 7 days then it drops out of the cache and its invalid attempt counter is reset. A valid HTTP response will reset the domain's invalid counter and a 7 day TTL is returned. Once a domain is in the cache, responses are immediate until the TTL runs out and the domain is rechecked again.
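The TTL schedule above can be sketched as a small function (hypothetical name; units are seconds):

```python
def invalid_ttl_seconds(attempt):
    """TTL (in seconds) returned for the nth consecutive failed HTTP check,
    following the schedule listed above."""
    if attempt <= 5:                  # 2, 4, 6, 8, 10 minutes
        return attempt * 2 * 60
    if attempt <= 9:                  # 20, 30, 40, 50 minutes
        return (attempt - 4) * 10 * 60
    if attempt == 10:                 # 1 hour
        return 3600
    # 11th attempt: 2 hours, then +2 hours per attempt, capped at 24 hours
    return min((attempt - 10) * 2 * 3600, 24 * 3600)
```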
Re: Spam rule for HTTP/HTTPS request to sender's root domain
You'll be able to decide how you want to prioritize the fields - I've implemented it as a DNS server, so which domain you decide to send to the DNS server is entirely up to you. On 2/28/2019 10:23 PM, Grant Taylor wrote: On 2/28/19 9:33 AM, Mike Marynowski wrote: I'm doing grabs the first available address in this order: reply-to, from, sender. That sounds like it might be possible to game things by playing with the order. I'm not sure what sorts of validations are applied to the Sender: header. (I don't remember if DMARC checks the Sender: header or not.) How would your filter respond if the MAIL FROM: and the From: header were set to something that didn't have a website, yet had a Sender: header with @gmail.com listed before the Reply-To: and From: headers?
Re: Spam rule for HTTP/HTTPS request to sender's root domain
I modified it so it checks the root domain and all subdomains up to the email domain. As for your question - if afraid.org has a website then you are correct, all subdomains of afraid.org will not flag this rule, but if lots of afraid.org subdomains are sending spam then I imagine other spam detection methods will have a good chance of catching it. I'm not sure what you mean by "working up the tree" - if afraid.org has a website and I work my way up the tree then either way eventually I'll hit afraid.org and get a valid website, no? My current implementation fires off concurrent HTTP requests to the root domain and all subdomains up to the email domain and waits for a valid answer from any of them. On 2/28/2019 10:27 PM, Grant Taylor wrote: What about domains that have many client subdomains? afraid.org (et al) come to mind. You might end up allowing email from spammer.afraid.org who doesn't have a website because the parent afraid.org does have a website. I would think that checking from the child and working up the tree would be more accurate, even if it may take longer.
Re: Spam rule for HTTP/HTTPS request to sender's root domain
I'm pretty sure that the way I ended up implementing it, everything works fine, and it's nice and simple and clean - but maybe there's some edge case that doesn't work properly. If there is I haven't found it yet, so if you can think of one let me know. Since I'm sending an HTTP request to all subdomains simultaneously, it doesn't really matter if I go one further than the actual root domain. A "co.uk" request will come back with no website, so there's no need to special-case it. For example, if the email address being tested is b...@mail1.mx.stuff.co.uk, an HTTP request goes out to:

mail1.mx.stuff.co.uk
mx.stuff.co.uk
stuff.co.uk
co.uk

The last one will always be cached from a previous .co.uk address lookup so it won't actually be sent out anyway. If any of them respond with a valid website then an OK result is returned.

On 2/28/2019 3:24 PM, Luis E. Muñoz wrote:
This is more complicated than it seems. I have the t-shirt to prove it. I suggest you look at the Mozilla Public Suffix List at https://publicsuffix.org/ — it was created for different purposes, but I believe it maps well enough to my understanding of your use case. You'll be able to pad the gaps using a custom list. Best regards -lem
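Generating that candidate list is just taking every multi-label suffix of the email domain; a tiny sketch (hypothetical helper, not the service's actual code):

```python
def candidate_domains(email_domain):
    """Every multi-label suffix of the email domain, longest first.

    For b...@mail1.mx.stuff.co.uk this yields mail1.mx.stuff.co.uk,
    mx.stuff.co.uk, stuff.co.uk and co.uk - as described above, a public
    suffix like co.uk simply never answers with a website, so no
    suffix-list special-casing is needed.
    """
    labels = email_domain.lower().strip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]
```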
Re: Spam rule for HTTP/HTTPS request to sender's root domain
Thunderbird normally shows reply-to in normal messages... is this something that some MUAs ignore just on mailing list emails, or all emails? Because I see reply-to on plenty of other emails.

On 2/28/2019 3:44 PM, Bill Cole wrote:
On 28 Feb 2019, at 14:29, Mike Marynowski wrote:
Unfortunately I don't see a reply-to header on your messages. What do you have it set to? I thought mailing lists see who is in the "to" section of a reply so that 2 copies aren't sent out. The "mailing list ethics" guide I read said to always use "reply all" and that the mailing list system takes care of not sending duplicate replies. I removed your direct email from this reply and only kept the mailing list address, but for the record I don't see any reply-to headers.

But it's right there in the copy that the list delivered to me:

From: "Bill Cole"
To: users@spamassassin.apache.org
Subject: Re: Spam rule for HTTP/HTTPS request to sender's root domain
Date: Thu, 28 Feb 2019 14:21:41 -0500
Reply-To: users@spamassassin.apache.org

Whether you see it is a function of how your MUA (TBird, it seems...) displays messages. Unfortunately, it has become common for MUAs to simply ignore Reply-To. I didn't think TBird was in that class.
Re: Spam rule for HTTP/HTTPS request to sender's root domain
There are many ways to determine what the root domain is. One way is analyzing the DNS response from the query to realize it's actually a root domain, or you can just grab the ICANN TLD list and use that to make a determination. What I'm probably going to do now that I'm building this as a cached DNS service is just walk up the subdomains until I hit the root domain and if any of them have a website then it's fine. On 2/28/2019 2:39 PM, Antony Stone wrote: On Thursday 28 February 2019 at 20:33:42, Mike Marynowski wrote: But scconsult.com does in fact have a website so I'm not sure what you mean. This method checks the *root* domain, not the subdomain. How do you identify the root domain, given an email address? For example, for many years in the UK, it was possible to get something.co.uk or something.org.uk (and maybe something.net.uk), but now it is also possible to get something.uk So, I'm just wondering how you determine what the "root" domain for a given email address is. Antony.
Re: Spam rule for HTTP/HTTPS request to sender's root domain
But scconsult.com does in fact have a website, so I'm not sure what you mean. This method checks the *root* domain, not the subdomain. Even if that weren't the case - well, it is what it is. Emails from this mailing list (and most well-configured lists) come in at a spam score of -6, so they are at no risk of being blocked even if a non-website domain triggers this particular rule.

On 2/28/2019 2:25 PM, Bill Cole wrote:
On 28 Feb 2019, at 13:43, Mike Marynowski wrote:
On 2/28/2019 12:41 PM, Bill Cole wrote:
You should probably put the envelope sender (i.e. the SA "EnvelopeFrom" pseudo-header) into that list, maybe even first. That will make many messages sent via discussion mailing lists (such as this one) pass your test where a test of real header domains would fail, while it is more likely to cause commercial bulk mail to fail where it would usually pass based on real standard headers. (That's based on a hunch, not testing.)

Can you clarify why you think my currently proposed headers would fail with the mailing list? As far as I can tell, all the messages I've received from this mailing list would pass just fine. As an example from the emails in this list, which header value specifically would cause it to fail?

If I did not explicitly set the Reply-To header, this message would be delivered without one. The domain part of the From header on messages I post to this and other mailing lists has no website and never will.
Re: Spam rule for HTTP/HTTPS request to sender's root domain
Unfortunately I don't see a reply-to header on your messages. What do you have it set to? I thought mailing lists see who is in the "to" section of a reply so that 2 copies aren't sent out. The "mailing list ethics" guide I read said to always use "reply all" and that the mailing list system takes care of not sending duplicate replies. I removed your direct email from this reply and only kept the mailing list address, but for the record I don't see any reply-to headers.

On 2/28/2019 2:21 PM, Bill Cole wrote:
Please respect my consciously set Reply-To header. I don't ever need 2 copies of a message posted to a mailing list, and ignoring that header is rude.

On 28 Feb 2019, at 13:28, Mike Marynowski wrote:
On 2/28/2019 12:41 PM, Bill Cole wrote:
You should probably put the envelope sender (i.e. the SA "EnvelopeFrom" pseudo-header) into that list, maybe even first. That will make many messages sent via discussion mailing lists (such as this one) pass your test where a test of real header domains would fail, while it is more likely to cause commercial bulk mail to fail where it would usually pass based on real standard headers. (That's based on a hunch, not testing.)

Hmmm. I'll have to give some more thought to the exact headers it decides to test. I'm not sure if my MTA puts envelope info into the SA request or not. For the sake of simplicity right now I might just ignore mailing lists, I don't know. What I do know is that in the spam messages I'm reviewing right now, the reply-to / from headers often don't have websites at those domains, and none of them are masquerading as mailing lists. I haven't thought through the situation with mailing lists yet. I'm new to this whole SA plugin dev process - can you suggest the best way to log the full requests that SA receives so I can see what info it is getting and what I have to work with?

The best way to see far too much information about what SA is doing is to add a "-D all" to the invocation of the spamassassin script. You can also add that to the flags used by spamd, if you want to punish your logging subsystem.
Re: Spam rule for HTTP/HTTPS request to sender's root domain
On 2/28/2019 12:41 PM, Bill Cole wrote:
You should probably put the envelope sender (i.e. the SA "EnvelopeFrom" pseudo-header) into that list, maybe even first. That will make many messages sent via discussion mailing lists (such as this one) pass your test where a test of real header domains would fail, while it is more likely to cause commercial bulk mail to fail where it would usually pass based on real standard headers. (That's based on a hunch, not testing.)

Can you clarify why you think my currently proposed headers would fail with the mailing list? As far as I can tell, all the messages I've received from this mailing list would pass just fine. As an example from the emails in this list, which header value specifically would cause it to fail?
Re: Spam rule for HTTP/HTTPS request to sender's root domain
On 2/28/2019 12:41 PM, Bill Cole wrote:
You should probably put the envelope sender (i.e. the SA "EnvelopeFrom" pseudo-header) into that list, maybe even first. That will make many messages sent via discussion mailing lists (such as this one) pass your test where a test of real header domains would fail, while it is more likely to cause commercial bulk mail to fail where it would usually pass based on real standard headers. (That's based on a hunch, not testing.)

Hmmm. I'll have to give some more thought to the exact headers it decides to test. I'm not sure if my MTA puts envelope info into the SA request or not. For the sake of simplicity right now I might just ignore mailing lists, I don't know. What I do know is that in the spam messages I'm reviewing right now, the reply-to / from headers often don't have websites at those domains, and none of them are masquerading as mailing lists. I haven't thought through the situation with mailing lists yet. I'm new to this whole SA plugin dev process - can you suggest the best way to log the full requests that SA receives so I can see what info it is getting and what I have to work with?
Re: Spam rule for HTTP/HTTPS request to sender's root domain
You know what I mean. *Many (not all) of the rules (rDNS verification, hostname check, SPF records, etc) are easy to circumvent but we still check all that. Those simple checks still manage to catch a surprising amount of spam. I could just not publish this and keep it for myself and I'm sure that would make it more effective long term for me, but I figured I would contribute it so that others can gain some benefit from it. If it doesn't become widespread and SpamAssassin isn't interested in embedding it directly into their rule checks then that's fine by me, I'm not going to cry about it...more spam catching for me and whoever decides to install the plugin on their own servers. If it does become widespread and some spammers adapt then I'll take solace in knowing I helped a lot of people stop at least some of their spam. * Mike Marynowski: Everything we test for is easily compromised on its own. That's quite a sweeping statement, and I disagree. IP-based real time blacklists, anyone? Also, "we" is too unspecific. In addition to the stock rules, I happen to maintain a set of custom tests which are neither published nor easily circumvented. They have proven pretty effective for us. -Ralph
Re: Spam rule for HTTP/HTTPS request to sender's root domain
Why even use a test for something that is so easily compromised? -Ralph Everything we test for is easily compromised on its own.
Re: Spam rule for HTTP/HTTPS request to sender's root domain
And the cat and mouse game continues :) That said, all the big obvious "email-only domains" that send out newsletters and notifications and such that I've come across in my sampling already have placeholder websites or redirects to their main websites configured. I'm sure that's not always the case but the data I have indicates that's the exception and not the rule. On 2/28/2019 11:37 AM, Ralph Seichter wrote: * Antony Stone: Each to their own. Of course. Alas, if this gets widely adopted, we'll probably have to set up placeholder websites (as will spammers, I'm sure). -Ralph
Re: Spam rule for HTTP/HTTPS request to sender's root domain
I would not do it at all, caching or no caching. Personally, I don't see a benefit trying to correlate email with a website, as mentioned before, based on how we utilise email-only domains. -Ralph Fair enough. Based on the sampling I've done and the way I intend to use this, I still see this as a net benefit. If you're running an email-only domain then you're probably doing some pretty email-intensive stuff, and you should be well-configured enough that a nudge in the score shouldn't put you over the spam threshold. If you're a spammer just trying to make quick use of a domain and the spam score is already quite high but not quite over the threshold, then this can tip it over into marking the message as spam.
Re: Spam rule for HTTP/HTTPS request to sender's root domain
Question though - what is your reply-to address set to in the emails coming from your email-only domain? The domain checking I'm doing grabs the first available address in this order: reply-to, from, sender. It's not using the domain of the SMTP server. I did come across some email-only domain SENDERS in my sampling, but the overwhelming majority of reply-to addresses pointed to domains with HTTP servers running on them. On 2/28/2019 11:14 AM, Ralph Seichter wrote: * Grant Taylor: Why would you do it per email? I would think that you would do the test and cache the results for some amount of time. I would not do it at all, caching or no caching. Personally, I don't see a benefit trying to correlate email with a website, as mentioned before, based on how we utilise email-only domains. -Ralph
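The header precedence described above (reply-to first, then from, then sender) can be sketched roughly like this. This is an illustrative Python sketch, not the plugin's actual Perl code; `extract_check_domain` and the naive domain split are assumed names for illustration.

```python
# Sketch of the reply-to > from > sender precedence described above.
# Not the plugin's real code; a SpamAssassin plugin would read these
# headers via the plugin API instead of the Python email module.
from email.message import EmailMessage
from email.utils import parseaddr

def extract_check_domain(msg):
    """Return the domain of the first available address, checking
    Reply-To, then From, then Sender (never the SMTP server's domain)."""
    for header in ("Reply-To", "From", "Sender"):
        value = msg.get(header)
        if value:
            addr = parseaddr(value)[1]      # "Name <a@b>" -> "a@b"
            if "@" in addr:
                return addr.rsplit("@", 1)[1].lower()
    return None

msg = EmailMessage()
msg["From"] = "Alice <alice@example.com>"
msg["Reply-To"] = "sales@mailer.example.net"
print(extract_check_domain(msg))  # Reply-To wins: mailer.example.net
```

Note this takes the full domain of the address; reducing it to the registrable "root" domain would additionally need a Public Suffix List lookup.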
Re: Spam rule for HTTP/HTTPS request to sender's root domain
Just one more note - I've excluded .email domains from the check, as I've noticed several organizations using that TLD for email-only domains. Right now the test plugin I've built makes a single HTTP request for each email while I evaluate this, but I'll be building a DNS query endpoint or a local domain cache to make it more efficient before putting it into production.
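The local domain cache mentioned above could look something like the following hypothetical sketch. The TTL value and the `check` callable are assumptions for illustration, not details of the author's implementation.

```python
# Hypothetical per-domain result cache, as mentioned above: remember
# each domain's website-check result for a while so repeated mail from
# the same domain doesn't trigger repeated HTTP requests.
import time

class DomainCache:
    def __init__(self, check, ttl=86400.0):
        self._check = check    # callable(domain) -> bool (the HTTP probe)
        self._ttl = ttl        # assumed TTL: cache results for one day
        self._store = {}       # domain -> (expires_at, result)

    def has_website(self, domain):
        now = time.monotonic()
        hit = self._store.get(domain)
        if hit and hit[0] > now:
            return hit[1]      # still fresh, skip the HTTP request
        result = self._check(domain)
        self._store[domain] = (now + self._ttl, result)
        return result
```

A production version would also want negative-result TTLs tuned separately and an upper bound on cache size; the DNS-endpoint approach discussed later in the thread gets caching for free from recursive resolvers.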
Re: Spam rule for HTTP/HTTPS request to sender's root domain
I've tested this with good results and I'm actually not creating any HTTPS connections - what I've found is that a single HTTP request with zero redirections is enough. If it returns a status code >= 400 then you treat it like no valid website, and if you get a < 400 result (i.e. a 301/302 redirect or a 200 OK) then you can treat it like a valid website. You don't even need to receive the body of the HTTP result, you can quit after seeing the status. And yes, as a 100% ban rule this is obviously a bad idea. As a score modifier I think it would be highly effective. I found several "email only" domains in my sampling but all the big ones still had landing pages at the root domain saying "this domain is only used for serving email" or similar. I'm sure there are exceptions and some people will have email-only domains, but that's why we don't put 100% confidence into any one rule. On 2/27/2019 7:57 PM, Grant Taylor wrote: On 02/27/2019 03:25 PM, Ralph Seichter wrote: We use some of our domains specifically for email, with no associated website. I agree that /requiring/ a website at one of the parent domains (stopping before traversing into the Public Suffix List) is problematic and prone to false positives. There /may/ be some value to /some/ people in doing such a check and altering the spam score. (See below.) Besides, I think the overhead to establish an HTTPS connection for every incoming email would be prohibitive. Why would you do it per email? I would think that you would do the test and cache the results for some amount of time. There is a reason most whitelist/blacklist services use "cheap" DNS queries instead. I wonder if there is a way to hack DNS into doing this for us. I.e. a custom DNS "server" (BIND's DLZ comes to mind) that can perform the test(s) and fabricate an answer that could then be cached. Publish these answers in a new zone / domain name, and treat it like another RBL.
Meaning a query goes to the new RBL server, which does the necessary $MAGIC to return an answer (possibly NXDOMAIN if there is a site and 127.0.0.1 if there is no site) which can be cached by standard local / recursive DNS servers.
Spam rule for HTTP/HTTPS request to sender's root domain
Hi everyone, I haven't been able to find any existing spam rules or checks that do this, but from my analysis of the ham/spam I'm getting I think this would be a really great addition. Almost all of the spam emails that are coming through do not have a working website at the root domain of the sender. Of the last 100 legitimate email domains that have sent me mail, 100% of them have working websites at the root domain. As far as I can tell there isn't currently a way to build a rule that does this, and a Perl plugin would have to be created. Is this an accurate assessment? Can you recommend some good resources for building a SpamAssassin plugin if this is the case? Thanks!