Re: Open source (WAS: Spam rule for HTTP/HTTPS request to sender's root domain)

2019-03-21 Thread Mike Marynowski

Here ya go ;)

https://github.com/mikernet/HttpCheckDnsServer

On 3/21/2019 5:42 AM, Tom Hendrikx wrote:

On 20-03-19 19:56, Mike Marynowski wrote:

A couple people asked about me posting the code/service so they could
run it on their own systems but I'm currently leaning away from that. I
don't think there is any benefit to doing that instead of just utilizing
the centralized service. The whole thing works better if everyone using
it queries a central service and helps avoid people easily making bad
mistakes like the one above and then spending hours scrambling to try to
find non-existent botnet infections on their network while mail bounces
because they are on a blocklist :( If someone has a good reason for
making the service locally installable let me know though, haha.

When people are interested in seeing the code, their main incentive for
such a request is probably not that they want to run it themselves. They
might, in no particular order:

- want to learn from what you're doing
- want to see how you're treating their contributed data
- want to verify the listing policy that you're proposing
- want to study whether there could be better criteria for
listing/unlisting than the ones currently used
- change things in the software and contribute them back for the
benefit of everyone
- squash bugs that you might currently be missing
- help out with further development of the service if or when your time is
limited
- avoid depending on a single person to maintain a service they like

This is called open source, and it's a good thing. For details on the
philosophy behind it,
http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ is
a good read.

In short: if you'd like your project to prosper, put it on GitHub for
everyone to see.

Kind regards,

Tom






Re: Open source (WAS: Spam rule for HTTP/HTTPS request to sender's root domain)

2019-03-21 Thread Mike Marynowski
Perhaps I should have been clearer - I'm not against posting the code 
for any reason and I am planning to do that anyway in case anyone wants 
to look at it or chip in improvements and whatnot.


I'm an active contributor on many open source projects and I have fully 
embraced OSS :) I was more asking if there is a good reason to build 
packages intended for local installation by email server operators, and I 
don't think there really is. There's a fundamental difference in how the 
project would be set up if it were intended to be installed by all email 
server operators, i.e. writing a config file loader instead of 
hardcoding values, allowing more flexibility, building packages for 
different operating systems, etc. What I'm saying is I don't think I 
will be officially supporting that route, as it seems more beneficial to 
collaborate on a central database, though people are obviously free to 
do with the code as they wish.


Cheers!

Mike

On 3/21/2019 5:42 AM, Tom Hendrikx wrote:

On 20-03-19 19:56, Mike Marynowski wrote:

A couple people asked about me posting the code/service so they could
run it on their own systems but I'm currently leaning away from that. I
don't think there is any benefit to doing that instead of just utilizing
the centralized service. The whole thing works better if everyone using
it queries a central service and helps avoid people easily making bad
mistakes like the one above and then spending hours scrambling to try to
find non-existent botnet infections on their network while mail bounces
because they are on a blocklist :( If someone has a good reason for
making the service locally installable let me know though, haha.

When people are interested in seeing the code, their main incentive for
such a request is probably not that they want to run it themselves. They
might, in no particular order:

- want to learn from what you're doing
- want to see how you're treating their contributed data
- want to verify the listing policy that you're proposing
- want to study whether there could be better criteria for
listing/unlisting than the ones currently used
- change things in the software and contribute them back for the
benefit of everyone
- squash bugs that you might currently be missing
- help out with further development of the service if or when your time is
limited
- avoid depending on a single person to maintain a service they like

This is called open source, and it's a good thing. For details on the
philosophy behind it,
http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ is
a good read.

In short: if you'd like your project to prosper, put it on GitHub for
everyone to see.

Kind regards,

Tom






Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-03-20 Thread Mike Marynowski
Continuing to fine-tune this service - thank you to everyone testing it. 
Some updates were pushed out yesterday:


 * Initial new domain "grace period" reduced to 8 minutes (down from 15 
mins) - 4 attempts are made within this time to get a valid HTTP response
 * Mozilla browser spoofing is implemented to avoid problems with 
websites that block HttpClient requests

 * Fixes to NXDOMAIN negative result caching appear to be working well now

Some lessons learned in the meantime as well. Turns out that letting the 
HTTP test run through an email server IP is a terrible idea, as it will 
put the IP on some blocklists for attempting to make HTTP connections to 
botnet command & control honeypot servers if someone happens to query 
one of those domains, LOL.


A couple people asked about me posting the code/service so they could 
run it on their own systems but I'm currently leaning away from that. I 
don't think there is any benefit to doing that instead of just utilizing 
the centralized service. The whole thing works better if everyone using 
it queries a central service and helps avoid people easily making bad 
mistakes like the one above and then spending hours scrambling to try to 
find non-existent botnet infections on their network while mail bounces 
because they are on a blocklist :( If someone has a good reason for 
making the service locally installable let me know though, haha.




Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-03-15 Thread Mike Marynowski

Thank you! I have no idea how I missed that...

On 3/13/2019 7:11 PM, RW wrote:

On Wed, 13 Mar 2019 17:40:57 -0400
Mike Marynowski wrote:


Can someone help me form the correct SOA record in my DNS responses
to ensure the NXDOMAIN responses get cached properly? Based on the
logs I don't think downstream DNS servers are caching it as requests
for the same valid HTTP domains keep hitting the service instead of
being cached for 4 days.

...

Based on random sampling of responses from other DNS servers this
seems correct to me. Nothing I'm reading indicates that TTL factors
into the negative caching but is it possible servers are only caching
the negative response for 15 mins because of the TTL on the SOA
record, using the smaller value between that and the default TTL?

I believe so, from RFC 2308:

3 - Negative Answers from Authoritative Servers

Name servers authoritative for a zone MUST include the SOA record of
the zone in the authority section of the response when reporting an
NXDOMAIN or indicating that no data of the requested type exists.
This is required so that the response may be cached.  The TTL of this
record is set from the minimum of the MINIMUM field of the SOA record
and the TTL of the SOA itself, and indicates how long a resolver may
cache the negative answer.
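Given that rule, the 15-minute negative caching observed in this thread would follow from the 900-second TTL on the SOA record itself: resolvers take the minimum of the SOA's own TTL and its MINIMUM field. Raising the SOA record's TTL to match the MINIMUM should yield the intended 4-day negative caching. A sketch of the adjusted record, reusing the values posted elsewhere in the thread:

```
httpcheck.singulink.com. 345600 IN SOA httpcheck.singulink.com. admin.singulink.com. (
        4212294798 ; serial
        172800     ; refresh (2 days)
        86400      ; retry (1 day)
        2592000    ; expire (30 days)
        345600 )   ; minimum -> negative-cache TTL (4 days)
```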





Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-03-13 Thread Mike Marynowski
Can someone help me form the correct SOA record in my DNS responses to 
ensure the NXDOMAIN responses get cached properly? Based on the logs I 
don't think downstream DNS servers are caching it as requests for the 
same valid HTTP domains keep hitting the service instead of being cached 
for 4 days.


From what I understand, if you want to cache an NXDOMAIN response then 
you need to include an SOA record with the response and DNS servers 
should use the min/default TTL value as a negative cache hint. My 
NXDOMAIN responses currently look like this:


    HEADER:
    opcode = QUERY, id = 27, rcode = NXDOMAIN
    header flags:  response, want recursion, recursion avail.
    questions = 1,  answers = 0,  authority records = 1, additional = 0

    QUESTIONS:
    www.singulink.com.httpcheck.singulink.com, type = A, class = IN
    AUTHORITY RECORDS:
    ->  httpcheck.singulink.com
    ttl = 900 (15 mins)
    primary name server = httpcheck.singulink.com
    responsible mail addr = admin.singulink.com
    serial  = 4212294798
    refresh = 172800 (2 days)
    retry   = 86400 (1 day)
    expire  = 2592000 (30 days)
    default TTL = 345600 (4 days)

Based on random sampling of responses from other DNS servers this seems 
correct to me. Nothing I'm reading indicates that TTL factors into the 
negative caching but is it possible servers are only caching the 
negative response for 15 mins because of the TTL on the SOA record, 
using the smaller value between that and the default TTL?




Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-03-13 Thread Mike Marynowski
Any HTTP status code 400 or higher is treated as no valid website on the 
domain. I see a considerable amount of spam that returns 5xx codes so at 
this point I don't plan on changing that behavior. 503 is supposed to 
indicate a temporary condition so this seems like an abuse of the error 
code.
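That rule reduces to a one-line predicate; a sketch (`has_valid_website` is a hypothetical helper name, not part of the posted code):

```python
def has_valid_website(status_code):
    """Treat any HTTP status of 400 or higher (4xx and 5xx alike,
    including 503) as 'no valid website on this domain'."""
    return status_code < 400
```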


On 3/13/2019 2:21 PM, Jari Fredriksson wrote:

What would it result for this:

I have a couple domains that do not have any services for the root domain name. 
However, the server the A record points to does have a web server that acts as a reverse 
proxy for many subdomains that will be served a web page. An HTTP 503 is 
returned by the Pound reverse proxy for the root domains.




Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-03-13 Thread Mike Marynowski

Back up after some extensive modifications.

Setting the DNS request timeout to 30 seconds is no longer necessary - 
the service instantly responds to queries.


In order to prevent mail delivery issues if the website is having 
technical issues the first time a domain is seen by the service, it will 
instantly return a response that it is a valid domain (NXDOMAIN) with a 
15 minute TTL. It will then queue up testing of this domain in the 
background and automatically keep retrying every few minutes if HTTP 
contact fails. After 15 minutes of failed HTTP contact, the DNS service 
will begin responding with an invalid domain response (127.0.0.1), 
exponentially increasing TTLs and time between background checks until 
it reaches about 17 hours between checks. The service automatically runs 
checks in the background for all domains queried within the last 30 days 
and instantly responds to DNS queries with the cached result. If a web 
server goes down, has technical issues, etc., it will still be reported 
as a valid domain for approximately 4 days after the last successful 
HTTP contact while continually being checked in the background, so 
temporary issues won't affect mail delivery.


On 3/11/2019 7:18 PM, RW wrote:

It doesn't seem to be working. Is it gone?



$ dig +norecurse @ns1.singulink.com hwvyuprmjpdrws.com.httpcheck.singulink.com

; <<>> DiG 9.11.0-P5 <<>> +norecurse @ns1.singulink.com 
hwvyuprmjpdrws.com.httpcheck.singulink.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 57443
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
...





Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-03-01 Thread Mike Marynowski
Does SpamAssassin even have facilities to do that? Don't all rules run 
all the time? SpamAssassin still needs to run all the rules because MTAs 
might have different spam mark / spam delete / etc. thresholds than the 
one set in SA.


The number of cycles you're talking about is the same as an RBL lookup 
so I really don't see it as being significant. The DNS service does all 
the heavy lifting and I'm planning to make it public.


On 3/1/2019 5:09 PM, Rupert Gallagher wrote:

Case study:

example.com bans any e-mail sent from its third levels up, and does it 
by SPF.


spf-banned.example.com sent mail, and my SA at server.com adds a big 
fat penalty, high enough to bounce it.


Suppose I do not bounce it, and use your filter to check for its 
websites. It turns out that both example.com and 
spf-banned.example.com have a website. Was it worth it to spend cycles 
on it? I guess not. SPF is an accepted RFC and it should have 
priority. So, I recommend the website test first read the result of 
the SPF test, quit when positive, and continue otherwise.


--- ruga





On Fri, Mar 1, 2019 at 22:31, Grant Taylor <mailto:gtay...@tnetconsulting.net>> wrote:

On 02/28/2019 09:39 PM, Mike Marynowski wrote:
> I modified it so it checks the root domain and all subdomains up to the
> email domain.

:-)

> As for your question - if afraid.org has a website then you are correct,
> all subdomains of afraid.org will not flag this rule, but if lots of
> afraid.org subdomains are sending spam then I imagine other spam
> detection methods will have a good chance of catching it.

ACK

afraid.org is much like DynDNS in that one entity (afraid.org itself
or DynDNS) provides DNS services for other entities.

I don't see a good way to differentiate between the sets of entities.

> I'm not sure what you mean by "working up the tree" - if afraid.org has
> a website and I work my way up the tree then either way eventually I'll
> hit afraid.org and get a valid website, no?

True.

I wonder if there is any value in detecting zone boundaries, i.e. not
going any higher up the tree than the zone containing the email
domain(s).

Perhaps something like that would enable differentiation between Afraid
& DynDNS and the entities that they are hosting DNS services for.
(Assuming that there are separate zones.)

> My current implementation fires off concurrent HTTP requests to the root
> domain and all subdomains up to the email domain and waits for a valid
> answer from any of them.

ACK

s/up to/down to/

I don't grok the value of doing this as well as you do. But I think
your use case is enough different than mine such that I can't make an
objective value estimate.

That being said, I do find the idea technically interesting, even if I
think I'll not utilize it.



--
Grant. . . .
unix || die








Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-03-01 Thread Mike Marynowski



On 3/1/2019 4:31 PM, Grant Taylor wrote:
afraid.org is much like DynDNS in that one entity (afraid.org itself 
or DynDNS) provides DNS services for other entities.


I don't see a good way to differentiate between the sets of entities.


I haven't come across any notable amount of spam that's punched through 
all the other detection methods in place with a reply-to/from email 
address subdomain on a service like that. I'm sure it happens though and 
in that case this filter simply won't add any value.




Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-03-01 Thread Mike Marynowski

On 3/1/2019 1:07 PM, RW wrote:

Sure, but had it turned-out that most of these domains didn't have the A
record necessary for your HTTP test, it wouldn't have been worth doing
anything more complicated.


I've noticed a lot of the spam domains appear to point to actual web 
servers but throw 403 or 503 errors, which an A record check wouldn't 
catch and which has been taken into account here. As for being "more 
complicated" - it's basically done and running in my test environment 
for final tweaking, haha, so a bit late now :P It was only a day's work 
to put everything together, including the DNS service and caching layer, 
so meh. Unless you mean complicated in the sense that it's more 
technically complicated as opposed to effort-wise.



You don't need an A record for email. The last time I looked it just
tests that there's enough DNS for a bounce to be received, so an A or
MX for the sender domain.


I'm confusing different tests here, you can disregard my previous message.



Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-03-01 Thread Mike Marynowski
Sorry, I meant I thought it was doing those checks because I know I was 
playing with checking A records before and figured the rules would have 
it enabled by default...I tried to find the rules after I sent that 
message and realized that was related to sender domain A record checks 
done in my MTA.


On 3/1/2019 2:26 PM, Antony Stone wrote:

On Friday 01 March 2019 at 17:37:18, Mike Marynowski wrote:


Quick sampling of 10 emails: 8 of them have valid A records on the email
domain. I presumed SpamAssassin was already doing simple checks like that.

That doesn't sound like a good idea to me (presuming, I mean).


Antony.





Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-03-01 Thread Mike Marynowski
Looking for an A record on what - just the email address domain or the 
chain of parent domains as well? If the latter, well, a lack of an A 
record will cause this to fail, so it's kind of built in.


Quick sampling of 10 emails: 8 of them have valid A records on the email 
domain. I presumed SpamAssassin was already doing simple checks like that.


On 3/1/2019 10:23 AM, RW wrote:

On Wed, 27 Feb 2019 12:16:20 -0500
Mike Marynowski wrote:

Almost all of the spam emails that are
coming through do not have a working website at the root domain of
the sender.

Did you establish what fraction of this spam could be caught just by
looking for an A record?





Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-03-01 Thread Mike Marynowski
Changing up the algorithm a bit. Once a domain has been added to the 
cache, the DNS service will automatically perform HTTP checks in the 
background on a much more aggressive schedule for invalid domains. This 
makes temporary website problems much less of an issue, and invalid 
domains no longer delay mail delivery threads for up to 15s after TTL 
expirations during the initial test period with progressively increasing 
TTLs - queries can always return instantly after the first one, as long 
as the domain has been queried in the last 30 days and is still in cache.


Domains deemed to have "invalid" websites will be rechecked much more 
aggressively in the background to ensure newly queried domains with 
temporary website issues stop tripping this filter as soon as possible. 
There will be a "sliding window" of a few days where temporary website 
issues during the window won't cause the filter to trip, it just needs 
to provide a valid response sometime during the sliding window to stay 
in good standing.




Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski
For anyone who wants to play around with this, the DNS service has been 
posted. You can test the existence of a website on a domain or any of 
its parent domains by making DNS queries as follows:


subdomain.domain.com.httpcheck.singulink.com

So, if you wanted to check if mail1.mx.google.com or any of its parent 
domains have a website, you would do a DNS query with a 30 second 
timeout for:


mail1.mx.google.com.httpcheck.singulink.com

This will check the following domains for a valid HTTP response within 
15 seconds:


mail1.mx.google.com
mx.google.com
google.com

If a valid HTTP response comes back then the DNS query will return 
NXDOMAIN with a 7 day TTL. If no valid HTTP response comes back then the 
DNS query will return 127.0.0.1 with progressively increasing TTLs:


#1: 2 mins
#2: 4 mins
#3: 6 mins
#4: 8 mins
#5: 10 mins
#6: 20 mins
#7: 30 mins
#8: 40 mins
#9: 50 mins
#10: 1 hour
#11: 2 hours
#12+: add 2 hours extra for each attempt up to 24h max
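The schedule above can be expressed as a small function (a sketch inferred from the posted list; `retry_ttl` is a hypothetical name, returning the TTL in minutes for the Nth consecutive failed check):

```python
def retry_ttl(attempt):
    """TTL in minutes for the Nth consecutive failed HTTP check,
    following the posted schedule."""
    if attempt <= 5:
        return attempt * 2          # attempts 1-5: 2, 4, 6, 8, 10 mins
    if attempt <= 10:
        return (attempt - 4) * 10   # attempts 6-10: 20, 30, 40, 50, 60 mins
    # attempt 11+: 2h, then 2 more hours per attempt, capped at 24h
    return min((attempt - 10) * 120, 1440)
```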

As long as an invalid domain has been queried in the last 7 days, it 
will remain cached and any further invalid attempts will continue to 
progressively increase the TTL according to the rules above. If a domain 
doesn't get queried for 7 days then it drops out of the cache and its 
invalid attempt counter is reset. A valid HTTP response will reset the 
domain's invalid counter and a 7 day TTL is returned. Once a domain is in 
the cache, responses are immediate until the TTL runs out and the domain 
is rechecked again.
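For anyone scripting against the service, a client-side check can be sketched as follows (a sketch, not an official client; `has_website` is a hypothetical helper, and the resolver is injectable so the NXDOMAIN-vs-answer logic can be exercised without network access):

```python
import socket

def has_website(domain, suffix="httpcheck.singulink.com",
                resolve=socket.gethostbyname):
    """Ask the httpcheck DNS service about `domain`.
    NXDOMAIN means a website was found on the domain or a parent;
    a 127.0.0.1 answer means no valid website was found."""
    try:
        resolve(f"{domain}.{suffix}")
        return False  # got an answer (127.0.0.1) -> no valid website
    except OSError:   # NXDOMAIN surfaces as a resolution error
        return True
```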




Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski
You'll be able to decide how you want to prioritize the fields - I've 
implemented it as a DNS server, so which domain you decide to send to 
the DNS server is entirely up to you.


On 2/28/2019 10:23 PM, Grant Taylor wrote:

On 2/28/19 9:33 AM, Mike Marynowski wrote:
The check I'm doing grabs the first available address in this order: 
reply-to, from, sender.


That sounds like it might be possible to game things by playing with 
the order.


I'm not sure what sorts of validations are applied to the Sender: 
header.  (I don't remember if DMARC checks the Sender: header or not.)


How would your filter respond if the MAIL FROM: and the From: header 
were set to something that didn't have a website, yet had a Sender: 
header with @gmail.com listed before the Reply-To: and 
From: headers?









Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski
I modified it so it checks the root domain and all subdomains up to the 
email domain.


As for your question - if afraid.org has a website then you are correct, 
all subdomains of afraid.org will not flag this rule, but if lots of 
afraid.org subdomains are sending spam then I imagine other spam 
detection methods will have a good chance of catching it.


I'm not sure what you mean by "working up the tree" - if afraid.org has 
a website and I work my way up the tree then either way eventually I'll 
hit afraid.org and get a valid website, no?


My current implementation fires off concurrent HTTP requests to the root 
domain and all subdomains up to the email domain and waits for a valid 
answer from any of them.


On 2/28/2019 10:27 PM, Grant Taylor wrote:

What about domains that have many client subdomains?

afraid.org (et al) come to mind.

You might end up allowing email from spammer.afraid.org who doesn't 
have a website because the parent afraid.org does have a website.


I would think that checking from the child and working up the tree 
would be more accurate, even if it may take longer.









Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski
I'm pretty sure that with the way I ended up implementing it, everything 
is working fine, and it's nice and simple and clean, but maybe there's 
some edge case that doesn't work properly. If there is, I haven't found 
it yet, so if you can think of one, let me know.


Since I'm sending an HTTP request to all subdomains simultaneously it 
doesn't really matter if I go one further than the actual root domain. A 
"co.uk" request will come back with no website so there's no need to 
special handle it. For example, if the email address being tested is 
b...@mail1.mx.stuff.co.uk, an HTTP request goes out to:


mail1.mx.stuff.co.uk
mx.stuff.co.uk
stuff.co.uk
co.uk

The last one will always be cached from a previous .co.uk address lookup 
so it won't actually be sent out anyway. If any of them respond with a 
valid website then an OK result is returned.
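The candidate list is generated mechanically; a minimal sketch (`parent_chain` is a hypothetical helper, assuming the walk always stops at two labels, which is why "co.uk" appears as described):

```python
def parent_chain(domain, min_labels=2):
    """Every name from the full host down to the final two labels."""
    labels = domain.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - min_labels + 1)]
```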


On 2/28/2019 3:24 PM, Luis E. Muñoz wrote:

This is more complicated than it seems. I have the t-shirt to prove it.

I suggest you look at the Mozilla Public Suffix List at 
https://publicsuffix.org/ — it was created for different purposes, but 
I believe it maps well enough to my understanding of your use case. 
You'll be able to pad the gaps using a custom list.


Best regards

-lem





Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski
Thunderbird normally shows reply-to in normal messages...is this 
something that some MUAs ignore just on mailing list emails or all 
emails? Because I see reply-to on plenty of other emails.


On 2/28/2019 3:44 PM, Bill Cole wrote:

On 28 Feb 2019, at 14:29, Mike Marynowski wrote:

Unfortunately I don't see a reply-to header on your messages. What do 
you have it set to? I thought mailing lists see who is in the "to" 
section of a reply so that 2 copies aren't sent out. The "mailing 
list ethics" guide I read said to always use "reply all" and the 
mailing list system takes care of not sending duplicate replies.


I removed your direct email from this reply and only kept the mailing 
list address, but for the record I don't see any reply-to headers.


But it's right there in the copy that the list delivered to me:

From: "Bill Cole" 
To: users@spamassassin.apache.org
Subject: Re: Spam rule for HTTP/HTTPS request to sender's root domain
Date: Thu, 28 Feb 2019 14:21:41 -0500
Reply-To: users@spamassassin.apache.org

Whether you see it is a function of how your MUA (TBird, it seems...) 
displays messages. Unfortunately, it has become common for MUAs to simply 
ignore Reply-To. I didn't think TBird was in that class.





Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski
There are many ways to determine what the root domain is. One way is 
analyzing the DNS response from the query to realize it's actually a 
root domain, or you can just grab the ICANN TLD list and use that to 
make a determination.


What I'm probably going to do now that I'm building this as a cached DNS 
service is just walk up the subdomains until I hit the root domain and 
if any of them have a website then it's fine.


On 2/28/2019 2:39 PM, Antony Stone wrote:

On Thursday 28 February 2019 at 20:33:42, Mike Marynowski wrote:


But scconsult.com does in fact have a website so I'm not sure what you
mean. This method checks the *root* domain, not the subdomain.

How do you identify the root domain, given an email address?

For example, for many years in the UK, it was possible to get something.co.uk
or something.org.uk (and maybe something.net.uk), but now it is also possible
to get something.uk

So, I'm just wondering how you determine what the "root" domain for a given
email address is.


Antony.






Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski
But scconsult.com does in fact have a website so I'm not sure what you 
mean. This method checks the *root* domain, not the subdomain.


Even if this wasn't the case, well, it is what it is. Emails from this 
mailing list (and most well-configured lists) come in at a spam score of 
-6, so they are at no risk of being blocked even if a non-website domain 
triggers this particular rule.


On 2/28/2019 2:25 PM, Bill Cole wrote:

On 28 Feb 2019, at 13:43, Mike Marynowski wrote:


On 2/28/2019 12:41 PM, Bill Cole wrote:
You should probably put the envelope sender (i.e. the SA 
"EnvelopeFrom" pseudo-header) into that list, maybe even first. That 
will make many messages sent via discussion mailing lists (such as 
this one) pass your test where a test of real header domains would 
fail, while it is more likely to cause commercial bulk mail to 
fail where it would usually pass based on real standard headers. 
(That's based on a hunch, not testing.)
Can you clarify why you think my currently proposed headers would 
fail with the mailing list? As far as I can tell, all the messages 
I've received from this mailing list would pass just fine. As an 
example from the emails in this list, which header value specifically 
would cause it to fail?


If I did not explicitly set the Reply-To header, this message would be 
delivered without one. The domain part of the From header on messages 
I post to this and other mailing lists has no website and never will.






Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski
Unfortunately I don't see a reply-to header on your messages. What do 
you have it set to? I thought mailing lists see who is in the "to" 
section of a reply so that 2 copies aren't sent out. The "mailing list 
ethics" guide I read said to always use "reply all" and the mailing list 
system takes care of not sending duplicate replies.


I removed your direct email from this reply and only kept the mailing 
list address, but for the record I don't see any reply-to headers.


On 2/28/2019 2:21 PM, Bill Cole wrote:
Please respect my consciously set Reply-To header. I don't ever need 2 
copies of a message posted to a mailing list, and ignoring that header 
is rude.


On 28 Feb 2019, at 13:28, Mike Marynowski wrote:


On 2/28/2019 12:41 PM, Bill Cole wrote:
You should probably put the envelope sender (i.e. the SA 
"EnvelopeFrom" pseudo-header) into that list, maybe even first. That 
will make many messages sent via discussion mailing lists (such as 
this one) pass your test where a test of real header domains would 
fail, while it is more likely to cause commercial bulk mail to 
fail where it would usually pass based on real standard headers. 
(That's based on a hunch, not testing.)


Hmmm. I'll have to give some more thought to the exact headers it 
decides to test. I'm not sure if my MTA puts envelope info into 
the SA request or not. For sake of simplicity right now I might just 
ignore mailing lists, I don't know. What I do know is that in the 
spam messages I'm reviewing right now, the reply-to / from headers 
set often don't have websites at those domains and none of them are 
masquerading as mailing lists. I haven't thought through the 
situation with mailing lists yet.


I'm new to this whole SA plugin dev process - can you suggest the 
best way to log the full requests that SA receives so I can see what 
info it is getting and what I have to work with?


The best way to see far too much information about what SA is doing is 
to add a "-D all" to the invocation of the spamassassin script. You 
can also add that to the flags used by spamd, if you want to punish 
your logging subsystem.







Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski

On 2/28/2019 12:41 PM, Bill Cole wrote:
You should probably put the envelope sender (i.e. the SA 
"EnvelopeFrom" pseudo-header) into that list, maybe even first. That 
will make many messages sent via discussion mailing lists (such as 
this one) pass your test where a test of real header domains would 
fail, while it is more likely to cause commercial bulk mail to fail 
where it would usually pass based on real standard headers. (That's 
based on a hunch, not testing.)
Can you clarify why you think my currently proposed headers would fail 
with the mailing list? As far as I can tell, all the messages I've 
received from this mailing list would pass just fine. As an example from 
the emails in this list, which header value specifically would cause it 
to fail?


Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski

On 2/28/2019 12:41 PM, Bill Cole wrote:
You should probably put the envelope sender (i.e. the SA 
"EnvelopeFrom" pseudo-header) into that list, maybe even first. That 
will make many messages sent via discussion mailing lists (such as 
this one) pass your test where a test of real header domains would 
fail, while it is more likely to cause commercial bulk mail to fail 
where it would usually pass based on real standard headers. (That's 
based on a hunch, not testing.)


Hmmm. I'll have to give some more thought to the exact headers it 
decides to test. I'm not sure whether my MTA includes envelope info in the 
SA request. For the sake of simplicity I might just ignore mailing 
lists for now, I don't know. What I do know is that in the spam messages 
I'm reviewing right now, the reply-to / from headers often don't 
have websites at those domains, and none of them are masquerading as 
mailing lists. I haven't thought through the situation with mailing 
lists yet.


I'm new to this whole SA plugin dev process - can you suggest the best 
way to log the full requests that SA receives so I can see what info it 
is getting and what I have to work with?




Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski
You know what I mean. Many (not all) of the rules (rDNS verification, 
hostname checks, SPF records, etc.) are easy to circumvent, but we still 
check all of them. Those simple checks still manage to catch a surprising 
amount of spam.


I could just not publish this and keep it for myself and I'm sure that 
would make it more effective long term for me, but I figured I would 
contribute it so that others can gain some benefit from it.


If it doesn't become widespread and SpamAssassin isn't interested in 
embedding it directly into their rule checks, then that's fine by me; I'm 
not going to cry about it. More spam catching for me and whoever 
decides to install the plugin on their own servers. If it does become 
widespread and some spammers adapt, then I'll take solace in knowing I 
helped a lot of people stop at least some of their spam.

* Mike Marynowski:


Everything we test for is easily compromised on its own.

That's quite a sweeping statement, and I disagree. IP-based real time
blacklists, anyone? Also, "we" is too unspecific. In addition to the
stock rules, I happen to maintain a set of custom tests which are
neither published nor easily circumvented. They have proven pretty
effective for us.

-Ralph





Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski




Why even use a test for something that is so easily compromised?
-Ralph


Everything we test for is easily compromised on its own.



Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski

And the cat and mouse game continues :)

That said, all the big obvious "email-only domains" that send out 
newsletters and notifications and such that I've come across in my 
sampling already have placeholder websites or redirects to their main 
websites configured. I'm sure that's not always the case but the data I 
have indicates that's the exception and not the rule.


On 2/28/2019 11:37 AM, Ralph Seichter wrote:

* Antony Stone:


Each to their own.

Of course. Alas, if this gets widely adopted, we'll probably have to set
up placeholder websites (as will spammers, I'm sure).

-Ralph





Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski




I would not do it at all, caching or no caching. Personally, I don't see
a benefit trying to correlate email with a website, as mentioned before,
based on how we utilise email-only-domains.

-Ralph


Fair enough. Based on the sampling I've done and the way I intend to use 
this, I still see this as a net benefit. If you're running an email-only 
domain then you're probably doing some pretty email-intensive stuff, and 
you should be well-configured enough that a nudge in the 
score won't put you over the spam threshold. If you're a spammer 
just trying to make quick use of a domain and the spam score is already 
quite high but not quite over the threshold, then this can tip the score 
over into marking it as spam.




Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski
Question though - what is your reply-to address set to in the emails 
coming from your email-only domain?


The domain checking I'm doing grabs the first available address in this 
order: reply-to, from, sender. It's not using the domain of the SMTP 
server. I did come across some email-only domain SENDERS in my sampling, 
but the overwhelming majority of reply-to addresses pointed to domains 
with HTTP servers running on them.
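The header-selection order described above could be sketched like this (a minimal illustration using Python's stdlib email parsing; the function name is made up and this is not the plugin's actual Perl code):

```python
from email.message import EmailMessage
from email.utils import parseaddr

def pick_check_domain(msg):
    """Return the domain to probe: the first address found in
    Reply-To, then From, then Sender; None if no usable address."""
    for header in ("Reply-To", "From", "Sender"):
        _, addr = parseaddr(msg.get(header, ""))
        if "@" in addr:
            return addr.rsplit("@", 1)[1].lower()
    return None

msg = EmailMessage()
msg["From"] = "Alice <alice@example.com>"
msg["Reply-To"] = "replies@lists.example.org"
print(pick_check_domain(msg))  # Reply-To wins: lists.example.org
```

Note that the SMTP envelope never enters into it; only the message headers are consulted.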


On 2/28/2019 11:14 AM, Ralph Seichter wrote:

* Grant Taylor:


Why would you do it per email? I would think that you would do the
test and cache the results for some amount of time.

I would not do it at all, caching or no caching. Personally, I don't see
a benefit trying to correlate email with a website, as mentioned before,
based on how we utilise email-only-domains.

-Ralph





Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski
Just one more note - I've excluded .email domains from the check, as I've 
noticed several organizations using those as email-only domains.


Right now the test plugin I've built makes a single HTTP request for 
each email while I evaluate this but I'll be building a DNS query 
endpoint or a local domain cache to make it more efficient before 
putting it into production.
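The local domain cache mentioned above could be as simple as a TTL map (the parameters here are illustrative choices, not the actual implementation):

```python
import time

class DomainCache:
    """Remember each domain's verdict so only the first email from a
    given domain triggers a live HTTP check."""

    def __init__(self, ttl=6 * 3600):  # 6-hour TTL, an arbitrary choice
        self.ttl = ttl
        self._store = {}

    def get(self, domain):
        entry = self._store.get(domain)
        if entry and time.monotonic() - entry[1] < self.ttl:
            return entry[0]  # cached verdict, still fresh
        return None          # unknown or expired: caller must re-check

    def put(self, domain, has_site):
        self._store[domain] = (has_site, time.monotonic())
```

A caller would try `get()` first and only fire an HTTP request on a `None` result, then `put()` the answer back.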





Re: Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-28 Thread Mike Marynowski
I've tested this with good results and I'm actually not creating any 
HTTPS connections - what I've found is that a single HTTP request with 
zero redirections is enough. If it returns a status code >= 400 then you 
treat it like no valid website, and if you get a < 400 result (i.e. a 
301/302 redirect or a 200 OK) then you can treat it like a valid 
website. You don't even need to receive the body of the HTTP response; 
you can quit after seeing the status.
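A minimal sketch of that probe (illustrative only, not the plugin's actual code): one plain-HTTP request, redirects not followed, body never read, with a pure helper deciding the verdict from the status code alone:

```python
import http.client

def site_exists(status):
    """Any status below 400 (200, 301, 302, ...) counts as a working
    website; 4xx/5xx counts as no valid website."""
    return status < 400

def probe_domain(domain, timeout=10.0):
    """Single HEAD request to port 80; only the status line matters."""
    try:
        conn = http.client.HTTPConnection(domain, 80, timeout=timeout)
        conn.request("HEAD", "/")
        status = conn.getresponse().status  # body is never fetched
        conn.close()
        return site_exists(status)
    except OSError:
        return False  # refused/timeout/DNS failure: treat as no website
```

Treating a connection failure the same as a >= 400 status keeps the rule a score nudge rather than a hard verdict.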


And yes, as a 100% ban rule this is obviously a bad idea. As a score 
modifier I think it would be highly effective.


I found several "email only" domains in my sampling but all the big ones 
still had landing pages at the root domain saying "this domain is only 
used for serving email" or similar. I'm sure there are exceptions and 
some people will have email only domains, but that's why we don't put 
100% confidence into any one rule.


On 2/27/2019 7:57 PM, Grant Taylor wrote:

On 02/27/2019 03:25 PM, Ralph Seichter wrote:
We use some of our domains specifically for email, with no associated 
website.


I agree that /requiring/ a website at one of the parent domains 
(stopping before traversing into the Public Suffix List) is 
problematic and prone to false positives.


There /may/ be some value to /some/ people in doing such a check and 
altering the spam score.  (See below.)


Besides, I think the overhead to establish a HTTPS connection for 
every incoming email would be prohibitive.


Why would you do it per email?  I would think that you would do the 
test and cache the results for some amount of time.


There is a reason most whitelist/blacklist services use "cheap" DNS 
queries instead.
I wonder if there is a way to hack DNS into doing this for us, i.e. a 
custom DNS "server" (BIND's DLZ comes to mind) that can perform the 
test(s) and fabricate an answer that could then be cached. Publish 
these answers in a new zone / domain name, and treat it like another RBL.


Meaning a query goes to the new RBL server, which does the necessary 
$MAGIC to return an answer (possibly NXDOMAIN if there is a site and 
127.0.0.1 if there is no site) which can be cached by standard local / 
recursive DNS servers.
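Queried from the client side, that would look like any other DNSBL lookup (a sketch only; the zone name here is a placeholder, not a real service):

```python
import socket

RBL_ZONE = "httpcheck.example.org"  # hypothetical zone name

def listed_as_no_website(domain):
    """RBL convention: an answer of 127.0.0.1 means 'listed' (no working
    website at the domain); NXDOMAIN means 'not listed'."""
    try:
        return socket.gethostbyname(f"{domain}.{RBL_ZONE}") == "127.0.0.1"
    except socket.gaierror:
        return False  # NXDOMAIN (or lookup failure): not listed
```

Because the answer is ordinary DNS, every intermediate resolver caches it for free, which is exactly the $MAGIC being proposed.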









Spam rule for HTTP/HTTPS request to sender's root domain

2019-02-27 Thread Mike Marynowski

Hi everyone,

I haven't been able to find any existing spam rules or checks that do 
this, but from my analysis of the ham/spam I'm getting, I think this would 
be a really great addition. Almost all of the spam emails that are coming 
through do not have a working website at the root domain of the sender. 
Of the last 100 legitimate email domains that have sent me mail, 100% of 
them have working websites at the root domain.


As far as I can tell there isn't currently a way to build a rule that 
does this and a Perl plugin would have to be created. Is this an 
accurate assessment? Can you recommend some good resources for building 
a SpamAssassin plugin if this is the case?


Thanks!