Re: Score 0.001

2024-05-12 Thread Greg Troxel
I would suggest that if Debian is modifying the default config from 5 to
6.31, then

  probably they should not be doing that.  as a packager, I fix bugs
  (and file upstream bug reports), but it's usually linuxy
  nonportability things that are clearly bugs (test ==, hardcoded lists
  of accepted operating systems, etc.).  This is a difference in
  judgement.

  if they are applying a difference in judgement, the package
  description should disclose this really clearly.  Hard to tell what's
  going on, but this appears to be new to most people here.
  


Re: Score 0.001

2024-05-11 Thread Greg Troxel
Thomas Barth  writes:

> Am 2024-05-11 21:54, schrieb Bill Cole:
>> I have no idea who the Debian "spam analysts" are but I am certain
>> that they are not doing any sort of data-driven dynamic adjustments
>> of scores based on a threshold of 6.3 nor are they (obviously)
>> adjusting that threshold daily based on current scores.
>
> I found the passage in my old Postfix book. The author writes: "It is
> recommended not to carelessly set the value of $sa_kill_level_deflt to
> any fantasy values. The score of 6.31 is not arbitrarily chosen, but
> the statistically calculated optimum for the best possible spam filter
> rate with as few false positives as possible. If you increase the
> value, more spam will get through; if you lower it, your false
> positives will increase."

The comments about adjustments are true, but the idea that it is optimum
is flat-out nonsensical.

The key question is how you weight a false positive compared to a false
negative.  Only after you have decided that can you pick an optimum, for a
given corpus of already-received mail.

> It may be that the value is outdated, but that is for the maintainers
> of the relevant Debian package to decide. I'll just adapt my rules to
> this one value.

That is an odd position.  It is very easy to set the threshold in a
local config.   Deciding instead to adjust scores to an oddball
threshold seems bizarre to me.
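For a plain SpamAssassin setup, here is a minimal sketch of setting the threshold locally. It is one line in local.cf, and the value shown is only an example. Sites where amavisd-new calls SpamAssassin set the equivalent levels through amavisd.conf variables such as `$sa_kill_level_deflt`, which is the one the quoted book refers to, and not through this directive.

```
# local.cf: keep the distributed rule scores and move the threshold
# instead (5.0 is the stock default; pick your own value)
required_score 5.0
```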

Personally, I don't use the 5, but instead have shades of grey: mail
scoring >= 1 is binned into mailboxes ranging from "maybe spam" through
"very likely spam", and at some score, I reject at the MTA level.

I find that legit mail shows up in e.g. spam.2 (>= 2 and < 3), but it is
almost never mail that I would be upset to have missed (but I don't) or
mail that I would be upset to not get in a timely manner (I only see it
every day or so).  However, this really drops the FN rate of spam in my
INBOX, which matters a lot to me.  Basically I consider a FP into my
"spam.1" mailbox, as long as it isn't really important to me, to be not
a big deal at all, and I'd rather have 10 of those than 1 FN in my
INBOX.  But, actually MTA-rejecting mail that I shouldn't, a FP at that
level, is a big deal, and I avoid it.  I think it's about one message a
year -- and while it's ham, it's very spammy ham.
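A shell sketch of that binning, for illustration only. It maps a score to a destination name: anything below 1 goes to INBOX, a score in [N, N+1) goes to spam.N, and anything at or above a reject threshold is rejected at the MTA. The threshold of 12 and the function name are hypothetical, since the actual reject level isn't given here. awk does the comparisons because POSIX sh arithmetic is integer-only.

```sh
# Hypothetical MTA reject threshold; override via the environment.
REJECT_AT=${REJECT_AT:-12}

# Print the destination for a SpamAssassin score given as $1:
# "INBOX" below 1, "spam.N" for N <= score < N+1, "reject" at REJECT_AT+.
spam_destination() {
    awk -v s="$1" -v r="$REJECT_AT" 'BEGIN {
        if (s >= r)     print "reject"
        else if (s < 1) print "INBOX"
        else            printf "spam.%d\n", int(s)
    }'
}
```

For example, `spam_destination 2.7` prints `spam.2`, matching the ">= 2 and < 3" bin described above.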


Re: Defining what the default welcomelist means

2024-04-14 Thread Greg Troxel
Bill Cole  writes:

> On 2024-04-12 at 18:56:15 UTC-0400 (Fri, 12 Apr 2024 18:56:15 -0400)
> Greg Troxel 
>
>> Bill Cole  writes:
>>
>>> 1. We serve our users: receivers, not senders. Senders claiming FPs
>>> need the support of a corroborating would-be receiver.
>>
>> Agreed.  Or maybe we take requests to add only from receivers.
>
> Effectively, yes. Senders won't refrain from requesting to be welcomed
> by default just because we say we don't accept those requests. Only
> receivers can corroborate the existence of any FP problem which would
> be solved by a default welcomelist entry, and this isn't a 'just find
> one example' sort of issue.

They won't refrain from writing, but it's fair to not let them open bugs
or have bugs open in the tracker.  And to tell them

  1) clean up your mail

  2) we only take requests for defwl from actual receivers, so we're
  done with this conversation.  use of sock puppets is not ok.

That's what I meant by "not take requests from".

>>> 2. If senders have FPs on objectively legitimate mail, their first and
>>> most important step is to identify WHY SpamAssassin thinks it is
>>> spam. and address that. Do you need the invisible text? Is the message
>>> embedded in a remotely-fetched image? The sea of "" entities in
>>> your messages' HTML serves what purpose exactly? If there's a real FP
>>> problem with some rule that regularly is proved out by RuleQA, open a
>>> bug.
>>
>> Sure, but if you serve receivers, often people will have misfiling and
>> the sender is opaque, even if not spam and dkim.  So saying the sender
>> should fix is misaligned with serving receivers.  Yes, they *should*,
>> but people shouldn't send html mail either :-)
>
> I don't see this as misaligned, but rather a way of saying that def_w*
> entries come behind site-local receiver mitigations and
> receiver/sender collaboration on fixing the shabby mail.

What I was trying to express is that senders, even zero-spam senders,
are often enormous, opaque, and intractable.  So while I agree in
theory, I guess the real question is whether we want to say to a
receiver:

  your non-spam mail is spammy, and we aren't going to add a defwl
  because first you need to get e.g. Bank of America to stop sending
  html mail.

or

  your non-spam mail is spammy and it's ok to add a defwl

I have occasionally complained to BigCorp and it has never been useful.
Sure, one can get the branch manager to reverse a fee, but I mean one
cannot get them to change their practices.

> One reason I opened this topic is that many existing listings were
> nothing like last resorts to solve concrete problems but seem to be
> more prophylactically applied. I.e. to assure that generally (and
> vaguely) 'good' senders will get their mail through despite using
> pointless antipatterns that are predominantly used by spammers. Maybe
> there's a need for that, but it should not be part of SA proper.

This is a slippery slope.  We're trying to make correct classification
decisions for users.  I can definitely see both sides.

But I don't mean generally/vaguely.  I mean senders that are zero-spam
and likely important to receivers, in the bank/airline notification (and
similar) class.  Meaning something with real-world consequences that is
timely.  Not newsletters.

>> I see all spam classification as probabilistic and there is risk of FP.
>> If a domain emits *only ham* and is dkim signed, and we believe that
>> receivers want it, I think it makes sense to have it in.
>
> I see no point in that if there is no *evidence* of actual FPs. I
> don't think the default rules should try to game local incidents of
> Bayes or AWL dis-learning that ends up hitting banking
> notifications. Or (at the risk of being misinterpreted...) by the use
> of 3rd-party rules like the KAM channel that are much tougher on the
> bad HTML practices of corporate email composers.

FWIW, I have given up on the KAM rules.  The scores are insanely high
for things that appear in ham, and I was having too-frequent
misclassification.  Some of the scores were triggering on things which
are not even objectively spammy, e.g. a watch rule on a technical
discussion of clocks where it was on topic and I was subscribed.

Because of the probabilistic nature, I see it as sensible to defwl
things like bank notifications (that are 100% non-spam and dkim) to
reduce the odds that future rules will cause problems.  This is partly
from my KAM ruleset experience where I wake up to misfiled mail because
there is a new overly aggressive rule.  Much less likely in SA proper, but
still.

>> I am extremely skeptical of anything that smells of email marketing
>> here.  I would expect only places sending transactional mai

Re: Defining what the default welcomelist means

2024-04-12 Thread Greg Troxel
jdow  writes:

> One pesky detail still exists. There is a very broad fuzzy area where
> my spam is your ham and vice versa. You could probably drive yourself
> to an early grave trying to get the perfect Bayes training plus
> perfect rule set.

spam is bulk and unsolicited.   So yes the same message could be either,
but if a sender spams anyone, they are a spammer, even if they send mail
that isn't spam.


Re: Defining what the default welcomelist means

2024-04-12 Thread Greg Troxel
Also, I'm not sure you said this, but I would say:

   default whitelist is dkim only

   This means

 All existing entries are converted to dkim as well as we can, not
 worrying if they break.  We'll prune ones that don't work as dkim,
 and add a signing domain as we figure it out, as a lightweight
 thing.  But all non-dkim entries go away.

 to consider a new entry, it must be dkim

or maybe that's already true
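To make the distinction concrete, here is a sketch of the two entry styles, with example.com as a placeholder. As I understand the SpamAssassin 4.0 directives, `def_welcomelist_auth` (the style the existing default entries use) is satisfied by either SPF or DKIM. `def_welcomelist_from_dkim` requires a valid DKIM signature, optionally from a named signing domain.

```
# Current style: passes if SPF or DKIM authorizes the sender
def_welcomelist_auth        *@example.com

# DKIM-only style: From address, then the required signing domain
def_welcomelist_from_dkim   *@example.com   example.com
```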


Re: Defining what the default welcomelist means

2024-04-12 Thread Greg Troxel
I see it very slightly differently, but mostly agree

Bill Cole  writes:

> 1. We serve our users: receivers, not senders. Senders claiming FPs
> need the support of a corroborating would-be receiver.

Agreed.  Or maybe we take requests to add only from receivers.

> 2. If senders have FPs on objectively legitimate mail, their first and
> most important step is to identify WHY SpamAssassin thinks it is
> spam. and address that. Do you need the invisible text? Is the message
> embedded in a remotely-fetched image? The sea of "" entities in
> your messages' HTML serves what purpose exactly? If there's a real FP
> problem with some rule that regularly is proved out by RuleQA, open a
> bug.

Sure, but if you serve receivers, often people will have misfiling and
the sender is opaque, even if not spam and dkim.  So saying the sender
should fix is misaligned with serving receivers.  Yes, they *should*,
but people shouldn't send html mail either :-)

I agree that requests from senders should be met with "make your mail
less spammy".

> 3. This is NOT a general-purpose reputation list. It exists to aid SA
> users who have FPs from SpamAssassin default rules for wanted mail,
> where we cannot determine any acceptable adjustment to rules which
> would avoid the problem. It is a "last resort" form of FP mitigation
> when we cannot find an acceptable general solution that isn't
> domain-specific to a widely accepted sender domain.

I see all spam classification as probabilistic and there is risk of FP.
If a domain emits *only ham* and is dkim signed, and we believe that
receivers want it, I think it makes sense to have it in.

I think of things like alerts from banks, airline saying your flight
time has changed, etc. where FPs are a real problem.

I am extremely skeptical of anything that smells of email marketing
here.  I would expect only places sending transactional mail and alerts
to established customers.

> 4. We should only add or remove listings based on specific requests
> backed by transparent evidence. Subversion commit messages are not
> enough, we need a bug report or a mailing list discussion.

sure

> 5. Existing entries are presumed valid unless and until they cause a
> false "ham" classification of spam which can be shared publicly in a
> useful form.

I guess, or if someone makes an argument that they aren't right.

> 6. New entries must pass prolonged RuleQA testing of sender-specific
> rules before being added to the default welcomelist.

I don't follow this.  Do you mean add 'def_welcomelist_dkim foo@bar' to
a testing ruleset and see if it's ok?  That seems fine if so.  If not, I
didn't follow you.


It might also make sense for each welcomelist rule to have a score.
Basically to bring the mail down to -2, to give it some headroom.   But
that might be too complicated compared to benefit.


Re: [UPDATE] Changes to Validity Reputation Data Through DNS

2024-01-18 Thread Greg Troxel
Tom Bartel  writes:

> Starting March 1, 2024, we will allow up to 10,000 requests per user over a
> 30-day time period. After the 10,000 requests, users must create a
> MyValidity account to continue using this free service. Upon the creation
> of a MyValidity account, you will receive continued access to queries
> through Spam Assassin


If a person doesn't have an account, what does "user" mean?  If what you
really mean is "10,000 requests from a given IP address over a 30-day
period" (which seems fine) then just say that.


Re: Question about forwarding email (not specifically SA, pointers greatly appreciated)

2024-01-03 Thread Greg Troxel
Thomas Cameron  writes:

> Yeah, the weird thing is, when I check the forwarded email on GMail, I
> see in the headers that both the original sending email server (call
> it mail.somedomain.com) and the relay server (call it
> mail.myassociation.org) put DKIM signatures in the message.

That's more or less broken in my opinion.   I think an MTA should only
DKIM-sign messages that it is responsible for in the sense of
origination, because it is from an authenticated sender.

> GMail doesn't flag it as "passed" for DKIM. I am looking to see if
> PostSRSd has any sort configuration option to delete the DKIM of the
> original sending server so that it will "pass" DKIM checks.

Not sure why pass is in quotes.   But again if you don't change headers
the original signature should be valid.


Re: Question about forwarding email (not specifically SA, pointers greatly appreciated)

2024-01-03 Thread Greg Troxel
"Thomas Cameron via users"  writes:

> I actually set up SPF, DMARC, and DKIM on the non-profit's email
> server. It works fine if I send email from the server.
>
> The rub is, I want all emails to presid...@example.org to be forwarded
> to presidents_real_addr...@gmail.com. Since the forward happens at
> mail.example.org, the "from" is from some other domain from
> example.org, so it fails all the tests.

You are overlooking that DKIM from the original From: is the
responsibility of that domain and that if you do not modify the message
then it should still pass.  Domains sending without DKIM are going to be
a mess.


Re: Question about forwarding email (not specifically SA, pointers greatly appreciated)

2024-01-02 Thread Greg Troxel
"Thomas Cameron via users"  writes:

> I built email servers for a non-profit I volunteer for. If email comes
> into the server for presid...@myassociation.org, I would normally just
> create an alias in /etc/aliases so that emails to president@ get
> forwarded to the president's "real" email address, say
> presidents_real_em...@gmail.com.
>
> The problem is, when I send email to presid...@myassociation.org,
> gmail rejects the forwarded email because it appears to come from my
> personal domain, not the mythical myassociation.org domain. DKIM,
> DMARC, and SPF all fail, which I totally understand.

Why does DKIM fail?  You said there is an /etc/aliases alias, but you
did not say that you modified the message.  Basically you should never
modify messages.
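A plain alias of the kind described is just one line in /etc/aliases, followed by running `newaliases`. Both addresses below are hypothetical placeholders. The MTA re-sends the message to the new address without rewriting the body or the signed headers, so the original author's DKIM signature should still verify at the final destination. SPF is different: it is checked against the forwarding host, and that is the problem SRS (e.g. PostSRSd) is meant to address.

```
# /etc/aliases: forward as-is to the officer's personal mailbox
president: president-personal@example.net
```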

> How can I make this work? Is there a good way to use something like
> /etc/aliases to forward emails to the domain I manage to another
> recipient? Or is there something better I can do?

I think the advice to set up IMAP and submission is wise.  I realize
this may be a small non-profit, but company mail belongs on company
servers, and personal mail on personal servers.  With IMAP and
submission, your president can have their outgoing email be
presid...@myassociation.org, DKIM signed, with an SPF record, and even
DMARC.  If someone writes and gets a reply from a random gmail account,
that is at best confusing.


Re: proper use of internal_networks?

2023-12-07 Thread Greg Troxel
"Dan Mahoney (Gushi)"  writes:

> Hey there all,
>
> Recently, we noticed that one of our system's "cron" mails started
> getting caught by our spam filter (because it had lots of hostnames in
> it about failed ssh logins, which the uribl plugin didn't like).
>
> This system is listed (v4 and v6) in trusted_networks -- and it sends
> it straight to our MX host via v6.  (no SMTP auth)
>
> We're getting a warning about "unparseable relay", but I think that's
> just the DMA [freebsd's default mailer] throwing it off:
>
> Received: from dmahoney (uid 10302)
>(envelope-from dmaho...@bommel.dayjob.org)
>id 237584
>by bommel.dayjob.org (DragonFly Mail Agent v0.13 on bommel.dayjob.org);
>Thu, 07 Dec 2023 19:45:29 +
>
> I also noticed that the all_trusted rule did not fire -- perhaps,
> again, because of the above unparseable relay.
>
> Is DMA putting a crappy header in that would cause this not to break
> if we were running a local postfix/sendmail?
>
> Maybe I'm unclear on how this all works, but I thought that putting a
> host in trusted_networks basically sidestepped spam processing.
> What's the "correct" way to do this?  These are boxes that do not
> normally relay mail -- they only generate it from system reports and
> cron jobs, and generally speaking, only to us.

The correct way is probably to read the RFC about Received and see if
it's compliant, and then decide that it's too broad and you really need
something parseable, and then either patch DMA to emit something
compatible or patch SA to parse what DMA writes.

What you posted doesn't look terrible.  It's clearly a local inject from
a process.
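For reference, here is a minimal sketch of declaring such hosts in local.cf. The addresses are documentation prefixes standing in for real ones. SpamAssassin's documentation requires internal_networks to be a subset of trusted_networks. Listing a host there tells SA which relays to trust; it does not skip scanning, and a Received: header SA cannot parse can still keep ALL_TRUSTED from firing.

```
# local.cf: hosts that inject cron/system mail directly to the MX
trusted_networks   192.0.2.0/24 2001:db8:1::/48
internal_networks  192.0.2.0/24 2001:db8:1::/48
```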


Re: Too many dots?

2023-11-16 Thread Greg Troxel
Alex  writes:

> Also, the KAM rules are designed to be used in conjunction with the stock
> rules, so it also seemed somewhat punitive to award so many points and to
> be expected to offset them for a completely benign email.

My experience is that many of the KAM rules are unreasonably
aggressive.

In particular, I don't think it's ok for a rule to be over 3 points,
unless it is virtually certain that any message that hits it will be
spam.  Overall, they don't feel tuned to meet SA doctrine which AIUI is
that there should be quite rare FPs, meaning ham >= 5 points.

I have reported a number of FPs.  I have ~always heard back and had
reasonable discussions.  But it usually turns out that KAM thinks the
aggressiveness of whatever rule I am having problems with is good on
balance.  It might be; that's a really hard question to answer.

Overall, I've had too many problems with FPs, and given that my view of
how things should be and the ruleset's view are far enough apart, I
decided to just stop using it.  I was expecting to get more spam through
but it has not been noticeable (that's a perception, not anything
careful, and of course the arriving spam changes over time).


Re: sane max value for message size in 2023?

2023-09-11 Thread Greg Troxel
AJ Weber  writes:

> I realize this is very much an "it depends", but recently I'm getting
> a lot of messages bypassing spamc because they're a few KB over the
> default, 500KB limit (spamassassin 3.4.x).

That is way way too small now.

I would go to at least 8 MB.  $prefix/etc/spamassassin/spamc.conf should
have "-s nnn" where nnn is an integer in bytes.

If you aren't pained by the CPU time, it's better to scan.  There is
certainly spam over a MB.   If your server is maxed on CPU scanning mail
for a thousand people, that's something else.
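As a sketch, raising the limit to 8 MiB takes a single line in spamc.conf, which holds spamc command-line options. 8 MiB is 8 × 1024 × 1024 = 8388608 bytes. The spamc(1) manual gives 500 KB as the default for -s and 256 MB as its maximum.

```
-s 8388608
```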


Re: Ensuring SPF/DKIM for @gmail.com

2023-07-25 Thread Greg Troxel
J Doe  writes:

> I am currently using SpamAssassin 4.0.0 and I had a question on how I
> can ensure that any e-mail from @gmail.com has a valid SPF and DKIM
> signature.

You should phrase what you want more carefully.  What I think you said
is:

  I want that if mail comes in with a From: of *@gmail.com and if either
  SPF or DKIM fails, then I want to reject that mail.

Be careful what you wish for.  That will cause mailinglist mail to be
rejected.  Probably you should accept if DKIM passes, regardless of SPF.
And maybe SPF without DKIM, but I doubt there is much mail like that.
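If you score instead of rejecting, one way to express "a gmail.com From: should carry a valid gmail.com DKIM signature" is a local meta rule built on the stock DKIM_VALID_AU rule. This is a sketch: the LOCAL_* names and the 3.0 score are illustrative assumptions, and the regex only looks at the From: address. List mail whose signature was broken in transit will hit it too, as noted above.

```
# From: address is at gmail.com (sub-rule, scores nothing by itself)
header   __LOCAL_FROM_GMAIL   From:addr =~ /\@gmail\.com$/i

# gmail.com From: without a valid signature from the author's domain
meta     LOCAL_GMAIL_NO_DKIM  __LOCAL_FROM_GMAIL && !DKIM_VALID_AU
describe LOCAL_GMAIL_NO_DKIM  From gmail.com but no valid gmail.com DKIM signature
score    LOCAL_GMAIL_NO_DKIM  3.0
```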

> I am aware that the following can be easily fooled, because it is not
> checking SPF and DKIM:
>
> welcomelist_from *@gmail.com

Not only that, it says that any such mail is accepted, which is not what
you said.

>
> ... so to ensure valid SPF and DKIM, I believe I would need:
>
> welcomelist_from_spf  *@gmail.com
> welcomelist_from_dkim *@gmail.com
>
> ... or *two* entries.

That means that anything that passes spf is accepted, and so is anything
that passes dkim.  But that's not what you said; you said "ensure", which
means that you *reject* things that do not have both.   And then you
still do spam filtering on things that you didn't reject outright.

There is a lot of DKIM-signed SPF-compliant spam from gmail.  They let
people sign up for accounts, and some of them spam.  So "accept all mail
from gmail" is not a sensible policy.

Rejecting mail that claims to be from gmail but isn't is more sensible,
but you need to understand that many mailinglists (incorrectly) munge
mail and cause it to fail DKIM, and of course SPF doesn't match.


What I do is assign a few spam points for gmail and add
welcomelist_from_dkim for people I know, or welcomelist_from_rcvd for
people on lists from the list sender.
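Here is a sketch of that approach in local.cf. The rule name, score, addresses, and list host are hypothetical placeholders. `welcomelist_from_dkim` takes the From address and, optionally, the domain that must have signed. `welcomelist_from_rcvd` takes the From address and the domain of the relay that handed the message to your trusted network.

```
# A few points for any gmail.com From: address
header   LOCAL_FROM_GMAIL  From:addr =~ /\@gmail\.com$/i
describe LOCAL_FROM_GMAIL  From: address is at gmail.com
score    LOCAL_FROM_GMAIL  1.5

# Known correspondent: welcome only with a valid gmail.com DKIM signature
welcomelist_from_dkim  friend@gmail.com  gmail.com

# Same person via a mailing list: welcome when relayed by the list host
welcomelist_from_rcvd  friend@gmail.com  lists.example.org
```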


Re: Welcome/unwelcome list not working correctly.

2023-07-21 Thread Greg Troxel
Grant Keller  writes:

> I don't think the query result order matters here, from what I could
> gather in the spamassassin source, the  welcome list is built in 2
> steps:
> 1. Create the list using the whitelist_from values.
> 2. Remove from that list everything in unwhitelist_from

I guess you need to look at the code that is doing the queries and add
more debug logging.


Re: Welcome/unwelcome list not working correctly.

2023-07-20 Thread Greg Troxel
Grant Keller  writes:

> | gvk  | unwhitelist_from| grant.kel...@sonic.com   | 7421538 |
> | gvk  | whitelist_from  | grant.kel...@sonic.com   | 7526210 |

What do you think that means?  What's the fourth column?

Note that we are in transition from white to welcome, but that shouldn't
matter.

> Still, a message from that address to the gvk user results in the
> following rules being hit:
>
> tests=ALL_TRUSTED,SCC_BODY_SINGLE_WORD,SONIC_BX_A2,SONIC_FRIEND,SPF_HELO_NONE,
> T_SCC_BODY_TEXT_LINE,USER_IN_WELCOMELIST

the welcomelist is hitting.


> I'm out of ideas to try on my side. Is there a way to have spamassasin
> or spamc print the config, or perhaps debugging I can enable to try to
> track down this problem?

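std ()
{
    # Run the message file named in $1 through SpamAssassin in test mode
    # (-t) with debug output (-D): the report goes to /tmp/STDOUT and
    # the debug log, including which config files were read, to /tmp/STDERR.
    spamassassin -t -D < "$1" > /tmp/STDOUT 2> /tmp/STDERR
}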

not quite what you asked, but better than where you are now.


Re: mystery score definition

2023-05-12 Thread Greg Troxel
Henrik K  writes:

> On Fri, May 12, 2023 at 07:12:35AM -0400, Greg Troxel wrote:
>> Henrik K  writes:
>> 
>> > From what I've seen, it's very uncommon to use this format.  Why rely on
>> > some vague previously defined score, which can change at any time?  Just
>> > set a static score you like and fits your system.
>> 
>> It's not vague; it's the score which is defined by the distributed
>> rules.
>> 
>> My intent is to say that I want 1 point more than what the rules say,
>> and I mean that to float with rule changes.
>
> It _is_ vague.  It's either an educated static score the developer gave, or
> a corpus generated score, both which might not reflect your personal
> mailflow at all.

It seems we disagree what vague means; I think it means that it lacks a
precise meaning, and I find "the score that spamassassin would assign
before I try to change it" to be precise, even if that might change with
a rule update.  But in general we believe that users using the score in
an updated ruleset is a good thing; that's why the scores were changed.

Of course distributed scores might not fit my own mail.  But that's true
of all people, all the time and it isn't specially true because I want
to adjust one.

>> Perhaps you are arguing that all uses of () are confused and thus we
>> should lean to removing that facility.
>
> I just think it's much more common to create meta that checks if the rule
> you are interested in hit, and add to scoring that way.  Yes I realize by
> that logic things are vague as well, *shrug*.  But if you use a non-common
> method, it's possible that there are bugs and strangness as we now found
> out.

It might be more common, but it's very surprising to me, because the
manual page documents that () works, even if it technically leaves out
default scores.  I've been adjusting scores this way for years and this
is the first time I hit a rule with an implicit 1.

And, a user that is not authorized to create rules can adjust scores,
but can't create meta rules.

Yes, I realize that such a user can just set the score to 2, instead of
(1).


Re: mystery score definition

2023-05-12 Thread Greg Troxel
Henrik K  writes:

> From what I've seen, it's very uncommon to use this format.  Why rely on
> some vague previously defined score, which can change at any time?  Just set
> a static score you like and fits your system.

It's not vague; it's the score which is defined by the distributed
rules.

My intent is to say that I want 1 point more than what the rules say,
and I mean that to float with rule changes.

Perhaps you are arguing that all uses of () are confused and thus we
should lean to removing that facility.

Anyway, once I figured out that () will confusingly fail with some rules
(that just show up as if they are normal in score reports), it was easy
to fix.  Thank you all for the comments and explanations.


Re: mystery score definition

2023-05-11 Thread Greg Troxel
Matus UHLAR - fantomas  writes:

> On 11.05.23 10:58, Greg Troxel wrote:
>>I am seeing a lot of "claim your prize from X", where X is a known
>>company, coming from fresh foo.autos domains.  I bet y'all are seeing
>>this too.  Until these get on blocklists they don't score that high.
>>
>>One rule that does hit is
>>
>>  OBFU_UNSUB_UL
>>
>>which is defined in 72_active.cf as meta, and does not seem to have a
>>score defined.
>>
>>I put in local.cf (not knowing where it was defined)
>>
>>score OBFU_UNSUB_UL (1)
>>
>>to bump it up, but I got an error that I can't adjust an undefined
>>score.  However, scoring gives it 1 point.
>
> the default score for any rule is 1 point, unless that rule starts
> with T_ (0.01) or __ (0, used for meta rules)

ok and not surprising.

But is it good practice for the main distributed rules to rely on this
default?  It feels like a lint/pedantic error to define a rule that is
not T_ or __ and does not have an assigned score.  But maybe this is
common and normal.

> so, you have changed nothing.

I asked for an additional point over the previous score.  I got an error
in the log:

  May 11 10:47:46 s1 spamd[11723]: config: score: relative score without previous setting in configuration
  May 11 10:47:46 s1 spamd[11723]: config: invalid 'score' value in /usr/pkg/etc/spamassassin/local.cf (line 271): score\tOBFU_UNSUB_UL\t\t(2)

which is what I'm asking about.

>>I wonder if there is a default 1 point for rules with no score, but the
>>adjustment process doesn't respect that default, or if not what is going
>>on.
>
> https://spamassassin.apache.org/full/4.0.x/doc/Mail_SpamAssassin_Conf.html

That says scores in () are relative to the "already set score".  So
technically this is not a failure to follow docs, in that no score is
set.  But it seems unhelpful to users not to be able to see

  FOO_RULE 1

in a report and to decide they like that rule and do

score FOO_RULE (1)

to tell SA to give it one local point plus the score that the official
config gives it.

So maybe that (n) expression should be ok with the implicit 1.
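To pin down the proposal, here is a small shell sketch of how (n) could resolve. If a score was explicitly set, add n to it. Otherwise, add n to the implicit default described above: 0 for __ rules, 0.01 for T_ rules, and 1 for everything else. The function names are hypothetical. This models the suggested behavior, not what SpamAssassin currently does, which is to reject the relative form when no score was set.

```sh
# Implicit score of a rule that has no 'score' line.
implicit_score() {
    case "$1" in
        __*) echo 0 ;;
        T_*) echo 0.01 ;;
        *)   echo 1 ;;
    esac
}

# Proposed resolution of 'score RULE (DELTA)': $1=RULE, $2=DELTA,
# optional $3=BASE (an explicitly set score). Falls back to the
# implicit score when BASE is absent; awk does the float addition.
relative_score() {
    rule=$1 delta=$2 base=${3:-}
    [ -n "$base" ] || base=$(implicit_score "$rule")
    awk -v b="$base" -v d="$delta" 'BEGIN { print b + d }'
}
```

Under this proposal, `relative_score OBFU_UNSUB_UL 1` gives 2, which is what `score OBFU_UNSUB_UL (1)` was meant to achieve.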


mystery score definition

2023-05-11 Thread Greg Troxel
I am seeing a lot of "claim your prize from X", where X is a known
company, coming from fresh foo.autos domains.  I bet y'all are seeing
this too.  Until these get on blocklists they don't score that high.

One rule that does hit is

  OBFU_UNSUB_UL

which is defined in 72_active.cf as meta, and does not seem to have a
score defined.

I put in local.cf (not knowing where it was defined)

score OBFU_UNSUB_UL (1)

to bump it up, but I got an error that I can't adjust an undefined
score.  However, scoring gives it 1 point.

I wonder if there is a default 1 point for rules with no score, but the
adjustment process doesn't respect that default, or if not what is going
on.

Should all meta rules in the default ruleset have a score, 0.001 if they
are meant to be 0, vs something explicit?  What's the intent for this
one?

(I defined a meta rule with just OBFU_UNSUB_UL as the underlying with a
score, and that worked fine, as one would expect.)
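Here is that workaround as a local.cf sketch. The LOCAL_ name and the 1.0 score are illustrative. The wrapper meta fires exactly when OBFU_UNSUB_UL does, so its score is added to whatever OBFU_UNSUB_UL itself scores, including the implicit 1.

```
# Fires exactly when OBFU_UNSUB_UL fires; adds points on top of its score
meta     LOCAL_OBFU_UNSUB_UL  OBFU_UNSUB_UL
describe LOCAL_OBFU_UNSUB_UL  Local extra weight for OBFU_UNSUB_UL
score    LOCAL_OBFU_UNSUB_UL  1.0
```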



Re: DKIM absence

2023-05-02 Thread Greg Troxel
Matus UHLAR - fantomas  writes:

> On 02.05.23 08:37, Thomas Johnson wrote:
>> If there’s no dkim signature, you can’t check for dkim records in
>> dns.  The selector for a dkim signature is arbitrary - there’s no
>> one dns lookup you can do to see all possible dkim records for a
>> domain.
>
> a trick: if _domainkey.example.com exists (returns anything but
> NXDOMAIN), we may assume that at least DKIM records exist.
>
> I just have no idea how to test this in SA (at least not within rule).

I think that's a great idea, and we could add

DKIM_MISSING    Domain has DKIM records but message has no DKIM signature

with maybe +3 to start, as a sort of soft, implied DMARC.

(surely this is doable in a plugin; it's not conceptually hard)
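The DNS half of that check can be sketched in shell with dig; the DKIM key node is `_domainkey.<domain>`. The sketch assumes dig is installed. It relies on the fact that when selector records exist below that name, the name is an empty non-terminal, and a correctly behaving server answers NOERROR with no data rather than NXDOMAIN (RFC 8020). Some servers get this wrong, so the result is a hint, not proof. The function names are hypothetical.

```sh
# Extract the response code from dig's header comment line, e.g.
# ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1234".
parse_dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}

# Look up _domainkey.<domain> ($1) and print the response code.
dkim_node_status() {
    dig +noall +comments "_domainkey.$1" TXT | parse_dig_status
}

# Exit 0: name exists (DKIM keys likely); 1: NXDOMAIN; 2: unknown.
dkim_node_verdict() {
    case "$1" in
        NOERROR)  return 0 ;;
        NXDOMAIN) return 1 ;;
        *)        return 2 ;;
    esac
}
```

Usage: `dkim_node_verdict "$(dkim_node_status example.com)"`. A plugin would combine a zero exit here with the absence of any DKIM-Signature header.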


Re: DKIM absence

2023-05-02 Thread Greg Troxel
> Right, because you need to grovel out the selector from the
> DKIM-Signature line.  Groan.
>
> That you can't mark a domain as requiring DKIM at the top-level seems
> to be a design flaw in the protocol.

Yes, but I think the way that is fixed is spelled DMARC.
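To illustrate why the key lookup needs a signature in hand, here is a small shell sketch. It builds the name a verifier queries, `<selector>._domainkey.<domain>`, from the s= and d= tags of a DKIM-Signature header value. It is deliberately naive: it strips all whitespace and assumes each tag appears once. The function name is hypothetical.

```sh
# Print <s>._domainkey.<d> for a DKIM-Signature header value given as $1.
dkim_key_name() {
    # Tags are ';'-separated and may carry folding whitespace; drop it.
    tags=$(printf '%s' "$1" | tr -d ' \t\r\n' | tr ';' '\n')
    sel=$(printf '%s\n' "$tags" | sed -n 's/^s=//p')
    dom=$(printf '%s\n' "$tags" | sed -n 's/^d=//p')
    printf '%s._domainkey.%s\n' "$sel" "$dom"
}
```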


Re: FP on KAM_SOMETLD_ARE_BAD_TLD

2023-04-12 Thread Greg Troxel
Alan  writes:

> A lovely message from a reputable sender with a penchant for fancy
> email formatting has CSS rules expressed in JSON, presumably so it can
> adjust for the mail client or some such.
>
> A segment contains the text:
>
> "items":[{"type":"Input.Date","id":"date"}]}
>
> The KAM_SOMETLD_ARE_BAD_TLD rule is triggering on Input.Date. The rule is 
> weighed quite high by default (5.0 here).
> This is pushing messages over the spam threshold. I've adjusted the weight 
> locally but it's probably something that should be tweaked globally.

(The KAM rules are on the aggressive side, and downscoring is appropriate
for those who like to be a bit less aggressive, especially those who are
not comfortable with single rules over 4ish.  But I am still running
them, because I think they help a lot more than they hurt.)

You seem to be suggesting reducing score, but that's not the real issue
in this case.  What you have found, I think, is treating something like
a URL that isn't.  However, that's really hard to fix given the MUA
so-called feature of treating things that sort of look like URLs as
URLs.

If you haven't, I would send the message in question to KAM for analysis
and perhaps rule adjustment.

FWIW, I find that I have adjusted score to 1.5.


Re: Why was USER_IN_DEF_SPF_WL triggered on this email, even though it's spam?

2023-03-20 Thread Greg Troxel
Bill Cole  writes:

> It can happen, particularly when a listed domain changes the way they
> send email. I'm not sure I understand exactly what Dropbox is doing
> here or how it is  possible for a user to masquerade as PayPal, but I
> suspect this is a new service of some sort.

It seems to be a new service:

https://invoice.dropbox.com/login

and from the mail Mark posted, it seems they let people

  choose the human part of the name: "John Doe "
  choose the Subject
  choose the Reply-To:
  choose the body

  put something at dropbox that will have a link in the mail

  but include a footer which is

    [name] sent you an invoice using Dropbox, Inc. PO Box 77767, San
    Francisco, CA 94107 View Privacy Policy[2]

  have the mail go out dkim-signed under dropbox.com

and thus I think dropbox.com needs to be removed from
default_welcomelist, as surely entities on default_welcomelist can't
allow web users to spam and match the entry.
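Until the default list changes, a site can drop that entry locally. This sketch assumes the SpamAssassin 4.0 directive `unwelcomelist_auth` (formerly `unwhitelist_auth`), which the documentation describes as removing a welcomelist_auth or def_welcomelist_auth entry. The address has to match the distributed entry exactly, which is assumed here to be `*@*.dropbox.com`.

```
# local.cf: remove the distributed def_welcomelist_auth entry for Dropbox
unwelcomelist_auth *@*.dropbox.com
```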


Re: Why was USER_IN_DEF_SPF_WL triggered on this email, even though it's spam?

2023-03-20 Thread Greg Troxel
A quick grep shows:

  4.00/updates_spamassassin_org/60_welcomelist_auth.cf:def_welcomelist_auth *@*.dropbox.com

so the code is operating as designed.

It seems that either dropbox is compromised, or dropbox is allowing
user-generated content to go out under their domain.   Either way it
seems they should be removed from USER_IN_DEF_SPF_WL, unless this is a
blip and they fix it right away.
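Until upstream acts, the entry can be dropped locally.  A minimal
local.cf sketch, assuming the 4.0 spelling unwelcomelist_auth
(unwhitelist_auth in 3.4) and assuming it also removes
def_welcomelist_auth entries; the argument has to match the shipped entry
exactly, so it is worth confirming both points in
Mail::SpamAssassin::Conf first:

```
# local.cf: cancel the shipped "def_welcomelist_auth *@*.dropbox.com" entry.
# Assumption: unwelcomelist_auth clears def_welcomelist_auth entries too.
unwelcomelist_auth *@*.dropbox.com
```

Running the offending message through spamassassin -t afterwards should
show whether USER_IN_DEF_SPF_WL still fires.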

Have you written to ab...@dropbox.com, and what did they say?



DKIMWL functional?

2023-03-07 Thread Greg Troxel
I got spam which hit DKIMWL_WL_HIGH (from smartbrief).   I went to find
out how to report this as obviously they should not be on HIGH, and
found that

  https://www.dkimwl.org/

gets me

  A Database Error Occurred

  Error Number: 1146
  Table 'bladmin.dkimwl_magnitude_monthly' doesn't exist
  SELECT *, (count/28) as totcount FROM `dkimwl_magnitude_monthly` WHERE domain_id REGEXP '\\.' ORDER BY `totcount` DESC LIMIT 100
  Filename: models/Entriesmodel.php
  Line Number: 19


I wonder if anyone knows whether DKIMWL is still functioning and, if so,
how to report being spammed by someone in their DB?


Re: adobe phishing?

2023-02-22 Thread Greg Troxel
Kris Deugau  writes:

> Greg Troxel wrote:
>> One of my users got mail that really looks like a phish. They are
>> unaware of having an adobe account.   It is DKIM signed, but looks a bit
>> spammy in terms of the content (low-quality HTML markup, missing
>> text/plain content).
>
> ... How much otherwise legitimate mail have you inspected recently?
>
> Grotty HTML and missing text/plain is here to stay.  :(

I realize that, but it's still icky.

It just seemed like 'obvious phish' from context so I thought I'd ask.

Sounds like it's as legit as Adobe is :-)


adobe phishing?

2023-02-22 Thread Greg Troxel
One of my users got mail that really looks like a phish. They are
unaware of having an adobe account.   It is DKIM signed, but looks a bit
spammy in terms of the content (low-quality HTML markup, missing
text/plain content).

Is anyone else seeing this?

Opinions on whether it's real, whether Adobe is compromised, or
something else?



Return-Path: 
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-14) on mail.example.com
X-Spam-Level:
X-Spam-Status: No, score=-7.3 required=1.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED,
DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,DMARC_PASS,HTML_IMAGE_RATIO_08,
HTML_MESSAGE,MAILING_LIST_MULTI,RCVD_IN_HOSTKARMA_W,
RCVD_IN_VALIDITY_CERTIFIED,RCVD_IN_VALIDITY_SAFE,SPF_HELO_NONE,
SPF_PASS,TXREP shortcircuit=no autolearn=disabled version=4.0.0
X-Original-To: u...@example.com
Delivered-To: u...@mail.example.com
Received: from r42.mail.adobe.com (r42.mail.adobe.com [192.243.226.42])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(No client certificate requested)
by mail.example.com (Postfix) with ESMTPS id E7096410756
for ; Wed, 22 Feb 2023 11:05:08 -0500 (EST)
Authentication-Results: mail.example.com;
dkim=pass (1024-bit key) header.d=mail.adobe.com 
header.i=@mail.adobe.com header.b=EtgaivIv
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mail.adobe.com;
s=neolane; t=1677081908;
bh=IfJX78+kf+++BGIgmI6NTSU3ZUI1dzDwNJ5pRlW6Y+w=;
h=From:Subject:Date:To:MIME-Version:Message-ID:List-Unsubscribe:
 Content-Type;
b=EtgaivIvUiNOiiVI5kpGQONOWfcAOQvbfpJrGiR0xQQvORkDfj5uVp6LH3JftKL1+
 E/DIsY896w9NajMG7AOHNBrDnN6+BpBx+J0OOWy62EcdYBntSnDiifQmat0CH0p7Xg
 Ozw4G3a2zZc/nJ+QRBK75/Zgg2Nyg9rF+y23gufI=
X-MSFBL: XsGvftOJ+4LnDyzV1Q3igtbyPwQxb/rf8JNpMfEpA0E=|eyJyIjoibWV0QGxleG9
ydC5jb20iLCJnIjoibWlkLnJlYWN0aXZhdGlvbl8xZDBlNjMxMS02Zjk4LTRjNWI
tOGIwZS04ZGY4MGQ1Yjc3MzkiLCJiIjoiYXdzX2Fkb2JlaW50X3Byb2Q2X21pZC5
yZWFjdGl2YXRpb25fbW9tZW50dW0xOV9tdGEwMDJfMTkyLjI0My4yMjYuNDIiLCJ
yY3B0X21ldGEiOnsgImluIjogImFkb2JlaW5fbWlkX3Byb2Q2IiwgInIiOiAibWV
0QGxleG9ydC5jb20iLCAibSI6ICItMTcyMjM2MjU0IiwgImQiOiAiNjI5NTEzOTM
iLCAiaSI6ICIiIH19
Received: from [10.139.37.161] ([10.139.37.161:12939] helo=r42.mail.adobe.com)
by momentum19.or1.cpt.adobe.net (envelope-from )
(ecelerity 4.2.38.62370 r(:)) with ESMTP
id 97/FA-14171-43D36F36; Wed, 22 Feb 2023 08:05:08 -0800
From: "Adobe" 
Subject: =?utf-8?B?SW1wb3J0YW50IGluZm9ybWF0aW9uIGFib3V0IHlvdXIgQWRvYg==?=
 =?utf-8?B?ZSBhY2NvdW50?=
Date: Wed, 22 Feb 2023 08:05:07 -0800
To: 
Reply-To: "Adobe" 
MIME-Version: 1.0
X-mailer: nlserver, Build 6.7.0
Message-ID: 
List-Unsubscribe: List-Unsubscribe: 
X-CSA-Complaints: whitelist-complai...@eco.de
List-Id: <-1193003540.neolane.client.com>
Precedence: bulk
List-Unsubscribe-Post: List-Unsubscribe=One-Click
Content-Type: multipart/alternative;
charset="windows-1252";
boundary="=_NextPart_166_5CA8CB4B.5CA8CB4B"


[SNIP]

Dear Adobe customer,
We've noticed you have not logged in to your Adobe account in more than
a year. In keeping with our policies, we are contacting you to let you
know your Adobe ID will expire 90 days from now. If you take no action
within the next 90 days, your Adobe ID
(https://t-info.mail.adobe.com/r/?id=[RANDOM_BASE64_SUFF]) will no
longer be valid, you will no longer have access to content you may have
stored on our servers and this account will be closed.
Your Adobe ID is: USER@EXAMPLE.COM

If you would like to maintain your Adobe ID listed above, you can log in
now to keep it active.


Re: TxRep records unreliably on MySQL

2023-01-09 Thread Greg Troxel
"Matt Anton via users"  writes:

> Here's what I'm having on the SQL spamassassin db:
>
> 

Thanks, much easier!

>> 1) txrep seems not 100% baked.   I suggest reading the code to see how
>> this happened.
>
> What code are you talking about?

The perl source code for TxRep.  On my system:

  /usr/pkg/lib/perl5/vendor_perl/5.36.0/Mail/SpamAssassin/Plugin/TxRep.pm

>> 2) txrep with bdb only has keys and values and it does overload the
>> key for address and name.  So perhaps this is incompletey moving to
>> the more complicated scheme.
>
> And there you completely lost me ;)
> I naively thought TxRep would record in a same way that AWL did (sql
> schema for both plugins are the same).

Maybe it should; someone has to read the code and figure this out.  I
sort of intend to at some point, but haven't.


Re: TxRep records unreliably on MySQL

2023-01-09 Thread Greg Troxel
"Matt Anton via users"  writes:

> After an upgrade to SA-4.0.0 I decided to give TxRep a try after using 
> AWL since it was introduced.
> I set up TxRep accordingly to SA’s documentation with a mysql-5.7.40 
> server, give it a first try by sending an email to the box where SA is 
> running and saw TxRep just has recorded unreliably onto the sql table:

Your mail was miswrapped and thus hard to read.

1) TxRep seems not 100% baked.  I suggest reading the code to see how
this happened.

2) TxRep with BDB only has keys and values, and it does overload the key
for address and name.  So perhaps this is an incomplete move to the more
complicated scheme.


Re: Espoofer - An Email Spoofing Testing Tool That Aims To Bypass SPF/DKIM/DMARC And Forge DKIM Signatures

2022-12-28 Thread Greg Troxel
It would be great if someone(tm) went through the Black Hat PDF and
wrote rules for all the evasions, and fixed the MTAs, etc.


Re: Whitelist or add negative values for score

2022-12-21 Thread Greg Troxel
The other thing that should be done for j...@company.com is that
company.com should sign their mail with DKIM, and then you can

  welcomelist_from_dkim *@company.com

I find that many companies I deal with that produce semi-spammy mail
(most big companies :-) have DKIM signatures and I can welcomelist on
that, without welcomelisting forgeries.

You can of course use welcomelist_from_rcvd to match on the relay's IP
address.  DKIM is just nicer if you can get them to do it.
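For reference, a sketch of two more variants in local.cf.  company.com
is from the example above; esp.example.net, mail.company.com and
192.0.2.25 are hypothetical placeholders, and the bracketed-IP form of
welcomelist_from_rcvd is worth double-checking against
Mail::SpamAssassin::Conf for your version:

```
# Mail signed by a third party (e.g. an ESP) on the company's behalf:
# the optional second argument names the signing domain.
welcomelist_from_dkim  *@company.com  esp.example.net

# Fallback until they sign: match the relay that handed the mail to
# your network, by reverse DNS or by IP address in square brackets.
welcomelist_from_rcvd  *@company.com  mail.company.com
welcomelist_from_rcvd  *@company.com  [192.0.2.25]
```

welcomelist_from_rcvd only behaves as intended if trusted_networks and
internal_networks are set correctly, since it looks at the handover into
your network.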


Re: [ANNOUNCE] Apache SpamAssassin 4.0.0 available

2022-12-19 Thread Greg Troxel

Benny Pedersen  writes:

> Kenneth Porter skrev den 2022-12-20 04:59:
>> RPM status for Red Hat distros:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=2154501
>>
>> https://bodhi.fedoraproject.org/updates/FEDORA-2022-e341ba52a1
>>
>> https://koji.fedoraproject.org/koji/buildinfo?buildID=2102188
>>
>> It looks like the packaging fails before building anything because the
>> filename in the spec file for the rules tarball doesn't match the
>> version of rules actually included in the srpm.
>
> lol, there should not be any rules in tarballs, rules should be
> fetched with sa-update AFTER install

Actually, not really.  Packages should be able to run out of the box,
with no network fetching needed.  The pkgsrc entry -- also updated to
4.0.0 -- fetches the release rules at package build time and includes
them.  But, it does build :-)

This is a little silly with spamassassin, but the general principle
matters.  I once set up several computers not at all connected to the
internet, including an MTA.  In that case I didn't need spamassassin,
because none of the people sending mail were spammers :-) But, binary
packages being directly usable mattered.

Of course, one should update rules daily, especially on systems that
receive mail from the internet.


signature.asc
Description: PGP signature


Re: Whitelist or add negative values for score

2022-12-19 Thread Greg Troxel

Joey J  writes:

> I'm trying to see if there is a "best way" to provide negative scoring for
> a certain persons email.

That's easy.  There are many ways, but no single best way.

> As an example if j...@company.com is communicating with paypal or other real
> banking institutions, then at times within the email chain, SA will tag it
> as spam.

It's really not clear what your issue is.

> I want to see if there is if email is from j...@company.com AND is from IP
> address 1.2.3.4, then lets take away 2 from the score, hopefully allowing
> those legitimate types of messages through.
> I couldn't find an example on how to accomplish this dual criteria check.
> Any assistance is apreciated.

welcomelist_from_rcvd   j...@company.com [1.2.3.4]

should work, but it scores -100 rather than -2.  It would be nice if
welcomelist_* could take a score, but if you are sure you want *your* SA
to not mark it as spam, -100 is the way to spell that.
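If a gentler nudge really is what's wanted, a meta rule can require both
conditions and carry any score.  A sketch with hypothetical rule names
and a placeholder address (the real one is elided above);
X-Spam-Relays-Untrusted is SpamAssassin's pseudo-header listing the
untrusted relays, and anchoring on its first entry matches the host that
handed the message to your network:

```
# Hypothetical local rules; replace sender@company.com with the real address.
header   __LOCAL_KNOWN_SENDER   From:addr =~ /^sender\@company\.com$/i
header   __LOCAL_KNOWN_RELAY    X-Spam-Relays-Untrusted =~ /^\[ ip=1\.2\.3\.4 /
meta     LOCAL_SENDER_VIA_RELAY (__LOCAL_KNOWN_SENDER && __LOCAL_KNOWN_RELAY)
describe LOCAL_SENDER_VIA_RELAY Known sender arriving from the expected IP
score    LOCAL_SENDER_VIA_RELAY -2.0
```

Unlike welcomelist_from_rcvd, this only subtracts 2, so a message that
is otherwise very spammy will still be tagged.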


Re: New Release Candidate 4.0.0-rc4 Testers Needed

2022-12-15 Thread Greg Troxel

On 12/14/22 10:51 AM, Kevin A. McGrail wrote:
> Excellent news!  Please let us know more about the WL/BL changes and
> open a bugzilla bug.


My other post about this has the info, but I just wrote a bug entry that 
is probably more succinct and coherent now that I understand better.


https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8092


Re: New Release Candidate 4.0.0-rc4 Testers Needed

2022-12-14 Thread Greg Troxel

"Kevin A. McGrail"  writes:

>> I am finding that short-circuiting seems not to be working, but this is
>> not new and I am not 100% clueful about it.  However in trying to figure
>> things out I am running into things I do not understand and think that
>> at least a bit more doc clarity would help.

> We have had issues with shortcircuit that cropped up.in the rc process from
> optimization that was performed. Can you open up bugzilla ticket
> please?

False alarm.  The issue was that I wasn't loading the Shortcircuit
plugin, since it never occurred to me from reading local.cf that it is
not loaded by default.  Regular welcomelist short circuits very well, and
DKIM welcomelist is also working well, but DNS queries are still launched
(I get why, of course).
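For anyone else who trips over this: shortcircuit lines in local.cf only
take effect once the plugin is loaded from a .pre file.  A sketch; in a
stock install the loadplugin line is typically present but commented out
in v320.pre, and the ifplugin guard just keeps local.cf parsing cleanly
if the plugin is not loaded:

```
# v320.pre (or any other *.pre file): uncomment or add
loadplugin Mail::SpamAssassin::Plugin::Shortcircuit

# local.cf
ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
  shortcircuit USER_IN_WELCOMELIST       on
  shortcircuit USER_IN_DKIM_WELCOMELIST  on
endif
```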

There is also a possibility of confusion in the shortcircuit config for
USER_IN_WELCOMELIST when enable_compat is not on, but I have no
evidence for that.  I'll try to check that later.


So, my issues with the RC are now down to understanding the welcomelist
compat strategy (which is good, because that's not really a big deal and
easy to address).


(My "taking 30s" issue was that the NFS locking method fails on NetBSD,
and I switched to flock.  My "core dump" issue is a bug someplace in the
BDB hash table support (or in the BDB code on my system) showing up in
TxRep use, which feels like an off-by-one error, but I'll figure it out,
and I have a really gross workaround.)


Re: New Release Candidate 4.0.0-rc4 Testers Needed

2022-12-14 Thread Greg Troxel

Greg Troxel  writes:

> The wiki page in the release notes says:
>
> In SpamAssassin version 4.0.0 all rules, functions, command line
> options and modules that contain "whitelist" or "blacklist" have
> been renamed to contain "welcomelist" and "blocklist" terms. This
> allows acronyms like WL and BL to remain the same. Previous options
> will continue work at least until version 4.1.0 is released. If you
> have local settings including scores or meta rules referring to old
> rule names, these should be changed and "enable_compat
> welcomelist_blocklist" added in init.pre.
>
> I haven't enabled compat, but I did rename.  I would expect that with
> the transition to new keywords in 4.0.0, the normal approach is to edit
> one's config and be all set.  Or, one could leave the old words and have
> them treated as compatible, maybe with a warning.  Or possibly have to
> enable compatibility for the old ones.
>
> Am I really supposed to change the keywords to welcome/block *and* set
> "enable_compat"?  The man page Mail::SpamAssassin::Conf.3 doesn't say
> that, that I was able to find.

There's nothing like reading the code to answer questions.  So

  - The basic rule is USER_IN_WELCOMELIST, and the basic score for it is
    -100.  This is in 60_welcomelist.cf.

  - If "enable_compat welcomelist_blocklist" has been given, it stays
    that way.

  - Without that compat statement, a meta rule USER_IN_WHITELIST is
    created and given a score of -100, and the score for
    USER_IN_WELCOMELIST is set to -0.01.

I don't understand this approach at all:

  - I think it's important that if a user has "whitelist_from" in their
    config, that's still followed.  As far as I can tell that's in the
    config parser not the scoring, so that is separate.

  - The words are changed in this release, so if the user doesn't ask
    for anything special, their scoring output should have -100 for the
    WELCOMELIST, and shouldn't show WHITELIST.

  - If someone has meta rules that include USER_IN_WHITELIST, then there
    needs to be a compat rule with that name for that to work.  But that
    seems like a very unusual thing to do, as usually
    WELCOMELIST/WHITELIST rules have a score such that no further rules
    are needed.

  - Someone might reasonably want to turn on shortcircuiting for
    USER_IN_WELCOMELIST, and it seems awkward at best for that to
    fire on a -0.01 score and expect the -100 meta rule to kick in.
    This is the default behavior.


With considerable trepidation from not really understanding, I would
instead

  - have a

      enable_compat whitelist_blacklist

    that adds the meta rules (USER_IN_WHITELIST) with scores -0.01, and
    leaves the scores for USER_IN_WELCOMELIST alone, to accommodate
    people with meta rules that refer to the old names.

  - adjust the wiki documentation to say that both welcomelist_from
    (standard approach) and whitelist_from (deprecated, compatibility)
    are recognized in config files, and explain about the compat for
    meta rules in the first bullet point.


Re: New Release Candidate 4.0.0-rc4 Testers Needed

2022-12-13 Thread Greg Troxel

I am finding that short-circuiting seems not to be working, but this is
not new and I am not 100% clueful about it.  However in trying to figure
things out I am running into things I do not understand and think that
at least a bit more doc clarity would help.

I have a fairly normal installation (milter with Postfix, base rules,
KAM, some custom rules) and (now with 4.0.0rcN, renamed) a bunch of
welcomelist entries, which I try to do with DKIM, then rcvd, and for some
I give up and just use plain welcomelist_from.  Plus a bunch of blocklist
entries.  These rules work and I get sensible scores with occasional
minor issues.

I just got a mail which was a little spammy and reasonably got 1.4, but
I decided to call it ham and added a "welcomelist_from_dkim
n...@example.com mailchimpapp.net".  It then scored -98.6, so that's
good.

I had in local.cf:

  shortcircuit USER_IN_WELCOMELIST   on

and realized that doesn't cover USER_IN_DKIM_WELCOMELIST, so I added

  shortcircuit USER_IN_DKIM_WELCOMELIST   on

but still scoring looks like:

  -0.0 USER_IN_DKIM_WELCOMELIST From: address is in the user's DKIM
  -100 USER_IN_DKIM_WHITELIST DEPRECATED: See USER_IN_DKIM_WELCOMELIST

plus a bunch of other stuff including network tests (all done
correctly).  It took 36 seconds and there was no sign of short
circuiting.

The wiki page in the release notes says:

In SpamAssassin version 4.0.0 all rules, functions, command line
options and modules that contain "whitelist" or "blacklist" have
been renamed to contain "welcomelist" and "blocklist" terms. This
allows acronyms like WL and BL to remain the same. Previous options
will continue work at least until version 4.1.0 is released. If you
have local settings including scores or meta rules referring to old
rule names, these should be changed and "enable_compat
welcomelist_blocklist" added in init.pre.

I haven't enabled compat, but I did rename.  I would expect that with
the transition to new keywords in 4.0.0, the normal approach is to edit
one's config and be all set.  Or, one could leave the old words and have
them treated as compatible, maybe with a warning.  Or possibly have to
enable compatibility for the old ones.

Am I really supposed to change the keywords to welcome/block *and* set
"enable_compat"?  The man page Mail::SpamAssassin::Conf.3 doesn't say
that, that I was able to find.

I wonder if I am not getting short circuit because the -100 is awarded
to USER_IN_DKIM_WHITELIST, not USER_IN_DKIM_WELCOMELIST?   Or is DKIM a
network test, and thus doesn't really work for short circuiting?
Something else?

(I realize that 36s is a clue that an RBL is being queried that times
out and I should find and fix that, but I think that's orthogonal to my
questions.)


Re: New Release Candidate 4.0.0-rc4 Testers Needed

2022-12-11 Thread Greg Troxel

"Kevin A. McGrail"  writes:

> I have it in production.

Thanks - I just reinstalled, re-ran sa-update for base and KAM rules,
and so far it's looking good modulo a few nits:

UPGRADE says:

  - All rules, functions, command line options and modules that contain
"whitelist" or "blacklist" have been renamed to contain more
racially neutral "welcomelist" and "blocklist" terms. This allows
acronyms like WL and BL to remain the same. Previous options will
continue work at least until version 4.1.0 is released. If you have
local settings including scores or meta rules referring to old rule
names, these should be changed and "enable_compat
welcomelist_blocklist" added in init.pre. See:
https://wiki.apache.org/spamassassin/WelcomelistBlocklist (Bug 7826)

1) I find that URL gets me "page not found".

2) It feels like a bug not to have compat by default.  My memory is that
while 3.4.6 has compat-for-new, the output was such as to suggest that
the WELCOME syntax was non-standard, so I hadn't done query-replace yet.
For a From: that is "whitelist_from" (yes I know I know I need to tell
them to set up DKIM, and use whitelist_from_rcvd in the meantime), I
get:

  -0.0 USER_IN_WELCOMELIST User is listed in 'welcomelist_from'
  -100 USER_IN_WHITELIST  DEPRECATED: See USER_IN_WELCOMELIST

Changing all entries to welcomelist_from gets me the same:

  -0.0 USER_IN_WELCOMELIST User is listed in 'welcomelist_from'
  -100 USER_IN_WHITELIST  DEPRECATED: See USER_IN_WELCOMELIST

(I have run sa-update less than an hour ago; the mod time on my rules is
0959 EST.)

3) Maybe I'm misreading that there is no change in default and if you
want to use the new terms, you should turn on enable_compat
welcomelist_blocklist, but I never would have guessed that you need to
opt in to the new standard approach.  I would expect that the new way
would work with no noise and the old way, for now, would work with
something that feels like a warning.  And that some future release would
maybe fail to start if there are such warnings.


So it seems like 1) there is compat by default, which is good and 2)
somehow it is sort-of-old-compat, instead of omitting the whitelist line
since I hadn't configured it.  Maybe this is something on my end, but I
did merge to the new local.cf.  And as always maybe I am confused.



The good news is that I can find nothing else wrong.  I am not seeing
short-circuiting happening, but that is no different than 3.4.6, so I
think that's my fault, not a 4.0.0 issue.


Re: New Release Candidate 4.0.0-rc4 Testers Needed

2022-12-11 Thread Greg Troxel

Sidney Markowitz  writes:

> I know a number of you have been looking at the release candidates for
> the 4.0.0 release and have been helpful in finding issues with them.
>
> We have just announced a new release candidate 4 that looks very close
> to ready for the full 4.0.0 release.
>
> We could use as many people as possible who are in a position to try it out.
>
> Here is a link to the archived announcement we made on the developers
> mailing list, which has all the details on downloading, what's new,
> and upgrading from version 3.4.x.
>
> https://lists.apache.org/thread/j10xp2b9166ctqsydhjqo5y9h8dw7zdp

I am testing on NetBSD 9 amd64, likely a platform you don't have a
report for yet.

I have locally updated the mail/spamassassin entry in pkgsrc, which only
required minor defuzzing of our patches.  (They are almost all about
accommodating our non-default prefix and handling of config files.
I'll try to sort through and make sure anything that isn't that gets
sent upstream for discussion.)

I've run it on a machine which doesn't really do mail, starting spamd
and running spamassassin -t on a message.  That looks good; it gets
sensible scores, including ARC_SIGNED and ARC_VALID.

I don't have a staging server, and could try it on my production server.
I therefore wonder:

  Do people think it's prudent to try this in production (for a small
  server for just personal mail, by someone who knows what they are
  doing)?

  If I run it, and I decide to flip back (because there are issues), is
  there anything to worry about?  I am using TxRep but I didn't see in
  the announcement that there was a database schema change.

  Have others put this in production, and was it OK?  I am guessing
  yes and yes.


Re: Mial hits MISSING rules despite presence of headers

2022-12-04 Thread Greg Troxel

"Kevin A. McGrail"  writes:

> #2 Work on the code so that short circuiting or at least the scoring
> behaves as with 3.4.6.

As penance for ranting I went back and re-read everything more
carefully, but feel free to ignore me if I am being unhelpful.

  I don't think a -2 shortcircuit rule makes any sense.  It seems to me
  that the idea of shortcircuit is "I can more or less prove that
  skipping the rest won't change the classification in any meaningful
  way, so save the resources", and -2 just isn't like that.

  Reading the bz entry, I think the real bug is a meta rule evaluating
  when the rules it refers to have not finished.  It seems obvious (I
  say knowing I probably don't understand something) that this leads to
  wrong results, and they aren't structurally of the "skip processing"
  type that's within "acceptable wrong results".

Wrong meta results seem to me to be outside the vague spec from before.

So I would lean toward "do not allow meta rules to evaluate unless all
of the rules they refer to have completed", and if there's a new special
case where they evaluate anyway after a short circuit, bypassing the
usual dependency, then don't do that.

As always I may be confused.




Re: Mial hits MISSING rules despite presence of headers

2022-12-04 Thread Greg Troxel

Bill Cole  writes:

> On 2022-12-04 at 09:57:09 UTC-0500 (Sun, 04 Dec 2022 09:57:09 -0500)
> Greg Troxel 
> is rumored to have said:
>
>> Putting on my CS pedant hat, I guess the big question is if there is a 
>> violation of a previously published specification.
>
> If not, it would only be a consequence of no definitive clear spec existing.
>
> The logic around rule ordering, completion of meta rules, and
> shortcircuiting is mind-numbingly subtle. If there is a clear unified
> description of how it has worked in the past, I cannot find it.  My
> sense from the 3-year odyssey that was Bug 7735 is that we've never
> worked out a complete flowchart or state diagram that covers the whole
> realm of possible situations. I wouldn't even bet on the existing
> relevant documentation spread around the project being 100% internally
> self-consistent.

That's more or less what I was getting at.  If there is not a clear
specification (i.e. the documentation says that it works like X) that
people can properly rely on, then the pedant in me says that behavior
changing slightly, but still within the swim lane implied by the
previous non-spec, is not a bug.


Re: Mial hits MISSING rules despite presence of headers

2022-12-04 Thread Greg Troxel

"Kevin A. McGrail"  writes:

> I think that will have to go to discussion since if the rules don't short
> circuit the way they used to, other rules outside of the ones we control
> are going to act oddly. The one that was reported was with validity for
> example.
>
> What happens if I have a local rule that's high scoring and meta that would
> have been short circuited prior?  In 3.4 I would have expected to stop when
> I hit the validity rule, now I continue running and hit another rule that's
> very high scoring and end up with a mis classification.

Perspective from someone who does not deeply understand short
circuiting:

0) I have never had the impression that there were guarantees about the
order of rule evaluations.  I do have the impression that network tests
are kicked off in parallel.

1) My impression has always been that short circuiting is about early
termination of scoring and skipping further tests for two reasons:

  avoiding both CPU time and remote queries for further tests

  avoiding the elapsed time that such tests will take, so that
  short-circuited ham can be delivered in a few seconds rather than a
  minute

I have always expected that short circuiting should be done for rules
that are -100 or +100, where when they hit you have made a decision.
It seems strange to me that someone would configure short circuiting for
a rule that does not have overwhelming weight.

2) It seems strange to me to have a situation where a message might hit
both a +100 and a -100 rule (on purpose), and stranger still that one
might have a scheme where one is marked short circuit and the proper
classification relies on that happening before the others.
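The Shortcircuit plugin can also impose that kind of overwhelming weight
itself: with ham or spam instead of on, the rule's own score is replaced
by shortcircuit_ham_score or shortcircuit_spam_score.  A sketch,
assuming the documented defaults of -100 and 100 (restated here only for
clarity); the plugin's POD is the place to confirm the exact semantics
for a given version:

```
ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
  shortcircuit_ham_score   -100
  shortcircuit_spam_score   100
  # A hit on either rule ends scanning with a decisive score.
  shortcircuit USER_IN_WELCOMELIST  ham
  shortcircuit USER_IN_BLOCKLIST    spam
endif
```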



Putting on my CS pedant hat, I guess the big question is if there is a
violation of a previously published specification.


I am probably way off, but I hope this is helpful as a proxy for the
typical understanding of someone who does not really understand.


Re: spamassassin sometimes suddenly ends scanning

2022-11-29 Thread Greg Troxel

Henrik K  writes:

>> I see occasional coredumps (as in perl.core).   It is often enough to be
>> annoying (beyond worrisome that it happens at all), but not reproducible
>> and no apparent pattern.
>
> Try memtester/memtest86, atleast if it's not a proper server with ECC
> memory..

I am pretty sure the hardware is OK, but I can't really run memtest86 as
it is a VPS.  SpamAssassin has trouble often, while the machine does a
lot of other things that are all trouble-free.  The logs do not show a
single core dump from anything else.

> And if you have core dumps, running gdb would be helpful:
>
> $ gdb /usr/bin/perl /path/to/core
> (gdb) backtrace

Yes, and I should rebuild it all with -g.

But it sounds like others are not seeing this, which is a useful
datapoint.


Re: spamassassin sometimes suddenly ends scanning

2022-11-29 Thread Greg Troxel

Wolfgang Breyha  writes:

> It doesn't finish any other rules and doesn't display final results at all.
>
> And then I start it simply again and everything is fine.
>
> Has anybody else seen this odd behavior?

I see occasional coredumps (as in perl.core).   It is often enough to be
annoying (beyond worrisome that it happens at all), but not reproducible
and no apparent pattern.


Re: spam subject marking

2022-11-16 Thread Greg Troxel

Greg Troxel  writes:

> I did just get a bounce message in reply to a message I sent here,
> complaining that my message failed DKIM (maybe the list munged it) and
> SPF (ok; the list is not in general authorized to send mail from my
> domain) and therefore was being rejected (but I do not currently publish
> a DMARC policy).

Update: my messages to the list, as I received them, both this one and
the one that provoked the bounce, have valid DKIM signatures as
determined by inbound processing on my MTA.

So while my rant about DKIM and lists stands in general, I apologize for
casting aspersions on this list: it appears to be working well as far as
not breaking DKIM, at least for senders without DMARC.



Re: spam subject marking

2022-11-16 Thread Greg Troxel

"Grant Taylor via users"  writes:

> On 11/15/22 1:16 PM, Marc wrote:
>> Hmmm, good point, not really thought about this even. Are email
>> clients complaining about this?
>
> Few email clients are testing DKIM.  Some servers are testing
> DKIM. Some systems are mis-treating DKIM failure as something more
> sever than the specification allows.

Can you expand on that?   A DKIM failure means that one can't establish
that the message came from the domain, and this leads to:

  decline to apply whitelist_from_dkim

  perhaps, if one has data that most things with that From: have valid
  dkim sigs, give it some spam points.

in spam filtering and

  if there is a DMARC policy, and it fails SPF also, file as spam or
  reject

Are you saying that some MTAs outright reject on DKIM failure, in the
absence of DMARC?

I did just get a bounce message in reply to a message I sent here,
complaining that my message failed DKIM (maybe the list munged it) and
SPF (ok; the list is not in general authorized to send mail from my
domain) and therefore was being rejected (but I do not currently publish
a DMARC policy).

Not really this topic, but I think mailing lists really need to be set
up to not break DKIM.  The kids all want us to use forums anyway, and
DKIM breakage and spam-filtering issues really don't help.

>> Currently I just want to 'warn' users that the message is possible
>> spam, they can decide to move such emails automatically to a spam
>> folder by enabling a sieve rule.
>
> I suspect any visible modification you make to the message will also
> likely break DKIM in the same way.

Agreed.  Really the MUA needs support for a spam-marking header, or to
file messages with such headers into a separate mailbox/folder/whatever.

>> What would be an alternative method to keep such functionality
>> without altering the subject?
>
> Adding headers is the most common thing that I see.  Then let the
> email client decide what action, if any, to take based on that
> header's contents.

me too
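On the SpamAssassin side, that amounts to something like this local.cf
fragment (a sketch; X-Spam-Flag: YES is among the headers SpamAssassin
adds to spam by default, so a Sieve rule can test for it):

```
# Do not tag the Subject; leave rewrite_header unset.
# rewrite_header Subject *****SPAM*****

# Do not wrap spam in a report message; only add X-Spam-* headers.
report_safe 0
```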


Re: PBL and rejects

2022-11-14 Thread Greg Troxel

Alex  writes:

> I'm hoping I can ask this question here. Somehow the PBL considered the IP
> addresses given to us by our ISP (I can share this if needed) as ineligible
> to send email, resulting in any recipient domain that checks the PBL to
> reject our email,

AIUI, PBL is supposed to be for dynamic-type IP addresses for
residential service, so if you have business service something seems
off.

What did your ISP say when you asked them about this?   I would expect
them to be concerned, because giving customers addresses that are listed
in an RBL is obviously going to get them classed as providing
not-really-OK service and earn them negative recommendations, if that's
what is really going on.




Re: Gmail confidential mode

2022-10-16 Thread Greg Troxel

Alex  writes:

> What do you know about "Gmail confidential mode" emails? I'm starting to
> see a few of these come in to users now, and not sure how to treat them.
> They are sent through gmail, but require a one-time passcode sent to the
> recipient,

Did you actually look at them?  What do they look like?  What does the
recipient have to do to actually get the mail?  Does this only work
gmail to gmail?

> so any potential threat is not transferred through the same
> email (or any email at all).

huh?  I don't follow this at all.

It is a longstanding tradition to send malware inside zip files or
encrypted archives to avoid scanning.  I would view these with extreme
suspicion: if you are communicating with people you know and want
privacy, the obvious first step is to avoid gmail and use Matrix/Signal
or OpenPGP mail, and if it's from someone you don't know, well...

> otherwise have no other spam indicators.

When you looked at the raw bytes in the mailspool, what was in it?  What
does the SA debug output look like?  It doesn't make sense that you
wouldn't have done these things before posting, but you didn't explain.




Re: More Sendgrid trouble?

2022-09-29 Thread Greg Troxel

Kris Deugau  writes:

> The Bayes result is not great, but the USER_IN_DEF_*_WL hits between
> them account for most of that negative score anyway.

With dkim-signed spam, I think the only two paths forward are:
  - hope they fix their apparently compromised system
  - take them out of the default WL (locally now, and via a rule update
    in a few weeks)





Re: More Sendgrid trouble?

2022-09-28 Thread Greg Troxel

Kris Deugau  writes:

> Is anyone else seeing intermittent FNs on mail sent through Sendgrid
> where the nominal sender has a default welcomelist_* entry?
>
> Today's spample is a Mcafee scam email, pretty clearly sent through
> Intuit's Sendgrid account based on the rDNS.  On testing in my sandbox
> it was only allowed through due to the stock welcomelist entry for
> Intuit.
>
> Not 100% sure whether this is a Sendgrid issue, or an Intuit issue - 
> I've reported the message to both of them, for whatever good it will do.

very interesting.  was this DKIM signed?




Re: subscribe to blacklist for domains

2022-08-15 Thread Greg Troxel

Vincent Lefevre  writes:

> On 2022-08-13 14:05:43 -0400, joe a wrote:
>> On 8/13/2022 12:38 PM, Martin Gregorie wrote:
>> . . .
>> > 2) There's no mandatory need to REJECT spam. It has always been up to
>> > the recipient to decide whether to return it to the sender or not.
>> 
>> Agreed in part.  I see returning SPAM to sender as an exercise in futility
>> or perhaps further enabling.  But I do prefer labeling as SPAM to outright
>> rejection in many cases.

Be careful with "returning".  There is replying with 550 and not
accepting it, which ensures that *you* are not generating backscatter,
and there is sending a bounce later.  I think that if you're going to
reject it, you should 550 it.

> Rejecting mail (instead of accepting it and dropping it) is useful
> in case of false positives.

This is a key point.  A lot of mail ends up in spam folders that are so
full they don't get looked at, at a number of ISPs that have a poor
email recipient experience.  I know people at AOL/Yahoo/Verizon and
Comcast that have mail end up in spam and in practice do not cope with
looking at it.  (Further, this mail is wrongly classified, and people
can't in practice fix that.)

By rejecting spam with 550, it doesn't end up in the spam folder, and
that folder becomes easier to scan.  And if legit mail is rejected, at
least the sender knows it didn't get there, even if the ISP is
intractable.

If you accept mail and then send it to /dev/null, then the recipient is
unaware that it was sent, and the sender is unaware that it wasn't
received, other than by implementing a human-human ack protocol.

So I'm a firm believer that at SMTP time, you need to pick one of

  550 and you're done

  accept and then sort into ham mailboxes and spam mailboxes, with the
  idea that the user should be checking all of them

By choosing 550 you can turn up the aggressiveness of checking a bit
compared to if you don't.
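The two-branch policy above can be sketched in Python. The 10.0 reject threshold is a hypothetical value chosen for illustration; only 5.0, SpamAssassin's default required_score, comes from stock SA. The SMTP reply strings are examples. The point is structural: there is no accept-and-discard branch.

```python
from typing import NamedTuple, Optional


class Disposition(NamedTuple):
    smtp_reply: str        # what the sending MTA is told at end of DATA
    folder: Optional[str]  # where accepted mail is filed; None if refused


REJECT_AT = 10.0  # hypothetical: refuse only mail that is very clearly spam
SPAM_AT = 5.0     # SpamAssassin's default required_score


def decide(score: float) -> Disposition:
    """Either refuse at SMTP time or accept and file; never accept and drop."""
    if score >= REJECT_AT:
        # The sending MTA keeps responsibility and can tell its user.
        return Disposition("550 5.7.1 message refused as spam", None)
    folder = "spam" if score >= SPAM_AT else "inbox"
    return Disposition("250 2.0.0 OK", folder)
```

A wrongly refused sender hears about it from their own MTA, so the reject branch can afford to be somewhat more aggressive than silent filtering could.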





Re: Understanding FORGED_GMAIL_RCVD and other rules

2022-06-22 Thread Greg Troxel

Nikolaos Milas  writes:

> I am trying to understand what is wrong with these mails and they
> trigger the "FORGED_GMAIL_RCVD" rule.

What is wrong with them is that they have a From: of gmail and do not
have a gmail DKIM signature.   They are in fact forged -- even if the
user that owns the email address agreed to this.
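The condition described above can be sketched in Python. This models the explanation in this message, not SpamAssassin's own check_for_forged_gmail_received_headers(), which per its description compares From: against Received: headers. It also only looks for a d=gmail.com tag and does not verify any signature:

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr


def dkim_signing_domains(msg: Message) -> set:
    """Collect the d= tag of every DKIM-Signature header (no verification)."""
    domains = set()
    for sig in msg.get_all("DKIM-Signature", []):
        for tag in str(sig).split(";"):
            name, _, value = tag.strip().partition("=")
            if name.strip() == "d":
                domains.add(value.strip().lower())
    return domains


def gmail_from_without_gmail_signature(msg: Message) -> bool:
    """True when From: is @gmail.com but no DKIM signature claims d=gmail.com."""
    _, address = parseaddr(msg.get("From", ""))
    from_domain = address.rpartition("@")[2].lower()
    return from_domain == "gmail.com" and "gmail.com" not in dkim_signing_domains(msg)
```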


> Can you please help me understand why the rule was triggered? I have
> done my search but I have not really understood why.

Did you read the rules?  20_head_tests.cf has

  if (version >= 3.004002)
  header FORGED_GMAIL_RCVD  eval:check_for_forged_gmail_received_headers()
  describe FORGED_GMAIL_RCVD  'From' gmail.com does not match 'Received' headers
  endif

But I do not see a score assigned.   In my own system, the score for
this rule (as seen in debug output) is 1.0.   That seems entirely
reasonable for a fairly common but irregular situation.

> Secondarily, if I understand right, the following rules:
>
>FREEMAIL_FORGED_FROMDOMAIN
>
>HEADER_FROM_DIFFERENT_DOMAINS
>
> were also triggered because the Envelope-From is different from
> "From:" but this is expectable from mailing lists.
>
> How should these (and possibly other ones too) rules be treated in
> production systems to avoid banning legitimate mailing list mails?

If you want to welcomelist Mailchimp, you can do that.

I suspect your real problem is that there is config to increase the
score for FORGED_GMAIL_RCVD.   Your example shows 4.0 which I think
everyone would say is too high.





Re: IPv6 issue

2022-05-06 Thread Greg Troxel

I agree with what Grant said.

Also, I wonder how much greylisting would help, and if you were already
doing that.  The data I posted is for a machine that already does
greylisting in general, with varying times depending on inclusion in
various RBLs and local data.

I find that delaying connections from unknown places even 2 minutes
helps a lot.
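A minimal sketch of the greylisting idea in Python follows. It assumes the classic (client IP, envelope sender, recipient) triplet and a single fixed 2-minute delay. The setup described above also varies the delay by RBL listing and local data, which this sketch does not model:

```python
import time
from typing import Dict, Optional, Tuple


class Greylist:
    """Minimal in-memory greylist keyed on (client IP, sender, recipient).

    The first attempt for an unseen triplet gets a temporary failure; a
    retry at least `delay` seconds later is accepted. Real implementations
    also expire old entries and skip known-good senders.
    """

    def __init__(self, delay: float = 120.0) -> None:
        self.delay = delay  # 120 s matches "even 2 minutes" above
        self.first_seen: Dict[Tuple[str, str, str], float] = {}

    def check(self, client_ip: str, sender: str, recipient: str,
              now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        key = (client_ip, sender.lower(), recipient.lower())
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            return "250 accept"
        return "451 4.7.1 greylisted, try again later"
```

Legitimate MTAs retry after a 4xx; much spamware does not, which is why even a short delay helps.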




Re: IPv6 issue

2022-05-06 Thread Greg Troxel

Ted Mittelstaedt  writes:

> For unrelated reasons I had to turn off IPv6 on my incoming mailserver.
>
> Spam plummeted.  Like by 80% at least.  Both uncaught and caught spam did.
>
> When IPv6 was on, the mailserver had all PTR and AAAA and MX records to
> allow it to receive incoming mail via IPv6.
>
> Something about this seems really wrong.  Any suggestions of where to
> start digging?

Something indeed seems fishy.  I look at uncaught spam to see what I
should tweak on a routine basis, and my impression has been that it's
overwhelmingly either places like gmail (which tend to be delivered over
v6 but would of course come v4 if you don't have v6), or v4.  So being
v4 only and getting 20% of the spam you used to get just doesn't make
sense.

When you "turned off" IPv6, did you change DNS so that doing MX/A/AAAA
lookups no longer returned an AAAA record?

Did you notice a reduction in legit mail and an associated increase in
complaints?

When you looked at incoming spam from the time when you had the normal
v4/v6 setup, did you find that most spam arrived over IPv6?

I looked over my own logs.  In the log interval I examined there are
spam counts:

  329 MTA rejects (which I count as 100% spam)
  139 filed as spam by the normal SA standards (>=5)
  26 filed as marginal (>=1, <5)
  13 filed as ham (<1)

I'm not examining things misfiled as spam that I refiled into ham
folders.  I also skipped about 13 spams misfiled as ham, but on a quick
scan they fit the same pattern.

Looking at the 329 MTA rejects (because that was easiest):

  309   IPv4
   20   IPv6

and of the IPv6:

   4  gmail

  13  a mailinglist/forwarding host (lists I'm on -- they don't filter
      well enough)

   2  my own v6 address - need to look into this, but pretty sure it
      is external spam logged oddly

   1  a v6 address with no rDNS that is probably some compromised
      server that happens to have v6 set up.  As far as I can tell
      it is some company in .au.

Looking over the 139 >=5 spams, it's mostly v4, and of the v6, once I
exclude google and the same mailinglist, there is only 1 v6 address, this
time a random company in .es.

So for me, spam over v6 is very rare, except for mailinglists without
adequately strict filtering and google (which we all know doesn't do a
good enough job of outgoing filtering, but that's not about v6).

Thus, I don't know what to make of your experience; something about it
must be very different and understanding that is likely interesting.
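A tally like the 309/20 split above could be produced with a short Python sketch. Pulling client addresses out of the MTA log is assumed to happen elsewhere, since log formats differ between MTAs:

```python
import ipaddress
from collections import Counter
from typing import Iterable


def tally_by_family(client_addrs: Iterable[str]) -> Counter:
    """Count connecting client addresses by IP version."""
    counts: Counter = Counter()
    for text in client_addrs:
        ip = ipaddress.ip_address(text.strip())
        # An IPv4-mapped IPv6 address (::ffff:a.b.c.d) is really a v4 client.
        if ip.version == 6 and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped
        counts["IPv%d" % ip.version] += 1
    return counts
```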




Re: Microsoft to block Office VBA macros by default

2022-03-15 Thread Greg Troxel

Alex  writes:

> I'm just curious if this announcement has changed anyone's thinking
> about how we should be handling docx/xlsx/etc attachments in email?
> This obviously doesn't prevent someone from emailing a document with a
> malicious macro, but is this going to provide sufficient protection
> once a potentially malicious document is received to relax email
> protections a bit?
>
> https://www.theverge.com/2022/2/7/22922032/microsoft-block-office-vba-macros-default-change
>
> Are you outright blocking these attachments? Perhaps you're only
> blocking those with macros?
>
> Is the ExtractText plugin good enough to extract potentially malicious
> links to be checked?

Can you explain your thinking on the causal link and timeline from an
announcement to 99.999% of actual windows systems having updated code
that behaves this way?

The article says

  "The change will apply to Office files that are downloaded from the
  internet and include macros"

which implies that other files - which may or may not have arrived in
mail - might be treated differently.

It talks about Office 365.   It doesn't say anything about old,
unmaintained copies of Office on XP.


I don't see any reason why it makes sense to lighten up on protections.
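On the question of blocking only attachments with macros: zip-based Office files that carry VBA store it in a vbaProject.bin part (for example word/vbaProject.bin in a .docm). A minimal Python sketch of that check follows. It does not handle legacy OLE formats (.doc, .xls) or other ways of embedding active content. It is at best a scoring hint, not a reason to relax anything:

```python
import io
import zipfile


def ooxml_has_vba(data: bytes) -> bool:
    """True if a zip-based Office attachment contains a VBA project part.

    Returns False both for macro-free OOXML files and for data that is not
    a zip at all (e.g. legacy .doc/.xls), which this check cannot judge.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return any(name.lower().endswith("vbaproject.bin") for name in zf.namelist())
    except zipfile.BadZipFile:
        return False
```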




Re: how sendgrid is abusing the ukraine crisis (or they are still to dumb to filter for spam)

2022-03-04 Thread Greg Troxel

Bill Cole  writes:

> On 2022-03-04 at 09:18:08 UTC-0500 (Fri, 04 Mar 2022 09:18:08 -0500)
> Greg Troxel 
> is rumored to have said:
>
>> Greg Troxel  writes:
>>
>>> With stock scores, sendgrid gets
>>>
>>>  2.1 URIBL_GREY Contains an URL listed in the URIBL greylist
>>> [URIs: sendgrid.net]
>>>  1.5 KAM_SENDGRID   Sendgrid being exploited by scammers
>>>
>>> and I find 3.6 a bit much.

(sorry, URIBL_GREY is only 1.1, so that's 2.6 between them)

> Note that those are quasi-independent rules. URIBL looks at all of the
> URIs in a message. KAM_SENDGRID only hits mail transferred through
> Sendgrid where the From header and envelope sender addresses are in
> unrelated domains.

I meant only that I find that for this particular sender, both rules
hit.

> I may be wrong, but I do not believe that all Sendgrid ham will hit
> either of those rules, although much surely will hit both. The KAM
> rules don't go through QA that would reveal their overlap/independence
> as the stock rules do, so there's not a good way that I can check.

I am unclear on whether KAM_SENDGRID is supposed to hit legit mail from
sendgrid; it does for this particular class of ham.  It sounds like you
think at least some sendgrid ham will hit this.

Return-Path: seems like it matches __KAM_SENDGRID1A, Received looks like
it matches __KAM_SENDGRID2, and the From: is from the government
office's domain.

>>> But maybe 72% of what sendgrid sends is
>>> spam?  (Knowing the spam % is actually a serious question.)
>>
>> sorry, didn't quite get back to stock for that  test, so I think it's
>> only 1.1+1.5=2.6, so tuned for 52% spam...
>
> FWIW, that is NOT how the math works for score determination. Even for
> the stock rules which get programmatically adjusted as a set, that's
> not a "tuning" target that would be useful or even have a calculable
> solution.

Sorry, I do know that, but what I was trying to get at, and did so
badly, was that if a rule has a score of 2.5, then I would expect that a
fairly large fraction of the messages that trigger it would be spam.
Otherwise, I would expect that score to be reduced by the tuning
algorithms.

> The rule score tuning doesn't really pay any attention to aggregate
> score values except in >/< relation to the threshold. If 100% of a
> sender's mail is ham that just happens to score 4.2, that's great. If
> it is 100% spam, all scoring 5.2, that's also great. If it is a 50/50
> mix that SA scores perfectly at either 4.2 or 5.2, that would be
> astoundingly good. Message scores do NOT have a score distribution
> that can be approximated by any combination of statistically useful
> distributions which could support the sort of score arithmetic you are
> positing.

I see your point but it would be interesting to see the %spam data (out
of some background ham/spam a priori rate) per rule, somehow in a
scatter plot with score.
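The per-rule numbers could be computed from a hand-sorted corpus with a Python sketch like the one below. The input format is an assumption: one (is_spam, rule_names) pair per message, e.g. parsed from X-Spam-Status. The fractions depend on the corpus's own ham/spam mix, which is the a priori rate mentioned above. Pairing each fraction with the rule's score gives the points for such a scatter plot:

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def spam_fraction_by_rule(
    corpus: Iterable[Tuple[bool, Sequence[str]]],
) -> Dict[str, Tuple[int, float]]:
    """Map each rule to (messages hit, fraction of those that are spam).

    `corpus` holds one (is_spam, rule_names) pair per hand-sorted message.
    """
    hits: Dict[str, int] = defaultdict(int)
    spam_hits: Dict[str, int] = defaultdict(int)
    for is_spam, rules in corpus:
        for rule in set(rules):  # count each rule once per message
            hits[rule] += 1
            if is_spam:
                spam_hits[rule] += 1
    return {rule: (n, spam_hits[rule] / n) for rule, n in hits.items()}
```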

Also given how things are, if ham scored 4.2 it would take very little
in terms of a 1-point rule or 2 x .5 rules triggering vs not to push it
over.  So while 4.2 is a good score for ham in the metrics, it's not in
my view a good score for a ham message viewed over the ensemble of other
things that are likely to happen.

All I'm really trying to say is that ham getting 2.5 from one rule moves
it halfway to threshold, where it gets marked as spam if the rest of the
rules give it >=2.5.

> I wish Justin had originally made the base score -5 and the threshold
> 0. It's 20 years too late to fix that, but it would have made it
> easier for people to avoid wrong mathematical assumptions about the
> value of the aggregate score of a message.

I do know how scores are determined for the base ruleset (and above you
said that the KAM scores aren't determined that way, I think).

And I know it's against doctrine, but I find that the odds of spam
change from near 0 at -2 to near 1 at >=4.  Just above about 2, it's
roughly 50%, and it's not linear.  Because of that I treat 3 differently
from <1, putting 3 in a maybe-spam folder not allowed to show up on my
phone.  I know that's not how SA's "was this message scored
correctly" is defined, but I find this sort of sorting very useful.
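A sketch of that three-way sorting in Python follows. It assumes the stock threshold of 5 for spam and 1 as the lower edge of maybe-spam, the same bands as the log counts in the IPv6 thread. The folder names are placeholders:

```python
def folder_for(score: float) -> str:
    """File a message by SpamAssassin score into one of three folders."""
    if score >= 5.0:   # stock required_score
        return "spam"
    if score >= 1.0:   # assumed lower edge of the maybe-spam band
        return "maybe-spam"
    return "inbox"
```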

The message in question did actually get to 5.0.  I've tweaked scores,
up and down, so I know that doesn't technically count.




Re: how sendgrid is abusing the ukraine crisis (or they are still to dumb to filter for spam)

2022-03-04 Thread Greg Troxel

Greg Troxel  writes:

> With stock scores, sendgrid gets
>
>  2.1 URIBL_GREY Contains an URL listed in the URIBL greylist
> [URIs: sendgrid.net]
>  1.5 KAM_SENDGRID   Sendgrid being exploited by scammers
>
> and I find 3.6 a bit much.  But maybe 72% of what sendgrid sends is
> spam?  (Knowing the spam % is actually a serious question.)

sorry, didn't quite get back to stock for that  test, so I think it's
only 1.1+1.5=2.6, so tuned for 52% spam...




Re: how sendgrid is abusing the ukraine crisis (or they are still to dumb to filter for spam)

2022-03-04 Thread Greg Troxel

CC: trimmed as my message is not an abuse report.

You asked about outright blocking, but you didn't ask if people thought
that was wise.

I received a piece of ham today, and the received line added by my MTA is:

  Received: from o1678989x80.outbound-mail.sendgrid.net 
(o1678989x80.outbound-mail.sendgrid.net [167.89.89.80])

This was a legitimate message from an agency of a local government, and
solidly ham.

I'm not going to claim that sendgrid is or isn't ok -- I don't
personally have any data.  But it's clear that at least one legitimate
entity uses them and that I receive some ham from them.

With stock scores, sendgrid gets

 2.1 URIBL_GREY Contains an URL listed in the URIBL greylist
[URIs: sendgrid.net]
 1.5 KAM_SENDGRID   Sendgrid being exploited by scammers

and I find 3.6 a bit much.  But maybe 72% of what sendgrid sends is
spam?  (Knowing the spam % is actually a serious question.)

I find that ham misfiled as spam just due to sendgrid is fairly rare,
and I just welcomelist them.  So that's probably a clue that I get
little ham from sendgrid.

But an outright block doesn't seem like a good idea.  It certainly would
result in me losing ham.





false hits on FORM_FM

2022-02-27 Thread Greg Troxel
This morning I found a lot of ham in my maybe-spam inboxes (1-4 points).
I found that this rule was hitting:

*  4.0 FROM_FMBLA_NEWDOM From domain was registered in last 7 days

and the common pattern in the messages was that the From: addresses were
all @gmail.com.  All of the messages were normal legit messages, some on
weewx-users list, and some were commit messages from pkgsrc-wip.

I had earlier upped the score of this rule as I found it to work very
well.

(Yes, I know that doesn't count as an FP under strict SA doctrine, esp
since I had upped the score.  But it's still wrong for FROM_FMBLA_NEWDOM
to fire on gmail.com which is... not new.)

I reran SA on one message just now, and it scored normally, with no
FROM_FMBLA_NEWDOM hit.

This seems to be fresh.fmb.la as described:

  https://fmb.la/pages/about


So I wonder if anybody else got a bunch of incorrect hits from fmb.la?



Re: OT - Hotmail/Outlook.com marking most of our email as Junk

2022-02-19 Thread Greg Troxel

Cian ApacheBugzilla  writes:

>> However, the shared IP comment is worth paying attention to
>
> Ah, so you think I should get a dedicated IP?  I had read mixed things

I meant that you should understand what's going on.

> I'm a little confused which way you mean this.  If I understand
> correctly, positive points in SA are bad, but you are subtracting
> points with your rule.  Are you saying you get less spam from domains
> listed by RP?

Sorry, I pasted the wrong line.  There have been multiple returnpath
listing levels, and they have changed over the years, and the names have
changed.  I didn't keep notes, but my score adjustments are to give a
small positive score (2) to the lower of the VALIDITY rules, and a
negative to the higher one.  Keep in mind this is based on what arrived
in my mailbox, and possibly a very long time ago.

score   RCVD_IN_VALIDITY_SAFE   2   # was -2
score   RCVD_IN_VALIDITY_CERTIFIED  -2  # was -3

The real point is to evaluate what arrives and realize that all RBLs,
positive and negative, need to be assessed.

> Anyway, I might have to migrate my domain and get a dedicated IP
> first, but with a dedicated IP, I could try to get on dnwsl.org.  I
> imagine that would have a similar effect, minus having to spend money,
> right?

dnswl.org has NONE/LOW/MED/HIGH.  NONE is mostly for skipping
greylisting, and I'm not sure what it takes to get on LOW and MED.  I
know it takes a lot to get on HIGH.   Unlike companies that take money
for listing, I have confidence in dnswl to act in the interests of
receivers that use their RBL.

>> "listed as contacts" sort of sounds like you are spamming...
>
> Honestly, I wasn't sure how to explain this without going down the
> rabbit hole of explaining my whole situation.  Any contact I'm
> "cold-emailing" is coming from a page with text such as:

I see, so that sounds ok.

>> That is probably your entire issue.  UCEPROTECT
>
> So you basically recommend switching from NameCheap, to a registrar
> that isn't listed on UCEPROTECT_LVL3?  A part of me hates to do it,
> since I'm basically validating UCEPROTECT's philosophy, but it'll cost
> me, what, $8?  I could live with that, if this is your number one
> suggestion.  I'll even write to NameCheap and tell them this is why
> I'm leaving.

I am not saying what you should do.  My point is that you do not seem to
truly understand what is going on (fair enough, the world is opaque and
complicated) and that understanding it is good.

>> .space is Widely Regarded as Sketchy
>
> I still have pretty strong feelings about this, but that's a debate
> for a different time and a different thread.  Would you say I am
> likely to solve my problem without changing domain names?

I really don't know, but using a domain name that leads to people giving
it spam points seems like an uphill battle.

>> As for your "domain", also look up the IP address your mail comes from,
>> because that's more important.  A lookup service I have found useful is:
>>
>> https://multirbl.valli.org/
>
> Ok, actually, I got some interesting results for 136.143.188.53, which
> is a Zoho server I have apparently sent mail from.  Some blacklists,
> some yellow lists, some whitelists, and a bunch of blue and red.  Do
> you think Zoho is the bigger problem than NameCheap?

I said you should understand if you have a shared IP, and *who else is
sharing it*.  When they spam, it gets the IP on lists, which causes you
trouble.

It looks like spam comes from that IP address.

> My takeaway here is that I should be switching registrars, I should
> probably pay for a dedicated IP address, and once I'm getting a
> dedicated IP, anyway, I should try to get on a whitelist (probably
> dnswl, unless you have a better suggestion).  Would you agree with
> that summary?  Or do you think Zoho is the more likely problem?

I really don't know, and you may need a consultant to help you figure it
out.  In general, I recommend not sending mail from an IP address that
other people send spam from, and not dealing with companies that provide
any kind of services to spammers.




Re: OT - Hotmail/Outlook.com marking most of our email as Junk

2022-02-18 Thread Greg Troxel

Your mail is in html.  That will get it some points; I suggest
text/plain :-)   Many will say I'm just being a curmudgeon about
this.  Attempting to recover content and continuing:

Cian  writes:

  > I am also having a world of trouble getting my emails to Outlook
  > users.  For reference, my work domain has one user (me).  I have had
  > the account for about 9 months and I have not yet sent 100 emails.
  > I typically send an email to a single recipient, although I will
  > occasionally CC a handful of people. What I’ve tried: I have also
  > set up SPF, DKIM, and DMARC.  I’m *pretty sure* they’re solid.
  > Emails still go to junk.  Initially, I didn’t have anything actually
  > at the website for my domain, so I threw my executive summary into a
  > google site.  Emails still go to junk.  I've checked our public IP and
  > the domain name at mxtoolbox.com – no errors, but it warns that a)
  > my DMARC policy isn’t q or r, and b) it doesn’t care for my SOA
 
I have only run into one scoring technique that complained about the
lack of a website at the sending domain.  I think that's totally
ridiculous.  There is no reason that mail from john...@example.com
should be considered suspect because trying to resolve example.com to an
A record and connecting to 80 or 443 fails.  Email predates the web by a
very long time.

  > I tried to get on Microsoft’s SDNS and JMRP, but I was not able.  I
  > am pretty sure I have a shared IP, but I don’t know how I would
  > check that.

This is the first I've heard of SDNS and JMRP.  However, the shared IP
comment is worth paying attention to.  Basically, much reputation is per
IP address, and hosting plans that put lots of customers on a single IP
address cause them to be affected by each other's behavior in terms of
blocklists.  That is not good for you.

  > Microsoft also suggested I join the Return Path Safe Senders
  > program, but I am pretty sure I would need a dedicated IP for that.

I give positive points for some RP categories as I have the vague
impression that it is, for mail that arrives at my server, correlated
with spam:

score   RCVD_IN_VALIDITY_CERTIFIED  -2  # was -3

  > In any case, I don’t love the idea of paying to get whitelisted so I
  > can send 11 emails a month.  I’ve checked several sites and my domain
  > isn’t on any blacklists.

Generally the view in the open source spam world is that a list that
asks you to pay to get good treatment (removal from blacklist, addition
to whitelist) is unethical.

As for your "domain", also look up the IP address your mail comes from,
because that's more important.  A lookup service I have found useful is:

https://multirbl.valli.org/
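Such services work by DNS queries against each list's zone. A minimal Python sketch of how the query name is formed for an IPv4 address follows, using zen.spamhaus.org as an example zone. The lookup itself is left out, since some lists (Spamhaus among them) refuse queries that arrive through large public resolvers:

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Return the name a DNSBL client queries for an IPv4 address.

    The octets are reversed and the list's zone appended; by convention an
    A record in the answer (typically 127.0.0.x) means "listed" and
    NXDOMAIN means "not listed".
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    return ".".join(reversed(str(addr).split("."))) + "." + zone.rstrip(".")


if __name__ == "__main__":
    print(dnsbl_query_name("136.143.188.53", "zen.spamhaus.org"))
    # 53.188.143.136.zen.spamhaus.org
```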

  > However, I did register the domain through NameCheap, which is on
  > the UCEPROTECT_LVL3 list.

That is probably your entire issue.  UCEPROTECT is at best
controversial.  Take everything you read on the internet with a grain of
salt, but just earlier today I had cause to read about this.

https://en.wikipedia.org/wiki/Comparison_of_DNS_blacklists
https://www.linode.com/community/questions/20952/linode-blacklisted-on-uceprotect-rbl
https://blog.sucuri.net/2021/02/uceprotect-when-rbls-go-bad.html

  > The domain is relatively new, as I said, but I don’t send any bulk
  > mail of any kind from it.  All mail is either to people I
  > specifically know, people to whom I have received a personal
  > introduction, or people listed as contacts for their organization on
  > public websites.  My mail is handled by Zoho Mail, so I haven’t done

"listed as contacts" sort of sounds like you are spamming...

  > anything fancy with the mail server.  If there’s anything I should
  > try, I will, but I might need the instructions at a fifth-grade
  > level.  I am fairly careful with my words, and the emails are
  > appropriately long, so I would be surprised if they were getting
  > flagged for trigger words.  I have tried mail-tester.com and it did
  > not object to the body of my emails.  Mail-tester.com claims to test
  > emails against SA, although I know this is a contentious point
  > around here.  I bring it up, though, because the fact that my TLD is
  > “.space” raised some flags.  When I have called my contacts, they have

.space is Widely Regarded as Sketchy.

  > been as confused as I am that they did not receive my emails.  Emails I
  > send to any other domains are never a problem spam-wise.  Notes: I do
  > not have a list-unsubscribe header in my emails, for one because I
  > don’t have a list, and for two, because I don’t really know how.  I
  > can add one if necessary, although ideally I’d like the language to
  > be clear that my emails don’t go to a list of any kind.  I have a
  > signature in my email.  It has my phone number, but no address
  > because I don’t have a physical location yet.  Some articles
  > suggested this is bad; I hate to put my home address in all my
  > emails, but I can if necessary.  It’s in my Dun and Bradstreet
  > profile, anyway.  My domain contacts

Re: False "bad domain" positive

2022-02-16 Thread Greg Troxel

Alan  writes:

> I've got someone who posted text from MS Office into an email (wish I
> could ban that). The text contained a numbered list. The fourth list
> item started with "Date & Time". The 4 and following period were in a
> span element with a margin to separate it from the text but no actual
> whitespace, so the plain text version comes up as (I've used {dot} to
> avoid another trigger) "4{dot}Date & Time". This then triggered :

Wow, that's funny.  But agreed it's ham...

>   2.0 PDS_OTHER_BAD_TLD  Untrustworthy TLDs [URI: 4{dot}date (date)]

This seems reasonable.  2 points is not a killer rule and that probably
would not have messed up delivery.

>   5.0 KAM_SOMETLD_ARE_BAD_TLD .stream, .trade, .pw, .top, .press, .bid & 
> .date TLD Abuse

That's the KAM ruleset, not base, and given that it's an add-on rule I
see that as effectively "the base rule should be scored 7" (at least for
the domains that overlap).

I suspect though that the rule/score are almost entirely right in terms
of probability, for uses of those tlds as domains.  They all sound
sketchy.

> Thus consigning a meeting agenda to the trash. I suspect this is an
> uncommon but not rare false positive.
>
> These rules would benefit from excluding single character domain
> matches (which IIRC would be invalid domains anyway). A this sort of
> FP would be avoided. For bonus points excluding three-character roman
> numerals under 10 (iii, vii, etc.) would be useful too.
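The suggested exclusion could look roughly like the Python sketch below. It only illustrates the logic and is not a patch to the actual PDS or KAM rules, which would have to express this in their own rule syntax. The function name is hypothetical. It trades a few possible misses, any real two-label host with a one-character first label, for fewer false positives on list numbering:

```python
ROMAN_UNDER_10 = {"i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix"}


def probably_list_marker(host: str) -> bool:
    """Guess whether a bad-TLD "domain" such as 4.date is really a list number.

    Only two-label hosts are considered; the first label counts as a list
    marker if it is a single character or a Roman numeral below 10.
    """
    labels = host.lower().rstrip(".").split(".")
    if len(labels) != 2:
        return False
    first = labels[0]
    return len(first) == 1 or first in ROMAN_UNDER_10
```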

My own view is that no rule should be scored above about 3 unless it is
vanishingly unlikely that the rule will fire on legit mail (even if the
legit mail is messed up in ways that actually happen to legit mail).
That's a different opinion than the one encoded in the KAM ruleset
scores, which I interpret as saying that it's ok to have a few FPs if
that's the price of getting rid of some nasty phishing/malware and a lot
of spam.

You need to think about your own needs on how to tune that FP and
effectiveness tradeoff, and if you're not willing to live (what I
consider) a little dangerously on FP risk, then the KAM ruleset is not
for you.  I run it personally, and I find problems with rules that have very
high scores hitting ham, maybe once a month or every few months, and I'm
accumulating downscoring config.  But it saves me from a vast amount of
spam, I think.  I would be very nervous if I were configuring it for
lots of others, but I have the luxury of not having to admin mail for
more than myself and family.

My current config, in case you want to look at these rules and see what
you think.  Beware that the below is tuned to my personal ham; I'm on
mailinglists where people occasionally discuss voicemail and watches.  I
no longer remember all the reasons, but surely it was that the rule
fired on ham.

score   KAM_UNIV                2       # was 4.5
score   KAM_SOMETLD_ARE_BAD_TLD 2       # was 5
score   KAM_FAKE_DELIVER        3       # was 6.25
score   KAM_SHORT               0.5     # was 2, can't figure out why it fires
score   KAM_LIST3_1             3.8     # was 5.8
score   KAM_TIME                0.1     # was 3.0, FP on time-nuts
score   KAM_SENDGRID            0.3     # was 1.5, but now URIBL_GREY
score   KAM_ASCII_DIVIDERS      0.1     # can't figure out why it fires

score   KAM_MARKADV             5       # was 10
score   KAM_VM                  3       # was 5





Re: Add header, not beginning with X?

2022-02-14 Thread Greg Troxel

"joea- lists"  writes:

> Nutshell: I want to add "Reply-to: (some address)" to messages without same.  

Please do explain why.  It sounds like a clear standards violation
because Reply-To may only be set by the sender.

> While it seems feasible to do this in postfix, I wanted to explore
> doing it with minimal fuss in SAm or if a FILTER or MILTER might be
> required.
>
> So far I've only found "Basic Message Tagging Options".

What you are wanting to do does not seem related to spamassassin's
mission, so I think it is probably best to avoid trying to make
spamassassin do it.




Re: CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-08 Thread Greg Troxel

John Hardin  writes:

> On Mon, 7 Feb 2022, Greg Troxel wrote:
>
>> and then I got a reply back with the content he was trying to send etc.
>> But, it had:
>>
>>  *  2.5 CONTENT_AFTER_HTML More content after HTML close tag
>>
>> but one was only text/plain and I could see nothing wrong.   reading
>> 72_active.cf I found:
>>
>>  rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i
>> which fires on a text/plain part that discusses html formatting!
>
> Ah, I'll see if I can add something to that so it only fires when
> there's an actual HTML body part. Thanks for the report.
>
> Pity there's not an "htmlbody" rule type...

Agreed - I think the way you are trying to tighten is correct.





CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-07 Thread Greg Troxel

(Instances of html have been changed to htnl in this message to
avoid tripping the rule I'm talking about.)

A legit message arrived at my server, for me and another user, and it
scored 8 for them and I think about 11 for me.  This is really unusual.
The big issues were:

  Sent by sendgrid: points from KAM and from URIBL_GREY both, each
  reasonable separately and I think URIBL_GREY newly lists sendgrid.

  From: was someone's (class teacher) gmail address, but it got sent out
  via sendgrid via a school, and there was no DKIM, so it lit up all
  sorts of FREEMAIL_FORGED, From:/env mismatch with freemail, ought to
  have DKIM from google and doesn't.

So I wrote to the person because they probably had no idea, and explained
the above and added some other "deliverability hygiene" :-) comments:

> with more minor issues:
>
>The message is html only, rather than also having text/plain.
>
>The message body doesn't have enclosing   tags, so it is
>malformed.

and then I got a reply back with the content he was trying to send etc.
But, it had:

*  2.5 CONTENT_AFTER_HTML More content after HTML close tag

but it was only text/plain and I could see nothing wrong.  Reading
72_active.cf I found:

  rawbody  __CONTENT_AFTER_HTML  /<\/htnl>\s*[a-z0-9]/i

which fires on a text/plain part that discusses html formatting!
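To see why it fires, the pattern can be tried outside SpamAssassin.  A
minimal Python sketch: re.IGNORECASE stands in for Perl's /i, and the
tag name is built by concatenation to keep the same "don't trip the
rule" spirit as this message.  tightened_hit is a hypothetical helper
showing the idea of only looking at real text/html parts; it is not how
SpamAssassin implements anything.

```python
import re

# Build the tag name by concatenation so this text does not itself
# contain a literal closing tag followed by more text.
TAG = "ht" + "ml"

# Python rendering of the rawbody pattern quoted above from 72_active.cf.
CONTENT_AFTER = re.compile(r"</" + TAG + r">\s*[a-z0-9]", re.IGNORECASE)


def raw_hit(raw_body: str) -> bool:
    """What a rawbody-style rule sees: any text part, plain or HTML."""
    return CONTENT_AFTER.search(raw_body) is not None


def tightened_hit(parts: list[tuple[str, str]]) -> bool:
    """Hypothetical tightening: only test parts whose type is text/html.

    `parts` is a list of (content_type, decoded_text) pairs.
    """
    return any(
        ctype.lower() == "text/" + TAG and CONTENT_AFTER.search(text)
        for ctype, text in parts
    )


plain = "Put the closing </" + TAG + "> tag at the end, then nothing else."
print(raw_hit(plain))                          # True: fires on a discussion
print(tightened_hit([("text/plain", plain)]))  # False
```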

So I'll be reducing that score...




Re: getting spamass-milter to work with remote spamd (on CentOS8)

2022-02-06 Thread Greg Troxel

Marc  writes:

>> On 06.02.22 14:02, Marc wrote:
>> >Thanks! Got it to work with this:
>> >EXTRA_FLAGS=" -D xx.xxx.xxx -- -p 34219"
>> 
>> the man page for spamass-milter says:
>> 
>>  -D host
>>         Connects to a remote spamd server on host, instead of using one
>>         on localhost.  This option is deprecated; use -- -d host instead.
>> 
>> so, 1. it's deprecated, 2. only uses host.
>> 
>
> It is not deprecated and -d is for debug.
>
> in source:
> 307   cout << "   -C RejectCode: using this Reject Code." << endl;
> 308   cout << "   -d xx[,yy ...]: set debug flags.  Logs to syslog" << endl;
> 309   cout << "   -D host: connect to spamd at remote host (deprecated)" << endl;

See the word deprecated in the previous line.


> 310   cout << "   -e defaultdomain: pass full email address to spamc instead of just\n"
> 311   "  username.  Uses 'defaultdomain' if there was none" << endl;
> 312   cout << "   -f: fork into background" << endl;
> 313   cout << "   -i: skip (ignore) checks from these IPs or netblocks" << endl;
>
> on centos8
>
>  ~]# spamass-milter -h
> spamass-milter: invalid option -- 'h'
> spamass-milter - Version 0.4.0
> SpamAssassin Sendmail Milter Plugin
> Usage: spamass-milter -p socket [-b|-B bucket] [-d xx[,yy...]] [-D host]
>   [-e defaultdomain] [-f] [-i networks] [-m] [-M]
>   [-P pidfile] [-r nn] [-u defaultuser] [-x] [-a]
>   [-C rejectcode] [-R rejectmsg] [-g group]
>   [-- spamc args ]

Understand the difference between "-d" and "-- -d".

>-p socket: path to create socket
>-b bucket: redirect spam to this mail address.  The orignal
>   recipient(s) will not receive anything.
>-B bucket: add this mail address as a BCC recipient of spam.
>-C RejectCode: using this Reject Code.
>-d xx[,yy ...]: set debug flags.  Logs to syslog
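The distinction: per the usage line, everything after a bare "--" is
handed to spamc rather than parsed by spamass-milter.  So "-- -d host"
reaches spamc's own -d (destination host) option and "-- -p 34219" its
port option, while a bare "-d" before "--" is the milter's debug flag.
A minimal Python sketch of that split; split_milter_args is a
hypothetical helper, not part of spamass-milter, and spamd.example.net
is a placeholder host.

```python
def split_milter_args(argv: list[str]) -> tuple[list[str], list[str]]:
    """Hypothetical helper: split a spamass-milter command line at "--".

    Options before the first bare "--" are interpreted by spamass-milter;
    everything after it is passed through to spamc unchanged.
    """
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return argv, []


milter, spamc = split_milter_args(
    ["-p", "/var/run/spamass.sock", "-f",
     "--", "-d", "spamd.example.net", "-p", "34219"]
)
print(milter)  # ['-p', '/var/run/spamass.sock', '-f']
print(spamc)   # ['-d', 'spamd.example.net', '-p', '34219']
```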





Re: Hits on item with " No description available"

2022-01-20 Thread Greg Troxel

I followed my own advice about egrep -R and found this immediately.

it's in

3.004006/updates_spamassassin_org/72_active.cf

and it is

##{ FSL_HELO_NON_FQDN_1
header  FSL_HELO_NON_FQDN_1 X-Spam-Relays-External =~ /^[^\]]+ helo=[a-zA-Z0-9-_]+ /i
##} FSL_HELO_NON_FQDN_1

with score

score FSL_HELO_NON_FQDN_1 2.361 0.001 1.783 0.001
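The four numbers are per score set.  As I read the score documentation
in Mail::SpamAssassin::Conf, they apply, in order, when Bayes and
network tests are both off, network tests only, Bayes only, and both
on.  A small Python sketch of that selection; pick_score is a
hypothetical helper.  The 1.8 reported in the question matches the
third value (1.783), which would suggest Bayes on and network tests
off, though that is only an inference.

```python
def pick_score(scores: list[float], bayes: bool, net: bool) -> float:
    """Hypothetical helper: select the score for the active score set.

    A single value applies to every set; four values are indexed as
    set = 2*bayes + net  (0: neither, 1: net only, 2: Bayes only, 3: both).
    """
    if len(scores) == 1:
        return scores[0]
    if len(scores) != 4:
        raise ValueError("expected 1 or 4 scores")
    return scores[2 * int(bayes) + int(net)]


fsl = [2.361, 0.001, 1.783, 0.001]  # score FSL_HELO_NON_FQDN_1 ...
print(pick_score(fsl, bayes=True, net=False))  # 1.783
print(pick_score(fsl, bayes=True, net=True))   # 0.001
```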




Re: Hits on item with " No description available"

2022-01-20 Thread Greg Troxel

"Joe Acquisto-j4"  writes:

> Where can I get some idea of what the rule below actually checks for?   I 
> noticed some normally passed email was flagged as SPAM.  
>
> Started seeing it sometime after making some configuration changes to local 
> settings on postfix, attempting to isolate a "bug".   But before reverting 
> them all, or one at a time, I'd rather have a clue.  Semi-informed hacking 
> about can be problematic.   
>
> X-Spam-Checker-Version: SpamAssassin 3.4.5 (2021-03-20)
>
> *  1.8 FSL_HELO_NON_FQDN_1 No description available

cd /var/spamassassin

egrep -R FSL_HELO_NON_FQDN_1 .

Find the rules file and read it.


(rules may be someplace else on your system; that's where they are on
mine)
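The same search is easy to script if egrep isn't handy.  A minimal
Python sketch, assuming the rules live under a directory such as
/var/spamassassin (adjust to your system); find_rule is a hypothetical
helper that lists every file and line mentioning the rule name, which
also shows at a glance whether the rule has a "describe" line at all.

```python
import os
import re


def find_rule(root: str, name: str) -> list[tuple[str, int, str]]:
    """Hypothetical helper: (path, line_number, line) for each mention of `name`."""
    # \b on both sides so __SUBRULE names do not match the parent rule.
    word = re.compile(r"\b" + re.escape(name) + r"\b")
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in sorted(files):
            path = os.path.join(dirpath, fname)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for n, line in enumerate(fh, start=1):
                        if word.search(line):
                            hits.append((path, n, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file, lock file, etc.
    return hits


if __name__ == "__main__":
    # Assumed location; rules may be elsewhere on your system.
    for path, n, line in find_rule("/var/spamassassin", "FSL_HELO_NON_FQDN_1"):
        print(f"{path}:{n}: {line}")
```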





Re: Txrep, add-addr-to-whitelist

2021-12-16 Thread Greg Troxel

Hey Peter: Your mailserver appears to be a bit aggressive and is
blocking mail from people on the list who are replying to you:

  : host acemail1.ace.net.au[150.101.236.36] said: 553 5.3.0
  Rejected 71.19.148.97 by clients-b.blocked.rbl (in reply to MAIL FROM
  command)

multirbl.valli.org shows no issues, and I'm even in DNSWL_MED.





Re: Txrep, add-addr-to-whitelist

2021-12-16 Thread Greg Troxel

"Peter"  writes:

> New to TXrep, the manual says the add-addr-to-whitelist command should add
> -100, but for me it doesn't do anything - nor does add-addr-to-blacklist.
>
> It comes back with SpamAssassin TxRep: 1  with either the white or
> blacklist.
>
> While the server is new, I want to be able to adjust a senders score, but
> don't want to make new rules which will be there forever.
>
> Am I missing something?

I have also struggled with understanding what's going on, and tried to
run some scripts to look at the database.

If what you want is to preload a good reputation that will then be
subject to adjustment, you probably want to write a program to adjust
the database and, say, put in fake data showing an average score of -5
over 10 messages, and then let it go.

Beware though that txrep is per sender, per sender/addr pair, and other
things (I forget the details), and so you may need to put in more fake
data than you are willing to put up with.
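To get a feel for how much seeded history pulls a score, here is a
simplified Python sketch of the AWL-style adjustment documented for
auto_whitelist_factor (final = score + (mean - score) * factor, default
factor 0.5).  TxRep tracks several reputations (per sender, per
sender/address pair, and others) and combines them, so treat this as a
single-reputation approximation, not TxRep's actual formula.  The
Reputation class is hypothetical, and the -5 over 10 messages is the
made-up seed from above.

```python
from dataclasses import dataclass


@dataclass
class Reputation:
    """Hypothetical running total and count for one reputation key."""
    total: float = 0.0
    count: int = 0

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def adjust(score: float, rep: Reputation, factor: float = 0.5) -> float:
    """Pull `score` toward the long-term mean, AWL-style, then record it."""
    final = score if rep.count == 0 else score + (rep.mean - score) * factor
    rep.total += score  # this sketch records the raw, unadjusted score
    rep.count += 1
    return final


seeded = Reputation(total=-50.0, count=10)  # fake history: mean -5 over 10
print(adjust(4.0, seeded))    # -0.5
print(round(seeded.mean, 3))  # -4.182, i.e. (-50 + 4) / 11
```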

If you are putting in -100 and it stays, I don't see why that should be
about txrep rather than just fixed huge scores for addresses.

I think what SA needs is a welcomelist command that takes a score, or
maybe just a welcomelist_mild that gives -5.  I'm uncomfortable putting
tons of addresses in welcomelist without the dkim or rcvd variants, but
-5 might be ok: enough to keep ham out of my purgatory folders (scores
between 1 and 5) while also keeping any forged spam out of inbox (<1).




Re: SPF_NONE scoring

2021-11-30 Thread Greg Troxel

Philip Prindeville  writes:

> I'm looking at the 0.001 scoring for SPF_NONE and scratching my head.  This 
> was discussed a bit in early 2015, but maybe it needs revisiting with new 
> perspective.
>
> Surely no one who cares about maintaining their reputation by
> protecting themselves against spoofing would fail to provide SPF
> records...  So how is this score arrived at?
>
> And of Ham, how much of it has a valid SPF?
>
> And of Spam, how much of it lacks a valid SPF?
>
> Has anyone run some numbers?

I see 0.001 as a score that says: this might be a spam sign, we don't
know, and this way it shows up in reports, without really affecting
anything.

Lots of people think SPF is silly.  And spammers sending from a domain
they control can even set up DKIM and DMARC.  So I agree that actual
data would be interesting.
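The numbers Philip asks for are simple ratios over a hand-classified
corpus.  A minimal Python sketch, assuming you can label each message
as ham or spam and record whether SPF_NONE fired; rule_rates is a
hypothetical helper, and the "S/O-style" ratio is spam hit rate divided
by the sum of spam and ham hit rates.

```python
from typing import Iterable, Tuple


def rule_rates(records: Iterable[Tuple[bool, bool]]) -> dict:
    """Hypothetical helper. records: (is_spam, rule_hit) pairs.

    Returns the fraction of spam and of ham the rule hits, plus an
    S/O-style ratio: spam_rate / (spam_rate + ham_rate).
    """
    spam = ham = spam_hits = ham_hits = 0
    for is_spam, hit in records:
        if is_spam:
            spam += 1
            spam_hits += hit
        else:
            ham += 1
            ham_hits += hit
    spam_rate = spam_hits / spam if spam else 0.0
    ham_rate = ham_hits / ham if ham else 0.0
    total = spam_rate + ham_rate
    return {
        "spam_rate": spam_rate,
        "ham_rate": ham_rate,
        "s_o": spam_rate / total if total else 0.0,
    }
```

An s_o near 0.5 means the rule hits ham and spam at about the same
rate, which is the situation where a 0.001 "informational" score makes
sense.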




Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-15 Thread Greg Troxel

Philip Prindeville  writes:

>> That looks very familiar.  I was having timeouts, and saw that in the
>> logs, on certain messages.  I ended up nuking and rebuilding my TXREP
>> database and then things were  ok.
>> 
>> That doesn't explain why we can't find the rule, which is a good
>> question.
>
> Where is the TXREP database?

For me, it's in:

  $HOME/.spamassassin/tx-reputation

> Also, is it possible that the name is generated through some sort of
> mangling, the way that C function names can be generated from macro
> expansions, etc?

Almost anything is possible...




Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-15 Thread Greg Troxel

Philip Prindeville  writes:

> Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
>
> Why can't I even find the rule?

That looks very familiar.  I was having timeouts, and saw that in the
logs, on certain messages.  I ended up nuking and rebuilding my TXREP
database and then things were  ok.

That doesn't explain why we can't find the rule, which is a good
question.





Re: Fw: spam from gmail.com

2021-11-12 Thread Greg Troxel

Arne Jensen  writes:

> Den 11-11-2021 kl. 20:21 skrev Greg Troxel:
>> It's a really interesting question what DNSWL_MED ought to be for score.
>> Given what MED is supposed to be:
>>
>>    Medium: Rare spam occurrences, corrected promptly.
>>
>> -2.3 points seems entirely reasonable.
>>
>> But I don't see how gmail makes sense being medium, as spam from gmail
>> is not rare.  Probably it happens to me every day.  NONE seems more
>> appropriate, especially since I have no perception of google making a
>> serious attempt to avoid emanating spam.  (I realize this comment
>> belongs on the DNSWL list, but for now I'm not bothered personally
>> because the v6 addrs aren't listed.)
>
> Google (Gmail) is not, and have never been on medium.
>
> Last score change on Google's addresses, was in June 2018, demoting
> the last remaining ones from "low" to "none".
>
> Are you by any chance forwarding traffic from one server to another,
> and/or potentially missing something in your trusted_networks and/or
> internal_networks? This one is *very* common.

Sorry for being fuzzy. What I meant, and didn't say clearly, is:

  I get a lot of spam from gmail (that is properly DKIM signed and
  passes SPF).  I'm not seeing any of it get tagged as coming from
  DNSWL_MED.

  Having seen other people claim that google servers are on MED, I was
  opining that this didn't make sense.  (It seems that everybody agrees
  that it doesn't make sense and also that it has never been true.)

> Checking up with DNSWL is actually done by checking the first server
> in reverse order, that your own server does not trust, so if the
> inbound message you see was sent from Gmail, relayed over your
> friend's server (which is/was at medium), and then finally hitting
> yours, and that you do not have set your friend's server as one of
> your trusted ones, the DNSWL check will be done on your friend's
> server, ending up with flagging the message as medium.

For me, the trickiness is in mailinglists, especially when they are set
up without restrict-to-list-member and without good filtering.   So I
have put their addresses into trusted_networks.   This isn't quite the
same as someone MX-catching for me, but I think it works out the same.
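Arne's description of which relay gets looked up can be sketched as:
walk the relay chain from your own server outward, skip hops inside
trusted_networks, and check the first one you don't trust.  A
simplified Python sketch of that description, not SpamAssassin's
actual relay-parsing code; dnswl_candidate is hypothetical, the
addresses are documentation examples, and `trusted` stands in for
trusted_networks.

```python
import ipaddress
from typing import Optional


def dnswl_candidate(relays: list[str], trusted: list[str]) -> Optional[str]:
    """Hypothetical helper: first relay IP, newest hop first, not in `trusted`.

    `relays` lists the IPs that handed the message on, starting with the
    one that connected to your own server; `trusted` holds CIDR strings.
    """
    nets = [ipaddress.ip_network(n) for n in trusted]
    for ip in relays:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in nets):
            return ip
    return None


# gmail -> list server -> you: the list server is the newest hop.
hops = ["198.51.100.7", "203.0.113.25"]
print(dnswl_candidate(hops, []))                   # 198.51.100.7 (list server)
print(dnswl_candidate(hops, ["198.51.100.0/24"]))  # 203.0.113.25 (sender)
```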

Greg




Re: Fw: spam from gmail.com

2021-11-12 Thread Greg Troxel

Arne Jensen  writes:

> Den 12-11-2021 kl. 00:43 skrev Loren Wilton:
>> I have to admit I'd never paid much attention to the RCVD_IN_DNSWL_*
>> scores on spam before.
> [...]
>> Looking at spam for last month, [...]
>>
>> But I do have 12 pretty blatent spams that hit RCVD_IN_DNSWL_HI.
>> It makes me wonder just how useful a rule it is.
> A pretty blatant misconfiguration of a mail server (and/or the system
> running same), can unfortunately lead to various negative side
> effects.

Loren might want to check for spam received via mailing lists.  I have
seen spam sent to lists and then delivered to me, so that it arrives
from the MTA of the org running the list.  Adding that MTA to
trusted_networks moves the check point earlier and avoids treating
the mail as good because it came from the list.

Of course, it would be better if the list were set up for both spam
filtering and rejecting non-member posts, and machines that host lists
that send spam probably aren't in DNSWL anyway.


Thanks for all the confirmations of what isn't listed.  I have always
had the view that DNSWL runs a tight ship (and fairly too), and I
continue to feel that -2.3 for MED is a reasonable score.




Re: Fw: spam from gmail.com

2021-11-11 Thread Greg Troxel

Philipp Ewald  writes:

> You can report it. Gmail is on DNSWL
>
> @gmail.com>
> RCVD_IN_DNSWL_MED=-2.3
>
> https://www.dnswl.org/?page_id=17

I tried to find gmail being on DNSWL_MED and I haven't been able to.
There are google.com servers on DNSWL_NONE.

Can someone explain what addresses are

  part of gmail
  being used to deliver spam
  on DNSWL_MED

?


I went over my mail, looking for recent spam with DNSWL_MED, and also
ham.  I did find 3 messages that hit DNSWL_MED that were outright spam,
and either those places had a rare compromise or they should be listed lower.
But I also found a large amount of ham with MED.

So from my viewpoint, the issues I see with DNSWL_MED are very minor,
and I am ok with the default score.

Thanks all for the discussion; I will probably try harder to report
FNs due to DNSWL now.




Re: spam from gmail.com

2021-11-11 Thread Greg Troxel

Bill Cole  writes:

>> I've ended up giving a point each to FREEMAIL_FROM and TO_GMAIL, which
>> sort of nulls that out.
>
> Also: the DNSWL rules in the default ruleset are mis-scored, based
> apparently on a Perceptron run early in the history of SA and DNSWL. I
> don't know exactly how to fix this at the distribution level because
> the RuleQA system can't cope well with possibly labile network
> reputation rules. The effect of this is that the DNSWL rule scores are
> not routinely rescored. The fact that they've had the same scores for
> ~10 years means that they are probably a fixed basis for static local
> rules in many places. We don't want to disrupt anyone's working system
> by changing the default scores.

It would be interesting to know what they would be set to, if there
weren't the concern of things built on them.

> With that said, I don't think anyone should use the RCVD_IN_DNSWL*
> rule scores just because they are the default scores.

I see your point that you think the defaults are bad, but it also seems
awkward that basically every SA user be expected to change them.

> Locally I use this:

> score  RCVD_IN_DNSWL_LOW 0.8
> score  RCVD_IN_DNSWL_MED  -0.2
> score  RCVD_IN_DNSWL_HI  -2
>
> Those are NOT based on any formal analysis, but simply on my
> eyeballing a bunch of local stats and heuristically picking values,
> because I'm a bozo...

Sure, I use that process myself, and that's fine because I have to
answer to a tiny number of people.

FWIW, I haven't really found a lot of problems from DNSWL.  I file
messages scoring below 1 into INBOX, and messages scoring 1 and up into
spam.1 through spam.5, and accept that spam.1 is going to have a lot of
FPs as the cost of keeping FNs out of INBOX.  That's of course contrary
to doctrine, but it means that I look carefully over any spam that makes
it to INBOX, and I just haven't been seeing DNSWL_MED on spam very often.
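One reading of that filing scheme, as a Python sketch.  The folder
names come from the text; the bucket boundaries (spam.N for scores from
N up to N+1, everything at 5 or above in spam.5) are an assumption, and
folder_for is a hypothetical helper.

```python
import math


def folder_for(score: float) -> str:
    """Hypothetical helper: map a score to a folder (assumed boundaries)."""
    if score < 1:
        return "INBOX"
    # spam.1 for [1, 2), spam.2 for [2, 3), ..., spam.5 for 5 and above
    return f"spam.{min(math.floor(score), 5)}"
```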

My view is that if -2.3 on DNSWL_MED leads people to want to change the
score, that's a clue that there are things in MED that should not be
listed.

>> It would be really nice if there were an easy way to exclude a domain
>> from whitelist checks.
>
> So, for the internal default "whitelist" this exists: unwhitelist_from (see 
> 'perldoc Mail::SpamAssassin::Conf')
>
> It is easy enough to construct rules that counteract DNSWL or other
> external reputation sources, and the addition of ad hoc internal lists
> (WLBLEval plugin) in 3.4.x makes it possible to do so in a
> well-structured manner. Basically, you can create a list of domains
> that should NOT get any DNSWL bonus and use a meta rule to counteract
> that bonus. This isn't quite the same as excluding domains from a
> check entirely, but you can get the same effect.

Thanks - I realize I could do this somehow, but it feels fragile to have
all these matching inverse points.   I also realize writing the feature
I want is a bunch of code and that I haven't attached a patch.





Re: Fw: spam from gmail.com

2021-11-11 Thread Greg Troxel

Matus UHLAR - fantomas  writes:

>>>It would be really nice if there were an easy way to exclude a domain
>>>from whitelist checks.
>
> On 11.11.21 17:24, Benny Pedersen wrote:
>>add
>>
>>freemail_whitelist gmail.com
>>
>>to local.cf
>>
>> its not a whitelist, more a skip gmail.com as a freemail if that
>> changes anything
>>
>> i begin to add score more then default score to freemail hits, with
>> imho is more desireble then class it not freemail
>
> i guess this just disables detection of fake reply-to which is I believe
> exactly opposite of what OP needs.

Yes, what I really want is something like

exclude_from_dnswl  gmail

and then anything that is somehow from gmail gets 0 points, instead of
the default score for MED, when the DNSWL check runs.
Basically, I want "behave as if gmail is not listed in DNSWL".

This is messy because DNSWL lookups are via IP address.   However, I
just looked back at some of my incoming mail, and it seems google is
delivering to me over IPv6 and the v6 addresses of their sending MTAs
are not in DNSWL.
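Because DNSWL is keyed by IP address, a sender's domain never enters
the lookup; only the relay's address does, which is also why unlisted
IPv6 sending addresses sidestep the question.  A minimal Python sketch
of how such a query name is formed (reversed octets for IPv4, reversed
nibbles for IPv6) and how the answer's last octet is read as the trust
level.  The zone name list.dnswl.org, the 127.0.<category>.<level>
answer layout, and 127.0.0.255 meaning "query blocked" are assumed to
follow dnswl.org's published conventions; check their documentation
before relying on them.

```python
import ipaddress

ZONE = "list.dnswl.org"  # assumed zone name; check dnswl.org's docs
LEVELS = {0: "NONE", 1: "LOW", 2: "MED", 3: "HI"}


def dnswl_qname(ip: str, zone: str = ZONE) -> str:
    """Build the DNS name to look up for `ip` (IPv4 or IPv6)."""
    rev = ipaddress.ip_address(ip).reverse_pointer  # e.g. 1.2.0.192.in-addr.arpa
    for suffix in (".in-addr.arpa", ".ip6.arpa"):
        if rev.endswith(suffix):
            rev = rev[: -len(suffix)]
    return f"{rev}.{zone}"


def trust_level(answer: str) -> str:
    """Interpret an A-record answer of the assumed form 127.0.<cat>.<level>."""
    if answer == "127.0.0.255":
        return "BLOCKED"  # assumed: query refused, e.g. excessive volume
    octets = [int(o) for o in answer.split(".")]
    return LEVELS.get(octets[3], "UNKNOWN")


print(dnswl_qname("192.0.2.1"))  # 1.2.0.192.list.dnswl.org
print(trust_level("127.0.5.2"))  # MED
```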


It's a really interesting question what DNSWL_MED ought to be for score.
Given what MED is supposed to be:

  Medium: Rare spam occurrences, corrected promptly.

-2.3 points seems entirely reasonable.

But I don't see how gmail makes sense being medium, as spam from gmail
is not rare.  Probably it happens to me every day.  NONE seems more
appropriate, especially since I have no perception of google making a
serious attempt to avoid emanating spam.  (I realize this comment
belongs on the DNSWL list, but for now I'm not bothered personally
because the v6 addrs aren't listed.)




Re: Fw: spam from gmail.com

2021-11-11 Thread Greg Troxel

Philipp Ewald  writes:

> You can report it. Gmail is on DNSWL
>
> @gmail.com>
> RCVD_IN_DNSWL_MED=-2.3
>
> https://www.dnswl.org/?page_id=17
>
> As far as i know DNSWL is used by default

I've ended up giving a point each to FREEMAIL_FROM and TO_GMAIL, which
sort of nulls that out.

It would be really nice if there were an easy way to exclude a domain
from whitelist checks.




Re: timeouts on processing some messages, started October 24

2021-11-04 Thread Greg Troxel

I have captured a bad message.   It seems innocuous; it's from me at a
host in my domain, to me, basically

From: g...@foo.lexort.com
To: g...@lexort.com

and has a body "foo", no DKIM headers, just Received, Subject,
Message-Id.


Processing this with my normal config results in the timeout.


I noticed lockfiles for txrep, even though I couldn't tell from the
'-D all' output that txrep was involved, and turned off txrep in my
config ("use_txrep 0" instead of 1).  Then, the message processes in 2s.

When I had txrep enabled, I saw a tx-reputation.lock with a single line
that was a pid of the spamd child process that was accumulating CPU
time.  I also had files like:
  tx-reputation.lock.bar.lexort.com.5023
where that was another pid, and this second file seemed to be
accumulating lines.
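When a lock file like tx-reputation.lock sticks around, a quick first
check is whether the pid it names is still running.  A minimal,
POSIX-only Python sketch, assuming the lock file's first line is a bare
pid as described above; lock_owner_alive is hypothetical, and a live
pid only tells you the process exists, not that it is the one actually
holding the lock.

```python
import os
from typing import Optional


def lock_owner_alive(lock_path: str) -> Optional[bool]:
    """Hypothetical helper: is the pid on the lock's first line running?

    Returns True or False, or None if the file is missing or does not
    start with a positive integer pid.  POSIX only.
    """
    try:
        with open(lock_path) as fh:
            pid = int(fh.readline().strip())
    except (OSError, ValueError):
        return None
    if pid <= 0:
        return None
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```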

I did find a stray sa-learn from October and killed it.

Running my spam learning script, which just calls sa-learn with --spam
or --ham (and -L always) is turning out slow, probably from the same thing.

So it sort of smells like one of
  - something is wrong with my txrep database
  - some code is hitting O(n^k) or something
  - there is some strange locking/spinning behavior
  - something else I don't understand, as always
  


Does anyone have pointers to a database export/import script for txrep?




Re: timeouts on processing some messages, started October 24

2021-11-03 Thread Greg Troxel

Bill Cole  writes:

> It would generally be a bad idea to increase the Postfix timeout, as
> that passes the problem back upstream as senders will generally time
> out at 300s as well.
>
> So, add '--timeout-child=295' to your spamd arguments if you want to
> make spamd timeout faster than Postfix reliably.

Thanks; I hadn't thought about the upstream sender's timeout.  Before
getting your mail, I did set my postfix milter timeout to 330s, but the
actual delay was ~301s since the spamd timeout worked.  That resulted in
delivery, and the remote system (also postfix) did not give up.  I have
since changed to --timeout-child=290 in spamd and restored postfix to
the default.

>>   need to figure out why there is a timeout
>
> That's the important part.

I am narrowing the circumstances and will follow up when I figure it out.

>> The first is surely manual reading, but I wonder why it isn't default.
>
> We don't try very hard to guess what users will want in the
> integration details between SA and the tools like MTAs that use
> it. 300s is the SMTP default timeout at end-of-data, which presumably
> is why it is spamd's default. I think it makes sense to reduce that
> for most circumstances, but I'm a bit hesitant to do so in the
> distribution because there could be people relying on the specific
> idiosyncratic behavior of spamd timing out after its caller has given
> up rather than before.

It strikes me that timeouts happening is basically a symptom of bugs, and
each layer should be set up to avoid being non-responsive to the calling
layer.  While I see your point about not tuning for what people might
want, it seems that if a system is to meet the 300s SMTP data timeout,
spamd needs to take less than 300s, so going for 290 or 295 seems
sensible.  I would guess, without any real basis, that far more people
are just sitting on latent trouble than really intend to have a milter
callout give up about a second before spamd does.
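The constraint being argued for is just an ordering of timeouts along
the chain: spamd child < milter <= SMTP end-of-data.  A Python sketch
that checks it, using the figures from this thread (300s SMTP
end-of-data, postfix's 300s milter default, spamd's 300s child
default, 290s as the proposed setting); timeout_problems and its
margin parameter are hypothetical.

```python
def timeout_problems(spamd_child: float, milter: float, smtp: float = 300.0,
                     margin: float = 5.0) -> list[str]:
    """Hypothetical check: list violations of spamd < milter <= SMTP."""
    problems = []
    if spamd_child + margin > milter:
        problems.append(f"spamd child timeout {spamd_child}s is not at least "
                        f"{margin}s below the milter timeout {milter}s")
    if milter > smtp:
        problems.append(f"milter timeout {milter}s exceeds the {smtp}s SMTP "
                        "timeout; the sending MTA may give up first")
    return problems


print(timeout_problems(300, 300))  # one problem: spamd and postfix race
print(timeout_problems(290, 300))  # []
```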

> The most common reason for SA to hit its internal timeout is the
> combination of a rule with a pattern that can generate a large number
> of backtracks while scanning (exponential or factorial order) and a
> message which causes such backtracking. Typically that's caused by a
> '*' or '+' in a pattern where a fixed range for the number of repeats
> should be used instead. A few years ago we tried to fix all cases of
> dangerous rules in the default ruleset, and I think we succeeded. I
> believe the KAM rules have also been audited for likely problems. If
> you have any unbounded wildcards in your local rules, tightening those
> rules up should be your first step. If you can't find and fix the
> problematic rule by eye, you can get clues about it by scanning a
> problematic message with the "-D all" option to get a detailed rundown
> of what SA does in scanning a message. That will show you what rules
> are checked successfully. You can find a problematic rule by comparing
> that debug output from a bad message to that of a message which
> doesn't hang SA.

Thanks, that regexp hint is a huge clue to me.
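Bill's first step, finding unbounded wildcards in local rules, can be
roughed out with a crude scan.  A minimal Python sketch, assuming rules
in the usual "type NAME /pattern/flags" form (for header rules, the
"Header =~ /pattern/" form); unbounded_rules is hypothetical.  It flags
any unescaped *, +, or open-ended {n,} inside the pattern.  It is a
heuristic that can be fooled (for example by an escaped backslash just
before a quantifier), meant to produce a shortlist for reading by eye,
not proof that a pattern is safe.

```python
import re

RULE = re.compile(r"^\s*(body|rawbody|header|full|uri)\s+(\w+)\s+(.*)$")
# An unescaped *, +, or open-ended {n,} quantifier.
UNBOUNDED = re.compile(r"(?<!\\)(?:[*+]|\{\d+,\})")


def unbounded_rules(lines: list[str]) -> list[str]:
    """Hypothetical helper: names of rules whose pattern is unbounded."""
    flagged = []
    for line in lines:
        m = RULE.match(line)
        if not m or m.group(3).startswith("eval:"):
            continue
        pattern = m.group(3)
        if "/" in pattern:
            # keep only the text between the first and last slash
            pattern = pattern[pattern.index("/") + 1 : pattern.rindex("/")]
        if UNBOUNDED.search(pattern):
            flagged.append(m.group(2))
    return flagged
```

The usual fix is to replace ".*" with a bounded range such as
".{0,40}" sized to what the rule actually needs to span.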




Re: timeouts on processing some messages, started October 24

2021-11-02 Thread Greg Troxel

>   postfix is waiting 300s
>   SA thinks it can spend 300s processing
>   postfix gives up 1s before SA is done

The default spamd child timeout is 300s.
The default postfix content milter timeout is 300s.
Each is a reasonable choice, but really postfix's timeout should be
longer.

I set in postfix main.cf: "milter_content_timeout = 330s" and now I
still get spamd child timeouts, but things are better.

So probably we should set the default spamd child timeout to 270s.

A wrinkle is that I realize that I had a learn process running, where I
run over my ham folders and spam folders and run sa-learn -L.  I used to
run that often, and it would take some number of minutes, but this one
had been running for days.  My guess is that it took long as a symptom
of the same bug, vs being a cause, but that remains to be seen.




timeouts on processing some messages, started October 24

2021-11-02 Thread Greg Troxel
I have a system with postfix and spamassassin 3.4.6 via spamd.  It's
been generally running well.  I noticed mail from one of my other
systems timing out with 4.7.1, and that caused me to look at the logs.  I
have KAM rules, some RBL adjustments, a bunch of local rules for my
spam, but really nothing I consider unusual.

I realized I had DCC enabled, perhaps not correctly, and I just took
that out, since I've never really been clear on how it works and if I
want to use it.


My logs go back to October 3, but starting 24th I have lots of lines like:

  Oct 24 03:23:13 bar spamd[25868]: check: exceeded time limit in Mail::SpamAssassin::Plugin::Check::_eval_tests_type9_pri1000_set1, skipping further tests

Looking further, I see

  Nov  1 12:02:01 bar postfix/cleanup[18861]: 6E2D74106C3: message-id=<20211031071804.b221b16...@bar.example.com>
  Nov  1 12:07:01 bar postfix/cleanup[18861]: warning: milter unix:/var/run/spamass.sock: can't read SMFIC_BODYEOB reply packet header: Connection timed out
  Nov  1 12:07:01 bar postfix/cleanup[18861]: 6E2D74106C3: milter-reject: END-OF-MESSAGE from foo.example.com[10.0.0.2]: 4.7.1 Service unavailable - try again later ; from= to= proto=ESMTP helo=
  Nov  1 12:07:02 bar spamd[23510]: check: exceeded time limit in Mail::SpamAssassin::Plugin::Check::_eval_tests_type9_pri1000_set1, skipping further tests
  Nov  1 12:07:02 bar spamd[13194]: spamd: clean message (-1.0/1.0) for fred:10853 in 300.2 seconds, 2064 bytes.
  Nov  1 12:07:02 bar spamd[13194]: spamd: result: . 0 - ALL_TRUSTED,KAM_DMARC_STATUS,TIME_LIMIT_EXCEEDED scantime=300.2,size=2064,user=fred,uid=10853,required_score=1.0,rhost=::1,raddr=::1,rport=56983,mid=<20211031071804.b221b16...@foo.example.com>,autolearn=unavailable
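To pull every affected message out of logs like these, the spamd
"result:" lines carry both the rule list and the scan time.  A Python
sketch, assuming the result-line layout shown above (flag, integer
score, "-", comma-separated rules, then comma-separated key=value
pairs); parse_result and slow_scans are hypothetical helpers, "Y" is
assumed to mark spam and "." clean, and the 60s threshold is only an
illustration.

```python
import re
from typing import Optional

RESULT = re.compile(r"spamd: result: (\S) (-?\d+) - (\S*) (\S+)")


def parse_result(line: str) -> Optional[dict]:
    """Hypothetical helper: parse one spamd 'result:' log line."""
    m = RESULT.search(line)
    if not m:
        return None
    fields = dict(p.split("=", 1) for p in m.group(4).split(",") if "=" in p)
    return {
        "spam": m.group(1) == "Y",  # assumed: "Y" spam, "." clean
        "score": int(m.group(2)),
        "rules": m.group(3).split(",") if m.group(3) else [],
        "scantime": float(fields.get("scantime", "nan")),
        "fields": fields,
    }


def slow_scans(lines, threshold: float = 60.0):
    """Yield results that hit the time limit or exceeded `threshold` seconds."""
    for line in lines:
        r = parse_result(line)
        if r and ("TIME_LIMIT_EXCEEDED" in r["rules"] or r["scantime"] > threshold):
            yield r
```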

so it sort of looks like:

  postfix is waiting 300s
  SA thinks it can spend 300s processing
  postfix gives up 1s before SA is done

  something is causing a delay

and thus I have two problems:

  need to have postfix delay be more than spamassassin delay plus rounding

  need to figure out why there is a timeout

The first is surely a matter of reading the manual, but I wonder why it isn't the default.

On the second, I wonder if anyone else is seeing this, and clues appreciated.

Thanks,
Greg



Re: handle_user and connect to spamd failed

2021-10-18 Thread Greg Troxel

Linkcheck  writes:

>>  instruct spamd to connect to 127.0.0.1
>
> Sorry, I'm not sure where to do that. I've tried as noted in the OP; I
> can't find anywhere else (remembering I've dropped spamfilter.sh).

I'm fuzzy on the details but hope this helps.

What's going on is basically

  spamd might listen on ::1 and 127.0.0.1, or just 127.0.0.1.  Sometimes
  things listen on a single socket that accepts both.  Use lsof, fstat,
  etc. to see what spamd is listening on.

  spamc probably does normal name resolution for localhost, gets ::1
  and 127.0.0.1 in that order, and tries them.  If the connect to ::1
  fails, it moves on.  No harm done except for a log line and a few
  wasted cycles.

so options are:

  make spamd listen on ::1 as well as 127.0.0.1.  This is arguably the
  right fix, and it should happen out of the box.

  tell spamc to use 127.0.0.1 when connecting to spamd, and you might
  have to do this inside spamass-milter config or code

On my NetBSD 9 amd64 system with SA 3.4.6, spamd has two sockets open,
one on ::1 and one on 127.0.0.1.  The logs show that ::1 is being used.
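The fallback described above, trying each resolved address in turn and
moving on when one refuses, looks roughly like this in Python.  This is
a sketch of the general client pattern, not spamc's actual code;
connect_first is a hypothetical helper, and 783 is assumed as spamd's
usual default port.

```python
import socket


def connect_first(hosts: list[str], port: int = 783,
                  timeout: float = 2.0) -> socket.socket:
    """Hypothetical helper: connect to the first address that accepts.

    Each failed attempt (e.g. nothing listening on ::1) is recorded and
    skipped, which is the "log line and a few wasted cycles" case.
    """
    errors = []
    for host in hosts:
        try:
            infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        except OSError as exc:  # includes socket.gaierror
            errors.append(f"{host}: {exc}")
            continue
        for family, socktype, proto, _canon, addr in infos:
            try:
                sock = socket.socket(family, socktype, proto)
            except OSError as exc:  # e.g. IPv6 disabled on this host
                errors.append(f"{addr[0]}: {exc}")
                continue
            sock.settimeout(timeout)
            try:
                sock.connect(addr)
                return sock
            except OSError as exc:
                errors.append(f"{addr[0]}: {exc}")
                sock.close()
    raise ConnectionError("; ".join(errors) or "no addresses")
```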





Re: CVD_IN_DNSWL_HI ?

2021-10-12 Thread Greg Troxel

David B Funk  writes:

> The other thing you should do is to report false-positives to the
> dnswl.org site.
> See: https://www.dnswl.org/?page_id=17

That's great advice.  I have found over the years that DNSWL is well
run, and I'm confident that if a listed machine is emitting spam and
it's reported, then it would either get delisted or fixed very fast.

> You first might want to verify that your FPs aren't being generated by
> some upstream relay that is is trusted but due to some configuration
> issue is "masking" the spam source.

The kind of places that get listed in HI tend to be well-managed.

> If you put a copy of one of the offending spams in pastebin.com and
> post the URL here we can look at it with you to see if we can spot
> your issue.

Putting the spam in a pastebin will let other people do a test scoring
run and that will likely shed some light on the situation.

Also, check how your DNS is set up.  While DNSBLs in general don't want
to return false results on purpose, when they get abused with high query
rates there are not a lot of options to get people to stop.





Re: Message-ID with IPv6 domain-literal

2021-09-21 Thread Greg Troxel

Grant Taylor  writes:

> On 9/21/21 2:00 PM, Greg Troxel wrote:
>> You are missing that SA is not a standards conformance test suite.  It
>> is a tool to guess if a message is spam.   Bill said that some forms of
>> Message-ID are correlated with spamminess.   So whether the form that is
>> correlated is compliant to the spec or not is not a relevant question.
>
> Fair enough.
>
> Rupert's original question was about syntax, which seems to be more
> RFC based than convention applied by SpamAssassin.  This seems
> perfectly legitimate to me, just different than what I understood
> Rupert's question to be about.
>
> Thank you for clarification.

It could be a fair question if an SA plugin/rule is trying to evaluate
"is this field correct according to the standards", and gets that wrong,
as a separate issue from "is it a clue of spam".  I mean that a rule
named "MESSAGE_ID_SYNTAX_ERROR" is buggy if it fires on message ids that
are spammy but legit, while the same rule named "MESSAGE_ID_IS_ICKY"
isn't.

As a separate comment, I didn't go read the RFC, but my quick reaction
about the message-id values with IPv6 literals with embedded IPv4
addresses was: these are not reasonable values, and reasonable software
would not emit them.  So to me, the question of whether they are
technically compliant was not likely to be that important, within the
context of spam filtering.
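Following that distinction, a check for this pattern would be named and
documented as a spam-sign heuristic rather than a syntax test.  A
Python sketch under that framing; msgid_has_v4_in_v6_literal is
hypothetical, and it only looks for a "[IPv6:...]" domain literal whose
address is written with an embedded dotted-quad IPv4 part.  It makes no
claim about what the RFC permits.

```python
import ipaddress
import re

# Domain-literal right-hand side of a Message-ID, e.g. <x@[IPv6:...]>
V6_LITERAL = re.compile(r"@\[IPv6:([^\]]+)\]>?\s*$", re.IGNORECASE)


def msgid_has_v4_in_v6_literal(msgid: str) -> bool:
    """Hypothetical "icky" check: IPv6 literal with an embedded IPv4 tail."""
    m = V6_LITERAL.search(msgid.strip())
    if not m:
        return False
    text = m.group(1)
    try:
        ipaddress.IPv6Address(text)
    except ValueError:
        return False  # not a valid IPv6 address at all; a different question
    return "." in text
```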

Greg




Re: Message-ID with IPv6 domain-literal

2021-09-21 Thread Greg Troxel

Grant Taylor  writes:

> What am I missing?

You are missing that SA is not a standards conformance test suite.  It
is a tool to guess if a message is spam.   Bill said that some forms of
Message-ID are correlated with spamminess.   So whether the form that is
correlated is compliant to the spec or not is not a relevant question.





Re: TLD rules catch non-domain data

2021-08-20 Thread Greg Troxel

Kenneth Porter  writes:

>> *  5.0 KAM_SOMETLD_ARE_BAD_TLD .stream, .trade, .pw, .top, .press,
>> *  .guru, .casa, .online, .cam, .shop, .club & .date TLD
>> Abuse 
>
> The KAM rule was just recently fixed. If you have an example that's
> still tripping it, post it to a pastebin and share the link here.

I just had it falsely hit, in that it triggered on mail that was ham.
There was a .club URL, but it was to a club website mentioned in mail
that I actually agreed to get and that was on topic.

So I would suggest that rules that do not show actual evidence of spam,
but merely "other people have abused things that seem like you", be
limited to 2 or 3 points.
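Part of why such rules misfire on ham is that they key on nothing but
the URL's top-level domain.  A Python sketch of that kind of test,
using the TLD list from the rule description quoted above; listed_tld
is a hypothetical helper illustrating TLD-only matching, not the KAM
rule's actual implementation.

```python
from urllib.parse import urlsplit

# TLDs listed in the quoted KAM_SOMETLD_ARE_BAD_TLD description
LISTED_TLDS = {"stream", "trade", "pw", "top", "press", "guru", "casa",
               "online", "cam", "shop", "club", "date"}


def listed_tld(url: str) -> bool:
    """Hypothetical helper: True if the URL's host ends in a listed TLD."""
    host = urlsplit(url).hostname or ""
    return host.rstrip(".").rsplit(".", 1)[-1].lower() in LISTED_TLDS
```

A hit says only that the link shares a TLD with past abuse, so a
legitimate club's website matches exactly as a spammer's would.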




Re: Score for certain spam

2021-08-18 Thread Greg Troxel

Alan  writes:

> It's sent to the bit bucket, not done in the MTA. In this case, each
> account can set individual thresholds and has an individual set of
> local rules, so that might be why. I'd prefer to 550 them as well,
> although I suspect the majority of sources just don't care. Lately the
> most insidious stuff has been coming from VPS providers with
> insufficient vetting.

For actual spam, it doesn't matter if you /dev/null or 550 them.

My point -- to the list, not really so much to you since I realize you
have your own reasons --  was that there is a possibility of a legit
sender's message hitting the threshold, and for that message, it is much
better to 550 than /dev/null so they can figure it out.   It's only for
that very rare legit mail that it matters, in my view, but there it's
important.


Thus, my setup rejects at the MTA at 8, and everything that makes it
through is filed: into INBOX if the score is low enough, and into a spam
folder if not.





Re: Score for certain spam

2021-08-17 Thread Greg Troxel

Alan  writes:

> I manage email for a couple of hundred domains, so a fair bit of stuff
> that arrives to my inbox are spam complaints (they're supposed to open
> tickets or use the support mailbox but... users). I flag anything over
> 5.0 as spam, but it still comes to my inbox. Anything over 8.0 goes to
> the bit bucket. Our support inbox deletes anything over 10.0. Stuff
> that scores over 20 arrives on a regular basis but 10 seems to be a
> decent threshold for "absolute crap".

When you talk about 8/10 and bitbucket/delete, are you accepting this
email at the MTA level and then sending it to /dev/null?  If so, I
wonder what your thoughts are on the wisdom of that vs rejecting at the
MTA level?  In my view, MTA rejection is much better because if there is
a legit sender they get a 550 back, rather than a silent discard.




Re: Score for certain spam

2021-08-17 Thread Greg Troxel

David Bürgin  writes:

[all the other replies sound 100% sensible to me]

> In your experience, what is a good ‘certain spam’ threshold? By that I
> mean the score above which messages are virtually always spam, no false
> positives.

There is no certainty; there is only probability.   So you have to
decide what risk you want to put up with, and in my experience that
means weighing a risk of accepted spam against a risk of rejected ham.

> The default threshold for spam is 5.0, which works well for me. Only
> very rarely a ham message scores above that and lands in my Junk folder.

I run SA base rules plus KAM, with TXREP set up, known senders added to
a welcomelist, and some private rules and score tweaks.

I find that ham over 5 is extremely rare.

I am rejecting at the SMTP level at 8.   I have so far not received a
single complaint of legit mail being rejected.  8 is a bit more
aggressive than I would recommend in general.

Note that I take two unconventional views compared to standard SA
doctrine:

  mail is personal-ham, list-ham, or spam.  If a message from a
  mailing list that is technically ham gets misfiled or even rejected,
  that's not a big deal.  Mail that is personally to me (really, that I
  care about) that gets rejected is a big deal.

  I really don't want any spam in my INBOX, because it appears on my
  phone, and thus I sort mail into "ham", "maybe spam", "spam" and
  "definitely spam", basically sorting <1 point into inbox, 1-5 into
  spam.N folders, with 5+ into spam.5, combined with MTA-level rejection
  at 8.  This means that every day several messages are sorted into
  spam.1 and spam.2 that are technically ham, and I just refile them
  when at a computer.  The benefit to this is that only a handful of
  spam messages land in my inbox every week.
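The tiered sorting described in the second point can be sketched as a small function mapping a SpamAssassin score to a destination. This is a minimal illustration, not the poster's actual setup: the cutoffs (below 1 to INBOX, 1-5 to spam.N, 5+ to spam.5, MTA rejection at 8) come from the text, while the function name, treating each cutoff as "score >= cutoff", and naming spam.N after the integer part of the score are assumptions.

```python
import math


def destination(score: float) -> str:
    """Map a SpamAssassin score to a destination using the tiers above.
    Hypothetical helper: cutoff handling (>=) and spam.N naming by the
    integer part of the score are assumptions for illustration."""
    if score >= 8:
        # Rejected at SMTP time, so a legit sender gets a 550 instead of
        # a silent discard.
        return "REJECT"
    if score >= 5:
        return "spam.5"
    if score >= 1:
        # 1 <= score < 5 lands in spam.1 .. spam.4
        return f"spam.{math.floor(score)}"
    return "INBOX"


for s in (-0.2, 0.9, 1.5, 4.99, 5.0, 7.9, 8.0):
    print(s, destination(s))
```

In a real deployment the REJECT tier lives in the MTA (for example a milter acting on the score), and only the remaining tiers are handled by the delivery-time filter.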

I often add welcomelist or rule tweaks for list senders who score 1-5.
Usually the messages are icky somehow, from an MTA on a BL,
misformatted, etc.  Almost always I wouldn't really care if I had missed
them.   Real people, real transactional notifications, I add exceptions
for.

This is higher effort, but it serves my dual purposes of not missing ham
and protecting my phone INBOX from spam.  It also gives me insight
into score distribution.  1-2 point ham is pretty normal, and arguably
that folder is 75% ham.  The 4-5 folder is about 98% spam.

> Would 10.0 be a good ‘certain spam’ threshold? 15.0? I could then reject
> such messages at the SMTP layer, without having to worry about rejecting
> legitimate messages.

My view is that very occasional rejecting of legit mail is much better
than having it buried in a spam folder.   I would be very surprised if
rejecting >= 10 caused you real trouble.   You just said that you almost
never have ham get scored over 5.  So 10 seems like a reasonable step.
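One way to act on "there is only probability" is to take mail you have already hand-sorted, look at the scores it received, and count, for each candidate reject threshold, how much ham would have been rejected and how much spam would have got through. The sketch below assumes you can export per-message scores for ham and spam (for example from your own logs); the function name and the default cost weights are hypothetical, and how much worse a rejected ham is than an accepted spam is a judgement each site has to make.

```python
from typing import Iterable


def threshold_report(ham_scores: Iterable[float],
                     spam_scores: Iterable[float],
                     thresholds: Iterable[float],
                     ham_cost: float = 100.0,
                     spam_cost: float = 1.0) -> list[tuple[float, int, int, float]]:
    """For each threshold t, count ham that would be rejected (score >= t)
    and spam that would be accepted (score < t), plus a weighted cost.
    The default weights (one rejected ham as bad as 100 accepted spams)
    are an arbitrary assumption, not a recommendation."""
    ham = list(ham_scores)
    spam = list(spam_scores)
    report = []
    for t in thresholds:
        rejected_ham = sum(1 for s in ham if s >= t)
        accepted_spam = sum(1 for s in spam if s < t)
        cost = rejected_ham * ham_cost + accepted_spam * spam_cost
        report.append((t, rejected_ham, accepted_spam, cost))
    return report


# Toy scores, made up for illustration only.
for row in threshold_report([-1.0, 0.5, 2.0, 6.0], [4.0, 9.0, 12.0, 20.0], [5, 8, 10]):
    print(row)
```

The numbers only mean something on your own mail stream; the point is that the "right" threshold falls out of the weights you choose, not out of the scores alone.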





Re: Question about whitelisting of naadac.org

2021-08-12 Thread Greg Troxel

Lukasz Maik  writes:

[not sure what the relationship of ricoh-europe to a US .org is]

> Sure, please find full tests results here: 
> https://www.mail-tester.com/test-bw02eaxrt
>
> We've lost a point for not having DKIM/DMARC authentication, which is 
> unfortunately not supported by our hosted exchange.
> We also lost 0.5 point for not having alt attribute in the images, so we will 
> add it.
> Total is 7.8/10.
>
> The problem, when user is sending normal work e-mails, recipients are
> finding those messages in the Junk Email folder. Even people with who
> he was previously working before.

I'm not sure anybody said this yet, but: the SpamAssassin project is not
going to add your domain to a whitelist because you are having problems
with how others sort your mail.  As I understand it, the project would
only consider that sort of addition for domains that 1) are really known
to send pretty much zero spam and 2) send mail that users of
spamassassin are inconvenienced by, because they perceive it as
incorrectly tagged as spam.  Note that this is very different from
senders being unhappy about how recipients tag the messages.

Reading the test report, I see that you have a URL listed in the SBL.

This domain has two hits in rfc-clueless

  https://multirbl.valli.org/lookup/naadac.org.html

and the outgoing IP address is

   208.70.208.232   Spam Grouper Net block list


So basically you (they?) need to clean up all the issues.  That may
involve finding a mail host that doesn't do business with spammers and
whose IP addresses are not in DNSBLs.


Also, if you are bothered by recipient filtering decisions, you need to
ask the recipients what filtering they are doing and why they sorted how
they did.  That's up to them, not the spamassassin project.

It may be that they have no idea and are uncooperative.  I have had
problems with yahoo misfiling mail, and found the experience of asking
them about it not to be useful.   So it is possible that your recipients
should get a different email provider.



You might also remove URLs to social media.  They have privacy policies
which are inconsistent with addiction treatment anyway.




Re: DKIM_* scores

2021-07-26 Thread Greg Troxel

Matus UHLAR - fantomas  writes:

> I noticed that pure existence of DKIM signature can push score under zero:
>
> DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
>
> ...so the cumulative score is -0.2.
>
> I'm aware that we don't have many rules with negative scores, but multiple
> scores for single valid DKIM sinature should not be redundant.

I don't follow the logic in "should not be redundant" especially for
scores with such low values of -0.1.

You're talking about "below 0", but what matters is "<5", per SA
doctrine.

As I see it, SIGNED and VALID are intended to cancel, so a signature
that isn't valid nets +0.1.  That seems sensible, although given how
much DKIM is broken by mailing lists that (incorrectly IMHO) modify
messages, it doesn't seem really useful to make that higher.

And then there's -0.1 for a valid dkim matching From: and another -0.1
for valid dkim matching the envelope sender, which is often different.
So -0.2 means that there are two dkim signatures, one for each, and they
are both valid.  Not a guarantee of ham of course, but -0.2 is a small
score.

It's a fair question to ask how these shake out with masscheck, but I
see nothing intrinsically wrong.
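The arithmetic above can be checked with a few lines. This sketch hard-codes the four scores quoted in this thread (they may differ in other rule updates), models each rule as a simple yes/no hit, and ignores every other rule in a real SpamAssassin run; the function name is hypothetical.

```python
# Scores as quoted in this thread; actual values depend on your rule update.
DKIM_SCORES = {
    "DKIM_SIGNED": 0.1,
    "DKIM_VALID": -0.1,
    "DKIM_VALID_AU": -0.1,  # valid signature aligned with the From: domain
    "DKIM_VALID_EF": -0.1,  # valid signature aligned with the envelope sender
}


def dkim_contribution(signed: bool, valid: bool,
                      valid_au: bool, valid_ef: bool) -> float:
    """Sum the scores of the DKIM rules that hit (hypothetical helper)."""
    hits = {
        "DKIM_SIGNED": signed,
        "DKIM_VALID": valid,
        "DKIM_VALID_AU": valid_au,
        "DKIM_VALID_EF": valid_ef,
    }
    total = sum(DKIM_SCORES[name] for name, hit in hits.items() if hit)
    # Round to hide floating-point noise from adding 0.1 steps.
    return round(total, 2)


print(dkim_contribution(True, False, False, False))  # signed but not valid
print(dkim_contribution(True, True, False, False))   # valid, unrelated domain
print(dkim_contribution(True, True, True, True))     # valid, aligned with both
```

Either way the net effect stays within a few tenths of a point, which is small next to a 5-point threshold.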

> do you people modify scores of these rules?
> I would turn both off, but  DKIM_VALID is used in some meta rules...

I am someone who tweaks a lot of scores, but basically my tweaking
lowers rules scored +3 or more by a few points because I find they hit
ham, and raises rules scored 1-2 because they hit my spam and I find
they don't really hit my ham.  I have never been motivated to adjust
these.

For me, the biggest deal with dkim is that I can whitelist_from_dkim for
senders, and avoid whitelisting forged mail not actually from them.

> BTW, looking at metas in 72_active.cf:
>
>  meta XPRIO  __XPRIO_MINFP && !DKIM_SIGNED && !__DKIM_DEPENDABLE 
> && !DKIM_VALID && !DKIM_VALID_AU && !RCVD_IN_DNSWL_NONE
>  meta XPRIO  __XPRIO_MINFP && !DKIM_SIGNED && !__DKIM_DEPENDABLE 
> && !DKIM_VALID && !DKIM_VALID_AU && !RCVD_IN_DNSWL_NONE && !SPF_PASS
>
> !DKIM_VALID && !DKIM_VALID_AU is redundant and !DKIM_VALID_AU should be enough

I don't think so.  These are negated.  And, a dkim signature from some
random domain that is not the From: or envelope-from will cause
DKIM_VALID.  But I do think !DKIM_VALID will imply !DKIM_VALID_AU.
Still, I'm 50/50 whether I'm right or I'm about to learn something.
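The redundancy question can be settled by brute force over the truth table. This sketch assumes only that DKIM_VALID_AU never hits without DKIM_VALID (a valid signature aligned with the author domain is, in particular, a valid signature); it then compares the pair of negated terms from the meta with each term on its own. The function names are hypothetical.

```python
from itertools import product


def consistent_states():
    """Yield (valid, valid_au) pairs, skipping the case assumed impossible:
    DKIM_VALID_AU hitting while DKIM_VALID does not."""
    for valid, valid_au in product((False, True), repeat=2):
        if valid_au and not valid:
            continue
        yield valid, valid_au


def original(valid, valid_au):
    return (not valid) and (not valid_au)   # !DKIM_VALID && !DKIM_VALID_AU


def only_not_valid(valid, valid_au):
    return not valid                        # !DKIM_VALID


def only_not_valid_au(valid, valid_au):
    return not valid_au                     # !DKIM_VALID_AU


for name, expr in (("!DKIM_VALID", only_not_valid),
                   ("!DKIM_VALID_AU", only_not_valid_au)):
    differs = [s for s in consistent_states() if expr(*s) != original(*s)]
    print(name, "differs from the original on:", differs)
```

Under that assumption, !DKIM_VALID alone matches the original in every state, while !DKIM_VALID_AU alone differs in the state (valid=True, valid_au=False), which is the valid-signature-from-some-random-domain case described above.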
>
>  meta __HTML_FONT_LOW_CONTRAST_MINFP HTML_FONT_LOW_CONTRAST &&
> !__HAS_SENDER && !__THREADED && !__HAS_THREAD_INDEX && !ALL_TRUSTED &&
> !__NOT_SPOOFED && !__HDRS_LCASE_KNOWN && !DKIM_VALID
>
>  meta __NOT_SPOOFED  DKIM_VALID || !__LAST_EXTERNAL_RELAY_NO_AUTH || 
> ALL_TRUSTED   # yes DKIM, no SPF
>  meta __NOT_SPOOFED  SPF_PASS || DKIM_VALID || !__LAST_EXTERNAL_RELAY_NO_AUTH 
> || ALL_TRUSTED   # yes DKIM, yes SPF
>
> shouldn't these contain DKIM_VALID_AU instead?

Perhaps, but the problem is that there is a lot of mail that is From:
i...@foobank.com and has envelope-from of
foobank-sen...@bankserviceprovider.com with a dkim from
bankserviceprovider.com.  This is bogus; people who deal with
foobank.com should be able to
  whitelist_from_dkim *@foobank.com
and treat everything else claiming to be from foobank as spam/phish.
But the world isn't like that.
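The mismatch can be illustrated with a simplified model of what an entry like `whitelist_from_dkim *@foobank.com` is meant to require: the From: address is in foobank.com, and a valid DKIM signature was made by foobank.com itself. This is a hypothetical sketch, not SpamAssassin's implementation; the real DKIM plugin's matching is richer (see its documentation), and the function name and the "someone@" address are made up.

```python
def dkim_allows(from_addr: str, valid_signing_domains: list[str],
                listed_domain: str) -> bool:
    """Hypothetical, simplified check for an entry like
    whitelist_from_dkim *@listed_domain: the From: address must be in
    listed_domain AND some valid DKIM signature must come from that same
    domain (d=). Subdomains and other plugin options are ignored."""
    from_domain = from_addr.rsplit("@", 1)[-1].lower()
    if from_domain != listed_domain.lower():
        return False
    return any(d.lower() == from_domain for d in valid_signing_domains)


# Mail From: foobank.com whose only valid signature is from the service
# provider does not qualify; mail signed by foobank.com itself does.
print(dkim_allows("someone@foobank.com", ["bankserviceprovider.com"], "foobank.com"))
print(dkim_allows("someone@foobank.com", ["foobank.com"], "foobank.com"))
```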




Re: Another evil number

2021-06-25 Thread Greg Troxel

RW  writes:

>> You can reach out
>>to our Customer Support Team+1 (800) 781 - 2511.
>
> Is it common in the US to put 800 in brackets like that? In my
> experience brackets normally go around either country codes or area
> codes, digits that may be optional.

Yes, it is common.  The proper form is

  +1 800 781 2511

but people in the US do not write numbers like that.

The normal way in the US would be

  (800) 781-2511

and I find the spaces around the - to be unusual.  But really there is a
fair degree of variation.
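Because the same number shows up in many layouts, a rule meant to catch the number quoted above has to tolerate that variation. Here is a sketch using Python's `re`; SpamAssassin body rules use Perl regexes, whose syntax is very close for a pattern like this, but the pattern and sample strings are illustrative assumptions, not an existing rule.

```python
import re

# Optional +1 / 1 prefix, optional parentheses around the area code, and any
# run of spaces, dots or hyphens between digit groups. The lookarounds keep
# the match from starting or ending in the middle of a longer digit string.
NUMBER_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]*)?\(?800\)?[\s.-]*781[\s.-]*2511(?!\d)"
)

samples = [
    "+1 (800) 781 - 2511",
    "(800) 781-2511",
    "+1 800 781 2511",
    "1-800-781-2511",
    "800.781.2511",
    "(800) 781-25119",  # extra trailing digit, should not match
]
for text in samples:
    print(f"{text!r}: {bool(NUMBER_RE.search(text))}")
```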





Re: Recent experience with RCVD_IN_SORBS_NR_SPAM and others

2021-05-28 Thread Greg Troxel

John Hardin  writes:

> On Thu, 27 May 2021, Greg Troxel wrote:
>
>> The other problem on a small number of messages was
>> RCVD_DOTEDU_SHORT.  I realize this must have passed masscheck, but
>> getting a message of 1-1.5 kB from an address in .edu is to me not at
>> all suspicious, and 2.5 points is a lot for something likely to
>> appear in legitimate mail.  (In my case it was a notification of air
>> conditioning shutdown in a particular building, and that's all there
>> was to say.)
>
> Score limit adjusted.

Thanks.

> Do you know whether it happened to hit
> ALL_TRUSTED? I added an exclusion for that.

It did not hit ALL_TRUSTED, and I'd say that's not really wrong.  The
edu in question uses Outlook-hosted mail, which has a lot of servers.  I'm
not actually part of the edu, but am on some lists, and have something
to do with it.

I expanded trusted_networks, and then ALL_TRUSTED did hit, but
RCVD_DOTEDU_SHORT still fired.  I will see whether that is still the
case once the regexp fixes just made arrive on my system.


(I realize everybody's mail stream is different.  Part of where I'm
coming from is knowing a fairly large number of people using edu
addresses, so to me this seemed sort of like 2.5 points for a message
being from gmail and 1-1.5 kB.)




Re: Recent experience with RCVD_IN_SORBS_NR_SPAM and others

2021-05-28 Thread Greg Troxel

"Bill Cole"  writes:

> That rule does not now exist in trunk and IT NEVER HAS, according to the 
> Subversion history.
>
> It is not in the current KAM channel rules and I see no evidence in my logs 
> of any such rule ever hitting within the past 3 months.

Totally my fault.   I added it to local several weeks ago and managed to
forget about it.   I have been seeing it hit on spam quite well, and
until last week it was not unreasonably hitting ham.

Sorry for the noise about that.




Recent experience with RCVD_IN_SORBS_NR_SPAM and others

2021-05-27 Thread Greg Troxel

I recently neglected to check my spam folders for almost a week (I
filter to a maybe-spam folder on scores that are lower than what
doctrine says, splitting into really-ham, iffy, and really-spam -- it
was the iffy I didn't look at).  On checking, I refiled a bunch of ham
that had from 2 to 6 points.  There was much more of this than normal,
at all scores.

There are lots of reasons for the scores, some of which are just how
things are (MIME HTML with no HTML tag, rDNS lookup failures on Google
MTAs).  But one thing jumped out at me: a fair number of
RCVD_IN_SORBS_NR_SPAM hits, including for Yahoo servers.  It seems to
me a bit much to apply 2.5 points to freemail MTAs that carry mostly
ham and some spam -- that's what 1 point for FREEMAIL_FROM is for.  As
usual, I looked up the rules that hit my ham to think about changing
their scores, but I can't find this one.

So: was this rule in trunk or KAM, and was it withdrawn in the last
week?  Perhaps because of listing yahoo and maybe others?  I didn't find
anything about this on the users list.


The other problem on a small number of messages was RCVD_DOTEDU_SHORT.
I realize this must have passed masscheck, but getting a message of
1-1.5 kB from an address in .edu is to me not at all suspicious, and 2.5
points is a lot for something likely to appear in legitimate mail.  (In
my case it was a notification of air conditioning shutdown in a
particular building, and that's all there was to say.)

Thanks,
Greg




Re: txrep_autolearn range - how does the range influence autolearning

2021-05-16 Thread Greg Troxel

Lucas Rolff  writes:

> Thanks for the notes about sa-learn, txrep outgoing and the autolearn itself.
> In my particular case, I'll only use it as an inbound filter, since I
> handle outbound very differently (I let other people take care of the
> filtering using an external relay); For inbound I've used a commercial
> solution for years, they sadly decided to 5x the cost starting 2022,
> which then doesn't really make it worth it anymore, so time to change!

It's unfortunate that you can't use it on tx.   On outbound, all it does
is keep track of whom mail was sent to, and that causes mail from those
addresses to get better treatment on inbound.

So if you can somehow arrange to get a log feed of the outbound mail
and write something to adjust the database, you can probably get better
results.

In particular, since you give negative points to mail that is a reverse
match to outbound mail, you can turn up the aggressiveness knob with
somewhat less trouble.




Re: txrep_autolearn range - how does the range influence autolearning

2021-05-16 Thread Greg Troxel

Lucas Rolff  writes:

> I’m currently configuring a new setup for passing through all emails,
> and I opted for SA as my filtering – one thing I also configured are
> txrep ( https://cwiki.apache.org/confluence/display/SPAMASSASSIN/TxRep
> )
>
> One thing I saw in the docs is that “txrep_autolearn” is a range between 0 
> and 5 – 0 meaning it’s disabled.
>
> Now, my question is, what effect does the number have? I’d first have thought 
> that it was simply a Boolean to turn it on or off.
> It (sadly) doesn’t seem to be really documented what a higher or lower value 
> results in (other than 0 disables it).

Unfortunately my suggestion is to read the sources.

> I’ve trained my filter with sa-learn with a quite large chunk of
> emails (both spam and ham), which is why I also want to enable
> autolearning of txrep – I just ideally want to figure out prior to
> doing that, what effect the given numbers have on the autolearning
> process.

* Make sure to use -L with sa-learn if you are using txrep, because
  otherwise there is full eval including DNSBL queries.  Do not believe
  the text in sa-learn(1), because it was about the bayes module only,
  AIUI.

* sa-learn will train txrep

* txrep outgoing is really useful

* Conventional wisdom seems to be that autolearn is dangerous in terms
  of getting things wrong, and if you are running sa-learn on ham/spam
  folders, I don't see much point.   However, people need to refile
  misfiled spam into a spam folder and misfiled ham back into a ham
  folder, and you need to be clear on what Trash means.   In my world,
  Trash is ham, and any spam that squeaks by is put into spam.manual,
  from whence it is learned.
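The folder policy in the last point can be turned into sa-learn invocations. This is a hedged sketch: it only builds argument lists and does not run anything. The folder-to-label mapping follows the text (Trash is ham, spam.manual is spam), while the mail root path and folder layout are hypothetical and vary by mail server. `-L`/`--local`, `--ham` and `--spam` are standard sa-learn options, but check sa-learn(1) for your version.

```python
from pathlib import Path

# Folder -> training label, per the policy described above.
FOLDER_LABELS = {
    "Trash": "ham",
    "spam.manual": "spam",
}


def sa_learn_commands(mail_root: Path) -> list[list[str]]:
    """Build one sa-learn argv per folder (hypothetical helper).
    -L keeps training local, i.e. without network tests such as
    DNSBL lookups."""
    commands = []
    for folder, label in FOLDER_LABELS.items():
        commands.append(["sa-learn", "-L", f"--{label}", str(mail_root / folder)])
    return commands


# Hypothetical Maildir location; adjust to your own layout.
for argv in sa_learn_commands(Path.home() / "Maildir"):
    print(" ".join(argv))
```

Each argv could be passed to `subprocess.run(argv, check=True)` from a cron job; keeping the mapping in one table makes it obvious which folders are trusted as ham and which as spam.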



