Re: How to get the X-Spam-Flag

2024-05-04 Thread Matija Nalis
On Fri, May 03, 2024 at 08:22:09PM +0200, tba...@txbweb.de wrote:
> when a send a test spam message to my server it recognizes it as spam and
> puts it into /var/lib/amavis/virusmails as a gz file. In this file I can
> find the complete X-Spam-Header, etc:
> 
> But this header is missing in the passed mail. I use the standard settings
> of amavis
> 
> in /etc/amavis/conf.d/20-debian_defaults


Did you check @local_domains_acl in /etc/amavis/conf.d/05-domain_id ?

E.g. the part that talks about:

# amavisd-new needs to know which email domains are to be considered local
# to the administrative domain.  Only emails to "local" domains are subject
# to certain functionality, such as the addition of spam tags.


-- 
Opinions above are GNU-copylefted.


Re: spamassassin with gmail

2024-04-15 Thread Matija Nalis
On Mon, Apr 15, 2024 at 01:48:53PM +0000, Michael Grant via users wrote:
> > I don't like any daemon connecting to my mail storage. Can you imagine if 
> > your solution gets hacked, how much data would be compromised? I prefer 
> > messages being scanned/marked before stored. I wonder if this is even gdpr 
> > compliant, because you can access private data constantly.
> First, for people like yourself, you would want to run such a daemon
> yourself on your own infrastructure, hence why I am thinking of this could
> be useful to other people as open source.
> 
> Second, there are plenty of people who don't run their own email, as in,
> gmail users, that entrust their email to google.  Though GDPR probably has
> something to say about such a service, I doubt it would be impossible under
> GDPR, especially EU users using a suitable EU server and whatever rules
> necessary were followed.

Not impossible, no. But there are many things needed to implement
GDPR correctly, the overhead is huge, and the fines are draconian, so
I wouldn't advise it unless you're willing to make dealing with all
that your main career path. Not to mention that Google themselves
will likely block you (in the better case) or sue you for ToS
violations before it could become a financially viable model.

> > Why not just forward messages? Register a domain put some mx servers in 
> > front of gmails mx. I recently was testing with such relay/forward, works 
> > perfectly, I am only changing the envelope nothing else. DKIM, spf 
> > everyting perfectly working.
> > 
> I'd be interested to know if anyone runs spamassassin forwarding from gmail
> back into gmail, how does this work?  How to get it so mail isn't in a loop?
> You can't do what I'm talking about just by forwarding.  More below on that.

I haven't really touched gmail in a decade or more, but back then IIRC
it was relatively easy: you could choose to forward mail only when
some criterion was met (e.g. using email+extens...@gmail.com, or some
header etc), instead of forwarding everything. And even if gmail no
longer supports that, you could implement loop handling on the other
side alone (just with a little more overhead).
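
A minimal sketch of what such loop handling on the forwarding side
could look like. The header name and token are hypothetical (nothing
Gmail-specific); the idea is simply to stamp every message you send
back and to refuse to process anything that already carries the stamp:

```python
from email import message_from_string
from email.message import Message

# Hypothetical marker; any header name/token unique to your relay would do.
LOOP_HEADER = "X-Filter-Loop"
LOOP_TOKEN = "sa-relay.example"


def should_forward(msg: Message) -> bool:
    """Return False if this message has already passed through the relay."""
    return LOOP_TOKEN not in (msg.get_all(LOOP_HEADER) or [])


def mark_forwarded(msg: Message) -> Message:
    """Stamp the message so a second pass can be detected and dropped."""
    msg[LOOP_HEADER] = LOOP_TOKEN
    return msg
```

The first time a message arrives, should_forward() is true; after
mark_forwarded() it is false, so a message bouncing back from gmail
is not forwarded again.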

> > So for the whole of Europe you need data processing agreement for accessing 
> > the mail storage as a 3rd party.
> Probably, yes.  Is it any different with a mail server that uses a back end
> scanner as a service?  I know there are several such services for corporate
> email that work with a google workspace account that allows you to modify
> the mail routing which you can't do with a free gmail account.

Well, you'd likely need to hire a bunch of lawyers and study the
requirements of GDPR for some months to model how it behaves in a
corporate environment before engaging in risk assessment and building
your business model on top of those results.

> You can argue that it's really crazy giving access to your whole mailbox to
> your email provider too.  

It *is* crazy. That's why all the cool kids haven't been doing it for
decades now. They run their own VPS with SA, or install a FreedomBox
or something. :) Definitely don't depend on @gmail.com or whatever!

> I guess I don't see the difference here. Your mail service provider
> could be broken into as well. 

Sure, as can Gmail. The difference is in statistics: even if such a
service was technically and financially[1] as secure as Gmail (which
may be debatable), the simple fact that your mail is now routed
through 2 SPOFs instead of just 1 SPOF means your chance of problems
has increased by roughly 100% (i.e. about doubled), if not more.
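
To put rough numbers on that: a small sketch, assuming the two
services fail independently and each has the same (made-up)
probability of a serious incident. Neither assumption comes from real
data; the point is only that chaining two SPOFs lands very close to
double the risk of one:

```python
def failure_chance(p_single: float, hops: int) -> float:
    """Chance that at least one of `hops` independent services fails."""
    return 1.0 - (1.0 - p_single) ** hops


p = 0.01  # hypothetical chance of a serious incident per provider
one = failure_chance(p, 1)
two = failure_chance(p, 2)
print(f"one provider: {one:.4f}, two providers: {two:.4f}, "
      f"ratio: {two / one:.2f}")
```

If the extra service is less secure than Gmail, the ratio only gets
worse.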

> I'm just wondering if there's enough interest in this to do the work to make
> it open source.  If there were a lot of people mailing me saying "Yes!  I've
> been looking for something like this but I don't want to run it myself!",
> then I'd consider making it into a service, as well as probably open
> sourcing it.  Thing is, such a service has to minimally viable.  So far,
> you're the only response I've seen to this and your response appears to be
> overwhelmingly negative.

Here is my advice: don't overthink it in advance. Instead:

- Pick a nice open copyleft FOSS license (e.g. AGPLv3+)
- write a dozen or so lines of the most basic requirements and
  installation instructions in a README
- publish whatever you have at the moment on some source-hosting
  platform (can't really recommend any really open one; you can
  self-host, or choose one of the popular ones, not really critical
  at this point)
- mention it in a few related places

If people find it interesting, you'll notice it from the number of
issues, feature requests, etc. As the demand grows, you can improve it.
If not, hey, you've wasted barely any effort, and did a good deed, so
it is a karma net positive in life, eh?

If it however turns out that it eventually becomes so popular that you
must choose between your day job and maintaining it, then you might
consider incorporating and launching it as a service. But not before.


Re: OT: Trigger words in email addresses?

2024-04-07 Thread Matija Nalis
On Sun, Apr 07, 2024 at 08:40:40PM -0500, Jerry Malcolm wrote:
> The problem is that gmail, in particular continues to insist on
> putting these in spam folders and (theoretically) discarding some
> of them completely.  Some of users swear they never get them and

And did you check that claim? When you send your mails to some newly
created Gmail account, do they end up in the Spam folder? And if they
do, what does the text in that grey "Why is this message in spam" box
say?

Does it say the same thing for some of your users having problems?
You'll obviously need some way to reproduce the issue and check if it
is fixed, before you can even try fixing it.

Also, did you create an account at https://postmaster.google.com/ and
check, after a while, what it says about how your domains fare?

Also, did you check your mail server logs, are there any temporary
(4xx) or permanent (5xx) rejections of your mail traveling to Google?
And if so, what do they say?
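
A rough sketch of how one might sift a mail log for such rejections.
It assumes Postfix-style delivery lines in which the remote reply
follows the text "said: "; that log format and the sample lines are
assumptions, so adjust the pattern to whatever your MTA actually
writes:

```python
import re
from typing import Optional

# Assumed format: '... said: 421-4.7.0 ...' or '... said: 550 5.7.1 ...'
REPLY = re.compile(r"said: (?P<code>[45]\d\d)[ -]")


def classify(line: str) -> Optional[str]:
    """Return 'temporary' for 4xx, 'permanent' for 5xx, None otherwise."""
    match = REPLY.search(line)
    if match is None:
        return None
    return "temporary" if match.group("code").startswith("4") else "permanent"


def rejections(lines):
    """Yield (kind, line) for every log line carrying a 4xx/5xx reply."""
    for line in lines:
        kind = classify(line)
        if kind is not None:
            yield kind, line.rstrip("\n")
```

Feeding it only the lines that mention Google's MX hosts keeps the
output short enough to read the actual rejection texts.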

> So... recommendations, please... should I change donotre...@.com to
> something else, and if so, what is the accepted (non-spam-trigger) email

Since your current e-mail address has a high spam score relevance by
now, trying to continue using it is not going to help... But do make
sure you fix all potential issues (see link below) before changing
it, or you'll implicate yourselves as spammers even more.

> address to use to still get the point across to not send anything to that
> account?

People will still reply to those, there is no fixing humanity, so you
may as well give up on that. I wouldn't worry too much about that;
the vast majority of them won't ever read the sender e-mail address
anyway before hitting reply.

Your best bet is to configure your ticketing system to accept
messages being sent to that email address, and inject them into
ticketing system if you care about streamlining that.

> Secondly... more generally, any suggestions on how to crack the gmail code
> and make them know we aren't spammers?

Sure. Convince the users (or at least a lot of employees and family
and friends) to register and click on "not spam" every time it goes
into spam, and to actually read those e-mails and click on links in
them. Just like people would do if they were interested in those mails.

That will give feedback to train Google. It will still likely take
weeks/months of doing that before the reputation starts to change,
and that is assuming the number of people doing that is significant,
and they do not look like sockpuppets.

Also, did you read https://support.google.com/a/answer/81126 ?

(Yes, there are quite a LOT of things to do, but you do need to do
them all if you want Google to receive your messages)

Also, note that Google *likes* their e-mail user share (although not
yet a monopoly), and would like nothing more than to silo it completely.
Luckily, the market still does not allow them to do that quite yet.

Note that it also means that Google is unlikely to want your
independent e-mail server easily communicating with their userbase.

In fact, they'd love to make it annoying enough for you to give up
and move your e-mail over to their paid service, but are still
somewhat afraid of government-level antitrust sanctions, so much to
their chagrin they can't make it _too_ annoying and thus too
obvious... Yet.

-- 
Opinions above are GNU-copylefted.


Re: localhost lookups ?

2024-02-23 Thread Matija Nalis
On Fri, Feb 23, 2024 at 06:43:53PM -0500, J Doe wrote:
> 23-Feb-2024 18:33:02.422 queries: info: (localhost.ca): query:
> localhost.ca IN  +E(0) (127.0.0.1)
> 
> 23-Feb-2024 18:33:02.422 queries: info: (localhost): query: localhost IN
>  +E(0) (127.0.0.1)

> What's interesting is that this is happening on a mail server that has
> a: .ca TLD.  It _looks_ like SA is appending this TLD to: localhost,
> queries for it and it fails and then it queries correctly for:
> localhost, which succeeds.

And what does "ping localhost" (running with the same user as SA) say?
I'd guess it might have the same behaviour, in which case it is not
SA-related...

> I'd like this spurious lookup for: localhost.ca to stop ... has anyone
> seen something similar - either: localhost.ca or: localhost.tld for a
> mail server with another TLD (ie: mail.com -> localhost.com) ?
> 
> If others have seen this, is it result of a configuration parameter ?

I've seen it in the past with a misconfigured /etc/hosts (missing
localhost entry), so the search (or domain) from /etc/resolv.conf was
being used, as it would be for any unqualified host name...

(it also might be a permission problem on those files, or
chroot / SELinux / AppArmor, or /etc/nsswitch.conf etc)
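
A quick way to check the first suspect, as a sketch: parse the hosts
file and see whether any line actually names "localhost". The function
name is made up for illustration, and it only covers the /etc/hosts
case, not permissions or nsswitch.conf ordering:

```python
def has_localhost_entry(hosts_text: str) -> bool:
    """True if some non-comment line maps an address to 'localhost'."""
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split columns
        # fields[0] is the address, the rest are host names and aliases
        if len(fields) >= 2 and "localhost" in fields[1:]:
            return True
    return False
```

Run it on the contents of /etc/hosts as the same user SA runs as; if
that user cannot even read the file, that is an answer in itself.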


-- 
Opinions above are GNU-copylefted.


Re: Callout verification with SpamAssassin ?

2024-02-19 Thread Matija Nalis
On Mon, Feb 19, 2024 at 02:38:03PM -0500, Bill Cole wrote:
> On 2024-02-18 at 18:40:45 UTC-0500 (Mon, 19 Feb 2024 00:40:45 +0100)
> Matija Nalis  is rumored to have said:
> > - Firstly: yes, I'm fully aware of all issues associated with
> >   https://en.wikipedia.org/wiki/Callout_verification
> 
> Which is why SA does not support such so-called verification in any way. It
> never will as long as I'm a contributor.

It's fine, really. No need to get all upset :)

> > - I'm not looking for debate about general usefulness of Callout
> >   verification (and the system for which it is being investigated is
> >   not general-purpose e-mail system).
> 
> This is a bit like saying you don't want to debate the general
> usefulness of spamming. And then going on to ask about ways to
> spam.

It's more like saying that when one has a question about using SA,
they might not want to engage in a debate about why SA is written in
Perl instead of in Rust (even though the advantages of Rust are
obvious).

Or if they ask for advice on how to use the SPF plugin, they might not
want to waste time discussing shortcomings of SPF as a technology.

Anyway, the idea was to reduce spam, not promote it, so your
comparison sounds a little too negative :(
(Also, there is generally no need to discuss the usefulness of
spamming: it is obviously profitable or people wouldn't be doing it)

> I do not care about why you are trying to use SMTP callback
> verification,

Just as well, as I'm not particularly inclined to waste significantly
more time trying to explain the details, especially to someone who has
put up a wall in front of themselves ("never will...")

> because it is a fundamentally broken concept.

Yes, it is broken to a certain extent. But to be fair, the whole of
SMTP itself is based on fundamentally broken concepts, don't you agree?

And most of SMTP workarounds are fundamentally broken too in many
ways. SPF is broken. RBLs are definitely way too broken. Pattern
matching is quite broken. Bayes is broken. etc.

That is IMHO why SA (still) exists at all -- each of the attempts at
fixing the broken SMTP is also broken (to some higher or lower
extent), so SA requires that *multiple* broken technologies become
*broken at the same time on the same email* before it produces a false
positive (yet both false positives and false negatives still happen
from time to time, because all of it is broken, and we do not live in
a perfect world).

> If it looks like a solution to you, you are refusing to look at
> solving your real problem.

The real problem is of course that e-mail is still the only universal
widespread communication technology which has not been siloed away
by megacorps (yet; although they are working hard on that), and that
people are refusing to *universally* implement DKIM (or some other
sender authentication scheme) worldwide in my lifetime. And yes, in
this one use case (of hundreds of others) it does look like an
improvement to me (not a solution though - there is no "solution" for
e-mail except perhaps not using it at all).

> All set then. SA is not the right tool for you. 

But it is. Although, this question was not for me. 

Anyway, unless somebody else has seen something like that, I think
I'll just leave their postfix doing all-or-nothing mail rejection
based on callout verification during SMTP phase; instead of trying to
improve it by just assigning SA score based on it (and doing it on
much reduced number of cases).

(but just because I have other more important things to do at the
moment than dive into writing a custom SA plugin; and not because
doing it in SA would not actually improve the situation for everyone
involved in this particular case)

-- 
Opinions above are GNU-copylefted.


Callout verification with SpamAssassin ?

2024-02-18 Thread Matija Nalis


Preface:

- Firstly: yes, I'm fully aware of all issues associated with
  https://en.wikipedia.org/wiki/Callout_verification
  (and there are a LOT of them!)

- I'm not looking for debate about general usefulness of Callout
  verification (and the system for which it is being investigated is
  not general-purpose e-mail system).

- I'm also not looking for alternative sender validations and related
  schemes which might give similar results (like SPF / DKIM
  verifications, SpamAssassin AWL/TxRep/whitelist_* etc.) but only
  for checking sender via Callout verification.


The question:

I'm looking for an existing solution to check in SpamAssassin (as a
part of a custom complex set of meta rules) whether the e-mail address
of the sender[1] has recently[2] been "callout-verified" [3] by a
'250 Ok' response to RCPT TO.

The system in question has amavis / postfix beneath, if that helps
(so e.g. re-using postfix verify_cache.db is an option)

Is anyone aware of an existing SpamAssassin plugin or similar which
can do SMTP Callout verification?

Thanks,
Matija

[1] where sender is ideally header "From:" (possibly overridden by
"Reply-To:" header if it exists); but I'd settle for envelope
FROM too if that is the best that can be easily done

[2] caching for callout verification is implied and required; so
e-mail addresses which have already been queried won't be asked
again for some time.

[3] as noted at start, all caveats with Callout results are known
(e.g. that it does not guarantee that the sender actually exists
or that the e-mail to that address can actually be sent in the
future)

-- 
Opinions above are GNU-copylefted.


Re: Question about forwarding email (not specifically SA, pointers greatly appreciated)

2024-01-19 Thread Matija Nalis
On Fri, Jan 19, 2024 at 10:37:13AM -0600, Thomas Cameron wrote:
> The forwarded email is being *accepted* by GMail. My issue now is that GMail
> drops it into the recipient's spam folder. I suspect it's a reputation
> thing. Once the server is up and running for a while, I'm hoping that GMail
> will stop flagging the emails from the server as spam.


You would need to encourage at least several of the recipients (the
more the better) to click on the "Not spam" button on GMail on such
mails. Then it will (eventually) start accepting them normally.

see e.g. 
https://serverfault.com/questions/953486/repairing-e-mail-domain-reputation-on-google

I suspect that Google might even be doing it on purpose, in order to
"encourage" even more users to be locked in their e-mail
walled-garden ecosystem.

-- 
Opinions above are GNU-copylefted.


Re: Gift Card Scam

2024-01-04 Thread Matija Nalis


body    GIFT_CARD           /gift card/i
score   GIFT_CARD           1.5

meta    FREEMAIL_GIFTCARDS  GIFT_CARD && (FREEMAIL_FROM || !DKIM_VALID)
score   FREEMAIL_GIFTCARDS  6.0

If you're not big on gift cards.

Also, you might want to enable and train Bayes...
 
On Thu, Jan 04, 2024 at 01:19:28PM -0800, Kirk Ismay wrote:
> I'm wondering if anyone has any good ideas to catch gift card scam emails. 
> This latest version came from Gmail, and has valid DKIM records and the IPs
> are whitelisted.
> 
> Thanks,
> Kirk
> 
> Here's the hits from SpamAssassin:
> 
> X-Spam-Status: No, score=0.3 required=5.0 tests=DKIM_SIGNED,DKIM_VALID,
>     DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,HTML_MESSAGE,
> RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,
>     T_SCC_BODY_TEXT_LINE autolearn=disabled version=3.4.6
> 
> And here's the body:
> 
> It’s incredible to see you all consistently pushing the bar to greatness.
> The outcomes you've all achieved are remarkable, especially in light of the
> difficult circumstances we're in. I am so grateful to have everyone as a
> member of the team, and I really value your great skills. My words can never
> express how much I appreciate what you do; the effort and skill you
> contribute consistently go above and beyond what I had anticipated. I'm
> grateful.
> 
> Most times, a simple THANK YOU is what every employee wants to get from
> their "big boss" for their hard work. This is why I'm planning on
> recognizing the efforts of some staff and appreciating them with a little
> surprise gesture. I believe I can count on you to help get this little
> appreciation surprise done in a discreet manner.
> 
> What do you think would be the ideal gift for such a celebration? I'm
> considering gift cards like Visa or Mastercard, given their universal
> acceptance and functionality. I believe this would cater to the diverse
> tastes of our staff, allowing them to use the gift as they prefer without
> being limited to specific stores or locations. I would appreciate your help
> in making these purchases on my behalf, and I need you to check what store
> we have around to make this purchase from.
> 
> Indeed, you all have been great assets to the organization and really
> deserve this recognition.
> 
> 
> Kind Regards,
> 
> The Boss
> Executive Director
> Victim Company
> 
> Sent from my iPhone
> 
> END
> 

-- 
Opinions above are GNU-copylefted.


Re: Filtering emails from word-oliv...@somewhere.com

2023-10-05 Thread Matija Nalis
On Thu, Oct 05, 2023 at 03:15:31PM -0400, Bill Cole wrote:
> On 2023-10-05 at 03:41:59 UTC-0400 (Thu, 05 Oct 2023 14:41:59 +0700)
> Olivier  is rumored to have said:
> 
> > Recently I have received a wave of mails in the form
> > From: word-olivier@somewhere.random
> > To: oliv...@mydomain.com
> > 
> > Where the "olivier" part is a valid username on my domain.
> > 
> > Is there a rule to catch these with SA?
> 
> SA does not have any way to know what the valid usernames in any domain are.

That is of course correct, but I did not read that mail as requesting
user auto-detection, just plain matching for their user? 
E.g. something like:


header  __from_olivier  From =~ /.*-olivier\@/
header  __to_olivier    To =~ /olivier\@mydomain\.com/

meta    fake_oliviers   __from_olivier && __to_olivier
score   fake_oliviers   7.0
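
Before deploying a rule like that, it can help to try the two patterns
against a few sample header values. The sketch below mirrors the
regexes in Python purely for experimenting; it is not how SA evaluates
rules, and the sample addresses are the ones from this thread:

```python
import re

# Python equivalents of the two header patterns above.
FROM_RE = re.compile(r".*-olivier@")
TO_RE = re.compile(r"olivier@mydomain\.com")


def looks_like_fake_olivier(from_hdr: str, to_hdr: str) -> bool:
    """True when both header patterns match, like the meta rule."""
    return bool(FROM_RE.search(from_hdr) and TO_RE.search(to_hdr))
```

Mail from "word-olivier@somewhere.random" to "olivier@mydomain.com"
matches; mail from a plain "olivier@" address does not, since the From
pattern requires the "-olivier" suffix.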

> Special rules for high-spam individuals can also help by acting as "canary"
> rules, if you use the 'autolearn_force' rule tflag. This way, when a spammer
> using the specific pattern starts a run, you will catch one match, autolearn
> it as spam, and (hopefully) recognize its sibling messages as such.

+1 for that.

-- 
Opinions above are GNU-copylefted.


Re: Ensuring SPF/DKIM for @gmail.com

2023-07-26 Thread Matija Nalis


On Thu, Jul 27, 2023 at 07:11:59AM +1000, Noel Butler wrote:
> On 27/07/2023 05:09, Matija Nalis wrote:
> 
> > Any SPF, no matter how correctly configured, will lead to false
> > positives in some cases (e.g. encountering mailing list
> 
> B.S.

I'd appreciate more civil expressions of disagreement, though, if
this means what I think it means.

> mailing lists have been smart enough for over 20 years to rewrite sender and
> not appear as a basic forwarder - which are you are correct, however there
> are forwarding abilities to rewrite sender which avoids this, its been 15
> years or more since I've used procmail which by default did not.

I personally know several people who still use procmail today, sooo...
Your assumption seems to be that EVERYBODY upgrades on regular
(yearly-or-so?) cycles, and updates their configs to latest recommended 
practices at the same time.

That at least I can attest is not always the case (I still see
systems with custom sendmail.cf which nobody dares to touch, 
and with a good reason!) 

Yeah, I agree that it sure would be nice if the world worked that way
and everybody upgraded regularly and configured their systems
according to the latest BCPs, but around here at least, it sometimes
(I'm avoiding saying "often") doesn't.

There are quite a few systems that someone knowledgeable set up some
time back, and after they've gone to greener pastures, nobody has
touched them, yet people continue to use them.

Sure, I'll be the first to agree that it is bad and should be fixed.

But I won't agree that "it does not exist", nor would I agree that it
doesn't matter (if it didn't matter to them, people wouldn't be
asking me to troubleshoot it, and yet they do)

> If you are going to dry-reach to support an argument, please use modern

I'm not aware of that "dry-reach" idiom, would you care to explain?

> facts and not 1990's. I was a *very* early adopter of SPF back in late 90's
> and have had zero issues in 20 years in using SPF (as expected as an early
> adopter, teething issues as with all software needed fine tuning in very
> early days)

Good for you. But that is anecdotal - you are certainly not
participating in every mailing list in existence, nor do you contact
all the people on the planet who use every kind of mail forwarder.

Neither do I, but I service lots of systems of other people that do,
and with many people, the chances rise. So, still in 2023, I have to
deal with SPF (and DKIM) failing due to such forwarders/ML (as well
as misconfigurations, of course)

Also, 1990s? Weren't the first SPF-like ideas drafted in the
early-to-mid 2000s, and SPF itself not published as a *proposed* IETF
standard until 2014?
That was less than a decade ago, barely yesterday :)

-- 
Opinions above are GNU-copylefted.


Re: Ensuring SPF/DKIM for @gmail.com

2023-07-26 Thread Matija Nalis


On Wed, Jul 26, 2023 at 06:44:32PM +0000, Marc wrote:
> > At the risk of starting a flame war...
> > 
> > What does "correctly setup SPF" mean to you?
> 
> so your ip does not generate a softfail or fail

The only way to make SPF never incorrectly fail/softfail is to use
"+all", but that kind of kills its point :-)

(actually, even with +all, some sites will fail it - precisely
because of it, as +all is a sign of either an intentionally sloppy
spammer or an incompetent postmaster, both likely leading to spam
coming from that site).

> > What makes your opinion better than someone else's opinion that differs?
> >   (I take it for granted that someone will have a differing opinion.)
> 
> When you configure your spf your result is either pass, softfail or fail
> I think we can agree that a correctly configured spf results in a pass, don't 
> you?

Well *I* don't. Sometimes, maybe even often, it does. But not always.

Any SPF, no matter how correctly configured, will lead to false
positives in some cases (e.g. encountering a mailing list or .forward
not using VERP/SRS). It is inherent in the SPF protocol (which is why
DMARC checks both DKIM and SPF, in order to reduce, but not
eliminate, false positives).
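
The way DMARC combines the two checks can be sketched as a simple
"either one is enough" rule. This is a simplification that takes the
already-computed, aligned results as plain booleans; the function name
is only for illustration:

```python
def dmarc_passes(spf_aligned_pass: bool, dkim_aligned_pass: bool) -> bool:
    """DMARC is satisfied when at least one aligned mechanism passes."""
    return spf_aligned_pass or dkim_aligned_pass


# A plain .forward without SRS: SPF breaks, an untouched DKIM signature saves it.
print(dmarc_passes(spf_aligned_pass=False, dkim_aligned_pass=True))

# A list that keeps the envelope sender AND appends a footer breaks both.
print(dmarc_passes(spf_aligned_pass=False, dkim_aligned_pass=False))
```

The second case is the remaining false positive: perfectly legitimate
mail, correctly configured at the origin, still fails.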

We are NOT living in an ideal world where everybody implements every
existing standard. Thus, even the most correctly configured SPF will
sometimes softfail/fail when it should not.

Trying to pretend that the world is ideal is not really a good idea;
one might as well pretend that spam does not exist and save all that
time wasted on implementing antispam measures :-)

-- 
Opinions above are GNU-copylefted.


Re: Sudden surge in spam appearing to come from my email address

2023-07-16 Thread Matija Nalis
On Sun, Jul 16, 2023 at 01:37:39PM +0100, Martin Gregorie wrote:
> Another way to do this is to build either a mail archive or a database
> of addresses you've sent mail to and simply add a positive score to mail
> from anybody who you've sent mail to: this needs the following bits of
> code:

So, something like what the AWL and TxRep SpamAssassin plugins do?

-- 
Opinions above are GNU-copylefted.


Re: Sudden surge in spam appearing to come from my email address

2023-07-15 Thread Matija Nalis
On Sat, Jul 15, 2023 at 10:04:18PM -0500, Thomas Cameron wrote:
> <dkim>pass</dkim>
> <spf>fail</spf>
> 

So, it fails SPF, but DKIM passes. Meaning, your mail would normally
pass on modern servers which check both.

If you do not want to receive such status messages, you should update
your DMARC records (currently _dmarc.camerontech.com indicates you
want to receive BOTH aggregate "rua=" and forensic "ruf=" reports;
and that you want to receive status updates when the message would've
passed normally via "fo=1")
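
To see which reporting options a DMARC record asks for, one can split
the TXT record into its tags. A small sketch; the record below is a
made-up example in the same shape, not the actual contents of
_dmarc.camerontech.com:

```python
def parse_dmarc(record: str) -> dict:
    """Split a DMARC TXT record ('tag=value; tag=value') into a dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


# Hypothetical record for illustration only.
record = ("v=DMARC1; p=none; rua=mailto:agg@example.com; "
          "ruf=mailto:forensic@example.com; fo=1")
tags = parse_dmarc(record)
print("aggregate reports to:", tags.get("rua"))
print("forensic reports to:", tags.get("ruf"))
print("failure reporting option:", tags.get("fo"))
```

Dropping the "ruf=" and "fo=" tags from the published record is what
stops the per-message reports; "rua=" controls the aggregate ones.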

> So it seems like my emails are being quarantined when I send them to mailing
> lists, even this one.

What? No. At least not in this report you shared. You seem to be
confusing the "<policy_published>" section (which is just a dump of
the DNS record which that server sees) with the actual "<record>"s
leading to a final "<disposition>" of "none" (which is good, as
opposed to "reject" or "quarantine", which would not be).

You might want to use some nice frontend for visualizing DMARC
results, if reading XML and SPF/DKIM/DMARC protocol internals is not
second nature for you.
e.g. https://github.com/topics/dmarc-reports

> > +1 for encouraging mailing list operators to get with the times.
> > 
> > You can also do as Robert suggests and use a separate (sub)domain for
> > mailing lists with different SPF settings thereon.
> 
> It's not so much mailing list operators I'm worried about. It's that, when
> my email goes through a listserv mailing list, if I define hard failures, I
> am worried that my email isn't going to get to list members. That's not the
> mailing list admin, it's the admins of the list members' mail servers. If
> I'm not understanding something, please feel free to clarify.

If the mailing list is employing SRS, mail reaching the final
recipients would not be failing SPF checks, as the envelope sender
(i.e. SMTP's "MAIL FROM:") would be rewritten, since the mail is
coming from the mailing list domain and their servers (as it would
be), not yours.

See https://en.wikipedia.org/wiki/Sender_Rewriting_Scheme

Only if the mailing list remailing server leaves original (your)
envelope sender (which it shouldn't be doing, yet often does), would
you get such SPF problems. So, SPF problem is solvable from mailing
list server side, if its admins are willing.
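
For illustration, here is roughly what such an envelope rewrite looks
like. This is a simplified sketch modelled on the SRS0 address layout
(hash, timestamp, original domain, original local part); it is not
guaranteed to interoperate with real SRS implementations, and the
secret, day count and domains are all made up:

```python
import base64
import hashlib
import hmac

B32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def srs0_rewrite(sender: str, forwarder: str, secret: bytes, days: int) -> str:
    """Rewrite an envelope sender so it belongs to the forwarder's domain."""
    local, domain = sender.rsplit("@", 1)
    ts = days % 1024                       # coarse timestamp, wraps around
    stamp = B32[ts >> 5] + B32[ts & 31]    # two base32 characters
    digest = hmac.new(secret, f"{stamp}{domain}{local}".lower().encode(),
                      hashlib.sha1).digest()
    tag = base64.b64encode(digest).decode()[:4]  # short tag against forgery
    return f"SRS0={tag}={stamp}={domain}={local}@{forwarder}"


print(srs0_rewrite("alice@example.com", "lists.example.org", b"not-a-real-key", 19500))
```

The receiving server then checks SPF against the forwarder's domain
(the part after the "@"), which the forwarder's own SPF record covers.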

Also, if your mails are signed by DKIM, and the mailing list software
is not rewriting signed headers nor body (as it shouldn't, but some
mailing lists try to add annoying text to the bottom of messages like
"to unsubscribe, do xyz", thus breaking DKIM, S/MIME and PGP
signatures alike), then your mail should pass DKIM checks too.
So that problem is avoidable on the mailing list server side too.
-- 
Opinions above are GNU-copylefted.


Re: Share bayes database between servers

2023-07-09 Thread Matija Nalis
On Sun, Jul 09, 2023 at 07:06:10PM +0200, Robert Senger wrote:
> I've set up a testing environment that also uses master-master
> replication of the mysql bayes database, with priority in dns set to
> equal for both mx to get incoming mail distributed evenly to both
> systems. So far, this seems to work, but this is a low load
> environment.

It boils down to how much you trust mysql master-master replication
stability and performance, which is heavily dependent on your
experiences and exact versions used (are we talking about Oracle
MySQL, or the MariaDB or Percona forks? Which versions? What
replication setup? etc.)

I've had problems under high concurrent load (not performance, but
replication setup breaking) in the past, so I prefer to avoid
master-master replication if possible, especially if I anticipate
high concurrent load.

But if you are confident in it, sure, go ahead.

> Any suggestions?

Well, how are you training your bayes DB? If it is via cron and
manually curated ham/spam corpuses (the recommended way), I'd rather
suggest keeping databases separate and simply running training on
both servers (you can duplicate or share ham/spam corpuses as you wish,
from rsync to SMB/NFS).

If you are using auto-learn (which was not recommended last time I
looked), well, you'd probably be better off NOT syncing bayes at all
IMHO, as it is preferable that the risk of bayes poisoning is
confined to one server instead of being replicated (and there is not
much benefit, as auto-learn will quickly learn on each server
separately anyway, and if one set of domains is not getting some type
of spam, it is not beneficial to learn it anyway)

-- 
Opinions above are GNU-copylefted.


Re: Problems matching the last word in multi-OR Regex

2023-06-21 Thread Matija Nalis
On Thu, Dec 15, 2022 at 09:17:54AM -0500, Bill Cole wrote:
> On 2022-12-15 at 07:03:25 UTC-0500 (Thu, 15 Dec 2022 12:03:25 +0000 (UTC))
> Pedro David Marco via users  is rumored to have said:
> 
> > HI,
> > Situation:i have 2 twin servers running exactly the same OS, and SA.
> > (3.4.4)

Are there different versions of some external plugins installed,
maybe?

> > i have an email with the word 'dog' inside.
> > i have this rule:      body    __ANIMALS    /cat|mouse|bird|dog/i
> > 
> > Problem: Rule __ANIMALS hits in one server, but in the other one,
> > does not!

Interesting. Is there perhaps some syntax error elsewhere in the file? 
You can check with "spamassassin --lint"

Also, maybe there is another rule with same name defined elsewhere
(maybe editor backup file that SA includes?)

> > i have noticed that if i switch the rule words order, like this:
> > 
> >   body    __ANIMALS    /cat|mouse|dog|bird/i
> > 
> > and 'dog' is not the latest word, then it hits on both servers.
> > 
> > I have tried many permutations and it only fails with the word that
> > appears the last in regular expressions with multiple OR
> > Has anyone seen this before? is that a known bug?
> 
> This is absolutely NOT a known bug. I'm not sure how it is possible for
> something so fundamental to still be lurking in SA undiscovered. I don't
> think the basic parsing of REs in rules has changed since v2.
> 
> It would help a great deal if you could open a bug at
> https://bz.apache.org/SpamAssassin/ with sample messages that are hit or not
> by different variants of the rule.

I agree. Do mention the issue in this thread when you open it, so
interested parties may follow.


One other obscure situation that comes to mind: "sa-compile" was used
in the past for a previous version of the regex, but something went
wrong with the system clock, so SA does not detect that the changed
regex needs recompiling and continues to use the old, outdated
version.

Or are you using spamc/spamd, which did not reload the new rule?

Or maybe the word "dog" was copy/pasted instead of typed, and so it
includes some invisible UTF-8 characters.

I'd suggest creating a new, unique name for the rule (e.g.
NEWANIMALS_20230621), typing the rule content manually instead of
copy/pasting, and checking whether that rule matches by using
"spamassassin -t" on that message.

That should rule out most of the possible other issues above.
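
To check the copy/paste theory directly, something like the following
rough Python sketch could be run over the rule line. It only flags
characters outside printable ASCII (tab is allowed); the function name
and the pasted sample with a zero-width space are made up for
illustration.

```python
import unicodedata

def find_suspicious_chars(line):
    """Return (index, codepoint, name) for every character outside printable ASCII."""
    hits = []
    for i, ch in enumerate(line):
        # Tab is allowed because rule files commonly use it as a separator.
        if ch == "\t" or " " <= ch <= "~":
            continue
        hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits

typed = "body    __ANIMALS    /cat|mouse|bird|dog/i"
# Hypothetical pasted variant: a zero-width space hides right after "dog".
pasted = "body    __ANIMALS    /cat|mouse|bird|dog\u200b/i"

print(find_suspicious_chars(typed))
print(find_suspicious_chars(pasted))
```

An empty list for the rule line would rule out this particular cause.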

-- 
Opinions above are GNU-copylefted.


Re: spamassassin4.x - problem

2023-06-21 Thread Matija Nalis
On Wed, Jun 21, 2023 at 12:00:41PM +0200, natan wrote:
> I tested via configurations
> 
> 1)dovecot10 + spamassasin-3.x - problem not exists
> 2)dovecot11 + spamassasin-3.x - problem not exists
> 3)dovecot10 + spamassasin-4.x - problem exists
> 4)dovecot11 + spamassasin-4.x - problem exists
> 
> all dovecot have this same amavisd-new-2.11.1
> 
> If there was a problem in the path /tmp and/or amavis configuration, the
> problem would be everywhere, not just with only in spamassassin-4.x

That does not necessarily follow logically. It would only follow if
the *only* possibilities were "some bug outside of spamassassin ALWAYS
triggers" and "some bug outside of spamassassin NEVER triggers".

There however also exists the possibility that "some bug outside of
spamassassin ONLY SOMETIMES triggers - depending on some yet undefined
condition (timing, race condition, specific thing produced by SA4 but
not SA3, etc.)"

In which case, the problem might be something other than SA4, but
which triggers (only or just much more regularly) in conjunction with
SA4. But of course, it very well could be SA4.

To test, I'd use "strace -ff -o /tmp/log.txt -e file -p $PID" to
determine which process exactly is creating which files, and which
process is removing them, both on SA3 and SA4 machines, and see which
process creates something which it then doesn't remove.
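
Sifting through those strace logs by hand gets tedious, so here is a
minimal Python sketch of the "created but never removed" check. It
assumes the usual strace line layout for open/openat/creat and
unlink/unlinkat, ignores renames and directories, and the function
name is hypothetical; treat it as a starting point, not a complete
parser.

```python
import re

# open()/openat()/creat() with the path, the remaining arguments and the return value.
CREATE_RE = re.compile(
    r'^(?:open|openat|creat)\((?:AT_FDCWD, )?"([^"]+)"(.*)\)\s+=\s+(-?\d+)'
)
# Successful unlink()/unlinkat() calls (return value 0).
UNLINK_RE = re.compile(
    r'^(?:unlink|unlinkat)\((?:AT_FDCWD, )?"([^"]+)".*\)\s+=\s+0\b'
)

def leftover_files(strace_lines):
    """Return paths that were created successfully and not unlinked afterwards."""
    created = set()
    for line in strace_lines:
        m = CREATE_RE.match(line)
        if m:
            path, rest, ret = m.group(1), m.group(2), int(m.group(3))
            # Only count calls that succeeded and could create a file.
            if ret >= 0 and (line.startswith("creat(") or "O_CREAT" in rest):
                created.add(path)
            continue
        m = UNLINK_RE.match(line)
        if m:
            created.discard(m.group(1))
    return sorted(created)
```

Feeding it the lines of each per-process file (strace -ff writes one
file per PID, named after the -o argument plus the PID) should show
which process leaves files behind.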

> Again it says that spamassassin-4.x deletes teporary files but not all and
> it creates more than it deletes

That could be one of the possible reasons, yes.

> It seems as if for some reason specific files (e-mails) could not be deleted

If that is indeed the case, first step would be finding out exactly
which files are not being deleted (and by whom).

Another way to prove that the problem is exclusively in spamassassin
is to reproduce the problem ONLY with spamassassin (i.e. without
dovecot, amavisd, etc. being involved at all) - e.g. by manually 
running  "spamassassin -t /tmp/blah.mbox" and showing that it creates 
and leaves behind some temporary files.

-- 
Opinions above are GNU-copylefted.


Re: comparing sender domain against recipient domain

2023-05-12 Thread Matija Nalis
On Fri, May 12, 2023 at 05:32:30PM +0200, Reindl Harald wrote:
> > On Fri, May 12, 2023 at 09:49:40AM -0500, Dave Funk wrote:
> > > On Fri, 12 May 2023, Matija Nalis wrote:
> > > > That is because those domains are not EQUAL? Od did you wanted a
> > > > rule that checks only on SIMILAR domain names (e.g. with lowercase
> > > > letter "L" replaced with number "1" as in your example)?
> > 
> > It should be relatively easy to write SA plugin for that:
> 
> and with *what* do you replace the "1"?

With one of the similar looking characters. Doesn't really matter
which one, but it needs to be done consistently. Personally I'd
probably choose lowercase "L", but it can be anything.

e.g. for a simple first variant (i.e. for direct matching, not the
more advanced statistical similarity based approach suggested in a
later step):

sub normalize_domain($)
{
  my ($domain) = @_;

  # (yes I know we have tr///)
  $domain =~ s/1/l/g;   # number 1 to lowercase "L"
  $domain =~ s/I/l/g;   # uppercase "I" to lowercase "L"

  return lc($domain);
}

[...]

if (lc($domain1) ne lc($domain2)) { # domains are NOT the same...
   if (normalize_domain($domain1) eq normalize_domain($domain2)) { # ...but they LOOK the same
      add_spam_score("domain_is_not_same_but_looks_the_same");
   }
}

so normalize_domain() would return the same string for "paypal.com",
"PayPal.com", "PayPaI.com" or "PayPa1.com": i.e. "paypal.com"

It doesn't matter if the result isn't the real domain (as it will be
used only for comparison with another similarly mangled domain),
e.g. if one had the real domain "TheReallyBest1.com", it would be
normalized to "thereallybestl.com" -- so while that is NOT how the
domain is really named, it doesn't matter, as it would still work for
detecting fakes like "TheReallyBestI.com" (even though neither
lowercase "L" nor uppercase "I" is used in the real domain name).
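
For anyone who wants to play with the idea outside of Perl, the same
logic can be sketched in Python. This is just a port of the two
substitutions described here; looks_same_but_differs() is a made-up
helper name, not anything from SA.

```python
def normalize_domain(domain):
    # Same two substitutions as the Perl sketch, applied before lowercasing
    # so that an uppercase "I" is caught while a lowercase "i" is left alone.
    return domain.replace("1", "l").replace("I", "l").lower()

def looks_same_but_differs(domain1, domain2):
    """True when the domains differ but normalize to the same string."""
    if domain1.lower() == domain2.lower():
        return False  # genuinely the same domain
    return normalize_domain(domain1) == normalize_domain(domain2)

for candidate in ("PayPal.com", "PayPaI.com", "PayPa1.com", "example.com"):
    print(candidate, looks_same_but_differs("paypal.com", candidate))
```

Only the two look-alike spellings should be reported as "differs but
looks the same"; the correctly spelled and the unrelated domain should
not.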


> be careful with "relatively easy" when it comes to reality

Sure, I thought I was. Do you spot problems with the code above?
Think of any real-life examples where it would backfire or fail to work?

The code like the above looks trivial to me ("relatively easy" was
more geared toward statistical analyses of the words to return
statistical score in percentage instead of simple fake/not_fake
boolean like above; as it should take into account ordering of the
letters, missed letters, duplicated letters, dyslexia-alike reversal
of two neighboring letters and similar psychological ways in which
human mind can easily be fooled). Still might take few weeks to make
it to reasonably publishable shape...

But I was more interested in whether SA already has something like
that. I haven't dabbled in 4.0 yet, and there might be code already
written to accomplish similar things, so it would be a waste to
reinvent the wheel.

-- 
Opinions above are GNU-copylefted.


Re: URL Time-of-Click Protection

2023-05-12 Thread Matija Nalis
On Fri, May 12, 2023 at 11:57:57AM -0400, Alex wrote:
> I'm curious what people think of URL rewriting or otherwise having some

Such rewriting would break digital signatures, and would not work at
all e.g. on encrypted e-mails.

> kind of idea of whether a URL could or should be scanned at some later time
> to determine if it's potentially malicious at the current time where it may
> not have been initially?
> 
> Is anyone implementing that in open source?

Like, for example, in Firefox browser? It does that (by default I
think) when you click on any website.

In Firefox preferences, click under "Privacy & Security" and look for
checkboxes under "Deceptive Content and Dangerous Software Protection".

> What are the disadvantages of doing this? I'm not talking about actually
> checking the URL in advance, but I suppose some kind of wrapper that scans
> it at the time the user visits.

The disadvantage of Firefox blocklists like the above is that someone
has to report that a malicious site is malicious. See:
https://support.mozilla.org/en-US/kb/how-does-phishing-and-malware-protection-work

But there are so many definitions of "malicious" that other, more
heuristic-based approaches (which would not need previous reporting),
like those antiviruses employ, might not work if "malicious" is not
just "executable malicious code downloaded to the computer". For
example: is a shopping site that looks very similar to a more popular
brand also malicious? How about a banking site? How about fake news?
How about regular news? Opinions might range from "none of those is
malicious" to "all of them are malicious" :-)

But none of that has much connection with SpamAssassin (well, I guess
a plugin for SA might do URL body rewriting for some other tool to
intercept, but it is way outside of its scope. Just use some
configurable proxy tool if you want to enforce it in your
organization instead of depending on Mozilla lists)

-- 
Opinions above are GNU-copylefted.


Re: comparing sender domain against recipient domain

2023-05-12 Thread Matija Nalis
On Fri, May 12, 2023 at 09:49:40AM -0500, Dave Funk wrote:
> On Fri, 12 May 2023, Matija Nalis wrote:
> > That is because those domains are not EQUAL? Od did you wanted a
> > rule that checks only on SIMILAR domain names (e.g. with lowercase
> > letter "L" replaced with number "1" as in your example)?
> 
> Now I get it, the OP is looking for some kind of comparison function that
> does an "apparent linguistic distance" evaluation of two strings and returns
> a score that indicates a "visual similarity" value.
> (EG replacing 'l' with '1' or 'O' with '0', etc).

It should be relatively easy to write SA plugin for that:

- replace those numeric and uppercase letters in one of the strings,
  convert both to lowercase, and compare them 

- it should also remove spacer characters (like "paypal" vs "pay-pal")

- It should also not only hit on exact matches, but return similarity
  in percentage (so trying to fake "spamassassin" with "spamasassin"
  can be detected).
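
The percentage idea from the last point could be sketched roughly like
this in Python. It uses the "optimal string alignment" edit distance
(insert, delete, substitute, swap of two neighbours) as one possible
similarity measure; that choice, the function names, and stripping
only "-" as a spacer are all assumptions for illustration.

```python
def osa_distance(a, b):
    """Edit distance counting insert, delete, substitute and adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]

def similarity_percent(a, b):
    # Lowercase and drop "-" spacers before comparing.
    a = a.lower().replace("-", "")
    b = b.lower().replace("-", "")
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - osa_distance(a, b) / longest)

print(round(similarity_percent("spamassassin", "spamasassin"), 1))
print(similarity_percent("paypal", "pay-pal"))
```

A rule could then hit on anything above some threshold that is not an
exact match; where to put that threshold would need testing on real
mail.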

Of course, non-ASCII would complicate those replacement tables
significantly (there are MANY more similar-looking glyphs than in
pure ASCII), but as I treat any IDN domains as suspicious, and they
are easy to detect, it would probably not be such a big deal.

> I've hand coded rules to check for this stuff when frequently abused but I
> don't know of a programmatic algorithm to do it automagically.

I wonder if someone has already done it, or something sufficiently
similar to be used for that purpose?

-- 
Opinions above are GNU-copylefted.


Re: comparing sender domain against recipient domain

2023-05-12 Thread Matija Nalis
On Thu, May 11, 2023 at 09:41:34PM +, Marc wrote:
> > > I was wondering if spamassassin is applying some sort of algorithm to
> > > comparing sender domain against recipient domain to detect a phishing
> > > attempt?
> > 
> > There is a suite of meta rules and subrules with names containing
> > TO_EQ_FROM in the default rule channel. Consult the rules files for
> > implementation details.
> 
> hmmm, I guess not 
> 
> some test message with these headers
> test2:~# spamassassin -D < spam-test.txt  > out2
> 
> Date: Mon, 24 Oct 2016 22:10:07 +0200
> To: recipi...@alexander.com
> From: Lara 

That is because those domains are not EQUAL? Or did you want a
rule that checks only for SIMILAR domain names (e.g. with lowercase
letter "L" replaced with number "1" as in your example)?

Also, most of those rules (like __TO_EQ_FROM_DOM) will not show in
standard output, but only in standard error, so you should call it
like this:

spamassassin -D < spam-test.txt  > out2 2>&1

to be able to see it in:
grep TO_EQ_FROM out2

-- 
Opinions above are GNU-copylefted.


Re: Fine-tuning SA URI extraction

2023-04-26 Thread Matija Nalis


On Wed, Apr 26, 2023 at 03:21:50PM -0400, Kris Deugau wrote:
> http://deepnet.cx/~kdeugau/spamtools/cornell-birds.eml

Thanks. Adding some dbg() in HTML.pm of my SA 3.4.6, it seems it is
triggered by this part of the email:



"background" is deprecated (but still supported) HTML attribute:
https://www.w3.org/TR/html4/struct/global.html#adef-background


It seems to happen in this part of the SA HTML.pm code (dbg line added
by myself):

sub html_uri {
  my ($self, $tag, $attr) = @_;

  use Data::Dumper; dbg ("/mn/ html_uri tag=$tag attr=" . Dumper($attr));

  # ordered by frequency of tag groups
  if ($tag =~ /^(?:body|table|tr|td)$/) {
    if (defined $attr->{background}) {
      $self->push_uri($tag, $attr->{background});
    }

My reading of the HTML specs (tested in Debian Bullseye firefox and
chromium) is that "background=none" is not any special value (as the
HTML author maybe intended), but is simply taken as a relative URI -
meaning a picture file with the literal name "none" in the same
directory as the HTML being viewed.

However, the issue is not restricted to that deprecated "background" attribute.
E.g.  or even  would likely confuse SA in 
the same way.


The browser would treat them as relative URLs. 

I.e. if you were viewing "https://example.com/dir/example.html" those
two would resolve to:

==> https://example.com/dir/none
 ==> https://example.com/dir/none.com

instead of "http://www.none.com" as SA seems to do (and as a browser
might do if you typed "none.com" in the address bar -- but NOT if it
was invoked via HTML elements)
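
The relative resolution is easy to reproduce outside a browser. A small
Python sketch using the standard library's urljoin(), which follows
the generic URI resolution rules, with the page URL from the example
above:

```python
from urllib.parse import urljoin

page = "https://example.com/dir/example.html"

# Both attribute values are treated as paths relative to the page's directory.
for value in ("none", "none.com"):
    print(value, "==>", urljoin(page, value))
```

Neither value ends up pointing at a "www.none.com" host.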

One should also read comments about "" handling in that same
file.

Now, I see two ways to change SA behaviour here:

- simple but lacking: do not call push_uri() if the assumed URI does
  not look like an absolute URI (i.e. if it does not contain at least
  '//')
  
  This would avoid false positives, but will not add relative URIs.
  e.g. it might add:
  http://www.example.com/dir
  but it would NOT also add:
  http://www.example.com/newdir/photo1.jpg 
  if for example "" was in there.

- complex but emulating browser behaviour better:
  Add full handling of relative URIs. i.e. have push_uri() detect all
  relative URIs and convert them to absolute URIs before adding them
  to the list of URIs.
  Might not be that hard in base case as $self->{base_href} seems to
  be saved, but what happens if there are for example multiple HTML
  attachments in e-mail? Would/Should it propagate? What if there is
  no "" specified, those relative URIs are invalid then?
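
To make the difference between the two options concrete, here is a
rough Python sketch (not SA code; collect_uris() and its parameters
are hypothetical names). The first mode drops anything without '//',
the second resolves relative values against a known base and still
drops them when no base is available, which is only one possible
answer to the open questions above.

```python
from urllib.parse import urljoin

def looks_absolute(uri):
    # Heuristic from the "simple" option: require at least "//".
    return "//" in uri

def collect_uris(raw_uris, base_href=None, resolve_relative=False):
    """Return the URIs that would be kept under the chosen strategy."""
    collected = []
    for uri in raw_uris:
        if looks_absolute(uri):
            collected.append(uri)
        elif resolve_relative and base_href:
            collected.append(urljoin(base_href, uri))
        # Otherwise: a relative URI with nothing to resolve against is skipped.
    return collected

raw = ["http://www.example.com/dir", "none", "photo1.jpg"]
print(collect_uris(raw))
print(collect_uris(raw, base_href="http://www.example.com/newdir/",
                   resolve_relative=True))
```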

-- 
Opinions above are GNU-copylefted.


Re: BAYES_00 BODY. Negative score?

2023-02-16 Thread Matija Nalis


On Thu, Feb 16, 2023 at 05:34:37PM -0500, joe a wrote:
> Oh, of course.  I installed as root initially, being foolish perhaps, but
> did create a specific user "later" and adjusted permissions as needed.  Or,
> so I thought.

well, installing as root (especially with a restrictive umask) manually
(e.g. "make install" or "cpan" vs. "yum/rpm/dpkg") often causes
problems, even if you later switch to packages (you need to look not
only at the final file permissions, but at the directories leading up
to it too).

"namei -l /path/to/file.pm" is often helpful to quickly check ALL
permissions needed to access a file (+x on directories is a must)

> Permissions are (almost) certainly the issue.  Now having the impressive
> locate/mlocate creature at my command, I might actually make progress.

I usually troubleshoot those (if log is insufficient) with:

strace -efile -o /tmp/sa.log spamassassin foobar

then look at /tmp/sa.log to see which open/stat/access returned -1 EPERM
or EACCES error.  Then check all path components for that file using
"namei -l" (or multiple "ls -ld"). Then try to su to that user and
"cat" that file manually.

If it is not regular DAC (chmod/chown) permissions, it might also be
SELinux restrictions or, more rarely, ACLs (getfacl(1)).
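
If namei is not at hand, the "check every path component" step can be
approximated with a few lines of Python. This is only a sketch of the
idea (path_permissions() is a made-up name); it prints the mode string
of each component and does not look at SELinux or ACLs.

```python
import os
import stat

def path_permissions(path):
    """Return (component, mode string) for every component leading to path."""
    current = os.path.abspath(path)
    parts = []
    while True:
        parts.append(current)
        parent = os.path.dirname(current)
        if parent == current:   # reached the filesystem root
            break
        current = parent
    result = []
    for component in reversed(parts):
        try:
            mode = stat.filemode(os.lstat(component).st_mode)
        except OSError as exc:
            mode = f"error: {exc.strerror}"
        result.append((component, mode))
    return result

for component, mode in path_permissions("/etc/hosts"):
    print(mode, component)
```

A directory without "x" for the relevant user anywhere in that list is
enough to make the final file unreachable.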

-- 
Opinions above are GNU-copylefted.


Re: Strange findings debugging bayes results

2023-02-16 Thread Matija Nalis
On Thu, Feb 16, 2023 at 01:02:25PM +0200, Henrik K wrote:
> On Thu, Feb 16, 2023 at 10:18:50AM +0100, hg user wrote:
> > Every score is based on headers, very generic headers. and some
> > related to my setup.
> > 
> > Not a single token from the message body
> 
> The Bayes implementation has been practically unmaintained for a long time,
> so YMMV.
> 
> You can try something like this, most headers are parsed badly and generate
> biasing random garbage (unscientific observation):
> 
> bayes_ignore_header ARC-Authentication-Results
> bayes_ignore_header ARC-Message-Signature

Yeah, bayes on headers (and CSS/HTML stuff) has been causing me many
more misclassifications than it did good, so I've eventually given up
on updating an ever-growing bayes_ignore_header list and disabled
bayes on the headers altogether:

bayes_token_sources none visible uri mimepart

My stance being: if the end user would not be classifying on those
sources (except the Subject header), neither should automatic bayes
classification...

Perhaps the OP has a bayes_token_sources setting that takes only
headers into account?

https://man.archlinux.org/man/Mail::SpamAssassin::Conf.3pm.en#bayes_token_sources

-- 
Opinions above are GNU-copylefted.


Re: KAM channel disabling lookups?

2022-10-12 Thread Matija Nalis
On Wed, Oct 12, 2022 at 10:45:06AM +0200, Matus UHLAR - fantomas wrote:
> On 12.10.22 10:41, Noel Butler wrote:
> > or save SA doing extra work, and use the RBL's at MTA level - where they
> > should be used and have been used for 25 years in the ISP world
> 
> you compare uncomparable.
> 
> SA does header scanning and can check on non-direct headers, e.g. at the
> internal network level.
> Also, it can do deep header scanning for open proxies etc.

Also, many uses of RBLs (e.g. in amavis) do not treat them as
"absolute truth" to outright refuse to accept mail, but only as
additional clues to programmatically increase or decrease the spam
score by some amount (a different score depending on the RBL, what
other rules matched in addition to the RBL etc) and maybe to autolearn
to stop the same spam from other IPs which are not yet blacklisted
(and report them to RBLs too).

Which is much higher functionality than "this RBL said this IP is
bad, so reject everything from there, regardless if it is good or
bad".

I have lots of free CPU cycles to burn, but I do NOT have human hours
to deal with questions like "where is my mail?" when RBL yields false
positive (Your mail is either in your INBOX, or in your SPAMBOX).
 
-- 
Opinions above are GNU-copylefted.


KAM_OCTET_PHISH=3 ?

2022-09-02 Thread Matija Nalis


Some of legitimate mails here are being hit with rather high KAM_OCTET_PHISH=3

it seems to trigger when I have both text/html and application/octet-stream
MIME parts.

reduced/sanitized example at: https://pastebin.com/D4vqKnLC

It seems to be multi-rule meta, but all those sub-rules seem to check
for mostly the same two things to my untrained eye:

mimeheader  T_OBFU_HTML_ATTACH  Content-Type =~ m,\bapplication/octet-stream\b.+\.s?html?\b,i
mimeheader  __KAM_VM5           Content-Type =~ /.s?html?\.?\"?($|;)/i
mimeheader  __KAM_OCTET_PHISH1  Content-Type =~ /application\/octet-stream/i

meta        KAM_OCTET_PHISH     ( __KAM_OCTET_PHISH1 + ( __KAM_VM5 + T_OBFU_HTML_ATTACH >= 1) >= 2 )
describe    KAM_OCTET_PHISH     HTML File with the wrong MIME Type
score       KAM_OCTET_PHISH     3.0

That is on Debian Bullseye spamassassin 3.4.6-1 (with extra KAM rulesets).

Can someone shed some light on what is happening here, and whether it
is supposed to be happening?
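
For what it's worth, the three regexes and the meta expression as
quoted above can be replayed in a rough Python sketch. It assumes each
mimeheader sub-rule hits if ANY MIME part's Content-Type value
matches, and the sample Content-Type values are made up; real SA
header handling may differ in details.

```python
import re

# The three sub-rule patterns as quoted, translated to Python syntax.
T_OBFU_HTML_ATTACH = re.compile(r"\bapplication/octet-stream\b.+\.s?html?\b", re.I)
KAM_VM5 = re.compile(r'.s?html?\.?"?($|;)', re.I)
KAM_OCTET_PHISH1 = re.compile(r"application/octet-stream", re.I)

def any_part_matches(pattern, content_types):
    # Assumption: a sub-rule counts as hit if any part's Content-Type matches.
    return int(any(pattern.search(ct) for ct in content_types))

def kam_octet_phish(content_types):
    phish1 = any_part_matches(KAM_OCTET_PHISH1, content_types)
    vm5 = any_part_matches(KAM_VM5, content_types)
    obfu = any_part_matches(T_OBFU_HTML_ATTACH, content_types)
    # meta: ( __KAM_OCTET_PHISH1 + ( __KAM_VM5 + T_OBFU_HTML_ATTACH >= 1 ) >= 2 )
    return phish1 + int(vm5 + obfu >= 1) >= 2

# Hypothetical message: an ordinary HTML body plus an unrelated binary attachment.
parts = ["text/html; charset=utf-8", 'application/octet-stream; name="data.bin"']
print(kam_octet_phish(parts))
```

Under those assumptions the unescaped leading "." in __KAM_VM5 lets a
plain "text/html;" value match, so the meta would be satisfied by two
different parts rather than by one HTML file with the wrong type.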

-- 
Opinions above are GNU-copylefted.


Re: Spamassassin spamming in log

2022-06-02 Thread Matija Nalis
On Thu, Jun 02, 2022 at 02:47:28PM +0200, Bert Van de Poel wrote:
> For the errors about nonexistent uses you will want to have a look at
> /etc/default/spamassassin I'm guessing.
> For the info messages: this has just got to do with your logging level. You
> will want to decrease it in local.cf or maybe also in the default file.

Also, depending on your distro and init system, /etc/default/spamassassin
might not be processed (e.g. on Debian systems, in many cases /etc/default/*
entries are only read via /etc/init.d/* System-V-init scripts, and
not used when using default systemd init system).

You should use "ps auxw" to determine exactly which parameters it is
being run with, and then grep the system for those flags if they
differ from the ones in /etc/default/spamassassin (esp. when you
change that file and restart, but the changes are not applied)

-- 
Opinions above are GNU-copylefted.


Re: Spamhaus spurious positives - how does SpamAssassin check Spamhaus?

2022-05-07 Thread Matija Nalis
On Sat, May 07, 2022 at 09:35:31AM -0700, Paul Pace wrote:
> On 2022-05-07 07:53, Benny Pedersen wrote:
> > On 2022-05-07 16:42, Paul Pace wrote:
> > >   *   10 URIBL_SBL Contains an URL's NS IP listed in the Spamhaus SBL
> > >   *  blocklist
> > >   *  [URIs: wikileaksdotorg]
> 
> The problem with this solution is I don't know which domain is going to be
> next, plus I'm not so much looking for a solution to this specific result,
> but rather I want to understand why there is a disparity between what
> SpamAssassin is reporting and what the Spamhaus website is reporting.

If you do:

grep -r URIBL_SBL /var/lib/spamassassin/

you'll see it does this:

/var/lib/spamassassin/3.004006/updates_spamassassin_org/25_uribl.cf:uridnssub   URIBL_SBL   zen.spamhaus.org.   A   127.0.0.2
/var/lib/spamassassin/3.004006/updates_spamassassin_org/25_uribl.cf:body        URIBL_SBL   eval:check_uridnsbl('URIBL_SBL')
/var/lib/spamassassin/3.004006/updates_spamassassin_org/25_uribl.cf:describe    URIBL_SBL   Contains an URL's NS IP listed in the Spamhaus SBL blocklist

which means if it wanted to check (for example) 195.35.109.44 it would do
DNS A record lookup on "44.109.35.195.zen.spamhaus.org" (note reversed quads),
and check if the result is "127.0.0.2" (which happens to be true in this case
at the moment - but might not be some time later):

% host -t a 44.109.35.195.zen.spamhaus.org
44.109.35.195.zen.spamhaus.org has address 127.0.0.2

The same procedure can be used for other RBLs.
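
The "reversed quads" step, sketched in Python in case it helps to
script the check. The function names are made up; the lookup helper
just asks the system resolver, so its answer depends on which
resolver you use and on when you ask.

```python
import socket

def dnsbl_query_name(ipv4, zone="zen.spamhaus.org"):
    """Build the DNS name to look up: reversed quads followed by the list's zone."""
    quads = ipv4.split(".")
    if len(quads) != 4:
        raise ValueError("expected a dotted-quad IPv4 address")
    return ".".join(reversed(quads)) + "." + zone

def dnsbl_lookup(ipv4, zone="zen.spamhaus.org"):
    """Return the A record answer as a string, or None if there is no answer."""
    try:
        return socket.gethostbyname(dnsbl_query_name(ipv4, zone))
    except OSError:
        return None

print(dnsbl_query_name("195.35.109.44"))
```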

As to why the web lookup returns a different result, it might be
because the DNS result was cached earlier (maybe by some previous spam
message), and/or because you did not look it up fast enough. Data on
RBL servers changes all the time, and there is usually a delay between
their current database (which is likely what the web interface looks
up directly) and their published DNS records (which would lag behind
it).

Anyway, if you do the DNS check at the same time (or very close; I
think the default TTL there is 60 seconds) as SpamAssassin does it, you
should get the same result. If you do it minutes or hours later, the
results might be different again (how often they change depends on the
RBL in question, as well as your luck).

-- 
Opinions above are GNU-copylefted.


Re: Why shouldn't I set the score for SPAM_99 and SPAM_999 higher?

2022-05-05 Thread Matija Nalis
You should probably check that none of your ham (i.e. non-spam)
messages contains SPAM_99 or SPAM_999. It can happen when spammers
poison your bayes database, and an increased score in that case might
lead to legitimate mail being misclassified as spam.

On Thu, May 05, 2022 at 10:37:40AM -0500, Thomas Cameron wrote:
> I understand that turning knobs without understanding the consequences can
> do bad thing, but almost all of the spam that gets through SA on my server
> has SPAM_99 or SPAM_999 set in the headers. It is obviously spam, so I don't
> really get how it wasn't flagged, but it wasn't. What are the risks of
> giving more weight to SPAM_99 and/or SPAM_999? Explain it like I'm five,
> sorry, it's probably something simple that I just don't understand.
> 
> Thomas
> 

-- 
Opinions above are GNU-copylefted.


Re: sub-test syntax

2022-04-04 Thread Matija Nalis


On Mon, Apr 04, 2022 at 07:45:02AM +0100, Niamh Holding wrote:
> Hello Matija,
> Sunday, April 3, 2022, 11:13:13 PM, you wrote:
> 
> MN> For closer example to yours requirements then, perhaps look into 
> 72_active.cf 
> MN> regex for RCVD_IN_IADB_LISTED
> 
> So you suggest [26] instead of (2|6)

I suggest you have to use *all* parts of the syntax as shown in that
example, not *only* character class [26]. 

Otherwise (for example if you do not escape the dots, or don't add
beginning/end anchors, etc), you're likely NOT going to be matching
correctly.
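
To illustrate why the dots and anchors matter, here is a plain regex
comparison in Python. It says nothing about the exact syntax
check_rbl_sub() expects (take that from the shipped rule files); it
only shows what an unescaped, unanchored pattern would also accept.
The candidate strings are invented.

```python
import re

loose = re.compile(r"127.0.0.(2|6)")       # dots unescaped, no anchors
strict = re.compile(r"^127\.0\.0\.[26]$")  # dots escaped, anchored at both ends

candidates = ["127.0.0.2", "127.0.0.6", "127.0.0.26", "127.0.0.4", "127x0y0z2"]
for value in candidates:
    print(value, bool(loose.search(value)), bool(strict.search(value)))
```

The loose pattern also accepts "127.0.0.26" and "127x0y0z2", which is
exactly the kind of mismatch meant above.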

-- 
Opinions above are GNU-copylefted.


Re: sub-test syntax

2022-04-03 Thread Matija Nalis
On Mon, Apr 04, 2022 at 12:19:23AM +0100, Martin Gregorie wrote:
> For instance, I whitelist any email sender who I've previously sent mail
> to. To do this I maintain am email archive held in a PostgreSQL 
> database and wrote an SA plugin that searches the archive for any
> message(s) I've previously sent to the sender of the message being
> checked: if I've sent mail to them they get whitelisted.

That sounds interesting, is it published somewhere?

-- 
Opinions above are GNU-copylefted.


Re: sub-test syntax

2022-04-03 Thread Matija Nalis
On Sun, Apr 03, 2022 at 10:06:51AM +0100, Niamh Holding wrote:
> Hello Matija,
> Saturday, April 2, 2022, 7:12:42 PM, you wrote:
> 
> MN> grep -r check_rbl_sub /var/lib/spamassassin
> MN> for examples of what's possible and how (e.g. 25_dnswl.cf)
> 
> Looking there I see nothing equivalent to alternates like in ordinary regexes 
> (2|6) for 2 or 6

It shows how command must look to be able to correctly use regexes there 
(instead of plain string).

"grep" command above should've returned more examples for you...

Then you can use similar principle to look for any other things you
want to accomplish in the future, simply by looking how others have used it.
That's why I provided it that way instead of simple copy/pasting the final 
result.

For an example closer to your requirements, perhaps look into the
72_active.cf regex for RCVD_IN_IADB_LISTED

-- 
Opinions above are GNU-copylefted.


Re: sub-test syntax

2022-04-02 Thread Matija Nalis


On Sat, Apr 02, 2022 at 06:09:20PM +0100, Niamh Holding wrote:
> Will this work to check 2 ip address responses, or do I have to write 
> separate ruled for 127.0.0.2 & 127.0.0.6
> 
> header  __NH_HOLTRBL_X1 
> eval:check_rbl_sub('holtrbl-lastexternal','127.0.0.(2|6)')

You can do it in one rule, but you have to learn to use correct regexes
for check_rbl_sub().

Do a:

grep -r check_rbl_sub /var/lib/spamassassin

for examples of what's possible and how (e.g. 25_dnswl.cf)


-- 
Opinions above are GNU-copylefted.


Re: Getting right GPG key for KAM

2022-03-21 Thread Matija Nalis
On Mon, Mar 21, 2022 at 06:31:07AM -0600, @lbutlr wrote:
> On 2022 Mar 21, at 04:37, Henrik K  wrote:
> > Right, it does seem you haven't imported the key..
> 
> Thanks! That's what was missing. Odd, considering there were KAM files 
> present, just not recent ones. Anyway, not my system, but all sorted now.

Note that gpg by default saves keyrings under the user's home
directory, so if the script was previously being run as another user,
that would cause exactly the behaviour you're seeing.

-- 
Opinions above are GNU-copylefted.


Re: Txrep, add-addr-to-whitelist

2021-12-28 Thread Matija Nalis
On Sun, Dec 19, 2021 at 12:18:15AM +1030, Peter wrote:
> Today I got my life back.
> 
> Decided to ditch TXrep and go back to AWL. It might not be as clever,
> but at least it works!
> 
> The inability to do working manual changes to scores meant wasting a lot of
> time having to add addresses to my whitelist file even for addresses that
> might not ever send another email in future.
> 
> Relief...

Probably a good choice, as TxRep is currently quite broken in several
regards, see for example:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7943

and a list of tickets in:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7173

-- 
Opinions above are GNU-copylefted.


Re: X-Originating-IP fires too much

2021-12-03 Thread Matija Nalis
On Wed, Dec 01, 2021 at 01:52:16PM +0100, Matus UHLAR - fantomas wrote:
> > 
> > > results
> > > - ALL_TRUSTED doesn't fire because 192.0.2.1 in X-Originating-IP
> > > 
> > > - HELO_NO_DOMAIN fires
> > > - RDNS_NONE fires
> > > - both because X-Originating-IP contains no helo/DNS data.
> > > 
> > > any idea what could I do here, besides disabling X-Originating-IP
> > > generation?

One workaround might be to use "clear_originating_ip_headers" and
then re-add all other headers except that one with
"originating_ip_headers", e.g.:

clear_originating_ip_headers
originating_ip_headers X-Yahoo-Post-IP X-Apparently-From
originating_ip_headers X-SenderIP X-AOL-IP
originating_ip_headers X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp


This is not perfect because it would ignore X-Originating-IP from
everyone.

Another perhaps cleaner solution is if your roundcube box is trusted
not to send spam, to bypass spamassassin completely for outgoing
mails from there.

Or simply make a negative score meta rule for all mails identifying
themselves as coming from your roundcube (originating IP, X-Mailer,
SPF/DKIM passed etc.) that will undo the spam score it gets from
other rules.

-- 
Opinions above are GNU-copylefted.


Re: MIME_BASE64_TEXT only on us-ascii

2021-11-30 Thread Matija Nalis
On Tue, Nov 30, 2021 at 12:03:15PM -0700, Philip Prindeville wrote:
> > On Nov 17, 2021, at 9:50 AM, Bill Cole 
> >  wrote:
> > SpamAssassin rules are not laws in any sense. They do not prescribe or 
> > proscribe any action. They do not reflect any sort of moral or ethical 
> > judgment. They do not express or define technical correctness.
> 
> Isn't that exactly what we're discussing here?  "Technical correctness"?

Hm, no? An app encoding pure ASCII as Base64 is not breaking any RFC.
So it is behaving "technically correctly".

> Good internetworking implementations follow (to the extent they don't 
> conflict with good security practices) Postel's Law, "be conservative in what 
> you send, be liberal [but not naive] in what you accept".

Well, antispam efforts (as is security for important stuff) are
mostly exactly the OPPOSITE of the good internetworking
implementations of the old Postel's law.

And for good reasons - in the internetworking implementations of
old, the vast majority of peers (if not all) you interacted with
were GOOD guys trying to do good things.

In today's e-mail (and security), the majority of the actors are
enemies trying to penetrate your defensive lines.

Also, see https://en.wikipedia.org/wiki/Robustness_principle#Criticism


> Rereading:
> > Base64 encoding is only necessary if there are non-ASCII characters used. 
> > UTF-8 is a superset of ASCII & it is normal for MUAs to not encode more 
> > than needed.
> 
> Exactly.  Encoding is only used when and where necessary.

...by legitimate users. Spammers on the other hand will sometimes
encode even when it is NOT needed, probably in an attempt to evade
less advanced antispam tools (or due to sheer laziness when writing
the spam tool).

The fact that such encoding is technically allowed does NOT change the
fact that the technique is used vastly more by spammers than by
innocent parties.

> Properly encoded HTML uses HTML-Entity naming, which is also ASCII-friendly, 
> i.e.  instead of Latin1  etc. or raw 8bit characters.

There are several "proper" (i.e. allowed by different RFCs) ways to
encode that information in mail. Statistical analyses seem to say that
some of the ways are used much more by spammers than by legitimate
users. Hence, the score for those methods.

-- 
Opinions above are GNU-copylefted.


Re: SPF_NONE scoring

2021-11-30 Thread Matija Nalis
On Tue, Nov 30, 2021 at 11:47:36AM -0700, Philip Prindeville wrote:
> I'm looking at the 0.001 scoring for SPF_NONE and scratching my head.  This 
> was discussed a bit in early 2015, but maybe it needs revisiting with new 
> perspective.

SPF is a double-edged sword. Sure, it is great for authenticating
envelope senders when it works, but:

- when used in combination with mailing list, plain message
  forwarding etc. it will break with false positive, marking
  (for example) this perfectly valid message of mine as a fake.
  See https://en.wikipedia.org/wiki/Sender_Policy_Framework#FAIL_and_forwarding

  This is the reason why you can only really use it for "SPF OK"
  validation - "SPF FAIL" does not really tell you anything, as it
  will happen as often for forged senders, as for valid senders.

  This is why it will often end as "?all" or "~all" and not "-all"
  (and/or soft DMARC policies)

- Also, the envelope sender (on which SPF operates) is a completely
  different thing from the header "From:", which is what the vast
  majority of users will see, so it does not provide the protection
  which one might expect.
  See https://en.wikipedia.org/wiki/Sender_Policy_Framework#Header_limitations

  And this makes "SPF OK" much less useful than it sounds in theory.

- Then there are misconfigurations (hitting limit of max 10 DNS
  lookups, SPF records which were setup once but not kept up-to-date,
  etc).

Thus, SPF is IMHO not very usable for scoring on its own, but it does
have a useful purpose for creating custom SA rules and is often very
usable for short circuiting with whitelist_auth.

> Surely no one who cares about maintaining their reputation by protecting 
> themselves against spoofing would fail to provide SPF records...  

For example, I do not provide it on a few of my other e-mail accounts
by choice (especially those which deal with many mailing lists, or
with users who use non-SRS e-mail forwarding), as the mere existence
of SPF there causes much more damage than the potential help it
brings.

> So how is this score arrived at?

That, I am not sure. Perhaps from how well it works as an indicator
on the ham/spam corpora that are run to determine SA scores in general?

> And of Ham, how much of it has a valid SPF?

For my recent hams, I get this:

714 SPF_PASS=
128 SPF_NONE=
 67 SPF_NEUTRAL_ALL=
  9 SPF_FAIL=
  1 SPF_SOFTFAIL=

So, about 1 in 7 ham messages does not have SPF.

> And of Spam, how much of it lacks a valid SPF?

For recent spams that reach any kind of mailbox here (i.e. not
hitting very-safe RBLs, and not having very high SA scores, so
having at least some potential for being misclassified as
non-spam):

   2291 SPF_PASS=
    667 SPF_SOFTFAIL=
    472 SPF_NONE=
    353 SPF_FAIL=
    154 SPF_NEUTRAL_ALL=
    129 SPF_PERMERROR=
     53 SPF_NEUTRAL=
     17 SPF_TEMPERROR=

So, about 1 in 9 spam messages does not have SPF.

In summary, there does not seem to be a big difference between
adoption of SPF by spammers and by legitimate users.
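
The "1 in 7" and "1 in 9" figures can be rechecked from the two tallies
above with a few lines of Python. This is only a sketch of the arithmetic;
the dictionary and function names are made up for illustration, and the
counts are copied from the listings as posted.

```python
# Rule-hit tallies copied from the ham and spam listings above.
ham = {
    "SPF_PASS": 714, "SPF_NONE": 128, "SPF_NEUTRAL_ALL": 67,
    "SPF_FAIL": 9, "SPF_SOFTFAIL": 1,
}
spam = {
    "SPF_PASS": 2291, "SPF_SOFTFAIL": 667, "SPF_NONE": 472, "SPF_FAIL": 353,
    "SPF_NEUTRAL_ALL": 154, "SPF_PERMERROR": 129, "SPF_NEUTRAL": 53,
    "SPF_TEMPERROR": 17,
}


def none_share(tally: dict) -> float:
    """Fraction of tallied messages that hit SPF_NONE."""
    return tally["SPF_NONE"] / sum(tally.values())


for label, tally in (("ham", ham), ("spam", spam)):
    share = none_share(tally)
    print(f"{label}: {share:.1%} SPF_NONE, roughly 1 in {1 / share:.1f}")
```

Note that this treats each listed rule hit as one message, which is an
assumption about how the tallies were produced.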

-- 
Opinions above are GNU-copylefted.


Re: Fw: spam from gmail.com

2021-11-11 Thread Matija Nalis
On Thu, Nov 11, 2021 at 02:21:06PM -0500, Greg Troxel wrote:
> yes, what I really want is something like
> 
> exclude_from_dnswl gmail

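I guess you could neutralize the default RCVD_IN_DNSWL_MED score with:

score   RCVD_IN_DNSWL_MED   -0.001

(a score of exactly 0 would stop the rule from running at all, and the
meta rule below depends on it) and then create your own rule:

meta    MY_DNSWL_MED    RCVD_IN_DNSWL_MED && !FREEMAIL_FROM
score   MY_DNSWL_MED    -2.5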

That would score MY_DNSWL_MED only if it is *not* coming from some
freemail account.

If you want it to score on all other freemail providers, but not on
GMAIL, you would replace FREEMAIL_FROM with your own header rule, of
course (like "header FROM_GMAIL From:addr =~ /\@gmail\.com$/i" or similar).


-- 
Opinions above are GNU-copylefted.


Re: Fw: spam from gmail.com

2021-11-11 Thread Matija Nalis
I use the DNSWLh SpamAssassin plugin from
http://www.chaosreigns.com/dnswl/sa_plugin/
which makes "spamassassin --report" also report to DNSWL, thus improving
the DNSWL database for everybody.

Also, I reduce the effect of RCVD_IN_DNSWL_MED to -0.5, as the default
seems somewhat unreasonable.

On Thu, 11 Nov 2021 12:19:10 +0100, Philipp Ewald wrote:
> You can report it. Gmail is on DNSWL
>
> @gmail.com>
> RCVD_IN_DNSWL_MED=-2.3
>
> https://www.dnswl.org/?page_id=17
>
> As far as i know DNSWL is used by default
>
> On 11/8/21 7:27 PM, Rupert Gallagher wrote:
>> Spammers are using gmail.com. Congratulations to Google for their fine 
>> work...
>> 
>>  Original Message 
>> On Nov 8, 2021, 10:42, Mrs.Marann Silvia < marannsilv...@gmail.com> wrote:
>> Good day my dear,
>> How are you doing and your family.I am Mrs.Marann Silvia,a sick widow
>> writing from one of the America hospitals.I am suffering from a long
>> time cancer of breast,my health situation is becoming worse,my life is
>> no longer guaranteed hence i want to make this solemn donation.I want
>> to donate my money to help the orphans, widows and handicap people
>> through you because there is no more time left for me on this earth.I
>> take this decision because i have no child who will inherit my wealth
>> after my death.Please,i need your urgent reply so that i can tell you
>> more on how you will handle my wish before i die.I will be waiting to
>> hear from you immediately by God grace amen,
>> yours sincerely.
>> Mrs.Marann Silvia
>> 
>


-- 
Opinions above are GNU-copylefted.



Re: [Sare-users] painting everybody in Taiwan with the same brush

2010-01-29 Thread Matija Nalis
Firstly, the instructions for reading this e-mail: please read it whole,
and understand that (although it may sound harsh at places) I am actually
trying to help you. Only then reply (if needed). It is also somewhat long,
but it does contain some technical info (and not only my rants :) Thanks.

On Thu, Jan 28, 2010 at 09:34:46AM +0800, jida...@jidanni.org wrote:
> Long ago, I tried mailing directly direct-to-mx style, but that of
> course didn't work, e.g., http://www.spamhaus.org/pbl/query/PBL109625
> So only 5% of my mail got through.
>
> So then I tried mailing through The ISP Here, Hinet.Net's SMTP server,
> but of course Hinet.Net has a bad name. So only 50% of my mail got through.

Yeah, well, there is this thing about SMTP... It hasn't really worked
correctly for at least the last 10 years. It's a doomed protocol. Nothing
can save it nowadays. It is taken for granted that some percentage of
*anyone's* e-mail is going to be lost and never reach its destination.
That percentage might be lower or higher, depending on many factors, most
prominent of which is luck.

It's too bad, it was a nice and happy and simple (hence the name) protocol
before spammers got it and pretty much destroyed it.

Ok, now that we've got THAT part over with, we can get down to the point how
to minimize the pain you *will* suffer by using SMTP if you decide to
continue using it.

> So, upon people like you guy's recommendation, I (asked my mom to buy)
> me a dreamhost.com account.

Does it work better than the 50% you got with the Hinet.Net SMTP?
If so, then it is great - you've got a better deal than before, right?
Maybe you wanted even better, but hey... nothing is perfect, remember.

If, however, it works worse with dreamhost than before with the Hinet.Net
SMTP server, then it was wasted money. That is sad, but such things happen
all the time too: you pay for something only to find out it was not a good
deal for you.

One thing to note - you (or anybody else) will never *ever* get it so that
100% of your mail always reaches the other side. Those days when such a
thing was possible (no matter in what country the mail originated) are long
gone -- and even before spam and all the antispam measures, mail did get
lost occasionally. Nowadays, it is an everyday occurrence that some mail
gets lost. It is considered acceptable collateral damage in a full-fledged
war to protect mailboxes from spam.

> However I can't shake off the Original Sin of Being in Taiwan. All
> people with Taiwan Colored Skin will have points deducted, no matter

Knock it off with that "you're all wanna-be racists" stuff, will you please?
It is clear that racism has absolutely nothing to do with your problems, and
you are just insulting people who are trying to help you.

Furthermore, people on this list who are replying to you are (in great
majority at least) just users of the rules, they did not write them - the
SARE Ninjas did. So even if your intent *is* to insult people who wrote
rules which are causing you problems (which I hope it is not), you're
insulting the wrong people.

You've come to this mailing list (presumably) to ask people to invest their
time to help *you*, something they have no obligation to do. At least you
could try to be polite to them (of course nobody can *make you*, but being
rude will just lower your chances of getting help).

Also note that SARE Ninjas are long gone -  see main page
http://www.rulesemporium.com/. So nobody could fix those rules even if they
thought it was a good idea (and at least some people are not convinced it is
a bad idea); and even if the rules could be fixed, still at least half the
world would *never* update them to new versions. So you would still get
blocked, only perhaps a little less. That is just a fact (based on extensive
mailadmin experience), so trust me on that.

Also please note that even when SARE Ninjas were here, they did not write
those rules because they were racists that hated Taiwanese people - they wrote
them because they were effective (see below for technical info).

> what. We use the Telephone Company's ISP.

Yup. And somebody once decided that mail coming from your Telephone
Company's ISP (and other places) is mostly spam. The last updates and tests
done on that rules file are from 2006, though, so it may have changed since.

Here is the technical data (note: I'm not a SARE Ninja and never was, but I
can read most rules and have written quite a few of my own):

http://www.rulesemporium.com/rules.htm lists the problematic 
70_sare_header1.cf rule with following comments:

the 70_sare_header1.cf ruleset contains rules which do (or in the past have)
hit ham during SARE mass-check tests. The S/O calculated by SA's
hit-frequencies scripts are all at or above 0.900. This file also contains
rules which hit only spam, but fewer than 10 spam in our mass-check tests.
Systems which are highly sensitive to false positives and/or tight on
resources may want to exclude this ruleset, pick and choose among its rules,
or lower their scores.
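
For readers unfamiliar with the S/O figure mentioned in that comment, here
is a minimal Python sketch of how such a ratio can be computed. It assumes
S/O means "spam over overall", taken from the percentage of spam and the
percentage of ham that a rule hits; that reading, the function name and the
sample numbers are assumptions for illustration, not something taken from
the SARE files.

```python
def s_o_ratio(spam_hit_pct: float, ham_hit_pct: float) -> float:
    """Share of a rule's hits that land on spam (assumed S/O definition)."""
    total = spam_hit_pct + ham_hit_pct
    if total == 0:
        raise ValueError("rule hit neither spam nor ham")
    return spam_hit_pct / total


# Made-up example: a rule hitting 4.5% of spam and 0.5% of ham.
print(round(s_o_ratio(4.5, 0.5), 3))
```

Under that assumption, a rule at the 0.900 threshold still puts about one
tenth of its hit weight on ham, which is why the comment warns sites that
are sensitive to false positives.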

In