from:"Martin Gregorie"

Re: Building Red Hat Rawhide SA 4.0.0 package for RHEL/CentOS 7

2023-12-06 Thread Martin Gregorie

On Tue, 2023-12-05 at 23:25 -0800, Kenneth Porter wrote:
> On 12/5/2023 10:57 PM, Benny Pedersen wrote:
> > mimedefang does not use spamd, you only need either spamassassin
> > only 
> > with spamd or mimedefang with spamassassin not running spamd 
> 
> It's a small server so I can afford to run SA twice, once at the MTA 
> level through mimedefang (which can potentially reject egregious
> spam), 
> and once during delivery via procmail, which invokes spamc.
> 
> 
I'm running the XFCE spin of Redhat Linux and found 
vmlinuz-6.6.2-101.fc38.x86_64's display handling so flaky that I
replaced it with vmlinuz-6.6.3-100.fc38.x86_64 six days later. 

vmlinuz-6.6.3-100.fc38.x86_64 is a huge improvement. 

I hope this is useful feedback. 

MCG

Re: OT - Re: DNFTEC - was My apologies

2023-08-05 Thread Martin Gregorie

On Sat, 2023-08-05 at 14:06 -0500, Grant Taylor via users wrote:
> On 8/5/23 1:51 PM, Kevin A. McGrail wrote:
> > REDACTED is the definition of something I learned decades ago as an
> > energy 
> > creature.
> 
> Is there anything to differentiate an Energy Creature from a Troll?
> 
Yes. In my experience he is invariably condescending in tone, at worst
abusive, and seldom if ever offers useful and relevant advice, which
makes him somebody who is best ignored.
So, add him to your most-unwelcome list if you maintain one:
otherwise create a local rule that drops his messages straight in the
discards bin. 

To put it another way: if you don't like his attitude or find his advice
irrelevant to any problems with unwanted mail that you've received,
write that 'unconditionally discard messages from him' rule, add it to
your private ruleset and be sure to update it if his headers or address
should change in future.

Martin

Re: Sudden surge in spam appearing to come from my email address

2023-07-16 Thread Martin Gregorie

On Sat, 2023-07-15 at 22:04 -0500, Thomas Cameron wrote:
> 
> On 7/14/23 20:30, Grant Taylor via users wrote:
> > On 7/14/23 6:06 PM, Thomas Cameron wrote:
> > > I'm trying to figure out how to block this stuff. Something like
> > > "if 
> > > it appears to come from me, but it's not actually coming from my 
> > > email server," block it.
> > 
> > SPF with hard fail in your own domain /and/ filtering that respects 
> > SPF hard fail will almost certainly stop this like a switch.
> 
> I'd love to do this, but see below. I get TONS of warnings every time
> I 
> send email to lists (even this list) that make me hesitant to do hard
> fails.
> 
Another way to do this is to build either a mail archive or a database
of addresses you've sent mail to and simply add a negative score to mail
from anybody who you've sent mail to: this needs the following bits of
code:

I use PostgreSQL as the database and Postfix as my local MTA, which I
needed anyway to distribute internal mail on my local LAN.

Capturing outgoing mail destination addresses: I added a Postfix BCC
directive that sends a copy of outgoing mail to a local mailbox. Once a
day this mailbox is scanned for destination addresses: any new ones are
added to the database.

Scanning incoming mail: I wrote an SA extension to look up sender
addresses of incoming mail in the 'outbound mail address database' and
an SA rule to trigger it: this adds a negative score to mail containing
any FROM address(es) that I've previously sent mail to. The SA extension
is a Perl module that looks up the sender address on all incoming mail. 

Since the OP is a programmer, this should be easily within his
capabilities: 

- He needs to know some Perl to write the SA extension module (the
  O'Reilly Camel book is a well-organised guide to Perl and/or he's
  welcome to a copy of my Perl module).
- Almost any database will do for this job (even a flat text file if he
  uses awk to update it and awk or grep to search it), though a proper
  database such as PostgreSQL or MariaDB would be faster if the sent
  address list is large. For a database he needs to know some fairly
  basic SQL to add addresses to it and to do the lookups.
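
A minimal sketch of the 'outbound mail address database' idea, written
in Python with SQLite purely for illustration (my own setup uses
PostgreSQL and a Perl SA extension). The table name sent_to and the
function names are assumptions made up for this example.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the hypothetical sent-address database."""
    conn = sqlite3.connect(path)
    # The PRIMARY KEY lets the database reject duplicate addresses itself.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sent_to (address TEXT PRIMARY KEY)"
    )
    return conn


def record_sent(conn, addresses):
    """Add outbound destination addresses, silently skipping known ones."""
    rows = [(a.strip().lower(),) for a in addresses if a.strip()]
    conn.executemany(
        "INSERT OR IGNORE INTO sent_to (address) VALUES (?)", rows
    )
    conn.commit()


def is_known(conn, sender):
    """Return True if mail has previously been sent to this sender."""
    row = conn.execute(
        "SELECT 1 FROM sent_to WHERE address = ?",
        (sender.strip().lower(),),
    ).fetchone()
    return row is not None
```

The daily scan would call record_sent() with the addresses it found,
and the lookup side would call is_known() with the sender address of
each incoming message.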

Martin

Re: BAYES_00 BODY. Negative score?

2023-02-17 Thread Martin Gregorie

On Fri, 2023-02-17 at 10:54 -0500, joe a wrote:

> Could it have been that simple?
> 
If, like myself, you find reference books useful, you may want to get a
copy of "Linux in a Nutshell" - an O'Reilly book.

It tends to assume you know at least one other OS fairly well, is well
organised and concise. I've also found "Debian Reference"

 http://www.debian.org/doc/manuals/debian-reference/

useful for most flavours of Linux (I use Fedora and Raspbian)

Martin

Re: BAYES_00 BODY. Negative score?

2023-02-17 Thread Martin Gregorie

On Thu, 2023-02-16 at 23:32 +0100, hg user wrote:
> root can do anything. a restricted user can't: it's only allowed to do
> what
> others allowed it.
> 
> it also runs with another environment, so it may miss PATHes or @INC
> directories.
> 
You can check this by running 

env | less

from a command line under the appropriate user and making sure that all
the environment variables you expect to see defined are, and have the
values you expect.

Martin

Re: DecodeShortURL fails with postgresql

2023-01-18 Thread Martin Gregorie

On Wed, 2023-01-18 at 22:47 +0100, Benny Pedersen wrote:
> 
> https://github.com/apache/spamassassin/blob/trunk/lib/Mail/SpamAssassin/Plugin/DecodeShortURLs.pm#L594-L601
> 
> only me testing postgresql ?
> 
> 
I'm using it with a self-written Perl data retrieval module that was
tested long ago, so not currently testing, but not yet using SA 4
either.


Martin

Re: Rule Help - not sure what is wrong with my syntax

2023-01-12 Thread Martin Gregorie

On Wed, 2023-01-11 at 16:56 -0800, Loren Wilton wrote:
> Why not do a simple rule rather than inventing some Perl code?
> 
> header TO_SPECIFIC_EMAIL To:addr =~ /(?:\bus...@example.com|\bus...@example.com|\bus...@example.com)/
> describe TO_SPECIFIC_EMAIL Mail to a specific email address
> score TO_SPECIFIC_EMAIL -2
> 
> header TO_SPECIFIC_DOMAIN To:addr =~ /(?:\@example1\.com|\@example2\.com|\@example3\.com)/
> describe TO_SPECIFIC_DOMAIN Mail to specific email domain
> score TO_SPECIFIC_DOMAIN -2
> 
>     or possibly
> 
> header TO_SPECIFIC_DOMAIN To:addr =~ /\@(?:example1\.com|example2\.com|example3\.com)$/
> 
> 
Agreed, though after a while the regex can get rather long and unwieldy.
It's easy enough to keep the address list as a simple text file (one
address per line) and write a simple program to create a syntactically
correct SA rule from the list. That is easily done with Perl or (better)
an awk script.
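
A rough sketch of such a generator. It is written in Python rather than
the Perl or awk suggested above, only to keep the example
self-contained. The function name build_rule and the default score are
assumptions. Lines in the list that are blank or start with '#' are
skipped.

```python
import re


def build_rule(addresses, name="TO_SPECIFIC_EMAIL", score=-2.0):
    """Return SA rule text matching any of the given addresses in To:."""
    alternatives = []
    for addr in addresses:
        addr = addr.strip()
        if not addr or addr.startswith("#"):
            continue  # skip blank lines and comments in the list file
        # Backslash-escape every non-word character so that '.', '@'
        # and '/' are all taken literally by the Perl regex.
        alternatives.append(re.sub(r"([^A-Za-z0-9_])", r"\\\1", addr))
    if not alternatives:
        raise ValueError("address list is empty")
    pattern = "|".join(alternatives)
    return "\n".join([
        f"header   {name} To:addr =~ /^(?:{pattern})$/i",
        f"describe {name} Mail to a specific email address",
        f"score    {name} {score}",
    ])
```

Feeding it the lines of the address file and writing the result to a
.cf file keeps the rule in step with the list. The output should still
be linted before it goes live.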


Martin

Re: Rule Help - not sure what is wrong with my syntax

2023-01-12 Thread Martin Gregorie

On Wed, 2023-01-11 at 18:39 -0500, Joey J wrote:
> Hello All,
> 
> I created this rule to check for email addresses matching a list to
> get
> added some negative value.
> I also tried it with just domains so it would be more efficient, but I
> can't seem to get them to run.
> Any suggestions?
> 
Use a database to store addresses you accept mail from. Apart from the
database, you'll need a Perl module to let SA look up addresses in the
database. How to populate the database is up to you: but adding
addresses you send mail to and having your SA interface mark these
addresses as not-spam is unlikely to cause false positives. 

My preferred way of populating the database depends on you running a
local copy of Postfix. Configure Postfix to BCC all mail to a mailbox
that's scanned for outgoing mail and run an overnight process to add
destination addresses from outbound mail to the database and discard the
messages as they're processed.

That said, I use this mechanism to populate a mail archive and a view to
select the addresses I've sent mail to from the archive. 

This approach runs adequately fast and requires minimal maintenance
apart from a weekly backup. 
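
As an illustration of the overnight step, here is a small Python sketch
that pulls destination addresses out of one captured message. It is not
my actual loader. The function name is an assumption, and it treats any
To, Cc or Bcc header as a destination.

```python
from email import message_from_string
from email.utils import getaddresses


def destination_addresses(raw_message):
    """Return the set of To/Cc/Bcc addresses in one raw message."""
    msg = message_from_string(raw_message)
    fields = []
    for header in ("To", "Cc", "Bcc"):
        # get_all() returns every occurrence of the header, or [].
        fields.extend(msg.get_all(header, []))
    # getaddresses() splits "Name <addr>, addr2" lists into pairs.
    return {
        addr.lower()
        for _name, addr in getaddresses(fields)
        if "@" in addr
    }
```

Python's standard mailbox module could supply the messages from the BCC
mailbox one at a time. Each resulting set would then be merged into
whatever store holds the address list.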

HTH, Martin

Re: Refused by block lists

2023-01-08 Thread Martin Gregorie

> > On 07.01.23 14:06, joe a wrote:
> > > Pretty sure.  Or, I was.  Ran various tests with unbound running
> > > and 
> > > not running confirmed it was working, at least providing a
> > > response. 
> > 
That's pretty simple to check, provided you've got Wireshark installed:
Fire it up and tell it to watch for DNS and/or blacklist lookup traffic
on the appropriate ports.

Then feed known spam to SA. Wireshark will show you if spam is causing
external lookup requests to be generated, where they are being sent, and
what replies are being received.

Martin

Re: awl postgresql

2023-01-03 Thread Martin Gregorie

On Wed, 2023-01-04 at 00:43 +0100, Benny Pedersen wrote:
> 
> i have dumped all i have in posgres without data so only structure is 
> here
> 
> https://usercontent.irccloud-cdn.com/file/WJmDq7xc/spamassassin_dump_tables%20only.txt
> 
> dont know what package means on gentoo, its stable versions i use,
> just 
> not latest stable
> 
> if more info is needed i can provide it

There's enough detail there to make an informed guess about what could
be wrong. 

The tables public.awl and public.txrep contain identical sets of column
names, so in any query that involves both tables a reference to one of
these column names will be rejected as ambiguous unless it is qualified
by referring to it as

table_name.column_name 

Without the fully qualified name, e.g. public.awl.email, the database
engine can't know which table contains the column that the script it's
executing is meant to use.
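
The effect is easy to reproduce. The sketch below uses SQLite from
Python rather than PostgreSQL, so the error text will differ. Only the
email column name is taken from the dump. The msgcount column is just a
stand-in. Note that the ambiguity only arises when both tables appear
in the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two tables sharing column names, as in the dumped schema.
conn.execute("CREATE TABLE awl (email TEXT, msgcount INTEGER)")
conn.execute("CREATE TABLE txrep (email TEXT, msgcount INTEGER)")


def try_query(sql):
    """Run a query and return 'ok' or the engine's error message."""
    try:
        conn.execute(sql).fetchall()
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)


# Unqualified column with both tables in scope: rejected as ambiguous.
ambiguous = try_query("SELECT email FROM awl, txrep")
# Qualified column: accepted.
qualified = try_query("SELECT awl.email FROM awl, txrep")
# Only one table in scope: no qualification needed.
single = try_query("SELECT email FROM awl")
```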

Martin

Re: awl postgresql

2023-01-03 Thread Martin Gregorie

On Wed, 2023-01-04 at 10:24 +1300, Sidney Markowitz wrote:
> Benny Pedersen wrote on 4/01/23 3:19 am:
> 
> If anyone else reading this is using 4.0.0 and postgres for AWL, are
> you 
> seeing or not seeing this problem?
> 
I use Postgresql, though not with SA.

I agree with your suggestion, but it would also be useful to see the
definition of the table. 'psql', the postgres interactive command tool,
can be used to show this info: the psql command 

"\d tablename" 

displays columns in the 'tablename' table as well as other relevant
information about it. If Postgres was installed from a standard package,
the psql interactive program (and its manpage) should also have been 
installed. 

Martin

Re: Spam DKIM signed by Paypal coming from their Microsoft Tenant?

2022-11-14 Thread Martin Gregorie

On Mon, 2022-11-14 at 15:14 -0500, Shawn Iverson wrote:
> How do I stop this?  paypal.com is in the default DKIM whitelist!
> 
I'd treat it as spam because the domain name in the From header doesn't
match the domain name in the Message-ID header. 

That works for me, with virtually no false mail rejections.
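
The check itself is small. This is a Python sketch of the comparison
only, not the SA rule I use. The function names are assumptions, as is
the choice to compare the two domains exactly and to treat a missing
domain as a mismatch.

```python
from email import message_from_string
from email.utils import parseaddr


def domain_of(value):
    """Return the lower-cased text after the last '@', or ''."""
    value = value.strip().strip("<>")
    return value.rsplit("@", 1)[1].lower() if "@" in value else ""


def from_msgid_mismatch(raw_message):
    """True if the From and Message-ID domains differ."""
    msg = message_from_string(raw_message)
    from_domain = domain_of(parseaddr(msg.get("From", ""))[1])
    msgid_domain = domain_of(msg.get("Message-ID", ""))
    if not from_domain or not msgid_domain:
        return True  # assumption: a missing domain counts as a mismatch
    return from_domain != msgid_domain
```

An exact comparison is the strictest form. Senders whose Message-ID
uses a subdomain of the From domain would need a looser match.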

Martin

Re: subscribe to blacklist for domains

2022-08-23 Thread Martin Gregorie

On Tue, 2022-08-23 at 12:11 +0200, Vincent Lefevre wrote:
> On 2022-08-18 19:40:33 +0100, Martin Gregorie wrote:
> > - if the reverse lookup fails, or the domain it retrieved does not
> > match the one in the From address, send a bare 550 REJECT because
> > the failed
> > reverse lookup implies the sending domain is a forgery. 
> 
> It doesn't. There are IPs that host several domains, e.g. in case
> of shared web hosting. For instance, I have 2 domains vinc17.net
> and vinc17.org, and both are handled by the same machine, thus
> with a single IP address. So, necessarily, the reverse lookup will
> not match for one of these domains.
> 
Fair enough: I did say that some of this was off the top of my head at
the end of a longish day.

Would doing the lookup trick on the domain in the Message-ID header be any
more reliable?

Martin

Re: subscribe to blacklist for domains

2022-08-18 Thread Martin Gregorie

On Thu, 2022-08-18 at 12:11 -0400, Kris Deugau wrote:
> Mmm.  So how would you, as sender or sender's mail provider, 
> troubleshoot a message rejected with "550 Too spammy"?  I have seen 
> several rejections that were equally clear and to the point, without 
> divulging any particular detail about what, exactly, was
> objectionable.
> 
> What details should the receiving system include in that 550, such
> that legitimate senders can adjust or fix something in their message,
> that spammers can't also abuse to slip their glop through that filter
> as well?

The only reasonably foolproof way I can think of gently telling friendly
senders why their message is being treated as spam while not helping
spammers to send more believable and/or less obvious spam requires
something like the following:

You should maintain some form of mail archive. It needn't be all that
big or complex: for this purpose all it needs to contain is a list of
valid addresses that you have previously sent mail to. If you keep this
information set then, as an initial guess the spam response logic can be
as simple as:

- extract the domain name from the incoming mail's From header and use 
  it to find the domain IP. Use that IP to do a reverse domain lookup.

- if the reverse lookup fails, or the domain it retrieved does not match
  the one in the From address, send a bare 550 REJECT because the failed
  reverse lookup implies the sending domain is a forgery. 

  This is a manual check I often use if I suspect a message of being
  spam and get curious about it for some reason or other. FWIW my next
  step is to use Lynx to see what website (if any) is associated with
  the domain - an amazing number of spam sources have an associated
  website - and it's almost always an off-the-peg generic page. I use
  Lynx for this because it is a text-only browser that can also disable
  all cookie handling, so is a relatively safe way of looking at
  possibly dodgy websites.

- if the mail archive shows that we've previously sent mail to the 
  sender of this message, either send a bounce or a 550 rejection
  together with a polite explanation of why you think their message
  might be spam.

- if mail has NOT previously been sent to the sender of this message,
  send a bare 550 REJECT because (a) they may well be a spammer and (b)
  you don't know them and so don't (yet) have any need to be nice to
  them. 
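
The steps above can be sketched in Python as follows. The function
names and return strings are assumptions, and the match test (the
reverse name equals the From domain or is a host inside it) is only one
possible reading of 'does not match'. The resolvers are passed in so
the logic can be tried without touching the network.

```python
import socket

BARE_REJECT = "550 REJECT"
EXPLAINED = "550 with explanation"


def forward_lookup(domain):
    """Return an IPv4 address for the domain, or None."""
    try:
        return socket.gethostbyname(domain)
    except OSError:
        return None


def reverse_lookup(ip):
    """Return the host name for the address, or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def decide(from_address, sent_to,
           forward=forward_lookup, reverse=reverse_lookup):
    """Choose a response for a message already judged to be spam."""
    domain = from_address.rsplit("@", 1)[-1].lower()
    ip = forward(domain)
    host = reverse(ip) if ip else None
    if host is None:
        return BARE_REJECT  # lookup failed
    host = host.lower().rstrip(".")
    if host != domain and not host.endswith("." + domain):
        return BARE_REJECT  # reverse name is outside the From domain
    if from_address.lower() in sent_to:
        return EXPLAINED    # known correspondent: say why
    return BARE_REJECT      # unknown sender: no explanation
```

Here sent_to stands for the set of lower-cased addresses taken from the
mail archive.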

This is pretty much off the top of my pointy head, after a warmish day
spent driving round part of SE UK, so there are probably obvious flaws,
but this would be my starting point if I was planning to reject spam and
similar dross rather than simply tossing it in the wastebasket, and it
does at least suggest a way of not telling a spammer why you rejected
his junk.

Martin

Re: subscribe to blacklist for domains

2022-08-14 Thread Martin Gregorie

On Sun, 2022-08-14 at 11:39 +1000, Noel Butler wrote:
> On 14/08/2022 02:38, Martin Gregorie wrote:
> 
> > 3) It would be rather trivial to return spam to sender with a
> > suitable
> 
> WTF, that has been a terrible idea since the 90s, given most spam is 
> spoofed, the end result of this will be your mail server getting the 
> poor reputation as source of backscatter and going into blacklists :)
> 
Agreed - I don't do that, but almost as long as I've been on this list
there have been advocates of it. As I said, I thought about it, but the
effort of writing a filter to determine what, if anything, should be
bounced or rejected has never seemed worthwhile for such a low-volume
mail user as myself.

Martin

Re: subscribe to blacklist for domains

2022-08-13 Thread Martin Gregorie

On Sat, 2022-08-13 at 14:05 -0400, joe a wrote:
> To add my comment, returning SPAM, assuming it even reaches the
> original sender, may serve only to assure them of the effectiveness of
> their campaign to reach valid addresses. In effect "helping" them.
> 
Agreed - I've occasionally thought about returning spam, but never found
a good reason to do it. 

Here's my reinforcement for doing nothing: a year or two back, I somehow
got added to some mailing list belonging to a Florida Hospital (as
useless a thing for somebody to do as can be imagined seeing I don't
live in the USA, let alone Florida, so probably a spammer infected their
mailing list or stole their list address). As usual, I added that
address to my personal blacklist: problem solved. 

However, I was feeling helpful that day, so also emailed their 'abuse'
address to let them know they had a problem. Didn't bounce, so they must
have got it. 

Did they do anything? Apparently not. I still get their spam, but at
least my system bins it automatically.

Martin

Re: subscribe to blacklist for domains

2022-08-13 Thread Martin Gregorie

On Sat, 2022-08-13 at 17:46 +0200, Reindl Harald wrote:
> and the main downside is that you can't REJECT clear spam and if "This
> puts spam into a holding area, where A cron job deletes it after a
> week" nobody knows in case of false positives
>
1) OF COURSE I have a daily cron job that reports any mail that is
   treated as spam and added to the quarantine area. There's no point in
   having a quarantine area without reporting what goes into it. Equally
   obviously, not being able to retrieve and read mis-identified spam
   would be a really stupid omission: that's what a 7 day quarantine
   period is for.

2) There's no mandatory need to REJECT spam. It has always been up to
   the recipient to decide whether to return it to the sender or not.

3) It would be rather trivial to return spam to sender with a suitable
   admonishment but I decided that it's not worth my time to write such 
   a discriminator and maintain yet another set of rules about what gets
   quarantined and what gets returned: better to quarantine it so
   it can be analysed with the mk 1 eyeball.

Martin

Re: subscribe to blacklist for domains

2022-08-13 Thread Martin Gregorie

On Sat, 2022-08-13 at 10:21 -0400, joe a wrote:
> This is a low volume system consisting of postfix, SA, clamav and 
> fetchmail.
> 
> The mailserver (postfix) is not exposed to the internet, mail traffic
> is sent to it by "fetchmail", which itself goes out to several
> providers where mail accounts reside.
> 
I've been running a very similar small volume mail system for some time
and am happy with it. My mail handling pipeline has these steps:

- I use getmail, running on my house server, rather than fetchmail, to
  fetch incoming mail. I don't read or write mail on the house server.
- All incoming mail is then passed to SA, which has quite a lot of 
  custom rules
- SA output is passed to a privately-written filter. 
  This puts spam into a holding area, where a cron job deletes it 
  after a week. 
  It calls the Postfix sendmail to pass ham to Postfix for delivery.
- I run Dovecot to pass incoming mail to systems on my local LAN.
- All mail sent on my LAN is sent via the Postfix instance on the house
  server, so all system messages get forwarded to the laptop I use as my
  main terminal device
- The house server sends outgoing mail directly to my ISP.
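
For illustration, the weekly clean-out of the holding area could look
something like this in Python. It is a sketch, not my actual cron job.
It assumes the quarantine is a directory with one file per message and
uses each file's modification time as its age.

```python
import time
from pathlib import Path

WEEK_SECONDS = 7 * 24 * 60 * 60


def purge_quarantine(directory, max_age=WEEK_SECONDS, now=None):
    """Delete quarantined files older than max_age; return their names."""
    now = time.time() if now is None else now
    removed = []
    for path in sorted(Path(directory).iterdir()):
        # Only plain files are considered; subdirectories are left alone.
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path.name)
    return removed
```

Run daily from cron, this keeps each message retrievable for a week
before it disappears.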

The main benefits of this setup are:
- since externally sourced mail is fetched from my mailbox on my ISP,
  my firewall can be set to refuse all external connection requests.
- all non-spam mail (both external and from other systems on my LAN)
  ends up in the mailbox on this 'ere laptop
- implementing a mail archive only needs a storage mechanism and
  configuring Postfix on the house server to BCC everything to it.

That's probably more than you wanted to know, but hopefully will show you
that this sort of mail handling system is both easy to set up and robust
in operation. It's been running with essentially no changes for over ten
years now.
   
Martin

Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-11 Thread Martin Gregorie

On Tue, 2022-05-10 at 18:19 -0600, Philip Prindeville wrote:
> I can't think of a single way to match each header, and then test for
> any of them not matching the pattern...
> 
> 
I had in mind a subrule that triggers on valid header names, combined
with a meta rule that inverts the subrule result. At least, that's what
I'd try as a starting point.

Martin

Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-10 Thread Martin Gregorie

On Tue, 2022-05-10 at 17:29 -0600, Philip Prindeville wrote:
> 
> You're correct that they're different in every message received.
> 
So write a rule that fires on any header name that *doesn't* match
anything in the list of legit headers as defined in the relevant RFCs.

Of course you may need to extend that list to include some extras, such
as headers injected by SA itself, as well as DMARC, DKIM, SPF etc.
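
To make the idea concrete, here is a Python sketch of the test (an SA
rule would express it differently). The KNOWN_HEADERS set is a short
illustrative subset, not the full list from the RFCs, and the function
name is an assumption.

```python
from email import message_from_string

# Illustrative subset only; a real list would be built from the RFCs
# and extended with whatever your own mail system injects.
KNOWN_HEADERS = {
    "from", "to", "cc", "bcc", "subject", "date", "message-id",
    "reply-to", "sender", "in-reply-to", "references", "received",
    "return-path", "mime-version", "content-type",
    "content-transfer-encoding", "dkim-signature", "received-spf",
    "authentication-results", "list-unsubscribe",
}


def unexpected_headers(raw_message, known=KNOWN_HEADERS):
    """Return header names that are neither known nor X- prefixed."""
    msg = message_from_string(raw_message)
    found = []
    for name in msg.keys():
        lowered = name.lower()
        if lowered.startswith("x-") or lowered in known:
            continue
        if name not in found:
            found.append(name)
    return found
```

A non-empty result corresponds to the rule firing.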

Martin

Re: Running spamassassin only with specific rules

2022-04-22 Thread Martin Gregorie

On Fri, 2022-04-22 at 09:20 -0400, Michael Grant wrote:
> Is there some way to run spamassassin with only a specific set of
> rules and scores?
> 
If I'm trying to target specific sorts of spam I write rules that sort
of follow these guidelines:

- their rule names all start with my initials followed by an underscore,
  followed by something specific, e.g. XXX_FAKE_INVOICE. Any subrules
  append a number to this name: XXX_FAKE_INVOICE2

- if a subrule will always be part of a more complex rule, i.e. linked
  in with a meta-rule, it will initially be named as described and only
  when debugged and working will its name be changed to, say,
  __XXX_FAKE_INVOICE2 
  to stop subrule names from cluttering the header area of processed
  messages.

- these rules don't reference any standard rules 

The result of the above is that it doesn't matter whether other rules
also run because I can see exactly which part(s) of my rules are firing
and know they won't be affected by any other rules because there are no
references to any standard rules or (usually) to my other self-developed
rules: naming rules, if done carefully, is as good a way as any to
isolate your own rules from the standard rule set and/or any others
you've found or been given. 

I do all rule development on a separate machine, which also has SA
installed. This is configured so it only runs when triggered by a shell
script. This starts SA, pipes a set of test messages into it, and stops
SA when all test messages have been run. SA's output is sent to stdout
so it can be inspected using 'less', filtered with grep to only show
output from my rules or however else I want to handle it to make it more
readable.

When I'm happy with a new rule it gets put live by ftping the .cf file
containing it to the live machine's repository and restarting the live
SA daemon to pick up the new rule(s). Last, but not least, all my
private rules are put under version control in a git repository.

HTH
Martin

Re: Linting of local.cf

2022-04-16 Thread Martin Gregorie

On Sat, 2022-04-16 at 05:30 +0200, Benny Pedersen wrote:
> On 2022-04-16 00:35, J Doe wrote:
> 
> > That's an interesting point.  I guess the use case I was thinking of
> > is if I added an address or domain for a particularly egregious
> > spammer, but made a typo in the SA syntax, I would want to know
> > about it on load so that it didn't continue to slip through.
> > 
> > On the other hand, as Reindl notes, I can adjust the startup script
> > myself or have a wrapper for it.
> 
Consider doing something similar to what I do: 

- I don't test SA-issued rules updates because they've been verified
  before being issued and I've never found errors in them.

- I have a second 'development' SA install that's on a different computer
  to my main MTA and associated SA. This computer also holds the master
  copy of my local rules and a collection of spam that is used as test
  data for these rules. Whenever local rules are added or modified they
  are first checked to be error free and then run against the relevant
  test data to see if they do what they're supposed to do. If, and only
  if, they pass both lint and functional checks, my local rule set is
  uploaded to my live SA installation. These various operations are
  carried out by bash scripts. Additionally, I have a script that can
  run my local rule set against my entire library of test messages.

Martin

Re: sub-test syntax

2022-04-04 Thread Martin Gregorie

On Mon, 2022-04-04 at 01:45 +0200, Matija Nalis wrote:
> On Mon, Apr 04, 2022 at 12:19:23AM +0100, Martin Gregorie wrote:
> > For instance, I whitelist any email sender who I've previously sent
> > mail
> > to. To do this I maintain am email archive held in a PostgreSQL 
> > database and wrote an SA plugin that searches the archive for any
> > message(s) I've previously sent to the sender of the message being
> > checked: if I've sent mail to them they get whitelisted.    
> 
> That sounds interesting, is it published somewhere?
> 
https://www.libelle-systems.com/mailarchive/

The mail archive schema may suit you or not, but that's not very
important since the SA plugin uses an SQL view to check whether I've
ever sent mail to the sender of a message I've received: if you don't
want a mail archive you could use a single table database with
the table containing same columns and indices as my SQL View.
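
To show what is meant by a view here, this is a made-up example using
SQLite from Python. The message table, its columns, the view name
sent_to and the address me@example.net are all assumptions for
illustration and are not the schema published at the link above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE message (
    id        INTEGER PRIMARY KEY,
    sender    TEXT NOT NULL,
    recipient TEXT NOT NULL
);
-- The view exposes only the addresses that I have written to.
CREATE VIEW sent_to AS
    SELECT DISTINCT lower(recipient) AS address
    FROM message
    WHERE lower(sender) = 'me@example.net';
"""


def make_archive():
    """Create an empty in-memory archive with the view defined."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def have_written_to(conn, address):
    """True if the archive holds mail sent to this address."""
    row = conn.execute(
        "SELECT 1 FROM sent_to WHERE address = ?", (address.lower(),)
    ).fetchone()
    return row is not None
```

Replacing the view with a plain table of the same name and column would
leave have_written_to() unchanged, which is the point of hiding the
archive behind a view.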

More important points: 
- I use Postfix's BCC facility to send copies of every mail I send
  or receive to a mail queue and a daily cron job to load the contents
  of this queue into the database.

- The loader program is written in Java and will need modification (read
  simplification) if you don't want a mail archive. In this case I
  assume that you'd replace the SQL View with an equivalent table as
  described above and rewrite the MAloader program so that it only adds
  unrecorded outbound mail addresses to the new table.
 
- A couple of interactive programs are also written in Java. These:
  - search the database and optionally send copies of archived mail to
    a nominated local MUA
  - remove unwanted mail from the database

  These are not needed if you don't want a mail archive.

- I retrieve mail from my inbox at my ISP using getmail plus a small
  C program which passes mail to SA, accepts the mail as returned by SA
  and uses the spam score to decide whether to quarantine it or pass it
  to Postfix for delivery. You don't need to use getmail if you let your
  local MTA retrieve mail from your ISP, but you'd still want to run a
  local MTA, preferably Postfix, so it can pass incoming mail through SA,
  quarantine or discard spam and you can use its BCC facility to send
  copies of incoming and outgoing mail to the loader's input queue.

- Postfix delivers outgoing mail directly to my ISP's outbox.
 
Martin

Re: sub-test syntax

2022-04-03 Thread Martin Gregorie

On Mon, 2022-04-04 at 00:13 +0200, Matija Nalis wrote:
> On Sun, Apr 03, 2022 at 10:06:51AM +0100, Niamh Holding wrote:
> > Hello Matija,
> > Saturday, April 2, 2022, 7:12:42 PM, you wrote:
> > 
> > MN> grep -r check_rbl_sub /var/lib/spamassassin
> > MN> for examples of what's possible and how (e.g. 25_dnswl.cf)
> > 
> > Looking there I see nothing equivalent to alternates like in
> > ordinary regexes (2|6) for 2 or 6
> 
> It shows how command must look to be able to correctly use regexes
> there (instead of plain string).
> 
> "grep" command above should've returned more examples for you...
> 
Using 'grep -P ' is better because it forces grep to use Perl regex
notation - SA is written in Perl so uses Perl regular expression (regex)
syntax.

If you want to write your own SA rules it's also a good idea to have a
copy of the 'Camel Book' ("Programming Perl" by Wall, Christiansen and
Orwant, pub. O'Reilly) because SA is written in Perl. This means it uses
the Perl dialect of regular expressions, and the book will also help a
lot if/when you want to write your own SA plugins.

For instance, I whitelist any email sender who I've previously sent mail
to. To do this I maintain an email archive held in a PostgreSQL 
database and wrote an SA plugin that searches the archive for any
message(s) I've previously sent to the sender of the message being
checked: if I've sent mail to them they get whitelisted.

> Then you can use similar principle to look for any other things you
> want to accomplish in the future, simply by looking how others have
> used it. That's why I provided it that way instead of simple
> copy/pasting the
> final result.
> 
Good advice.

Martin

Re: spam declared mail - contentless - lost?

2022-04-02 Thread Martin Gregorie

On Sat, 2022-04-02 at 16:42 +0200, mau...@gmx.ch wrote:
> Hello
> 
> i have mails that are signed as [SPAM] from Spamassassin 3.4.6, please
> it's possible to catch the input from this mail, or it's this lost?
> 
SpamAssassin [SA] only adds headers to the message. One of these is
always the X-Spam-Status header which contains the spam score, the score
required to mark a message as spam and a list of the SA rules that fired
on that message.

What happens to the message once SA has added this header depends
entirely on your mail subsystem. Messages with a score less than the
'required' score should be delivered to the 'To' address.

Messages with a score equal to or greater than the 'required' score are
not delivered to the 'To' address. What is done with them depends
entirely on your mail delivery subsystem. Usual things done with them
are (best ideas first):
- Hold them in quarantine for 'x' days before deleting them
- Delete them immediately
- Return them to abuse@senderdomainname with a subject of 
  "Spam from senderdomain"
- Return them to sender, with a subject of "Rejected spam"
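
As an illustration of what the mail subsystem has to work with, this
Python sketch pulls the score, the required score and the rule list out
of an X-Spam-Status value. The function names are assumptions, and it
expects the usual 'Yes/No, score=... required=... tests=...' layout.

```python
import re

STATUS_RE = re.compile(
    r"^(?:Yes|No),\s+score=(?P<score>-?\d+(?:\.\d+)?)"
    r"\s+required=(?P<required>-?\d+(?:\.\d+)?)"
    r"(?:\s+tests=(?P<tests>.*?))?(?=\s+\w+=|\s*$)"
)


def parse_status(value):
    """Parse the value part of an X-Spam-Status header."""
    # Unfold the header: continuation lines start with whitespace.
    unfolded = re.sub(r"\r?\n[ \t]+", " ", value.strip())
    match = STATUS_RE.match(unfolded)
    if match is None:
        raise ValueError("unrecognised X-Spam-Status value")
    names = (match.group("tests") or "").split(",")
    return {
        "score": float(match.group("score")),
        "required": float(match.group("required")),
        "tests": [n.strip() for n in names
                  if n.strip() and n.strip() != "none"],
    }


def is_spam(status):
    """Spam means the score is equal to or greater than 'required'."""
    return status["score"] >= status["required"]
```

What to do when is_spam() is true (quarantine, delete, and so on) is
then a decision for the delivery system, not for SA.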
  
> At this time me setup are so, that all mails that are declared as
> Spam, are contentless.
> 
So, what *is* in a contentless message? Is it blank, just mail headers,
or what?

Since you haven't described what you expect your mail system to do with
a 'contentless message' or how it recognises which messages are
contentless, nobody on this list can say what, if anything, is wrong
with your mail system. 

Martin

Re: using spamassassin to classify spam

2022-03-25 Thread Martin Gregorie

On Thu, 2022-03-24 at 18:34 -0600, Grant Taylor wrote:
> On 3/24/22 5:00 PM, Michael Grant wrote:
> > List-Unsubscribe: 
> > 
> > 
> > I want to extract the mumble.aidemxwzlwt.bwbibibi.edu and run it 
> > through AskDNS and if I get an NXDOMAIN, I want to score it.
> 
> Remember, there are historic mechanisms for an MX for parent domains
> to 
> handle child domains even if the child domain in question doesn't have
> it's own MX record.
> 
> I don't recall the current state of support for this, so don't rely on
> it without testing it.
> 
> > Is it possible to do this within a cf file?
> 
Yes. You'll need to write a Perl plugin and a rule to trigger it. The
rule should extract the domain name from the 'mailto:' string and pass
it to the Perl plugin, which in turn calls AskDNS with the string as a
parameter and either returns a positive score or zero depending on
whether AskDNS returned NXDOMAIN or not.
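
The logic, as opposed to the plugin itself, can be sketched in Python.
This is an approximation under stated assumptions: the function names
and the penalty value are made up, and the default resolver only
reports whether the name has an address record, which is not quite the
same thing as distinguishing NXDOMAIN from other failures.

```python
import re
import socket

# Capture the domain part of each <mailto:...> entry in the header.
MAILTO_RE = re.compile(r"<mailto:[^@>\s]+@([^>?\s]+)", re.IGNORECASE)


def unsubscribe_domains(header_value):
    """Return the domains of all mailto: targets in List-Unsubscribe."""
    return [d.lower().rstrip(".") for d in MAILTO_RE.findall(header_value)]


def resolves(domain):
    """True if the name has an address record (A/AAAA only)."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False


def score_unsubscribe(header_value, resolver=resolves, penalty=1.0):
    """Return penalty if any mailto domain fails to resolve, else 0."""
    domains = unsubscribe_domains(header_value)
    return penalty if any(not resolver(d) for d in domains) else 0.0
```

A real implementation would want a proper DNS library so that NXDOMAIN
can be told apart from a domain that exists but has no address record,
such as one with only an MX.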

It's all simple enough and requires only a few lines of Perl: I haven't
needed a plugin to do what you want, but did write one that searches a
PostgreSQL database and whitelists e-mail from anybody that I've
previously sent mail to. 

Get a copy of the 'Camel' book if you don't have one ("Programming Perl"
by Wall, Christiansen & Orwant, pub: O'Reilly). 

The requirements for writing plugins are on the SA website. 

Martin

Re: Up tick in missed SPAM from co domain

2022-02-03 Thread Martin Gregorie

On Thu, 2022-02-03 at 10:50 -0500, joea- lists wrote:
> SA version 3.4.5
> 
> Since yesterday 2/2/22 (gasp!) . . . I've noticed an up tick in missed
> SPAM from .co domain.  Though obvious SPAM
> weight loss, phish, "personals", they are scoring rather low.   
> 
> Added a custom rule for that domain, which should deal with it, but
> wondering if I missed some changes that 
> might cause this?
> 
IMO that's too specific: it will deal with spam from that address, but
each new address needs its own rule. I only use that type of rule to
ding endless sales messages from companies that I bought one item from
and who are unlikely to ever sell me anything else. 

IMO it's worth scanning through spam looking for odd phrases or
spellings and making rules to add points for these features. Done
carefully, you can end up with rules that trap that type of spam no
matter where it comes from, e.g. pron, "girls looking for men", banking
scams, etc.

Martin


> joe a.
>

Re: Managing long welcome_senders list

2021-12-03 Thread Martin Gregorie

For Dominic Raferd:

Another approach also works for me: if you can automatically capture the
addresses you've sent mail to, these addresses make a perfect, self-
maintaining whitelist.
 
If you're running Postfix then you can use its automatic BCC option to
feed a copy of all mail, including outbound messages, to whatever process
you use to build a list of your mail recipients. Other MTAs probably
have a similar ability, but I don't use them, so can't comment further. 

A database makes a convenient place to keep your correspondent list
because discarding duplicate addresses then becomes a built-in facility
and writing an SA plugin plus associated rule to interrogate the list
and add negative points to the message is simple.

My correspondent list is part of my mail archive, which is held as a
PostgreSQL database. The associated functions I use to maintain and
interrogate the correspondent list are:

a) a BCC directive added to the Postfix configuration or the equivalent
   if you use a different MTA
b) a Java application run each night to load the previous day's mail,
   both received and sent, into the database
c) an SQL view that selects any message(s) in the archive that were sent
   to the address being checked
d) a Perl plugin to execute the view using the message's sender as its
   search key and return TRUE if any messages were selected
e) an SA rule to trigger the Perl plugin and add a negative score
   if the Perl plugin returns TRUE

You'd need code to implement all five functions, but if you store your
correspondent address list as a sorted text file, then all the code
would be much simplified: 

- 'b' could be a Perl or awk script run as an additional 'logwatch'
  report that scans the previous day's part of the mail log and adds any
  new addresses to the sorted list

- 'c' and 'd' could be combined as a single Perl plugin. 
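
As a rough illustration of the sorted-text-file variant, the Python sketch below collects recipient addresses, keeps them in a sorted, duplicate-free list and looks a sender up with a binary search. It is only a sketch under my own assumptions: it reads addresses from copies of outbound messages rather than from the mail log, and the function names are invented for the example.

```python
import bisect
from email import message_from_string
from email.utils import getaddresses


def recipients(raw_message):
    """Return the lower-cased To/Cc/Bcc addresses of one outbound message."""
    msg = message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    return [addr.lower() for _name, addr in getaddresses(fields) if addr]


def add_addresses(sorted_list, addresses):
    """Insert new addresses into the sorted list, skipping duplicates."""
    for addr in addresses:
        i = bisect.bisect_left(sorted_list, addr)
        if i == len(sorted_list) or sorted_list[i] != addr:
            sorted_list.insert(i, addr)
    return sorted_list


def is_correspondent(sorted_list, sender):
    """Binary-search the sorted list for the sender's address."""
    sender = sender.lower()
    i = bisect.bisect_left(sorted_list, sender)
    return i < len(sorted_list) and sorted_list[i] == sender
```

Writing the list back out one address per line gives the sorted text file; the lookup is what a combined 'c' and 'd' plugin would do.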

Martin

Re: Managing long welcome_senders list

2021-12-02 Thread Martin Gregorie

On Thu, 2021-12-02 at 13:42 +, Dominic Raferd wrote:
> I have a score-reducing algorithm for SA based on known 'good' senders. 
>  From a simple one-address-per-line file (which can easily be manually
> or automatically edited) is built a local_welcoming.cf file which is 
> used by SA - with lines like this:
> 
> score LOCAL_WELCOMING_4 -4
> header LOCAL_WELCOMING_4 From =~ 
> /(\@myfriend\.com|jennifer_smith\@btinternet\.com|\
> fred321@gmail\.com)>?\s*$/i
> 
I ran into this problem quite some time ago and wrote 'portmanteau', a
tool easing the maintenance of Spamassassin rules which consist of very
large lists of alternates. It does this by storing the rule definition
in a form that's much easier to edit than it would be if it was written
as a single very long line or split into a set of subrules plus a meta
rule to combine them. It is a bash shell script that uses a gawk script
to do the heavy lifting.

The elements of each SA rule it constructs are held in an easily edited
'.def' file which, among other things, expects each regex in a list of
alternates to be on a separate line. Normally, you'd hold the set of
rule definitions in the same directory. Running the 'portmanteau' script
constructs one valid SA rule, which may be built from subrules and
metas, from each .def file in the directory and writes the complete set
of generated rules to a single .cf file. The rule building process is
fast enough that it's not worthwhile building them separately.
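
To show the general idea (not the actual tool, which is bash plus gawk and has its own '.def' format), here is a small Python sketch that joins a one-regex-per-line list of alternates into a single header rule of the kind quoted above. The function name and the treatment of '#' lines as comments are assumptions made for the example.

```python
def build_header_rule(name, score, header, alternates, flags="i"):
    """Join one-regex-per-line alternates into a single SA header rule."""
    # Drop blank lines and '#' comment lines so the list stays easy to annotate.
    parts = [line.strip() for line in alternates
             if line.strip() and not line.strip().startswith("#")]
    if not parts:
        raise ValueError("no alternates supplied for " + name)
    pattern = "|".join(parts)
    return [
        "header {} {} =~ /({})/{}".format(name, header, pattern, flags),
        "score {} {}".format(name, score),
    ]
```

Each returned string is one line for the generated .cf file, so the editable list can be as long as you like while the rule itself stays a single line.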

The rule constructor is an awk script, so there's nothing exotic in it
and no external dependencies, always assuming you have awk or gawk
installed.

I use 'portmanteau' rules for everything from maintaining personal
blacklists to constructing complex rules that do things like recognising
toxic attachment types or sets of phrases that, if found in several
headers and/or body text, together identify specific spam types, and
score the message accordingly.

You can find the 'portmanteau' tool here:
https://www.libelle-systems.com/free/portmanteau/portmanteau.tgz

Martin

Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-16 Thread Martin Gregorie

On Tue, 2021-11-16 at 08:33 -0500, Bill Cole wrote:
> 
> Worth noting: locate & updatedb aren't always installed.
> 
Fair comment: they're a standard part of Fedora. IIRC they are also part
of the RaspberryPi OS distro, so are likely to be included in Debian and
most of its clones.

But: how many "intro to Linux" books mention 'updatedb' or 'locate'?
They're not mentioned in 'UNIX in a Nutshell' (so maybe not in 'Linux in
a Nutshell' either) or in the 'RaspberryPi User Guide', though they are
in Michael Kopfler's 1999 Linux sysadmin book.  

You don't see them mentioned much in newsgroups either, so I'm left
wondering how many Linux users who either taught themselves to use Linux
or picked it up at work from a colleague would ever know about 'locate'
and 'updatedb'?  

For that matter how many know about 'apropos'? And, even if they do,
they may not discover 'locate' because 'apropos search' doesn't find
either 'updatedb' or 'locate'. You have to enter 'apropos find' to
discover that 'locate' exists, and even then you could get side tracked
into trying to use the much more complex 'find' utility.  

Martin

Re: MIME_BASE64_TEXT only on us-ascii

2021-11-16 Thread Martin Gregorie

On Tue, 2021-11-16 at 11:32 +0100, Philipp Ewald wrote:
> This is correct. But why is us-ascii requeired for this rule? Are
> spammer only in US?
> 
No, it's because the base character set for e-mail bodies is USASCII. 

Base64 encoding is a way of making sure that attachments using other
charsets (UTF8, and those using 16 bit encoding) will look just like
USASCII attachments to mail-handling programs, etc. and not cause those
programs to reject the mail message. As far as I know it has no
other common, legitimate use, but it does have the side effect of making
anything that's base64-encoded unreadable.

So, you can see that the ONLY effect of using base64 encoding on an
attachment containing usascii text is to make it unreadable. This is why
spammers use it: they've worked out that SA will spot and score
malicious URLs, shorteners, etc. So, some spammers think that using
base64 encoding will hide those bad URLs from SA, which is quite true.
However their tiny minds don't see that using base64 encoding on a
usascii attachment is a fairly reliable spam indicator all by itself.
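
The indicator described above is easy to demonstrate outside SA. This Python sketch walks a message's MIME parts and reports any text part that declares (or defaults to) us-ascii yet is base64-encoded. It is my own illustration of the idea, not how the MIME_BASE64_TEXT rule is actually implemented, and the function name is made up.

```python
from email import message_from_string


def base64_ascii_text_parts(raw_message):
    """Return content types of text parts that are us-ascii yet base64-encoded."""
    msg = message_from_string(raw_message)
    hits = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        encoding = (part.get("Content-Transfer-Encoding") or "").strip().lower()
        # get_content_charset() returns None when no charset is declared;
        # text parts default to us-ascii in that case.
        charset = (part.get_content_charset() or "us-ascii").lower()
        if encoding == "base64" and charset == "us-ascii":
            hits.append(part.get_content_type())
    return hits
```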

Martin

Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-16 Thread Martin Gregorie

On Mon, 2021-11-15 at 17:12 -0700, Philip Prindeville wrote:
> 
> 
> > On Nov 15, 2021, at 5:06 PM, Greg Troxel  wrote:
> > 
> > 
> > Philip Prindeville  writes:
> > 
> > > Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
> > > 
> > > Why can't I even find the rule?
> > 
Try "locate txrep"

On my Fedora system 'locate' says that TxRep is a plugin, enabled by
installing:  /usr/share/spamassassin/60_txrep.cf

and, using "locate" again, that the plugin's manpage is  
/usr/share/man/man3/Mail::SpamAssassin::Plugin::TxRep.3pm.gz

So, "man 3 Mail::SpamAssassin::Plugin::TxRep" describes the TxRep plugin
and 'locate' says it is installed as
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm

Of course, other Linux distros may put it somewhere else, so use
'locate' and, if it doesn't find 'txrep', run 'sudo updatedb' and try
again. 

Not trying to teach you to suck eggs, but, incredible as it may sound,
there are still some people who don't know about the 'locate' command.
 
Martin

>

Re: Fw: spam from gmail.com

2021-11-08 Thread Martin Gregorie

On Mon, 2021-11-08 at 18:27 +, Rupert Gallagher wrote:
> Spammers are using gmail.com. Congratulations to Google for their fine
> work...
> 
The more 'enterprising' ones are apparently sex come-ons, but contain
links to known-malicious URL shorteners.

Martin

Re: Decoding Google URL redirections and check VS URI Blacklists

2021-11-02 Thread Martin Gregorie

On Tue, 2021-11-02 at 09:52 +0100, Benoit Panizzon wrote:
> Hi SA Community
> 
You can find out quite a lot about a spamming site with a few common
commandline tools:

- 'ping' tells you if the hostname part of the URL is valid
- 'host hostname' should get the sender's IP
- 'host ip', IOW a reverse host lookup, tells you if the first
  sender address was an alias
- 'lynx hostname' lets you see if there's a website there, which is
  often useful (when prompted to accept cookies hit 'V' to never
  accept them). This is IMO safer than using Firefox etc. because
  lynx shows all pages as plaintext.

Generally using those in the sequence I've listed them tells me enough
to decide whether to treat the site as a spam source.
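
If you'd rather script the first few checks than type them, the forward and reverse lookups can be approximated in a few lines of Python. This is only a sketch of the 'host hostname' / 'host ip' steps using the standard socket module; the function names are mine, and it makes no attempt to replace ping or lynx.

```python
import socket
from urllib.parse import urlsplit


def hostname_of(url):
    """Return the host part of a URL, or None if there is none."""
    return urlsplit(url).hostname


def forward_and_reverse(hostname):
    """Mimic 'host hostname' followed by 'host ip'.

    Returns (ip, reverse_name); either may be None if a lookup fails."""
    try:
        ip = socket.gethostbyname(hostname)
    except OSError:
        return None, None
    try:
        reverse_name = socket.gethostbyaddr(ip)[0]
    except OSError:
        reverse_name = None
    return ip, reverse_name
```

If the reverse name differs from the hostname you started with, the original name was probably an alias.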

In this case, either feed that URL to your favourite blacklist or write
a local rule that fires if that url you spotted is in body text.

I've recently started to see regular Google gmail spam. This looks like
boring sex spam, but that's probably a disguise since it contains
attachments with suspicious (i.e. executable) file types. Fortunately, a
more complex rule, built from a set of subrules, that I wrote years ago
to trap mail with this sort of attachment is catching them now.

Martin

Re: Correct KAM.cf location?

2021-10-20 Thread Martin Gregorie

On Wed, 2021-10-20 at 11:50 -0500, Jerry Malcolm wrote:
> is working as it should.  I'm pretty confident I've got the basic SA 
> function working.  But along with the bayes issue from a couple of posts
> back, I can't seem to make the KAM.cf file get involved.  In previous
> installations, I would see a lot of KAM rules showing up in the spam
> reports on emails.  I also have written some rules on my own and put I'm
> not seeing any of my rules get hit.
> 
Have you tried starting spamd with the debugging option 
( -D or --debug ) set?

The output will be quite large, but it will at least show you which
configuration files are being opened and, in the case of any that are
not being opened, where it's looking for them.

> I created a rule that triggers if the subject or sender is my company 
> name.  I sent an email from an outside email address and specifically 
> added this name to the subject (and it's also in my sender email 
> address).  Nothing in the spam report.
>
That sounds like the .cf file containing your rule is not being read.
Debugging output should tell you why.

FWIW I found it helpful to have a secondary copy of SA installed on
another system but using the same set of file names etc as the
'production' version. I run that by starting spamd and feeding it test
messages by running something like "spamc

Re: Spamc - connection refused

2021-09-28 Thread Martin Gregorie

On Tue, 2021-09-28 at 15:30 +0200, mau...@gmx.ch wrote:
> Hello
> 
> never found the solutions for this..
> 
The error messages aren't a lot of use without also knowing:

- what arguments are you using on the spamc command line?
- where is the spamd instance you're trying to connect to, 
  i.e. is it on localhost or somewhere else??
- what port is spamd listening on?

I run spamc and spamd on the same machine (i.e. spamd is on localhost)
and default the spamc arguments that describe how it connects to spamd,
so presumably you're doing something different.
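
A quick way to answer the last two questions is to check whether anything is accepting connections where spamc expects spamd to be. The Python sketch below assumes the usual defaults of localhost and port 783; adjust both if your spamd is started with different options. The function name is just for illustration.

```python
import socket


def spamd_reachable(host="127.0.0.1", port=783, timeout=3.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A False result with 'connection refused' usually means spamd isn't running, or is listening on a different address or port than the one spamc is trying.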

Martin

Re: Disabling autolearn on given rule

2021-09-22 Thread Martin Gregorie

On Tue, 2021-09-21 at 18:57 -0700, Loren Wilton wrote:
> 
> Well, from the few I've seen, they all seem to have a relatively
> constant structure. Someone pointed you to a plugin that is at least
> dealing in this having a better suggestion.
> 
> While I wrote a little Perl a decade ago I've forgotten many of the 
> pecularities, but there are some good web sites out there, and there
> is one of the animal books on the subject. Perl is a bit pecular in
> syntax and function compared to the C/C++ I did much of my career, but
> I didn't have much trouble picking up enough to make some local SA
> hacks long ago, so if you can program in most anything it probably
> won't be too much trouble.
> 
What Loren said. The book you need is "The Camel Book": it's an O'Reilly
publication, "Programming Perl" by Larry Wall, Tom Christiansen & Jon
Orwant - my copy is the 3rd edition, dated 2000, so there are probably
more recent editions. It's well written and organised and, equally
important, has a whole chapter on Perl regular expressions, which are
not the same as, e.g., C or Java regexes.

I also know very little Perl, but this book, together with an example SA
plugin, were enough to let me write an SA plugin for doing lookups on a
PostgreSQL database containing my mail archive (I use this plugin to
whitelist mail from anywhere I've previously sent mail to).

Martin

Re: Score for certain spam

2021-08-17 Thread Martin Gregorie

On Tue, 2021-08-17 at 18:03 +0200, David Bürgin wrote:
> In your experience, what is a good ‘certain spam’ threshold? By that I
> mean the score above which messages are virtually always spam, no
> false positives.
> 
I pushed it one notch, to 6.0, but:

(a) I've accumulated a fair collection of private rules which are
specific to my mail stream

(b) I have a private mail archive, stored in a PostgreSQL database,
and an SA plugin which whitelists any sender who is recorded in my
archive as somebody that I've previously sent mail to.

(c) Spam is quarantined as it arrives.
Ham is delivered via Postfix + Dovecot and also queued for archiving

(d) spam gets quarantined for 7 days before being discarded

(e) An overnight cronjob loads ham that's queued for archiving into the
mail archive. It also expires & deletes week-old quarantined spam,
and I added a report to logwatch that lists new spam, so I know it's
arrived and can be retrieved from quarantine if I decide I should
see it.

I've listed these steps and associated conditions in case any are useful
to you. This has all been up and running since 2007, so it's tolerably
well tested.
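
The expiry step in (d) and (e) is the easiest piece to sketch. The Python below deletes files older than seven days from a quarantine directory; it assumes one file per quarantined message and uses the file's modification time as its arrival time, neither of which is necessarily how my own cronjob does it.

```python
import os
import time


def expire_quarantine(directory, max_age_days=7, now=None):
    """Delete regular files older than max_age_days; return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    # Take a snapshot of the directory first so deleting doesn't disturb the scan.
    for entry in list(os.scandir(directory)):
        if entry.is_file(follow_symlinks=False) and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

The returned list is handy for a log line or a logwatch report showing what was discarded.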

Martin

Re: Question about whitelisting of naadac.org

2021-08-12 Thread Martin Gregorie

On Wed, 2021-08-11 at 20:43 -0700, John Hardin wrote:
> As Kenneth said, contact Spamhaus regarding why that domain is listed.
> 
> 
I took a look at it with a text-mode web browser, Lynx, that's too simple
to try to process nasties and with all cookies disabled. It looked more
than slightly suspect to me - AFAICT entries in its top-level menu link
only to a recursive chain of identical top-level menus.

It reminded me of nothing so much as the mazes in Colossal Cavern and
their 'little twisty passages which all look the same' - and built the
same way too!

My bottom line take - a useless URL that deserves to be listed.

Martin

Re: CHAOS: v1.2.2: Of Documentation

2021-07-23 Thread Martin Gregorie

On Fri, 2021-07-23 at 19:49 +1000, Noel Butler wrote:
> I've still yet to see a list post explaining what this thing does
> so no he has not answered all questions about it, the most common sense
> thing of all time is if you advertise your wares, you at least tell
> people WTF it does, you don't send them to some web site to find out
> (which as some posters have indicated apparently does not even tell
> you).
> 

Yes, that is the same problem I have.

I understand that CHAOS generates rules and has fancy ways of setting
their scores but I've yet to understand:

- why it was developed in the first place, i.e. what problem(s) does it
  solve that manually written rules fail to address?

- what are its design principles?

- what do its generated rules do that can't be done with manually
  written rules?

- how, if at all, does it test the rules it writes and what does it do
  with rules that either don't work as intended or hit ham instead of
  spam? 

- does it accept human input about what is spam and what is ham and if
  so, how is this input provided, maintained, and stored for future
  reference? 

  IOW: 
  - is it working entirely from messages found in the incoming mail
    stream?
  - what about the outbound mail stream?
  - does it use mail archives or spam collections to test the rules it
    generates?

Martin

Re: number in sender name

2021-07-11 Thread Martin Gregorie

You're right: my copy of the Camel book is the Third edition, from 2000.
The cover has a corner tagged with 'revised and updated'. The 4th
edition was released in 2012. 

It covers Perl 5.14, with the title changed to 
"Programming Perl: Unmatched power for text processing and scripting" 

and also covers:

New keywords and syntax
I/O layers and encodings
New backslash escapes
Unicode 6.0
Unicode grapheme clusters and properties
Named captures in regexes
Recursive and grammatical patterns
Expanded coverage of CPAN
Current best practices

so is probably worth having if you write a lot of Perl code. Disclosure:
I write mostly C and Java with a little bash and awk on the side, so
value having a comprehensive book like the Camel to hand if I need it.

BTW, the online regex development page URLs I gave were working as
expected at the time I wrote that note.

Martin


On Sun, 2021-07-11 at 11:17 -0400, Kevin A. McGrail wrote:
> 
> Martin Gregorie wrote:
> > If you have a copy of "The Camel Book", otherwise known as
> > "Programming
> > Perl" by Larry Wall, Tom Christiansen & John Orwant"  pub. O'reilly,
> > or
> > know somebody who has a copy, have a read of Chapter 5 'Pattern
> > Matching' which contains about the clearest explanation of how regexes
> > work and how to write them that I've seen anywhere - and better yet,
> > it
> > describes Perl regexes, which is what SA uses.
> 
> Martin,
> 
> My version of The Camel, admittedly the 2nd edition from 1996, has 
> chapter 5 as Packages, Modules, and Object Classes.  Section 2.4, 
> however, is a detailed section on Regular Expressions.  Is that the one 
> you meant because I'd like to reference it correctly along with your 
> other great advice.  Is it Chapter 5 in another edition?
> 
> Regards,
> KAM
>

Re: number in sender name

2021-07-10 Thread Martin Gregorie

Not a direct reply, but:

If you have a copy of "The Camel Book", otherwise known as "Programming
Perl" by Larry Wall, Tom Christiansen & Jon Orwant, pub. O'Reilly, or
know somebody who has a copy, have a read of Chapter 5 'Pattern
Matching' which contains about the clearest explanation of how regexes
work and how to write them that I've seen anywhere - and better yet, it
describes Perl regexes, which is what SA uses. 

FWIW there are also C and Java regex dialects, which are not at all
useful in SA rules.

https://www.regular-expressions.info/ is also useful.

It's worth knowing about these too:
https://www.regexplanet.com/advanced/java/index.html
https://regex101.com/

They are both pages for testing regexes: both let you type in a regex
plus test strings to check whether the regex does what you expect - or
not!

Martin

Re: Office phish

2021-07-05 Thread Martin Gregorie

On Tue, 2021-07-06 at 00:16 +0200, Benny Pedersen wrote:
> On 2021-07-05 23:45, RW wrote:
> 
> > > 
> 
> https://www.w3resource.com/javascript/introduction/html-documents.php
> 
> embeeded javascript is possible
>
Yes, but it may well depend on how the e-mail was assembled.

A message Cut from a web page formatted with both
.. and ... formatting and displayed using Brave
to construct a new e-mail written, sent and received using Evolution
with the message composer set to use plaintext gave a single block of
body text that didn't contain any HTML formatting.

However, with composer preferences set to use HTML formatting, Evolution
restructured the HTML that was cut and pasted in as an attachment with 

Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="attachment.html"
Content-Type: text/html; charset="utf-8"; name="attachment.html"

as a preamble, with all the HTML formatting pretty much rewritten from
scratch and formatted as a block rather than keeping the original page's
indent structure. The plaintext section again had all HTML formatting
stripped out.

So, it would be interesting to know how similar the output of other
browser/MUA combos is to what Brave+Evolution generates. I would not be
surprised if the e-mail content has a close dependence on what MUA is
used and how its composer preferences are set - and possibly which
browser is being used as well.

Martin

Re: Office phish

2021-07-03 Thread Martin Gregorie

On Fri, 2021-07-02 at 21:25 -0400, Jared Hall wrote:
> I never would've caught this except it hit an old header rule I use
> for certain Hotmail Porn detection.
> 
> Content-Type: multipart/mixed;
> boundary="_c23d8b80-2b40-49d4-8897-08b0026dddfb_"
> 
Thanks for that: added it to a private rule I use to test for
potentially dodgy extension types.

Martin

Re: Process of domain submission for inclusion in 60_whitelist_auth.cf

2021-07-01 Thread Martin Gregorie

On Thu, 2021-07-01 at 16:32 -0600, @lbutlr wrote:
> Sending spam, viruses, ransom demands, and/or spearfishing from
> "known" addresses is extremely common, so how effective that is
> depends a lot on the sort of mail and the amount of mail you receive.
> 
Agreed, but I'm not silly enough to have the whitelisting check trigger
shortcircuiting or use a whitelist score high enough to override a bunch
of spam hits: I wouldn't be using it in that case. 

> It is very common for me to get spam mail that appears to be from
> known addresses, mostly clients and the less sophisticated family
> members (computer sophisticated, at least) who have the bad habit of
> sharing their contacts with whatever random app they download.
> 
Different sender population, then. I get very little spam that spoofs
regular correspondents.
 
> 
> If a company insists on sending me advertising mail I do not want, I
> don't want to do any business with that company.
> 
Agreed, but most of those, at least in my experience, use a different
sending address for adspam than they use for invoices, dispatch notes,
etc. so my adspam blacklist works rather well. Before you ask, my daily
logwatch reports monitor the performance of local SA rules: I wrote
report modules to do that. Seems to me there's little point in writing,
testing and tuning local rule sets if you can't easily see how well
they're working.
 

Martin

Re: Office phish

2021-07-01 Thread Martin Gregorie

On Thu, 2021-07-01 at 18:59 +0200, Benny Pedersen wrote:
> On 2021-07-01 17:03, RW wrote:
> 
> > > I realize blocking all javascript is prone to error,
> > What legitimate email uses javascript?
> 
> and what mua will show html attachment as default ?

Evolution is as configurable as any MUA I've used:
 
- Whether it defaults to showing plain text or the HTML attachment(s) is
  configurable (I use it defaulted to plain text).
- If showing plain text is configured, HTML, if any, appears as
  clickable attachments. 
- Animations can be suppressed
- Remote content will only be loaded and displayed if the sender is in
  your contacts list
- It prompts you about sending HTML text to contacts who don't want it.

Evolution was developed as part of the Linux Gnome Desktop toolset, but
rapidly spread to other Linux desktops (I use XFCE) and is also a free
download for Windows.

Martin

Re: Process of domain submission for inclusion in 60_whitelist_auth.cf

2021-06-29 Thread Martin Gregorie

On Tue, 2021-06-29 at 00:52 -0400, Bill Cole wrote:
> On 2021-06-28 at 17:04:05 UTC-0400 (Mon, 28 Jun 2021 23:04:05 +0200)
> Robert Harnischmacher 
> is rumored to have said:
> 
> > In which form can one submit the subdomain of a mail sender for the 
> > integration in 60_whitelist_auth.cf. Which information is required
> > for 
> > consideration?
> 
> 
There's nothing preventing you from maintaining your own whitelist (and
blacklist).

I wrote my own automatic whitelister, which whitelists mail from anybody
I've sent mail to. It works by scanning my outgoing mail stream: almost
no maintenance needed and it would be quite difficult to spoof.

I also manually maintain a private blacklist, which contains the 'From'
addresses of advertising e-mails from companies that I've dealt with in
the past. This works because many (most?) companies use different
subdomains for advertising messages than they use for order
confirmations etc. This makes blacklisting the advertising 'From'
addresses very simple to do and is a manual process.

Martin

Re: Maybe it's time to revive EvilNumbers?

2021-06-18 Thread Martin Gregorie

On Thu, 2021-06-17 at 17:10 -0700, Loren Wilton wrote:
> A number of the rules I passed along are generic "order" rules rather
> than Amazon specific. I had to go back to last month's spam to find an
> Amazon order spam, but I've gotten a dozen or so fake orders for other
> things this month, all of which hit on the LW_BOGUS_ORDER rule.
>
I'm not at all surprised about that: several years back when I was on
the Wine mailing list I was getting a lot of sales spam from it.
Unsurprising: Wine uses a combined web forum and mailing list where
emails get posted to the web forum and vice versa, and if almost anybody
can join the web forum, then the mailing list will be rather spammy. 

Anyway, I ended up developing a number of rules to deal with this:
typically they are sets of two or more subrules plus a linking meta-
rule. Both subrules are long lists of alternates, one containing, say
'sales phrases' (including misspellings, odd word order and
obfuscations) and the other containing product names and descriptions. 

Other pairings that work have been bank names and financial terms where
the sender's address doesn't match the Message ID, endearments combined
with sex terms, or web commerce sites and invoices.

The good thing about rules like this is, as Loren also found, that they
will quite often correctly match spam from sources or containing phrase
combinations you've never seen before. Their only disadvantage is
maintaining them: a lengthy alternates list is difficult to maintain
with the usual text editors, so I ended up writing a reformatting tool
which takes a file containing rule names, scores, descriptions etc.,
with the elements in each list of regex alternates on a separate line.
This makes for a file that's easy to edit, and is fairly easy to convert
into the small set of lines that define a valid SA rule.
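
For anyone who wants to roll their own converter, the paired-subrule pattern described above boils down to something like the Python sketch below. It is not my awk script: the function name, the '__NAME_SUFFIX' naming scheme and the choice of 'body' subrules are assumptions for the example. It emits one subrule per alternates list plus the meta rule that links them.

```python
def build_meta_rule(name, score, description, subrules):
    """Build SA rule lines; subrules maps a suffix to its regex alternates."""
    lines = []
    sub_names = []
    for suffix, alternates in subrules.items():
        # A leading double underscore marks a subrule that carries no score itself.
        sub_name = "__{}_{}".format(name, suffix)
        sub_names.append(sub_name)
        lines.append("body {} /(?:{})/i".format(sub_name, "|".join(alternates)))
    # The meta rule fires only when every subrule has matched.
    lines.append("meta {} ({})".format(name, " && ".join(sub_names)))
    lines.append("describe {} {}".format(name, description))
    lines.append("score {} {}".format(name, score))
    return lines
```

Feeding it a 'sales phrases' list and a 'product names' list gives two long subrules and one short meta rule, which is the shape of rule that has worked well for me.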

I wrote my converter as an awk script, but it can be written in almost
any language, e.g. C, Java, Perl or even (if you must) BASIC or
Javascript. Or you can find my tool here: 

https://www.libelle-systems.com/free/portmanteau/portmanteau.tgz

Martin

PS: I realise many list regulars have seen all this stuff before, but
there are a number of new arrivals who won't have seen it and may find
it useful and/or get new ideas from it.

Re: Detect Emoticons in Subject

2021-05-20 Thread Martin Gregorie

On Thu, 2021-05-20 at 18:34 +0200, Bert Van de Poel wrote:
> We've started getting lots of spam with emoji in the subject too the 
> past few weeks, so I've looked into this as well. As mentioned by RW, 
> you would need to create some kind of UTF8 regex header Subject rule. As
> I'm not too excited about writing such a regex, it's way at the bottom
> of my todo list 
>
Should be easy enough - IsASCII is just a name for [\x00-\x7f] and
IsXDigit is [0-9a-fA-F], so the same logic can be applied to define a
regex that triggers on any character within the three Unicode emoji
ranges. See Wikipedia for more detail: 

https://en.wikipedia.org/wiki/Emoticon#Unicode

I haven't yet seen any emojis in Subject lines, regardless of whether
the message was spam or not, or I'd probably have already written such a
rule and given it a minimal score so it can be used in a more spam-
specific meta rule.
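
To make the character-class idea concrete, here is a Python sketch that decodes a Subject header and tests it against three emoji-bearing Unicode blocks. The choice of blocks is my assumption about which three ranges matter most, and Python's regex dialect is not Perl's, so treat this as a way to check the ranges rather than something to paste into a rule.

```python
import re
from email.header import decode_header, make_header

# Three Unicode blocks that hold most emoji and emoticons (an assumed choice):
#   U+1F300-U+1F5FF  Miscellaneous Symbols and Pictographs
#   U+1F600-U+1F64F  Emoticons
#   U+1F900-U+1F9FF  Supplemental Symbols and Pictographs
EMOJI = re.compile("[\U0001F300-\U0001F5FF\U0001F600-\U0001F64F\U0001F900-\U0001F9FF]")


def subject_has_emoji(raw_subject):
    """Decode any RFC 2047 encoded-words, then look for an emoji code point."""
    decoded = str(make_header(decode_header(raw_subject)))
    return EMOJI.search(decoded) is not None
```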

Martin

Re: SPF plugin ignores existing Authentication-Results

2021-05-18 Thread Martin Gregorie

On Tue, 2021-05-18 at 10:00 +0200, David Bürgin wrote:
> David Bürgin:
> > Bother. I think I will try to modify my SpamAssassin milter, so that
> > it
> > will add a synthetic ‘internal’ Received header right after the
> > Authentication-Results headers … that should trick SpamAssassin into
> > recognising them as internal.
> 
Have you set the 'internal_networks' configuration parameter (in
local.cf)? If not, try that first.

Martin

Re: Why does SA add SPF check fail to this message?

2021-04-24 Thread Martin Gregorie

On Sat, 2021-04-24 at 03:22 -0700, Yuri wrote:
> All messages from the FreeBSD mailing list are labeled as 'SPF check
> fail'.
> 
> Here is the message: 
> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=224393
> 
> People said that SA does this by mistake: 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255356
> 
> Is it a mistake? A bug in SA? Or can something be done to fix this?
> 
There's no SPF record for bugs.freebsd.org though there is for
freebsd.org

But don't just take my word for it: check it yourself with
https://www.kitterman.com/spf/validate.html

Martin

Re: Script or command for testing new rules to ensure new rules don't generate false positives/negatives?

2021-04-23 Thread Martin Gregorie

On Fri, 2021-04-23 at 16:28 -0400, Steve Dondley wrote:
> I'm experimenting with writing a library of my own SA rules and
> scores.
>
I do this on a separate computer, which has Spamassassin installed but
not linked into anything else. It also has a copy of all the live SA
configuration files. Alongside this I have a directory filled with
examples of spam to function as testing input.

Along with this I have a bash script or two which are used to do things like:

1) start SA in debug mode to check the testing config for errors. 
   No messages are processed - its just looking for configuration
   errors.

2) run SA against a spam sample and only display the list of spam hits

3) run SA against a spam sample and display the entire output message
   using less so it can be scrolled through

4) run SA against the complete spam collection and only display
   references to messages which are not scored as spam

5) replace the live SA configuration with the current testing
   configuration, i.e. make the latest set of changes live.

In practice (1) through (3) are easy to combine into a single script
with an option to select the required action while (4) and (5) are best
kept separate.  

It helps a lot to name the items in the spam collection to relate
each set of similar spam to the local rule that's intended to trap this
spam type.
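
My own scripts are bash, but step (4) can be sketched in Python for anyone who prefers it. The version below pipes each file in a spam collection through 'spamassassin -t' and lists the ones whose X-Spam-Status header doesn't start with 'Yes'. It assumes the spamassassin command is on the PATH of the test machine and that there is one message per file; the function names are invented for the example.

```python
import re
import subprocess
from pathlib import Path

STATUS = re.compile(r"^X-Spam-Status:\s*(Yes|No)\b", re.IGNORECASE | re.MULTILINE)


def run_spamassassin(raw_bytes):
    """Pipe one message through 'spamassassin -t' and return the marked-up text."""
    done = subprocess.run(["spamassassin", "-t"], input=raw_bytes,
                          stdout=subprocess.PIPE, check=True)
    return done.stdout.decode("utf-8", errors="replace")


def missed_spam(corpus_dir, scan=run_spamassassin):
    """Return names of corpus files that were not marked as spam."""
    missed = []
    for path in sorted(Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue
        match = STATUS.search(scan(path.read_bytes()))
        if match is None or match.group(1).lower() != "yes":
            missed.append(path.name)
    return missed
```

With the corpus files named after the rule meant to catch them, the output tells you straight away which rule needs attention.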
 
> I'd like to be sure that the rules I write don't turn ham into spam
> and vice versa.
>
It won't if you test the rules against related spam and give some
thought to the score you apply to each rule.
 
> I imagine a utility like this must exists so figured I'd ask here
> before re-inventing the wheel and writing my own (probably bugg)
> script.
>
The sort of scripts I use are fairly short and simple. 
> 
> The script would need to check against all email files in .INBOX.* and
> .Spam directory in a user's IMAP directory.
>
No. Treat this like any other code development project: use a rule
development SA installation like I describe so you never develop rules
using the live mail stream. This way your rules will be better written
and tested and you'll cause fewer false positives in your live mail
stream.

Martin

Re: Bypass RBL checks for specific address

2020-12-23 Thread Martin Gregorie

On Wed, 2020-12-23 at 20:44 +0100, Benny Pedersen wrote:

This requirement is almost exactly the opposite of something I've been
running for years:

- In my case I run every message through SA, diverting spam into 
  a quarantine directory and passing the rest to Postfix for delivery.

- In your case you want to pass mail, which is being sent to a small set
  of recipients on your server directly to your local MTA for delivery.
  The rest gets run through SA before being handed to your local MTA,
  again for local delivery.

The logic needed to do both tasks would seem to be essentially the same:

read the names of all recipients that are NOT scanned by SA into a
searchable no-scan recipients list.

for every message received
   read the message into a buffer
   look up its recipient in the no-scan recipients list
   if the recipient is in the list
      leave the message in the buffer unchanged
   else
      pass the message to SA for markup as ham or spam via spamc,
      i.e. pipe the buffer content into spamc's stdin channel
      receive the marked-up message back from spamc,
      i.e. read spamc's stdout channel into the buffer
   fi

   pass the message to your MTA for delivery,
   i.e. write the buffer content to stdout
   clear the buffer
end-for

Assuming that you're using Postfix as your MTA, you just replace spamc
in the Postfix process chain with a program that does the above.
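
The routing decision above can be sketched in Python. This is a minimal
sketch under stated assumptions, not a drop-in Postfix filter: it
assumes the MTA hands the envelope recipient to the program as a plain
string, and load_no_scan(), scan_with_spamc() and route() are
hypothetical names. The only external behaviour relied on is that spamc
reads a message on stdin and writes the marked-up message to stdout.

```python
import subprocess


def load_no_scan(lines):
    """Build the searchable no-scan list from one address per line."""
    return frozenset(line.strip().lower() for line in lines if line.strip())


def scan_with_spamc(raw):
    """Pipe the message (bytes) into spamc and return its stdout."""
    result = subprocess.run(["spamc"], input=raw,
                            stdout=subprocess.PIPE, check=True)
    return result.stdout


def route(raw, recipient, no_scan, scan=scan_with_spamc):
    """Return the bytes to hand on to the MTA for delivery."""
    if recipient.strip().lower() in no_scan:
        return raw          # no-scan recipient: pass through untouched
    return scan(raw)        # everybody else: marked up by SA first
```

A frozenset gives hashed lookups, so in Python there is no need to sort
the list or search it by hand.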

In a little more detail:
- This implies writing a program in C, Java or (possibly) Python 
  or Perl.

- if your list of no-filter recipients has more than about ten
  entries, consider keeping it sorted so it can be searched efficiently.
  In C use bsearch() on a sorted array and in Java use a TreeMap - both
  are very fast.

- the message buffer needs to be self-extending to match any received
  message. This is a no-brainer in Java (a StringBuilder automatically
  resizes to hold what is put in it), but needs a little more care if
  written in C because you don't know how big an incoming message is
  until you've read it.

- this approach doesn't need any modifications to your existing SA
  configuration 

I hope this gives you some useful ideas.
   
Martin

Re: Spamassassin 3.4.4 on centos7

2020-12-10 Thread Martin Gregorie

On Fri, 2020-12-11 at 12:45 +1300, Sidney Markowitz wrote:
> 2. Using the rpm command to install a local rpm file does not
> automatically install dependencies from a repo. Always use yum (or dnf
> in newer CentOS or in Fedora) instead of the now old rpm program.
> 
Minor correction: the rpm program is not 'old' or outmoded. It does what
it has always done: installs the contents of an .rpm file without any
attempt to resolve dependencies. Apart from having the ability to
contain an optional dependency list and to list its contents and package
descriptions, rpm does exactly the same job as zip or tar.

Using the dependency list in an rpm archive to pull in any other rpm
archives that an rpm archive depends on is the job of yum or dnf, which
use the dependency list on the requested archive(s) to search the rpm
repository for any other rpm archives that the one(s) you requested
depend on.

What has changed is that yum has been replaced by dnf in more recent
releases of Fedora, RHEL and CentOS. yum and dnf do the same job:
resolve dependencies and then install the requested rpm archive(s)
*along with* any other .rpm files that the ones you requested depend on.

Bottom line: always use dnf or yum to install, erase, or update rpm
packages held in a Redhat or third party repository. Only use rpm itself
to install freestanding rpm archives which are not distributed as part
of an rpm repository.

Martin

Re: Legitimate message being flagged as spam

2020-11-30 Thread Martin Gregorie

On Mon, 2020-11-30 at 07:27 -0600, Daryl Rose wrote:
> How do I get the SA headers?
> 
Either:

- tell your mail reader to show all headers and cut'n'paste the whole
  email from the screen

- Save the entire email as a TXT file and cut'n'paste from there

Then drop the entire email into PasteBin or similar free repository
and post a link to it here  - this way your message to the SA mailing
list can't be incorrectly recognised as spam.

Martin

Re: Legitimate message being flagged as spam

2020-11-29 Thread Martin Gregorie

Showing us the SA headers and hits would be a good idea: without them we
don't know why SA rejected the mail.

I notice that the domain in the Message-ID is fictitious. That may not
be significant, but I usually think this is suspicious.

Martin


On Sun, 2020-11-29 at 09:40 -0600, Daryl Rose wrote:
> I get an email/receipt from a vendor on a payment made.  This message
> continuously gets flagged as spam even though I've added it to the
> whitelist_from.cf list.
> 
> Received: (qmail 26946 invoked by uid 30297); 27 Nov 2020 20:52:17
> -
> > Received: from unknown (HELO p3plibsmtp02-
> > 04.prod.phx3.secureserver.net)
> >  ([68.178.213.4])
> >   (envelope-sender
> >  @sendgrid.net>)
> >   by p3plsmtp23-04-26.prod.phx3.secureserver.net (qmail-
> > 1.03) with
> >  SMTP
> >   for ; 27 Nov 2020 20:52:17 -
> > Received: from o1.3nn.shared.sendgrid.net ([167.89.100.129])
> > (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 256/256 bits)
> > (Client did not present a certificate)
> > by CMGW with ESMTP
> > id ikj3kLwOeFeQXikj3kiQrL; Fri, 27 Nov 2020 13:52:17 -0700
> > X-CMAE-Analysis: v=2.4 cv=SdYyytdu c=1 sm=1 tr=0 ts=5fc16701 b=1
> > cx=a_idp_nop
> >  a=d87GDerR7hnUjA61tTL9RQ==:117 a=d87GDerR7hnUjA61tTL9RQ==:17
> >  a=kj9zAlcOel0A:10 a=zPYWiABU:8 a=5-f5ixlAKy49-4MjWEkA:9
> >  a=O-7aY5Sf57aUu7p3:21 a=_W_S_7VecoQA:10 a=CjuIK1q_8ugA:10
> > a=5LfDJFqq-uUA:10
> >  a=AWL3az150N33eOPX4RKm:22 a=Z5ABNNGmrOfJ6cZ5bIyy:22
> > a=UDnyf2zBuKT2w-IlGP_r:22
> > DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
> > d=sendgrid.net;
> > h=from:subject:mime-version:to:content-type:content-transfer-
> > encoding;
> > s=smtpapi; bh=5/eVCwWUZDl73ybzUYFmyMNdYNgvUvrvS9S5NJHu8QU=;
> > b=kDKnSU9Bb2Mi5khPiwjinzdlOorchkBuNfEWHSiqVeWqCaZPHmztDB3ZeQXPLVkVbL
> > uH
> > 6NgvFXajs2aidTnh9bSKSMn4RaTPC+nvQU4DxFoXj0dL9yy9rjBGsdmS0BBD6+qzBl6g
> > Si
> > i2UwAMxRGXKbODjK5T5Ll1us3XKXKt9cI=
> > Received: by filterdrecv-p3iad2-5dc87598f5-8bxxp with SMTP id
> >  filterdrecv-p3iad2-5dc87598f5-8bxxp-19-5FC16700-AD
> > 2020-11-27 20:52:16.878084415 + UTC m=+951689.287978429
> > Received: from spiderdoor.com (unknown)
> > by ismtpd0118p1mdw1.sendgrid.net (SG) with ESMTP
> > id ceyKf2F5QpyH7v63ZKS3nA
> > Fri, 27 Nov 2020 20:52:16.783 + (UTC)
> > Date: Fri, 27 Nov 2020 20:52:16 + (UTC)
> > From: no-re...@spiderdoor.com
> > Message-ID: <5fc1670079f34_26fd3171828...@api1.mail>
> > Subject: Payment Receipt for Unit G030 - paid from SpiderApp
> > Mime-Version: 1.0
> > X-SG-EID:
> > 
> >  =?us-
> > ascii?Q?nNFctdm0BWd6iTjLSzehWYRyQOg6=2FUycD+ddLrh9vGVcvZBTHPJYDTCViD
> > qyYQ?=
> >  =?us-ascii?Q?Li3bEIOOksE35=2FhSgezGSc37DN46Fkbxk1TO9E8?=
> >  =?us-ascii?Q?MGQPgTWt6k58DhiRQTG0=2F+79xc=2FO7jtyaG0XkLO?=
> >  =?us-ascii?Q?1DjUXyElg+pd9Ry=2Fm1Wy7CmJWR0I1zJgLk=2FUjTC?=
> >  =?us-ascii?Q?=2F7EUOycJlpjn1eLS5JSN9MBpwsXNk7EKGYPvDxO?=
> >  =?us-ascii?Q?duJHjPbILEuJJjx1g=3D?=
> > To: i...@myspace.rent, 
> > X-Entity-ID: eEuAPys4acQ9ere1FZlp6A==
> > Content-Type: text/html; charset=us-ascii
> > Content-Transfer-Encoding: 7bit
> > X-CMAE-Envelope:
> > 
> >  MS4xfLrAfEKlWNG6dcz1a05VWlMXnGyOE7soLGjybMz1QFzvpZ8a8cRDyTGNbMY9ezX
> > 311xKb9zb5aWg3AtH7xkCUlT7kaAYASl+bOfJ3EEdSfKKIoPXjO+i
> > 
> >  gjrerNiIxiRiWOcLF0BuxQKyIc/5BN0U4rxx20N0k1kPbaXyR06Ty99IgAWy9imxFxs
> > ms0GP03MmGWur7XyGwMcP6r/JKJ3ntGwGN1Diolw7WC+ywjp9VBM5
> >  X6m7dicNVVVO+LUx/qLWyQ==
> > X-Nonspam: None
> > 
> > 
> > 
> Any idea why it gets flagged and what rule I need to put in place to
> prevent it from happening?
> 
> Thank you.
> 
> Daryl

Re: SPF_FAIL

2020-11-11 Thread Martin Gregorie

On Wed, 2020-11-11 at 09:52 +0100, Tobi wrote:
> > If I only had a ready-made list of those important domains.
> 
> If you filter for customer domains then maybe (depending the customer
> domain) adding the customer domain to spf checks is worth a look too.
> 
That's easy: keep a database of addresses you've sent mail to and treat
that as a whitelist. Should work at almost any scale and about the only
essential maintenance it needs is the ability to remove addresses you no
longer want to whitelist. 

I suppose some may find it useful to datestamp entries with the last
time mail was sent to them and remove any addresses that haven't been
sent mail for 'x' days/weeks/months/years but I've never needed that
ability.
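
For illustration only, here is a minimal Python sketch of such a
datestamped list using SQLite. The table layout and the function names
open_db(), record_sent(), is_known() and prune() are assumptions made up
for the example, not a description of an existing tool.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the hypothetical correspondents database."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS correspondents ("
               "address TEXT PRIMARY KEY, last_sent INTEGER NOT NULL)")
    return db


def record_sent(db, address, when):
    """Remember that mail was sent to address at Unix time 'when'."""
    db.execute("INSERT OR REPLACE INTO correspondents VALUES (?, ?)",
               (address.lower(), when))


def is_known(db, address):
    """True if mail has been sent to this address and it is still listed."""
    row = db.execute("SELECT 1 FROM correspondents WHERE address = ?",
                     (address.lower(),)).fetchone()
    return row is not None


def prune(db, now, max_age_days):
    """Drop addresses not mailed for max_age_days; return how many went."""
    cutoff = now - max_age_days * 86400
    cur = db.execute("DELETE FROM correspondents WHERE last_sent < ?",
                     (cutoff,))
    return cur.rowcount
```

Removing a single unwanted address is then just one more DELETE keyed on
the address column.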

Martin

Re: What can one do abut outlook.com?

2020-10-25 Thread Martin Gregorie

On Sun, 2020-10-25 at 12:08 -0600, Bob Proulx wrote:
> Martin Gregorie wrote:
> > I use this to send a copy of all outbound mail to a local mailbox.
> > Then periodically a cronjob scans and erases the mailbox content,
> > adding the To: address(es) to a list of correspondents. IME this is
> > safe because its quite unlikely that you'll ever need to blacklist
> > anybody you've sent mail to.
> 
> Oh I wish that were true in general!  I have one user that I help with
> email things and they like to respond to spammers.  They shout, they
> rant, they rave.  I guess it is a catharsis for them and they feel
> better afterward.  I have not been able to convince them that this is
> a worthless thing to do in the best cases and a bad thing to do in the
> worse cases.
> 
I didn't say it works in all cases! In my case it works just as I hoped
it would, but of course those with different mailstream content may not
find it so good.

If I was you I'd quietly point out to those you help that their rants
only amuse spammers if they take any notice at all, but sending them to
you as well pisses you off mightily since you can't do anything about
said spammers, so if they want help from you in future they'd better
stop copying you in on their rants.

It would also be fairly easy to modify the auto-whitelister code to
auto-remove a spamming correspondent from the list. Or, being slightly
more friendly, datestamp the correspondent entry when a message from
them is spam. This would let your SA module:

a) avoid whitelisting them for, say, the next month after their last
   spam.

b) or rather less friendly, send them a message each time you receive
   spam from them saying you're ignoring the message because it was
   spam.

> It is a clever idea!  I might add something similar to my own setup.
> :-)
> 
I'm pleased you like it.

Martin

Re: What can one do abut outlook.com?

2020-10-25 Thread Martin Gregorie

On Sat, 2020-10-24 at 16:46 -0700, John Hardin wrote:
> ...and then whitelist specific desireable-correspondent outlook.com 
> addresses.
> 
It's easy enough to create a list of all desirable correspondents, at
least if your MTA has the equivalent of Postfix's 'always_bcc' directive.

I use this to send a copy of all outbound mail to a local mailbox. Then
periodically a cronjob scans and erases the mailbox content, adding the
To: address(es) to a list of correspondents. IME this is safe because
it's quite unlikely that you'll ever need to blacklist anybody you've
sent mail to.

In my case I keep the correspondents list in a database. I use a custom
Perl SA module to access the database and a CORRESPONDENTS_LIST rule to
trigger it and add negative points to incoming email with a
matching From: address.
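
The harvesting step (pulling To: addresses out of the copies of
outbound mail) could look something like the following Python sketch.
It is an illustration under assumptions, not the cron job described
above: harvest() is a hypothetical name and the messages are assumed to
be standard email message objects.

```python
from email import message_from_string
from email.utils import getaddresses


def harvest(messages):
    """Collect the lower-cased To: addresses from a batch of messages."""
    found = set()
    for msg in messages:
        # A message may carry several To: headers, each with many addresses.
        for _name, addr in getaddresses(msg.get_all("To", [])):
            if "@" in addr:
                found.add(addr.lower())
    return found


# Small demonstration with a message built from text.
demo = message_from_string("To: Ann <Ann@Example.com>\n\nhello\n")
print(harvest([demo]))
```

With a real mbox file the messages can be supplied by iterating over
mailbox.mbox(path) from the standard library, where path is wherever the
bcc copies are delivered.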

I also have a tool for weeding undesirables from the correspondent list
because spamming addresses can creep onto the list, but it's very
infrequently needed.

Martin

Re: check doman against uri bl of spamassassin

2020-10-21 Thread Martin Gregorie

On Wed, 2020-10-21 at 22:22 +0200, Marc Roos wrote:
> :D I thought I could query the blacklists from the command line with
> dig or so
>  
Sounds possible, but what use is a command line query when what you need
is something that can be triggered by getmail, your MTA, an MUA or
whatever? You might be able to do that from a shell script, but a Perl
program would be better, so find your copy of the 'Camel Book', open a
terminal and design a program and start coding.

At least, that's what I would do and have done in similar circumstances.
The only difference is that, apart from an SA module, I've written my
special mail handlers in C and Java rather than Perl. All these
languages have built-in or library routines for reading mail and
interrogating servers.
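
To give an idea of how small such a program can be, here is a Python
sketch of a domain blocklist lookup. It rests on assumptions: lists of
this kind are normally queried by prepending the domain to the list's
DNS zone and asking for an A record, and the zone name used as the
default here is only an example that you would replace with whichever
list you are entitled to query. The function names are hypothetical.

```python
import socket


def uribl_query_name(domain, zone="multi.uribl.com"):
    """Build the DNS name to look up: <domain>.<zone>."""
    return domain.strip().rstrip(".").lower() + "." + zone


def lookup(domain, zone="multi.uribl.com"):
    """Return the A record as a string if the domain is listed, else None.

    What a particular 127.0.0.x answer means is defined by the list
    operator, so check their documentation before acting on it.
    """
    try:
        return socket.gethostbyname(uribl_query_name(domain, zone))
    except socket.gaierror:
        return None     # no such record: the domain is not listed
```

As with SA itself, queries sent through a large shared resolver may be
refused by the list operator, so results are only meaningful when the
lookup goes through your own non-forwarding resolver.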
 
Martin
> 
> -Original Message-
> From: @lbutlr [mailto:krem...@kreme.com] 
> Sent: Wednesday, October 21, 2020 10:20 PM
> To: users@spamassassin.apache.org
> Subject: Re: check doman against uri bl of spamassassin
> 
> On 21 Oct 2020, at 13:35, Marc Roos  wrote:
> > What is the best way to check an url against the default active 
> > spamassassin uribl, on a linux server that does not have
> > spamassassin 
> > installed?
> 
> This is clearly in the "how do I do a thing while imposing conditions 
> that make  impossible to do" class of question.
> 
> "How do I dive 300 meters under water without an oxygen supply or 
> pressure suit?"
> 
> "How can I get from New York City to Los Angels in less than 10 hours 
> without flying?"
> 
> If you want to test something against spamasassin you need one thing
> for 
> sure, access to spamassassin.
> 
> --
> 'I really should talk to him, sir. He's had a near-death experience!'
>   'We all do. It's called living.'
> 
> 
>

Re: to: header is not in my domain

2020-10-20 Thread Martin Gregorie

On Tue, 2020-10-20 at 22:49 +0100, RW wrote:
> On Tue, 20 Oct 2020 21:34:08 +0100
> Martin Gregorie wrote:
> 
> > Also, not exactly what you're asking for, but e-mails where the From:
> > domain doesn't match the domain in Message-ID: are very often spam
> > and
> > so could be worth a point or two.
> 
> And lots of ham will fail that too. 

Fair comment: I don't use that sort of rule myself. Instead, I have a
module that does a sender lookup in my mail archive. The rule triggering
the lookup adds some negative points if I've ever sent mail to that
address.

Martin

Re: to: header is not in my domain

2020-10-20 Thread Martin Gregorie

On Tue, 2020-10-20 at 21:34 +0100, Martin Gregorie wrote:
> On Tue, 2020-10-20 at 19:29 +0100, Miki wrote:
> > Hi, how to score this e-mails?
> > I know I can give negative score if To: IS my domain, but I do not
> > like this solution.
> > Any suggestions?
> > 
> Why do that? Its the exact reverse of something that does work pretty
> well: write a rule that gives a positive score to any mail whose To:
> or
> BCC: headers contain your email address(es).
> 
Brain fart: sorry. I should have said "give a positive score to an email
where neither the To nor BCC addresses contain your email address(es)",
i.e. write subrules for To and BCC that contain the addresses you
want, combine them and negate the result in a meta rule, something like
this:

header __TORULE  To  =~ /(addr1|addr2|...)/
header __BCCRULE Bcc =~ /(addr1|addr2|...)/
meta   MYRULE    !(__TORULE || __BCCRULE)
score  MYRULE    6.0

Martin

Re: to: header is not in my domain

2020-10-20 Thread Martin Gregorie

On Tue, 2020-10-20 at 19:29 +0100, Miki wrote:
> Hi, how to score this e-mails?
> I know I can give negative score if To: IS my domain, but I do not
> like this solution.
> Any suggestions?
> 
Why do that? It's the exact reverse of something that does work pretty
well: write a rule that gives a positive score to any mail whose To: or
BCC: headers contain your email address(es).

Also, not exactly what you're asking for, but e-mails where the From:
domain doesn't match the domain in Message-ID: are very often spam and
so could be worth a point or two.

Martin

Re: Spamassassin Email Alert

2020-09-02 Thread Martin Gregorie

On Wed, 2020-09-02 at 15:44 +0530, KADAM, SIDDHESH wrote:
> Hi Folks,
> 
> Using spamassassin is there any way of trigger email notification to
> specific ID, if email body matches with list of pattern.
> 
You can put anything you care to write or install downstream from SA to
scan messages and take action depending on their content: this can do
anything you want or can imagine.

As an example of what can be done, I use a C program to quarantine any
message with a positive SA score and pass everything else to Postfix for
delivery. I also wrote a Perl logwatch script to summarise what's in
quarantine each night, a PHP script to let me use a web browser to inspect
quarantined spam and a shell script, run as a cron job, to delete
quarantined messages after 7 days.
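
As an illustration of the last of those, the clean-up job amounts to
very little code. The sketch below is in Python rather than shell and
is not the script mentioned above: purge_old() is a hypothetical name
and it assumes each quarantined message is a separate file whose
modification time is the time it was quarantined.

```python
import os
import time


def purge_old(directory, max_age_days=7, now=None):
    """Delete files older than max_age_days; return their names, sorted."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    # Take a snapshot of the directory before deleting anything from it.
    for entry in list(os.scandir(directory)):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Run from cron once a day, this keeps the quarantine at roughly a week's
worth of messages.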

Martin

Re: Amazon, dhl, fedex, etc. phishing

2020-08-24 Thread Martin Gregorie

On Mon, 2020-08-24 at 11:51 -0700, John Hardin wrote:
> Might want some \b in there, just to be safe. The from check would
> also 
> hit domains like "amazon-river.org". Perhaps:
> 
>   header SUBRULE13a From:name =~ /\bAmazon\b/
>   header SUBRULE13b From:addr =~ /\bamazon\.com$/
> 
Indeed
> 
> > meta   SUBRULE13  (SUBRULE13a != SUBRULE13b)
> 
> That seems too broad, you're assuming mail from amazon.com always has 
> "amazon" in the sender name. Perhaps:
> 
>meta  SUBRULE13  SUBRULE13a && !SUBRULE13b
> 
Also true.

What I *thought* I was doing was: 

* firstly, to show the OP how to write a rule that examines a From
header and only fires if there is a match on the name part and no
match on the address part - a very common spam feature (as is the From
address not containing the same domain as the Message-ID).

* to give some guidance that testing is essential (i.e. keep some known
spam to be looked at when writing the rules and for testing the new
rules) AND to remind the OP that the significant parts of name and
address strings may differ, should be copied from known spam, and may
need further tweaks as the rules are tested.

* to explain that this type of rule can, and will, only work
reliably if it's written against a single spamming domain.
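
The logic of the corrected meta rule (name part matches, address part
does not) can be tried out away from SA with a few lines of Python.
This is only a sketch for experimenting with From headers copied from
known spam: spoofed_brand() is a hypothetical name and the brand and
domain values in the demonstration are examples, not a complete list.

```python
import re
from email.utils import parseaddr


def spoofed_brand(from_header, brand, domains):
    """True if the display name mentions brand but the address is not
    in (or under) one of the domains that brand really sends from."""
    name, addr = parseaddr(from_header)
    name_hit = re.search(r"\b" + re.escape(brand) + r"\b",
                         name, re.IGNORECASE) is not None
    domain = addr.rpartition("@")[2].lower()
    addr_hit = any(domain == d or domain.endswith("." + d)
                   for d in domains)
    return name_hit and not addr_hit


# Demonstration: a brand name in the display name, unrelated address.
print(spoofed_brand("Amazon <promo@biggun.example>", "amazon",
                    ["amazon.com"]))
```

Matching on the whole domain rather than on a substring is what keeps
addresses such as amazon-river.org from counting as the real thing.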

Martin

Re: Amazon, dhl, fedex, etc. phishing

2020-08-24 Thread Martin Gregorie

On Mon, 2020-08-24 at 12:00 -0400, micah anderson wrote:
> We are regularly getting phishes from dhl, fedex, usps, amazon,
> netflix, spotify that fakes the from (eg. amazon <
> p...@biggung1892301.com> wants to send me a amadon-legit.pdf).
>
> I'm wondering if anyone has made a rule that looks to see if the From
> contains amazon, but it is not amazon.com/.ca/.jp (all their TLDs), 
>
Try it yourself: something like this: 

header SUBRULE13a From:name =~ /Amazon/
header SUBRULE13b From:addr =~ /amazon/
meta   SUBRULE13  (SUBRULE13a != SUBRULE13b)
score  SUBRULE13  10

should work though the text in the regex will probably need tweaking to
match actual spam. You'll need to collect examples of spam from all
these sources to test your rules against. Also:

- the regexes may need alternates if, say, you see variations in the
  name text or if you want the addr regex to include more than just 
  the bare domain name

- of course you'll need a separate rule for each spam source

- another spam warning is email where the domain name
  in the Message-ID doesn't match the one in the From address.

I'm not seeing anything that looks like the spam you're getting, but if
I did, that's the type of rule I'd be writing to trap the garbage.

Martin

Re: Why the new changes need to be "depricated" forever

2020-07-23 Thread Martin Gregorie

On Thu, 2020-07-23 at 15:01 +, Riccardo Alfieri wrote:
> I think that rspamd's approach is correct. Rspamd just takes SA rules 
> and use them. It doesn't provide the rules, meaning that you most
> likely 
> need to have an installation of at least sa-update on the same
> machine 
> that runs rspamd to keep rules updated.
> SA rules are also distributed under Apache 2.0 license and I guess
> that 
> license permits reuse of existing code in other projects, but IANAL :)
> 
I had a look at the Rspamd docs and thought about it a bit.

Yes, they can run your private rules and, probably, some of the base
rules, but that's about it:

- They don't seem to have a way to let SA rules find out anything about
  which UBLs have fired or to include that info in an SA rule.

- Similarly, because it's a C program, there's no simple way to execute
  an SA plugin without running it as an external Perl process. To do
  that you'd also need to provide some way of passing input data to it
  and of receiving the reply.

They also say that running a heap of regexes in Rspamd will slow it down
noticeably.


Martin

Re: Thanks to Guardian Digital & LinuxSecurity for the nice post about SpamAssassin's upcoming change

2020-07-23 Thread Martin Gregorie

On Thu, 2020-07-23 at 09:36 +0700, Olivier wrote:
> I am wondering what grey list should be renamed...

Ageist!


;-)

Martin

Re: Why the new changes need to be "depricated" forever

2020-07-23 Thread Martin Gregorie

On Wed, 2020-07-22 at 21:53 -0700, Ted Mittelstaedt wrote:
> You could even fork the SpamAssassin code if you like, you know.  In
> fact, let's do that.  We will make a new fork and call it the
> "SpamAssasin-N-W" short for SpamAssassin Non Wussy, put it up on
> Sorceforge for download, and just mirror the regular SpamAssassin
> distribution when new releases come out with the exception of this
> change.
> 
That would be fine for the Perl source code, but in case you didn't
notice, the terms-we-must-not-use ALSO appear in visible text, i.e.
names of base rules, which can't be hidden from SA users. Changing them
WILL break some private rules written by SA users who don't subscribe to
this mailing list and so will not be expecting any such change. 

AFAICT this side effect was not considered by the SA maintainers until
the name of one base rule was changed a week or so back and some list
members' rules were broken by it. I'm not blaming the maintainers
because something like that is very easy to miss: it's a fair bet that
base rule names do not appear anywhere in SA source code.

OK, Post SA 4.0 it appears that there's a plot to maintain both old and
new-style rule names for a while, but I predict that there will be much
wailing and gnashing of teeth from those who are not on this list when
either some name change is missed or further down the line the old names
vanish and all those who never update software get caught out.

Martin, 

who is a retired professional developer and has seen this sort of thing
before.

Re: Why the new changes need to be "depricated" forever

2020-07-22 Thread Martin Gregorie

On Tue, 2020-07-21 at 18:25 -0700, Loren Wilton wrote:
> I do strongly wonder whether this is "society" or only "people in the
> USA". It should be noted that historically bkacks were enslaved just
> as little or much as any other race in other countries, and I don't
> see those contries bending over to appease blacks because the Romans
> and Greeks would enslave them (as well as anyone else).
> 
From my POV (I'm from NZ, resident in the UK) I think the racial use of
'black' in everyday speech is pretty much limited to the USA and South
Africa.

When I was resident in NZ we always referred to the major resident
groups as pakeha (the Maori word for white-skins), Maori (or possibly a
person's tribe if you know it and are on a marae) and everybody else by
ancestral nationality: e.g. there is a fair size Chinese population
dating largely from the Gold Rush.

Britain is much the same as NZ apart from distinguishing between
English, Northern Irish, Scots and Welsh and using the generic
'Caribbean' rather than the specific - Jamaican, Barbadian, etc. which
is the opposite of how people from Africa are described: calling anybody
an 'African' is rare: specific nationalities are almost invariably used
just as they are for the rest of the world. The main generic term you
hear for non-Europeans is 'people of colour', which still seems rather
long and stilted to me.

'Russian', 'Soviet' or (not so much) 'Communist' used to be generics for
residents of the USSR, but now those terms have vanished and been
replaced by the use of specific nationalities. I don't think there are
more than a handful of genuine communists left anywhere in what used to
be the Soviet Union.

In general the so-called hard right here would appear to align more with
the Democrats in the USA, so to me a recent comment describing Obama as
a hard-left radical seems ridiculous: he's no more a leftie than former
UK Prime Ministers Tony Blair (Labour), David Cameron (Conservative) or
Jacinda Ardern (NZ Prime Minister) are.

Martin

Re: IMPORTANT NOTICE: Rules referencing WHITELIST or BLACKLIST in process of being Renamed

2020-07-20 Thread Martin Gregorie

On Mon, 2020-07-20 at 09:30 -0700, John Hardin wrote:
> It would be helpful if we could be informed whether anyone has post-
> SA processing that looks for these rulenames in the SA hit results,
> e.g. for making message delivery decisions.
> 
Repeating previously posted info for completeness: one of my private
rules uses URIBL_BLACK as a subrule. I have no other potential conflicts
with SA rule name changes and no postprocessing that's dependent on SA
rule names.

Martin

Re: Screwed-up scoring

2020-07-20 Thread Martin Gregorie

On Sun, 2020-07-19 at 20:27 -0400, Kevin A. McGrail wrote:
> On 7/19/2020 8:23 PM, Martin Gregorie wrote:
> > The only way I can see to prevent the name changes from affecting SA
> > users private rules is to duplicate the affected rules
> 
> Yeah, I just posted this idea on the dev list to use a meta like this
> which I think will allow it to work backwards to 3.3.x. Will that work
> for your install?
> 
Your suggested workround should work here although, because my private
rules don't reference any standard ruleset rules with names containing
'BLACKLIST' or 'WHITELIST', I'm not affected by these name changes:
that's pure luck.

Your idea is neater than my suggestion because it can't mess up private
rules that make use of numeric score values. However, both workrounds
will, I suspect, make standard rule maintenance more complex. What about
maintaining one format in ruleQA output and including a configurable
rule name conversion step in the rules update process? If that was
controlled by a new local.cf directive it should be a pretty small code
change.

Martin

Re: Screwed-up scoring

2020-07-19 Thread Martin Gregorie



On Sun, 2020-07-19 at 15:44 -0700, Luis E. Muñoz wrote:
> On 19 Jul 2020, at 10:54, Kevin A. McGrail wrote:
> 
> > Great question.  That's really a third party rule.  I would like to 
> > see it
> > change eventually but maybe that's another phase.  Thoughts?
> 
The only way I can see to prevent the name changes from affecting SA
users' private rules is to duplicate the affected rules: one copy using
BLACKLIST/WHITELIST and the other using BANNEDLIST/WELCOMELIST and,
since both copies will fire, with their scores halved. This will allow
private rules to work as normal until the BLACK/WHITE rule names are
removed from the standard set: the overall score for a message will
remain unchanged.

The above should solve the problem for cases (the majority?) where the
private rules only care whether subrules fire or not. However, if
anybody's private rules compare subrule score values then the private
rules may fail completely unless rewritten.

Martin

Re: Screwed-up scoring

2020-07-19 Thread Martin Gregorie

On Sun, 2020-07-19 at 11:59 -0400, Kevin A. McGrail wrote:

> Whitelist will become welcomelist and blacklist will become
> blocklist. Are you running a modern SA like 3.4.4?  If so, you should
> be able to proactively add entries for this.
> 
Just been grepping my local rules for WHITELIST and BLACKLIST without
finding any, so none are affected by those changes.

Then I also grepped them for WHITE and BLACK and this time I saw that
two of my local rules reference the standard URIBL_BLACK rule. Is this
name likely to change?

Martin

Re: spamhaus enabled by default

2020-07-15 Thread Martin Gregorie

On Tue, 2020-07-14 at 22:57 -0400, Kevin A. McGrail wrote:
> > A pointer to the wiki might be useful in the config files as well as
> > > the
> > > docs.  Suggestions of which files?
> > 
> > local.cf is the obvious one.
> > 
> 
> Might not be a bad choice.  I've never even looked at a stock local.cf
> from the project in 20 years though.  Need to do a vanilla install and
> see what is in there and where it is generated.

> What about the docs?  Where would you look for this nugget of
> information as a user?
> 
As a long-time UNIX/Linux developer I'd naturally look in the
Spamassassin manpage, but I suspect a non-developer may not unless
they're a power user. I'd also guess that others, including Mac and
Windows users, would be looking for Help|About from a GUI app
(nonapplicable here of course) or, failing that, to point a web browser
at http://spamassassin.apache.org/

I think the best places to put warnings and advice about exceeding free
usage RBL limits are:

In the manpage:
- a single line saying something like "Free RBL use limits: URL"
  added to its WEBSITES section

On the Spamassassin website:
- The top-level README
- The Start Using page in the Wiki
- the FAQ page. 

All these can link to a common page that describes low volume free use
policies of Spamhaus and other RBL providers and how NOT using a local
non-forwarding DNS can cause the user to get unexpected subscription
requests from them.
 
BTW,  the page "https://www.spamtips.org/p/about-spamtipsorg.html -
Configuration tips and tricks to maximize the effectiveness of
SpamAssassin", which is linked to from the 'Docs' page seems to have
vanished though the www.spamtips.org website is up and running.

> Also: init.pre  v330.pre and maybe v340.pre
> 
> Well those pre files are pretty specific for new features in those
> versions.
>
Fair point - I only included them because they load what look to be
relevant plugins for this discussion. However, a note there may well
propagate to future SA versions because of typical developer version
forking.

Anyway, I hope these comments are useful.

Martin

Re: spamhaus enabled by default

2020-07-14 Thread Martin Gregorie

On Tue, 2020-07-14 at 18:39 -0400, Bill Cole wrote:
> 
> There are far too many ways that people have BIND already installed
> and configured for a 3rd-party package to be able to safely provide a
> full named.conf that will work for 90% of users who have modified
> their configurations away from the defaults.
> 
Fair enough, but I wasn't on about them.

What I WAS on about is the steady flow of folks who install SA and then
post on this list about problems resulting from using a shared DNS and
their consequent receipt of a letter from RBL providers
suggesting that service could be restored by application of cash.

THOSE are the newish SA users who are unlikely to have a non-forwarding
DNS installed and might well be helped by prominent messages in the SA
config files they will certainly be editing and that I'm suggesting
should contain unmissable messages about the need for having a non-
forwarding DNS setup.

The others who need cluestick treatment are any companies selling home
servers with SA installed and either no DNS installed or a forwarding
DNS as part of the package.

> As noted on the page that Kevin cited, the default configuration for 
> BIND, Unbound, and the PDNS Resolver as packaged for the dominant
> Linux distros is correct for a non-forwarding caching resolver. For
> BIND and Unbound, this is also true on FreeBSD. For macOS, there is no
> 'standard package' but the MacPorts packages for both BIND and Unbound
> do the right thing with the default variants.
> 
> Everywhere that I have used it, Unbound has been configured thus when 
> installed from the standard system package where one exists.
> 
Fair enough: job done then.

We can now declare that the surprised punter with a forwarding DNS, who
sends this list an e-mail saying that RBL service has been cut off until
cash is sent, is officially extinct as a subspecies, never to be heard
from again.

Martin

Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave

2020-07-14 Thread Martin Gregorie

On Tue, 2020-07-14 at 16:50 -0500, sha...@shanew.net wrote:
> That last bit is plain wrong.  Jamestown had Africans as slaves as
> early as 1619, 
>
Fair enough - I was ignoring the Spanish because it seems to me,
possibly wrongly, that what they did in that sphere had little influence
on the English-speaking world.
 
> As for the influence of religion at this time, surely you're aware of
> Biblical defenses of racism and slavery, whether in the form of the
> "curse of Ham" or the suggestion that slavery was a necessary evil
> because it would control the sinful, less humane, black race.
> 
Sure, but we're discussing the root of the Xtian association of black
with an evil soul, not with biblically sanctioned skin colour-neutral
slavery.

> > Out of pure curiosity, when was the current racist use of 'black'
> > first coined and where did that happen?

> The quick version is that various "natural philosophers" in the late
> 1600s tried to describe and account for the different "races" that
> they encountered in the world.  One famous account is from François
> Bernier, entitled "New Division of the Earth by the Different Species
> or 'Races' of Man that Inhabit It."
> 
That just makes my point: the term 'black list', first documented as
used by Charles II in 1660 and assuredly used by English persons with
probably some Scandinavian ancestry (William of 1066 fame was of mixed
Norse-French ancestry), was referring to 'black sin' rather than black
skins before said 'natural philosophers', Linnaeus, etc. chose to apply
it to black-skinned people with a racial meaning.

Thanks for that confirmation.

Martin

Re: spamhaus enabled by default

2020-07-14 Thread Martin Gregorie

On Tue, 2020-07-14 at 16:32 -0400, Kevin A. McGrail wrote:
> Well, that is documented quite expressly here:
> https://cwiki.apache.org/confluence/display/SPAMASSASSIN/CachingNameserver
> 
> A pointer to the wiki might be useful in the config files as well as
> the
> docs.  Suggestions of which files?

local.cf is the obvious one.

Also: init.pre  v330.pre and maybe v340.pre

I'm suggesting local.cf because the new user MUST modify it, and the
others because they would seem to be controlling modules that issue
DNS-like queries that a new user might consider killing off.

I also think that supplying simple boilerplate config files for bind and
unbound that cause them to simply issue non-forwarded DNS queries would
be a good idea because configuring bind for the first time is non-
trivial. I would have found configuring it quite difficult without
buying the O'Reilly 'locust' book "DNS and Bind".

I haven't used unbound so have no idea how easy it would be to set up to
support just non-forwarded queries.

Martin

Re: spamhaus enabled by default

2020-07-14 Thread Martin Gregorie

On Tue, 2020-07-14 at 22:59 +0200, Antony Stone wrote:
> On Tuesday 14 July 2020 at 21:46:11, Martin Gregorie wrote:
> 
> > This info should include lots of black (hashmarks, asterisks etc).
> 
> You should be careful of the language you use these days, especially
> on this 
> list.
> 
> Yes, I am being sarcastic about what you wrote, but I'm also being
> serious 
> about the apparent power of the language police.
> 
I don't underestimate the power of the thought police (McCarthy was the
standout example of *THAT*) or their, sometimes wilful, ignorance. You
know what I meant, but if I'd written something like "include big blocks
of attention-getting high-density characters", might that be interpreted
as an attack on the comprehensionally challenged?

Martin

Re: spamhaus enabled by default

2020-07-14 Thread Martin Gregorie

On Tue, 2020-07-14 at 12:53 -0400, Kevin A. McGrail wrote:
> I agree with you about the idea of turning off everything and just
> delivering 100% commented configuration files..  I believe SA is a
> framework that must have walls & paint added to make it a
> house.  Others want it ready to go as a pre-fab house aka a drop-in
> spam filter.  As a project, the majority supports the drop-in model so
> I support the will of the PMC.  The DNSBlocklist inclusion policy from
> 2011 has served us well with a lot of users and very few
> complaints.  But if you think of edits it might need, we can always
> improve it.  DNS Blocklists and the free for some model really help
> the drop in spam filter be effective.
> 
Maybe all that's needed is to better emphasize the point that free use
of the RBLs, which SA is configured to use by default, requires the
user to have their own non-forwarding DNS installed, and to explain why.

This should go in:
- the online docs in the SA website 
- SA manpages
- the standard SA configuration file included in the SA package

This info should include lots of black (hashmarks, asterisks etc). The
main thing is to put these warnings where they can't be missed - and some
people can miss almost anything.

As an added bonus, the SA installation package might include basic
config files for popular DNSes, say bind and unbound, that let it
support SA out of the box by simply: 
(a) installing one of the supported DNS packages
(b) putting the supplied configuration where the DNS expects to find
it. 

If the SA user wants their DNS to do more, they can read its docs and
add their own tweaks.  

But the important point is to have the SA docs say, in places that a new
user can't miss: "If you want free use of the default RBLs then
INSTALL YOUR OWN NON-FORWARDING DNS".

Martin


> Regards,
> KAM
> --
> Kevin A. McGrail
> Member, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> https://www.linkedin.com/in/kmcgrail - 703.798.0171
> 
> 
> On Tue, Jul 14, 2020 at 12:08 PM M. Omer GOLGELI 
> wrote:
> 
> > July 14, 2020 6:07 PM, "Kevin A. McGrail" 
> > wrote:
> > 
> > > The question you ask is exactly why we have the DNSBL Inclusion
> > > policy
> > and require the free for
> > > some model.
> > > 
> > > We might need to kick up the need for the BLOCKED rule with
> > > instructions
> > in that description on how
> > > to disable the rules. What are your thoughts on that?
> > > 
> > 
> > Don't get me wrong, I use them in the scoring process as well and
> > I'm glad
> > to use them along with a few others as I'm not that hard bent on
> > keeping
> > everything free.
> > 
> > And if I hit the limits somehow, I'll either pay for them or turn
> > them off.
> > 
> > But there will always be people that doesn't want it.
> > Or those who wouldn't want to see their OSS software relies on
> > commercial
> > products.
> > Or there will be those who does this non-commercially.
> > Or there will be people who installed it as part of their OSS mail
> > product
> > and doesn't know that there's such a limit etc.
> > 
> > So for that matter, maybe these can be left for the admins decision
> > to
> > enable them after installation.
> > Or all users should be made aware of these limitations in a better
> > manner
> > and clearly for each semi-commercial RBL used.
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > M. Omer GOLGELI
> >

Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave

2020-07-14 Thread Martin Gregorie

On Tue, 2020-07-14 at 12:24 -0400, Kevin A. McGrail wrote:
> We'll have to agree to disagree.  To me it is clearly racially charged
> language and you are cherry picking your sources.  Here's a well
> researched
> and documented article from a medical journal on the topic with expert
> citations: https://jmla.pitt.edu/ojs/jmla/article/view/490  The
> abstract
>
The first *recorded* use of the term 'blacklist' or 'black list' was in
1660 when Charles II of England used it to refer to a list of those who
had killed his father, Charles I. From the context it is far more likely
that 'black list' was referring to the sin of regicide than to anybody's
skin colour.

I notice that the abstract you quoted has no references earlier than
1962, so I find it hard to take it seriously, especially as the earlier
religious links between 'black' and 'sin' appear to be ignored by it.
This is odd considering how much influence religion had on society in
the 17th century and that there was no slavery in North America before
about 1640.

Out of pure curiosity, when was the current racist use of 'black' first
coined and where did that happen?

Me? I grew up in NZ where the social norms were against any attempt to
denigrate Maoris: anybody who would not let a Maori meter-reader in to
read his electricity meter would not be sent a pakeha meter reader and
so was more or less guaranteed to get a heavy fine for late payment and
failing to get his meter read. Similarly, I don't remember the All
Blacks, the national rugby team, ever not having Maoris in it.

Martin

Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave

2020-07-11 Thread Martin Gregorie

On Sat, 2020-07-11 at 06:32 -0600, Eric Broch wrote:
> Obama was a community organizer, and that's what community organizers 
> do. They stir up trouble where no trouble exists. This is a Marxist 
> tactic to overturn a society in the school of Saul Alinsky (Author: 
> 'Rules for Radicals').
> 
Maybe so, but one thing I know is that the people were not fooled by
their Warsaw Pact governments. They knew very well that what they got
was not what they were promised.

I was in Czechoslovakia when Solidarity was the name of the game in
Poland and the Berlin Wall hadn't yet fallen. We met a lot of young East
Germans who were holidaying there because it was the only country they
could get holiday visas for. The one thing we heard from these young
East Germans at some point in a conversation was a variation on "Of
course we know about Marx and his brand of Communism because we had to
study that in school. It sounds wonderful: we just wish we had it in our
country".

We knew then that something was about to change soon, so weren't
surprised when the Wall came down.

Anyway: that was realpolitik. Political correctness is not realpolitik,
even slightly. It's a pity George Orwell isn't around now.

> One does not concede ground to radicals one punishes them because
> they are intent on destroying anything civilized.
> 
I don't think you have the faintest idea of what a radical is.

Martin

Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave

2020-07-10 Thread Martin Gregorie

On Fri, 2020-07-10 at 15:01 -0700, jdow wrote:
> On 20200710 13:43:21, Bill Cole wrote:
> > On 10 Jul 2020, at 8:37, Mauricio Tavares wrote:
> > 
> > >   I do agree that accept works better than welcome here.
> > 
> > There's a practical issue in that: we have the WLBLEval plugin that
> > has cemented 
> > the initial.
> > 
> > FWIW, the use of "blocklist" in spamfighting goes back to the '90s,
> > when the 
> > primary resistance to "blacklist" was by people who were
> > uncomfortable with its 
> > McCarthyist connotation.
> 
> Well, Bill, it was stupid then. What makes it not stupid today? The
> exact same 
> logic applies, doesn't it?
> 
And the term 'blacklist' goes back a long way: first documented use was
in 1639. Next use seems to have been by Charles II of England, in 1660,
when he constructed a 'black list' of people he intended to punish for
killing his father, Charles I, so any connection with skin colour seems
to be entirely irrelevant since in that era it would be referring to the
black souls of the regicides.

Martin

Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave

2020-07-10 Thread Martin Gregorie

On Fri, 2020-07-10 at 12:07 +, Pedro David Marco wrote:
> OK... who starts??? :-)
> once Finished we can rewrite "El Quixote" as well...
> 
That's already been sort of redefined: see https://xkcd.com/556/

Martin

Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave

2020-07-10 Thread Martin Gregorie

On Fri, 2020-07-10 at 10:36 +0200, Matus UHLAR - fantomas wrote:
> On 10.07.20 08:50, Axb wrote:
> > the US problems won't be fixed with renaming B lists.
> > Seriously.. you have more important issues...
> 
> while I am not a fan of renaming, I think that
> "welcome list" and "block list" are more informational.
> While people working with these terms know them, others may not so
> well.
> 
Still a bit woolly methinks. I think acceptlist and rejectlist are a
more meaningful pair of terms.

Similarly I, and the places I've worked, have always used client and
server rather than slave and master. While I see no need to use the
'slave' term outside its historic and ongoing human designation, I do
see a continuing use for 'master', where it describes a unique object
that is frequently replicated, or a person with extensive understanding
of a body of knowledge, e.g. master (of a sailing ship), master
craftsman, or a qualification: MSc, MA, etc.

Martin

Re: Multiple regex on same URL

2020-07-07 Thread Martin Gregorie

On Tue, 2020-07-07 at 22:07 +, Pedro David Marco wrote:
> Thanks Martin, but  the meta may be possitive if one URL triggers
> SUBRULE1 and another different URL triggers SUBRULE2...
>  how can you be sure both SUBRULES are possitive in the "same" URL? 
>
I didn't spot the requirement that the URIs must match: I read your
requirement as being that two matches from a group of URLs within a
defined set or with the same second level domain would do. My mistake.
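
The difference between the two readings can be shown outside SA. This is
only a rough Python sketch, not SA code: PAT1 and PAT2 are made-up
stand-ins for the two subrule patterns, and the URLs are invented.

```python
import re

# Hypothetical stand-ins for the two uri subrule patterns.
PAT1 = re.compile(r"login", re.I)
PAT2 = re.compile(r"\.example\.net/", re.I)

def meta_style_match(urls):
    """Mimics ANDing two subrules: each pattern may hit a different URL."""
    return any(PAT1.search(u) for u in urls) and any(PAT2.search(u) for u in urls)

def same_url_match(urls):
    """Stricter test: a single URL must satisfy both patterns."""
    return any(PAT1.search(u) and PAT2.search(u) for u in urls)

urls = ["http://a.example.org/login", "http://b.example.net/index.html"]
print(meta_style_match(urls))  # True: the patterns hit different URLs
print(same_url_match(urls))    # False: no single URL matches both
```

The first function is what a plain meta rule gives you; the second is the
behaviour Pedro was asking for.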

Might it be easier to define and implement with a decent RDBMS and a
clever SQL query? 

Martin

Re: Multiple regex on same URL

2020-07-07 Thread Martin Gregorie

On Tue, 2020-07-07 at 20:39 +, Pedro David Marco wrote:
>  
> 
>>On Tuesday, July 7, 2020, 03:16:34 PM GMT+2, Henrik K <
> h...@hege.li> wrote:  
>  
> > Also newer SpamAssassin already has URIDetail plugin which can also
> > do what you want:
> >   uri_detail SYMBOLIC_TEST_NAME key1 =~ /value1/  key2 !~ /value2/
> > ...
> if it uses the same key more than once, then uri_detail joins them
> with "OR", but we need an "AND" 
> -Pedro
> 
That should be easy enough to do with a metarule:

uri   __SUBRULE1 /(URL alternateslist1)/
uri   __SUBRULE2 /(URL alternateslist2)/
meta  MYMETARULE (__SUBRULE1 && __SUBRULE2)
score MYMETARULE 6.0

...or something like that

Martin

Re: Rule HK_SCAM is triggered by standard business email

2020-07-01 Thread Martin Gregorie

On Wed, 2020-07-01 at 16:20 -0400, Aner Perez wrote:
> It looks like to me like the logic in __HK_SCAM_S7 is a little
> > off...
> > 
> > /(?:(?:investment|proposed|lucrative)
> > (?:business|venture)|(?:business|venture) 
> > (?:enterprise|propos(?:al|ition)))/i
> > 
> > seems like it should be:
> > 
> > /(?:(?:investment|proposed|lucrative)
> > (?:business|venture)|(?:business|venture|enterprise) 
> > propos(?:al|ition))/i
> > 
> 
IME using a meta-rule that ANDs two rules of that type works well. 

The key is to put words or phrases that often occur in spam in each of
the sub-rules, for instance having selling jargon ("lowest prices",
"unbeatable value") in one rule and product names ("flip flops",
"vodka", "power packs") in the other. As a benefit, if the lists are
well-chosen from words and phrases from spam you've received, it will
also hit on sales spam using combinations you've not previously seen
while being surprisingly good at not giving FPs on business or personal
letters.

The only disadvantage is that the subrules get a bit unwieldy and hard
to edit once their definitions get much longer than 80 characters. That
aside, they're easy to understand and maintain.
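
The two-list idea can be tried out before committing it to SA rules. A
minimal Python sketch, assuming the illustrative word lists from the
paragraph above (real lists should come from spam you've actually
received) and a made-up function name:

```python
import re

# Illustrative lists only; build real ones from spam you have received.
SELLING = re.compile(r"\b(?:lowest prices|unbeatable value)\b", re.I)
PRODUCTS = re.compile(r"\b(?:flip flops|vodka|power packs)\b", re.I)

def is_sales_spam(text):
    """True only when BOTH lists hit, like a meta rule ANDing two subrules."""
    return bool(SELLING.search(text) and PRODUCTS.search(text))

print(is_sales_spam("Unbeatable value on vodka this week"))
print(is_sales_spam("Please send the vodka invoice"))
```

Because both lists must hit, a business letter that mentions a product
name without any selling jargon is left alone.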

Martin

Re: White listing messages processed by a previous milter

2020-06-26 Thread Martin Gregorie

On Sat, 2020-06-27 at 00:46 +0200, Marc Roos wrote:
> 
> What would be the best practice to whitelist / not process, messages 
> that have already been processed by a previous milter. 
> 
If you've already whitelisted a message and want it to bypass SA, then
you will, by definition, have total confidence that your milter does not
generate FPs or FNs. In that case, why pass it through SA when it would
be much simpler for the milter to pass it directly to your MTA for
delivery without any further processing?

I've been doing the opposite for years: in my case getmail collects
incoming mail and passes it through SA, which sends it to a
discrimination program which quarantines spam and passes non-spam to my
internal MTA for delivery. After tuning SA to deal with my particular
incoming mail stream, this has very few FNs or FPs (which are
retrievable from quarantine).

This works for my low volume mail stream: there's no reason why higher
volume sites shouldn't use a full-monty MTA to feed the incoming stream
through SA and a spam discriminator before passing the clean stream to a
second MTA for delivery. 

Martin

Re: Slipping through the cracks

2020-06-19 Thread Martin Gregorie

On Fri, 2020-06-19 at 13:54 -0400, micah anderson wrote:

> 2. gmail (amusingly saying my amazon prime membership is going to
> expire)
>
That would make an obvious local rule if you're continuing to see
messages like that, since a Prime expiry notice that's NOT from Amazon is
unlikely to be valid:

Score 5+ if:
 - body or subject mention amazon prime 
and
 - sender and/or Message-ID do not contain a valid Amazon host name.
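
As a rough illustration of that logic (not an SA rule), here is a Python
sketch. The domain list, the function names and the 5.0 score are
assumptions, and it only looks at the From header of single-part
messages; a real rule would also want to check the Message-ID.

```python
import re
from email import message_from_string
from email.utils import parseaddr

PRIME = re.compile(r"amazon\s+prime", re.I)
# Assumed list of legitimate sender domains; adjust to what you actually see.
AMAZON_DOMAINS = ("amazon.com", "amazon.co.uk")

def from_amazon(msg):
    """True if the From address is in, or a subdomain of, a listed domain."""
    addr = parseaddr(msg.get("From", ""))[1].lower()
    domain = addr.rpartition("@")[2]
    return any(domain == d or domain.endswith("." + d) for d in AMAZON_DOMAINS)

def fake_prime_score(raw):
    """Score 5.0 when Prime is mentioned but the sender isn't Amazon."""
    msg = message_from_string(raw)
    # Single-part messages only; multipart handling is omitted for brevity.
    text = msg.get("Subject", "") + "\n" + str(msg.get_payload())
    if PRIME.search(text) and not from_amazon(msg):
        return 5.0
    return 0.0
```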

Remember to keep 2-3 example messages for testing your new rule before
adding it to your live system.

Martin

Re: score sender domains with 4+ chars in TLD?

2020-06-13 Thread Martin Gregorie

On Sat, 2020-06-13 at 15:25 +0100, RW wrote:
> On Sat, 13 Jun 2020 03:10:52 +0100
> Martin Gregorie wrote:
> 
> > You can easily update the rbldnsd zone data (just write/update the
> > > data file, no need to restart spamd) and could create a custom
> > > scoring value based on the DNS data (EG 127.0.0.2 for really
> > > 'good'
> > > TLDs, 127.0.0.4 for 'so-so' and 127.0.0.8 
> > > for truely spammy names).
> > The advantage of this approach is that if you use a less-than-basic
> > database, i.e. one that allows multiple simultaneous connections,
> > rather than a single connection DBMS like sqlite, you can share it
> > between several SA instances aand use anything from an interactive
> > SQL tool to a mobile app to maintain the blacklist. And there's no
> > need stop anything to update the database content.
> 
> FWIW I've added 6 TLDs and 2 exceptions in the past 5 years.
>
I did wonder how many 4+ character TLDs there are - Can't remember when
I last saw one, but my main point was that the sort of setup I described
is easy and pretty quick to set up if you know a bit of Perl and -
equally important - is very easy to replicate for a different spam type
once you've got one running. It's also a lot less of a kludge than the
'portmanteau rules' I use, with maintenance being simple in both cases.

Martin

Re: score sender domains with 4+ chars in TLD?

2020-06-12 Thread Martin Gregorie

> You can easily update the rbldnsd zone data (just write/update the
> data file, no need to restart spamd) and could create a custom scoring
> value based on the DNS data (EG 127.0.0.2 for really 'good' TLDs,
> 127.0.0.4 for 'so-so' and 127.0.0.8 
> for truely spammy names).
> 
A blocklist system that would be a little harder to write, but MUCH
easier to maintain, would be to put the list in a lightweight database,
e.g. MariaDB, and use a Perl plugin module to interface it to SA. The
easy way to do this is to find a similar Perl plugin and hack it to suit
- that's not hard to do.

The database is dead simple: one table containing one column to hold
unwanted domains/addresses declared as the prime key to index it.
Something like:

create table blacklist
(
   domain  varchar(80) primary key
);

The advantage of this approach is that if you use a less-than-basic
database, i.e. one that allows multiple simultaneous connections, rather
than a single connection DBMS like sqlite, you can share it between
several SA instances and use anything from an interactive SQL tool to a
mobile app to maintain the blacklist. And there's no need to stop
anything to update the database content.
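
To show how little code the lookup side needs, here is a rough Python
sketch rather than the Perl plugin itself. An in-memory SQLite database
stands in for the multi-connection DBMS purely to keep the example
self-contained; the sample domains and the function name are made up.

```python
import sqlite3

# In-memory SQLite is a stand-in; a shared server DBMS would be used for real.
conn = sqlite3.connect(":memory:")
conn.execute("create table blacklist (domain varchar(80) primary key)")
conn.executemany(
    "insert into blacklist values (?)",
    [("spammy.example",), ("junk.example",)],
)

def is_blacklisted(conn, domain):
    """True if the (lower-cased) domain is present in the blacklist table."""
    row = conn.execute(
        "select count(*) from blacklist where domain = ?", (domain.lower(),)
    ).fetchone()
    return row[0] == 1

print(is_blacklisted(conn, "Spammy.Example"))
print(is_blacklisted(conn, "good.example"))
```

Adding or removing an entry is then a single insert or delete against the
table, with nothing to restart.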

Martin



> 
> 
> 
> -- 
> Dave Funk   University of Iowa
>  College of Engineering
> 319/335-5751   FAX: 319/384-0549   1256 Seamans Center, 103 S
> Capitol St.
> Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
> #include 
> Better is not better, 'standard' is better. B{

Re: handling spam from gmail.

2020-06-11 Thread Martin Gregorie

On Thu, 2020-06-11 at 11:27 +0200, Marc Roos wrote:
>  
> I know I need to update, moving to containerized or centos8 when
> ready. 
> However I do not think it will solve much, that is why I am asking
> for 
> this procedure.
> 
You could always write a private rule that adds points to mail from
gmail users you don't want to hear from.

Martin

Re: generate rule, wrong?

2020-05-22 Thread Martin Gregorie

On Fri, 2020-05-22 at 11:18 +0200, Maurizio Caloro wrote:
> Hello
> After generating this rule rawbody, spam mail like this words still
> appear, possible mistake from my syntax?
> 
> > required_score 5
> > use_pyzor 1
> > use_razor2 1
> > rawbody BECAUSE_OPTIN
> > /(geschiedene|sexuellen|beziehungen|singlefrauen|zweisamkeit|Dating-
> > Szene|datingszene|sex|männern|wild|unersättlich|dates|girl)/i
> > score BECAUSE_OPTIN 5.0
> 
Two things to look at:

- It's a good idea to only write a new rule after you've had one or two
  messages containing the things your new rule is meant to trigger on.
  Always test the rule against captured spam before using it on
  your live main stream. 

  I have a copy of SA installed on a laptop where it's used only for
  rule testing and I wrote a set of scripts to make testing easier and
  to install the modified ruleset in the live mail stream.

- you need some sort of filter between SA and the end recipient of
  incoming mail that will remove all messages that SA has marked as
  spam. This is what stops the end recipient from seeing the spam. 

  What the filter does is up to you: mine quarantines spam for 7 days
  before deleting it and reports new spam to me via logwatch.   

  So, if you have a filter, test that it works with new rules, and if
  not, build or configure one. Procmail is commonly used as a per-user
  filter, but this does mean that every user has their own spam folder,
  while an arrangement like mine (filter immediately after SA) puts all
  spam in a common holding area.
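
  A filter of this kind only needs to read the markup SA adds. As a
  minimal Python sketch, assuming SA is adding its usual
  "X-Spam-Flag: YES" header to spam, and with made-up destination names:

```python
from email import message_from_string

def route(raw):
    """Decide where a message goes, based on the header SA added."""
    msg = message_from_string(raw)
    flag = msg.get("X-Spam-Flag", "").strip().upper()
    return "quarantine" if flag == "YES" else "deliver"

print(route("X-Spam-Flag: YES\nSubject: Cheap pills\n\nBuy now\n"))
print(route("Subject: Minutes of meeting\n\nAttached.\n"))
```

  What "quarantine" and "deliver" actually do (move to a holding
  directory, hand over to the MTA) is left to the surrounding script.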

HTH 

Martin

Re: support-intelligence.net list

2020-05-14 Thread Martin Gregorie

On Thu, 2020-05-14 at 15:43 +, Henry Castro wrote:
> Hi everyone,
> 
> Are DNS queries to support-intelligence.net lists working for you
> right now?
> 
Not from here (UK):

$ host support-intelligence.net
;; connection timed out; no servers could be reached
$ whois support-intelligence.net
[Querying whois.verisign-grs.com]
[Redirected to whois.alices-registry.com]
[Querying whois.alices-registry.com]
[Unable to connect to remote host]
$ traceroute support-intelligence.net
support-intelligence.net: Name or service not known
Cannot handle "host" cmdline arg `support-intelligence.net' on position
1 (argc 1)
$ 

Martin

Re: another extortion email check

2020-05-04 Thread Martin Gregorie

On Mon, 2020-05-04 at 16:25 -0600, Grant Taylor wrote:
> I think $DatabaseTechnology's main benefit is keeping the password
> data outside of the configuration files.
> 
Agreed, in this sort of corner case.

> select count(*) where log.key = md5(key);
> 
Neat.

> You can move the md5 generation into the SQL server.  Of course,
> you'd want to be mindful of the communications channel between
> SpamAssassin and the SQL server.
> 
I was thinking that the database/whatever would be populated by feeding
in lists of stolen passwords, since they seem to be more or less
freely and at least semi-legally available, but now I'm wondering if it
would be possible to use some sort of generic pattern matching to
trigger a rule associated with a more complex Perl module which would
extract the password string from a message, encrypt it and check it
against the database. If it hits, score the message as spam. If not,
either:

(a) lob its encrypted form into the DB and return a zero score

or

(b) write its plaintext form to a file for somebody to look at and
decide whether it's a false positive or to load it into the database.
Either way the nonmatch should score zero.

Note: this depends on whether SA modules can adjust their triggering
rule's score - I don't know if this is possible.

The sort of recognition rule I'm thinking of could be something as
simple as matching a Subject header whose text contains no spaces and
either has a known length or starts with a prefix such as "password:".
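
As a rough sketch of that recognition step, in Python rather than as an
SA plugin: the function name and the set of "known lengths" below are
pure assumptions for illustration.

```python
import re

# Matches a subject such as "password: hunter2" and captures the last word.
PREFIX = re.compile(r"^password:\s*(\S+)$", re.I)

def candidate_password(subject, known_lengths=(8, 10, 12)):
    """Return the suspected password string from a Subject, or None."""
    subject = subject.strip()
    m = PREFIX.match(subject)
    if m:
        return m.group(1)
    # No prefix: accept a single space-free token of a known length.
    if subject and " " not in subject and len(subject) in known_lengths:
        return subject
    return None

print(candidate_password("password: hunter2"))
print(candidate_password("Meeting on Friday"))
```

Anything this returns would then be encrypted and looked up, so a false
match here costs one database query rather than a wrong score.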

> I'm not sure how to have SpamAssassin iterate all the words in the 
> message through this routine.
> 
Presumably passwords in body text have some surrounding text you could
match on along the lines of money with menaces: 

"I know your bank password, which is x so pay me now".

However, since I've not seen any of these messages, this is probably the
point where somebody who is getting them will speak up and say that I'm
talking garbage. 

> If it requires a custom SpamAssassin module, then it might be better
> to do calculate the MD5 hash in SpamAssassin and avoid the security 
> implications between SpamAssassin and SQL server.
> 
That makes sense because the closer to source you can encrypt the
password the more secure the whole system becomes. Doing it at the
database end would require a fairly gung-ho database - one that supports
a built-in (usually proprietary) procedural language. 

PostgreSQL, SQLServer, Sybase, DB2 and other heavyweights provide
procedural languages / stored procedures, but the likes of SQLite,
Derby, MariaDB probably do not.
 
Martin

Re: another extortion email check

2020-05-04 Thread Martin Gregorie

On Mon, 2020-05-04 at 15:14 -0600, Grant Taylor wrote:
> I see little benefit of an SQL database vs rules with the encrypted 
> (hashed) passwords (possibly salted with the usernames / email
> address) 
> in the SpamAssassin config file.  Well, save for possible ease of 
> administration in that SA's config file doesn't need to be updated
> when 
> passwords are compromised.
> 
The list of such passwords might get rather long, so using a database
both makes maintenance easier, as you spotted, and also keeps a lot of
junk out of the rule sets. One Perl module and one or two rules
triggering it seem a lot easier to maintain than a whole heap of rules
containing unreadable junk, but of course ymmv. You could write code to
autogenerate the rules, but that still sounds like a good way to
inflate the ruleset to no good purpose.

However, I've long realised that not everybody is as keen on databases
as I am. 

> > You get points for added security by obscurity it you can stick it
> > in a corner of an existing, unrelated database.
> 
> 
> 
Yep, not really a serious suggestion.

Martin

Re: another extortion email check

2020-05-04 Thread Martin Gregorie

On Mon, 2020-05-04 at 13:03 -0600, Grant Taylor wrote:

> Which is why I have not.  It's also why I asked if there was a way to 
> compare hashed text.  To quote:
> 
> "Is there any way to compare hashed strings of text?"
> 
> I'll note that my question hasn't been answered.  Instead, people
> have 
> focused on something not germane to my question.
> 
Encrypt them and put them in a single column database table that's also
the prime key for the table? 

Lookup by encrypting the item being checked before looking for an SQL
hit count: 

select count(*) from log where log.key = key;

0=miss, 1=hit, 2+ = error. Should run fast. Of course, that would need
an SA plugin, but Perl SQL interface code isn't hard and is fairly
compact. For added protection, keep the database on an encrypted
partition. Any database should do: MariaDB, SQLite, PostgreSQL,...
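
A minimal sketch of that lookup, in Python rather than the Perl an SA
plugin would need. SHA-256 is my choice here, not something from the
thread, the hash is unsalted, and the table and column names are made up;
in-memory SQLite is used only to keep the example self-contained.

```python
import hashlib
import sqlite3

def digest(text):
    """One-way hash of the candidate string (SHA-256, unsalted)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
# Single column that is also the primary key, as described above.
conn.execute("create table leaked (pwhash char(64) primary key)")
conn.execute("insert into leaked values (?)", (digest("hunter2"),))

def hit_count(conn, candidate):
    """0 = miss, 1 = hit; the primary key rules out anything higher."""
    row = conn.execute(
        "select count(*) from leaked where pwhash = ?", (digest(candidate),)
    ).fetchone()
    return row[0]

print(hit_count(conn, "hunter2"))
print(hit_count(conn, "something-else"))
```

Only hashes are stored and sent to the database, so the plaintext never
leaves the process doing the check.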

You get points for added security by obscurity if you can stick it in a
corner of an existing, unrelated database.

Martin

Re: Spamassassin always says DKIM_INVALID

2020-01-17 Thread Martin Gregorie

On Sat, 2020-01-18 at 01:29 +0200, Jari Fredriksson wrote:
> On 14.1.2020 15.38, Alex Woick wrote:
> > Spamassassin (3.4.3, the same with previous) declares all or almost 
> > all the incoming DKIM-signed messages as DKIM_INVALID, and I'm not 
> > understanding why.
> > I'm running opendkim on the mail server as milter with Postfix, and 
> > the opendkim headers say the same dkim signatures are all valid.
> > 
> > Example headers of some mail from this list.
> > Opendkim says ok:
> > Authentication-Results: mail.wombaz.de;
> > dkim=pass (2048-bit key) 
> > header.d=linkcheck.co.uk header.i=@linkcheck.co.uk header.b="PXrrNHdB"
> > 
> > But Spamassassin says it's invalid:
> > X-Spam-Status: No, score=-15.5 required=5.0
> > tests=BAYES_00,DKIM_ADSP_ALL,
> > DKIM_INVALID,DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,
> > MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,SPF_HELO_NONE,SPF_PASS,TXREP
> > ,
> > USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no
> > version=3.4.3
> 
> I had the same problem on my mail server, while the server only 1 Gb
> and 
> was an old box. I swapped it to a 4 Gb box and installed OS and SA
> and 
> all as new install.
> 
> Magically the problem went away.
> 
> One more thing: I got DKIM_VALID & DKIM_VALID_AU allright for a day 
> after reboot, but it the started to be DKIM_INVALID. I set the mail 
> server to reboot once a day and it worked. But current system works
> fine 
> without any artificial reboots.
> 
That looks like a combination of:

(a) buffer truncation when memory limits are reached. IOW SA can't get
the full buffer size it asked for, and so truncates the message it was
putting into the buffer rather than aborting on a buffer overflow. Doing
this would certainly screw a checksum.

(b) there's some sort of memory leak, i.e. releasing a dynamically
requested piece of memory doesn't return all of it, which could slowly
shrink the process's available heap space, OR there's code that is
failing to return previously claimed heap space. Something like that
would explain your second issue, which you got round by rebooting the
system before SA ran out of heap memory.

Both are things you might expect to see in badly written C programs
and/or C programs whose testing skimped on edge case testing,
particularly when the code uses calls of malloc(), free() and friends to
manage dynamic heap memory use. Equally these are things that I would
not expect to see in Java code because the JVM has a decent garbage
collector and anyway, errors of that kind are treated as fatal and so
would cause program termination with a diagnostic stack dump.

However, I'm not familiar enough with Perl to know how it behaves in
these circumstances. 

Still, I hope the above helps with ideas about what to look for. On a
UNIX/Linux box 'top' should show the program size expanding over time if
heap space isn't being released correctly. I've forgotten how you'd
trouble-shoot a Windows system - haven't touched it for over 15 years.

Martin

Re: txrep duplicated key with postgresql

2019-12-09 Thread Martin Gregorie

On Mon, 2019-12-09 at 11:41 -0800, John Hardin wrote:
> This sounds more like the "does that tuple already exist?" logic is 
> failing, causing it to think it needs to create a new entry, which
> the unique key is (correctly) preventing.
> 
> You don't lightly bypass unique keys. They are there for a reason.
> 
Fair enough. Since this is the first reference I remember seeing to
using PostgreSQL with TxRep, I assumed that Benny's cry for help was due
to a difference in the way it handled duplicate keys compared with the
database that normally supports it.

Martin
