Re: Spamassassin "ignoring" mail with embedded picture

2019-02-15 Thread Ian Zimmerman
On 2019-02-15 16:07, Claudio Kuenzler wrote:

> The man page calls it "will be returned unprocessed"
> What does that mean for Postfix, what kind of response does it get from
> spamc?

It depends on how spamc is invoked.  Please read the whole manpage.

If you invoke it just for the exit status, it will exit the same way as
if the mail were determined by spamd to be ham.

If you invoke it to output a modified copy of the message (or just headers)
on standard output, it will just echo the original.

I have no idea how postfix calls spamc; I think that should in fact be
your first line of investigation.
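
A minimal Python sketch of the "exit status only" case, not a description of how Postfix calls spamc.  It assumes `spamc -c` (check only), which exits 1 for spam and 0 otherwise.  The function name `looks_like_spam` and the injectable `cmd` parameter are made up for illustration.

```python
import subprocess

def looks_like_spam(message: bytes, cmd=("spamc", "-c")) -> bool:
    """Run a spamc-style checker and look only at its exit status.

    With `spamc -c`, exit status 1 means spam.  Exit status 0 means ham
    *or* that the message was returned unprocessed, so a caller that only
    inspects the exit status cannot tell those two cases apart.
    """
    proc = subprocess.run(list(cmd), input=message, stdout=subprocess.DEVNULL)
    return proc.returncode == 1
```

A caller that needs to tell "ham" from "unprocessed" has to use the filtering mode and check whether the X-Spam-* headers were actually added.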

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
To reply privately _only_ on Usenet and on broken lists
which rewrite From, fetch the TXT record for no-use.mooo.com.


Re: Is the SA Bayes implementation mathematically sound?

2018-12-24 Thread Ian Zimmerman
On 2018-12-23 17:02, Rick Macdougall wrote:

> I'm just going to jump in here and mention that I train my bayes in SA
> and in Thunderbird email client.
> 
> Thunderbird catches 99%+ and SA catches under 60% with the same
> training data.

Have you also compared the rates of False Positives?



Re: Howto - Full Report in Mail Header

2018-12-16 Thread Ian Zimmerman
On 2018-12-16 08:30, Kevin A. McGrail wrote:

> > add_header all Report _REPORT_

> This can cause issues though.  That feature is not header safe to my
> knowledge.

_TESTSCORES_



Slightly OT: list multiposting

2018-11-22 Thread Ian Zimmerman
Can anyone think of a quick way to flag identical emails posted to
multiple mailing lists under different message-ids?  I guess I'd need
something like a local instance of DCC, do you agree?  Anything simpler
than just taking the real DCC and configuring it for this special
purpose?
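
One simpler approach, sketched here under assumptions: hash the decoded, whitespace-normalized text body and remember which Message-ID first carried each hash.  This is exact matching only, unlike DCC's fuzzy checksums, so list footers that differ between lists would defeat it.  The names `body_fingerprint` and `find_multiposts` are hypothetical.

```python
import hashlib
import re
from email import message_from_bytes, policy

def body_fingerprint(raw: bytes) -> str:
    """Hash the decoded text body, ignoring headers such as Message-ID."""
    msg = message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain", "html"))
    text = part.get_content() if part is not None else ""
    # Collapse whitespace and case so re-wrapping does not change the hash.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def find_multiposts(raw_messages):
    """Yield (message_id, earlier_message_id) for bodies seen before."""
    first_seen = {}
    for raw in raw_messages:
        msg = message_from_bytes(raw, policy=policy.default)
        mid = str(msg["Message-ID"])
        digest = body_fingerprint(raw)
        if digest in first_seen:
            yield mid, first_seen[digest]
        else:
            first_seen[digest] = mid
```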



Re: unexpected FN, how to improve/tune to catch

2018-11-16 Thread Ian Zimmerman
On 2018-11-16 09:52, Matus UHLAR - fantomas wrote:

> such spam should be filtered at mailing list level before this happens.

And it almost always is.  Not in this case.

> what can help you

> - BAYES

understood, I am trying to do without Bayes for now, because I want to
avoid the maintenance (training and, especially, expiring).

> - network rules

those are on

> - URI blacklists

those are on

> did you enable/install razor, pyzor, dcc, spf and dkim libraries?

not dcc, but it would be useless in this case (mailing list is bulk by
definition).  The others are on.

> apparently it does not contain any URI.

It does.  Two web (bitly, masking a redirection to Facebook; plus
wecareusa) and one mailto.

Three followup questions about this last point:

1. Am I correct in assuming that SA decodes base64 MIME parts so it does
act on these links?  Reading the -D output surely indicates so.

2. I remember some discussion here about following shortener links like
bitly.  What is the resolution of that?  Does SA currently (as of 3.4.2)
follow such links, to see (for example) that the link in my spample led
to Facebook?

3. The documentation for the HashBL plugin shows how to set it up to
check addresses from headers.  Is there a way to also check addresses
from mailto links in the body?  If not now, is anything like that
planned for upcoming releases?
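
To illustrate what "decoding the base64 part and acting on the links" involves, here is a minimal Python sketch.  It mimics the idea only and is not SA's parser; the regular expression is a deliberately crude assumption and `body_uris` is a made-up name.

```python
import re
from email import message_from_bytes, policy

# Crude pattern: web links and mailto links, up to the next delimiter.
URI_RE = re.compile(r"(?:https?://|mailto:)[^\s<>\"']+", re.IGNORECASE)

def body_uris(raw: bytes) -> list:
    """Return web and mailto URIs found in all decoded text parts."""
    msg = message_from_bytes(raw, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            # get_content() undoes the transfer encoding (base64 here)
            # and decodes the charset before the pattern is applied.
            found.extend(URI_RE.findall(part.get_content()))
    return found
```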

Thanks



unexpected FN, how to improve/tune to catch

2018-11-15 Thread Ian Zimmerman
This little pearl got through upstream filter on a mailing list.

  https://pastebin.com/JhDGvAAA

I show the body only, but the MIME headers were:

  Content-Transfer-Encoding: base64
  Content-Type: text/plain; charset="utf-8"; Format="flowed"

Also:

  From: yourfrugalstore 
  Message-ID: <88ca9f91-131f-e584-3331-074c5139c...@yourfrugalstore.club>

My scores for it were:

  RCVD_IN_DNSWL_MED=-2.3,SPF_HELO_PASS=-0.0,MAILING_LIST_MULTI=-1.0,TOTAL=-3.3
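
As a quick check of that line, the Python sketch below parses it and confirms that the three listed rules account for the whole total, i.e. nothing with a positive score hit at all.  `parse_scores` is a hypothetical helper.

```python
def parse_scores(line):
    """Split 'RULE=score,...' into a dict, returning TOTAL separately."""
    scores = {}
    for item in line.split(","):
        name, _, value = item.partition("=")
        scores[name.strip()] = float(value)
    total = scores.pop("TOTAL", None)
    return scores, total

line = "RCVD_IN_DNSWL_MED=-2.3,SPF_HELO_PASS=-0.0,MAILING_LIST_MULTI=-1.0,TOTAL=-3.3"
rules, total = parse_scores(line)
computed = round(sum(rules.values()), 1)  # rounded to dodge float noise
print(computed == total, all(score <= 0 for score in rules.values()))
```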

Here is my user_prefs file:

  # This one disables Bayes.  If you want to use Bayes remove or comment
  # out this line.  You'll need to manage your Bayes database with a
  # cronjob or something.  I can help but I won't do the last tiny detail.
  use_learner 0

  # This means spamassassin will just add headers to the message, and not
  # wrap it as an attachment in a new message.
  report_safe 0

  # Tells spamassassin which Received headers it can trust not to be
  # forged.  In our case, it is a single address, the public address of
  # the server.
  clear_trusted_networks
  trusted_networks 12.34.56.78/32

  # This is not really needed but I included it to be explicit
  clear_dns_servers
  dns_server 127.0.0.1

  # Set the backend library for geoip functionality

  country_db_type GeoIP

Where are all the other scores?  I would have expected at least
something for bit.ly and for the misspelled closing line, which is a
dead spam give-away to a human ...

I have run spamassassin -D on it and everything seems to work as
designed i.e. the tests including URIBL run fine, they just don't catch
anything.  It's disappointing.

Maybe the KAM rules would have got this one?



Fwd: CVE-2018-12558: DOS in perl module Email::Address

2018-06-20 Thread Ian Zimmerman
This is probably of interest to readers of this list.

http://www.openwall.com/lists/oss-security/2018/06/19/3



Re: List From and Reply-To

2018-05-31 Thread Ian Zimmerman
On 2018-05-31 12:25, Antony Stone wrote:

> Anyone is free to set a Reply-To header in the emails they send.  This
> will be preserved by the list server.
> 
> I believe both Ian and Bill are doing this, yes.

Correct.  But Reply-To doesn't mean "follow up with list posts to this
address"; it means "I don't want private replies on this list, ever".
The relevant header for normal list follow-ups is List-Post.
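
A minimal sketch of how a client could pick the follow-up address from List-Post rather than Reply-To.  It assumes the usual `<mailto:...>` form of the header; `followup_address` and the example addresses are hypothetical.

```python
import re
from email import message_from_string, policy

def followup_address(raw: str):
    """Return the list posting address from List-Post, or None."""
    msg = message_from_string(raw, policy=policy.default)
    header = str(msg.get("List-Post", ""))
    # Take the mailto target, dropping any ?subject=... parameters.
    match = re.search(r"<mailto:([^>?]+)", header)
    return match.group(1) if match else None
```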



Re: List From and Reply-To

2018-05-30 Thread Ian Zimmerman
On 2018-05-30 15:49, Palvelin Postmaster wrote:

> Why does this list apparently use the original From header of the
> poster’s message and doesn't set a Reply-To header at all?

Because that is the only right way.

A list manager has no business modifying the contents of posted
messages.  It should be satisfied with the humble role of forwarding
them to subscribers (simplifying, but only slightly so).

If you want to reply to the list, use the appropriate UI in your
client.  For example, in mutt I hit 'L' to send this post.

Hope this helps :-P



Re: Mysterious false positives in inbox

2018-05-09 Thread Ian Zimmerman
On 2018-05-09 13:08, Eggert Ehmke wrote:

> > Wild stab - maybe they're entering the system already with
> > ***SPAM*** in the subject?

> The mail also originated from the same server.

All the more reason to suspect the "wild stab" is correct.

In my experience this is quite common on some poorly configured mailing
list servers.



Re: OFF-TOPIC: Re: Just to lighten your day?

2018-05-03 Thread Ian Zimmerman
On 2018-05-02 14:03, John Hardin wrote:

> Or maybe "He's still moving towards the keyboard! LART him again!"

I thought the funniest part was the last line.



Re: razor?

2018-03-09 Thread Ian Zimmerman
On 2018-03-09 09:26, David Jones wrote:

> RAZOR like DCC and PYZOR shouldn't be used as a sole source of
> determining spam.  These are indicators that combine with other rule
> hits and scores to be one of many factors.  If the score was 10 or
> more then you would worry about reporting FPs.

Well, _someone_ has to report the FP (I think Razor, confusingly, terms
that "whitelisting") for the misclassification to be reversed.  That's
how Razor is supposed to work - it is a reputation service, both
positive and negative, not just a list of badness.  Making the score
less than a poison pill helps _you_ avoid a FP but it leaves the wrong
result in place for other recipients.



Re: Bayes not auto-learning?

2018-02-23 Thread Ian Zimmerman
On 2018-02-23 22:32, Amir Caspi wrote:

> So, I've been trying to tweak my setup and noticed that VERY few of my
> emails are being autolearned as spam, even when their spam threshold
> is far above the autolearn threshold.  The threshold is set to 12; I
> just saw a spam with score >25 not being autolearned.

Sigh.  This really is a FAQ, and I did ask it myself (maybe more than
once).

Read the fine documentation.  In short: the score that is compared to
the threshold for autolearning is _not_ the normal score that determines
spam/ham.

Despite the fact that it is documented, I find the algorithm to be too
opaque to feel in control.



Re: pyzor internal error on some messages

2018-02-21 Thread Ian Zimmerman
On 2018-02-20 22:20, Alex wrote:

> Hi,
> 
> Does anyone know what could be causing this? This is on fedora with
> pyzor-1.1.0-1.20170904gitd14e980
> 
> Feb 20 22:08:07.475 [28639] dbg: pyzor: network tests on, attempting Pyzor
> Feb 20 22:08:13.098 [28639] dbg: pyzor: pyzor is available: /usr/bin/pyzor
> Feb 20 22:08:13.100 [28639] dbg: pyzor: opening pipe: /usr/bin/pyzor
> --homedir /var/spool/amavisd --log-file

[...]

> "/usr/lib/python3.5/site-packages/pyzor/client.py", line 258, in run\n

Isn't pyzor a Python 2 program?

Did this start when your distro switched default Python from 2.x to 3.x ?



Unchecked ??? [Was: Can't locate object method "trim_domain"]

2018-01-26 Thread Ian Zimmerman
What is this ***UNCHECKED*** goo in the subjects?  Has someone played
with the list manager configuration?

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
To reply privately _only_ on Usenet, fetch the TXT record for the domain.


Re: Penalty for no/bad SPF

2018-01-24 Thread Ian Zimmerman
On 2018-01-24 18:10, Bill Cole wrote:

> 1. Mail with an envelope sender domain that has no SPF record is more
> likely to be spam than the overall mail stream.
> 
> 2. Mail whose envelope sender domain has a published SPF record which
> repudiates the sending IP is more likely to be spam than the overall
> mail stream.
> 
> I don't see evidence that either of those are true now, that they have
> ever been true, or that they are becoming closer to true over time.

I am not taking sides in this dispute, but I'd like to point out that
there is a phenomenon called "self-fulfilling prophecy", as in all
affairs that involve human behavior and persuasion.



Re: skipping nameserver '0.ns.spamhaus.org' because it is a CNAME

2018-01-14 Thread Ian Zimmerman
On 2018-01-14 19:30, Alex Lasoriti wrote:

> > things falling apart at spamhaus?
> 
> Not that I am aware of :)  The infrastructure keeps consolidating
> and things are getting stronger and stronger!  What other news are you
> referring to ?

I probably had lodged in my memory (what remains of it) the thread on
SDLU started by Glenn English mid-last month.  Sorry for the
conflation/confusion, and thanks for the detailed explanation of the
current issue.



Re: skipping nameserver '0.ns.spamhaus.org' because it is a CNAME

2018-01-14 Thread Ian Zimmerman
On 2018-01-14 17:07, Per Jessen wrote:

> AFAIK, bind does not accept NS records with CNAMEs, only A or AAAA
> records.  It looks like spamhaus updated their nameserver config and
> added cloudflare by way of CNAME.

I am getting these, too.  With other news in the last few weeks, are
things falling apart at spamhaus?



Re: Malformed spam email gets through.

2018-01-03 Thread Ian Zimmerman
On 2018-01-03 14:36, Bill Cole wrote:

> I have run an environment where each MTA node in the external gateway
> layer would add a MID with its own FQDN to any message passing through
> missing a MID. Those names could not be resolved in the world at
> large, but they were absolutely valid and guaranteed unique.

This is what I do with my personal outgoing messages.  Free 3rd level
DNs are available at freedns.org and I use a bogus (from the DNS POV)
4th level name under one of those, distinct for each host, as the RHS in
my Message-ID.  There's no good reason to use "localhost" or
"localdomain".



Perl module to extract body URLs

2017-12-10 Thread Ian Zimmerman
I know that in some cases at least spamassassin relies on perl modules
that are independent of the spamassassin project.  Is there such a
module for extracting URLs from a message body?

OTOH, if that code is specific to spamassassin where in the source tree
can I find it?

Sorry for this slightly off-topic question.



Re: Why doesn't HK_RANDOM_FROM trigger on this email address?

2017-11-18 Thread Ian Zimmerman
On 2017-11-18 15:46, Mark London wrote:

> FWIW: It seems to me that HK_RANDOM_FROM should trigger on an email
> address like this:
> 
> mqsjkeqgy...@sina.com
> 
> But it doesn't.   Yet it does trigger on this:
> 
> dxn...@sina.com

The first one contains vowels in the local part.
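
The observation can be illustrated with a trivial check.  This is only a sketch of the "no vowels" idea, not the actual HK_RANDOM_FROM rule definition, which is not reproduced here.  The sample strings are just the visible prefixes of the two (truncated) addresses above.

```python
def local_part_has_vowel(address: str) -> bool:
    """True if the part before '@' contains any of a, e, i, o, u."""
    local = address.split("@", 1)[0]
    return any(ch in "aeiou" for ch in local.lower())

# Visible prefixes only; the archive elides the rest of each local part.
print(local_part_has_vowel("mqsjkeqgy"), local_part_has_vowel("dxn"))
```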



listed by xbl [Was: SPF check though external relay]

2017-11-14 Thread Ian Zimmerman
~$ rblcheck 81.17.24.158
81.17.24.158 not listed by sbl.spamhaus.org
81.17.24.158 listed by xbl.spamhaus.org
81.17.24.158 not listed by pbl.spamhaus.org
81.17.24.158 not listed by bl.spamcop.net
81.17.24.158 not listed by psbl.surriel.com
81.17.24.158 not listed by dul.dnsbl.sorbs.net
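
For reference, a tool like rblcheck builds each query by reversing the IPv4 octets and appending the list's zone.  A minimal Python sketch of that name construction (the helper name `dnsbl_query_name` is made up, and no lookup is performed):

```python
import ipaddress

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

print(dnsbl_query_name("81.17.24.158", "xbl.spamhaus.org"))
```

If that name resolves to an A record (conventionally in 127.0.0.0/8) the address is listed; NXDOMAIN means it is not.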

[I wanted to react privately, but something about your address told
me it would go to /dev/null if I did.]



Re: improving detection to cloudmark-like levels?

2017-10-12 Thread Ian Zimmerman
On 2017-10-12 09:25, AJ Weber wrote:

> So I'm sure they have some "secret sauce" and I'm not asking for that
> to be revealed, but since pyzor is supposedly using their database,
> I'm just trying to figure out if there's a way to get my SA filter to
> improve even further and close the gap?

I don't know how you got the supposition about pyzor.

pyzor is completely independent of Cloudmark (unlike razor) and AFAIK
pyzor scores are based on participating users' reports and nothing
else.

pyzor is also libre software, including the server (unlike razor).  That
means anyone can run their own server.  I started doing so a couple of
weeks ago, see [1].  You're welcome to join :-)

[1]
https://lists.gt.net/spamassassin/users/205264

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
Do obvious transformation on domain to reply privately _only_ on Usenet.


Re: Blocking senders that are whitelisted

2017-10-04 Thread Ian Zimmerman
On 2017-10-04 10:52, David Jones wrote:

> I bet this user signed up for this email somehow, possibly a while ago and has
> forgotten about doing so.  So many times, when you register for accounts on
> websites, the check box to opt-in to a mailing list is already checked and most
> users don't take the time to read the page and uncheck the box before clicking

Then it's not really opt-in except to a lawyer.

Sorry, I know this is beating a dead horse.



OT: toy pyzord server available

2017-09-26 Thread Ian Zimmerman
I started running an open pyzord instance on the host whose domain is my
email domain, on the "normal" port (the one in the example config file).

My main goal is to get familiar with the operation of the server so I
can contribute to the development, but maybe we can do some useful
filtering too!  But clearly, for that I need some users other than
myself.  Anyone can use it as the anonymous user but only for checking;
for reporting, be it spam or ham, a real account is needed.  If you
like, do the preparation described in the pyzor doc (ie. the generation
of a key) and contact me privately.  If you can, please crypto-sign your
message; I will look much more favorably on your request if you do.

May the spammers rot in bithell!





Re: ISIPP - Re: bb.barracudacentral.org

2017-09-20 Thread Ian Zimmerman
On 2017-09-20 17:02, Chris wrote:

> So, IIUC it would be a good idea to remove the resolv.conf symlink in
> /run/resolvconf ?

Definitely _not_ a good idea while the resolvconf package is installed.

What I meant was remove the package first, then clean up.



Re: ISIPP - Re: bb.barracudacentral.org

2017-09-20 Thread Ian Zimmerman
On 2017-09-20 11:15, Martin Gregorie wrote:

> I don't know why you'd want to do that since you should be running
> named instead of dnsmasq.
> 
> Delete the version you just installed via the apt package manager and
> do a search and destroy mission to get rid of both the other copy of
> it and the associated configuration.
> 
> Running "updatedb; locate dnsmasq" is probably the fastest way of
> finding it and its associated files. Anything with a similar name in
> /etc/init.d is probably its launcher script, so that can go too. If
> you have an /etc/rc.local file, check its contents because it's run as
> part of the sysVinit process. It shouldn't have anything about dnsmasq
> in it but you never know...

Another thing to check in this kind of mess (and I think it wasn't
mentioned yet) is the state of /etc/resolv.conf.  In Debian (and so in
Ubuntu, too) packages that provide DNS daemons, whether authoritative or
caching only, attempt to manage that file automatically, if the
resolvconf (traditionally) or openresolv package is also installed.  If
you do something "unexpected" you can end up with /etc/resolv.conf in a
strange state.

To avoid that, on my Debian hosts I usually purge resolvconf/openresolv,
make sure that /etc/resolv.conf is a real file (not a symlink), and
manually edit it to the correct state.  If the host is on DHCP I also
make sure the ISC DHCP client is in use (not dhcpcd which seems to be
much less flexible), and change /etc/dhcp/dhclient.conf to not request
(or override) the DNS info provided by DHCP, as that also messes with
resolv.conf.
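
A small sketch for the "real file, not a symlink" check, in Python.  The function name `resolv_conf_state` is hypothetical; the default path is the standard one.

```python
import os

def resolv_conf_state(path="/etc/resolv.conf"):
    """Classify the file as 'regular', 'missing', or a symlink with target."""
    # islink() is tested first so that a dangling symlink is still reported.
    if os.path.islink(path):
        return "symlink -> " + os.readlink(path)
    if os.path.isfile(path):
        return "regular"
    return "missing"

if __name__ == "__main__":
    print(resolv_conf_state())
```

Anything other than "regular" suggests that some package is still managing the file.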

Finally (and getting really OT), it helps to keep relevant /etc files
under version control, so you know when the system helpfully shifts the
ground under you.



Re: ISIPP - Re: bb.barracudacentral.org

2017-09-19 Thread Ian Zimmerman
On 2017-09-19 19:53, David B Funk wrote:

> So now you have -two- dnsmasq kits, one installed by "apt" and managed
> thru the "systemctl" tools, and another one that somebody put there
> which is outside the realm of "apt" & "systemctl" (thus they don't
> know how to manage it).
> 
> You should really pick one method of installing/managing software and
> stick with it.
> 
> This is similar to the mess you get when you mix CPAN with
> yum/yast/rpm/apt for installing Perl modules.

Similar but worse, as you can have a safe CPAN + distro mix with local::lib.



Re: Is anyone else getting 325KB spams from cont...@cron-job.org?

2017-09-15 Thread Ian Zimmerman
On 2017-09-15 13:32, RW wrote:

> The default is 500kB for spamc, 256kB is a default for sa-learn.  

I have asked this before:

Does this mean 500 * 1000 bytes or 512 * 1024 bytes, or something else
still?

(this is relevant when configuring other stuff which only understands
straight byte counts with no suffixes)
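
For tools that only take plain byte counts, the candidate readings work out as below.  This is arithmetic only and does not say which reading spamc or sa-learn actually uses; the middle one (500 * 1024) is an extra possibility not mentioned above.

```python
# Possible meanings of "500kB", expressed as plain byte counts.
CANDIDATES = {
    "500 * 1000": 500 * 1000,
    "500 * 1024": 500 * 1024,
    "512 * 1024": 512 * 1024,
}

for label, size in CANDIDATES.items():
    print(f"{label} = {size} bytes")
```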



Re: Config option to skip pyzor check on empty body emails?

2017-09-12 Thread Ian Zimmerman
On 2017-09-12 12:33, RW wrote:

> It is a bit confusing, but it's not that the .pyzor directory is used
> inconsistently, it's that pyzor defines 
> 
>   --homedir=HOMEDIR configuration directory

The confusing part is the spelling of the option.  The mistake is clear
from the last line quoted above: it should be "configdir" and not
"homedir".  Admittedly pyzor will put the data there by default as well
(when backed by gdbm) but that's a minor quibble by comparison.



Re: pyzor config and sig15

2017-09-08 Thread Ian Zimmerman
On 2017-09-08 10:56, Steven Conrad Bayer wrote:

> is the Pyzor network down again?

Works for me now:

ahiker!2 itz$ pyzor check < Mail/mail.net.spamassassin.users/new/1504861340.17441_1.ahiker
public.pyzor.org:24441  (200, 'OK') 0   0

but it was down earlier this week, as discussed in the thread.



Re: pyzor config and sig15

2017-09-04 Thread Ian Zimmerman
On 2017-09-04 20:11, Alex wrote:

> I'm curious about the options people use for configuring pyzor with
> SA? I've always just had it with --homedir /etc/mail/spamassassin but
> I wanted to make sure I wasn't missing something.

pyzor works fine without any configuration, or with an empty
configuration, as long as you only use the default public server.
And in fact I don't know if there are any other publicly available
servers - that is a question for others on the list.

> I've also noticed it always exits with SIGTERM. It appears to be
> working properly with PYZOR_CHECK hits, but I'm not using any of the
> other options like --accounts-file. Is there anything else I should be
> doing?
> 
> I also notice it consumes the vast majority of time required to
> process each message.

This is because the public server went offline today, thus the client
waits until the default timeout which I think is 5 seconds.  You can
configure the timeout to something less.



Re: message/rfc822 to mbox script for use with sa-learn workflow

2017-08-14 Thread Ian Zimmerman
On 2017-08-14 20:08, Scott wrote:

> I would like to turn around and put those individual messages back
> into mbox format, again, without changing their original headers.

The first question is: why?  sa-learn works on just about any format:
individual messages, multiple messages in a flat directory, maildirs.

If in spite of the above you _must_ have a mbox file, I would just setup
a trivial procmail config (maybe even an empty one, supplemented with
one or two environment variables including DEFAULT) and pipe the
messages through procmail one by one.

You probably need the -f option to force generation of the From_ mbox
delimiter.
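
As an alternative to the procmail route (a different technique, named plainly), Python's standard `mailbox` module can also append individual message files to an mbox.  A minimal sketch, assuming one message per file in a flat directory and a single writer (no locking); `messages_to_mbox` is a made-up name.

```python
import mailbox
from pathlib import Path

def messages_to_mbox(message_dir: str, mbox_path: str) -> int:
    """Append every file in message_dir to mbox_path; return the count."""
    box = mailbox.mbox(mbox_path)  # the file is created if it is missing
    try:
        count = 0
        for path in sorted(Path(message_dir).iterdir()):
            if path.is_file():
                # add() writes the From_ delimiter line itself and leaves
                # the message's own headers as they are.
                box.add(path.read_bytes())
                count += 1
        return count
    finally:
        box.close()  # flushes pending writes to disk
```

Note that, as with any mbox writer, body lines that begin with "From " get escaped, so only the headers are guaranteed to be untouched.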



Re: Bayes auto-learn - not happening

2017-08-08 Thread Ian Zimmerman
On 2017-08-08 15:20, Scott wrote:

> Another new one  big score, auto-learn disabled.  This one is fairly small.  
> 
> X-Spam-Status: Yes, score=29.428 tag=- tag2=5 kill=6.4
> tests=[DATE_IN_PAST_03_06=1.076, DCC_CHECK=3.2,
> DIGEST_MULTIPLE=0.001,
> FILL_THIS_FORM=0.001, FROM_MISSPACED=0.001, FROM_MISSP_SPF_FAIL=1,
> HEADER_FROM_DIFFERENT_DOMAINS=0.001, HEXHASH_WORD=1,
> HTML_EXTRA_CLOSE=0.001, HTML_MESSAGE=0.001,
> HTML_MIME_NO_HTML_TAG=0.635, MIME_HTML_ONLY=1.105, MISSING_MID=0.14,
> NORMAL_HTTP_TO_IP=0.001, RAZOR2_CF_RANGE_51_100=0.365,
> RAZOR2_CF_RANGE_E8_51_100=2.43, RAZOR2_CHECK=2.5,
> RCVD_IN_BRBL_LASTEXT=1.644, RDNS_NONE=1.274, SPF_FAIL=4,
> SPF_HELO_FAIL=4, STYLE_GIBBERISH=3.093,
> T_HTML_TAG_BALANCE_CENTER=0.01, URIBL_ABUSE_SURBL=1.948,
> WEIRD_QUOTING=0.001] autolearn=unavailable autolearn_force=no
> 
> Can you tell if this one has the 3 point match?

Scott,

when I tried to use the autolearn feature I was as confused as you are.
As far as I remember, the 3 points each from header and body is not the
only requirement; the full truth is that some rules are "privileged" and
can contribute to autolearning while others cannot.  I found it opaque
in the extreme and essentially unpredictable, so I stopped autolearning
and hacked up some scripts that put a duplicate of each ham message into
a folder.  That folder is then processed by sa-learn from a cronjob,
with sufficient delay that I can review the contents and remove any
false negatives.  I do the same with spam, excluding the utterly
horrible category which just goes to /dev/null.

It may not be possible for you to adopt such a process if your volume is
high, but OTOH in that case you probably have users to help you :)

I think this is what RW is telling you, too.

FWIW, this is documented (sort of) by:

perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold
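
A simplified model of the spam-side decision, as that perldoc page describes it: rules flagged `learn`, `userconf` or `noautolearn` are ignored, the remaining points must reach the threshold, and at least 3 points each must come from header and body rules.  This is a sketch of the documented behaviour, not SA's code; the function name, the rule areas and the sample hits are all invented, and the scores are assumed to come from a non-Bayes scoreset.

```python
IGNORED_FLAGS = {"learn", "userconf", "noautolearn"}

def would_autolearn_as_spam(hits, threshold=12.0, min_header=3.0, min_body=3.0):
    """hits: iterable of (score, area, flags) with area 'header' or 'body'."""
    total = header = body = 0.0
    for score, area, flags in hits:
        if IGNORED_FLAGS & set(flags):
            continue  # these rules never count towards autolearning
        total += score
        if area == "header":
            header += score
        elif area == "body":
            body += score
    return total >= threshold and header >= min_header and body >= min_body

# Invented example: a high total, but almost nothing from body rules.
print(would_autolearn_as_spam([(20.0, "header", ()), (1.0, "body", ())]))
```

This is how a message can score well above the threshold and still not be learned.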



Re: Logwatch from local machine being flagged as spam

2017-08-07 Thread Ian Zimmerman
On 2017-08-06 10:37, Scott wrote:

> Centos7
> Postfix 3.2.2
> Amavisd 2.11.0
> spamassassin-3.4.0

> To: r...@mail2.myserver.com
> From: logwa...@mail2.myserver.com

Since these are locally submitted messages (i.e. not SMTP), IMO the best
and cleanest way to deal with it is to tell the MTA not to pass them to
amavisd, if you can.  This is easy to do with Exim, for example - I'm
not sure about Postfix.  Then you don't have to care about the IP
addresses or domains.



Re: tflags

2017-08-03 Thread Ian Zimmerman
On 2017-08-03 10:38, sha...@shanew.net wrote:

> The most common ones that I make use of are "multiple" and "maxhits"
> in order to allow a rule to be scored for each time it hits, but to
> stop counting after some threshold.  I also use the "net" tflag so
> that RBL checks only run when a net-based ruleset is loaded.

Where is the concept of "ruleset" in general documented, and in
particular what makes it "net-based"?  Not in Mail::SpamAssassin::Conf.



Re: Direct download link detection

2017-07-27 Thread Ian Zimmerman
On 2017-07-27 13:08, Rupert Gallagher wrote:

> The rfc prescribes (MUST) the use of your public domain in the domain
> part of your mid.

If you mean RFC 5322, this is not true.  Section 3.6.4:

   The message identifier (msg-id) itself MUST be a globally unique
   identifier for a message.  The generator of the message identifier
   MUST guarantee that the msg-id is unique.  There are several
   algorithms that can be used to accomplish this.  Since the msg-id has
   a similar syntax to addr-spec (identical except that quoted strings,
   comments, and folding white space are not allowed), a good method is
   to put the domain name (or a domain literal IP address) of the host
   on which the message identifier was created on the right-hand side of
   the "@" (since domain names and IP addresses are normally unique),
   and put a combination of the current absolute date and time along
   with some other currently unique (perhaps sequential) identifier
   available on the system (for example, a process id number) on the
   left-hand side.  Though other algorithms will work, it is RECOMMENDED
   that the right-hand side contain some domain identifier (either of
   the host itself or otherwise) such that the generator of the message
   identifier can guarantee the uniqueness of the left-hand side within
   the scope of that domain.

Or do you mean some other RFC, which one?
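
As an illustration of the scheme the quoted paragraph suggests (date and
time plus another locally unique value on the left, a domain identifier
on the right), here is a minimal Python sketch.  The function name, the
counter and the example domain are hypothetical and are not taken from
any poster's setup.

```python
import itertools
import os
import time

# Hypothetical per-process counter, so that two IDs generated within the
# same second by the same process still differ.
_sequence = itertools.count()

def make_message_id(domain):
    """Build a msg-id of the form <datetime.pid.sequence@domain>."""
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime())
    left = "%s.%d.%d" % (stamp, os.getpid(), next(_sequence))
    return "<%s@%s>" % (left, domain)

# "host.example.org" is a placeholder for the generating host's name.
print(make_message_id("host.example.org"))
```

Uniqueness here only holds within the scope of the chosen domain, which
is the property the RFC text asks the generator to guarantee.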

> So the dns tests are just the first in the queue. The dimain must also
> match early in the Reveived list.

Huh?  Even corrected for the obvious typos, this doesn't make sense.
We're talking about the Message-ID here.

> If you fail with it, then you have problems with every rfc-compliant
> smtp server world-wide. This filter is especially useful against
> scripts, spamming programs, and web-based mailers.

You're free to lose any incoming mail you like, including mine :-)
Though apparently you do get my messages, so I am confused about what
your filter actually does.

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
Do obvious transformation on domain to reply privately _only_ on Usenet.


Re: Direct download link detection

2017-07-26 Thread Ian Zimmerman
On 2017-07-26 02:48, Rupert Gallagher wrote:

> When a mail arrives without mid, either the sender did not use a real
> SMTP server or tried to hide it. We have a custom SA rule for it. We
> also reject upfront any mid with a syntax error, or whose domain does
> not have a rdns (eg. @localhost.localdomain or @test.com).

I suspect you'll miss this message, then.

My Message-IDs intentionally identify the originating host, which makes
me more confident that they're unique.  The originating host is behind
two layers of NAT and DHCP, and naturally doesn't have rDNS.

I don't know how to ensure uniqueness if I use the relaying SMTP
server's domain, or the domain of the perimeter of the NATted network,
which can have rDNS (and does, via a dyn-like update service), but which
I do not own or control.

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
Do obvious transformation on domain to reply privately _only_ on Usenet.


Re: ransomware URI list

2017-07-15 Thread Ian Zimmerman
On 2017-07-15 12:19, David B Funk wrote:

> Another way to use that data is to extract the hostnames and feed them
> into a local URI-dnsbl.

> Using "rbldnsd" is an easy to maintain, lightweight (low CPU/RAM
> overhead) way to implement a local DNSbl for multiple purposes (EG an
> IP-addr based list for RBLDNSd or host-name based URI-dnsbl).

> The URI-dnsbl has an advantage of being easy to add names (just 'cat'
> them on to the end of the data-file with appropriate suffix) and
> doesn't require a restart of any daemon to take effect.

But one still needs to signal rbldnsd to reload the data, right?

If one has just hostname data or fixed IP address data (no ranges), yet
another option is the "constant database" cdb [1].  I use it a lot for
these purposes.  You can even match domain wildcards, by successively
stripping the most significant parts of the subject domain before trying
the match.
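
A minimal sketch of that wildcard lookup in Python, with a plain dict
standing in for the cdb file (an assumption made to keep the example
self-contained).  It reads "stripping" as dropping the leftmost label on
each round; the function and table names are hypothetical.

```python
def lookup_domain(table, name):
    """Return the value for name or its closest listed parent, else None."""
    labels = name.lower().rstrip(".").split(".")
    # Try the full name first, then drop the leftmost label each round.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in table:
            return table[candidate]
    return None

# Stand-in for a constant database keyed by domain name.
blocked = {"bad.example": "listed"}
print(lookup_domain(blocked, "www.cdn.bad.example"))
print(lookup_domain(blocked, "good.example"))
```

Each round is one exact-match lookup, so a name with n labels costs at
most n lookups.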

I am wondering if (or why not) a similar no-daemon option exists for
CIDR range data.  There are definitely perl modules that manipulate such
data, but none I'm aware of with a built-in compiled, quickly loaded
dataset format.
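
For comparison, a daemon-less CIDR lookup can be sketched in Python with
only the standard library: the ranges are collapsed into sorted integer
intervals once, then each query is a binary search.  This is an
illustration under assumptions (IPv4 only, hypothetical function names);
it is not an existing on-disk format.

```python
import bisect
import ipaddress

def compile_ranges(cidrs):
    """Collapse CIDR strings into parallel sorted lists of range bounds."""
    nets = ipaddress.collapse_addresses(ipaddress.ip_network(c) for c in cidrs)
    starts, ends = [], []
    for net in nets:
        starts.append(int(net.network_address))
        ends.append(int(net.broadcast_address))
    return starts, ends

def contains(compiled, addr):
    """True if addr falls inside one of the compiled ranges."""
    starts, ends = compiled
    value = int(ipaddress.ip_address(addr))
    # Find the last range starting at or before the address.
    i = bisect.bisect_right(starts, value) - 1
    return i >= 0 and value <= ends[i]

compiled = compile_ranges(["192.0.2.0/24", "198.51.100.0/25"])
print(contains(compiled, "192.0.2.200"))
print(contains(compiled, "198.51.100.200"))
```

The two integer lists could be written to a file and read back at
startup, but how to store them is left open here.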

[1]
https://cr.yp.to/cdb.html
-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
Do obvious transformation on domain to reply privately _only_ on Usenet.


Re: ransomware URI list

2017-07-15 Thread Ian Zimmerman
On 2017-07-15 11:59, Antony Stone wrote:

> Maybe other people have further optimisations.

With awk already part of the pipeline, all those seds are screaming for
a vacation.

Also, isn't the following command just a no-op?

sed -n p

A couple of quick tests failed to detect any difference from cat ;-)

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
Do obvious transformation on domain to reply privately _only_ on Usenet.


Re: envelope_sender_header

2017-06-26 Thread Ian Zimmerman
On 2017-06-26 16:17, RW wrote:

> > One runs exim and inserts Return-Path: , the other runs sendmail and
> > inserts Return-path: .
> 
> That's strange, the Sendmail in the FreeBSD base that handles my local
> mail uses Return-Path.

You're right, I got it backwards.  Sorry 8-0

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign:
http://primate.net/~itz/blog/the-problem-with-gpg-signatures.html


envelope_sender_header

2017-06-25 Thread Ian Zimmerman
I would like to unify my user_prefs file on two different servers.

One runs exim and inserts Return-Path: , the other runs sendmail and
inserts Return-path: .

So, is the setting case-sensitive?

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign:
http://primate.net/~itz/blog/the-problem-with-gpg-signatures.html


Re: DKIM_VALID EnvelopeFrom

2017-05-05 Thread Ian Zimmerman
On 2017-05-05 16:00, Merijn van den Kroonenberg wrote:

> So the only thing I want with the envelope from is to extract the
> domain and test if the mail was DKIM signed (and valid) by that
> domain.
> 
> This tells me the envelope from is not some random spoofed address,
> but actually controlled by someone who handled the e-mail before it
> arrived at our mta.

Yes, this is a valid thing to do.

I do this check completely in the MTA (Exim).  Even if for some reason
you really need to do it in SA, the easiest way to get the envelope
sender in SA is have the MTA insert a header, such as X-Envelope-From.
Exim can do that and I'm guessing other major MTAs such as Postfix can
too.
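
To make the comparison concrete, here is a minimal Python sketch that
reads the envelope sender from an X-Envelope-From header and looks for a
DKIM-Signature whose d= tag names the same domain.  It assumes the MTA
has already inserted that header, and it does not verify any signature;
a real check has to be combined with actual DKIM verification.  The
function name and the sample message are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr

def envelope_domain_is_signed(raw_message):
    """True if some DKIM-Signature has d= equal to the envelope domain."""
    msg = message_from_string(raw_message)
    _, addr = parseaddr(msg.get("X-Envelope-From", ""))
    if "@" not in addr:
        return False
    env_domain = addr.rpartition("@")[2].lower()
    for sig in msg.get_all("DKIM-Signature", []):
        # Signature tags are "name=value" pairs separated by semicolons.
        for tag in sig.split(";"):
            name, _, value = tag.partition("=")
            if name.strip() == "d" and value.strip().lower() == env_domain:
                return True
    return False

# Made-up message; the bh= and b= values are placeholders, not real data.
sample = (
    "X-Envelope-From: <bounce@example.org>\n"
    "DKIM-Signature: v=1; a=rsa-sha256; d=example.org; s=sel;\n"
    " h=from:subject; bh=placeholder; b=placeholder\n"
    "From: someone@example.org\n"
    "\n"
    "body\n"
)
print(envelope_domain_is_signed(sample))
```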

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign:
http://primate.net/~itz/blog/the-problem-with-gpg-signatures.html


Re: sa-compile will not configure

2017-04-20 Thread Ian Zimmerman
On 2017-04-20 17:31, Robert Steinmetz AIA wrote:

> >>> thelma@thelma:~$ echo $PATH

BTW, do you have any connection to the Thelma who's asking a constant
stream of close-to-newbie questions in the Gentoo user mailing list?

It's not that common a name, so forgive me for the short-circuit in my
brain :-)

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign:
http://primate.net/~itz/blog/the-problem-with-gpg-signatures.html


Re: sa-compile will not configure

2017-04-18 Thread Ian Zimmerman
On 2017-04-18 10:17, Robert Steinmetz wrote:

> tty is in /usr/bin

But it is stty, not tty, which fails to be found.  And stty is
(normally) in /bin.  So it looks a lot like /bin (and probably /sbin) is
missing from the PATH.

This could be related to the long-advertised switch to a unified /usr
tree.  Perhaps Ubuntu went ahead with that switch but some packages
haven't been updated to reflect it?

One other thing which springs to mind is the distinction between login,
interactive, and other shells.  Double-check in which shell startup file
you set the PATH.
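
A quick way to test that theory is to ask where each command resolves on
the PATH the failing process actually sees.  The sketch below uses
Python's standard shutil.which; the helper name is hypothetical, and it
should be run from the same kind of shell that runs sa-compile.

```python
import os
import shutil

def path_report(commands, path=None):
    """Map each command name to the full path found on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    return {cmd: shutil.which(cmd, path=path) for cmd in commands}

# Show where tty and stty resolve, and whether /bin is on PATH at all.
print(path_report(["tty", "stty"]))
print("/bin" in os.environ.get("PATH", "").split(os.pathsep))
```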

BTW, plain text (not HTML) would be appreciated.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign:
http://primate.net/~itz/blog/the-problem-with-gpg-signatures.html


Re: Fastest listing RBL ?

2017-02-15 Thread Ian Zimmerman
On 2017-02-15 16:30, Tom Hendrikx wrote:

> Note that the period that you describe as 'seen by SA a bit later' is
> typically less than a second.

Not in my case.  I have a custom Exim configuration where I
intentionally wait for a period of time (currently 4 minutes) between
SMTP acceptance and delivery (SA runs at delivery time), precisely
because I want to give all the collaborative mechanisms the maximum
chance to kick in.

When I wrote my OP, 4 minutes was shorter than my BIND max-ncache-ttl
parameter.  I have since set that to 180 (3 minutes), so that angle
shouldn't matter any more.  Still the balance between bouncing the most
junk outright and the risk of false positives means it's something to
think about.
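
The timing argument above reduces to one comparison, sketched here in
Python.  It is a simplification that only models the local resolver's
negative-cache cap and ignores any upstream caches; the function name is
hypothetical.

```python
def stale_negative_possible(delivery_delay_s, max_ncache_ttl_s):
    """True if a 'not listed' answer cached at SMTP time can still be
    served from the negative cache when SA runs at delivery time."""
    return max_ncache_ttl_s > delivery_delay_s

# 4 minute delivery delay against a 3 minute negative-cache cap.
print(stale_negative_possible(240, 180))
```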

> Which RBLs to use, depends on the typical spam you receive, and the
> policies that you wish to apply. IMHO, the trust you put in RBLs (and
> their listing policies) should be more important in making decisions
> than their typical response time to new (types of) spam and their
> TTLs.

Agreed.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Fastest listing RBL ?

2017-02-14 Thread Ian Zimmerman
Given a piece of horrible spam, on which RBL is the sending IP address
likely to appear first?

I want to rationally decide which RBL/s to consult at SMTP time.  Afraid
to use all of them, not just due to false positives, but also due to
negative caching in DNS, which could affect the result when the spam is
seen by SA a bit later.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: pyzor options

2017-02-11 Thread Ian Zimmerman
On 2017-02-11 18:11, David Jones wrote:

> >pyzor_options --homedir=/usr/local/pyzor
> 
> >What am I doing wrong?
> 
> You were close.  No equals sign:
> 
> pyzor_options --homedir /usr/local/pyzor

But the pyzor help text (shown when run without args) tells me there is
an equal sign.  Besides, pyzor is a python program and the usual arg
parsing modules for python understand both spellings.
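
The claim about Python argument parsing can be checked with the standard
argparse module, as in the sketch below.  This shows argparse only and
is not pyzor's actual parser.  The log message quoted earlier in the
thread comes from SpamAssassin's own config parsing, which this does not
model.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--homedir")

# Both spellings of a long option with a value are accepted.
with_equals = parser.parse_args(["--homedir=/usr/local/pyzor"])
with_space = parser.parse_args(["--homedir", "/usr/local/pyzor"])
print(with_equals.homedir == with_space.homedir)
```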

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


pyzor options

2017-02-11 Thread Ian Zimmerman
This may have been part of the reason why I stopped using pyzor.  Taking
a second look now, but the configuration still seems somewhat less than
obvious.

I want to set the pyzor "homedir", that is the directory where the
servers file lives.  I tried (in local.cf):

pyzor_options --homedir=/usr/local/pyzor

pyzor_options "--homedir=/usr/local/pyzor"

Both result in spamassassin logging:

info: config: SpamAssassin failed to parse line,
"--homedir=/usr/local/pyzor" is not valid for "pyzor_options", skipping:
pyzor_options --homedir=/usr/local/pyzor

What am I doing wrong?

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: RFC compliance pedantry (was Re: New type of monstrosity)

2017-02-07 Thread Ian Zimmerman
On 2017-02-07 18:33, Ruga wrote:

> I follow the actual RFC standard, not the proposed revisions. The To
> From and Cc fields are defined by a grammar AND a natural language
> description. Such fields MUST hold addresses, where an address is a
> username the "@" symbol and a domain name. The string "undisclosed
> recipients: ;" does not parse the grammar, and it does not pass the
> natural language requirement for an address. If the sender hides the
> recipients, why should I care delivering its junk to my valued
> accounts?

FWIW, I regularly get completely legitimate non-commercial messages with
headers of this form.  People use it to conceal from each recipient the
addresses of other recipients - just like a list or an alias, but (I'm
guessing) done entirely in the sender's MUA.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: New type of monstrosity

2017-02-07 Thread Ian Zimmerman
On 2017-02-07 09:37, Matus UHLAR - fantomas wrote:

> 11.5 - 3.5 = 8.0

And of course 1.2.3.x is not the true relay address, so

> 1.5 BOTNET Relay might be a spambot or virusbot
> [botnet0.8,ip=1.2.3.12,rdns=disorder.censored.net,maildomain=outlook.fr,baddns]

this goes out of the window as well, and you're down to 6.5

> the op may be early recipient, which is why you've got PYZOR hit,
> while the OP had not.  If the OP doesn't use pyzor, I recommend to
> use it - using razor, pyzor and DCC is very good idea although they
> need external software.

I used to have pyzor, but I dropped it for some reason I don't
remember.  It may be time to have another look at it.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: New type of monstrosity

2017-02-06 Thread Ian Zimmerman
On 2017-02-06 20:06, Kevin A. McGrail wrote:

> > Last couple of weeks I saw some messages whose entire contents is in
> > the Subject.

> never seen such a monster.  likely killed by some other piece in the
> puzzle.  Throw it up on pastebin?

http://pastebin.com/PYaMcZa7

(I was wrong, the subject is actually one enormous line, it was my MUA
that folded it.)

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


New type of monstrosity

2017-02-06 Thread Ian Zimmerman
Last couple of weeks I saw some messages whose entire contents is in the
Subject.  They have both a text/plain and text/html part but both are
empty (in the case of html, there is some markup but no character
data).  The Subject is maybe 400 or 500 chars long.

Needless to say, this is a 100% spam trait, but some escaped.

Is there already a rule somewhere to deal with this?  (not among the
ones bundled with SA, I don't think)

If I'm writing my own, is the naive way to match the Subject going to
work?  I'm asking mostly because the header is properly split and
continued around 60 character boundaries.  That is, does SA join
continued lines before matching?
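
For reference, "joining continued lines" (unfolding, in RFC 5322 terms)
means removing each line break that is immediately followed by a space
or tab.  The Python sketch below shows that operation on a made-up
Subject value and then a naive length test on the result.  Whether SA
applies the same step before matching is the open question here; the
sketch does not answer it.

```python
import re

def unfold(header_value):
    """Remove each line break that is directly followed by space or tab."""
    return re.sub(r"\r?\n(?=[ \t])", "", header_value)

# Made-up folded Subject value, split across three physical lines.
folded = "Buy now, limited offer,\r\n act today and save,\r\n no questions asked"
subject = unfold(folded)
print(subject)
print(len(subject) > 60)
```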

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: Ignore third-party SA headers

2017-01-25 Thread Ian Zimmerman
On 2017-01-26 01:03, RW wrote:

> Probably what's happening is that these are emails over 500 kB which
> by default are just passed through by spamc without sending them to
> spamd.  If they don't get sent to spamd the existing SA headers don't
> get stripped.
> 
> You can set the -s parameter on spamc to something larger than the
> largest spam you want to filter.

I have never been clear about this, in two ways.

The relevant bit of man spamc says:

 -s max_size, --max-size=max_size

 Set the maximum message size which will be sent to spamd -- any bigger
 than this threshold and the message will be returned unprocessed
 (default: 500 KB).  If spamc gets handed a message bigger than this, it
 won't be passed to spamd.  The maximum message size is 256 MB.

 The size is specified in bytes, as a positive integer greater than 0.
 For example, -s 50.

My first confusion is that even if there's a knob I can turn up on
spamc, there's a "maximum message size".  What does that mean?  Does
spamd have its own limit?  Is it really that high?  And what happens if
I break it?

Second, is the default 500 * 1000 bytes or 512 * 1024 bytes?  The
example seems to suggest the latter.
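
For scale, here are the candidate readings of "500 KB" worked out in
bytes, as a small Python sketch.  It only does the arithmetic and makes
no claim about which value spamc actually uses.

```python
KB_DECIMAL = 1000
KB_BINARY = 1024

candidates = {
    "500 * 1000": 500 * KB_DECIMAL,
    "500 * 1024": 500 * KB_BINARY,
    "512 * 1024": 512 * KB_BINARY,
}
for label, size in candidates.items():
    print(label, "=", size, "bytes")

# The documented upper limit of 256 MB, read as binary megabytes.
print("256 * 1024 * 1024 =", 256 * KB_BINARY * KB_BINARY, "bytes")
```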

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: Detecting Valid Message Replies

2017-01-03 Thread Ian Zimmerman
On 2017-01-03 13:47, Antony Stone wrote:

> Given the increasing usage of Google-based business email services
> (and others, similar), wouldn't that tend to prevent you being able to
> manipulate the Message-ID header, because you are no longer in charge
> of the outbound server used by senders on your domain?

Most MUAs insert a Message-ID header by themselves, and the MTA doesn't
touch it.  That is definitely how it works here, with mutt and exim.  In
fact my Message-IDs are generated by a script I wrote to override the
mutt built-in ones.

Even many gmail patrons use an IMAP capable MUA and use gmail as just a
SMTP submission server.  It doesn't work perfectly due to quirks in the
gmail IMAP implementation, but it works.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: Another DKIM related question (or problem?)

2016-12-31 Thread Ian Zimmerman
On 2016-12-31 20:20, RW wrote:

> Yes, whitelist_auth requires DKIM_VALID_AU. The use of the subdomain
> is something that's allowed under DMARC.

> whitelist_from_dkim my...@aol.com mx.aol.com

Thanks!  That explains things to a large degree.

Now, what about the case when envelope and header sending domains
differ?  For example, I get notifications from craigslist searches, and
they have

From: ale...@craigslist.org

but the envelope sender is something along the lines of

nonsense_hash-itz=primate@alerts.craigslist.org

and the DKIM signature domain is just craigslist.org.

I know that I can have 2 whitelist entries, one for each form of the
address, and that works (ie. I get a -100 score), but it's a bit ugly ;-)

FWIW, the MTA inserts a Return-path header with the envelope sender, and
I do tell spamassassin about it.
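
The relationship between the signing domain and the two addresses can be
spelled out with a small Python sketch.  It only tests whether an
address's domain equals the signing domain or sits below it; it is not
how whitelist_auth decides, and the function names and local parts are
hypothetical.

```python
def domain_of(address):
    """Return the lowercased part of the address after the last '@'."""
    return address.rpartition("@")[2].lower()

def covers(signing_domain, address):
    """True if the address domain equals signing_domain or is below it."""
    dom = domain_of(address)
    sd = signing_domain.lower()
    return dom == sd or dom.endswith("." + sd)

# Signature on the parent domain covers both forms of the address.
print(covers("craigslist.org", "someone@craigslist.org"))
print(covers("craigslist.org", "bounce@alerts.craigslist.org"))
# The reverse case: signing with a subdomain does not cover the parent.
print(covers("mx.aol.com", "someone@aol.com"))
```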

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Another DKIM related question (or problem?)

2016-12-31 Thread Ian Zimmerman
I have a frequent correspondent on AOL.  I have whitelisted her with

whitelist_auth my...@aol.com

and that is in fact the address on her mails (both envelope and From:).
But the whitelist rule doesn't fire, even though DKIM_VALID _does_
fire.  How so?

I noticed that the domain with which AOL DKIM-signs is not aol.com, but
mx.aol.com.  Could that be the reason?  If yes, is there a way to make
the whitelist work in this case?

(I have other whitelist_auth lines, and they work as expected; in all
those cases the domain of the address is exactly the same as the domain
of the DKIM signature.)

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: T_DKIM_INVALID from yahoo.com

2016-12-25 Thread Ian Zimmerman
On 2016-12-24 19:50, Michael Orlitzky wrote:

> > All mail I get from yahoo customers [1] scores on T_DKIM_INVALID,
> > and always has.  Why?
> 
> Is there any correlation between the DKIM result and the size of the
> message?

Hmm.  I got a few more messages from those domains and they seem to be
passing now.  I suspect this is related to changes in my setup that I
made in response to the quagmire I mention below the fold; i.e. earlier,
I was getting some messages (not necessarily 8 bit or distinct in any
way other than phase of moon at delivery time) which were modified
invisibly in transit by a gateway MTA.

So, I'm putting this on hold for now.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: T_DKIM_INVALID from yahoo.com

2016-12-24 Thread Ian Zimmerman
On 2016-12-24 16:32, Groach wrote:

> I have just done a test and do not get the same results as you.  My 
> yahoo incoming emails pass ok:

And yours passed for me, too.  So it's only a subset of yahoo senders,
apparently :-(

> This might explain it: 
> http://spamassassin.1065346.n5.nabble.com/
> I-m-getting-T-DKIM-INVALID-from-gmail-td109464.html

Clearly not, since some pass (and _all_ legit mail passes from gmail,
earthlink, aol, and so on).

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


T_DKIM_INVALID from yahoo.com

2016-12-24 Thread Ian Zimmerman
All mail I get from yahoo customers [1] scores on T_DKIM_INVALID, and
always has.  Why?

Maybe I can prepare a spample, but it will take some work to find a
privacy friendly specimen, since it obviously can't be altered.

[1] same for hotmail, while other big domains get DKIM_VALID.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: recent increase in spam getting through

2016-12-15 Thread Ian Zimmerman
On 2016-12-15 11:32, Kevin A. McGrail wrote:

> I'm a fan of MIMEDefang but I am not very familiar with Arch Linux so
> I don't know what mta you are using nor its capabilities.

By now I have heard of MIMEDefang many times, and each time I wanted to
try it.  But it seems to require the milter interface in the MTA
(ie. sendmail or _maybe_ postfix), and I'm married to Exim. :-(

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: Spam with attachments and UNPARSEABLE_RELAY

2016-11-25 Thread Ian Zimmerman
On 2016-11-25 13:57, Bill Cole wrote:

> It LOOKS like that is being generated by a PHP script on the host that's 
> delivering it, which appears to be running some atrocious mail handler 
> calling itself 'nullmailer' that doesn't do Received headers in any 
> useful way.

FWIW nullmailer is a respected minimalist MTA:

 [1+0]~$ apt-cache show nullmailer
Package: nullmailer
Version: 1:1.13-1+deb8u1
Installed-Size: 2360
Maintainer: Nick Leverton 
Architecture: amd64
Replaces: mail-transport-agent
Provides: mail-transport-agent
Depends: lsb-base, debconf (>= 0.5) | debconf-2.0, libc6 (>= 2.15),
 libgnutls-deb0-28 (>= 3.3.0), libstdc++6 (>= 4.1.1)
Recommends: rsyslog | system-log-daemon
Conflicts: mail-transport-agent
Description-en: simple relay-only mail transport agent
 Nullmailer is a replacement MTA for hosts, which relay to a fixed set of
 smart relays. It is designed to be simple to configure and especially
 useful on slave machines and in chroots.
Description-md5: cf5bb13c21a01ffa34dc0048e9689c33
Homepage: http://untroubled.org/nullmailer/
Tag: interface::daemon, mail::transport-agent, network::server,
 protocol::smtp, role::program, works-with::mail
Section: mail
Priority: extra
Filename: pool/main/n/nullmailer/nullmailer_1.13-1+deb8u1_amd64.deb
Size: 92642


-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: Why is RP_MATCHES_RCVD so "heavy"?

2016-11-22 Thread Ian Zimmerman
On 2016-11-22 14:54, Eric Abrahamsen wrote:

> Can anyone tell me why it's scored so heavily? Would it be a bad idea
> to just drop it down to -1.5 or something?

I score it as 0, and I think a number of others on this list (with much
more expertise than me) do the same.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: Best place to filter spam (x-original-to, no_address_mappings)

2016-11-22 Thread Ian Zimmerman
On 2016-11-21 14:27, @lbutlr wrote:

> It’s unclear why you are doing this, but if you want to run SA after
> delivery then the time to do that is in your LDA. *HOW* to do that,
> depends on your LDA. If you are using dovecot, then you can call SA
> from sieve. If not, you can setup procmail as an LDA (or others), and
> call SA from there.

I don't currently use it, but Exim has a "transport filter" feature
where a process is inserted in a pipeline before delivery, running with
the creds of the target user.  _Maybe_ postfix has similar.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: Best place to filter spam (x-original-to, no_address_mappings)

2016-11-19 Thread Ian Zimmerman
On 2016-11-18 21:18, MRob wrote:

> I am looking at a system where SpamAssassin is called out from the 
> delivery agent. I know there will be a difference here in terms of the 
> envelope information but I'm not familiar enough to know the pitfalls of 
> this versus calling SA from the postfix content_filter.
> 
> Specifically, I believe it's recommended to call SA in context of 
> receive_override_options=no_address_mappings but this wouldn't be the 
> case when we are in the delivery agent I think. What are the effects of 
> this?

I do a similar thing, but with Exim.

I preserve the envelope information by configuring Exim to insert
Return-Path and X-Envelope-To headers.

Any way you can do this with Postfix is probably quite specific to that
MTA.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: Custom rule based on AWL score

2016-10-20 Thread Ian Zimmerman
On 2016-10-20 08:34, simplerezo wrote:

> My understanding is that AWL is helping frequent senders who are known
> to not send spam to "reduce" their spam score, preventing false
> positive. That's exactly what I want to rely on for my rules: adding
> score for mail with "invoice" pretention and an attachment but only
> for very unknown users (or spammers).

Just add your custom rules globally, with reasonable scores.

Whitelisted senders get a _huge_ bonus (I think it's 100 points by
default, maybe customizable), so they won't be affected if you do it
right.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: Tuning recommendations?

2016-09-12 Thread Ian Zimmerman
On 2016-09-12 11:06, John Hardin wrote:

> Consider greylisting.

This will depend on the OP business needs, but a poor man's version of
graylisting is to just delay deliveries unconditionally for a couple of
minutes.  (I use 2 minutes).  If you do this in the MTA make sure the
delay is before SA in the processing pipeline.  This will allow time for
RBLs and collaborative clearinghouses like Razor to see the spam before
SA on your system checks it.

I do this because I don't control the MX host, and thus I have no
information about the connecting IP address, except by inspecting the
headers, which feels "wrong" to do in the MTA.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Why does the arrow on Hillary signs point to the right?


Re: What are the T_ rules ?

2016-09-05 Thread Ian Zimmerman
On 2016-09-05 16:14, @lbutlr wrote:

> > but -1.653 is just a bad joke because it means every homeuser which
> > manages to get some DNS records fine (as well as every spammer which
> > registers a ton of domains and cheap hosts) get a large benefit
> > compared to any professionally maintained server hosting hundreds of
> > domains with responsibility
> 
> RP_MATCHES_RCVD scores a -0.1 and T_RP_MATCHES_RCVD scores a -0.0 on
> my system. I see those scores in emails from 2011.  Don’t know where
> you are finding -1.653, but that is not the score that is getting
> applied here.

FWIW, I see the same score as Mr. rhsoft.  And yes, I agree that it is
way too strong; my meta rule was meant to neutralize it.  But maybe I'll
just take the easy way out and disable it.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Why does the arrow on Hillary signs point to the right?


Re: What are the T_ rules ?

2016-09-05 Thread Ian Zimmerman
On 2016-09-05 21:31, Axb wrote:

> In what file do you see T_RP_MATCHES_RCVD ?

  [1+0]~$ cd /usr/share/spamassassin/
  [2+0]spamassassin$ fgrep T_RP_MATCHES_RCVD *
  72_active.cf:##{ T_RP_MATCHES_RCVD if version >= 3.003000 ifplugin Mail::SpamAssassin::Plugin::WLBLEval
  72_active.cf:header   T_RP_MATCHES_RCVD  eval:check_mailfrom_matches_rcvd()
  72_active.cf:describe T_RP_MATCHES_RCVD  Envelope sender domain matches handover relay domain
  72_active.cf:tflags   T_RP_MATCHES_RCVD  nice
  72_active.cf:##} T_RP_MATCHES_RCVD if version >= 3.003000 ifplugin Mail::SpamAssassin::Plugin::WLBLEval

-- 
Please *no* private Cc: on mailing lists and newsgroups
Why does the arrow on Hillary signs point to the right?


Re: What are the T_ rules ?

2016-09-05 Thread Ian Zimmerman
On 2016-09-05 12:21, John Hardin wrote:

> header  __RP_MATCHES_RCVD  eval:check_mailfrom_matches_rcvd()
> 
> ...which means you'd need to go digging around in the perl code to find 
> out what it's doing.
> 
> Basically, it's a check that the return-path (the SMTP "MAIL FROM" 
> envelope value, if available) matches a received header in the message.

Based on the description string, I think (in fact I hope) that this is
not quite right; it's not "matches _a_ Received header" but "matches
_the_ Received header emitted by my MX host".

It would be a bit too general for my meta rule to rely on it, were it
otherwise.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Why does the arrow on Hillary signs point to the right?


Re: What are the T_ rules ?

2016-09-05 Thread Ian Zimmerman
On 2016-09-05 20:38, li...@rhsoft.net wrote:

> > Since I have seen other rules in results with the T_ prefix (for example
> > T_DKIM_INVALID) I think it must be some kind of convention with an
> > accepted meaning.  What is this conventional meaning, and how do these
> > rules relate to the ones without the T_ prefix?
> 
> T_ is testing - stuff which performs questionably for different reasons
> like T_DKIM_INVALID failing randomly and nobody knows why or rules where 
> nobody is sure about their impact and if it's ok

Ok, thanks!  But that still leaves my original problem:

>> I want to use RP_MATCHES_RCVD in a meta rule.  I thought I'd check
>> its definition before I plunged in and wrote any code, so I grepped
>> in /usr/share/spamassassin where all the original rules seem to live
>> on my system (debian jessie).  But all the hits are either for
>> __RP_MATCHES_RCVD (which I assume is an internal rule not to be used
>> by outsiders) or for T_RP_MATCHES_RCVD.

So how is RP_MATCHES_RCVD defined?

-- 
Please *no* private Cc: on mailing lists and newsgroups
Why does the arrow on Hillary signs point to the right?


What are the T_ rules ?

2016-09-05 Thread Ian Zimmerman
I want to use RP_MATCHES_RCVD in a meta rule.  I thought I'd check its
definition before I plunged in and wrote any code, so I grepped in
/usr/share/spamassassin where all the original rules seem to live on my
system (debian jessie).  But all the hits are either for
__RP_MATCHES_RCVD (which I assume is an internal rule not to be used by
outsiders) or for T_RP_MATCHES_RCVD.

Since I have seen other rules in results with the T_ prefix (for example
T_DKIM_INVALID) I think it must be some kind of convention with an
accepted meaning.  What is this conventional meaning, and how do these
rules relate to the ones without the T_ prefix?

-- 
Please *no* private Cc: on mailing lists and newsgroups
Why does the arrow on Hillary signs point to the right?


Re: Childish actions of Harald Reindl

2016-08-05 Thread Ian Zimmerman
On 2016-08-05 09:46 +0100, Martin wrote:

> The biggest reason is the way this mailing list is set up, when you
> click reply it replies to the poster not the list, this has always
> been a bugbear of mine and something that probably should be
> addressed.

Then don't "click reply" but use a proper mail user agent (like mutt,
but there are many others) that has a separate List Reply/Followup
function. 

What "should be addressed" is the misconfigured mailing lists that mess
with sender-supplied headers.

-- 
Please *no* private Cc: on mailing lists and newsgroups
Why does the arrow on Hillary signs point to the right?


Re: Issue on disable ipv6

2016-07-01 Thread Ian Zimmerman
On 2016-07-01 20:25 +0200, Massimo Sandolo wrote:

> Hi,
> I have an issue when I try to disable ipv6.
> I'm running Debian 8.3 with SpamAssassin version 3.4.0 (running on Perl
> version 5.20.2).
> In /etc/default/spamassassin the options line is the following:
> OPTIONS="-4 --create-prefs --max-children 5 --helper-home-dir -x -u usermail"
> 
> I tried also with --ipv4-only, but it doesn't work, I'm still receiving the
> following error "spamc[22477]: connect to spamd on ::1 failed, retrying (#1
> of 3): Connection refused".

What is the line or lines containing "localhost" in /etc/hosts?  You'll
need to comment out the one with the IPv6 address (::1), and leave the
one with IPv4 address (127.0.0.1) uncommented.

This is all assuming you run spamd and spamc on the same host.  If not,
please tell us about the network setup between the two hosts.
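A quick way to answer the /etc/hosts question is sketched below. `localhost_entries` is a hypothetical helper name, and the optional file argument exists only so the function can be tried on a copy. The `spamc -d` line in the comment is an alternative not discussed in this thread: it gives spamc an explicit IPv4 destination instead of relying on name resolution.

```shell
# Print the hosts-file entries that mention localhost, with line numbers.
# With no argument, /etc/hosts is read.
localhost_entries() {
    grep -nw 'localhost' "${1:-/etc/hosts}"
}

# Alternative, assuming spamd listens on 127.0.0.1 on the same host:
#   spamc -d 127.0.0.1 < message.eml
```

If the output shows a `::1` entry carrying the name `localhost`, that is the line to comment out.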

-- 
Please *no* private copies of mailing list or newsgroup messages.
Why does the arrow on Hillary signs point to the right?


Re: sa-update through proxy

2016-05-04 Thread Ian Zimmerman
On 2016-05-04 08:13 -0700, John Hardin wrote:

> > alias sa-update='env http_proxy=http://myserver:myport/
> > https_proxy=http://myserver:myport/  sa-update'
> 
> Lose the "env"?

Why?  Apart from using an extra process, this should work exactly the same.
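The equivalence can be checked directly. In this sketch the proxy URL is a placeholder, and `sh -c` stands in for sa-update so the example runs anywhere.

```shell
# proxy.example:3128 is a placeholder, not a value from the thread.
proxy=http://proxy.example:3128/

# Form 1: let env(1) set the variable, then run the command.
with_env=$(env http_proxy="$proxy" sh -c 'printf "%s" "$http_proxy"')

# Form 2: a plain assignment prefix handled by the shell itself.
without_env=$(http_proxy="$proxy" sh -c 'printf "%s" "$http_proxy"')

if [ "$with_env" = "$without_env" ]; then
    echo "the child process sees the same http_proxy either way"
fi
```

The only difference is the extra env process in the first form.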

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Reporting [Was: Disabling spamcop plugin]

2016-04-21 Thread Ian Zimmerman
On 2016-04-07 13:55 -0700, Ian Zimmerman wrote:

> sa-learn doesn't do any reporting, right?

[snip snip]

> By the way, manpage for spamc says:
> 
>    -C report type, --reporttype=type
>    Report or revoke a message to one of the configured
>    collaborative filtering databases.
>    The "report type" can be either report or revoke.
> 
> "To one of the databases"?  Which one?  Isn't this a bug in the manpage?

Unfortunately the thread went sideways into opinion territory after
this, but I'd still like to clarify these factual points.  Anyone?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: [OT] still configuring [Was: Disabling spamcop plugin]

2016-04-13 Thread Ian Zimmerman
On 2016-04-13 09:12 -0400, Michael Orlitzky wrote:

> package will be recompiled automatically as part of the updates. Any
> packages *depending on* that package (like, if they're statically linked
> to it) will also be recompiled.

But also _direct_ dependencies of the affected package, if the latest
version has new requirements.  And this is the heart of the problem.
With a dedicated security channel like debian has, the fixes are
recompiled targeted to the base release, so (for example) I'd never have
to update perl because of a fix in spamassassin.

In fact you can leave debian servers to update themselves unattended,
most of the time.  This is too huge a benefit for me to drop, even
weighed against the recent debian annoyances.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


[OT] still configuring [Was: Disabling spamcop plugin]

2016-04-12 Thread Ian Zimmerman
On 2016-04-12 10:57 -0400, David Niklas wrote:

> You could use Gentoo, you get to configure it all yourself!

Funny you'd say that, I _am_ actually switching to it - on my
"workstation" role computers.  I'm already over 50% over the hump, I
think. 

But on "server type" computers, I just cannot spare a dedicated security
branch.  I really don't have the time, and more importantly the nerves,
to scramble and recompile the world when each new vulnerability is
announced.

> You might also try Arch or Devuan.  What distro are you using now?

Debian.  Have been using it over 15 years now, and watched some of the
fun vanish over the last few.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Disabling spamcop plugin

2016-04-07 Thread Ian Zimmerman
On 2016-04-07 14:37 +0100, RW wrote:

> What exactly are you trying to do here?
> 
> The pyzor plugin does testing and reporting, use_pyzor is mostly there
> to control the test. The spamcop plugin does reporting only.

So, if I don't do any explicit reporting (neither spamc -C nor
spamassassin -r), the spamcop plugin is not actually used at all?

sa-learn doesn't do any reporting, right?

My high-level goal here is to get rid of as many configuration changes
as I can in the system-managed area (/etc in my case) and achieve the
same effects by other means.  This is because I'm learning that I cannot
trust my distro not to screw me over anymore.

I noticed that I had disabled the spamcop plugin before by commenting it
out in /etc/*/init.pre, and I wanted to continue not using it even after
I reverted that file to its pristine distro state.
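To see where a plugin is loaded, both before and after reverting the file, something like the following may help. `plugin_lines` is a hypothetical helper name; the file glob in the usage comment follows the paths mentioned in this thread.

```shell
# Hypothetical helper: show every loadplugin line, commented out or not,
# that mentions the given plugin in the listed .pre files.
# Usage: plugin_lines SpamCop /etc/spamassassin/*.pre
plugin_lines() {
    plugin=$1
    shift
    # /dev/null forces grep to print file names even for a single file.
    grep -nE "^[[:space:]]*#?[[:space:]]*loadplugin[[:space:]]+.*${plugin}" "$@" /dev/null
}
```

A line that starts with `#` in the output is a disabled load; one without it is active.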

By the way, manpage for spamc says:

   -C report type, --reporttype=type
   Report or revoke a message to one of the configured
   collaborative filtering databases.
   The "report type" can be either report or revoke.

"To one of the databases"?  Which one?  Isn't this a bug in the manpage?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Disabling spamcop plugin

2016-04-06 Thread Ian Zimmerman
Is there any way to disable the spamcop plugin for an individual user
(i.e. from ~/.spamassassin/user_prefs) if the plugin is loaded by
/etc/spamassassin/*.pre ?

By comparison, I seem to be able to disable pyzor even if it is loaded,
by writing

  use_pyzor 0

in my user_prefs.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Bayes expiry vs. sync, again

2016-03-15 Thread Ian Zimmerman
I am sorry to return to this horse which has perhaps been beaten
enough.  But I still don't know and don't understand (_after_ reading
the docs) if I can, at the same time:

1. completely disable expiry

2. force a sync of the journal

I just saw with my own eyes that passing --sync to sa-learn does _not_
necessarily force one.  (The manpage is ambiguous about it.)  But I
don't want to pass --force-expire because of 1.

I am asking in the context of using the default db backend for Bayes,
but if there is a way to do this with one of the other options, I'll
consider it.
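One way to observe whether a given run really synced the journal is to compare the "last journal sync atime" value before and after. This is only a sketch: `last_sync_atime` is a hypothetical helper, and it assumes the timestamp is the third whitespace-separated field, as in the `sa-learn --dump magic` output shown elsewhere in this archive.

```shell
# Extract the "last journal sync atime" value from `sa-learn --dump magic`
# output read on standard input.
last_sync_atime() {
    awk '/last journal sync atime/ { print $3 }'
}

# Intended use (needs a working sa-learn and Bayes database):
#   before=$(sa-learn --dump magic | last_sync_atime)
#   sa-learn --sync
#   after=$(sa-learn --dump magic | last_sync_atime)
#   [ "$before" != "$after" ] && echo "journal was synced"
```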

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Interesting rule combo results

2016-03-09 Thread Ian Zimmerman
On 2016-03-09 07:12 -0800, Marc Perkel wrote:

> >>HAM RULES:
> >>...
> >>   80056 HTML_MESSAGE
> >
> >What's happening here? This seems to imply that  HTML_MESSAGE only
> >appears in ham.
> >
> >
> 
> I think my results are a little strange in that I might not be
> training off all the data but just that which gets past all my other
> filters. I'm still working on this but thought I'd share what it came
> up with for better or worse.

If I take your explanation in the OP verbatim, what happens here is that
HTML_MESSAGE _without any other rule hits_ only appears in ham.  Which
seems entirely plausible, even if perhaps not very useful.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Is BAYES filtering working? Having doubts.

2015-12-29 Thread Ian Zimmerman
On 2015-12-29 20:41 -0500, Bill Cole wrote:

> Neither su nor sudo magically changes the permissions or ownership of
> files. If you pass filenames as arguments they must be readable by the
> user actually running sa-learn, which is the *unprivileged* user
> handling the system-wide BayesDB ("amavis" in the case originating
> this thread, but "spamd" and "defang" are other common ones...) In
> most reasonably well-secured systems using Maildir message stores, the
> Maildirs are all owned by individual users or by one user that handles
> delivery to "virtual users" understood by the MTA and IMAP or POP
> server but not by the OS. That is generally NOT the same user running
> spamd or content filters for a system-wide BayesDB. As a result,
> relearning has to be done as root, shuttling data from files owned by
> one user into a process running as another.

You are right.  The reason it works for me is that I don't use a
systemwide DB.

May I ask that you turn down the sarcasm a bit?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Is BAYES filtering working? Having doubts.

2015-12-29 Thread Ian Zimmerman
On 2015-12-29 19:44 -0500, Bill Cole wrote:

> On 29 Dec 2015, at 18:54, Ian Zimmerman wrote:
> 
> >In fact sa-learn accepts multiple named arguments on the command line,
> >so the alternative I use is to go through the spambox N files at a time
> >in a shell loop.  (I have N=100 but obviously this depends.)
> 
> Which successfully ignores the original issue of this thread completely:
> that the user sa-learn must run as cannot read the files being learnt. If
> you pass unreadable filenames as arguments, sa-learn just whines and
> fails. Shockingly, that is not the desired result.

Clearly you can do the su magic if needed.  The point is that the
overhead which you fear is reduced N times.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Is BAYES filtering working? Having doubts.

2015-12-29 Thread Ian Zimmerman
On 2015-12-29 17:50 -0500, Bill Cole wrote:

> Yes, with the advantage of using
> Mail::SpamAssassin::Util::secure_tmpfile() rather than whatever I happen
> to roll up in a bit of Q shell that I never get around to reviewing for
> edge cases...
> 
> The main reason to do something like that is to avoid the heavyweight
> sudo & load of a Perl script for each message.
> 
> >
> >>The alternative without formail would be to pipe each raw message into
> >>its own sa-learn.
> >
> >The alternative is to give it a directory.

In fact sa-learn accepts multiple named arguments on the command line,
so the alternative I use is to go through the spambox N files at a time
in a shell loop.  (I have N=100 but obviously this depends.)
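The batching described above can be sketched with find and xargs rather than a hand-written loop. `in_batches` is a hypothetical helper name, the Maildir path in the usage comment is a placeholder, and `-print0` / `-0` are assumed to be available (they are in GNU and BSD tools).

```shell
# Run a command on the files of a directory, at most N files per invocation.
# Usage: in_batches DIR N COMMAND [ARGS...]
in_batches() {
    dir=$1
    n=$2
    shift 2
    # NUL-separated names keep odd Maildir file names intact.
    find "$dir" -type f -print0 | xargs -0 -n "$n" "$@"
}

# Intended use (the directory name is a placeholder):
#   in_batches "$HOME/Maildir/.Spam/cur" 100 sa-learn --spam --no-sync
#   sa-learn --sync
```

With N=100 the Perl start-up cost is paid once per hundred messages instead of once per message.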

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: A Plan to Stop Violence on Social Media

2015-12-16 Thread Ian Zimmerman
On 2015-12-16 14:21 -0800, jdow wrote:

> One thing worth pointing out is if this CAN be done refusing to do it
> yourself is a shallow gesture.

No, it is not.  Refusing to take part in what you believe is wrong, even
if you know the wrong will be done eventually because the Zeitgeist
favors it, is a legitimate point of view.

Then again, I don't give a rodent's back what Facebook or Twitter does.
But I am afraid it won't stop there.

Of course this is totally OT, so I won't post anymore of this here, but
I could discuss it off-list.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Trying Bayes / Redis

2015-12-11 Thread Ian Zimmerman
On 2015-12-11 14:29 -0800, Marc Perkel wrote:

> Anyone using this rule timing plugin? Having trouble getting it to
> work. Just wondering if it's worth it?
> 
> Mail::SpamAssassin::Plugin::RuleTimingRedis

I use it and I have no trouble now.  But I remember I had to disable the
LUA scripting stuff when I set it up, it wouldn't work even though my
Redis version should be recent enough to support it.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Debian jessie - new setup, missing data directory

2015-11-09 Thread Ian Zimmerman
On 2015-11-09 16:42 +0100, Antony Stone wrote:

> What did Jessie install it as?
> 
> > > /var/mail/.spamassassin/user_prefs

This is very strange.  Are you really sure it is not operator error?

I run wheezy, so I can't flat out exclude it, but it flies in the face
of too much Debian tradition. /var/mail is just for the spool mailboxes.

> 1. I seriously doubt that on a Debian system exim is running as root.

Indeed:

 [6+0]~$ ps axl | fgrep 'exim4 -bd'
5   101  3230 1  20   0  46824  2860 ?  Ss   ?  0:06 /usr/sbin/exim4 -bd -q30m
0  1000  8368  8311  20   0   7800  1760 -  S+   pts/1  0:00 fgrep exim4 -bd
 [7+0]~$ awk 'BEGIN { FS=":" } ( $3 == "101" ) { print $0 }' < /etc/passwd
Debian-exim:x:101:103::/var/spool/exim4:/bin/false

> 2. It sounds like we're talking slightly at cross-purposes here.  Exim may
> be calling spamassassin (PS: how?)

It matters a good deal.  If it's called from the content filtering hook
or the ACLs, spamassassin runs as the exim UID (unless it is itself
setuid, of course).  But if it's called as a "transport filter", it runs
as the destination user.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Checking if sa-learn is actually learning

2015-10-16 Thread Ian Zimmerman
On 2015-10-16 20:59 -0500, Ryan Coleman wrote:

> sa-learn commands:
> [scans domains for specified folders and scans them]
> > /usr/bin/find /var/mail/vhosts/ -name '*.Spam.New*' -type d -exec /usr/bin/sa-learn --no-sync --spam --progress {}* \;
> > /usr/bin/find /var/mail/vhosts/ -name '*.Spam.Suspected*' -type d -exec /usr/bin/sa-learn --no-sync --spam --progress {}* \;
> 
> I swear I had issues in the past without having --no-sync, but is that
> causing it?

If you do the routine learning with --no-sync, you must have one run with
--sync as well, maybe in a cron job.  Or just run with --sync once at
the end of this same script.  That much is straightforward, and should
be clear from the man/pod pages.
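The "learn with --no-sync, then sync once" pattern could look roughly like this. `learn_then_sync` and the `SA_LEARN` override are assumptions of the sketch (the override only makes a dry run possible); the sa-learn options are the ones already used in this thread.

```shell
# Learn each directory as spam without syncing, then sync once at the end.
# SA_LEARN lets the command be swapped out for a dry run; it is an
# assumption of this sketch, not an sa-learn feature.
learn_then_sync() {
    learner=${SA_LEARN:-sa-learn}
    for dir in "$@"; do
        "$learner" --no-sync --spam "$dir" || return 1
    done
    # One sync at the end folds the journal into the database.
    "$learner" --sync
}
```

Setting `SA_LEARN=echo` before calling the function prints the commands instead of running them.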

The part that caused me some trouble, and is somewhat underdocumented
IMO, is the interaction of --sync with --force-expire.  I'm afraid I
can't help you with that because I took the extreme step of disabling
expiration, and instead re-creating a fresh database monthly from the
recent corpus which I keep around.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: best way to whitelist this list?

2015-09-19 Thread Ian Zimmerman
On 2015-09-19 20:12 +0200, A. Schulze wrote:

> today I was notified by ezmlm that my MTA rejected messages to
> me. Messages to this list were classified as spam by .. spamassassin.

All of today's messages here scored around -7.5 for me, with no special
handling.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: [Announce] SA-Plugins: RedisAWL, RuleTimingRedis

2015-09-15 Thread Ian Zimmerman
On 2015-06-09 17:57 +0200, Benning, Markus wrote:

> RuleTimingRedis - collect SA rule timings in redis

I'm trying this out.  I have a little annoying problem: the logs
beginning on line 178 seem to go to stdout or stderr as well as syslog.
The result is that cron sends me email every time spamd is restarted
(after every rule update).  Do you know how to change that?  I find
nothing about logging in perldoc Mail::SpamAssassin::Conf.

I suppose I could just delete those lines from the module :-)  But then
I would have extra work when I merge with any new versions you have.

Thanks for your ideas.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Live upgrade safe?

2015-09-11 Thread Ian Zimmerman
On 2015-09-11 17:35 +0200, Reindl Harald wrote:

> >>>Can I safely upgrade SA from 3.4.0 to 3.4.1 without changing any local
> >>>configuration files, and without regenerating the Bayes database?  (I
> >>>use the default bdb Bayes store.)
> >>
> >>yes, but you need to run "sa-update" before restart to fetch the
> >>latest rules and hopefully have a distribution which restarts
> >>automatically after updating the package
> >
> >Isn't this a contradiction?  If my distribution automatically restarts
> >(which it does), how can I sneak in a sa-update run after the upgrade
> >but before the restart?
> 
> i hope you have a testing environment for production and so just make
> the "sa-update" there and rsync the rule-updates to the liveserver

I appreciate you trying to help, but you don't really answer my
question.  Even if I could do what you suggest, the rsync would still
take finite time - longer than the interval between the upgrade and the
restart on the production system.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Live upgrade safe?

2015-09-11 Thread Ian Zimmerman
On 2015-08-14 17:45 +0200, Reindl Harald wrote:

> >Can I safely upgrade SA from 3.4.0 to 3.4.1 without changing any local
> >configuration files, and without regenerating the Bayes database?  (I
> >use the default bdb Bayes store.)
> 
> yes, but you need to run "sa-update" before restart to fetch the
> latest rules and hopefully have a distribution which restarts
> automatically after updating the package

Isn't this a contradiction?  If my distribution automatically restarts
(which it does), how can I sneak in a sa-update run after the upgrade
but before the restart?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Live upgrade safe?

2015-08-14 Thread Ian Zimmerman
Can I safely upgrade SA from 3.4.0 to 3.4.1 without changing any local
configuration files, and without regenerating the Bayes database?  (I
use the default bdb Bayes store.)

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



bayes expiry not happening when it should

2015-08-05 Thread Ian Zimmerman
~$ grep '^bayes_expiry_max_db_size' ~/.spamassassin/user_prefs | awk '{print $2}'
200
~$ sa-learn --force-expire
bayes: synced databases from journal in 0 seconds: 2784 unique entries (2805 total entries)
~$ sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0  24501  0  non-token data: nspam
0.000  0  23548  0  non-token data: nham
0.000  0  2009202  0  non-token data: ntokens
0.000  0 100071  0  non-token data: oldest atime
0.000  0 1438755640  0  non-token data: newest atime
0.000  0 1438755988  0  non-token data: last journal sync atime
0.000  0 1438756034  0  non-token data: last expiry atime
0.000  0   11059200  0  non-token data: last expire atime delta
0.000  0  20174  0  non-token data: last expire reduction count

??wth???  I thought I _finally_ understood this stuff :-(

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: bayes expiry not happening when it should

2015-08-05 Thread Ian Zimmerman
On 2015-08-05 12:58 +0100, RW wrote:

> The number of tokens is within 0.5% of the configured value. It's
> designed to produce a value between 75% and roughly 150%.

I can't quite parse that answer, so let's be more specific.

Doc says:

  bayes_expiry_max_db_size  (default: 150000)

What should be the maximum size of the Bayes tokens database?  When
expiry occurs, the Bayes system will keep either 75% of the maximum
value, or 100,000 tokens, whichever has a larger value.

From this (and the more elaborate description in the EXPIRATION section,
which I've also read) I thought it worked roughly like this:

if (ntokens < bayes_expiry_max_db_size)
    do_nothing()
else
    goal_ntokens = max(100000, 0.75 * bayes_expiry_max_db_size)
    while (ntokens > goal_ntokens)
        kill_oldest_tokens()

If I misunderstood, how/where?  Sorry for my density :-(
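For reference, the target size stated in the quoted documentation works out as follows. This is only the arithmetic of the "75% or 100,000, whichever is larger" sentence; `expiry_goal` is a hypothetical name, and the sketch says nothing about how tokens are actually selected for removal.

```shell
# Target size after an expiry run, per the quoted documentation:
# the larger of 75% of bayes_expiry_max_db_size and 100,000 tokens.
expiry_goal() {
    max_db_size=$1
    goal=$((max_db_size * 3 / 4))
    if [ "$goal" -lt 100000 ]; then
        goal=100000
    fi
    echo "$goal"
}
```

So `expiry_goal 150000` gives 112500, while any setting below about 133,334 is floored at 100000.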

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: bayes expiry not happening when it should

2015-08-05 Thread Ian Zimmerman
On 2015-08-05 19:34 +0100, RW wrote:

> What it actually does is estimate a cut-off time and then delete all
> tokens older than that. How it gets the cut-off time is described in the
> next two sections:  EXPIRE LOGIC and ESTIMATION PASS LOGIC.

OMG.  For one thing, are the clauses in the definition of weird
conjunctive or disjunctive?

A more insolent question, why this complexity?  Why can't I force an
expire when I feel like it? :-P  Or can I?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



Re: no reporting methods available

2015-07-31 Thread Ian Zimmerman
On 2015-07-31 18:28 -0500, David B Funk wrote:

> Reporting is separate from learning.
> 
> It is the case that spamassassin -r is supposed to report and learn.
> However it isn't quite the same as sa-learn --spam in that unlike
> sa-learn --spam it won't override the spam learn prohibition of BAYES_00.

Thanks, that is useful to know.  However, it isn't really relevant to
this situation.  My point is: if learning _is_ part of the job of
spamassassin -r, then does it have to fail for the no method available
message to be emitted?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



no reporting methods available

2015-07-31 Thread Ian Zimmerman
I run spamassassin -r from cron nightly.  Last night I got this output:

Jul 30 23:00:11.830 [31065] warn: reporter: no reporting methods available, so couldn't report
Jul 30 23:00:11.830 [31065] warn: spamassassin: warning, unable to report message
Jul 30 23:00:11.830 [31065] warn: spamassassin: for more information, re-run with -D option to see debug output

I tried to follow the instructions and run

spamassassin -D -r `ls spam`

but that hangs without producing any output.

The only external reporting method I'm aware of that should be active is
Razor.  Running razor-report `ls spam` works normally as expected.

Aside from getting an explanation of what happened this time, I'd also
like to clarify more generally what spamassassin -r does.  From a recent
thread here I learned that it also does the equivalent of sa-learn
--spam.  Right?  So presumably it doesn't consider this a reporting
method or how could it be not available?

Also I recently installed the bogofilter plugin by Christian Laußat, and
my understanding is that (when bogofilter_learn is set to 1, as it is),
it advertises itself as another external reporting agent.  So shouldn't
this also happen during a spamassassin -r run, and how could it be not
available?


-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.



another bayes oddity

2015-07-23 Thread Ian Zimmerman
I have

bayes_auto_learn 0
bayes_auto_expire 0
bayes_learn_to_journal 0

add_header all Autolearn _AUTOLEARN_


and indeed, all messages are tagged with

X-Spam-Autolearn: disabled


Nevertheless, the mtime _and_ size of ~/.spamassassin/bayes_journal
inches forward with every delivery.  Why?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


