Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df

2015-09-06 Thread Chris Cappuccio
Adam Wolk [adam.w...@tintagel.pl] wrote:
> On Fri, 4 Sep 2015 11:37:09 -0700
> Chris Cappuccio  wrote:
> 
> > Adam Wolk [adam.w...@tintagel.pl] wrote:
> > > > > -rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
> > > > > -rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks
> > > > > 
> > > > 
> > > > What are your memory limits for the user/daemon class that runs
> > > > spamassassin?
> > > 
> > > Touche, not set. Though it was running like that since ~December
> > > last year hence my question to misc@ if anyone noticed it behaving
> > > differently since the last release. In no way I'm assuming that
> > > something is wrong on the OS / software level - in fact I assumed
> > > that my setup was performed incorrectly by me. So far I learned a
> > > ton of useful info by asking on the list here, hope no one feels
> > > offended :)
> > > 
> > > $ cat /etc/login.conf | grep -i spam 
> > > $ 
> > > 
> > 
> > Well it still runs with some class, perhaps as daemon ?
> > 
> > I guess I'm really asking, is your login.conf modified? Post it and
> > your rc.conf.local
> > 
> 
> Not modified by hand.
> 

In that case, I wonder if you are hitting some kind of bug.
I have been having regular crashes under perl in 5.8/5.8 current, 
I think from spamassassin (called via mailscanner). It looks
like I am hitting some occasional corruption within the sqlite
library after being called through the perl module. 



Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df

2015-09-05 Thread Stuart Henderson
On 2015-09-04, Adam Wolk  wrote:
> It's quite possible that Bayesian filtering started working for me only
> since this snapshot. I would appreciate it if you could check the size
> of your bayes_toks db & some info on general growth per email (seems to
> be around 30-60M on my server) as that's the only thing I think could
> be wrong with it atm. 65.3G accumulated in less than 24h for a DB that
> serves around 11k emails *per month* seems a lot (and most of that
> traffic are OpenBSD mailing lists).

That definitely seems wrong, my bayes_toks from 500-1000 mails/day with
amavis+spamassassin is around 5MB. I'm not sure where to start looking
though, I'd probably try wiping the db and starting again, though the
only time I remember having to do that myself is when someone was relaying
spam through a host in DNSWL which got auto-learned as ham (i.e. bogus data
not corruption).

$ sudo -u _vscan sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       4202          0  non-token data: nspam
0.000          0       1799          0  non-token data: nham
0.000          0     151022          0  non-token data: ntokens
0.000          0 1422052584          0  non-token data: oldest atime
0.000          0 1441425256          0  non-token data: newest atime
0.000          0 1441426503          0  non-token data: last journal sync atime
0.000          0 1441412100          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
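
To compare these numbers between servers without eyeballing the whole
dump, a single counter can be pulled out with a bit of awk. This is only
a rough sketch: magic_value is a made-up helper name, and it assumes the
dump keeps the layout shown above (value in the third column, entry name
after "non-token data:").

```shell
#!/bin/sh
# magic_value NAME: read `sa-learn --dump magic` output on stdin and print
# the count stored for the named "non-token data" entry (e.g. ntokens).
# Hypothetical helper for illustration, not part of SpamAssassin.
magic_value() {
    awk -v name="$1" '
        /non-token data:/ {
            label = $0
            sub(/.*non-token data: */, "", label)   # keep only the entry name
            sub(/[ \t]*$/, "", label)               # drop trailing blanks
            if (label == name) print $3             # third column holds the value
        }'
}

# Two sample lines copied from the dump in this message.
printf '%s\n' \
    '0.000  0   4202  0  non-token data: nspam' \
    '0.000  0 151022  0  non-token data: ntokens' |
    magic_value ntokens
```

On a live system the sample would be replaced by the real dump, e.g.
"sudo -u _vscan sa-learn --dump magic | magic_value ntokens" (the user
name depends on which account owns the Bayes database).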



Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df

2015-09-04 Thread Adam Wolk
On Fri, 4 Sep 2015 12:31:13 -0400
Michael McConville  wrote:

> k...@kurawa.no-ip.org wrote:
> > Adam Wolk  wrote:
> > > After deleting the file, restarting the service processing a
> > > single email brought the DB to reported size 37.9M, few emails
> > > later it's already reported as 113M I have a hunch that it will
> > > bloat again really fast.
> > 
> > try to disable bayes, set parameter "use_bayes 0" and placed into
> > the server-wide local.cf configuration file.
> 
> I administrate a mail server running Debian Jessie that uses the shell
> script method of calling SpamAssassin from Postfix. It uses a ton of
> CPU, so I don't think this is an OpenBSD problem.
> 
> That said, you probably shouldn't disable Bayesian filtering. IIUC,
> that's the main point of using SpamAssassin, and it's necessary to
> block almost all spam.

Thanks, I had an initial suspicion that something was misconfigured on
my previous snapshots, as I saw spamassassin being executed but it never
used a lot of CPU (though it did flag one email, literally one, as spam,
but that's the expected volume for a server with 2 accounts).

It's quite possible that Bayesian filtering started working for me only
since this snapshot. I would appreciate it if you could check the size
of your bayes_toks db & some info on general growth per email (seems to
be around 30-60M on my server) as that's the only thing I think could
be wrong with it atm. 65.3G accumulated in less than 24h for a DB that
serves around 11k emails *per month* seems a lot (and most of that
traffic is OpenBSD mailing lists).

Regards,
Adam



Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df

2015-09-04 Thread Michael McConville
k...@kurawa.no-ip.org wrote:
> Adam Wolk  wrote:
> > After deleting the file, restarting the service processing a single
> > email brought the DB to reported size 37.9M, few emails later it's
> > already reported as 113M I have a hunch that it will bloat again
> > really fast.
> 
> try to disable bayes, set parameter "use_bayes 0" and placed into the
> server-wide local.cf configuration file.

I administrate a mail server running Debian Jessie that uses the shell
script method of calling SpamAssassin from Postfix. It uses a ton of
CPU, so I don't think this is an OpenBSD problem.

That said, you probably shouldn't disable Bayesian filtering. IIUC,
that's the main point of using SpamAssassin, and it's necessary to block
almost all spam.



Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df

2015-09-04 Thread Paul de Weerd
On Fri, Sep 04, 2015 at 10:20:01AM +0200, Adam Wolk wrote:
| Hi misc@
| 
| I upgraded my mail server to an amd64 snapshot from Sep 2nd and found
| the server stuck delivering mail in the morning with spamassasin
| churning at 90% CPU usage.
| 
| Quick investigation lead me to a huge bayes_toks file of 65.3G in
| /var/spampd/.spamassasin/.
| 
| $ ls -alh
| total 4738352
| drwx------  2 _spampd  _spampd   512B Sep  4 10:00 .
| drwxr-xr-x  3 _spampd  _spampd   512B Sep  3 15:57 ..
| -rw-------  1 _spampd  _spampd    36B Sep  4 09:53 bayes.lock
| -rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
| -rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks
| 
| $ file bayes_toks
| bayes_toks: Berkeley DB 1.85 (Hash, version 2, native byte-order)
| 
| 
| Interestingly I don't see that much space used with df (anyone knows
| why?):

You should read up on sparse files.  Here's a quick trick from the
sparse files book of tricks:

# First we create a file 'bigfile' using dd:
[weerd@despair] $ dd if=/dev/zero of=bigfile bs=1048576 count=10 seek=1024
10+0 records in
10+0 records out
10485760 bytes transferred in 0.178 secs (58799094 bytes/sec)

# ls will tell us how big this file is:
[weerd@despair] $ ls -lh bigfile
-rw-r--r--  1 weerd  weerd   1.0G Sep  4 19:51 bigfile

# du will tell us how much space is in use by this file:
[weerd@despair] $ du -sh bigfile
10.1M   bigfile

# cp is even better at the sparse files game:
[weerd@despair] $ cp bigfile bigfile2

# bigfile2 is the same as bigfile:
[weerd@despair] $ ls -lh bigfile2
-rw-r--r--  1 weerd  weerd   1.0G Sep  4 19:54 bigfile2

# No, really .. exactly the same:
[weerd@despair] $ md5 bigfile*
MD5 (bigfile) = 5ec6988d232a445bc40b9dca003b95f7
MD5 (bigfile2) = 5ec6988d232a445bc40b9dca003b95f7

# However, it uses a lot less disk space:
[weerd@despair] $ du -sh bigfile2
48.0K   bigfile2


TL;DR: files with lots of emptiness (consecutive ranges of all 0 data)
are efficiently stored using "sparse files"
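
The same comparison can be run against any suspect file. Below is a
minimal sketch, assuming a POSIX shell with ls, du, dd and mktemp
available; the helper names apparent_bytes and allocated_kb are made up
for this example. It prints the size the file claims to have (what ls -l
shows) next to what is actually allocated on disk (what du and df count).

```shell
#!/bin/sh
# apparent_bytes FILE: size the file claims to have (what ls -l shows).
apparent_bytes() {
    ls -ln "$1" | awk '{print $5}'
}

# allocated_kb FILE: kilobytes actually allocated on disk (what du counts).
allocated_kb() {
    du -k "$1" | awk '{print $1}'
}

# Demo: seek 64 MB into a new file and write 1 MB of zeros, leaving a hole.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/bigfile" bs=1048576 count=1 seek=64 2>/dev/null
echo "apparent:  $(apparent_bytes "$demo/bigfile") bytes"
echo "allocated: $(allocated_kb "$demo/bigfile") KB"
rm -rf "$demo"
```

On a filesystem that supports holes the allocated figure should come out
far below the apparent one; pointing the two helpers at bayes_toks would
show the same kind of gap that ls and df disagree about.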

| $ df -h
| Filesystem     Size    Used   Avail Capacity  Mounted on
| /dev/sd0a     1008M   90.1M    868M     9%    /
| /dev/sd0k      9.8G   80.3M    9.3G     1%    /home
| /dev/sd0d      3.9G    118K    3.7G     0%    /tmp
| /dev/sd0f      3.9G    1.0G    2.7G    28%    /usr
| /dev/sd0g     1001M    212M    738M    22%    /usr/X11R6
| /dev/sd0h      9.8G    572M    8.8G     6%    /usr/local
| /dev/sd0j      3.9G    2.0K    3.7G     0%    /usr/obj
| /dev/sd0i      2.0G    2.0K    1.9G     0%    /usr/src
| /dev/sd0e      598G    4.3G    564G     1%    /var
| 
| I removed the file and disk usage dropped by 2.3G on /var.
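
That figure lines up with the "total 4738352" line of the ls listing
quoted earlier, if that total is a count of 512-byte blocks (the usual
BSD default when BLOCKSIZE is unset; this is an assumption about the
reporting machine). A quick check of the arithmetic, with a made-up
helper name:

```shell
#!/bin/sh
# blocks_to_size BLOCKS: convert a block count, as printed on the "total"
# line of ls -l, to bytes and GiB.
# Assumes 512-byte blocks, the usual BSD default when BLOCKSIZE is unset.
blocks_to_size() {
    awk -v blocks="$1" 'BEGIN {
        bytes = blocks * 512
        printf "%.0f bytes = %.2f GiB\n", bytes, bytes / (1024 * 1024 * 1024)
    }'
}

blocks_to_size 4738352    # the "total" from the ls -alh listing
```

That works out to about 2.26 GiB actually allocated in the directory,
nearly all of it bayes_toks, which is roughly the 2.3G that df gave back.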
| 
| 
| Did anyone experience issues with spamassasin/spampd similar to the
| one reported above?
| 
| p5-Mail-SpamAssassin-3.4.1p2 (installed)
| spampd-2.30p3 (installed)
| 
| After deleting the file, restarting the service processing a single
| email brought the DB to reported size 37.9M, few emails later it's
| already reported as 113M I have a hunch that it will bloat again really
| fast.
| 
| Regards,
| Adam
| 

-- 
>[<++>-]<+++.>+++[<-->-]<.>+++[<+
+++>-]<.>++[<>-]<+.--.[-]
 http://www.weirdnet.nl/ 



Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df

2015-09-04 Thread Chris Cappuccio
Adam Wolk [adam.w...@tintagel.pl] wrote:
> Hi misc@
> 
> I upgraded my mail server to an amd64 snapshot from Sep 2nd and found
> the server stuck delivering mail in the morning with spamassasin
> churning at 90% CPU usage.
> 
> Quick investigation lead me to a huge bayes_toks file of 65.3G in
> /var/spampd/.spamassasin/.
> 
> $ ls -alh
> total 4738352
> drwx------  2 _spampd  _spampd   512B Sep  4 10:00 .
> drwxr-xr-x  3 _spampd  _spampd   512B Sep  3 15:57 ..
> -rw-------  1 _spampd  _spampd    36B Sep  4 09:53 bayes.lock
> -rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
> -rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks
> 

What are your memory limits for the user/daemon class that runs spamassassin?



Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df

2015-09-04 Thread Chris Cappuccio
Adam Wolk [adam.w...@tintagel.pl] wrote:
> > > -rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
> > > -rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks
> > > 
> > 
> > What are your memory limits for the user/daemon class that runs
> > spamassassin?
> 
> Touche, not set. Though it was running like that since ~December last
> year hence my question to misc@ if anyone noticed it behaving
> differently since the last release. In no way I'm assuming that
> something is wrong on the OS / software level - in fact I assumed that
> my setup was performed incorrectly by me. So far I learned a ton of
> useful info by asking on the list here, hope no one feels offended :)
> 
> $ cat /etc/login.conf | grep -i spam 
> $ 
> 

Well it still runs with some class, perhaps as daemon ?

I guess I'm really asking, is your login.conf modified? Post it and your 
rc.conf.local



Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df

2015-09-04 Thread fab
> $ cat /etc/login.conf | grep -i spam
> $ 

UUOC

grep -i spam /etc/login.conf

But that is not actually answering the question as we don't know the login 
class you are using and what its limits are like ;-)

You can get the login class by using id(1). For the limits I think you need to 
read login.conf.
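
Once the class name is known, its stanza can be pulled out of login.conf
instead of grepping for "spam". A small sketch follows; show_class is a
hypothetical helper, and "daemon" is only a guess at the class in use
(as far as I know, rc.d(8) starts daemons in that class unless a class
named after the daemon exists). It does not follow tc= references, so
inherited values still have to be looked up by hand.

```shell
#!/bin/sh
# show_class FILE CLASS: print the stanza for CLASS from a login.conf-style
# file, skipping comments and blank lines. Hypothetical helper for this sketch.
show_class() {
    awk -v cls="$2" '
        # A line that starts in column one and is neither a comment nor a
        # ":capability" continuation opens a new stanza.
        /^[^#:[:space:]]/ { printing = ($0 ~ ("^" cls "[|:]")) }
        printing && !/^#/ && NF
    ' "$1"
}

# Only run against the real file where it exists.
if [ -r /etc/login.conf ]; then
    show_class /etc/login.conf daemon
fi
```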

Frank.



Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df

2015-09-04 Thread Adam Wolk
On Fri, 4 Sep 2015 11:37:09 -0700
Chris Cappuccio  wrote:

> Adam Wolk [adam.w...@tintagel.pl] wrote:
> > > > -rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
> > > > -rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks
> > > > 
> > > 
> > > What are your memory limits for the user/daemon class that runs
> > > spamassassin?
> > 
> > Touche, not set. Though it was running like that since ~December
> > last year hence my question to misc@ if anyone noticed it behaving
> > differently since the last release. In no way I'm assuming that
> > something is wrong on the OS / software level - in fact I assumed
> > that my setup was performed incorrectly by me. So far I learned a
> > ton of useful info by asking on the list here, hope no one feels
> > offended :)
> > 
> > $ cat /etc/login.conf | grep -i spam 
> > $ 
> > 
> 
> Well it still runs with some class, perhaps as daemon ?
> 
> I guess I'm really asking, is your login.conf modified? Post it and
> your rc.conf.local
> 

Not modified by hand.

$ grep -i spam /etc/passwd  

_spamd:*:62:62:Spam Daemon:/var/empty:/sbin/nologin
_spamdaemon:*:506:506:SpamAssassin:/var/db/spamassassin:/sbin/nologin
_spampd:*:746:746:spampd user:/var/spampd:/sbin/nologin
$ id _spamd
uid=62(_spamd) gid=62(_spamd) groups=62(_spamd)
$ id _spamdaemon
uid=506(_spamdaemon) gid=506(_spamdaemon) groups=506(_spamdaemon)
$ id _spampd
uid=746(_spampd) gid=746(_spampd) groups=746(_spampd)
$ 



$ cat /etc/login.conf
# $OpenBSD: login.conf,v 1.5 2015/07/20 18:53:18 sthen Exp $

#
# Sample login.conf file.  See login.conf(5) for details.
#

#
# Standard authentication styles:
#
# passwd    Use only the local password file
# chpass    Do not authenticate, but change users password (change
#           the YP password if the user has one, else change the
#           local password)
# lchpass   Do not login; change user's local password instead
# radius    Use radius authentication
# reject    Use rejected authentication
# skey      Use S/Key authentication
# activ     ActivCard X9.9 token authentication
# crypto    CRYPTOCard X9.9 token authentication
# snk       Digital Pathways SecureNet Key authentication
# tis       TIS Firewall Toolkit authentication
# token     Generic X9.9 token authentication
# yubikey   YubiKey authentication
#

# Default allowed authentication styles
auth-defaults:auth=passwd,skey:

# Default allowed authentication styles for authentication type ftp
auth-ftp-defaults:auth-ftp=passwd:

#
# The default values
# To alter the default authentication types change the line:
#   :tc=auth-defaults:\
# to be read something like: (enables passwd, "myauth", and activ)
#   :auth=passwd,myauth,activ:\
# Any value changed in the daemon class should be reset in default
# class.
#
default:\
:path=/usr/bin /bin /usr/sbin /sbin /usr/X11R6/bin /usr/local/bin /usr/local/sbin:\
:umask=022:\
:datasize-max=512M:\
:datasize-cur=512M:\
:maxproc-max=256:\
:maxproc-cur=128:\
:openfiles-cur=512:\
:stacksize-cur=4M:\
:localcipher=blowfish,8:\
:ypcipher=old:\
:tc=auth-defaults:\
:tc=auth-ftp-defaults:

#
# Settings used by /etc/rc and root
# This must be set properly for daemons started as root by inetd as well.
# Be sure reset these values back to system defaults in the default class!
#
daemon:\
:ignorenologin:\
:datasize=infinity:\
:maxproc=infinity:\
:openfiles-cur=128:\
:stacksize-cur=8M:\
:localcipher=blowfish,9:\
:tc=default:

#
# Staff have fewer restrictions and can login even when nologins are set.
#
staff:\
:datasize-cur=1536M:\
:datasize-max=infinity:\
:maxproc-max=512:\
:maxproc-cur=256:\
:ignorenologin:\
:requirehome@:\
:tc=default:

#
# Authpf accounts get a special motd and shell
#
authpf:\
:welcome=/etc/motd.authpf:\
:shell=/usr/sbin/authpf:\
:tc=default:

#
# Building ports with DPB uses raised limits
#
pbuild:\
:datasize-max=infinity:\
:datasize-cur=4096M:\
:maxproc-max=1024:\
:maxproc-cur=256:\
:tc=default:

#
# Override resource limits for certain daemons started by rc.d(8)
#
bgpd:\
:openfiles-cur=512:\
:tc=daemon:

unbound:\
:openfiles-cur=512:\
:tc=daemon:

dovecot:\
:openfiles-cur=512:\
:openfiles-max=2048:\
:tc=daemon:



Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df

2015-09-04 Thread Adam Wolk
On Fri, 4 Sep 2015 11:08:35 -0700
Chris Cappuccio  wrote:

> Adam Wolk [adam.w...@tintagel.pl] wrote:
> > Hi misc@
> > 
> > I upgraded my mail server to an amd64 snapshot from Sep 2nd and
> > found the server stuck delivering mail in the morning with
> > spamassasin churning at 90% CPU usage.
> > 
> > Quick investigation lead me to a huge bayes_toks file of 65.3G in
> > /var/spampd/.spamassasin/.
> > 
> > $ ls -alh
> > total 4738352
> > drwx------  2 _spampd  _spampd   512B Sep  4 10:00 .
> > drwxr-xr-x  3 _spampd  _spampd   512B Sep  3 15:57 ..
> > -rw-------  1 _spampd  _spampd    36B Sep  4 09:53 bayes.lock
> > -rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
> > -rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks
> > 
> 
> What are your memory limits for the user/daemon class that runs
> spamassassin?

Touché, not set. It has been running like that since ~December last
year though, hence my question to misc@ whether anyone has noticed it
behaving differently since the last release. In no way am I assuming
that something is wrong at the OS / software level; in fact I assumed
that I had set things up incorrectly. So far I have learned a ton of
useful info by asking on the list here, hope no one feels offended :)

$ cat /etc/login.conf | grep -i spam 
$ 

Regards,
Adam



spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df

2015-09-04 Thread Adam Wolk
Hi misc@

I upgraded my mail server to an amd64 snapshot from Sep 2nd and found
the server stuck delivering mail in the morning, with spamassassin
churning at 90% CPU usage.

A quick investigation led me to a huge bayes_toks file of 65.3G in
/var/spampd/.spamassasin/.

$ ls -alh
total 4738352
drwx------  2 _spampd  _spampd   512B Sep  4 10:00 .
drwxr-xr-x  3 _spampd  _spampd   512B Sep  3 15:57 ..
-rw-------  1 _spampd  _spampd    36B Sep  4 09:53 bayes.lock
-rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
-rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks

$ file bayes_toks
bayes_toks: Berkeley DB 1.85 (Hash, version 2, native byte-order)


Interestingly, I don't see that much space used with df (does anyone
know why?):

$ df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd0a     1008M   90.1M    868M     9%    /
/dev/sd0k      9.8G   80.3M    9.3G     1%    /home
/dev/sd0d      3.9G    118K    3.7G     0%    /tmp
/dev/sd0f      3.9G    1.0G    2.7G    28%    /usr
/dev/sd0g     1001M    212M    738M    22%    /usr/X11R6
/dev/sd0h      9.8G    572M    8.8G     6%    /usr/local
/dev/sd0j      3.9G    2.0K    3.7G     0%    /usr/obj
/dev/sd0i      2.0G    2.0K    1.9G     0%    /usr/src
/dev/sd0e      598G    4.3G    564G     1%    /var

I removed the file and disk usage dropped by 2.3G on /var.


Did anyone experience issues with spamassasin/spampd similar to the
one reported above?

p5-Mail-SpamAssassin-3.4.1p2 (installed)
spampd-2.30p3 (installed)

After deleting the file and restarting the service, processing a single
email brought the DB to a reported size of 37.9M; a few emails later it
was already reported as 113M. I have a hunch that it will bloat again
really fast.

Regards,
Adam



Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df

2015-09-04 Thread koko
On Fri, 4 Sep 2015 10:20:01 +0200
Adam Wolk  wrote:

> After deleting the file, restarting the service processing a single
> email brought the DB to reported size 37.9M, few emails later it's
> already reported as 113M I have a hunch that it will bloat again really
> fast.
> 

Try disabling Bayes: set the parameter "use_bayes 0" in the
server-wide local.cf configuration file.