Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df
Adam Wolk [adam.w...@tintagel.pl] wrote:
> On Fri, 4 Sep 2015 11:37:09 -0700
> Chris Cappuccio wrote:
>
> > Adam Wolk [adam.w...@tintagel.pl] wrote:
> > > > > -rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
> > > > > -rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks
> > > > >
> > > >
> > > > What are your memory limits for the user/daemon class that runs
> > > > spamassassin?
> > >
> > > Touche, not set. Though it was running like that since ~December
> > > last year hence my question to misc@ if anyone noticed it behaving
> > > differently since the last release. In no way I'm assuming that
> > > something is wrong on the OS / software level - in fact I assumed
> > > that my setup was performed incorrectly by me. So far I learned a
> > > ton of useful info by asking on the list here, hope no one feels
> > > offended :)
> > >
> > > $ cat /etc/login.conf | grep -i spam
> > > $
> > >
> >
> > Well it still runs with some class, perhaps as daemon ?
> >
> > I guess I'm really asking, is your login.conf modified? Post it and
> > your rc.conf.local
> >
>
> Not modified by hand.
>

In that case, I wonder if you are hitting some kind of bug. I have been having regular crashes under perl in 5.8/5.8 current, I think from spamassassin (called via mailscanner). It looks like I am hitting some occasional corruption within the sqlite library after being called through the perl module.
Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df
On 2015-09-04, Adam Wolk wrote:
> It's quite possible that Bayesian filtering started working for me only
> since this snapshot. I would appreciate it if you could check the size
> of your bayes_toks db & some info on general growth per email (seems to
> be around 30-60M on my server) as that's the only thing I think could
> be wrong with it atm. 65.3G accumulated in less than 24h for a DB that
> serves around 11k emails *per month* seems a lot (and most of that
> traffic are OpenBSD mailing lists).

That definitely seems wrong, my bayes_toks from 500-1000 mails/day with amavis+spamassassin is around 5MB.

I'm not sure where to start looking though, I'd probably try wiping the db and starting again, though the only time I remember having to do that myself is when someone was relaying spam through a host in DNSWL which got auto-learned as ham (i.e bogus data not corruption).

$ sudo -u _vscan sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       4202          0  non-token data: nspam
0.000          0       1799          0  non-token data: nham
0.000          0     151022          0  non-token data: ntokens
0.000          0 1422052584          0  non-token data: oldest atime
0.000          0 1441425256          0  non-token data: newest atime
0.000          0 1441426503          0  non-token data: last journal sync atime
0.000          0 1441412100          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
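To compare token counts between servers without eyeballing the whole dump, the output above can be filtered with a few lines of shell. This is only a sketch: the function name `bayes_magic` is made up for illustration, and it assumes the "non-token data:" line layout shown in the dump above.

```shell
# Print the value of one counter from `sa-learn --dump magic` output on stdin.
# $1 is the label that follows "non-token data: ", for example: ntokens
# (bayes_magic is a hypothetical helper name, not part of SpamAssassin.)
bayes_magic() {
    awk -v want="$1" '
        {
            i = index($0, "non-token data: ")
            if (i == 0) next
            # Everything after the 16-character marker is the label.
            name = substr($0, i + 16)
            sub(/[[:space:]]+$/, "", name)
            # The third column holds the counter value.
            if (name == want) print $3
        }'
}
```

Usage would be something like `sudo -u _vscan sa-learn --dump magic | bayes_magic ntokens`, run as whichever user owns the Bayes database. A token count in the low hundreds of thousands next to a multi-gigabyte bayes_toks would suggest the file size is not explained by the data in it.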
Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df
On Fri, 4 Sep 2015 12:31:13 -0400 Michael McConville wrote:
> k...@kurawa.no-ip.org wrote:
> > Adam Wolk wrote:
> > > After deleting the file, restarting the service processing a
> > > single email brought the DB to reported size 37.9M, few emails
> > > later it's already reported as 113M I have a hunch that it will
> > > bloat again really fast.
> >
> > try to disable bayes, set parameter "use_bayes 0" and placed into
> > the server-wide local.cf configuration file.
>
> I administrate a mail server running Debian Jessie that uses the shell
> script method of calling SpamAssassin from Postfix. It uses a ton of
> CPU, so I don't think this is an OpenBSD problem.
>
> That said, you probably shouldn't disable Bayesian filtering. IIUC,
> that's the main point of using SpamAssassin, and it's necessary to
> block almost all spam.

Thanks, I had an initial suspicion that something was misconfigured on my previous snapshots as I saw spamassasin being executed but never used a lot of CPU (though it did flag 1 - literally one, email as spam - but that's expected volume for a server with 2 accounts).

It's quite possible that Bayesian filtering started working for me only since this snapshot. I would appreciate it if you could check the size of your bayes_toks db & some info on general growth per email (seems to be around 30-60M on my server) as that's the only thing I think could be wrong with it atm. 65.3G accumulated in less than 24h for a DB that serves around 11k emails *per month* seems a lot (and most of that traffic are OpenBSD mailing lists).

Regards,
Adam
Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df
k...@kurawa.no-ip.org wrote:
> Adam Wolk wrote:
> > After deleting the file, restarting the service processing a single
> > email brought the DB to reported size 37.9M, few emails later it's
> > already reported as 113M I have a hunch that it will bloat again
> > really fast.
>
> try to disable bayes, set parameter "use_bayes 0" and placed into the
> server-wide local.cf configuration file.

I administrate a mail server running Debian Jessie that uses the shell script method of calling SpamAssassin from Postfix. It uses a ton of CPU, so I don't think this is an OpenBSD problem.

That said, you probably shouldn't disable Bayesian filtering. IIUC, that's the main point of using SpamAssassin, and it's necessary to block almost all spam.
Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df
On Fri, Sep 04, 2015 at 10:20:01AM +0200, Adam Wolk wrote:
| Hi misc@
|
| I upgraded my mail server to an amd64 snapshot from Sep 2nd and found
| the server stuck delivering mail in the morning with spamassasin
| churning at 90% CPU usage.
|
| Quick investigation lead me to a huge bayes_toks file of 65.3G in
| /var/spampd/.spamassasin/.
|
| $ ls -alh
| total 4738352
| drwx------  2 _spampd  _spampd   512B Sep  4 10:00 .
| drwxr-xr-x  3 _spampd  _spampd   512B Sep  3 15:57 ..
| -rw-------  1 _spampd  _spampd    36B Sep  4 09:53 bayes.lock
| -rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
| -rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks
|
| $ file bayes_toks
| bayes_toks: Berkeley DB 1.85 (Hash, version 2, native byte-order)
|
| Interestingly I don't see that much space used with df (anyone knows
| why?):

You should read up on sparse files.

Here's a quick trick from the sparse files book of tricks:

# First we create a file 'bigfile' using dd:
[weerd@despair] $ dd if=/dev/zero of=bigfile bs=1048576 count=10 seek=1024
10+0 records in
10+0 records out
10485760 bytes transferred in 0.178 secs (58799094 bytes/sec)

# ls will tell us how big this file is:
[weerd@despair] $ ls -lh bigfile
-rw-r--r--  1 weerd  weerd   1.0G Sep  4 19:51 bigfile

# du will tell us how much space is in use by this file:
[weerd@despair] $ du -sh bigfile
10.1M   bigfile

# cp is even better at the sparse files game:
[weerd@despair] $ cp bigfile bigfile2

# bigfile2 is the same as bigfile:
[weerd@despair] $ ls -lh bigfile2
-rw-r--r--  1 weerd  weerd   1.0G Sep  4 19:54 bigfile2

# No, really .. exactly the same:
[weerd@despair] $ md5 bigfile*
MD5 (bigfile) = 5ec6988d232a445bc40b9dca003b95f7
MD5 (bigfile2) = 5ec6988d232a445bc40b9dca003b95f7

# However, it uses a lot less disk space:
[weerd@despair] $ du -sh bigfile2
48.0K   bigfile2

TL;DR: files with lots of emptiness (consecutive ranges of all 0 data) are efficiently stored using "sparse files"

| $ df -h
| Filesystem     Size    Used   Avail Capacity  Mounted on
| /dev/sd0a     1008M   90.1M    868M     9%    /
| /dev/sd0k      9.8G   80.3M    9.3G     1%    /home
| /dev/sd0d      3.9G    118K    3.7G     0%    /tmp
| /dev/sd0f      3.9G    1.0G    2.7G    28%    /usr
| /dev/sd0g     1001M    212M    738M    22%    /usr/X11R6
| /dev/sd0h      9.8G    572M    8.8G     6%    /usr/local
| /dev/sd0j      3.9G    2.0K    3.7G     0%    /usr/obj
| /dev/sd0i      2.0G    2.0K    1.9G     0%    /usr/src
| /dev/sd0e      598G    4.3G    564G     1%    /var
|
| I removed the file and disk usage dropped by 2.3G on /var.
|
|
| Did anyone experience issues with spamassasin/spampd similar to the
| one reported above?
|
| p5-Mail-SpamAssassin-3.4.1p2 (installed)
| spampd-2.30p3 (installed)
|
| After deleting the file, restarting the service processing a single
| email brought the DB to reported size 37.9M, few emails later it's
| already reported as 113M I have a hunch that it will bloat again really
| fast.
|
| Regards,
| Adam
|

--
>[<++>-]<+++.>+++[<-->-]<.>+++[<+ +++>-]<.>++[<>-]<+.--.[-]
http://www.weirdnet.nl/
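The ls versus du comparison above can be wrapped in a small helper, for anyone who wants to check bayes_toks the same way. A minimal sketch, assuming a POSIX shell with ls, du and awk; the name `sparse_report` is made up here and is not an existing tool.

```shell
# For each file given, print the size ls reports next to the space du says
# is actually allocated, both in KB. A large gap suggests a sparse file.
# (sparse_report is a hypothetical helper name.)
sparse_report() {
    for f in "$@"; do
        # Field 5 of `ls -ln` is the file length in bytes; round up to KB.
        apparent_kb=$(ls -lnd -- "$f" | awk '{ print int(($5 + 1023) / 1024) }')
        # `du -k` counts the blocks allocated on disk, in KB.
        allocated_kb=$(du -k -- "$f" | awk '{ print $1 }')
        printf '%s apparent=%sK allocated=%sK\n' "$f" "$apparent_kb" "$allocated_kb"
    done
}
```

For example, `sparse_report /var/spampd/.spamassasin/bayes_toks` (path as given in the original report) would show both numbers on one line. The allocated figure depends on the filesystem, so treat it as an indication rather than an exact measure.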
Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df
Adam Wolk [adam.w...@tintagel.pl] wrote:
> Hi misc@
>
> I upgraded my mail server to an amd64 snapshot from Sep 2nd and found
> the server stuck delivering mail in the morning with spamassasin
> churning at 90% CPU usage.
>
> Quick investigation lead me to a huge bayes_toks file of 65.3G in
> /var/spampd/.spamassasin/.
>
> $ ls -alh
> total 4738352
> drwx------  2 _spampd  _spampd   512B Sep  4 10:00 .
> drwxr-xr-x  3 _spampd  _spampd   512B Sep  3 15:57 ..
> -rw-------  1 _spampd  _spampd    36B Sep  4 09:53 bayes.lock
> -rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
> -rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks
>

What are your memory limits for the user/daemon class that runs spamassassin?
Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df
Adam Wolk [adam.w...@tintagel.pl] wrote:
> > > -rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
> > > -rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks
> > >
> >
> > What are your memory limits for the user/daemon class that runs
> > spamassassin?
>
> Touche, not set. Though it was running like that since ~December last
> year hence my question to misc@ if anyone noticed it behaving
> differently since the last release. In no way I'm assuming that
> something is wrong on the OS / software level - in fact I assumed that
> my setup was performed incorrectly by me. So far I learned a ton of
> useful info by asking on the list here, hope no one feels offended :)
>
> $ cat /etc/login.conf | grep -i spam
> $
>

Well it still runs with some class, perhaps as daemon ?

I guess I'm really asking, is your login.conf modified? Post it and your rc.conf.local
Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df
> $ cat /etc/login.conf | grep -i spam
> $

UUOC

grep -i spam /etc/login.conf

But that is not actually answering the question as we don't know the login class you are using and what its limits are like ;-)

You can get the login class by using id(1). For the limits I think you need to read login.conf.

Frank.
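For the "read login.conf" part, a short helper can print just the record for one class instead of the whole file. This is a sketch under assumptions: the name `class_entry` is invented for illustration, and it expects each record to start at column one and to continue with a trailing backslash, as in the stock file.

```shell
# Print the login.conf record for a single class.
# $1 = class name, $2 = path to a login.conf-style file
# (class_entry is a hypothetical helper name.)
class_entry() {
    awk -v cls="$1" '
        /^[[:space:]]*#/ { next }                            # skip comments
        !inrec && index($0, cls ":") == 1 { inrec = 1 }      # record header
        inrec { print; if ($0 !~ /\\$/) exit }               # stop at last line
    ' "$2"
}
```

Running `class_entry daemon /etc/login.conf` should print the daemon record with its datasize and openfiles settings. On OpenBSD, `id -c _spampd` should print the class assigned to that user; as far as I know, daemons started through rc.d(8) run under the daemon class unless a class named after the daemon exists, so that is probably the record to look at first.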
Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df
On Fri, 4 Sep 2015 11:37:09 -0700 Chris Cappuccio wrote:
> Adam Wolk [adam.w...@tintagel.pl] wrote:
> > > > -rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
> > > > -rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks
> > > >
> > >
> > > What are your memory limits for the user/daemon class that runs
> > > spamassassin?
> >
> > Touche, not set. Though it was running like that since ~December
> > last year hence my question to misc@ if anyone noticed it behaving
> > differently since the last release. In no way I'm assuming that
> > something is wrong on the OS / software level - in fact I assumed
> > that my setup was performed incorrectly by me. So far I learned a
> > ton of useful info by asking on the list here, hope no one feels
> > offended :)
> >
> > $ cat /etc/login.conf | grep -i spam
> > $
> >
>
> Well it still runs with some class, perhaps as daemon ?
>
> I guess I'm really asking, is your login.conf modified? Post it and
> your rc.conf.local
>

Not modified by hand.

$ grep -i spam /etc/passwd
_spamd:*:62:62:Spam Daemon:/var/empty:/sbin/nologin
_spamdaemon:*:506:506:SpamAssassin:/var/db/spamassassin:/sbin/nologin
_spampd:*:746:746:spampd user:/var/spampd:/sbin/nologin
$ id _spamd
uid=62(_spamd) gid=62(_spamd) groups=62(_spamd)
$ id _spamdaemon
uid=506(_spamdaemon) gid=506(_spamdaemon) groups=506(_spamdaemon)
$ id _spampd
uid=746(_spampd) gid=746(_spampd) groups=746(_spampd)
$

$ cat /etc/login.conf
# $OpenBSD: login.conf,v 1.5 2015/07/20 18:53:18 sthen Exp $

#
# Sample login.conf file. See login.conf(5) for details.
#

#
# Standard authentication styles:
#
#       passwd          Use only the local password file
#       chpass          Do not authenticate, but change users password (change
#                       the YP password if the user has one, else change the
#                       local password)
#       lchpass         Do not login; change user's local password instead
#       radius          Use radius authentication
#       reject          Use rejected authentication
#       skey            Use S/Key authentication
#       activ           ActivCard X9.9 token authentication
#       crypto          CRYPTOCard X9.9 token authentication
#       snk             Digital Pathways SecureNet Key authentication
#       tis             TIS Firewall Toolkit authentication
#       token           Generic X9.9 token authentication
#       yubikey         YubiKey authentication
#

# Default allowed authentication styles
auth-defaults:auth=passwd,skey:

# Default allowed authentication styles for authentication type ftp
auth-ftp-defaults:auth-ftp=passwd:

#
# The default values
# To alter the default authentication types change the line:
#       :tc=auth-defaults:\
# to be read something like: (enables passwd, "myauth", and activ)
#       :auth=passwd,myauth,activ:\
# Any value changed in the daemon class should be reset in default
# class.
#
default:\
        :path=/usr/bin /bin /usr/sbin /sbin /usr/X11R6/bin /usr/local/bin /usr/local/sbin:\
        :umask=022:\
        :datasize-max=512M:\
        :datasize-cur=512M:\
        :maxproc-max=256:\
        :maxproc-cur=128:\
        :openfiles-cur=512:\
        :stacksize-cur=4M:\
        :localcipher=blowfish,8:\
        :ypcipher=old:\
        :tc=auth-defaults:\
        :tc=auth-ftp-defaults:

#
# Settings used by /etc/rc and root
# This must be set properly for daemons started as root by inetd as well.
# Be sure reset these values back to system defaults in the default class!
#
daemon:\
        :ignorenologin:\
        :datasize=infinity:\
        :maxproc=infinity:\
        :openfiles-cur=128:\
        :stacksize-cur=8M:\
        :localcipher=blowfish,9:\
        :tc=default:

#
# Staff have fewer restrictions and can login even when nologins are set.
#
staff:\
        :datasize-cur=1536M:\
        :datasize-max=infinity:\
        :maxproc-max=512:\
        :maxproc-cur=256:\
        :ignorenologin:\
        :requirehome@:\
        :tc=default:

#
# Authpf accounts get a special motd and shell
#
authpf:\
        :welcome=/etc/motd.authpf:\
        :shell=/usr/sbin/authpf:\
        :tc=default:

#
# Building ports with DPB uses raised limits
#
pbuild:\
        :datasize-max=infinity:\
        :datasize-cur=4096M:\
        :maxproc-max=1024:\
        :maxproc-cur=256:\
        :tc=default:

#
# Override resource limits for certain daemons started by rc.d(8)
#
bgpd:\
        :openfiles-cur=512:\
        :tc=daemon:

unbound:\
        :openfiles-cur=512:\
        :tc=daemon:

dovecot:\
        :openfiles-cur=512:\
        :openfiles-max=2048:\
        :tc=daemon:
Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df
On Fri, 4 Sep 2015 11:08:35 -0700 Chris Cappuccio wrote:
> Adam Wolk [adam.w...@tintagel.pl] wrote:
> > Hi misc@
> >
> > I upgraded my mail server to an amd64 snapshot from Sep 2nd and
> > found the server stuck delivering mail in the morning with
> > spamassasin churning at 90% CPU usage.
> >
> > Quick investigation lead me to a huge bayes_toks file of 65.3G in
> > /var/spampd/.spamassasin/.
> >
> > $ ls -alh
> > total 4738352
> > drwx------  2 _spampd  _spampd   512B Sep  4 10:00 .
> > drwxr-xr-x  3 _spampd  _spampd   512B Sep  3 15:57 ..
> > -rw-------  1 _spampd  _spampd    36B Sep  4 09:53 bayes.lock
> > -rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
> > -rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks
> >
>
> What are your memory limits for the user/daemon class that runs
> spamassassin?

Touche, not set. Though it was running like that since ~December last year hence my question to misc@ if anyone noticed it behaving differently since the last release. In no way I'm assuming that something is wrong on the OS / software level - in fact I assumed that my setup was performed incorrectly by me. So far I learned a ton of useful info by asking on the list here, hope no one feels offended :)

$ cat /etc/login.conf | grep -i spam
$

Regards,
Adam
spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df
Hi misc@

I upgraded my mail server to an amd64 snapshot from Sep 2nd and found the server stuck delivering mail in the morning with spamassasin churning at 90% CPU usage.

Quick investigation lead me to a huge bayes_toks file of 65.3G in /var/spampd/.spamassasin/.

$ ls -alh
total 4738352
drwx------  2 _spampd  _spampd   512B Sep  4 10:00 .
drwxr-xr-x  3 _spampd  _spampd   512B Sep  3 15:57 ..
-rw-------  1 _spampd  _spampd    36B Sep  4 09:53 bayes.lock
-rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
-rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks

$ file bayes_toks
bayes_toks: Berkeley DB 1.85 (Hash, version 2, native byte-order)

Interestingly I don't see that much space used with df (anyone knows why?):

$ df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd0a     1008M   90.1M    868M     9%    /
/dev/sd0k      9.8G   80.3M    9.3G     1%    /home
/dev/sd0d      3.9G    118K    3.7G     0%    /tmp
/dev/sd0f      3.9G    1.0G    2.7G    28%    /usr
/dev/sd0g     1001M    212M    738M    22%    /usr/X11R6
/dev/sd0h      9.8G    572M    8.8G     6%    /usr/local
/dev/sd0j      3.9G    2.0K    3.7G     0%    /usr/obj
/dev/sd0i      2.0G    2.0K    1.9G     0%    /usr/src
/dev/sd0e      598G    4.3G    564G     1%    /var

I removed the file and disk usage dropped by 2.3G on /var.

Did anyone experience issues with spamassasin/spampd similar to the one reported above?

p5-Mail-SpamAssassin-3.4.1p2 (installed)
spampd-2.30p3 (installed)

After deleting the file, restarting the service processing a single email brought the DB to reported size 37.9M, few emails later it's already reported as 113M I have a hunch that it will bloat again really fast.

Regards,
Adam
Re: spamassasin large CPU usage on new snapshot and a huge bayes_toks file not reported in df
On Fri, 4 Sep 2015 10:20:01 +0200 Adam Wolk wrote:
> After deleting the file, restarting the service processing a single
> email brought the DB to reported size 37.9M, few emails later it's
> already reported as 113M I have a hunch that it will bloat again really
> fast.
>

try to disable bayes, set parameter "use_bayes 0" and placed into the server-wide local.cf configuration file.