On Fri, 23 May 2014 05:33:31 +0200, Karsten Bräckelmann wrote:
On Thu, 2014-05-22 at 20:14 -0600, Kai Meyer wrote:
I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin
(user prefs via mysql) server that I've been running for a few years
The configuration you pasted below does not show any user_* options.
Unless there are more cf files you omitted, you do not use user_prefs
via SQL.
now. It's just a few of my private domains, not a lot of traffic. In the
last 6 months, the amount of spam getting through has gone from one or
two a week to 30 a day. I had sa-learn set up on imap folders called SPAM
and HAM running as root, so I just started tossing emails in there. It
Training as root rather than the system user receiving the mail (and
calling SA) is only possible with site-wide Bayes setup. The pasted
configuration doesn't show that, either, so you would need to train as
the mail receiving / scanning user.
Ya, that was what I was worried about. Just to clarify, postfix runs
as the regular "postfix" user. I'm configured very similar to this:
http://www.akadia.com/services/postfix_spamassassin.html
Notice the spamchk script. My process list has this entry:
postfix 10477 12953 0 22:20 ? 00:00:00 pipe -n spamchk -t unix flags=Rq user=spamd argv=/usr/local/bin/spamchk -f ${sender} -- ${recipient}
My spamchk is functionally identical to the one in the link above.
(I'm using the sideline option, rather than just dumping the email, or
sending it to another mailbox). My spamd service runs as the user spamd:
root 6188 1 0 15:56 ? 00:00:08 /usr/bin/spamd -d -m10 -q -x -u spamd -r /var/run/spamd.pid
spamd 6190 6188 0 15:56 ? 00:01:27 spamd child
So when I run spamassassin manually, I'm using sudo to switch to that
user (cat test.mail.left | sudo -u spamd /usr/bin/spamc -u
k...@gnukai.com > test.mail.right)
So if I turn sa-learn back on, I should make sure that I run it as the
spamd user.
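
A minimal sketch of what training as the scanning user could look like. The account name spamd comes from the process list above; the folder paths, the DRY_RUN switch and the helper names learn and bayes_stats are assumptions made for illustration. It only helps if the spamd account can read the mail folders, and it does not replace a site-wide Bayes setup.

```shell
#!/bin/sh
# Sketch: train Bayes as the same user that spamd scans as ("spamd" in this
# thread). Any folder path passed in is a hypothetical placeholder.

SA_USER="${SA_USER:-spamd}"

# With DRY_RUN=1 the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# learn <spam|ham> <directory-or-file>
learn() {
    case "$1" in
        spam|ham) ;;
        *) echo "usage: learn spam|ham PATH" >&2; return 2 ;;
    esac
    run sudo -u "$SA_USER" sa-learn "--$1" "$2"
}

# Show the message and token counters of the database that user sees.
bayes_stats() {
    run sudo -u "$SA_USER" sa-learn --dump magic
}
```

Running it with DRY_RUN=1 first shows the exact commands. Comparing the nspam and nham counters from bayes_stats before and after a training run is a quick way to confirm that the database spamd reads is the one being trained.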
seemed like I had groups of emails around 2, 0, -1, and -2 (my threshold
to dump to my JUNK folder is 3, and I have spamchk sideline things above
7). I still get legitimate email in the 2-3 range, but I haven't had
legitimate email above 3 in a long time. After a bit, the 2s became 3s
and the 0s became 1s, but the -1 and -2 spam emails stayed put. I did
this habitually for more than a month, and the progress seemed to stop.
I googled around a bit and realized that I didn't do a very good job
setting up rules, so I added pyzor and razor2, and they seem functional.
Spam got better, and it's down to maybe 10 a day, but they still range
all the way up to 5.
Mixing in Razor or Pyzor sure can help. But that "setting up rules" you
just considered your job is a bit weird. Local rules of course also can
help, but are (a) an advanced topic, and (b) not the task of a regular
SA instance. You didn't mention any of that in your configuration
either, so it's unclear what you're about here.
I think by "setting up rules" I meant "adding configurations for pyzor
and razor2" and the likes. Are they called plugins?
What really gets me is that if I take an email that scores -2, strip
the X-Spam* headers, and run it through spamc by hand (even as the spamd
user) just like the spamchk script does, it scores around a 4. I have
It is not necessary to strip X-Spam headers. SA ignores these, if
present.
You just mixed in a third user, spamd -- in addition to root and the
real mail receiving user. Without site-wide Bayes you are comparing
apples to oranges, and now peaches. All yummy, though not the same.
What is that "spamchk script" you just mentioned, and how does it fit
into your setup? You should review your entire mail-processing chain.
Describing it in detail might help here, too.
In the link above, it describes my process pretty closely. I deviate
by having a sql.cf:
# cat /etc/mail/spamassassin/sql.cf
user_scores_dsn DBI:mysql:spamassassin:localhost:3306
user_scores_sql_password spampass
user_scores_sql_username spamd
user_scores_sql_custom_query SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_ OR username = '$GLOBAL' OR username = CONCAT('%',_DOMAIN_) ORDER BY username ASC
Here's some of the db:
mysql> select * from userpref where username='$GLOBAL';
+----+----------+----------------+-------+----------+---------------------+----------+---------------------+
| id | username | preference     | value | descript | added               | added_by | modified            |
+----+----------+----------------+-------+----------+---------------------+----------+---------------------+
|  1 | $GLOBAL  | required_score | 4.5   | NULL     | 2003-01-01 00:00:00 |          | 2010-08-23 10:23:26 |
| 28 | $GLOBAL  | auto_learn     | 0     | NULL     | 2014-05-22 16:20:01 |          | 2014-05-22 16:20:01 |
| 29 | $GLOBAL  | use_razor2     | 1     | NULL     | 2014-05-22 16:20:52 |          | 2014-05-22 16:20:52 |
| 30 | $GLOBAL  | use_pyzor      | 1     | NULL     | 2014-05-22 16:20:59 |          | 2014-05-22 16:20:59 |
| 31 | $GLOBAL  | use_dcc        | 1     | NULL     | 2014-05-22 16:21:04 |          | 2014-05-22 16:21:04 |
+----+----------+----------------+-------+----------+---------------------+----------+---------------------+
5 rows in set (0.00 sec)
I know that user_prefs are working because my email has required_score
3.0, which is not 4.5 (from the database) or 5.0 (from local.cf).
one here that scores a 4.1 if it comes through the mail, and a 6.6 if I
run it manually. What can I do to reconcile these scores? I would like
the scores I'm getting from the commandline over the ones I'm getting
through postfix, but I don't know the system well enough to know what is
causing the difference.
Highlighting the differences, removing common rule hits:
================== Via postfix
0.0 HTML_IMAGE_RATIO_08 BODY: HTML has a low ratio of text to image area
================ Via commandline (cat test.mail | sudo -u spamd /usr/bin/spamc -u <myemail> > postsa.mail)
2.5 URIBL_DBL_SPAM Contains an URL listed in the DBL blocklist
The Bayesian probability is ~identical, differing only by a thousandth.
Hitting URIBL_DBL_SPAM in the later manual check, but not at receiving
time, may be due to timing and the URI actually getting listed later.
What's odd is that the subsequent manual check is *missing* the HTML
image ratio rule triggering. Something altered the message.
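
One way to look for what altered the message is to diff the copy that went through postfix against the copy fed to spamc by hand. The sketch below is an illustration, not part of either poster's setup: the function names strip_xspam and compare_messages are made up, and the X-Spam-* headers are dropped only to keep the diff readable (SA itself ignores them, as noted above).

```shell
#!/bin/sh
# Sketch: diff two copies of a message while ignoring X-Spam-* headers.

# Print a message with X-Spam-* headers (and their folded continuation
# lines) removed from the header block; the body is passed through as-is.
strip_xspam() {
    awk '
        in_body          { print; next }               # body: unchanged
        /^$/             { in_body = 1; print; next }  # blank line ends headers
        /^[ \t]/ && skip { next }                      # continuation of a dropped header
        /^X-Spam-/       { skip = 1; next }            # drop SpamAssassin markup
                         { skip = 0; print }           # keep every other header line
    ' "$1"
}

# compare_messages FILE_A FILE_B
# Exit status follows diff: 0 if the stripped copies are identical.
compare_messages() {
    a=$(mktemp) b=$(mktemp)
    strip_xspam "$1" > "$a"
    strip_xspam "$2" > "$b"
    diff -u "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}
```

Extra Received or delivery headers on the delivered copy are expected. Differences in the MIME structure or the HTML part are the ones that could explain a body rule such as HTML_IMAGE_RATIO_08 hitting in one run and not the other.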
Ok, since fixing my config based on the advice below, I haven't had
(time for any) spam delivered to my mailbox. If fixing issues with
lint makes all the spam go away, I'll be going away. Otherwise, I'll
be back tomorrow with more tests.
================ /etc/mail/spamassassin.cf (I added the last 4 lines in
a desperate attempt to see something change, but to no effect)
/etc/mail/spamassassin/local.cf
Which one? The latter spamassassin/local.cf is default (though packager
dependent), the claimed (typo'ed ?) one is custom, if it exists at all.
Snip, skipping to the last four lines:
auto_learn 0
use_razor2
use_dcc
use_pyzor
auto_learn is not a valid option. That would be bayes_auto_learn.
The other use_* options require arguments (0 or 1). The lines as pasted
do not enable them, and instead produce lint warnings. See
spamassassin --lint
That lint check is a good starting point anyway...
--lint is very nice, thanks!
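
For reference, a small sketch of the correction described above, applied to the four pasted lines. The helper name fix_cf_lines is hypothetical. It prints the rewritten file to stdout and leaves the original untouched, so the result can be reviewed before anything is installed.

```shell
#!/bin/sh
# Sketch: rewrite the four trailing local.cf lines into the form the reply
# describes. Output goes to stdout; the input file is not modified.
#   auto_learn 0   -> bayes_auto_learn 0
#   use_razor2     -> use_razor2 1   (likewise use_dcc and use_pyzor)
fix_cf_lines() {
    sed -E \
        -e 's/^auto_learn[[:space:]]+/bayes_auto_learn /' \
        -e 's/^(use_razor2|use_dcc|use_pyzor)[[:space:]]*$/\1 1/' \
        "$1"
}

# Possible use (the /tmp path is an example only):
#   fix_cf_lines /etc/mail/spamassassin/local.cf > /tmp/local.cf.new
#   diff -u /etc/mail/spamassassin/local.cf /tmp/local.cf.new
#   spamassassin --lint    # after installing the reviewed file
```

Lines that already carry an argument, such as use_pyzor 0, are left as they are, so the rewrite only fills in the missing values.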