Re: I'm doing it wrong.

Kai Meyer Fri, 23 May 2014 12:49:28 -0700

On 05/22/2014 10:36 PM, Kai Meyer wrote:

On Fri, 23 May 2014 05:33:31 +0200, Karsten Bräckelmann wrote:
On Thu, 2014-05-22 at 20:14 -0600, Kai Meyer wrote:
I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin
(user prefs via mysql) server that I've been running for a few years
The configuration you pasted below does not show any user_* options.
Unless there are more cf files you omitted, you do not use user_prefs
via SQL.
now. It's just a few of my private domains, not a lot of traffic. In the
last 6 months, the amount of spam getting through has gone from one or
two a week to 30 a day. I had sa-learn set up on IMAP folders called SPAM
and HAM running as root, so I just started tossing emails in there. It
Training as root rather than the system user receiving the mail (and
calling SA) is only possible with site-wide Bayes setup. The pasted
configuration doesn't show that, either, so you would need to train as
the mail receiving / scanning user.
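For comparison, a site-wide Bayes setup points every caller at one shared database, so training as root and scanning as spamd would touch the same tokens. A minimal sketch for /etc/mail/spamassassin/local.cf, assuming a hypothetical /var/spamassassin/bayes/ directory that both root and spamd can read and write:

```
# Hypothetical site-wide Bayes sketch; the directory is an assumption.
# bayes_path is a filename prefix: SA creates bayes_toks, bayes_seen, etc.
bayes_path /var/spamassassin/bayes/bayes
# Group read/write, assuming the directory is group-owned by spamd with
# the setgid bit set so files created by root inherit that group.
# Adjust to your local permissions.
bayes_file_mode 0770
```

With something like this in place, sa-learn run as either user should update the same database; "sa-learn --dump magic" prints the message and token counts, which is a quick way to confirm both users see the same data.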
Ya, that was what I was worried about. Just to clarify, postfix runs as the regular "postfix" user. My setup is very similar to this:
http://www.akadia.com/services/postfix_spamassassin.html
Notice the spamchk script. My process list has this entry:
postfix  10477 12953  0 22:20 ?        00:00:00 pipe -n spamchk -t unix flags=Rq user=spamd argv=/usr/local/bin/spamchk -f ${sender} -- ${recipient}
My spamchk is functionally identical to the one in the link above (I'm using the sideline option, rather than just dumping the email or sending it to another mailbox). My spamd service runs as the user spamd:
root      6188     1  0 15:56 ?        00:00:08 /usr/bin/spamd -d -m10 -q -x -u spamd -r /var/run/spamd.pid
spamd     6190  6188  0 15:56 ?        00:01:27 spamd child
So when I run spamassassin manually, I'm using sudo to switch to that user (cat test.mail.left | sudo -u spamd /usr/bin/spamc -u k...@gnukai.com > test.mail.right). So if I turn sa-learn back on, I should make sure that I run it as the spamd user.
seemed like I had groups of emails around 2, 0, -1, and -2 (my threshold to dump to my JUNK folder is 3, and I have spamchk sideline things above
7). I still get legitimate email in the 2-3 range, but I haven't had
legitimate email above 3 in a long time. After a bit, the 2s became 3s
and the 0s became 1s, but the -1 and -2 spam emails stayed put. I did
this habitually for more than a month, and the progress seemed to stop.
I googled around a bit and realized that I didn't do a very good job
setting up rules, so I added pyzor and razor2, and they seem functional.
Spam got better, and it's down to maybe 10 a day, but they still range
all the way up to 5.
Mixing in Razor or Pyzor sure can help. But that "setting up rules" you
just considered your job is a bit weird. Local rules of course also can
help, but are (a) an advanced topic, and (b) not the task of a regular
SA instance. You didn't mention any of that in your configuration
either, so it's unclear what you're about here.
I think by "setting up rules" I meant "adding configurations for pyzor and razor2" and the like. Are they called plugins?
What really gets me is that if I take an email that scores -2, strip
the X-Spam* headers, and run it through spamc by hand (even as the spamd
user) just like the spamchk script does, it scores around a 4. I have
It is not necessary to strip X-Spam headers. SA ignores these, if
present.

You just mixed in a third user, spamd -- in addition to root and the
real mail receiving user. Without site-wide Bayes you are comparing
apples to oranges, and now peaches. All yummy, though not the same.

What is that "spamchk script" you just mentioned, and how does it fit
into your setup? You should review your entire mail-processing chain.
Describing it in detail might help here, too.
The link above describes my process pretty closely. I deviate by having a sql.cf:
# cat /etc/mail/spamassassin/sql.cf
user_scores_dsn DBI:mysql:spamassassin:localhost:3306
user_scores_sql_password         spampass
user_scores_sql_username         spamd
user_scores_sql_custom_query SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_ OR username = '$GLOBAL' OR username = CONCAT('%',_DOMAIN_) ORDER BY username ASC
Here's some of the db:
mysql> select * from userpref where username='$GLOBAL';
+----+----------+----------------+-------+----------+---------------------+----------+---------------------+
| id | username | preference     | value | descript | added               | added_by | modified            |
+----+----------+----------------+-------+----------+---------------------+----------+---------------------+
|  1 | $GLOBAL  | required_score | 4.5   | NULL     | 2003-01-01 00:00:00 |          | 2010-08-23 10:23:26 |
| 28 | $GLOBAL  | auto_learn     | 0     | NULL     | 2014-05-22 16:20:01 |          | 2014-05-22 16:20:01 |
| 29 | $GLOBAL  | use_razor2     | 1     | NULL     | 2014-05-22 16:20:52 |          | 2014-05-22 16:20:52 |
| 30 | $GLOBAL  | use_pyzor      | 1     | NULL     | 2014-05-22 16:20:59 |          | 2014-05-22 16:20:59 |
| 31 | $GLOBAL  | use_dcc        | 1     | NULL     | 2014-05-22 16:21:04 |          | 2014-05-22 16:21:04 |
+----+----------+----------------+-------+----------+---------------------+----------+---------------------+
5 rows in set (0.00 sec)
I know that user_prefs are working because my email has required_score 3.0, which is not 4.5 (from the database) or 5.0 (from local.cf).
one here that scores a 4.1 if it comes through the mail, and a 6.6 if I
run it manually. What can I do to reconcile these scores? I would like
the scores I'm getting from the commandline over the ones I'm getting
through postfix, but I don't know the system well enough to know what is
causing the difference.
Highlighting the differences, removing common rule hits:
================== Via postfix
0.0 HTML_IMAGE_RATIO_08 BODY: HTML has a low ratio of text to image area
================ Via commandline (cat test.mail | sudo -u spamd
/usr/bin/spamc -u <myemail> > postsa.mail)
2.5 URIBL_DBL_SPAM Contains an URL listed in the DBL blocklist
The Bayesian probability is ~identical, differing only in the thousandths.

Hitting URIBL_DBL_SPAM in the later manual check, but not at receiving
time may be due to timing and the URI actually getting listed later.

What's odd is that the subsequent manual check is *missing* the HTML
image ratio rule triggering. Something altered the message.
Ok, since fixing my config based on the advice below, I haven't had (time for any) spam delivered to my mailbox. If fixing issues with lint makes all the spam go away, I'll be going away. Otherwise, I'll be back tomorrow with more tests.
================ /etc/mail/spamassassin.cf (I added the last 4 lines in
a desperate attempt to see something change, but to no effect)
/etc/mail/spamassassin/local.cf
Which one? The latter spamassassin/local.cf is default (though packager
dependent), the claimed (typo'ed ?) one is custom, if it exists at all.

Snip, skipping to the last four lines:
auto_learn 0
use_razor2
use_dcc
use_pyzor
auto_learn is not a valid option. That would be bayes_auto_learn.

The other use_* options require arguments (0 or 1). The lines as pasted
do not enable them, and instead produce lint warnings. See

  spamassassin --lint

That lint check is a good starting point anyway...
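Assuming the intent of those four lines was to turn off Bayes auto-learning and turn on all three network checks, a corrected fragment would look something like this. Note that the DCC check additionally needs its plugin loaded; in many installs the loadplugin line for Mail::SpamAssassin::Plugin::DCC ships commented out in v310.pre.

```
# Sketch of the corrected lines; each use_* option takes 0 or 1.
bayes_auto_learn 0
use_razor2 1
use_dcc 1
use_pyzor 1
```

Rerunning spamassassin --lint afterwards should no longer complain about these lines. The $GLOBAL auto_learn row in the userpref table uses the same invalid option name.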
--lint is very nice, thanks!

So it seems that when I find a problem where the command line is scoring it higher, it's always because of the addition of the URIBL_DBL_SPAM score. This seems like a "normal" issue then, and I can deal with that.

However, I'm getting email that is definitely spam, but it is getting negative scores. Should I seek out further configuration help from this list? Or should I enable site-wide Bayesian learning? It seems like I've received 10-20 spam messages (about 40% of my usual volume that isn't filtered out of my inbox) in the last 12 hours. Is that considered "reasonable" and I just need to deal with it, or what?

I'm happy to provide details, but I'm certain that copy-pasting an example spam email to this mailing list wouldn't produce desirable results. Perhaps I'm looking for a little hand-holding, if anybody has the time. I'd be happy to take this offline, provide HTTP URLs to spam emails, etc.
