Re: Bayes auto-learn - not happening

David Jones Thu, 10 Aug 2017 08:44:35 -0700

On 08/10/2017 10:06 AM, techlist06 wrote:

Update:  Still NOT working, but I'm giving it hell trying to figure out why :)


First a couple of answers to others' questions:
- John, others: not an ISP.  "High" is relative, I'm sure, but the volume is much 
higher than I could handle if I had to duplicate and review every flagged message.  Right now 
running at about 10% before I migrate one of my larger domains.  Mail is 
relayed to exchange servers.  Users do not have imap accounts on box.  A few 
local users with POP only.  I don't configure or allow anyone to submit 
messages for training directly.

- re no, or careful auto-training.  I get it.  I'm migrating from a server that's run for 
years with auto-learn on set at conservative learn values.  Never had any trouble with it 
thank goodness.  As I look at the messages that would be autolearned, I've never found 
one that would have learned that should not have in my corpus.  The volume would just be 
too high to personally go through each one of them myself.  I have had 
"problem" users that get a lot of spam misses and I plan to set up a way for 
them to submit their spam to me (not autolearn) for review and manual training as needed.

- Matus:  re:" autolearn=unavailable apparently due to not accessible bayes database 
[due to permissions]".  I hope you are right.  That would make sense to me.  See 
below please.  I think I listed them all.  Config and permissions look good to me, I'm 
grateful to have anything I missed pointed out by an experienced eye.

My old server, running embarrassingly old versions of everything works great.  
So the auto-learn in general has been a good fit for my environment.  I get it 
that it's not for everyone.  But at least it SHOULD work, and let me choose to 
tweak it or turn it off.  As far as I can tell it is not working, at all.

So here's where I am:

1.  I stepped back and went through all my configurations carefully.  
spamassassin is being run via amavisd, as the amavis user.  Site wide config, 
no other users have direct access.  POP accounts and relay accounts only.

2.  From prior research before asking for help, I understood no spam was 
necessary for auto-learn to work but one person here said I had to be at the 
minimum (200 default) before it would.  So, to rule that out as the issue, I 
manually fed it plenty of spam and ham.  For others who might read this thread 
archived, I was having trouble getting enough learned due to the default size 
limit my version of SA/sa-learn had.  With some digging I found out how to 
raise that limit and then I had plenty of spam to feed:
su amavis -c 'sa-learn -D --spam --showdots --max-size=1000000 --mbox /home/mail/spam'

[root@mail2 amavisd]# su amavis -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0        349          0  non-token data: nspam
0.000          0        478          0  non-token data: nham
0.000          0     166030          0  non-token data: ntokens
0.000          0 1501594564          0  non-token data: oldest atime
0.000          0 1502289189          0  non-token data: newest atime
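
To rule out the minimum-count question from that output, a small helper can compare nspam and nham against the threshold.  This is only a minimal sketch: it assumes the 200 default mentioned above applies to both counts (SpamAssassin's bayes_min_spam_num and bayes_min_ham_num settings) and that the dump keeps the column layout shown.  check_bayes_counts is a hypothetical name, not part of sa-learn.

```shell
# Hypothetical helper: reads `sa-learn --dump magic` output on stdin and
# reports whether nspam and nham both reach a minimum (assumed default: 200).
check_bayes_counts() {
  min="${1:-200}"
  awk -v min="$min" '
    $NF == "nspam" { nspam = $3 }   # third column holds the message count
    $NF == "nham"  { nham  = $3 }
    END {
      printf "nspam=%d nham=%d min=%d\n", nspam, nham, min
      if (nspam >= min && nham >= min)
        print "OK: both counts reach the minimum"
      else
        print "LOW: at least one count is below the minimum"
    }'
}
```

Usage would be something like: su amavis -c 'sa-learn --dump magic' | check_bayes_counts
With the counts above (349 spam, 478 ham) it reports OK.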

3.  Next up were questions about the config and permissions.  I checked my 
setup, it looked OK, but I even opened some directories up 777 for testing.
This is my config; I'd be grateful if anyone who sees anything wrong would point it out.
I include the amavis stuff just to show it is running and invoked as and by the 
amavis user.

3a. amavis
in /usr/lib/systemd/system/amavisd.service
User=amavis
Group=amavis
ExecStart=/usr/sbin/amavisd -c /etc/amavisd/amavisd.conf

amavis user's home dir per /etc/passwd is:

/var/spool/amavisd
verified with cd ~amavis

3b. local.cf

My spamassassin local.cf is at:

/etc/mail/spamassassin/local.cf

verified this is the one being used by putting an error
line and restarting amavisd.  It complains about the error.
Fixed of course and continue...

in local.cf I have these related settings:

use_bayes               1
bayes_auto_learn        1
bayes_auto_learn_threshold_nonspam -1.7
bayes_auto_learn_threshold_spam 10.0
bayes_path              /etc/mail/bayes/bayes
bayes_file_mode         0777

3c. bayes

for troubleshooting I set the permissions to 777 on /etc/mail/bayes and its 
files
This is the only occurrence of the "bayes" files on the server

[root@mail2 amavisd]# ls -la /etc/mail/bayes
total 4196
drwxrwxrwx 2 amavis amavis    4096 Aug  9 13:49 .
drwxr-xr-x 4 amavis amavis    4096 Aug  3 13:02 ..
-rwxrwxrwx 1 amavis amavis   86016 Aug  9 09:51 bayes_seen
-rwxrwxrwx 1 amavis amavis 5246976 Aug  9 13:49 bayes_toks
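
The listing shows mode bits only, so a direct access test from the account that actually runs SpamAssassin may be a useful cross-check.  A minimal sketch, assuming it is run from a shell already running as the amavis user; check_rw is a hypothetical helper name.

```shell
# Hypothetical helper: for each path given, report whether the CURRENT user
# can both read and write it. Run it as the amavis user to be meaningful.
check_rw() {
  for f in "$@"; do
    if [ -r "$f" ] && [ -w "$f" ]; then
      echo "ok   $f"
    else
      echo "FAIL $f"
    fi
  done
}
```

For example: check_rw /etc/mail/bayes /etc/mail/bayes/bayes_seen /etc/mail/bayes/bayes_toks
Note that running it as root says little, since root passes these tests on almost any existing file.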

3d. amavis spamassassin folder settings

For amavis, which is calling spamassassin via its
perl libraries (I am not running spamd),
I have its related configuration parts as:

$MYHOME = '/var/spool/amavisd';   # a convenient default for other settings, -H
$TEMPBASE = "$MYHOME/tmp";   # working directory, needs to exist, -T
$ENV{TMPDIR} = $TEMPBASE;    # environment variable TMPDIR, used by SA, etc.
$db_home   = "$MYHOME/db";        # dir for bdb nanny/cache/snmp databases, -D
#$helpers_home = "$MYHOME/var";  # working directory for SpamAssassin, -S
$helpers_home = "$MYHOME";  # working directory for SpamAssassin, -S

3e. spamassassin directory

And for spamassassin, its files are being placed in the amavisd home directory 
as configured in amavisd.conf.
I am careful to only run sa-update or SA debug commands as the amavis user so as 
not to create any other .spamassassin folders under root, etc.
This is the only occurrence of .spamassassin on the server:

[root@mail2 amavisd]# locate .spamassassin
/var/spool/amavisd/.spamassassin
/var/spool/amavisd/.spamassassin/user_prefs

3f. amavis (spamassassin's user) home directory
[root@mail2 amavisd]# ls -la /var/spool/amavisd
total 32
drwxr-x--- 6 amavis amavis 4096 Aug  9 20:49 .
drwxr-xr-x 8 root   root   4096 Nov  5  2016 ..
-rw------- 1 amavis amavis  101 Aug  9 11:17 .bash_history
-rw-r--r-- 1 amavis amavis    0 Aug  9 20:49 black.lst
drwxr-x--- 2 amavis amavis 4096 Aug  9 20:30 db
drwxr-x--- 2 amavis amavis 4096 Apr 19 07:28 quarantine
drwx------ 2 amavis amavis 4096 Aug  8 15:32 .spamassassin
drwxr-x--- 5 amavis amavis 4096 Aug 10 08:26 tmp
-rw-r--r-- 1 amavis amavis   37 Aug  7 19:28 white.lst

3g.  .spamassassin folder
[root@mail2 amavisd]# ls -la /var/spool/amavisd/.spamassassin
total 12
drwx------ 2 amavis amavis 4096 Aug  8 15:32 .
drwxr-x--- 6 amavis amavis 4096 Aug  9 20:49 ..
-rw-r--r-- 1 amavis amavis 1869 Aug  8 15:32 user_prefs


4. Logging
I managed to get Amavisd configured to let the more verbose rule listing for 
the header, and score details in the log come through for my troubleshooting as 
well.

5. Results:

After running this config now, with a loaded bayes database, it has yet to 
auto-learn a single spam (or ham).  Just through yesterday my spam quarantine 
has over 50 pretty high scoring spams in it.  I've studied tflags and now 
understand what they are (for others here's a good link):
http://commons.oreilly.com/wiki/index.php/SpamAssassin/SpamAssassin_Rules

I understand SA requires at least 3 points from the header and 3 points from 
the body, to auto-learn as spam.  I understand some tflags preclude the use of 
the test in the autolearn score.  I understand bayes points don't count.  But 
surely one of the 50 high scores I caught yesterday qualified.  Yet, no 
autolearn.  Always autolearn=unavailable or no.  I've turned on verbose 
debugging for bayes but I don't see any errors or feedback on reasons for the 
no-learn.

Looked at yesterday's log:

cat /var/log/maillog.1|grep autolearn=unavailable|wc -l
60
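
Counting only "unavailable" hides what the other messages got.  A breakdown of every autolearn status in the log might be more telling.  A minimal sketch, assuming the log lines carry an autolearn=<status> field as in the amavis sample in this message; count_autolearn is a hypothetical name and it relies on grep supporting -o.

```shell
# Hypothetical helper: reads a mail log on stdin and prints how many times
# each autolearn=<status> value occurs, most frequent first.
# The pattern does not match autolearn_force= or autolearnscore=.
count_autolearn() {
  grep -o 'autolearn=[a-z]*' | sort | uniq -c | sort -rn
}
```

For example: count_autolearn < /var/log/maillog.1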

Now amavisd has the option of giving a verbose log line with all the score stuff.  
amavis also adds an "autolearn score" to the log.  Not sure how that is 
calculated, but it's interesting anyway.  It would be great if it were h/b/t (header/body/total).  
Anyway, sample:

Aug 10 00:38:39 mail2 amavis[15959]: (15959-08) Blocked SPAM {DiscardedInbound,Quarantined}, 
[89.43.62.101]:47955 [89.43.62.101] ESMTP/LMTP <[email protected]> -> 
<[email protected]>, (ESMTP://[89.43.62.101]:47955), quarantine: [email protected], Queue-ID: 7F64A70, 
mail_id: yxtV5c7b1N8r, b: tDtWV84sR, Hits: 23.553, size: 365419, Subject: "Joanna Gaines Drops 
Bombshell.", From: <[email protected]>, helo=hewis.versateye.com, Tests: 
[BAYES_999=0.2,BAYES_99=3.5,DATE_IN_PAST_03_06=1.592,DCC_CHECK=3.2,DIGEST_MULTIPLE=0.293,HTML_MESSAGE=0.001,HTML_MIME_NO_HTML_TAG=0.377,MIME_HTML_ONLY=0.723,MISSING_MID=0.497,NORMAL_HTTP_TO_IP=0.001,RAZOR2_CF_RANGE_51_100=0.5,RAZOR2_CF_RANGE_E8_51_100=1.886,RAZOR2_CHECK=2.5,RCVD_IN_BRBL_LASTEXT=1.449,RDNS_NONE=0.793,SPF_HELO_PASS=-0.001,SPF_PASS=-0.001,STYLE_GIBBERISH=3.093,URIBL_ABUSE_SURBL=1.25,URIBL_BLACK=1.7],
 autolearn=unavailable autolearn_force=no, autolearnscore=21.113, 5061 ms

As usual, autolearn=unavailable.
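
Since Bayes points don't count toward autolearn, it can help to total only the non-Bayes rules from a log line like the one above.  A minimal sketch, assuming each log entry is on a single line and uses the "Tests: [NAME=score,...]" layout shown; non_bayes_total is a hypothetical name.  It just sums what was logged.  It does not reproduce SpamAssassin's own autolearn calculation or amavis's autolearnscore, and it knows nothing about tflags or the header/body split.

```shell
# Hypothetical helper: for each log line on stdin that has a "Tests: [...]"
# list, print the sum of all rule scores except the BAYES_* rules.
non_bayes_total() {
  awk '
    {
      start = index($0, "Tests: [")
      if (start == 0) next                 # no rule list on this line
      rest = substr($0, start + 8)         # text after the opening bracket
      stop = index(rest, "]")
      if (stop == 0) next
      n = split(substr(rest, 1, stop - 1), tests, ",")
      total = 0
      for (i = 1; i <= n; i++) {
        split(tests[i], kv, "=")           # kv[1] = rule name, kv[2] = score
        if (kv[1] !~ /^BAYES_/) total += kv[2]
      }
      printf "%.3f\n", total
    }'
}
```

For example: grep 'Blocked SPAM' /var/log/maillog.1 | non_bayes_total
For the sample above (unwrapped onto one line) the logged scores other than BAYES_999 and BAYES_99 add up to 19.853.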

My suspicion is many of those "unavailable" should have been a learn.  Surely 
out of 60, one was valid to autolearn.

I don't know what to look for next to troubleshoot.  Sure hoping it's just a 
permissions issue.

I'm back to a brick wall.  How can I help you help me?

You might want to set up an iRedMail server/VM real quick to have something to compare with on amavis configs and permissions. It only takes a few minutes on a fresh OS install.

As I mentioned before, I split a copy of all messages to a hidden mail server running iRedMail on an internal-only domain. I set up rules in RoundCube to sort ham and spam into folders. All I have to do each day is quickly scan subjects and mark them as read to put them into the Maildir "cur" directory that is used for sa-learn. This improved my Bayes scores dramatically and also allows me to help the SpamAssassin masscheck processing.


--
David Jones
