Re: Bayes learning for legitimate users

2015-03-15 Thread Matthias Leisi
 
On 14.03.2015 at 16:45, Matus UHLAR - fantomas uh...@fantomas.sk wrote:

> ...but as I mentioned before, training spam from mail to non-existent
> recipients may even be a good thing…

I would not train on mail to arbitrary non-existent recipients; I would restrict
training to a defined set of spamtraps (which may have been non-existent
addresses at some point in the past…).

— Matthias






Re: Bayes learning for legitimate users

2015-03-14 Thread Filip Havlíček
I manage email through ISPConfig; I don't think a wildcard is set for any
domain.







Re: Bayes learning for legitimate users

2015-03-14 Thread Matus UHLAR - fantomas

On 14.03.15 15:00, Filip Havlíček wrote:
> I manage email through ISPConfig, I think wildcard for any domain is
> not set.

It seems you have relay_recipient_maps set; isn't your domain listed there?
Note that Postfix rejects non-existent recipients by default
(http://www.postfix.org/postconf.5.html#smtpd_reject_unlisted_recipient),
and in that case mail to non-existent recipients should never reach a proxy,
filter or milter where it could be learned from.

...but as I mentioned before, training spam from mail to non-existent
recipients may even be a good thing...






--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
WinError #98652: Operation completed successfully.


Re: Bayes learning for legitimate users

2015-03-13 Thread Matus UHLAR - fantomas

Filip Havlíček wrote:
>> I would like to ask you, how can I *allow only legitimate* email
>> addresses (existing users) for bayes learning?

On 13.03.15 14:54, Filip Havlíček wrote:
> there is my configuration:
> /etc/spamassassin/local.cf: http://pastebin.com/PM5jN8wi
> /etc/postfix/main.cf: http://pastebin.com/KWN7Ebyi
> /etc/amavis/conf.d/50-user: http://pastebin.com/ijSaqhuJ

You have virtual domains set up. Did you set up a wildcard in any of them?

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
WinError #9: Out of error messages.


Re: Bayes learning for legitimate users

2015-03-13 Thread Filip Havlíček

Hi,

here is my configuration:
/etc/spamassassin/local.cf: http://pastebin.com/PM5jN8wi
/etc/postfix/main.cf: http://pastebin.com/KWN7Ebyi
/etc/amavis/conf.d/50-user: http://pastebin.com/ijSaqhuJ


So, what should I modify? Thanks




Re: Bayes learning for legitimate users

2015-03-04 Thread John Hardin

On Wed, 4 Mar 2015, Filip Havlíček wrote:

> I would like to ask you, how can I *allow only legitimate* email
> addresses (existing users) for bayes learning?


Reject invalid users at the MTA level during SMTP before the message even 
hits SA.
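
Before changing anything on the MTA, it can help to look at what Postfix is currently configured to do with unknown recipients. The following is only a sketch: `postconf` and the four parameter names are real Postfix items, while `run`, `DRY_RUN` and `show_recipient_checks` are hypothetical helpers invented for this example.

```shell
#!/bin/sh
# DRY_RUN is a made-up switch for this sketch: when set, the command line
# is printed instead of being executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Print the settings that decide whether mail to unknown recipients is
# rejected during the SMTP dialogue. Read-only: nothing is modified.
show_recipient_checks() {
    run postconf smtpd_reject_unlisted_recipient \
        relay_recipient_maps virtual_mailbox_maps virtual_alias_maps
}

# Usage on the mail server:
#   show_recipient_checks
```

If the recipient maps are empty or contain a catch-all entry, Postfix has no way to tell a valid address from an invalid one, and everything will still reach the content filter.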


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Failure to plan ahead on someone else's part does not constitute
  an emergency on my part. -- David W. Barts in a.s.r
---
 4 days until Daylight Saving Time begins in U.S. - Spring Forward

Re: Bayes learning for legitimate users

2015-03-04 Thread Matus UHLAR - fantomas

>> On Wed, 04 Mar 2015 13:35:55 +0100
>> Filip Havlíček wrote:
>> I would like to ask you, how can I *allow only legitimate* email
>> addresses (existing users) for bayes learning?

On 04.03.15 14:37, RW wrote:
> Why send them through SpamAssassin in the first place?

He apparently wants to filter mail for spam but can't reject nonexistent
recipients :-)

However, that would also mean that spam going to random, nonexistent users
will NOT be trained on even if it scores damn high.

> Don't turn off auto-training unless you have a strategy for replacing
> it.

agreed.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Where do you want to go to die? [Microsoft]


Re: Bayes learning for legitimate users

2015-03-04 Thread Reindl Harald



On 04.03.2015 at 19:57, Matus UHLAR - fantomas wrote:

>>> On Wed, 04 Mar 2015 13:35:55 +0100
>>> Filip Havlíček wrote:
>>> I would like to ask you, how can I *allow only legitimate* email
>>> addresses (existing users) for bayes learning?
>>
>> On 04.03.15 14:37, RW wrote:
>> Why send them through SpamAssassin in the first place?
>
> He apparently wants to filter mail for spam but can't reject nonexistent
> recipients :-)

In other words, he is a backscatter source and part of the spam problem:
if you don't know your own valid users, where does your MTA deliver to, and
what happens to mail that is not rejected but not deliverable?

> However, that would also mean that spam going to random, nonexistent users
> will NOT be trained on even if it scores damn high

and luckily the same goes for all the uncaught spam with a damned low score





Re: Bayes learning for legitimate users

2015-03-04 Thread RW
On Wed, 04 Mar 2015 13:35:55 +0100
Filip Havlíček wrote:

> Hi,
>
> I would like to ask you, how can I *allow only legitimate* email
> addresses (existing users) for bayes learning?

Why send them through SpamAssassin in the first place?

> Table bayes_token has grown to 0.5 GB by now, because there are
> thousands of unknown email addresses like:

That table shouldn't grow without limit; you can run
sa-learn --force-expire from cron to prevent this. You may want to
increase bayes_expiry_max_db_size beforehand to keep the size from plummeting.
Alternatively, you can expire directly using SQL based on time.

Some people add a timestamp field to the bayes_seen table so that entries
can be expired from SQL. Alternatively, you can simply empty the table
occasionally; the information is only needed to reverse or forget
training.

Don't turn off auto-training unless you have a strategy for replacing
it.
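
The cron-driven expiry mentioned above could be wrapped roughly as follows. This is a sketch, not a tested deployment: `sa-learn --force-expire` is the command named in the message, while `run`, `DRY_RUN`, `expire_bayes`, the script path and the schedule are assumptions made up for the example.

```shell
#!/bin/sh
# DRY_RUN is a made-up switch for this sketch: when set, the command line
# is printed instead of being executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Force an expiry pass on the Bayes database of the invoking account.
expire_bayes() {
    run sa-learn --force-expire
}

# Possible /etc/cron.d entry (assumptions: the scanner runs as "amavis" and
# a script that calls expire_bayes is saved as /usr/local/sbin/expire-bayes):
#   30 3 * * * amavis /usr/local/sbin/expire-bayes
```

The job has to run as the account whose configuration points at the Bayes store, otherwise it expires a different (possibly empty) database.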


Re: Bayes learning for legitimate users

2015-03-04 Thread Reindl Harald

don't reply offlist!

On 04.03.2015 at 14:13, Filip Havlíček wrote:

> So you recommend setting the parameter *bayes_auto_learn* to *0*? I
> truncated the tables and tried setting bayes_auto_learn 0 in
> /etc/spamassassin/local.cf but it does not work: hundreds of new records
> for unknown email addresses appeared in the tables *bayes_vars*, *bayes_token*
> and *bayes_seen* :-(. Any ideas?

Is /etc/spamassassin/local.cf really the correct path?
If it is, are the permissions correct?

/etc/mail/spamassassin/local.cf is the correct path here

How is your SA called?
Look for user_prefs in ~/.spamassassin/

no idea what bayes_vars is








Re: Bayes learning for legitimate users

2015-03-04 Thread Reindl Harald


On 04.03.2015 at 13:35, Filip Havlíček wrote:

> I would like to ask you, how can I *allow only legitimate* email
> addresses (existing users) for bayes learning?
>
> Table bayes_token has grown to 0.5 GB by now, because there are
> thousands of unknown email addresses like:
> a...@hotmail.com
> ablewi...@hotmail.com
> abl...@hotmail.com

Don't use auto-learning, or at least adjust the score thresholds that are
used for auto-learning. SpamAssassin can't know whether an address exists,
whereas you could use
http://www.postfix.org/ADDRESS_VERIFICATION_README.html at the
MTA level.
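
One way to "adjust the scores which are taken for autolearning" is a small drop-in configuration file. The sketch below only writes such a file; `bayes_auto_learn`, `bayes_auto_learn_threshold_spam` and `bayes_auto_learn_threshold_nonspam` are SpamAssassin options, whereas `write_autolearn_cf`, the file name and the threshold values are assumptions chosen for illustration.

```shell
#!/bin/sh
# write_autolearn_cf FILE SPAM_THRESHOLD NONSPAM_THRESHOLD
#   Write a drop-in .cf file that keeps auto-learning enabled but moves
#   the two thresholds. The values passed in are the caller's choice.
write_autolearn_cf() {
    cat > "$1" <<EOF
# Auto-learn only clear-cut mail (illustrative values)
bayes_auto_learn 1
bayes_auto_learn_threshold_spam $2
bayes_auto_learn_threshold_nonspam $3
EOF
}

# Usage (assumption: the site config lives in /etc/spamassassin):
#   write_autolearn_cf /etc/spamassassin/90-autolearn.cf 15.0 -1.0
#   spamassassin --lint
```

Running `spamassassin --lint` afterwards checks that the new file parses; the scanner (amavisd or spamd) still has to be reloaded to pick it up.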


*But* be careful with sender verification: you need to place a lot of
DNSWL checks in front of it so that you don't get blacklisted yourself.


I guess your main problem is that far too much mail makes it to SA at
all instead of being blocked by RBL scoring and other MTA restrictions long
before. See the example below: everything listed before the Bayes stats
never touched SpamAssassin.

__

Connections:      314179

Postscreen:       171577
Helo:               1435
Subject:             187
Attachment:           29
Header Length:         8
Sender Regex:        263
Sender Blocked:      174
Sender Verify:       301
Sender Invalid:     1622
Sender Spoofed:       10
Sender Parked:        10
PTR Missing:         227
PTR Generic:         447
SPF:                 709
__

BAYES_00     46223   77.63 %
BAYES_05       733    1.23 %
BAYES_20       894    1.50 %
BAYES_40       957    1.60 %
BAYES_50      6463   10.85 %
BAYES_60       641    1.07 %
BAYES_80       472    0.79 %
BAYES_95       344    0.57 %
BAYES_99      2814    4.72 %
BAYES_999     2452    4.11 %





Re: Bayes learning for legitimate users

2015-03-04 Thread Filip Havlíček

Sorry for replying only to you.

How can I find out the right path for the config file local.cf? Maybe the
config is loaded from another path.
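
One way to see which configuration files SpamAssassin actually reads is to run a lint pass with debugging on and filter the output. This is a sketch: `spamassassin -D --lint` is a real invocation, but the exact wording of the debug line ("config: read file ...") is assumed here and may differ between versions, and `list_config_files` is a hypothetical helper.

```shell
#!/bin/sh
# Reduce SpamAssassin debug output (read from stdin) to the paths of the
# configuration files that were read. Assumes debug lines of the form
#   ... dbg: config: read file /some/path.cf
list_config_files() {
    sed -n 's/^.*config: read file //p'
}

# Usage, run as the same account as the scanner (e.g. amavis):
#   spamassassin -D --lint 2>&1 | list_config_files
```

If /etc/spamassassin/local.cf does not show up in that list, edits to it have no effect on scanning.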


I used the MySQL structure from this file:
http://spamassassin.apache.org/full/3.0.x/dist/sql/bayes_mysql.sql


Thanks







Re: Bayes learning differences: v3.3.2 to v3.4.0

2014-11-05 Thread Kevin A. McGrail

On 11/4/2014 6:06 PM, John Woods wrote:

> Everyone,
>
> We're having problems with auto-learning on v3.4.0 that we aren't
> having on v3.3.2. The number of spam e-mails being auto-learned has
> dropped significantly, and the amount of spam being let through (false
> negatives) is higher as well. After looking through the wiki and
> the code, I'm pretty sure this change is related to the rule that says
> you must have 3 "body only" points and 3 "header only" points, which
> are hardcoded values in
> Mail::SpamAssassin::Plugin::AutoLearnThreshold. In 3.3.2, it looks
> like body-points equals the head-points, and in 3.4.0, they are changed.

You are correct.  There were changes and bugs found in the logic that
were resolved in 3.4.0. See
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5503

> I've got a few questions:
>
> 1) How does SpamAssassin derive and sum the body_only and
> head_only points? It doesn't look like the body_only points
> correspond to any scores from individual tests.

There is a test_type flag.  It was sometimes lost in previous parsing of
messages.

> 2) How can we affect the configuration, to increase the number of
> spam e-mails being auto-learned?
> 3) Instead, do we need to completely change our strategy for how
> we're using Bayes?

I will leave Bayes comments to other experts, but in general I believe
you will find that some sort of NON-automated learning will produce
better results.  My concern with auto-learning is that you are just
self-perpetuating any flaws in the current classification, not really
helping to stop new and different spam.  I will likely set off a flamewar
if I continue discussing Bayes.


Perhaps you can buy a six pack for AXB and convince him to add his $0.04 
on Bayes.  He's the resident expert.


regards,
KAM


Re: Bayes learning differences: v3.3.2 to v3.4.0

2014-11-05 Thread RW
On Tue, 04 Nov 2014 17:06:54 -0600
John Woods wrote:

> 1) How does SpamAssassin derive and sum the body_only and
> head_only points? It doesn't look like the body_only points
> correspond to any scores from individual tests.

Scoring uses one of four score sets, chosen according to whether
Bayes and network tests are off or on. Auto-training uses the scoreset
that you would have with Bayes turned-off. Also rules marked
noautolearn, learn and userconf are ignored.
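
The spam side of the decision discussed in this thread can be written down as a small check. This is a sketch under stated assumptions, not the plugin's code: the two minimums of 3 come from the thread, the default of 12.0 for the spam threshold and the use of ">=" are assumptions, and `would_autolearn_spam` is a hypothetical name. The first argument is the auto-learn score described above (computed from the non-Bayes scoreset, ignoring rules flagged noautolearn, learn or userconf), not the score shown in X-Spam-Status.

```shell
#!/bin/sh
# would_autolearn_spam SCORE HEAD_POINTS BODY_POINTS [SPAM_THRESHOLD]
#   Succeeds (exit 0) only if the auto-learn score reaches the threshold
#   AND at least 3 points come from header tests AND at least 3 from
#   body tests. awk is used because sh cannot compare decimals.
would_autolearn_spam() {
    awk -v s="$1" -v h="$2" -v b="$3" -v t="${4:-12.0}" \
        'BEGIN { exit !(s >= t && h >= 3 && b >= 3) }'
}

# Usage:
#   would_autolearn_spam 15.0 3.2 4.0 && echo "would be learned as spam"
#   would_autolearn_spam 15.0 2.9 4.0 || echo "too few header points"
```

A message can therefore score well above the threshold and still not be learned if nearly all of its points come from one side, which matches the drop in auto-learned spam described in the original question.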



Re: Bayes learning differences: v3.3.2 to v3.4.0

2014-11-05 Thread John Woods

Kevin,

I did skim bug 5503 earlier, but didn't understand it at first. 
Knowing the history now, it makes a little more sense, although I'm 
still fuzzy on why the value of 3 for the body and head points is 
important.


It might be nice to have local.cf directives that let admins
adjust $required_body_points and $required_head_points in
AutoLearnThreshold.pm. That way, admins could tune this behavior
to allow more/less auto-learning... (e.g. 1 body point and 2.5 head
points) Thoughts?


As for Bayes strategies (and without starting a flamewar), we just
started implementing an IMAP folder in everyone's mailbox called "Learn
As Spam", which gets processed through sa-learn --spam. It sounds like
we may need to leave auto-learning at SA's defaults, and ask users to
put e-mails in "Learn As Spam" and "Learn As Non-Spam" folders. Perhaps
relying on out-of-the-box auto-learning, and tempering Bayes with
user-based learning, may yield positive results.


Thanks again, Kevin and RW, for your input.

Sincerely,
John





Re: Bayes learning differences: v3.3.2 to v3.4.0

2014-11-05 Thread John Hardin

On Wed, 5 Nov 2014, John Woods wrote:

> As for Bayes strategies (and without starting a flamewar), we just
> started implementing an IMAP folder in everyone's mailbox called "Learn As
> Spam", that gets processed through sa-learn --spam. It sounds like we may
> need to leave auto-learning to SA's defaults, and ask users to put e-mails in
> "Learn As Spam" and "Learn As Non-Spam" folders.


A warning: you should not blindly accept the training data from users, 
apart from a (likely small) group of users whose judgement and 
responsibility you trust. As a general rule, a mail admin or other skilled 
person should vet all user-submitted training data.


Also, the training messages should be retained so that they can be 
correctly retrained if the user misclassified them or reported them 
improperly.


Too many users will use the "learn as spam" folder as a substitute for
unsubscribing from valid newsletters and the like that they voluntarily
subscribed to but are no longer interested in.
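
Retaining the training messages, as advised above, can be as simple as copying them to an archive before they are learned. The following is a sketch: `sa-learn --spam` / `--ham` are real options, while `learn_and_keep`, `run`, `DRY_RUN` and the directory layout are hypothetical. It deliberately does no vetting; that step stays with the admin.

```shell
#!/bin/sh
# DRY_RUN is a made-up switch for this sketch: when set, the command line
# is printed instead of being executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# learn_and_keep KIND SRC_DIR ARCHIVE_DIR
#   KIND is "spam" or "ham". Every message file in SRC_DIR is first copied
#   to ARCHIVE_DIR/KIND, then SRC_DIR is handed to sa-learn.
learn_and_keep() {
    kind=$1 src=$2 archive=$3
    case "$kind" in
        spam|ham) ;;
        *) echo "kind must be spam or ham" >&2; return 2 ;;
    esac
    mkdir -p "$archive/$kind" || return 1
    find "$src" -maxdepth 1 -type f \
        -exec cp -p {} "$archive/$kind/" \; || return 1
    run sa-learn "--$kind" "$src"
}

# Usage (paths are examples only):
#   learn_and_keep spam /srv/review/approved-spam /srv/bayes-archive
```

With the archive in place, a message that turns out to be misclassified can be relearned with the opposite flag instead of being lost.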


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Our government wants to do everything it can for the children,
  except sparing them crushing tax burdens.
---
 6 days until Veterans Day


Re: Bayes learning differences: v3.3.2 to v3.4.0

2014-11-05 Thread Kevin A. McGrail

On 11/5/2014 2:12 PM, John Woods wrote:

> I did skim bug 5503 earlier, but didn't understand it at first.
> Knowing the history now, it makes a little more sense, although I'm
> still fuzzy on why the value of 3 for the body and head points is
> important.

Can't disagree.  I don't know the history either.  I just know that 3 was
the magic number and the code did not work as logically documented.

> It might be nice to have local.cf directives that let admins
> adjust $required_body_points and $required_head_points in
> AutoLearnThreshold.pm. That way, admins could tune this behavior
> to allow more/less auto-learning... (e.g. 1 body point and 2.5 head
> points) Thoughts?

Agreed.  Can you work on a patch to provide this?

> As for Bayes strategies (and without starting a flamewar), we just
> started implementing an IMAP folder in everyone's mailbox called
> "Learn As Spam", that gets processed through sa-learn --spam. It
> sounds like we may need to leave auto-learning to SA's defaults, and
> ask users to put e-mails in "Learn As Spam" and "Learn As Non-Spam"
> folders. Perhaps relying on out-of-the-box auto-learning, and
> tempering Bayes with user-based learning, may yield positive results.

Agreed.  Hand-sorted corpora for spam and ham will lead to the best
Bayes results, and the system you are implementing is the closest
practical method to achieve such a system.


Regards,
KAM


Re: SpamAssassin and Bayes learning

2012-06-05 Thread francwalter
> One crucial thing you didn't post: you ran the learning as root. Is the
> user that spamd is running as also root? The bayes database is
> user-specific, and a common problem is to train the database as a
> different user than the MTA+spamd is running under.

The owner and group of the .spamassassin folder and the files in it are both amavis.
I hope this is the right user.

But I made a big mistake: I recently changed servers, and all the learned
spam was on the old server.
That is why there was so little spam (only 801). I have now copied the old
learned spam and ham files to the new server, and now I have more (6225):

root@example:~# sa-learn --dbpath /var/lib/amavis/.spamassassin/bayes --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       6225          0  non-token data: nspam
0.000          0      52634          0  non-token data: nham
0.000          0    1884302          0  non-token data: ntokens
0.000          0 1279163247          0  non-token data: oldest atime
0.000          0 1338890042          0  non-token data: newest atime
0.000          0 1338889064          0  non-token data: last journal sync atime
0.000          0 1284701742          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0       2438          0  non-token data: last expire reduction count


So maybe this was the real problem.




SpamAssassin and Bayes learning

2012-06-01 Thread francwalter
Hello

I use SpamAssassin 3.3.1 on Ubuntu 12.04 with Postfix 2.9.1-4 and AMaViS 2.6.5

Whenever I get spam, I move it to my Spam folder, where I have collected spam
over the last two years.
Every night I run the script salearn-from-mails to learn from that spam.
Here it is:

#!/bin/bash -e
# Nightly Bayes training from Maildir folders; run as root.

SADIR=/var/lib/amavis/.spamassassin
DBPATH=/var/lib/amavis/.spamassassin/bayes

# Whitespace-separated folder lists (paths must not contain spaces).
SPAMFOLDERS="\
/home/vmail/example.org/franc/.Spam/new \
/home/vmail/example.org/franc/.Spam/cur \
"
HAMFOLDERS="\
/home/vmail/example.org/franc/cur \
"

for spamfolder in $SPAMFOLDERS ; do
    echo "Learning spam from $spamfolder"
    nice sa-learn --spam --showdots --dbpath "$DBPATH" "$spamfolder"
done

for hamfolder in $HAMFOLDERS ; do
    echo "Learning ham from $hamfolder"
    nice sa-learn --ham --showdots --dbpath "$DBPATH" "$hamfolder"
done

# sa-learn ran as root, so give the database files back to amavis.
chown -R amavis:amavis "$SADIR"



When I look at what has been learned, I get these results:

root@example:~# sa-learn --dbpath /var/lib/amavis/.spamassassin/bayes --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        801          0  non-token data: nspam
0.000          0       5585          0  non-token data: nham
0.000          0     127343          0  non-token data: ntokens
0.000          0 1332999307          0  non-token data: oldest atime
0.000          0 1338539336          0  non-token data: newest atime
0.000          0 1338535082          0  non-token data: last journal sync atime
0.000          0 1338524715          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0       1989          0  non-token data: last expire reduction count


in my 

/etc/spamassassin/local.cf

I have:

use_bayes 1
bayes_auto_learn 1
bayes_auto_expire 0
bayes_path /var/lib/amavis/.spamassassin/bayes


But when I send an email with the content and Subject of an old spam mail, it
passes without much of a Bayes score:

...
X-Virus-Scanned: Debian amavisd-new at ew6.org
X-Spam-Flag: NO
X-Spam-Score: 2.49
X-Spam-Level: **
X-Spam-Status: No, score=2.49 required=5 tests=[BAYES_50=0.8,
FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001,
T_RP_MATCHES_RCVD=-0.01, URIBL_DBL_SPAM=1.7] autolearn=no
...

What am I doing wrong?

Thanks in advance,

frank





Re: SpamAssassin and Bayes learning

2012-06-01 Thread RW
On Fri, 1 Jun 2012 10:52:05 +0200
francwal...@gmx.net wrote:

> But when I send an email with the content and Subject of an old
> spam-mail this passes without much bayes-score:
>
> What am I doing wrong?

You are testing a message that's part spam and part non-spam and
expecting BAYES to detect it as spam.

What happens with actual spam?


Re: SpamAssassin and Bayes learning

2012-06-01 Thread Frank Walter
Very little spam ends up in the spam folder, and those mails have a very
low Bayes score (e.g. 0.8).
But there is more spam in the inbox.

I thought that once I put a mail into the spam folder and SA had learned it,
there would be no question that the Bayes score for this mail would be high and
the mail would be detected as spam. But I often get this kind of spam
mail again.

Are the settings I posted all right?






Re: SpamAssassin and Bayes learning

2012-06-01 Thread John Hardin

On Fri, 1 Jun 2012, Frank Walter wrote:

> There is very few spam in the spam folder and then these mails have a very
> small Bayes score (e.g. 0.8).
> But there is more spam in the inbox.
>
> I thought, if I put a mail into the spam folder and after sa learned it, there
> would be no question that the Bayes score for this mail would be high, the mail
> would be detected as spam. But it happens often that I get this kind of spam
> mail again.
>
> Are the settings I posted all right?


One crucial thing you didn't post: you ran the learning as root. Is the 
user that spamd is running as also root? The bayes database is 
user-specific, and a common problem is to train the database as a 
different user than the MTA+spamd is running under.
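
To rule out the wrong-user problem described above, training and inspection can be run under the scanner's own account instead of root. This is a sketch: `sudo -H -u` and the sa-learn options are real, while `learn_as`, `run`, `DRY_RUN` and the account name amavis (taken from the original post) are assumptions about the setup.

```shell
#!/bin/sh
# DRY_RUN is a made-up switch for this sketch: when set, the command line
# is printed instead of being executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# learn_as USER ARGS...
#   Run sa-learn with USER's identity and home directory, so it opens the
#   same Bayes database the scanner uses for that account.
learn_as() {
    user=$1
    shift
    run sudo -H -u "$user" sa-learn "$@"
}

# Usage:
#   learn_as amavis --spam /home/vmail/example.org/franc/.Spam/cur
#   learn_as amavis --dump magic
```

If `--dump magic` run this way shows different nspam/nham counts than the same command run as root, the two accounts are not looking at the same database. The target account also needs read access to the mail folders.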


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The more you believe you can create heaven on earth the more
  likely you are to set up guillotines in the public square to
  hasten the process. -- James Lileks
---
 5 days until the 68th anniversary of D-Day


Re: SpamAssassin and Bayes learning

2012-06-01 Thread Niamh Holding

Hello John,

Friday, June 1, 2012, 3:31:23 PM, you wrote:

JH> One crucial thing you didn't post: you ran the learning as root. Is the
JH> user that spamd is running as also root? The bayes database is
JH> user-specific, and a common problem is to train the database as a
JH> different user than the MTA+spamd is running under.

That is not always a problem: I run sa-learn as root, but it updates the
database for the user spamtest.

-- 
Best regards,
 Niamh    mailto:ni...@fullbore.co.uk


Re: SpamAssassin and Bayes learning

2012-06-01 Thread RW
On Fri, 01 Jun 2012 14:52:45 +0200
Frank Walter wrote:

> There is very few spam in the spam folder and then these mails have a
> very small Bayes score (e.g. 0.8). But there is more spam in the
> inbox.
>
> I thought, if I put a mail into the spam folder and after sa learned
> it, there would be no question that the Bayes score for this mail
> would be high, the mail would be detected as spam.

That's a false assumption. If you learn a spam and retest the exact same
spam it's very likely to hit BAYES_99, but that doesn't mean that
similar spams will be caught. Some types of spam are very resistant
to learning. Most ham is usually learned easily - check that most of it
is hitting BAYES_00.
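
A quick way to check which BAYES_* rules the mail in a folder is hitting is to tally them from the message headers. This is a sketch: `bayes_histogram` is a hypothetical helper, and it assumes each stored message carries the scanner's X-Spam-Status header with the rule names in it. It counts the first BAYES_nn name found on any header line, so an unrelated header that happens to contain such a string would be counted too.

```shell
#!/bin/sh
# bayes_histogram DIR
#   For every file under DIR, look only at the header section (everything
#   before the first empty line) and count the BAYES_nn rule names found.
#   Prints one "RULE COUNT" line per rule.
bayes_histogram() {
    find "$1" -type f -exec awk '
        FNR == 1 { in_header = 1 }   # new file: we are in the headers again
        /^$/     { in_header = 0 }   # first empty line ends the headers
        in_header && match($0, /BAYES_[0-9]+/) {
            print substr($0, RSTART, RLENGTH)
        }
    ' {} + | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (path taken from the original post):
#   bayes_histogram /home/vmail/example.org/franc/cur
```

For an inbox that is mostly ham, the BAYES_00 line should dominate; a large BAYES_50 share suggests the database has too little training data for that mail.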


> But it happens
> often that I get this kind of spam mail again.
>
> Are the settings I posted all right?

If I were you, I'd increase bayes_expiry_max_db_size to 50, which is about
the maximum you can have without the expiry algorithm failing to find a
solution due to its hard-coded 256-day limit. If 801 is the total
spam from two years, that's about one a day; with a 64-day token
retention you are probably not retaining enough spammy tokens.

If a lot of spam isn't being caught, make sure you have network tests
running and that the trusted and/or internal networks are set up properly.


Re: Apply Bayes learning to all users?

2011-12-16 Thread RW
On Fri, 16 Dec 2011 08:54:36 +0100
Benny Pedersen wrote:

> On Fri, 16 Dec 2011 06:30:31 +, Martin Hepworth wrote:
>> Created a shared iMap or similar email account with a spam and ham
>> folder for users to drag email into (not forward as that breaks
>> headers in thing like outlook)
>
> yes, here i found that dovecot-antispam helpfull in the way

I think you've both misread the question. The OP wants to use spamtrap
mail to train the individual user Bayes accounts.


The best way to do this would be to use the global database to adjust
the probabilities for low count tokens in the user database. Nothing
like that is supported.

Doing it via sa-learn sounds like more trouble than it's worth. It's
probably a good thing for high volume accounts, but swamping low
volume accounts may make things worse. 
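
For completeness, the brute-force variant that the original poster wanted to avoid would look roughly like the loop below. It is a sketch only: `learn_spam_for_users`, `run` and `DRY_RUN` are hypothetical, and it assumes every user is a system account with its own file-based Bayes database under its home directory and read access to the spamtrap folder. As noted above, this can swamp low-volume accounts, so the user list should probably be limited to high-volume ones.

```shell
#!/bin/sh
# DRY_RUN is a made-up switch for this sketch: when set, the command line
# is printed instead of being executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# learn_spam_for_users SPAM_DIR USER...
#   Train the same spamtrap folder into each listed user's own database,
#   stopping at the first failure.
learn_spam_for_users() {
    dir=$1
    shift
    for user in "$@"; do
        run sudo -H -u "$user" sa-learn --spam "$dir" || return 1
    done
}

# Usage (names are examples only):
#   learn_spam_for_users /srv/spamtrap/cur alice bob
```

Each pass re-tokenizes the whole folder, which is why this gets expensive quickly as the number of users grows.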


Re: Apply Bayes learning to all users?

2011-12-16 Thread Steve Freitas



Thanks RW, you understood the question correctly. I'll take a look at 
those suggestions.


Steve


Apply Bayes learning to all users?

2011-12-15 Thread Steve Freitas

Hi all,

I have some spamtraps which get lots of spam. After a few precautions, I 
use sa-learn to train a single Bayes profile. This profile is used for 
many of my users. A significant number of other users maintain their own 
Bayes profiles, and I'd like to make this training apply to their 
profiles as well. Is there an efficient way to do this? Repeatedly running 
sa-learn for every user in my system doesn't seem like a good way to go 
about it.


Thanks,

Steve


Re: Apply Bayes learning to all users?

2011-12-15 Thread Martin Hepworth
Create a shared IMAP or similar email account with a spam and a ham folder
for users to drag email into (not forward, as that breaks headers in things
like Outlook).

Then find one of the many Perl scripts lying about the net to grab this
email and sa-learn it into the main Bayes db.

Martin

On Friday, 16 December 2011, Steve Freitas sfl...@ihonk.com wrote:
 Hi all,

 I have some spamtraps which get lots of spam. After a few precautions, I
use sa-learn to train a single Bayes profile. This profile is used for many
of my users. A significant amount of other users maintain their own Bayes
profiles, and I'd like to make this training apply to their profiles as
well. Is there an efficient way to do this? Repeatedly doing sa-learn for
every user in my system doesn't seem like a good way to go about it.

 Thanks,

 Steve



Re: Apply Bayes learning to all users?

2011-12-15 Thread Benny Pedersen

On Fri, 16 Dec 2011 06:30:31 +, Martin Hepworth wrote:

Created a shared iMap or similar email account with a spam and ham
folder for users to drag email into (not forward as that breaks
headers in thing like outlook)


Yes, here I found dovecot-antispam helpful in that users just move spam 
into a spam folder and dovecot-antispam learns it as spam. If mail is 
moved out of that folder it is learned as ham; if users delete it in the 
spam folder, nothing happens.


Neat solution IMHO, since it supports every client via the IMAP protocol. 
No bugs, no problems, even for deprecated clients like Outlook Express :-)






Can bayes learning be turned on and off in one procmailrc

2011-05-04 Thread Harry Putnam
I've been thinking about using bayes in learning mode, but I want to
do it without disturbing my current mail setup.

I thought I might (using procmail) channel a copy of all incoming mail
through spamassassin with bayes learning turned on.


I'd want bayes learning off in the main mail setup.  So I wondered if
one could turn bayes learning on by way of a call to spamassassin in
.procmailrc but have bayes learning turned off in a different call to
spamassassin?

I'm thinking something along this line (but turning bayes learning
off/on as needed):

.procmailrc:

:0 c
{
  :0fw ## How to turn bayes learning on here?
  | /usr/bin/spamc

  :0:
  * ^X-Spam-Status: Yes
  tspama_spam.in

  :0 ## Here I can check results without involving main setup
  post_tspama_spam.in
  
}

  [...] Then after my other recipes the 2nd call

  :0fw ## How to turn bayes learning off here?
  | /usr/bin/spamc


  :0:
  * ^X-Spam-Status: Yes
  spama_spam_.in

[end .procmailrc]

Is there some way at those calls to turn bayes learning off or on?



Re: is bayes learning?

2010-02-19 Thread Matus UHLAR - fantomas
On 18.02.10 09:56, tonjg wrote:
 well this has certainly thrown a spanner in the works and I don't know what
 to do next. I was under the impression that sa was scanning my mail and red
 flagging any spams, then mimedefang would kick in rejecting the email at
 smtp. I'm completely confused now

It's apparent that it's mimedefang that takes care of spam checking, and so
it's mimedefang's business to take care of autolearn. Now, search the mimedefang
FAQ, mailing list, forum or any similar place for autolearning info.

   $ grep add_header 10_default_prefs.cf
 
 # grep add_header 10_default_prefs.cf
 grep: 10_default_prefs.cf: No such file or directory

first try switching to the directory where the rule files are stored!

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Depression is merely anger without enthusiasm. 


Re: is bayes learning?

2010-02-19 Thread tonjg


Jari Fredriksson wrote:
 
 That is not the recipe I meant. That calls SA yes, but does not
 reject. I can't provide a recipe for procmail as I personally use
 maildrop, but the recipe that is needed is one filing the spam to a spam
 folder (or /dev/null).

the golden rule for my server is that spam is not diverted to any folder.
Spam gets rejected at smtp. Is that what you meant by overkill?

-- 
View this message in context: 
http://old.nabble.com/is-bayes-learning--tp27616380p27652339.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: is bayes learning?

2010-02-19 Thread Jari Fredriksson
On 19.2.2010 12:42, tonjg wrote:
 
 
 Jari Fredriksson wrote:

 That is not the recipe I meant. That calls SA yes, but does not
 reject. I can't provide a recipe for procmail as I personally use
 maildrop, but the recipe that is needed is one filing the spam to a spam
 folder (or /dev/null).
 
 the golden rule for my server is that spam is not diverted to any folder.
 Spam gets rejected at smtp. Is that what you meant by overkill?
 

No, if you want it really rejected at smtp time, the solution is OK for it.

My lighter approach would have been a spam folder for spam but it does
not serve your purposes.

-- 
http://www.iki.fi/jarif/

One of the most striking differences between a cat and a lie is that a
cat has
only nine lives.
-- Mark Twain, Pudd'nhead Wilson's Calendar





Re: is bayes learning?

2010-02-18 Thread Matus UHLAR - fantomas
 Matus UHLAR - fantomas wrote:
  you may have autolearn plugin not active. What does X-Spam-Status header
  in your mail say?

On 17.02.10 05:48, tonjg wrote:
 it says:
 X-Spam-Score: 4.463 ()
 BAYES_60,HTML_IMAGE_ONLY_24,HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY
 X-Scanned-By: MIMEDefang 2.67 on 172.16.1.36
 I don't know what BAYES_60 means.

You seem to be running mimedefang, which takes care of the e-mail. I have
no idea how mimedefang interacts with spamassassin, but I think you
should ask your question on the mimedefang mailing list, or at least search the
web for mimedefang and auto-learn.


-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Remember half the people you know are below average. 


Re: is bayes learning?

2010-02-18 Thread tonjg


Matus UHLAR - fantomas wrote:
 you seem to be running mimedefang which takes care about the e-mail. I
 have no idea how does mimedefang interact with spamassassin, but I think
 you should ask your question in mimedefang mailing list, or at least
 search the web for mimedefang and auto-learn.

thanks but I'm only using mimedefang to reject email recognised by
spamassassin, I'm not using md to scan for spam.




Re: is bayes learning?

2010-02-18 Thread Jari Fredriksson
On 18.2.2010 18:16, tonjg wrote:
 
 
 Matus UHLAR - fantomas wrote:
 you seem to be running mimedefang which takes care about the e-mail. I
 have no idea how does mimedefang interact with spamassassin, but I think
 you should ask your question in mimedefang mailing list, or at least
 search the web for mimedefang and auto-learn.
 
 thanks but I'm only using mimedefang to reject email recognised by
 spamassassin, I'm not using md to scan for spam.
 

How does MimeDefang reject anything if it does not scan it? Your
header sample looked like it was scanned by MimeDefang. Probably MD
calls SpamAssassin in its scan process just like amavisd does.

Using a Perl package just to reject spam would be overkill. A
simple procmail recipe would do it without any extra process.

-- 
http://www.iki.fi/jarif/

Unless hours were cups of sack, and minutes capons, and clocks the tongues
of bawds, and dials the signs of leaping houses, and the blessed sun himself
a fair, hot wench in flame-colored taffeta, I see no reason why thou
shouldst
be so superfluous to demand the time of the day.  I wasted time and now doth
time waste me.
-- William Shakespeare





Re: is bayes learning?

2010-02-18 Thread Karsten Bräckelmann
On Thu, 2010-02-18 at 08:16 -0800, an anonymous Nabble user wrote:
 Matus UHLAR wrote:
  you seem to be running mimedefang which takes care about the e-mail. I
  have no idea how does mimedefang interact with spamassassin, but I think
  you should ask your question in mimedefang mailing list, or at least
  search the web for mimedefang and auto-learn.
 
 thanks but I'm only using mimedefang to reject email recognised by
 spamassassin, I'm not using md to scan for spam.

Let's have a look again at the headers you posted before.

  X-Spam-Score: 4.463 () BAYES_60,HTML_IMAGE_ONLY_24,HTML_MESSAGE,
HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY
  X-Scanned-By: MIMEDefang 2.67 on 172.16.1.36

Are these not added by mimedefang? Specifically the first one.

That's not a standard SA header. If it's NOT mimedefang, you changed the
configuration. The default Status header includes auto-learning info
*always*, whether it's enabled or not.

  $ grep add_header 10_default_prefs.cf

Besides, SA does not allow removing the Checker-Version header. Thus, if
the above are all X-Spam headers in your mail, it was not SA adding
them. But some other tool in your mail processing chain.

  guenther


-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: is bayes learning?

2010-02-18 Thread tonjg

well this has certainly thrown a spanner in the works and I don't know what
to do next. I was under the impression that sa was scanning my mail and red
flagging any spams, then mimedefang would kick in rejecting the email at
smtp. I'm completely confused now


  $ grep add_header 10_default_prefs.cf

# grep add_header 10_default_prefs.cf
grep: 10_default_prefs.cf: No such file or directory



Re: is bayes learning?

2010-02-18 Thread tonjg


Jari Fredriksson wrote:
 
 How does MimeDefang reject anything if it does not scan it? Your log
 header sample looked like it was scanned by MimeDefang. Propably MD
 calls SpamAssassin in it's scan process just like amavisd does.
 
 Using a perl package just to reject spam would be an overkill. A
 simple procmail recipe would do it without any extra process.

but md is a mail filter designed to process mail, how is that an overkill?
and where would one find a simple procmail recipe?




Re: is bayes learning?

2010-02-18 Thread Karsten Bräckelmann
On Thu, 2010-02-18 at 09:56 -0800, an anonymous Nabble user wrote:
 well this has certainly thrown a spanner in the works and I don't know what
 to do next. I was under the impression that sa was scanning my mail and red
 flagging any spams, then mimedefang would kick in rejecting the email at
 smtp. I'm completely confused now

Well, yes -- according to the rules in your headers, SA is scanning the
messages. However, SA does not talk SMTP itself, and thus needs some
glue to be integrated. In your case, it is likely mimedefang which calls
SA to scan the message.

IMHO, you should get an overview of the mail flow on your system first.
Further debugging after that.

$ grep add_header 10_default_prefs.cf
 
 # grep add_header 10_default_prefs.cf
 grep: 10_default_prefs.cf: No such file or directory

Ahem. You do understand what that mysterious 'grep' command does, don't
you?

As you can easily deduce from the error message, the second argument is
a file -- and it doesn't exist in the dir where you ran the command...


-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: is bayes learning?

2010-02-18 Thread Chris
On Thu, 2010-02-18 at 09:59 -0800, tonjg wrote:
 
 Jari Fredriksson wrote:
  
  How does MimeDefang reject anything if it does not scan it? Your log
  header sample looked like it was scanned by MimeDefang. Propably MD
  calls SpamAssassin in it's scan process just like amavisd does.
  
  Using a perl package just to reject spam would be an overkill. A
  simple procmail recipe would do it without any extra process.
 
 but md is a mail filter designed to process mail, how is that an overkill?
 and where would one find a simple procmail recipe?
 

This is what I use, may not be the greatest but it's been working for
years:

:0 fw : $ASSASSINLOCK
*  50 
| /usr/local/bin/spamc -f


-- 
KeyID 0xE372A7DA98E6705C





Re: is bayes learning?

2010-02-18 Thread Jari Fredriksson
On 19.2.2010 1:48, Chris wrote:
 On Thu, 2010-02-18 at 09:59 -0800, tonjg wrote:

 Jari Fredriksson wrote:

 How does MimeDefang reject anything if it does not scan it? Your log
 header sample looked like it was scanned by MimeDefang. Propably MD
 calls SpamAssassin in it's scan process just like amavisd does.

 Using a perl package just to reject spam would be an overkill. A
 simple procmail recipe would do it without any extra process.

 but md is a mail filter designed to process mail, how is that an overkill?
 and where would one find a simple procmail recipe?

 
 This is what I use, may not be the greatest but it's been working for
 years:
 
 :0 fw : $ASSASSINLOCK
 *  50 
 | /usr/local/bin/spamc -f
 

That is not the recipe I meant. That calls SA yes, but does not
reject. I can't provide a recipe for procmail as I personally use
maildrop, but the recipe that is needed is one filing the spam to a spam
folder (or /dev/null).


-- 
http://www.iki.fi/jarif/

Your sister swims out to meet troop ships.





Re: is bayes learning?

2010-02-18 Thread Jari Fredriksson
On 19.2.2010 1:48, Chris wrote:
 On Thu, 2010-02-18 at 09:59 -0800, tonjg wrote:

 Jari Fredriksson wrote:

 How does MimeDefang reject anything if it does not scan it? Your log
 header sample looked like it was scanned by MimeDefang. Propably MD
 calls SpamAssassin in it's scan process just like amavisd does.

 Using a perl package just to reject spam would be an overkill. A
 simple procmail recipe would do it without any extra process.

 but md is a mail filter designed to process mail, how is that an overkill?
 and where would one find a simple procmail recipe?

 
 This is what I use, may not be the greatest but it's been working for
 years:
 
 :0 fw : $ASSASSINLOCK
 *  50 
 | /usr/local/bin/spamc -f
 

I wonder about the lock file used in procmail scripts... I do not use one in
my maildrop setup and I see no use for a lock file when using spamc.

-- 
http://www.iki.fi/jarif/

Your sister swims out to meet troop ships.





Re: is bayes learning?

2010-02-18 Thread John Hardin

On Fri, 19 Feb 2010, Jari Fredriksson wrote:


On 19.2.2010 1:48, Chris wrote:

On Thu, 2010-02-18 at 09:59 -0800, tonjg wrote:


Jari Fredriksson wrote:


How does MimeDefang reject anything if it does not scan it? Your log 
header sample looked like it was scanned by MimeDefang. Propably MD 
calls SpamAssassin in it's scan process just like amavisd does.


Using a perl package just to reject spam would be an overkill. A 
simple procmail recipe would do it without any extra process.


but md is a mail filter designed to process mail, how is that an 
overkill? and where would one find a simple procmail recipe?




This is what I use, may not be the greatest but it's been working for 
years:


:0 fw : $ASSASSINLOCK
*  50
| /usr/local/bin/spamc -f


That is not the recipe I meant. That calls SA yes, but does not
reject.


You _can't_ SMTP-time reject if you're using procmail, as procmail is a 
delivery agent. The message has already been accepted by the MTA by the 
time procmail sees it.


Is that what you meant?

I can't provide a recipe for procmail as I personally use maildrop, but 
the recipe that is needed is one filing the spam to a spam folder (or 
/dev/null).


Take a look in http://www.impsec.org/~jhardin/antispam/

--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The first time I saw a bagpipe, I thought the player was torturing
  an octopus. I was amazed they could scream so loudly.
-- cat_herder_5263 on Y! SCOX
---
 4 days until George Washington's 278th Birthday


Re: is bayes learning?

2010-02-17 Thread Arthur Dent
On Tue, 2010-02-16 at 15:22 -0800, tonjg wrote:
 I've got a feeling that the spamassassin on my machine is improving in the
 way it recognises spam but I'd like to be sure it's not just my imagination.
 I did my first manual bayes learn about 2 weeks ago using 200 spams and 200
 hams, the process appeared to go properly. I read that autolearn is enabled
 by default and kicks in after 200 emails learnt, but is there a way to tell
 whether bayes is actually learning?

In addition to what the other respondents to this thread have said
(sa-learn --dump magic) you should also bear in mind the fact that
autolearn only works within set parameters. These are configurable, but
I forget what the default is for the moment.

What this means is, that if the threshold for autolearning spam is set
at 12, spam that is correctly identified as such and scores about 6 - 11
points in SA will not be autolearned. By the same token there is a
maximum threshold for autolearning ham.

I believe this is done for safety to prevent learning FPs and FNs
inappropriately.

What this means is that you must still continue to train bayes manually
with those mails close to the threshold.
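
As a rough sketch of the threshold logic described above: as far as I know the stock defaults are 0.1 for ham (bayes_auto_learn_threshold_nonspam) and 12.0 for spam (bayes_auto_learn_threshold_spam), but check the documentation for your version. The function below is a simplification with hypothetical names; the real plugin computes its own score (leaving out the Bayes rules, among others) and applies extra conditions, and the exact boundary handling here is an assumption.

```python
HAM_THRESHOLD = 0.1    # believed default of bayes_auto_learn_threshold_nonspam
SPAM_THRESHOLD = 12.0  # believed default of bayes_auto_learn_threshold_spam


def autolearn_decision(score, ham_threshold=HAM_THRESHOLD, spam_threshold=SPAM_THRESHOLD):
    """Return 'spam', 'ham' or 'no' for a given autolearn score.

    Simplified model: anything at or above the spam threshold is learned
    as spam, anything below the ham threshold is learned as ham, and the
    range in between is left for manual training.
    """
    if score >= spam_threshold:
        return "spam"
    if score < ham_threshold:
        return "ham"
    return "no"


if __name__ == "__main__":
    for example_score in (-1.0, 8.0, 15.0):
        print(example_score, autolearn_decision(example_score))
```

A message scoring 8 is tagged as spam with the usual required score, yet falls in the "no" band here, which is why the mails close to the threshold still need manual training.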

I have a nightly cron job set up to read all my verified mail from spam
and ham folders and learn as ham or spam respectively. It doesn't matter
if the mail has already been learned - sa-learn will work that out for
itself.
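
A minimal sketch of such a nightly job, assuming one message per file in each folder (for mbox folders sa-learn needs --mbox as well). The folder paths and function names are hypothetical; only the sa-learn --spam and --ham options come from sa-learn itself.

```python
import shutil
import subprocess


def build_learn_commands(spam_dir, ham_dir):
    """Return the two sa-learn invocations for a spam and a ham folder."""
    return [
        ["sa-learn", "--spam", spam_dir],
        ["sa-learn", "--ham", ham_dir],
    ]


def run_nightly(spam_dir, ham_dir):
    """Run both commands; return False without doing anything if sa-learn is missing."""
    if shutil.which("sa-learn") is None:
        return False
    for command in build_learn_commands(spam_dir, ham_dir):
        subprocess.run(command, check=True)
    return True


if __name__ == "__main__":
    # Hypothetical folder locations; adjust to wherever verified mail is kept.
    run_nightly("/home/user/Maildir/.Spam/cur", "/home/user/Maildir/.Ham/cur")
```

Called from cron once a night, this re-reads both folders each time, relying on sa-learn to skip what it has already learned.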

See man sa-learn for the most comprehensive help you will ever find in a
man page!

HTH 






Re: is bayes learning?

2010-02-17 Thread tonjg


Mikael Syska wrote:
 
 [r...@freebsd /]# date -r 1266318121
 Tue Feb 16 12:02:01 CET 2010
 
 newsest atime should tell you when it last learned from a message.

thanks for your response, I ran sa-learn --dump magic:
0.000          0          3          0  non-token data: bayes db version
0.000          0        234          0  non-token data: nspam
0.000          0        280          0  non-token data: nham
0.000          0      28982          0  non-token data: ntokens
0.000          0 1048982400          0  non-token data: oldest atime
0.000          0 1266390928          0  non-token data: newest atime
0.000          0 1266379330          0  non-token data: last journal sync atime
0.000          0 1264788275          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

but I don't get the same results as you. I get:
[r...@home admin]# date -r 1266390928
date: 1266390928: No such file or directory






Re: is bayes learning?

2010-02-17 Thread tonjg


RW-15 wrote:
 
 On Wed, 17 Feb 2010 00:29:38 +0100
 Mikael Syska mik...@syska.dk wrote:
 Watching nham, nspam counts is more meaningful.

my nspam and nham counts look the same as they were two weeks ago without
change, which makes me think that bayes isn't learning...



Re: is bayes learning?

2010-02-17 Thread Arthur Dent
On Wed, 2010-02-17 at 04:16 -0800, tonjg wrote:
 
 Mikael Syska wrote:
  
  [r...@freebsd /]# date -r 1266318121
  Tue Feb 16 12:02:01 CET 2010
  
  newsest atime should tell you when it last learned from a message.
 
 thanks for your response, I ran sa-learn --dump magic:
 0.000          0          3          0  non-token data: bayes db version
 0.000          0        234          0  non-token data: nspam
 0.000          0        280          0  non-token data: nham
 0.000          0      28982          0  non-token data: ntokens
 0.000          0 1048982400          0  non-token data: oldest atime
 0.000          0 1266390928          0  non-token data: newest atime
 0.000          0 1266379330          0  non-token data: last journal sync atime
 0.000          0 1264788275          0  non-token data: last expiry atime
 0.000          0          0          0  non-token data: last expire atime delta
 0.000          0          0          0  non-token data: last expire reduction count
 
 but I don't get the same results as you. I get:
 [r...@home admin]# date -r 1266390928
 date: 1266390928: No such file or directory
 

Try # date -d @1266390928

or go to http://www.epochconverter.com/
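
The difference seen above comes from the date command itself: the BSD date takes epoch seconds after -r, while GNU date treats the -r argument as a file name. A platform-independent alternative is a few lines of Python; the function name is just for illustration.

```python
from datetime import datetime, timezone


def atime_to_utc(epoch_seconds):
    """Convert an atime from 'sa-learn --dump magic' to a UTC timestamp string."""
    moment = datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


if __name__ == "__main__":
    # The newest atime from the dump quoted above.
    print(atime_to_utc(1266390928))
```

The result is always in UTC, so it will differ from a local-time date output by the local offset.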






Re: is bayes learning?

2010-02-17 Thread Matus UHLAR - fantomas
 RW-15 wrote:
  On Wed, 17 Feb 2010 00:29:38 +0100
  Mikael Syska mik...@syska.dk wrote:
  Watching nham, nspam counts is more meaningful.

On 17.02.10 04:18, tonjg wrote:
 my nspam and nham counts look the same as they were two weeks ago without
 change, which makes me think that bayes isn't learning...

you may have autolearn plugin not active. What does X-Spam-Status header
in your mail say?

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I just got lost in thought. It was unfamiliar territory. 


Re: is bayes learning?

2010-02-17 Thread tonjg


Arthur Dent-6 wrote:
 
 Try # date -d @1266390928

ah yes thanks Arthur that worked:
[r...@home admin]# date -d @1266390928
Wed Feb 17 07:15:28 GMT 2010
[r...@home admin]#





Re: is bayes learning?

2010-02-17 Thread tonjg


Matus UHLAR - fantomas wrote:
 you may have autolearn plugin not active. What does X-Spam-Status header
 in your mail say?

it says:
X-Spam-Score: 4.463 ()
BAYES_60,HTML_IMAGE_ONLY_24,HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY
X-Scanned-By: MIMEDefang 2.67 on 172.16.1.36
I don't know what BAYES_60 means.



is bayes learning?

2010-02-16 Thread tonjg

I've got a feeling that the spamassassin on my machine is improving in the
way it recognises spam but I'd like to be sure it's not just my imagination.
I did my first manual bayes learn about 2 weeks ago using 200 spams and 200
hams, the process appeared to go properly. I read that autolearn is enabled
by default and kicks in after 200 emails learnt, but is there a way to tell
whether bayes is actually learning?



Re: is bayes learning?

2010-02-16 Thread Mikael Syska
Hi,

[r...@freebsd ]# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0         22          0  non-token data: nham
0.000          0        793          0  non-token data: ntokens
0.000          0 1266272147          0  non-token data: oldest atime
0.000          0 1266318121          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
[r...@freebsd /]# date -r 1266318121
Tue Feb 16 12:02:01 CET 2010

newest atime should tell you when it last learned from a message.

Yes, my system is new and not yet using bayes ...

mvh

On Wed, Feb 17, 2010 at 12:22 AM, tonjg t...@freeuk.com wrote:

 I've got a feeling that the spamassassin on my machine is improving in the
 way it recognises spam but I'd like to be sure it's not just my imagination.
 I did my first manual bayes learn about 2 weeks ago using 200 spams and 200
 hams, the process appeared to go properly. I read that autolearn is enabled
 by default and kicks in after 200 emails learnt, but is there a way to tell
 whether bayes is actually learning?
 --
 View this message in context: 
 http://old.nabble.com/is-bayes-learning--tp27616380p27616380.html
 Sent from the SpamAssassin - Users mailing list archive at Nabble.com.




Re: is bayes learning?

2010-02-16 Thread RW
On Wed, 17 Feb 2010 00:29:38 +0100
Mikael Syska mik...@syska.dk wrote:


 newsest atime should tell you when it last learned from a message.

Token atimes get updated when you scan a mail.

Watching nham, nspam counts is more meaningful.
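
One way to watch those counts is to save the output of sa-learn --dump magic now and then and compare two snapshots. A small sketch, assuming the usual column layout of that output; the function names are hypothetical.

```python
import re

# prob, a zero column, the value, another zero column, then the label
MAGIC_LINE = re.compile(r"^\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+?)\s*$")


def parse_dump_magic(text):
    """Map each 'non-token data' label to its integer value."""
    values = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


def learned_since(before, after):
    """Return (spam, ham) messages learned between two parsed dumps."""
    return (after["nspam"] - before["nspam"], after["nham"] - before["nham"])
```

If both numbers stay at zero over a couple of weeks of normal mail flow, nothing is being learned, whatever the atimes say.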


Re: is bayes learning?

2010-02-16 Thread Martin Gregorie
On Tue, 2010-02-16 at 15:22 -0800, tonjg wrote:
 I've got a feeling that the spamassassin on my machine is improving in the
 way it recognises spam but I'd like to be sure it's not just my imagination.
 I did my first manual bayes learn about 2 weeks ago using 200 spams and 200
 hams, the process appeared to go properly. I read that autolearn is enabled
 by default and kicks in after 200 emails learnt, but is there a way to tell
 whether bayes is actually learning?

Look at the X-Spam-Status message headers:

X-Spam-Status: No, score=1.2 required=6.0 tests=BAYES_00,HELO_LOCALHOST,
RCVD_IN_BSP_OTHER autolearn=ham version=3.2.5

or scan /var/log/maillog for spamd messages that report the results for
each message:

Feb 13 04:51:07 zoogz spamd[8924]: spamd: result: Y 15 - BAYES_80,
EMPTY_MESSAGE,HELO_LOCALHOST,MG_IMAGEATT,MG_IMAGESUS,MG_JPEG,MG_VIAUKFSN,
MISSING_SUBJECT,RCVD_IN_BL_SPAMCOP_NET,SHORT_HELO_AND_INLINE_IMAGE,TVD_SPACE_RATIO
 scantime=2.0,size=17758,user=getmail,uid=522,required_score=6.0,
rhost=localhost.localdomain,raddr=127.0.0.1,rport=41130,
mid=20100213044404.5698549saliva...@zavodzpr-sa.ba,bayes=0.873808,autolearn=spam

In both places the autolearn clause tells you what, if any, learning was
done from the message. The possible answers are ham, spam or no. The
last applies to messages whose scores fall between the autolearn
thresholds and so were not automatically learned.
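
To pull that clause out of a header or a spamd log line mechanically, something like the following would do. It is only a sketch with a hypothetical function name, and it returns None when no autolearn clause is present, as in headers written by some other tool.

```python
import re


def autolearn_value(header_or_log_line):
    """Return the text after 'autolearn=' or None if the clause is absent."""
    match = re.search(r"\bautolearn=([A-Za-z_]+)", header_or_log_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    header = ("X-Spam-Status: No, score=1.2 required=6.0 "
              "tests=BAYES_00,HELO_LOCALHOST,RCVD_IN_BSP_OTHER "
              "autolearn=ham version=3.2.5")
    print(autolearn_value(header))
```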


Martin




Re: bayes learning '0 messages found'

2010-02-15 Thread smfabac



John Hardin wrote:
 
 On Sat, 13 Feb 2010, smfabac wrote:
 
 Is there a message size limit for sa-learn?
 
 Yes, there is, and sadly sa-learn does not explicitly tell you a message 
 has been skipped because it's too large.
 
 If there's a non-text attachment try deleteing it and re-learning the 
 message.
 
 -- 
   John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
   jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org
   key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
 ---
End users want eye candy and the ooo's and hhh's experience
when reading mail. To them email isn't a tool, but an entertainment
form. -- Steve Lake
 ---
   9 days until George Washington's 278th Birthday
 
 

Ok. It's a size problem:

I edited the notspam message and deleted 1000 lines from line 3000 to
4000, saved the file and then reprocessed notspam.

I continued getting 0 messages examined until I had deleted 3000 lines
of the message:

Message size as received:

$ wc -l notspam 
   6408 notspam  -- sa-learn --ham failed on notspam folder
 with one message  of 6000+ lines
$ 

After deleting 3003 lines:

$ wc -l notspam
   3405 notspam
$ vi notspam

 1  ^A^A^A^A
 2  From smf  Thu Feb 11 01:30:02 2010
 3  From: Boyd Lynn Gerber gerb...@zenez.com
 4  To: distribut...@registry.ca
 5  Subject: Quarterly ASCII posting of SCO UnixWare 7/OpenUNIX
8/OpenServer6 FAQ
 6  Date: Thu, 11 Feb 2010 00:05:18 -0700 (MST)
 7  Message-Id: ou8faqqt_1265871...@news.xmission.com

  3395
  3396   filepriv -f setuid programfile.exe
  3397
  3398  --
  3399  Boyd Gerber gerb...@zenez.com 801 849-0213
  3400  ZENEZ   1042 East Fort Union #135, Midvale Utah  84047
  3401
  3402
  3403  =_4B73B21B.8398EDEC--
  3404
  3405  ^A^A^A^A

$ sa-learn --showdots --ham --mbox notspam
.
Learned tokens from 1 message(s) (1 message(s) examined)
$ 
$ wc notspam
  lines: 3405  words:  18735  characters: 130876 notspam


So, does the documentation on sa-learn indicate that there is 
a size limit on the message to be processed?
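
Until the limit is documented, one workaround is to list the files that are likely to be skipped before running sa-learn. The sketch below assumes one message per file and uses a placeholder limit of 256 KiB; that figure is an assumption, not something established in this thread, so pass in whatever limit applies to your version.

```python
import os

ASSUMED_LIMIT_BYTES = 256 * 1024  # placeholder value, not taken from this thread


def oversized_messages(directory, limit=ASSUMED_LIMIT_BYTES):
    """Return sorted names of regular files in directory larger than limit bytes."""
    too_big = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getsize(path) > limit:
            too_big.append(name)
    return too_big
```

Anything this reports can then be trimmed (for example by removing a large attachment) and learned again.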




Re: bayes learning '0 messages found'

2010-02-15 Thread Kai Schaetzl
Smfabac wrote on Mon, 15 Feb 2010 00:20:06 -0800 (PST):

 So, does the documentation on sa-learn indicate that there is 
 a size limit on the message to be processed?

Why not check yourself?

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





Re: bayes learning '0 messages found'

2010-02-15 Thread smfabac


Kai Schaetzl wrote:
 
 Smfabac wrote on Mon, 15 Feb 2010 00:20:06 -0800 (PST):
 
 So, does the documentation on sa-learn indicate that there is 
 a size limit on the message to be processed?
 
 Why not check yourself?
 
 Kai
 
 -- 
 Get your web at Conactive Internet Services: http://www.conactive.com
 
 
 
 
 

Thanks for your help Kai.

After checking
http://spamassassin.apache.org/full/3.0.x/dist/doc/sa-learn.html

I see that there is no official answer to the question: what is the message
size limit at which sa-learn fails?

The question "So, does the documentation on sa-learn indicate that there is
a size limit on the messages to be processed?" is a veiled request to the SA
developers/maintainers that people may be interested in that information.




Re: bayes learning '0 messages found'

2010-02-15 Thread Kai Schaetzl
Smfabac wrote on Mon, 15 Feb 2010 07:27:19 -0800 (PST):

 The question So, does the documentation on sa-learn indicate that there is
 a 
 size limit on the messages to be processed? is a veiled request to the SA
 developers/maintainers that people may be interested in that information.

If you want to ask for better documentation of this for instance in the man 
file or even an option to override the default size limit you should ask on 
https://issues.apache.org/SpamAssassin/


Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





Re: bayes learning '0 messages found'

2010-02-15 Thread Martin Gregorie
On Mon, 2010-02-15 at 07:27 -0800, smfabac wrote:
 I see that there is no official answer to the question. what is the message
 size limit where sa-learn fails. 
 
If you use something like spamc rather than sa-learn you gain some
flexibility from the places and hosts where you can run spamc, plus you
get the ability to set the max message size yourself. Here's an extreme
example:

for f in spam/*
do
  # size of this message in bytes
  l=$(wc -c "$f" | gawk '{ print $1 }')
  # spamc reads the message from standard input
  spamc --learntype=spam --max-size=$l < "$f"
done

where the limit is set to the size of each spam message in turn.


Martin




Re: bayes learning '0 messages found'

2010-02-13 Thread smfabac


RW-15 wrote:
 
 On Fri, 12 Feb 2010 17:51:12 +
 RW rwmailli...@googlemail.com wrote:
 
 On Fri, 12 Feb 2010 09:17:54 -0800 (PST)
 smfabac smfa...@att.net wrote:
 
  
 
  Mark, 
  
  On UNIX any file is a mbox file if it contains mail messages in the
  form:
  
  ^A^A^A^A
  mail headers
  mail body
  ^A^A^A^A
  ^A^A^A^A
  Next Message mail headers
  mail body
  ^A^A^A^A
 
 I don't know what that is, but it's not a standard mbox format.
 
 In mbox format the emails all start with a blank line and a From.
 
 
 It appears to be mmdf format
 
 http://www.washington.edu/imap/documentation/formats.txt.html
 
 

Ok, 

Now that we're all on the same page. How do I find out why sa-learn
is not processing the legal not-spam file?  To re-cap, sa-learn --spam
--mbox isspam works but sa-learn --ham --mbox not-spam is not
working.  

The sa-learn --dump magic shows that messages have been 
added by the sa-learn command:

$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      12551          0  non-token data: nspam
0.000          0      68020          0  non-token data: nham
0.000          0     143948          0  non-token data: ntokens
0.000          0 1260104403          0  non-token data: oldest atime
0.000          0 1266048014          0  non-token data: newest atime
0.000          0 1266049794          0  non-token data: last journal sync atime
0.000          0 1265630710          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0      19095          0  non-token data: last expire reduction count

$ sa-learn --spam --mbox isspam
Learned tokens from 1 message(s) (1 message(s) examined)
$

$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      12552          0  non-token data: nspam
0.000          0      68020          0  non-token data: nham
0.000          0     144608          0  non-token data: ntokens
0.000          0 1260104403          0  non-token data: oldest atime
0.000          0 1266048014          0  non-token data: newest atime
0.000          0 1266049794          0  non-token data: last journal sync atime
0.000          0 1265630710          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0      19095          0  non-token data: last expire reduction count
$ 

As you can see the nspam has incremented by 1.

$ sa-learn --ham --mbox not-spam
Learned tokens from 0 message(s) (0 message(s) examined)
$ 

Read Create Save Delete Undelete Print Folder Options Quit
Set mail options and preferences
Folder: not-spam                    Saturday February 13, 2010 2:34
-- [1] Message 

  1 gerb...@zenez.co  11 Feb 10 6404  Quarterly ASCII posting of SCO Uni


Is there a message size limit for sa-learn?  The message in not-spam is 
plain ascii, no html.

$ wc -l not-spam
   6408 not-spam  <-- sa-learn --ham failed on not-spam folder with one message
$ 
$ wc -l isspam
   1039 isspam   <-- sa-learn --spam worked on isspam folder with one message
$ 
-- 
View this message in context: 
http://old.nabble.com/bayes-learning-%270-messages-found%27-tp27358517p27573012.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: bayes learning '0 messages found'

2010-02-13 Thread Charles Gregory

On Sat, 13 Feb 2010, smfabac wrote:

Now that we're all on the same page. How do I find out why sa-learn
is not processing the legal not-spam file?  To re-cap, sa-learn --spam
--mbox isspam works but sa-learn --ham --mbox not-spam is not
working.


Well, I would expect if this suggestion were right you would have had all 
sorts of warning messages about syntax, but just in case


Maybe linux is interpreting the dash in the filename as a switch 
indicator? Try enclosing the file name in single quotes or use a filename 
without a dash...


- C




Re: bayes learning '0 messages found'

2010-02-13 Thread smfabac


Charles Gregory wrote:
 
 On Sat, 13 Feb 2010, smfabac wrote:
 Now that we're all on the same page. How do I find out why sa-learn
 is not processing the legal not-spam file?  To re-cap, sa-learn --spam
 --mbox isspam works but sa-learn --ham --mbox not-spam is not
 working.
 
 Well, I would expect if this suggestion were right you would have had all 
 sorts of warning messages about syntax, but just in case
 
 Maybe linux is interpreting the dash in the filename as a switch 
 indicator? Try enclosing the file name in single quotes or use a filename 
 without a dash...
 
 - C
 
 
 
 

$ ls -lt | head -3
total 15868
-rw-------   1 smf  group 249046 Feb 13 02:37 not-spam
-rw-rw-rw-   1 smf  group  94762 Feb 13 02:29 isspam
$ mv not-spam notspam
$ ls -lt | head -3
total 15868
-rw-------   1 smf  group 249046 Feb 13 02:37 notspam
-rw-rw-rw-   1 smf  group  94762 Feb 13 02:29 isspam

$ sa-learn --showdots --ham --mbox notspam

Learned tokens from 0 message(s) (0 message(s) examined)
$

On the off chance that permissions on the file are an issue:

$ chmod 666 notspam
$ ls -lt | head -3
total 15868
-rw-rw-rw-   1 smf  group 249046 Feb 13 02:37 notspam
-rw-rw-rw-   1 smf  group  94762 Feb 13 02:29 isspam

$ sa-learn --showdots --ham --mbox notspam

Learned tokens from 0 message(s) (0 message(s) examined)

Still no luck.

-- 
View this message in context: 
http://old.nabble.com/bayes-learning-%270-messages-found%27-tp27358517p27576922.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: bayes learning '0 messages found'

2010-02-13 Thread Matus UHLAR - fantomas
On 12.02.10 09:17, smfabac wrote:
 On UNIX any file is a mbox file if it contains mail messages in the form:
 
 ^A^A^A^A
 mail headers
 mail body
 ^A^A^A^A
 ^A^A^A^A
 Next Message mail headers
 mail body
 ^A^A^A^A

mmdf, not mbox.

 And my not-spam file meets this requirement:
 
 ^A^A^A^A

sa-learn apparently does not support mmdf. when sa-learn does not recognize
the format of the file, it does not learn from it.

 Also, reading the file with the command mail -f not-spam launches 
 the UNIX mail reader showing that the file is legal mbox file.

your mail command supports mmdf.

save the message to mbox format (saving it to a single file without the ^A's
could work) and try sa-learn from it.
-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.
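
The suggestion above (save the message without the ^A lines) can be sketched in a few lines of Python. This is only an illustration, not something posted in the thread: it assumes the whole file fits in memory, that every message sits between a pair of lines holding exactly four Ctrl-A characters, and it writes LF line endings. The `FALLBACK_FROM` separator is a hypothetical placeholder used only when a message lacks its own "From " line.

```python
MMDF_DELIMITER = b"\x01\x01\x01\x01"  # four Ctrl-A characters on a line of their own

# Hypothetical placeholder, used only when a message has no "From " line.
FALLBACK_FROM = b"From MAILER-DAEMON Thu Jan  1 00:00:00 1970"


def split_mmdf(data):
    """Return the messages in MMDF bytes, each as a list of lines."""
    messages, current, inside = [], [], False
    for line in data.splitlines():
        if line == MMDF_DELIMITER:
            if inside:
                messages.append(current)
                current = []
            inside = not inside
        elif inside:
            current.append(line)
    if inside and current:  # tolerate a missing closing delimiter
        messages.append(current)
    return messages


def mmdf_to_mbox(data):
    """Convert MMDF bytes to mbox bytes."""
    chunks = []
    for lines in split_mmdf(data):
        if not lines:
            continue
        if lines[0].startswith(b"From "):
            head, rest = lines[0], lines[1:]
        else:
            head, rest = FALLBACK_FROM, lines
        # Escape body lines that would otherwise look like a message separator.
        rest = [b">" + ln if ln.startswith(b"From ") else ln for ln in rest]
        while rest and rest[-1] == b"":
            rest.pop()
        # One blank line closes each message, as mbox readers expect.
        chunks.append(b"\n".join([head] + rest) + b"\n\n")
    return b"".join(chunks)
```

To try it, read the folder in binary mode, write the result to a new file (say not-spam.mbox, a name chosen here for illustration), and run sa-learn --ham --mbox on that copy rather than on the original.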


Re: bayes learning '0 messages found'

2010-02-13 Thread John Hardin

On Sat, 13 Feb 2010, smfabac wrote:


Is there a message size limit for sa-learn?


Yes, there is, and sadly sa-learn does not explicitly tell you a message 
has been skipped because it's too large.


If there's a non-text attachment try deleting it and re-learning the 
message.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  End users want eye candy and the ooo's and hhh's experience
  when reading mail. To them email isn't a tool, but an entertainment
  form. -- Steve Lake
---
 9 days until George Washington's 278th Birthday
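
Since sa-learn gives no hint when it skips a message for size, one way to check beforehand is to list the large messages in an mbox file yourself. The following Python sketch uses the standard mailbox module; the function name and the `limit_bytes` parameter are made up for illustration, and the actual limit sa-learn applies is not stated in this thread, so pass whatever value applies to your version.

```python
import email
import mailbox


def oversized_messages(mbox_path, limit_bytes):
    """List (key, size_in_bytes, subject) for messages larger than limit_bytes."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        found = []
        for key in box.keys():
            # Raw message bytes, without the "From " separator line.
            raw = box.get_bytes(key)
            if len(raw) > limit_bytes:
                subject = email.message_from_bytes(raw).get("Subject", "")
                found.append((key, len(raw), subject))
        return found
    finally:
        box.close()
```

An empty list means no message in the file exceeds the limit you passed, so size is probably not the reason for "0 message(s) examined". Note that this only works on a real mbox file; an MMDF folder will not be parsed correctly.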


Re: bayes learning '0 messages found'

2010-02-13 Thread Charles Gregory

On Sat, 13 Feb 2010, smfabac wrote:

$ sa-learn --showdots --ham --mbox notspam
Learned tokens from 0 message(s) (0 message(s) examined)
Still no luck.


Are we sure the notspam file is clean? Try trimming it down to just one or 
two messages, and see how it goes


- C


Re: bayes learning '0 messages found'

2010-02-12 Thread smfabac


tonjg wrote:
 
 raq550 server
 OS: strongbolt2
 spamassassin.i386 0:3.2.5-1.el4
 
 I'm trying to run:
 sa-learn --spam --showdots --dir /path/to...mbox
 but it fails with:
 'Learned tokens from 0 message(s) (0 messages examined)'
 my spam mail is in a file called mbox but when I run the above command to
 the directory containing mbox it always fails with the '0 messages examined'
 error.
 I've also tried copying the mbox file to another location, removing all
 the restrictions on it but I still get '0 messages learned'.
 I know the sa-learn command is working properly because I previously
 pointed it to a wrong location and it picked up 3 tokens but it won't pick
 up anything from the mbox file. I've even tried renaming the (copied) mbox
 file and restarting spamassassin but no joy.
 The mbox file contains about 200 spam mails and is 3.5Mb. Thanks for any
 help.
 

I am having a similar problem to the poster's, but I have successfully run
spamassassin for several years. Today, when I used the sa-learn
command to process the mailbox where I moved the mis-classified
mail message (not-spam), I got:

$ sa-learn --showdots --ham --mbox not-spam

Learned tokens from 0 message(s) (0 message(s) examined)
$

Check the mail folder not-spam:

$ mail -f not-spam
SCO OpenServer Mail Release 5.0.7  Type ? for help.
not-spam: 1 message
   1 gerb...@zenez.co Thu Feb 11 01:30 6405/248986 Quarterly ASCII posting of


And reading the message:

Message  1:
From smf  Thu Feb 11 01:30:02 2010
From: Boyd Lynn Gerber gerb...@zenez.com
To: distribut...@registry.ca
Subject: Quarterly ASCII posting of SCO UnixWare 7/OpenUNIX 8/OpenServer 6 FAQ
Date: Thu, 11 Feb 2010 00:05:18 -0700 (MST)
Message-Id: ou8faqqt_1265871...@news.xmission.com
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on unix.smfabac.com
X-Spam-Level: ***
X-Spam-Status: Yes, score=3.4 required=3.0 tests=HEADER_SPAM
    autolearn=unavailable version=3.2.5
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=--=_4B73B21B.8398EDEC
Status: RO

This is a multi-part message in MIME format.

=_4B73B21B.8398EDEC
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Spam detection software, running on the system unix.smfabac.com, has


And sa-learn --dump magic shows:

$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      12551          0  non-token data: nspam
0.000          0      67987          0  non-token data: nham
0.000          0     143194          0  non-token data: ntokens
0.000          0 1260104403          0  non-token data: oldest atime
0.000          0 1265990403          0  non-token data: newest atime
0.000          0 1265991303          0  non-token data: last journal sync atime
0.000          0 1265630710          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0      19095          0  non-token data: last expire reduction count
$

I have successfully run sa-learn --ham --mbox not-spam in the past so
why is it failing me now?

How do I determine why the message is not being processed by sa-learn?


-- 
View this message in context: 
http://old.nabble.com/bayes-learning-%270-messages-found%27-tp27358517p27566005.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: bayes learning '0 messages found'

2010-02-12 Thread Mark Martinec
tonjg wrote:
 I'm trying to run:
 sa-learn --spam --showdots --dir /path/to...mbox
 but it fails with:
 'Learned tokens from 0 message(s) (0 messages examined)'
 my spam mail is in a file called mbox but when I run the above command to
 the directory containing mbox it always fails with the '0 messages
 examined' error.

If your messages are in a mbox *file*, you need an option --mbox,
not --dir .

smfabac wrote: 
 I am having a similar problem as the  poster but I have successfully run
 spamassassin for several years and today when I used the sa-learn
 command to process the mailbox where I moved the mis-classified
 mail message (not-spam) I get:
 
 $ sa-learn --showdots --ham --mbox not-spam
 
 Learned tokens from 0 message(s) (0 message(s) examined)

 Check the mail folder not-spam:

If not-spam is a folder (not a mbox file), you must not
use the option --mbox.

  Mark



Re: bayes learning '0 messages found'

2010-02-12 Thread smfabac


Mark Martinec wrote:
 
 tonjg wrote:
 I'm trying to run:
 sa-learn --spam --showdots --dir /path/to...mbox
 but it fails with:
 'Learned tokens from 0 message(s) (0 messages examined)'
 my spam mail is in a file called mbox but when I run the above command to
 the directory containing mbox it always fails with the '0 messages
 examined' error.
 
 If your messages are in a mbox *file*, you need an option --mbox,
 not --dir .
 
 smfabac wrote: 
 I am having a similar problem as the  poster but I have successfully run
 spamassassin for several years and today when I used the sa-learn
 command to process the mailbox where I moved the mis-classified
 mail message (not-spam) I get:
 
 $ sa-learn --showdots --ham --mbox not-spam
 
 Learned tokens from 0 message(s) (0 message(s) examined)
 
 Check the mail folder not-spam:
 
 If not-spam is a folder (not a mbox file), you must not
 use the option --mbox.
 
   Mark
 
 
 

Mark, 

On UNIX any file is a mbox file if it contains mail messages in the form:

^A^A^A^A
mail headers
mail body
^A^A^A^A
^A^A^A^A
Next Message mail headers
mail body
^A^A^A^A

And my not-spam file meets this requirement:

^A^A^A^A
From smf  Thu Feb 11 01:30:02 2010
From: Boyd Lynn Gerber gerb...@zenez.com
To: distribut...@registry.ca
...
stuff deleted
...
=_4B73B21B.8398EDEC--

^A^A^A^A

Also, reading the file with the command mail -f not-spam launches 
the UNIX mail reader showing that the file is legal mbox file.
-- 
View this message in context: 
http://old.nabble.com/bayes-learning-%270-messages-found%27-tp27358517p27566692.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: bayes learning '0 messages found'

2010-02-12 Thread RW
On Fri, 12 Feb 2010 09:17:54 -0800 (PST)
smfabac smfa...@att.net wrote:

 

 Mark, 
 
 On UNIX any file is a mbox file if it contains mail messages in the
 form:
 
 ^A^A^A^A
 mail headers
 mail body
 ^A^A^A^A
 ^A^A^A^A
 Next Message mail headers
 mail body
 ^A^A^A^A

I don't know what that is, but it's not a standard mbox format.

In mbox format the emails all start with a blank line and a From.


Re: bayes learning '0 messages found'

2010-02-12 Thread RW
On Fri, 12 Feb 2010 17:51:12 +
RW rwmailli...@googlemail.com wrote:

 On Fri, 12 Feb 2010 09:17:54 -0800 (PST)
 smfabac smfa...@att.net wrote:
 
  
 
  Mark, 
  
  On UNIX any file is a mbox file if it contains mail messages in the
  form:
  
  ^A^A^A^A
  mail headers
  mail body
  ^A^A^A^A
  ^A^A^A^A
  Next Message mail headers
  mail body
  ^A^A^A^A
 
 I don't know what that is, but it's not a standard mbox format.
 
 In mbox format the emails all start with a blank line and a From.


It appears to be mmdf format

http://www.washington.edu/imap/documentation/formats.txt.html
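
The two formats discussed here can be told apart by their first non-blank line: an mbox file starts with a "From " separator line, while an MMDF file starts with a line of four Ctrl-A characters. A minimal Python sketch of that check follows; the function name and return strings are invented for illustration, and it only looks at the very start of the file, so it is a quick sanity check rather than a validator.

```python
def guess_mailbox_format(data):
    """Guess 'mbox', 'mmdf' or 'unknown' from the first non-blank line of data (bytes)."""
    for line in data.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line == b"\x01\x01\x01\x01":
            return "mmdf"
        if line.startswith(b"From "):
            return "mbox"
        return "unknown"
    return "unknown"
```

Reading the first few kilobytes of a folder in binary mode and passing them to this function is enough to see whether the --mbox option is likely to find any messages in it.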


Re: bayes learning '0 messages found'

2010-01-28 Thread Mark Martinec
On Thursday 28 January 2010 17:16:04 tonjg wrote:
 spamassassin.i386 0:3.2.5-1.el4
 
 I'm trying to run:
 sa-learn --spam --showdots --dir /path/to...mbox
 but it fails with:
 'Learned tokens from 0 message(s) (0 messages examined)'
 my spam mail is in a file called mbox but when I run the above command to
 the directory containing mbox it always fails with the '0 messages examined'
 error.

If the argument is a single mbox file, precede it with a --mbox option,
not with --dir .

  Mark


Re: bayes learning '0 messages found'

2010-01-28 Thread tonjg

it's okay - I found the solution at:
http://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html
the command needed --mbox to be included. I added this and the learning
worked.
-- 
View this message in context: 
http://old.nabble.com/bayes-learning-%270-messages-found%27-tp27358517p27358559.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: bayes learning '0 messages found'

2010-01-28 Thread tonjg


Mark Martinec wrote:
 
 If the argument is a single mbox file, precede it with a --mbox option,
 not with --dir .

Thanks for your response, but I've got a further problem now (I think). I'm
trying to do the same thing with the ham command:
# sa-learn --showdots --mbox --ham
but nothing's happening. When I did the spam command it showed a
progression of dots and ended with a confirmation message of tokens found
and 216 emails scanned. But with the ham command nothing is happening:
the cursor just dropped to the next line and it's been there for half an
hour now. Is this normal?
-- 
View this message in context: 
http://old.nabble.com/bayes-learning-%270-messages-found%27-tp27358517p27358771.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: bayes learning '0 messages found'

2010-01-28 Thread Kevin Parris
If what you presented in your message is actually the command you used, then it 
might be looking for some input from the keyboard - you don't illustrate having 
specified the particular file you want it to use following the '--mbox' option, 
you have --ham in that position on the line.  I have not done any testing, so 
I can't say exactly how it would behave in that situation.

 tonjg t...@freeuk.com 01/28/10 2:02 PM 
Mark Martinec wrote:
 
 If the argument is a single mbox file, precede it with a --mbox option,
 not with --dir .

Thanks for your response, but I've got a further problem now (I think). I'm
trying to do the same thing with the ham command:
# sa-learn --showdots --mbox --ham
but nothing's happening. When I did the spam command it showed a
progression of dots and ended with a confirmation message of tokens found
and 216 emails scanned. But with the ham command nothing is happening:
the cursor just dropped to the next line and it's been there for half an
hour now. Is this normal?




Re: Bayes learning trusted networks mailing list email

2009-06-05 Thread RW
On Fri, 05 Jun 2009 10:24:31 -0400
Micah Anderson mi...@riseup.net wrote:

 If I understand things properly, because I've got these
 setup in my trusted_networks, then these previous hops will be
 checked in RBLs, so the spam is more detectable.

That doesn't really help. If you think about it, tests that run on
untrusted headers will run whether or not you put the list servers into
your trusted network. The tests that run on the trusted boundary are
whitelisting rules (plus a few rules that will soon get moved to the
internal boundary). You might get some benefit from putting the list
servers into the internal network, but the chances are that the list is
already blocking on zen, and maybe DUL lists and SPF.

 What I am unsure of is if I am poisoning my bayes by reporting these
 messages that make it through as spam. Should I be just deleting them?
 The tokens that are legitimate that will end up as collateral damage
 are going to be the list footers, the list administration messages,
 and potentially other pieces.
 
 I'm hoping I can identify why my bayes database is so bad (it thinks
 everything is BAYES_00 now), and if this is why I will want to change
 my training behavior.

It's really hard for BAYES to work on in-list spams because they
contain so many strong ham tokens. What I would suggest is to use
a separate address and Bayes database for the lists and train it on all
spam, but only learn ham that doesn't hit BAYES_00. I use sieve to
select some in-list candidates for learning (with dspam rather than SA).

You might also configure BAYES to ignore some of the list headers.

Things like challenge-response messages and out-of-office replies are
best handled with simple filtering or custom SA tests.


Re: Bayes Learning with Analysis Attached

2008-04-29 Thread Theo Van Dinter
On Tue, Apr 29, 2008 at 11:08:22AM -0700, Matt Florido wrote:
 feature.  However, I'm wondering if this impacts sa-learn?  Can I simply
 run sa-learn on mails that have the analysis attached?  I also noticed

Yes.   sa-learn removes markup before doing the processing.

 I'm not seeing Bayes participating in the scoring.  Is this because it's
 new and my Bayes db hasn't been fully trained?

Yes.  You need 200 each ham and spam.

 Also, is adding additional rulesets from rulesemporium.com still
 necessary for added value?  And if so, do I just add them to my
 /etc/spamassassin directory?

First, use sa-update and get the SA updated rules.  Then, if you wanted
to add in third party rulesets, you could also look at using sa-update for
that.  There's docs on the wiki or you can just search the list archives.

-- 
Randomly Selected Tagline:
Phenomenal Cosmic Powers, Itty Little Living Space.   - Aladdin




Re: Bayes Learning with Analysis Attached

2008-04-29 Thread Bob Proulx
Theo Van Dinter wrote:
 Matt Florido wrote:
  I'm not seeing Bayes participating in the scoring.  Is this because it's
  new and my Bayes db hasn't been fully trained?
 
 Yes.  You need 200 each ham and spam.

You can use sa-learn to dump the database stats and see how many of
each have been learned and other information.

  sa-learn --dump magic

Bob
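
If you want to check those counters from a script, the output of sa-learn --dump magic is easy to parse. The Python sketch below is an illustration only: the function names are invented, the line layout is taken from the dumps posted on this list, and the default of 200 is simply the figure quoted above, exposed as a parameter in case your configuration differs.

```python
import re

# Matches lines such as "0.000  0  12551  0  non-token data: nspam".
MAGIC_LINE = re.compile(r"^\s*[\d.]+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+?)\s*$")


def parse_dump_magic(text):
    """Map each 'non-token data' name to its integer value."""
    counters = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def has_enough_training(counters, minimum=200):
    """True when both the spam and ham counts reach the minimum."""
    return (counters.get("nspam", 0) >= minimum
            and counters.get("nham", 0) >= minimum)
```

Feed it the captured text of the command; lines that do not match the pattern are ignored.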


Re: Bayes Learning with Analysis Attached

2008-04-29 Thread Jari Fredriksson
 Theo Van Dinter wrote:
 Matt Florido wrote:
 I'm not seeing Bayes participating in the scoring.  Is
 this because it's new and my Bayes db hasn't been fully
 trained? 
 
 Yes.  You need 200 each ham and spam.
 
 You can use sa-learn to dump the database stats and see
 how many of each have been learned and other information.
 
  sa-learn --dump magic
 
 Bob

I wonder why it is called magic. 

dump statistics would be much better. Dumping numbers from database is not 
rocket science, nor magic...




Re: Bayes Learning with Analysis Attached

2008-04-29 Thread Theo Van Dinter
On Wed, Apr 30, 2008 at 03:23:38AM +0300, Jari Fredriksson wrote:
 I wonder why it is called magic. 

Because the data that is being dumped is from the metadata in the DB, which we
store using magic tokens, since they're tokens that can't possibly exist in
the DB through normal means.

-- 
Randomly Selected Tagline:
This is not a novel to be tossed aside lightly.  It should be thrown with
 great force.  - Dorothy Parker




Re: spamc/spamd bayes learning question

2007-03-26 Thread Magnus Holmgren
On Saturday 24 March 2007 23:04, Marc Perkel wrote:
 The learn-spam script looks like this:

 /usr/bin/spamc -d euclid.ctyme.com -x -t 15 -L spam > /dev/null 2> /dev/null
 /bin/echo > /dev/null

 The echo command is just there so it returns a 0 and exim doesn't
 complain. Probably a better way to do that. 

It's common to put "|| true" at the end of a command whose exit status you
don't care about. Or you could just "exit 0".

-- 
Magnus Holmgren[EMAIL PROTECTED]
   (No Cc of list mail needed, thanks)

  Exim is better at being younger, whereas sendmail is better for 
   Scrabble (50 point bonus for clearing your rack) -- Dave Evans




Re: spamc/spamd bayes learning question

2007-03-25 Thread Matt Kettler
Marc Perkel wrote:
 Trying to set up spamc/spamd learning. Have a dedicated spamd server
 that is fed from several MTA machines running exim. On the exim side
 I'm piping messages into spamc as follows:

 unseen pipe /etc/exim/scripts/learn-spam

 The learn-spam script looks like this:

 /usr/bin/spamc -d euclid.ctyme.com -x -t 15 -L spam > /dev/null 2> /dev/null
 /bin/echo > /dev/null

 The echo command is just there so it returns a 0 and exim doesn't
 complain. Probably a better way to do that. But - over on the spamd
 server side I'm getting:

 Mar 24 15:01:30 euclid spamd[2870]: spamd: Tell: Setting local for
 mail:11 in 0.1 seconds, 1512 bytes
 Mar 24 15:01:30 euclid spamd[5417]: spamd: Tell: Did nothing for
 mail:11 in 0.1 seconds, 13139 bytes

 The Did Nothing doesn't look good. I'm doing something wrong? is it
 the user? Should I try to force it to be root? Is it permissions?
 Trying to feed mysql bayes.

I'm certainly no expert at this config, but did you start spamd with the
-l (aka --allow-tell) option?





spamc/spamd bayes learning question

2007-03-24 Thread Marc Perkel
Trying to set up spamc/spamd learning. Have a dedicated spamd server 
that is fed from several MTA machines running exim. On the exim side I'm 
piping messages into spamc as follows:


unseen pipe /etc/exim/scripts/learn-spam

The learn-spam script looks like this:

/usr/bin/spamc -d euclid.ctyme.com -x -t 15 -L spam > /dev/null 2> /dev/null
/bin/echo > /dev/null

The echo command is just there so it returns a 0 and exim doesn't 
complain. Probably a better way to do that. But - over on the spamd 
server side I'm getting:


Mar 24 15:01:30 euclid spamd[2870]: spamd: Tell: Setting local for 
mail:11 in 0.1 seconds, 1512 bytes
Mar 24 15:01:30 euclid spamd[5417]: spamd: Tell: Did nothing for mail:11 
in 0.1 seconds, 13139 bytes


The "Did nothing" doesn't look good. Am I doing something wrong? Is it 
the user? Should I try to force it to be root? Is it permissions? I'm trying 
to feed a MySQL Bayes database.


Thanks in advance.



Re: Bayes learning email address

2006-04-16 Thread Andrew

John D. Hardin wrote:

On Sat, 15 Apr 2006, mouss wrote:



- you are trusting your users to make the right decision. The
problem is that different people have different opinions of what
is spam and what is not. Things get even worse if one user isn't
honest...



That's a problem with *any* scheme for allowing the users to train
Bayes themselves.

In practice, however, I think you'll see much more apathy than
stupidity or malice. My problem was with getting my users to even
*look at* their marginal-spams folder and classify the messages. Ever.

You should check for things like your own quota notification messages in 
the spam folder. If you send a boilerplate email in response to someone 
sending an email to your abuse or postmaster address, check for that 
too. I used to work for a fairly large ISP and we got these sorts of 
things sent to us all the time.


Andrew



Re: Bayes learning email address

2006-04-15 Thread John D. Hardin
On Sat, 15 Apr 2006, mouss wrote:

 - you are trusting your users to make the right decision. The
 problem is that different people have different opinions of what
 is spam and what is not. Things get even worse if one user isn't
 honest...

That's a problem with *any* scheme for allowing the users to train
Bayes themselves.

In practice, however, I think you'll see much more apathy than
stupidity or malice. My problem was with getting my users to even
*look at* their marginal-spams folder and classify the messages. Ever.

--
 John Hardin KA7OHZICQ#15735746http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]FALaholic #11174pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 Senator, when you took your oath of office, you placed your hand on
 the Bible and swore to uphold the Constitution. You didn't place your
 hand on the Constitution and swear to uphold the Bible.
-- Jamie Raskin, Professor of Law at American
University, testifying before the Maryland Senate
---



Re: Bayes learning email address

2006-04-14 Thread mouss

Owen Mehegan wrote:
To make it easier for my users to train my server's Bayes database, I 
set up a user with the following procmail recipe in its .procmailrc:


:0
* < 256000
{
   :0c: spamassassin.spamlock
   | sa-learn --spam

   :0: spamassassin.filelock
   spam
}

The idea is for people to redirect (not forward) uncaught spam to that 
address and have it added to our Bayes system. I suppose I could also 
--report those messages to the various reporting systems. Will this 
work, or are there pitfalls I haven't thought of?


- you are trusting your users to make the right decision. The problem is 
that different people have different opinions of what is spam and what 
is not. Things get even worse if one user isn't honest...


- you must protect this address from getting mail from untrusted sources 
(from outside for example). otherwise, anyone can pollute your bayes.


- how about reporting false positives?





Problem with Bayes learning

2006-02-28 Thread Jonathan Nie

Greetings!

I have a problem when I try to feed Bayes a large number of emails
(over 1500). It just hangs there, and I get the following error
message in the maillog file:


.bayes: cannot open bayes databases /spamassassin/bayes_* R/W: lock failed: File exists


Does anyone know how to fix it?

Thanks.

Jonathan


Re: Problem with Bayes learning

2006-02-28 Thread Tyler Nally
On Tuesday 28 February 2006 05:06 pm, Jonathan Nie wrote:
 Greetings!

 I got a problem when I try to feed Bayes with large number of emails
 (over 1500). It just hang there and I got the the following error
 messages from maillog file:

 .bayes: cannot open bayes databases /spamassassin/bayes_* R/W: lock
 failed: File exists

 Does anyone know how to fix it?

The bayes section of my spamassassin setup in local.cf looks like
this:

#--
bayes_path /etc/mail/spamassassin/bayes/bayes
bayes_file_mode 0777

use_bayes 1
#bayes_use_hapaxes 1

# Enable Bayes auto-learning
bayes_auto_learn                    1

bayes_auto_learn_threshold_nonspam  0.1
bayes_auto_learn_threshold_spam     9.0
#--

Which for me.. would mean that I'd cd to:

   /etc/mail/spamassassin/bayes

.. and do a (as root):

   chmod 666 bayes*

... to allow any process with access to those
bayes files an opportunity to open and
read/write them.

Nobody else (that can login to the system) has access 
to the files on that file system so it should be safe 
(at least for me) to perform this and it not be a 
breach of security of some kind.

I'm pretty sure that the owner/group of the bayes
files are spamd so that it can access the files
as it needs.. and when I run sa-learn to harvest 
other tokens, I run sa-learn as the same user as
well.

-- 
Tyler Nally
[EMAIL PROTECTED]


Re: Problem with Bayes learning

2006-02-28 Thread Matt Kettler
Tyler Nally wrote:
 On Tuesday 28 February 2006 05:06 pm, Jonathan Nie wrote:
 Greetings!

 I got a problem when I try to feed Bayes with large number of emails
 (over 1500). It just hang there and I got the the following error
 messages from maillog file:

 .bayes: cannot open bayes databases /spamassassin/bayes_* R/W: lock
 failed: File exists

 Does anyone know how to fix it?
 
 The bayes section of my spamassassin setup in local.cf looks like
 this:
 
 #--
 bayes_path /etc/mail/spamassassin/bayes/bayes
 bayes_file_mode 0777

snip
 
 Which for me.. would mean that I'd cd to:
 
/etc/mail/spamassassin/bayes
 
 .. and do a (as root):
 
chmod 666 bayes*
 
 ... to allow anyprocess with access to those
 bayes files an opportunity to either open and
 read/write to it.
 


OUCH! Bad advice. DO NOT DO THIS


Note that SA is not complaining it cannot access the files. It is complaining
that the bayes database is already locked.

This means that SA believes another process is ACTIVELY writing to the bayes
database. If you wind up doing a chmod 666 on your bayes lock file (which the
above command WILL do), and then invoke sa-learn while another process is
accessing the database, you will corrupt your ENTIRE bayes database beyond 
recovery.









Re: Problem with Bayes learning

2006-02-28 Thread Matt Kettler
Jonathan Nie wrote:
 Greetings!
 
 I got a problem when I try to feed Bayes with large number of emails
 (over 1500). It just hang there and I got the the following error
 messages from maillog file:
 
 .bayes: cannot open bayes databases /spamassassin/bayes_* R/W: lock
 failed: File exists
 
 Does anyone know how to fix it?

SA believes another process is currently writing to the bayes database. This
would be quite normal if a bayes expiry run was going on at the time.

Wait a while and see if it still happens.

If it still fails, shutdown ALL spamassassin operations, and try again.

If it *still* fails, manually delete the bayes lock file. (it will be in your
bayes directory. I think it's called bayes.mutex)
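
Before deleting anything, it may help to look at how old the lock files actually are. The following is only a rough sketch in Python, not anything SpamAssassin ships: the function name, the one-hour threshold and the name patterns (".mutex", ".lock", taken from the file names mentioned in this thread) are assumptions. It only reports; it never removes a file.

```python
import os
import time


def find_lock_files(bayes_dir, stale_after_seconds=3600, now=None):
    """Return (filename, age_seconds, is_stale) for each suspected lock file.

    The name patterns below are assumptions based on the names mentioned
    in this thread (bayes.mutex, bayes.lock); adjust them for your setup.
    """
    now = time.time() if now is None else now
    results = []
    for name in sorted(os.listdir(bayes_dir)):
        if name.endswith(".mutex") or ".lock" in name:
            age = now - os.path.getmtime(os.path.join(bayes_dir, name))
            results.append((name, age, age > stale_after_seconds))
    return results


if __name__ == "__main__":
    # Hypothetical directory; use the directory part of your own bayes_path.
    for name, age, stale in find_lock_files("/etc/mail/spamassassin/bayes"):
        print("%s: %.0f seconds old%s" % (name, age, " (looks stale)" if stale else ""))
```

An old timestamp alone does not prove the lock is abandoned, so this is meant as a first look before stopping SpamAssassin and removing anything by hand.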





Re: Problem with Bayes learning

2006-02-28 Thread Jonathan Nie
Hi Matt,

I am new to spamassassin. Thank you so much for your help and Tyler too.

Bayes autolearn is enabled when I feed Bayes with the 1500 emails manually
using the sa-learn command. Does it cause the problem?

I also checked the Bayes database directory and found two stale lock files
(bayes.lock). One is pretty old, almost 4 months, and the other was
created while I was feeding Bayes this time. Could I delete them?

Thanks again.

Jonathan









Re: Problem with Bayes learning

2006-02-28 Thread Tyler Nally
On Tuesday 28 February 2006 10:46 pm, you wrote:

 I am new to spamassassin. Thank you so much for your help and Tyler too.

Thanks.. I'm not the expert.. I just use it!

 Bayes autolearn is enabled when I feed Bayes with the 1500 emails manually
 using the sa-learn command. Does it cause the problem?

I think that sa-learn... probably creates a lock file.  Assuming that 
sa-learn exits normally, I would think that it'd remove the lock file
when it's done.  I assume that it works this way because when you're
sa-learn-ing .. the auto-learn feature is unavailable for spamd to
record the bayes tokens (I think) because it can't get a lock on the
bayes structures to record them.  Once sa-learn halts and removes the
lock.. auto-learn should be available.

 I also checked the Bayes database directory and found two stale lock files
 (bayes.lock). One is pretty old, almost 4 months, and the other was
 created while I was feeding Bayes this time. Could I delete them?

I'd say.. that you can toast the 4 month old one rather easily...
Watch for when sa-learn finishes.. and you should see the newer lock
file go away after its completion.  If it doesn't... then remove
that one as well.

I don't know whether, in the normal operation of spamassassin, an auto-learn
*write* to the bayes structures puts a lockfile on them.
I've never explicitly watched the directory where
bayes lives to see if a lock file appears quickly and disappears just
as fast when it's done.

I do know.. that if I invoke *sa-learn*.. a lockfile will exist
while it's sa-learn'ing.. and then go away afterwards.  While it's
sa-learn'ing, I see the Spamassassin header tags show that autolearn
is unavailable during this time because it knows it can't open up
the bayes structures to write the tokens to them.


-- 
Tyler Nally
[EMAIL PROTECTED]


Re: Problem with Bayes learning

2006-02-28 Thread jdow

Stop receiving emails.
Stop the SpamAssassin service once the incoming mail spool is empty.
Then kill all vestiges of spamd or spamassassin that might still be
running from previously improperly terminated sessions.
Then run sa-learn.
If it STILL hangs with this lock you've got a problem somewhere fer shure.
Once sa-learn is run then restart spamassassin and restart your email
reception process.

Do NOT kill lockfiles while SpamAssassin is running. That invites
database corruption.
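
One way to check the "nothing is still running" condition before touching a lockfile is to scan the process list. This is a rough Python sketch under stated assumptions: the helper names are made up for illustration, the process names come from the tools mentioned in this thread, and the substring match is deliberately crude (it would also catch an editor that has a file under a spamassassin directory open).

```python
import subprocess

# Names to look for in the command line; taken from the tools in this thread.
SA_PROCESS_NAMES = ("spamd", "spamassassin", "sa-learn")


def sa_processes(ps_output):
    """Parse `ps -eo pid,args` text; return (pid, command line) for SA processes."""
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        pid, args = parts
        if any(name in args for name in SA_PROCESS_NAMES):
            found.append((int(pid), args))
    return found


def list_running_sa_processes():
    """Run ps and return the SpamAssassin-related processes it reports."""
    out = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    ).stdout
    return sa_processes(out)
```

If list_running_sa_processes() returns an empty list, no process matching those names was visible at that moment; that is a hint, not a guarantee, so stopping mail delivery first still applies.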

Is it possible the 1500 messages all at once triggers a potential
Bayes database expiration about half way through the pass and that
is what is getting it hung up? I'll leave it to the authors to
address that potential. It seems unlikely.

{^_^}






Per-User - Bayes Learning

2006-01-01 Thread Duane Hill
Hello All,

I  have  e-mail  accounts  that  have  been sending Spam to a specific
e-mail  address  as  an attachment for some time now. Before they were
manually  gone  through as I didn't have anything specific set up on a
per-account basis.

Now  that  I have SA on our Win2K server storing everything in a MySQL
schema,  I  would  like  to automate the process more. I have a script
that  I  wrote  that  will take and strip out any attached message and
uses  sa-learn.  However,  sa-learn  seems  to  be  time consuming (at
minimum,  9 seconds per attached message submitted). Is there anything
that can be done to speed up the process?

--

This message is made of 100% recycled electrons.



Re: Per-User - Bayes Learning

2006-01-01 Thread Matt Kettler

At 10:00 PM 1/1/2006, Duane Hill wrote:

Hello All,

I  have  e-mail  accounts  that  have  been sending Spam to a specific
e-mail  address  as  an attachment for some time now. Before they were
manually  gone  through as I didn't have anything specific set up on a
per-account basis.

Now  that  I have SA on our Win2K server storing everything in a MySQL
schema,  I  would  like  to automate the process more. I have a script
that  I  wrote  that  will take and strip out any attached message and
uses  sa-learn.  However,  sa-learn  seems  to  be  time consuming (at
minimum,  9 seconds per attached message submitted). Is there anything
that can be done to speed up the process?



Are you using the mysql.pm bayes store module, or the default generic one?

If you're using the generic sql.pm, I'd suggest switching. The learning 
time is cut by more than half.

http://wiki.apache.org/spamassassin/BayesBenchmarkResults
(1a and 1b are learning).

Also, if you're using SA 3.1.0 or later you can learn using spamc -L, which will 
take advantage of spamd instead of spawning a whole new perl instance. Very 
useful if you do a lot of learning, but I'll warn you this is a newish 
feature and it might have some growing pains (I've not used it).


http://spamassassin.apache.org/full/3.1.x/dist/doc/spamc.html
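
For a script that already extracts the attached messages, the spamc -L route could look roughly like the Python sketch below. The helper names are hypothetical; the only spamc behaviour relied on is the -L option with the learn types spam, ham and forget, and the message being read from standard input. As I understand it, spamd also has to be started with learning allowed (--allow-tell), so treat that as an assumption to verify against your version's documentation.

```python
import subprocess

LEARN_TYPES = ("spam", "ham", "forget")


def spamc_learn_argv(learn_type):
    """Build the argv for `spamc -L <type>`; the message is read from stdin."""
    if learn_type not in LEARN_TYPES:
        raise ValueError("learn_type must be one of %s" % (LEARN_TYPES,))
    return ["spamc", "-L", learn_type]


def learn_message(raw_message, learn_type="spam"):
    """Pipe one raw message (bytes) to spamc and return spamc's exit status."""
    result = subprocess.run(spamc_learn_argv(learn_type), input=raw_message)
    return result.returncode
```

Each call still starts a small spamc client, but the Perl interpreter and the rules stay loaded in the running spamd, which is where the time saving would come from.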




RE: Bayes learning error

2005-06-20 Thread Chris Russell



Hi Robert,

You need to install the DB_File perl module. Do the
following:

perl -eshell -MCPAN
install DB_File


Cheers,

Chris



From: Robert Swan [mailto:[EMAIL PROTECTED]]
Sent: 20 June 2005 14:53
To: users@spamassassin.apache.org
Subject: Bayes learning error


I am getting an error when I run manual learning ("sa-learn ham ."). Has
anyone seen this before, or have a clue how to fix it?

debug: bayes: DB_File module not installed, cannot use Bayes


I am using Redhat, spamassassin 3.03 spamd, spamc, postfix


thanks


Robert Swan







Peace he would say instead of
goodbyepeace my brother.

Detailed directions for using IMAP for Bayes learning and configuring webuserprefs

2005-03-23 Thread Dan Kohn
Some folks might be interested in the updated detailed install
instructions on the wiki.

I've added sections on setting up a LearnAsSpam IMAP folder that's
remotely processed.  This is the best solution I've seen for integrating
SpamAssassin with end-users on an Exchange server.
http://wiki.apache.org/spamassassin/SingleUserUnixInstall#head-bea6b8dc4f219edd3b9976e8f922a8f1c0603125

I've also added a section on configuring webuserprefs to give a friendly
web user interface for end-users wanting to edit their whitelists and
other settings.
http://wiki.apache.org/spamassassin/SingleUserUnixInstall#head-1dd15c06b7e645638def3d2ed2ef31557d853659

Please let me know if you have any comments, or just fix them on the
wiki directly.

  - dan
--
Dan Kohn mailto:[EMAIL PROTECTED]
http://www.dankohn.com/  tel:+1-650-327-2600


Do you use MS Exchange public folders for bayes learning?

2005-03-05 Thread Matt Yackley
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi all,

I would like to throw out a request for admins that are using, have tried or want to
use MS Exchange public folders to gather messages that will be fed back to sa-learn.

Background:
Since there are not many (any?) good ways to retrieve email messages out of an
Exchange/Outlook system in order to feed the messages back to a SA server to be run
through sa-learn, the best option for a large number of users is to set up a public
folder and have users drop their messages in the PFs, then run an automated script
on the SA server to pull those messages in using a script like Nick Burch's
power-imap-sa-learn.pl [1].

However there seems to be an issue with MS Exchange public folders and IMAP.  When
an email message is placed in a public folder, then retrieved via IMAP, Exchange
strips out some of the SMTP headers and inserts some custom MS headers.  This is
clearly a non-optimal method due to the loss of some great spam/ham signs.  While the
messages are fairly close to their original forms, they could be much better.

Request:
I currently have a ticket open with MS Premier support due to a bug in MS's
implementation of IMAP  public folders.  At this point in time, MS support confirms
that they can replicate the behavior and have escalated the issue and have sent it
off to the Exchange development team for a RFC.  My best guess is that they will
confirm the issue, but say that there is not enough reason to develop a fix or a
patch for the issue.

I would like to gather a list of admins who would like this issue to be resolved,
so that I can have a little more push with MS.  It may help if I can let them know
that I have discussed this with X number of admins, who have X numbers of servers
and users who would like to see this issue resolved.

If you use SpamAssassin as a filter for an Exchange system and would like to add
your voice, please contact me off-list.

Thanks for your time,

Matt Yackley

[1] http://tirian.magd.ox.ac.uk/~nick/code/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFCKhrnjzAeShEp8NMRAinUAJsGxhKgq22XUyCSSqWCiC5WkUYZwgCcDijV
B8iyPACkUHQE4MIYfc25mqU=
=Mvc6
-END PGP SIGNATURE-


Re: Bayes learning

2004-11-29 Thread Jim Maul
Lisa Casey wrote:
Hi All,
I'm still fairly new to Spamassassin. I have a question regarding Bayes
learning in Spamassassin.  I'm running Spamassassin 3.0.1 on Redhat Linux. I
have one mailbox on this server that receives nothing but spam and quite a
lot of it. I decided that would be a good mailbox for spamassassin to
learn spam via Bayes.
So I set up a script in /etc/cron.daily that looks like this:
sa-learn --spam -C /etc/mail/spamassassin --showdots --dir /var/mail/netlinkspam
sa-learn --sync
rm /var/mail/netlinkspam
The idea being that it would learn from all messages  in that mailbox on a
daily basis, then delete that mail so it isn't just learning the same thing
over and over (made sense to me...?).
I set this up a couple of weeks ago. From the volume of mail this mailbox
receives the bayes database should be well over 200 spams by now. But when I
do a spamassassin --lint --debug, I see this in regards to Bayes:
debug: bayes: 23170 tie-ing to DB file R/O /var/spool/spamassassin/bayes_toks
debug: bayes: 23170 tie-ing to DB file R/O /var/spool/spamassassin/bayes_seen
debug: bayes: found bayes db version 3
debug: bayes: Not available for scanning, only 8 spam(s) in Bayes DB < 200
debug: bayes: 23170 untie-ing
debug: bayes: 23170 untie-ing db_toks
debug: bayes: 23170 untie-ing db_seen
I've gotta be doing something wrong here. Any suggestions?

Sounds like you are sa-learn'ing the messages as a different user than 
mail processing is running as.  This will cause bayes to set up two 
completely different and separate databases.  Check to make sure you are 
running sa-learn as the same user as mail processing.
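
To illustrate why the user matters: when no site-wide bayes_path is set, the documented default is ~/.spamassassin/bayes, so each account gets its own files. The Python sketch below just spells that out. The helper names and the home directories in the example are made up for illustration, and it does not apply if bayes_path is set globally in local.cf.

```python
import os


def default_bayes_prefix(home_dir):
    """Default bayes_path prefix for a user with the given home directory."""
    return os.path.join(home_dir, ".spamassassin", "bayes")


def default_bayes_files(home_dir):
    """The token and seen databases that share the default prefix."""
    prefix = default_bayes_prefix(home_dir)
    return [prefix + "_toks", prefix + "_seen"]


if __name__ == "__main__":
    # Hypothetical home directories for a cron job user and a mail filter user.
    for home in ("/root", "/var/spool/filter"):
        print(home, "->", default_bayes_files(home))
```

Two different home directories give two different sets of files, which is how a cron job can train one database while mail scanning reads another.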

-Jim


Re: Bayes learning

2004-11-29 Thread Lisa Casey

- Original Message - 
From: [EMAIL PROTECTED]
To: Lisa Casey [EMAIL PROTECTED]
Cc: users@spamassassin.apache.org
Sent: Monday, November 29, 2004 4:21 PM
Subject: Re: Bayes learning


 Make sure the user you are running the script as is the same user that
 spamassassin runs as and that you are logged in as that same user when
 you run spamassassin --lint --debug. You're probably training a
 different database file than the one that's getting used when you run
 the --lint check.


Since two folks have come up with this same answer (and my cron script is
running as root) I'm sure this  is probably the  problem. OK, now here's a
dumb question (and I apologise for that) but I'm not sure which user
Spamassassin is actually running as.

In my setup, spamassassin is being called by Mimedefang which is running as
the user defang.  Is this then the user that Spamassassin is running as?

Thanks,

Lisa Casey


