Re: sa-learn question

2011-10-28 Thread Jari Fredriksson
28.10.2011 4:38, Ricardo Ardila Vetrovec wrote:
 Greetings list!
 
 Excuse my English, it is not so good.
 
 I have a standalone SpamAssassin server; my MX is another server.
 
 Postfix queries SpamAssassin for the score, and that works great.
 
 My question is about spam that is not recognized.
 
 On the SpamAssassin server I created a mailbox so users can redirect the
 emails they consider spam.
 
 I run sa-learn --spam -u spamd --mbox /var/mail/spam
 
 The question is: does this method work? With the redirection of the mail all
 the headers change; that is my doubt.
 
 Any help on this topic?
 

That is not optimal; it may even be bad.

How are the users accessing their mail? If with POP, I can't figure out
a solution. If with IMAP, you can create a Confirmed-SPAM folder for
them, and they can drag the spam to that folder. Then use a cron job to
learn those as spam... No extra headers, nothing.



Re: sa-learn question

2011-10-28 Thread RW
On Fri, 28 Oct 2011 18:24:47 +0300
Jari Fredriksson wrote:

 28.10.2011 4:38, Ricardo Ardila Vetrovec wrote:

  On the SpamAssassin server I created a mailbox so users can
  redirect the emails they consider spam.
 ...
  The question is: does this method work? With the redirection of the
  mail all the headers change; that is my doubt.

 That is not optimal; it may even be bad.
 
 How are the users accessing their mail? If with POP, I can't figure
 out a solution. If with IMAP, you can create a Confirmed-SPAM
 folder for them, and they can drag the spam to that folder. Then use
 a cron job to learn those as spam... No extra headers, nothing.

There are two classic solutions. One is to have learning folders (IMAP
or webmail); the other is to forward as an attachment and have a
script that extracts the original from the MIME container.

A simple redirect may work well enough, but there are (at least) a
couple of problems. Firstly, sa-learn should ideally be able to find
the trusted and internal networks so that it reproduces the same
tokenization that is done at classification time. It would be useful to
strip any Received headers that break this. Secondly, if you use
autolearning then sa-learn must be able to identify whether a mail has
been previously learned, and this requires that all additional Received
headers be stripped.
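
The header stripping described above could be sketched roughly as follows.
This is only an illustration under stated assumptions: the helper name
strip_top_received is hypothetical, and it assumes the redirect adds a
known, fixed number of Received headers at the top of the message.

```shell
#!/bin/sh
# Hypothetical helper: drop the first N "Received:" header fields
# (including their folded continuation lines) from a message on stdin.
# N is an assumption: the number of hops the redirect adds at your site.
strip_top_received() {
    awk -v n="$1" '
        body { print; next }                     # past the headers: copy verbatim
        /^$/ { body = 1; print; next }           # first blank line ends the headers
        /^[ \t]/ { if (!skipping) print; next }  # a folded line follows its field
        {
            skipping = 0
            if (tolower($0) ~ /^received:/ && dropped < n) {
                dropped++
                skipping = 1
                next
            }
            print
        }
    '
}
```

A possible use would be
"strip_top_received 1 < redirected.eml > cleaned.eml" followed by
"sa-learn --spam cleaned.eml", where both file names are placeholders.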

 


Re: Sa-learn question

2008-06-05 Thread Benny Pedersen

On Thu, June 5, 2008 12:01, alexpacio wrote:

 Hello,
 does anybody know whether, when I train SpamAssassin on spam with the
 Bayesian trainer sa-learn, I need to delete the X-Spam-Status: and
 X-Spam-Checker-Version: lines from the header so that SpamAssassin learns
 well?

No. Training needs unmodified spam/ham mails, with the headers as you received them:

sa-learn --spam --showdots  /tmp/spammail.msg
sa-learn --ham --showdots  /tmp/hammail.msg

SpamAssassin automatically removes the headers that it added to the mails itself.


Benny Pedersen



Re: Sa-learn question

2008-06-05 Thread Matt Kettler

alexpacio wrote:

Hello,
does anybody know whether, when I train SpamAssassin on spam with the
Bayesian trainer sa-learn, I need to delete the X-Spam-Status: and
X-Spam-Checker-Version: lines from the header so that SpamAssassin learns
well?
  
SpamAssassin will remove any headers that it added itself prior to 
learning the message, including those. So, it's fine to leave them in.


However, SpamAssassin won't remove any nonstandard headers added by 
other spam scanning tools, or by wrappers for SpamAssassin that add their 
own headers (e.g. MailScanner). For those, you'll need to use a 
bayes_ignore_header directive, or strip them before feeding SA.
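
As a rough sketch of the "strip them before feeding SA" option, a foreign
header can be filtered out before the message reaches sa-learn. The helper
name strip_header and the header name X-Example-Scanner are made up for
illustration; substitute whatever header your other tool really adds.

```shell
#!/bin/sh
# Hypothetical helper: remove every header field with the given name
# (case-insensitive, including folded continuation lines) from a message
# read on stdin. The body is passed through untouched.
strip_header() {
    awk -v name="$1" '
        BEGIN { prefix = tolower(name) ":" }
        body { print; next }                     # past the headers: copy verbatim
        /^$/ { body = 1; print; next }           # first blank line ends the headers
        /^[ \t]/ { if (!skipping) print; next }  # a folded line follows its field
        {
            skipping = (index(tolower($0), prefix) == 1)
            if (!skipping) print
        }
    '
}
```

For example, "strip_header X-Example-Scanner < spam.eml > cleaned.eml" and
then "sa-learn --spam cleaned.eml". The bayes_ignore_header route avoids
the extra step, since it is set once in local.cf.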


Re: sa-learn question

2008-03-13 Thread Matt Kettler

Hungry Snail wrote:

Site-wide is what i'm trying to setup, I guess i need to do some more
googling :)
  


Assuming you're using DB_File, not SQL:

First create a directory where you want your bayes DB to live, and make
that directory world RWX (i.e. chmod 0777).


In your /etc/mail/spamassassin/local.cf:

bayes_path /your/directory/bayes
bayes_file_mode 0777

Gotchas people often run into:

1) DO NOT use that directory for anything else. If there are any other 
files starting with bayes_ it will screw up the file locking.
2) bayes_path doesn't actually specify a path; it's a path plus a partial 
filename. You NEED the extra /bayes on the end, as it is part of the 
filenames SA uses to create its database files. SA will append 
_seen, _toks, etc. as needed to create bayes_seen (seen message 
database) and bayes_toks (token database).
3) Yes, the mode needs to be 0777, not 0666, as it is sometimes used in 
creating directories. Really, bayes_file_mode is a mask, not an explicit 
mode. SA will not create its db files with the X bit, even if this is 
set to 0777.
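
The steps above could be scripted along these lines. This is a sketch only:
the function name and the directory passed to it are hypothetical, and it
merely prints the two local.cf lines rather than editing the file.

```shell
#!/bin/sh
# Sketch of a site-wide Bayes setup for a DB_File-based database.
# $1 is the directory that will hold the bayes files; it should be used
# for nothing else.
setup_sitewide_bayes() {
    bayes_dir=$1
    mkdir -p "$bayes_dir" || return 1
    chmod 0777 "$bayes_dir" || return 1      # world RWX, as described above
    # Emit the local.cf lines; note the trailing /bayes filename prefix.
    printf 'bayes_path %s/bayes\n' "$bayes_dir"
    printf 'bayes_file_mode 0777\n'
}
```

One might run "setup_sitewide_bayes /some/dedicated/dir", review the two
printed lines, add them to /etc/mail/spamassassin/local.cf by hand, and
then restart spamd.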








Re: sa-learn question

2008-03-13 Thread Matt Kettler

Matt Kettler wrote:

Hungry Snail wrote:

Site-wide is what i'm trying to setup, I guess i need to do some more
googling :)
  
Also, I've updated the wiki article on sitewide bayes. It is now at 
least technically correct.


http://wiki.apache.org/spamassassin/SiteWideBayesSetup

Previously it had several bits of bad advice:
Don't use /etc/mail/spamassassin to store your bayes DB.
Don't specify -C on the sa-learn command line. You REALLY don't want to 
use that option on any SA tool unless you know exactly what you're 
doing. (This option is really mostly for testers and developers.)
Use init scripts to restart spamd.






Re: sa-learn question

2008-03-13 Thread Hungry Snail



Matt Kettler-3 wrote:
 
 previously it had several bits of bad advice:
 Don't use /etc/mail/spamassassin to store your bayes DB
 Don't specify -C on  the sa-learn command-line. You REALLY don't want to 
 use that option on any SA tool unless you know exactly what you're 
 doing. (This option is really mostly for testers and developers.)
 Use init scripts to restart spamd
 

Thanks for all your advice Matt, much appreciated.
-- 
View this message in context: 
http://www.nabble.com/sa-learn-question-tp16019261p16025291.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: sa-learn question

2008-03-12 Thread Matt Kettler

Hungry Snail wrote:

Hi Guys,

I am using spam/notspam via Squirrelmail

If I mark an email as spam squirrelmail send this command.
COMMAND USED TO REPORT: /usr/bin/sa-learn --spam
--configpath=/etc/mail/spamassassin --showdots 
/var/spool/squirrelmail/attach//sb_tmp_174_1205370641

The result I get is..
[0] = Learned tokens from 0 message(s) (1 message(s) examined)

Does the result look correct? I was just wondering why it has 0 learned
tokens from 0 messages.
  
That generally suggests the message was already learned as spam, 
therefore no action was needed for the 1 message it examined, and no 
learning was performed.




Re: sa-learn question

2008-03-12 Thread Hungry Snail



Hungry Snail wrote:
 
 Hi Guys,
 
 I am using spam/notspam via Squirrelmail
 
 If I mark an email as spam squirrelmail send this command.
 COMMAND USED TO REPORT: /usr/bin/sa-learn --spam
 --configpath=/etc/mail/spamassassin --showdots 
 /var/spool/squirrelmail/attach//sb_tmp_174_1205370641
 
 The result I get is..
 [0] = Learned tokens from 0 message(s) (1 message(s) examined)
 
 Does the result look correct? I was just wondering why it has 0 learned
 tokens from 0 messages.
 
 Regards
 

That's what I thought, but I forwarded the message to myself and it didn't get
flagged as spam. It was also a message that was received before SpamAssassin
was set up.

I did sa-learn --dump magic and this is what I got back.

0.000  0  3  0  non-token data: bayes db version
0.000  0  0  0  non-token data: nspam
0.000  0  0  0  non-token data: nham
0.000  0  0  0  non-token data: ntokens
0.000  0  0  0  non-token data: oldest atime
0.000  0  0  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal sync atime
0.000  0  0  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count

Is the command I'm using correct? I want the spam/ham rules to apply to
everyone and not have it on a per-user basis.

Regards



Re: sa-learn question

2008-03-12 Thread Matt Kettler

Hungry Snail wrote:


Hungry Snail wrote:
  

Hi Guys,

I am using spam/notspam via Squirrelmail

If I mark an email as spam squirrelmail send this command.
COMMAND USED TO REPORT: /usr/bin/sa-learn --spam
--configpath=/etc/mail/spamassassin --showdots 
/var/spool/squirrelmail/attach//sb_tmp_174_1205370641

The result I get is..
[0] = Learned tokens from 0 message(s) (1 message(s) examined)

Does the result look correct? I was just wondering why it has 0 learned
tokens from 0 messages.

Regards




That's what I thought, but I forwarded the message to myself and it didn't get
flagged as spam. It was also a message that was received before SpamAssassin
was set up.
  
Why would forwarding a message to yourself be a valid test?  Or do you 
mean something different like resubmitting the raw message to your mail 
queue.


Generally speaking, forwarded messages generated by a mail client are 
*COMPLETELY* different from the original. New headers, new Received 
path, new body encoding, possibly removal of the text/plain section of a 
multipart/alternative message, probably new line wrapping. Forwarding 
doesn't forward the same message. It forwards some rendering of the text 
parts; the rest is mangled by your MUA.


Try redirecting or piping the raw message to spamassassin -t, like you 
did with sa-learn.



I did sa-learn --dump magic and this is what I got back.

0.000  0  3  0  non-token data: bayes db version
0.000  0  0  0  non-token data: nspam
0.000  0  0  0  non-token data: nham
0.000  0  0  0  non-token data: ntokens
0.000  0  0  0  non-token data: oldest atime
0.000  0  0  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal sync atime
0.000  0  0  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count

Is the command I'm using correct? 

That's quite suspect. Did you run it as the same user as the sa-learn?  
You might want to try the sa-learn again with -D to see what the 
debugging has to say.





I want the spam/ham rules to apply to
everyone and not have it on a per-user basis.

Regards
  




Re: sa-learn question

2008-03-12 Thread Hungry Snail



Matt Kettler-3 wrote:
 
 Did you run it as the same user as the sa-learn?  
 You might want to try the sa-learn again with -D to see what the 
 debugging has to say.
 

Bah, it works fine if I issue the command via ssh.

sa-learn --spam --configpath=/etc/mail/spamassassin --showdots
/var/vmail/mydomain.tld/user/cur/1205274708.P15843Q0M957724.host:2,
.
Learned tokens from 1 message(s) (1 message(s) examined)

My issue seems to be related to Squirrelmail issuing the command when I
click the spam button. I wonder if it is trying to run the command as the
www-data user.



Re: sa-learn question

2008-03-12 Thread Matt Kettler

Hungry Snail wrote:


Matt Kettler-3 wrote:
  
Did you run it as the same user as the sa-learn?  
You might want to try the sa-learn again with -D to see what the 
debugging has to say.





Bah, it works fine if I issue the command via ssh.

sa-learn --spam --configpath=/etc/mail/spamassassin --showdots
/var/vmail/mydomain.tld/user/cur/1205274708.P15843Q0M957724.host:2,
.
Learned tokens from 1 message(s) (1 message(s) examined)

My issue seems to be related to Squirrelmail issuing the command when I
click the spam button. I wonder if it is trying to run the command as the
www-data user.
  

Quite likely.

Regardless, unless you've got a site-wide single bayes db, it needs to 
run as whatever user gets used when email comes in, which may not be the 
same as the recipient.





Re: sa-learn question

2008-03-12 Thread Hungry Snail

Site-wide is what i'm trying to setup, I guess i need to do some more
googling :)



Re: sa-learn question about number of messages processed

2007-04-16 Thread PakOgah


Matt Kettler wrote:

Mário Gamito wrote:
  

Hi,

How can I find out how many messages sa-learn has already processed?


You mean the total number of messages learned in the bayes database
(includes sa-learn and autolearn)?

sa-learn --dump magic

And how do I read this information?
# sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0  569  0  non-token data: nspam
0.000  0  7  0  non-token data: nham
0.000  0  53898  0  non-token data: ntokens
0.000  0  987802486  0  non-token data: oldest atime
0.000  0 1176482771  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal sync atime
0.000  0  0  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count


Re: sa-learn question about number of messages processed

2007-04-16 Thread Matt Kettler
PakOgah wrote:

 Matt Kettler wrote:
 Mário Gamito wrote:
  
 Hi,

 How can I find out how many messages sa-learn has already processed?
 
 You mean the total number of messages learned in the bayes database
 (includes sa-learn and autolearn)?

 sa-learn --dump magic
 And how do I read this information?
 # sa-learn --dump magic
 0.000  0  3  0  non-token data: bayes db version
Bayes DB is in the version 3 format. (It has changed a couple of times in
its history, but hasn't changed recently.)

 0.000  0  569  0  non-token data: nspam
You have trained 569 nonspam messages
 0.000  0  7  0  non-token data: nham
You have trained 7 spam messages, which is very few, not enough for SA
to be willing to start using the bayes database to rate mail yet. By
default you need 200 (and I do not recommend changing it to anything
lower except in lab tests to study bayes errors in under-trained
databases).
 0.000  0  53898  0  non-token data: ntokens
There are 53,898 total tokens in the bayes database. (Small, but not
absurdly so. By default SA aims to keep it between 150k and 100k.
Looking above, you've not trained enough emails for SA to start
considering throwing out old tokens to keep it under 150k.)
 0.000  0  987802486  0  non-token data: oldest atime
 0.000  0 1176482771  0  non-token data: newest atime
The least-recently used token in the database was last accessed
987802486 seconds after January 1st, 1970, and the most-recent was
accessed at 1176482771. (Not very interesting except to compare against
each other.)
 0.000  0  0  0  non-token data: last journal sync atime
 0.000  0  0  0  non-token data: last expiry atime
 0.000  0  0  0  non-token data: last expire atime delta
 0.000  0  0  0  non-token data: last expire reduction count

There's never been a journal sync or expiration of old tokens.

In a young database this is reasonably normal, although I'd eventually
expect a journal sync after you've got enough nonspam for your bayes to
become actively used by SA. Also, you'll never get expiry until your
database is a bit larger. Expiry doesn't kick in until you've got
150,000 tokens, and you've got about a third of that.
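
For repeated checks, the counters discussed above can be pulled out of the
dump with a small filter. This is a sketch, not part of SpamAssassin: the
function name bayes_summary is made up, and it assumes the column layout
shown in this thread (value in the third column, label after
"non-token data:").

```shell
#!/bin/sh
# Read "sa-learn --dump magic" output on stdin and print the main counters,
# plus the span between oldest and newest token access time in whole days.
bayes_summary() {
    awk '
        /non-token data: nspam/        { nspam = $3 }
        /non-token data: nham/         { nham = $3 }
        /non-token data: ntokens/      { ntokens = $3 }
        /non-token data: oldest atime/ { oldest = $3 }
        /non-token data: newest atime/ { newest = $3 }
        END {
            printf "nspam=%d nham=%d ntokens=%d\n", nspam, nham, ntokens
            printf "atime_span_days=%d\n", (newest - oldest) / 86400
        }
    '
}
```

Typical use would be "sa-learn --dump magic | bayes_summary", run as the
same user that owns the bayes database.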



Re: sa-learn question about number of messages processed

2007-04-16 Thread John D. Hardin
On Mon, 16 Apr 2007, Matt Kettler wrote:

  0.000  0  569  0  non-token data: nspam
 You have trained 569 nonspam messages

that should be: 569 spams (Number SPAM)

  0.000  0  7  0  non-token data: nham
 You have trained 7 spam messages

and: 7 hams (Number HAM)

PakOgah: you need to train 193 more ham emails before Bayes will start 
scoring.
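
The arithmetic behind that figure, as a small sketch. The function name
bayes_needed is hypothetical; the default of 200 is the threshold quoted
earlier in the thread, which in SpamAssassin is set by bayes_min_spam_num
and bayes_min_ham_num.

```shell
#!/bin/sh
# Given the nspam and nham counters, report how many more messages of each
# kind must be learned before the minimum (default 200) is reached.
bayes_needed() {
    nspam=$1
    nham=$2
    min=${3:-200}
    need_spam=$(( min > nspam ? min - nspam : 0 ))
    need_ham=$(( min > nham ? min - nham : 0 ))
    echo "need_spam=$need_spam need_ham=$need_ham"
}
```

With the counters from this thread, "bayes_needed 569 7" reports that no
more spam and 193 more ham messages are needed.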




Re: sa-learn question about number of messages processed

2007-04-13 Thread Matt Kettler
Mário Gamito wrote:
 Hi,

 How can I find out how many messages sa-learn has already processed?

You mean the total number of messages learned in the bayes database
(includes sa-learn and autolearn)?

sa-learn --dump magic




Re: sa-learn question

2006-09-23 Thread Matt Kettler
Russell Jones wrote:
 If I have multiple sa-learn processes going at the same time, can that
 corrupt the database and/or cause some other problem that I don't want
 to happen? Or is it safe to have the following in crontab for example:
  
 @daily sa-learn --spam /home/eggycrew/imap/eggycrew.com/rjones/Maildir/.INBOX.spam
 @daily sa-learn --ham /home/eggycrew/imap/eggycrew.com/rjones/Maildir/cur
 @daily sa-learn --ham /home/eggycrew/imap/eggycrew.com/rjones/Maildir/new

Well, nothing bad will happen, but they'll all effectively get run one
at a time. Since only one process can have the R/W lock on the bayes DB,
one of them will get the lock and the others will go to sleep waiting
for the lock to be released.


Re: sa-learn question

2006-09-22 Thread John D. Hardin
On Fri, 22 Sep 2006, Russell Jones wrote:

 @daily sa-learn --spam /home/eggycrew/imap/eggycrew.com/rjones/Maildir/.INBOX.spam
 @daily sa-learn --ham /home/eggycrew/imap/eggycrew.com/rjones/Maildir/cur
 @daily sa-learn --ham /home/eggycrew/imap/eggycrew.com/rjones/Maildir/new

Put all your learns in a single shell script, and run that.

I also age the learn mailbox files to keep their sizes down when they
are learned, and I only learn if the file has been modified in the
last day or two.

Attached is the script I have in my cron.daily directory...

#!/bin/bash

#
# Train spamassassin global bayes filter
#

# learn from folders in user home dirs
#: echo Learning from user local mailboxes

# spam mailboxes modified within the last three days
find /home/*/[Mm]ail -type f \( -name 'SpamAssassin-SPAM*' -o -name spambox \) -mtime -3 |
while IFS= read -r SPAM
do
    # skip empty mailboxes
    if [ -s "$SPAM" ]
    then
        echo "SPAM from $SPAM"
        # assume mbox unless file(1) reports the MBX format
        MBTYPE=--mbox
        if file "$SPAM" | grep -q ' MBX mail '
        then
            MBTYPE=--mbx
        fi
        /usr/bin/sa-learn --spam -C /etc/mail/spamassassin $MBTYPE "$SPAM"
    fi
done
echo

# ham mailboxes modified within the last three days
find /home/*/[Mm]ail -type f \( -name 'SpamAssassin-HAM*' -o -name hambox \) -mtime -3 |
while IFS= read -r HAM
do
    # skip empty mailboxes
    if [ -s "$HAM" ]
    then
        echo "HAM from $HAM"
        # assume mbox unless file(1) reports the MBX format
        MBTYPE=--mbox
        if file "$HAM" | grep -q ' MBX mail '
        then
            MBTYPE=--mbx
        fi
        /usr/bin/sa-learn --ham -C /etc/mail/spamassassin $MBTYPE "$HAM"
    fi
done

# Report status
echo
echo Bayes Statistics:
/usr/bin/sa-learn --dump magic

# keep the database files readable by all users
chmod a+r /etc/mail/spamassassin/bayes_seen /etc/mail/spamassassin/bayes_toks



Re: sa-learn question

2006-09-07 Thread Theo Van Dinter
On Thu, Sep 07, 2006 at 02:19:25PM -0500, EviL_SmUrF wrote:
 Quick question about spamassassin's sa-learn feature. I am running 
 spamassassin on a semi-large webhosting server, and I can't seem to find out 
 whether, when I run sa-learn, what it learns will apply only to the email 
 address it was run on, or to the entire domain, or to all of the domains 
 hosted on the box. Example of what I am running:

It doesn't quite work like that.  sa-learn updates a database; the
recipient information doesn't really matter.  The tokens that are learned
will be used by whatever or whoever you have configured to use that database
for scanning.

I.e., if you have individual DBs per user, then the learning applies to
the user whose database you updated.  If you have a sitewide DB config,
then it'll be for all users.



Re: SA-LEARN Question

2006-08-30 Thread Miki
Hello Christopher,

Tuesday, August 22, 2006, 3:21:36 PM, you wrote:

CM Hi,
CM We have over 100 domains on a server, all of which are getting junk mail. SA
CM 3.1.4 installed, but I don't think it's properly trained yet (even though I
CM did upgrade from an earlier version).

CM If I set up a [EMAIL PROTECTED] address and tell all my customers to
CM forward the junk mail they get to that address, then run sa-learn on that
CM mailbox, will that help, or, will it train SA that the users that forwarded
CM the junk ARE the spammers and start to assign higher scores to legitimate
CM customers?

Hi,
I use qmail and SA, and my MUA is The Bat!
I found that redirecting the email does not work well, as SA treats me
as the sender, but forwarding the spam to a junk account is OK: it strips
the forwarded-by headers and learns the message.


-- 
Best regards,
 Miki




Re: SA-LEARN Question

2006-08-22 Thread Jim Maul

Christopher Mills wrote:

Hi,
We have over 100 domains on a server, all of which are getting junk mail. SA 
3.1.4 installed, but I don't think it's properly trained yet (even though I did 
upgrade from an earlier version).


If I set up a [EMAIL PROTECTED] address and 
tell all my customers to forward the junk mail they get to that address, then 
run sa-learn on that mailbox, will that help, or, will it train SA that the 
users that forwarded the junk ARE the spammers and start to assign higher scores 
to legitimate customers?


If you forward the emails, this process will not work.  You must either 
forward it as an attachment and then strip the attachment and run 
sa-learn on that or use some other method which preserves the original 
headers.  How you do this depends largely on your setup.


-Jim



RE: SA-LEARN Question

2006-08-22 Thread Bowie Bailey
Christopher Mills wrote:
 Hi,
 We have over 100 domains on a server, all of which are getting junk
 mail. SA 3.1.4 installed, but I don't think it's properly trained yet
 (even though I did upgrade from an earlier version).  
 
 If I set up a [EMAIL PROTECTED] address and tell all my customers
 to forward the junk mail they get to that address, then run sa-learn
 on that mailbox, will that help, or, will it train SA that the users
 that forwarded the junk ARE the spammers and start to assign higher
 scores to legitimate customers?

No, SA will learn that messages forwarded from your users are spam.

As someone else pointed out, you need to find a method that preserves
the original headers of the message.  Forwarding the spam as an
attachment and then stripping it out or copying it to a shared imap
folder are two of the more common options.

-- 
Bowie


Re: SA-LEARN Question

2006-08-22 Thread Andrew
Jim Maul wrote:
 Christopher Mills wrote:
 Hi,
 We have over 100 domains on a server, all of which are getting junk
 mail. SA 3.1.4 installed, but I don't think it's properly trained yet
 (even though I did upgrade from an earlier version).

 If I set up a [EMAIL PROTECTED]
 address and tell all my customers to forward the junk mail they get to
 that address, then run sa-learn on that mailbox, will that help, or,
 will it train SA that the users that forwarded the junk ARE the
 spammers and start to assign higher scores to legitimate customers?
 
 If you forward the emails, this process will not work.  You must either
 forward it as an attachment and then strip the attachment and run
 sa-learn on that or use some other method which preserves the original
 headers.  How you do this depends largely on your setup.
 

Here's a link describing how I use maildrop to deliver emails to special
maildirs for processing by sa-learn.

http://www.arda.homeunix.net/spamassassin.html#bayesian

Andrew



RE: SA-LEARN Question

2006-08-22 Thread Jean-Paul Natola








Wouldn't forwarding strip away header info that is used to train spam?


From: Christopher Mills [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 22, 2006 9:22 AM
To: users@spamassassin.apache.org
Subject: SA-LEARN Question

Hi,
We have over 100 domains on a server, all of which are getting junk mail. SA
3.1.4 installed, but I don't think it's properly trained yet (even though I did
upgrade from an earlier version).

If I set up a [EMAIL PROTECTED] address
and tell all my customers to forward the junk mail they get to that address,
then run sa-learn on that mailbox, will that help, or, will it train SA that
the users that forwarded the junk ARE the spammers and start to assign higher
scores to legitimate customers? 








Re: SA-LEARN Question

2006-08-22 Thread Michel Vaillancourt
Bowie Bailey wrote:
 Christopher Mills wrote:
 Hi,
 We have over 100 domains on a server, all of which are getting junk
 mail. SA 3.1.4 installed, but I don't think it's properly trained yet
 (even though I did upgrade from an earlier version).  

 If I set up a [EMAIL PROTECTED] address and tell all my customers
 to forward the junk mail they get to that address, then run sa-learn
 on that mailbox, will that help, or, will it train SA that the users
 that forwarded the junk ARE the spammers and start to assign higher
 scores to legitimate customers?
 
 No, SA will learn that messages forwarded from your users are spam.
 
 As someone else pointed out, you need to find a method that preserves
 the original headers of the message.  Forwarding the spam as an
 attachment and then stripping it out or copying it to a shared imap
 folder are two of the more common options.
 

   I have a similar, albeit smaller, environment.  What I've done is asked my 
users who want to help to have a ConfirmedSpam folder in their IMAP 
directory.  Every night I cron-job a LOCATE for that folder and then tell 
sa-learn to learn those emails.  Then I empty the mail dir to start fresh for 
the next day.  It works like a charm.

-- 
--Michel Vaillancourt
Wolfstar Systems
www.wolfstar.ca


Re: SA-LEARN Question

2006-08-22 Thread Magnus Holmgren
On Tuesday 22 August 2006 16:31, Jean-Paul Natola took the opportunity to say:
 Wouldn't forwarding strip away header info that is used to train spam?

It depends on the MUA. Some MUAs, like MS Outlook (who would've guessed?) (at 
least Outlook 2000), mangle the mail even when forwarding as an attachment. 
Well-behaved MUAs preserve everything when forwarding as an attachment, but 
then you need to extract that attachment.



Re: SA-LEARN Question

2006-08-22 Thread Gino Cerullo

On 22-Aug-06, at 1:57 PM, Magnus Holmgren wrote:

On Tuesday 22 August 2006 16:31, Jean-Paul Natola took the opportunity to say:
Wouldn't forwarding strip away header info that is used to train spam?

It depends on the MUA. Some MUAs, like MS Outlook (who would've guessed?) (at
least Outlook 2000), mangle the mail even when forwarding as an attachment.
Well-behaved MUAs preserve everything when forwarding as an attachment, but
then you need to extract that attachment.


I've been told to, and do use, Redirect instead of Forward when  
sending spam to a common mailbox for sa-learn.




RE: SA-LEARN Question

2006-08-22 Thread Bowie Bailey
Michel Vaillancourt wrote:
 Bowie Bailey wrote:
  Christopher Mills wrote:
   Hi,
   We have over 100 domains on a server, all of which are getting
   junk mail. SA 3.1.4 installed, but I don't think it's properly
   trained yet (even though I did upgrade from an earlier version).
   
   If I set up a [EMAIL PROTECTED] address and tell all my
   customers to forward the junk mail they get to that address, then
   run sa-learn on that mailbox, will that help, or, will it train
   SA that the users that forwarded the junk ARE the spammers and
   start to assign higher scores to legitimate customers?
  
  No, SA will learn that messages forwarded from your users are spam.
  
  As someone else pointed out, you need to find a method that
  preserves the original headers of the message.  Forwarding the spam
  as an attachment and then stripping it out or copying it to a
  shared imap folder are two of the more common options.
  
 
I have a similar, albeit smaller, environment.  What I've done is
 asked my users who want to help to have a ConfirmedSpam folder in
 their IMAP directory.  Every night I cron-job a LOCATE for that
 folder and then tell sa-learn to learn those emails.  Then I empty
 the mail dir to start fresh for the next day.  It works like a charm.

For balanced learning, you should also have a ConfirmedHam folder so
that you can learn from both ham and spam.
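
A nightly job along the lines discussed in this thread might look like the
sketch below. Everything site-specific is an assumption: the mail root, the
Maildir-style folder names .ConfirmedSpam and .ConfirmedHam, and the
function names. It does not empty the folders afterwards.

```shell
#!/bin/sh
# Nightly learning from per-user IMAP folders (Maildir layout assumed).
# MAIL_ROOT and SA_LEARN can be overridden from the environment.
MAIL_ROOT=${MAIL_ROOT:-/var/vmail}
SA_LEARN=${SA_LEARN:-sa-learn}

# $1 = "spam" or "ham", $2 = Maildir folder name to look for
learn_folders() {
    kind=$1
    folder=$2
    find "$MAIL_ROOT" -type d -name "$folder" | while IFS= read -r dir; do
        # A Maildir folder keeps seen mail in cur/ and unseen mail in new/.
        for sub in cur new; do
            if [ -d "$dir/$sub" ]; then
                "$SA_LEARN" "--$kind" "$dir/$sub"
            fi
        done
    done
}

learn_all() {
    learn_folders spam .ConfirmedSpam
    learn_folders ham .ConfirmedHam
}
```

Calling learn_all from a daily cron script runs the learns one after
another, as the user that owns the bayes database.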

-- 
Bowie


Re: sa-learn question

2006-03-01 Thread mouss
Drew Burchett wrote:
 Does sa-learn read subdirectories? 

If you mean maildir folders, yes.





Re: sa-learn question

2005-04-03 Thread Matt Kettler
At 01:35 AM 4/3/2005, Roman Serbski wrote:
There are some spam messages that are not blocked by SA, so as far as I
understand I can teach Bayes to learn them? But is it worth feeding
sa-learn junk messages that already have modified headers?

Yes, that's fine. sa-learn is smart enough to undo any changes that the 
spamassassin configuration made. 



Re: sa-learn question

2004-10-12 Thread Rakesh
I think you should check the SpamAssassin wiki for the solution to your 
problem:

http://wiki.apache.org/spamassassin/BayesInSpamAssassin

Rakesh

Lance wrote:
Alright, we're running Courier IMAP along with POP3, but our spool is all
in Maildir format.  I've got a public spam folder for certain people, so
what would the sa-learn command be?

sa-learn --spam /var/spool/mail/unixvault.net/shared/.Spam/cur/*

Or do I need to insert something in there, like --mbx/--mbox?  I'm not sure
whether it makes a difference to how it learns, or whether it could result
in false positives if it's not learning correctly.

lance