Diego Pomatta wrote:
John Thompson wrote:
On 2008-01-23, Diego Pomatta <[EMAIL PROTECTED]> wrote:
I use Thunderbird. There are two files for that folder: Junk.msf (7k)
and Junk (53.172k). The msf file must be some kind of index. I just
feed the biggest one to sa-learn?
Yup. Use "sa-learn --spam --mbox Junk" to learn your spam. You'll want
to use the "--mbox" switch so sa-learn will process it as an mbox
format mailbox, since that's what Thunderbird uses to store mail.
~/sa-learn --spam --mbox Junk
Learned tokens from 7 message(s) (7 message(s) examined)
Looks like it worked feeding it the entire Thunderbird Junk folder file. :)
Thanks all.
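For anyone wanting to sanity-check the "message(s) examined" figure, here is a rough sketch that counts the messages in an mbox file by counting its "From " separator lines. It assumes a classic mbox layout; the function name count_mbox_messages and the two-message sample mailbox are made up for illustration, and it says nothing about how sa-learn itself parses the file.

```shell
# Count messages in an mbox file by counting "From " separator lines.
# Assumption: classic mbox layout, where every message starts with a line
# beginning "From " and body lines starting that way have been escaped.
count_mbox_messages() {
    grep -c '^From ' "$1"
}

# Self-contained demonstration on a tiny, made-up two-message mailbox.
sample=$(mktemp)
cat > "$sample" <<'EOF'
From alice@example.com Wed Jan 23 10:00:00 2008
Subject: first

body one

From bob@example.com Wed Jan 23 10:05:00 2008
Subject: second

body two
EOF

count=$(count_mbox_messages "$sample")
echo "messages: $count"
rm -f "$sample"
```

Run against the real Junk file, the count should be in the same ballpark as the number sa-learn reports as examined, though the two can differ if the file contains unescaped "From " lines.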
Btw, what's the difference between using "sa-learn --spam..." and
"spamassassin --report...", as Anthony suggested?
From:
http://spamassassin.apache.org/full/3.2.x/doc/spamassassin-run.html
"-r, --report
Report this message as manually-verified spam. This will submit the
mail message read from STDIN to various spam-blocker databases.
Currently, these are the Distributed Checksum Clearinghouse
http://www.rhyolite.com/anti-spam/dcc/, Pyzor
http://pyzor.sourceforge.net/, Vipul's Razor
http://razor.sourceforge.net/, and SpamCop http://www.spamcop.net/.
If the message contains SpamAssassin markup, the markup will be
stripped out automatically before submission. The support modules for
DCC, Pyzor, and Razor must be installed for spam to be reported to each
service. SpamCop reports will have greater effect if you register and
set the spamcop_to_address option.
The message will also be submitted to SpamAssassin's learning
systems; currently this is the internal Bayesian statistical-filtering
system (the BAYES rules). (Note that if you only want to perform
statistical learning, and do not want to report mail to third-parties,
you should use the sa-learn command directly instead.)"
In other words, --report teaches the Bayesian system but also submits
the message to third-party systems like DCC and SpamCop, whereas
sa-learn only does the local Bayesian learning.
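As a quick side-by-side, the sketch below prints (it does not run) the command that matches each goal. The helper name suggest_command and the file name message.eml are hypothetical; the sa-learn line is the one used earlier in the thread, and the spamassassin line relies on the documentation quoted above, which says --report reads one message from STDIN.

```shell
# Print (do not run) the command matching the goal.
#   learn  : local Bayesian training only (sa-learn)
#   report : Bayesian training plus submission to third-party services
# suggest_command is a hypothetical helper, not part of SpamAssassin.
suggest_command() {
    goal=$1
    file=$2
    case "$goal" in
        learn)  printf 'sa-learn --spam --mbox %s\n' "$file" ;;
        report) printf 'spamassassin --report < %s\n' "$file" ;;
        *)      printf 'usage: suggest_command learn|report FILE\n' >&2
                return 1 ;;
    esac
}

# "Junk" is the Thunderbird mbox from this thread; message.eml is a
# made-up name standing in for a file holding a single message.
learn_cmd=$(suggest_command learn Junk)
report_cmd=$(suggest_command report message.eml)
echo "$learn_cmd"
echo "$report_cmd"
```

Printing rather than executing keeps the sketch safe to try: nothing is learned or reported until you copy a line and run it yourself.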
--
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW: http://www.chime.ucl.ac.uk/~rmhiajp/
"A CAT scan should take less time than a PET scan. For a CAT scan,
they're only looking for one thing, whereas a PET scan could result in
a lot of things." - Carl Princi, 2002/07/19