Bayesit plugin and filters executing order

2004-04-06 Thread Michael R Kizer

I just upgraded to TB v2 after a long time on v1.x and started checking out
the Bayesit plugin. I noticed that some of the messages my filters move
into separate folders were being flagged as junk mail. My filters are set
up in the following order:
  - Multiple filters for mailing lists, known subject lines, etc., which
    move messages into their respective folders.
  - Second-to-last is a KNOWN filter that moves all messages from people
    in my address book into a special folder.
  - Last is a potential-SPAM filter that moves everything that made it
    this far into a holding folder for later review.

I'm assuming that the Bayesit plugin runs against all incoming mail before
any of the filters, which is why I am seeing some of my mailing list
messages tagged as SPAM.
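To make the suspected ordering concrete, here is a toy model of it. It assumes the order Daniel confirms further down (spam check first, then the user filters top to bottom, first match wins); `Message`, `deliver` and `abc_list_filter` are made-up names for illustration, not The Bat!'s plugin API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    sender: str
    subject: str
    body: str


# A filter returns the folder to move the message to, or None if it doesn't match.
Filter = Callable[[Message], Optional[str]]


def deliver(msg: Message,
            spam_check: Callable[[Message], bool],
            filters: List[Filter],
            junk_folder: str = "Junk",
            default_folder: str = "Inbox") -> str:
    # Assumed order: the spam plugin sees the message before any user filter.
    if spam_check(msg):
        return junk_folder
    # User filters run top to bottom; the first match wins (also an assumption).
    for f in filters:
        dest = f(msg)
        if dest is not None:
            return dest
    return default_folder


def abc_list_filter(msg: Message) -> Optional[str]:
    # Hypothetical mailing-list filter keyed on the subject tag.
    return "ABC list" if "[ABC]" in msg.subject else None
```

With this order, a list message that the spam check misjudges goes to Junk even though the list filter would have matched it; the filters never get a say.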

I suppose I could add a bunch of entries to the Bayesit plugin's whitelist,
but I hate duplicating what's already in my filters.

Just trying to confirm my suspicions.
~Mike



Current version is 2.04.7 | 'Using TBUDL' information:
http://www.silverstones.com/thebat/TBUDLInfo.html


Re: Bayesit plugin and filters executing order

2004-04-06 Thread Michael R Kizer

Second follow-up question

Does the Bayesit plugin merely base its calculations on the subject and
body of the message?
For example, say I subscribe to a mailing list called [EMAIL PROTECTED]
and have filters set up to move these messages into a specific folder
(based on the sender's address and/or the fact that [ABC] shows up in the
subject line). Occasionally (well, too often on some lists), a spammer
sends junk mail to the list... If I mark these messages as JUNK in TB,
could that affect legit messages from that group? Perhaps by adding the
[ABC] from the subject line to the list of words associated with junk
mail?

Just curious whether I should ignore spam that comes in via a mailing
list, or start flagging it as junk for the Bayesit plugin.

~Mike





Re: Bayesit plugin and filters executing order

2004-04-06 Thread dAniel hAhler
Hello bats,

on Tue, 6. Apr 2004 at 10:44:42 -0700 Michael R Kizer wrote:

 Does the Bayesit plugin merely base its calculations on the subject and
 body of the message?

On the raw content/source of the mail, i.e. all of its words.
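A rough idea of what "the raw source" means: every word in the headers and the body goes into the pool, so a subject tag like [ABC] is just another token. This is a simplified tokenizer built on my own assumptions (the regex, and the choice to keep brackets so the tag survives as one token, are invented for illustration); it is not Bayesit's actual tokenizer.

```python
import re


def tokenize(raw_source: str) -> set:
    # Lowercase the whole raw message (headers and body) and pull out
    # word-like tokens; brackets are kept so a list tag like "[abc]" is one token.
    return set(re.findall(r"[\w$!'\[\]-]+", raw_source.lower()))


raw = ("From: list@example.org\n"
       "Subject: [ABC] Meeting notes\n"
       "\n"
       "Hello all, agenda attached.")
tokens = tokenize(raw)  # includes header words such as "from" and "subject"
```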

 a spammer sends some junk mail to the list... if I mark these as JUNK
 in TB, will it potentially have an effect on legit messages from that
 group? Perhaps adding the [ABC] from the subject line to the list of
 words associated with junk mail?

It will notice that [ABC] has been used in spam, but far less often than
in legit mail. So [ABC] would still belong to the Ham group, not the Spam
group.
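The reasoning can be made concrete with a simplified form of the per-token probability from Paul Graham's "A Plan for Spam": compare how often a token appears in spam versus ham, relative to the number of messages in each. Real filters add smoothing, clamping and minimum counts; I'm assuming, not claiming, that Bayesit does something similar, and the counts below are invented.

```python
def token_spamminess(spam_hits: int, ham_hits: int,
                     n_spam: int, n_ham: int) -> float:
    """Share of a token's relative frequency that comes from spam (0 = ham, 1 = spam)."""
    spam_freq = spam_hits / n_spam if n_spam else 0.0
    ham_freq = ham_hits / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.5  # never seen in training: neutral
    return spam_freq / (spam_freq + ham_freq)


# Hypothetical counts: [ABC] in 5 of 100 trained spams and in 200 of 200 hams.
p_abc = token_spamminess(spam_hits=5, ham_hits=200, n_spam=100, n_ham=200)  # ≈ 0.048
```

Because [ABC] shows up in every ham but only 5% of spam, its score stays close to 0, i.e. firmly on the ham side.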

 Just curious if I should just ignore spam that comes in via a mailing list,
 or start flagging it as junk for the Bayesit plugin.

No, you should mark every spam.
This way a Bayesian filter will also catch spam sent to a list
(because of the other words in it that belong to the Spam group).
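Here is how one ham-leaning token gets outvoted, using the classic naive-Bayes combination of per-token probabilities. The token values are made up, and whether Bayesit combines scores exactly this way is an assumption.

```python
import math


def combined_spam_probability(token_probs):
    """Naive-Bayes combination of per-token spam probabilities."""
    spam = math.prod(token_probs)
    ham = math.prod(1.0 - p for p in token_probs)
    return spam / (spam + ham)


# Hypothetical per-token values: [ABC] leans ham, the sales pitch leans spam.
list_spam = [0.05, 0.99, 0.98, 0.97]   # "[abc]", "viagra", "cheap", "unsubscribe"
list_ham = [0.05, 0.20, 0.10]          # "[abc]", "meeting", "agenda"

spam_score = combined_spam_probability(list_spam)  # close to 1
ham_score = combined_spam_probability(list_ham)    # close to 0
```

Real filters usually keep only the most "interesting" tokens (Graham used 15), clamp probabilities away from 0 and 1 so the denominator can't vanish, and sum logarithms instead of multiplying to avoid underflow on long messages.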


-- 
shinE!
GnuPG/PGP key: http://thequod.de/danielhahler.asc
lifted with The Bat! 2.05 Beta/14 on Windows XP Service Pack 1.





Re: Bayesit plugin and filters executing order

2004-04-06 Thread dAniel hAhler
Hello bats,

on Tue, 6. Apr 2004 at 10:24:26 -0700 Michael R Kizer wrote:

 I'm assuming that the Bayesit plugin runs against all incoming mail
 prior to any of the filters, so that's why I am seeing some of my
 mailing list messages tagged as SPAM.

Correct.

 I suppose I could add a bunch of entries to the Bayesit plugin's
 whitelist, but I hate duplicating what's already in my filters.

AFAIK Bayesit cannot (due to the plugin API) insert headers into the mail
(e.g. with a spam score). If that were possible, you could untick the
option in the Spam plugin config that moves mail into the Junk folder and
filter on those Bayesit headers instead.
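A sketch of that alternative using Python's standard email module: the classifier only records a score header, and the user's own filters decide where the mail goes. The header name X-Spam-Score, the 0.90 threshold and the [ABC] rule are assumptions for illustration; as noted, Bayesit can't actually do this through the plugin API.

```python
from email import message_from_string
from email.message import Message

SCORE_HEADER = "X-Spam-Score"  # hypothetical header name, not something Bayesit writes


def tag_with_score(msg: Message, score: float) -> None:
    # What a classifier could do if the plugin API allowed it: record, don't move.
    del msg[SCORE_HEADER]          # drop any earlier value (no error if absent)
    msg[SCORE_HEADER] = f"{score:.2f}"


def route(msg: Message, threshold: float = 0.90) -> str:
    # The user's own filter order decides: list filters first, spam score second.
    if "[ABC]" in (msg["Subject"] or ""):
        return "ABC list"
    if float(msg.get(SCORE_HEADER, "0")) >= threshold:
        return "Junk"
    return "Inbox"


m = message_from_string("Subject: [ABC] Meeting notes\n\nSee you Tuesday.")
tag_with_score(m, 0.95)   # a false positive from the classifier...
folder = route(m)         # ...but the list filter still wins: "ABC list"
```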
Nevertheless, you should retrain Bayesit with the mails it got wrong, and
you probably wouldn't notice those if your filters sorted them correctly
anyway. So with the current setup you are more or less forced to retrain,
and that's good for Bayesit's learning.
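Why retraining helps, in miniature: each corrected mail adds to the ham counts, which pulls a token like [abc] back toward the ham side. The class and method names are hypothetical, storing counts this way is an assumption about how Bayesit works, and the probability is the simplified Graham-style ratio of per-class token frequencies.

```python
from collections import Counter


class TokenStore:
    """Per-class token counts for a two-class Bayesian filter (hypothetical sketch)."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = Counter()

    def learn(self, tokens, label):
        # Count each distinct token once per message.
        self.tokens[label].update(set(tokens))
        self.messages[label] += 1

    def spamminess(self, token):
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        spam_freq = self.tokens["spam"][token] / n_spam if n_spam else 0.0
        ham_freq = self.tokens["ham"][token] / n_ham if n_ham else 0.0
        total = spam_freq + ham_freq
        return 0.5 if total == 0 else spam_freq / total


store = TokenStore()
store.learn(["[abc]", "cheap", "pills"], "spam")    # a list spam marked as junk
store.learn(["hello", "lunch"], "ham")
before = store.spamminess("[abc]")                  # 1.0: only ever seen in spam
store.learn(["[abc]", "meeting", "notes"], "ham")   # correcting a misfiled list mail
after = store.spamminess("[abc]")                   # drops to about 0.67
```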

I myself use POPFile, which uses the same (Bayesian) approach but is a
lot more useful, as it can have as many buckets as you want. E.g. I have
spam, english, german, admin and PGP.
Accuracy is 99.62% over 28293 mails, which is awesome.
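The multi-bucket idea can be sketched as a multinomial naive Bayes classifier that scores a message against every bucket and picks the highest. This is a generic textbook version with add-one smoothing, not POPFile's actual code; the bucket names come from the list above and the training words are made up. For scale, 99.62% of 28293 mails means roughly 107 or 108 were misclassified.

```python
import math
from collections import Counter, defaultdict


class BucketClassifier:
    """Multinomial naive Bayes over any number of buckets (generic sketch)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.doc_counts = Counter()              # bucket -> messages trained
        self.vocab = set()

    def train(self, words, bucket):
        self.word_counts[bucket].update(words)
        self.doc_counts[bucket] += 1
        self.vocab.update(words)

    def classify(self, words):
        total_docs = sum(self.doc_counts.values())
        best_bucket, best_score = None, -math.inf
        for bucket, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior plus log likelihood of each word, with add-one smoothing.
            score = math.log(self.doc_counts[bucket] / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


clf = BucketClassifier()
clf.train(["cheap", "pills", "buy"], "spam")
clf.train(["hello", "thanks", "meeting"], "english")
clf.train(["hallo", "danke", "bitte"], "german")
clf.train(["gnupg", "key", "signature"], "pgp")
```

Adding a bucket is just training on a new label; spam becomes one bucket among many rather than a yes/no decision.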


-- 
shinE!
GnuPG/PGP key: http://thequod.de/danielhahler.asc
lifted with The Bat! 2.05 Beta/14 on Windows XP Service Pack 1.


