Re: collaborative bayes bases

David A. Roth Fri, 18 Nov 2005 06:45:59 -0800

I see. That's a very good point about sharing the Bayes data within a different community.

Does anyone see a problem with a single user collecting spam (and ham) from various personal mailboxes, fed by different internet service providers, and running sa-learn on it?


David Roth
rothmail (at) comcast (dot) net

On Nov 18, 2005, at 8:45 AM, Anthony Peacock wrote:

Hi,

I actually think it is more to do with the fact that one person's
spam could be another person's ham.  If the mail streams and servers
are carrying messages for a community of users who receive (and want
to receive) similar types of email messages, I can't see any major
problem with using those emails to train Bayes.  However, if the
servers are processing email for two completely different user
communities their ideas of what is and isn't spam could be so
different that the Bayes stats become diluted.

For instance, I work for a Medical School, but in a heavily IT-based
department.  Some terms that may be considered pornographic for
someone working in banking could be perfectly acceptable in my
environment.
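
The dilution described above can be illustrated with a small sketch. It
assumes a simplified per-token ratio (the share of spam messages containing
a token versus the share of ham messages containing it), which is not
SpamAssassin's actual Bayes implementation. The function names and the tiny
corpora are hypothetical and made up for illustration only.

```python
from collections import Counter


def message_counts(messages):
    """Count, for each token, how many messages contain it at least once."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(token, spam, ham):
    """Simplified per-token spamminess; an assumption, not SpamAssassin's formula."""
    spam_rate = message_counts(spam)[token] / len(spam)
    ham_rate = message_counts(ham)[token] / len(ham)
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


# Hypothetical mail streams for two user communities.
medical_ham = [
    "breast cancer screening results",
    "breast tissue sample schedule",
    "lecture notes attached",
    "server upgrade tonight",
]
medical_spam = [
    "cheap pills online",
    "breast enlargement pills",
    "win money now",
    "cheap loans now",
]
bank_ham = [
    "quarterly loans report",
    "meeting agenda attached",
    "interest rate update",
    "server upgrade tonight",
]
bank_spam = [
    "breast enlargement pills",
    "hot breast pics",
    "cheap pills online",
    "win money now",
]

token = "breast"
p_medical = spam_probability(token, medical_spam, medical_ham)
p_bank = spam_probability(token, bank_spam, bank_ham)
p_combined = spam_probability(
    token, medical_spam + bank_spam, medical_ham + bank_ham
)

print(f"medical only: {p_medical:.2f}")
print(f"bank only:    {p_bank:.2f}")
print(f"combined:     {p_combined:.2f}")
```

With these made-up messages the token leans towards ham for the medical
stream, is strongly spammy for the banking stream, and lands in between
once the two streams are merged, so the combined value is a poor fit for
either community.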

If I am understanding this correctly... the concern is that the Bayes
data should match only the mail server on which the ham and spam were
received?

David Roth
rothmail (at) comcast (dot) net

On Nov 18, 2005, at 5:10 AM, qMax wrote:

In wiki://BayesInSpamAssassin it is said:
Do not train Bayes on different mail streams or public spam corpora.
These methods will mislead Bayes into believing certain tokens are
spammy or hammy when they are not.

Could you explain why this is so, and what could happen if Bayes is
trained on mail from several mail servers?

--
 qMax



--
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
"Computer  software  consists of  only  two  components:
ones and zeros, in roughly equal proportions.   All that is
required is to sort them into the correct order."
