On 31.05.2016 at 16:59, Antony Stone wrote:
On Tuesday 31 May 2016 at 15:32:49, Reindl Harald wrote:

On 31.05.2016 at 15:28, Antony Stone wrote:
2. You should be aware (*especially* if using this stuff as the basis of
a research project - any competent referee should pick up on something
like this) that SA works best when the emails it is asked to process are
from the same source as it has been trained with.  In other words, you
shovel real emails through a real mail server and train SA using this
spam and ham; you then use that trained SA to assess mail passing through
that same mail server, for the same users.  Anything significantly
varying from this is not going to work well, and is certainly not a good
test of how well SA works.

not true - i heard similar nonsense about "you can't re-use your MX bayes
database on a submission server" - i can and do, and it works like a charm

Oh!

I had read SA documentation such as
https://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html which contains
comments such as:

"The pros of Bayesian spam analysis:
Can greatly reduce false positives and false negatives.
 - It learns from your mail, so it is tailored to your unique e-mail flow."

"You're urged to avoid using a publicly available corpus (sample) - this must
be taken from YOUR mail server, if it is to be statistically useful.
Otherwise, the results may be pretty skewed."

If this sort of advice is incorrect, maybe a request should be raised with the
SA developers to update the official documentation?

that's all based on opinions - the only question is the quality of training. i don't base my decisions and what i say on some opinions on a website, but on a ton of accounts at both companies involved, which share one bayes database for inbound and outgoing mail

well, with the defaults of auto-learning those opinions may be true
