Hi,
Zimbra is working as it should and I won't replace it with other software
:-)

I started to deploy a new configuration, but when I discovered that the
auto-learn feature internal to SpamAssassin was (mistakenly) active, we
decided that the best solution is to zap the database and load only our own
corpus. I'm in the process of splitting the corpus into "training messages"
and "test messages" to check whether the training works, on another server,
of course....

I think that on the 29th we will proceed on the production server. I will
report the results.
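
If the tests look good, the trained db can be carried over with sa-learn's
backup/restore instead of re-learning on production (assuming the same
SpamAssassin version on both machines):

  # on the test server
  sa-learn --backup > bayes-backup.txt
  # on the production server
  sa-learn --clear
  sa-learn --restore bayes-backup.txt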

The Zimbra man loaded a big corpus of spam that arrived on some mailboxes,
and you can see the 1,000,000 nspam present in the db... I think I won't
let him load them again... Only "approved" spam must be used to feed the
Bayes engine...
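
If that unapproved batch can still be located, the messages could also be
unlearned selectively instead of zapping everything (the path below is a
placeholder):

  # remove a wrongly learned mailbox from the Bayes db
  sa-learn --forget --mbox /path/to/unapproved-spam.mbox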

Francesco

On Tue, May 28, 2019 at 7:48 PM Matus UHLAR - fantomas <uh...@fantomas.sk>
wrote:

> On 28.05.19 15:34, hg user wrote:
> >I did some more research, and I think I have to report the new findings
> >so that the thread can be useful to other readers.
> >
> >First:
> >0.000          0       5232          0  non-token data: nspam
> >0.000          0      70408          0  non-token data: nham
> >0.000          0     388070          0  non-token data: ntokens
> >The nspam and nham values are definitely the numbers of messages learned.
> >
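For anyone who wants to check their own counters, that block is the output
of:

  sa-learn --dump magic
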
> >Second:
> >I saw that nham increased every few seconds, and I discovered that
> >bayes_auto_learn was enabled!
> >My situation yesterday:
> >0.000          0    1042011          0  non-token data: nspam
> >0.000          0      66472          0  non-token data: nham
> >0.000          0     663479          0  non-token data: ntokens
> >My situation now:
> >0.000          0    1042049          0  non-token data: nspam
> >0.000          0      71228          0  non-token data: nham
> >0.000          0    1040661          0  non-token data: ntokens
> >
> >So at least I now know that the system is feeding the Bayes engine new
> >data, and that the results can therefore change over time.
> >
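For the record, once the db is rebuilt, auto-learning can be kept off with
a single line in local.cf (or tuned later via the
bayes_auto_learn_threshold_* settings):

  # /etc/mail/spamassassin/local.cf
  bayes_auto_learn 0
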
> >Third:
> >In 72_active.cf there are a lot of bayes_ignore_header directives, but
> >they don't include the ones added by my commercial antivirus. Should I
> >create a patch?
> >
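Rather than patching 72_active.cf (which sa-update would overwrite), the
extra headers can be ignored from local.cf; the header names below are
placeholders for whatever the antivirus actually adds:

  bayes_ignore_header X-MyAV-Status
  bayes_ignore_header X-MyAV-Report
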
> >Fourth:
> >I added a dbg statement to Bayes.pm, sub tokenize, to print the tokens
> >it extracts from the message.
> >I agree with some of them and disagree with others. I'd like to know if
> >there is some doc that explains why tokens are extracted this way (some
> >notes are in the source code).
> >I discovered that some words should probably be added to the stopwords
> >list, but there is no way to do that in a configuration file; I would
> >have to modify the SpamAssassin code directly...
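
For anyone who wants to reproduce the token dump, the line I added is
roughly the sketch below (in sub tokenize; the exact file and spot vary by
SA version, mine is Mail/SpamAssassin/Plugin/Bayes.pm):

  # Perl, inside sub tokenize: log each token as it is produced
  dbg("bayes: token: $token");

The output then shows up when running "spamassassin -D bayes < msg.eml".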
> >
> >
> >
> >To conclude:
> >I think the only way to proceed now is to nuke the Bayes db and start
> >from scratch:
> >- set up the Bayes configuration correctly
> >- double-check that the corpus is correctly classified
> >- run sa-learn
>
> Do you still use Zimbra? If so, have you configured Zimbra?
> Did you consult your Zimbra-man?
>
>
> >For the "set up the Bayes configuration correctly" step I welcome your
> >contributions :-) I have excluded all the headers of my antivirus and
> >the internal/external/trusted ones.
>
> --
> Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
> Warning: I wish NOT to receive e-mail advertising to this address.
> Warning: I do NOT want to receive any advertising mail at this address.
> Spam is for losers who can't get business any other way.
>
