On Fri, 26 Sep 2014, Matus UHLAR - fantomas wrote:
On 25.09.14 07:51, John Hardin wrote:
You are probably going to have to wipe and retrain your bayes database from
scratch using known-good (i.e. hand classified) corpora. I also suggest
turning off autolearn.
I'm not sure wiping BAYES is needed, unless training does not
He has autolearn running. Unless he has copies of the spams that were
learned as ham, there's no way to totally undo that short of wiping the
database and starting over from scratch.
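
For what it's worth, here is a rough sketch of what "wipe and start over" could look like when scripted. It only assumes the standard sa-learn options (--clear, --ham, --spam, --dump magic). The helper names and the corpus directories are made up for illustration, and it defaults to a dry run that just prints the commands. Turning off autolearn is a separate step: set "bayes_auto_learn 0" in local.cf.

```python
import subprocess


def retrain_commands(ham_dir, spam_dir):
    """Build the sa-learn calls for a wipe-and-retrain cycle.

    ham_dir and spam_dir are hypothetical paths to hand-classified corpora.
    """
    return [
        ["sa-learn", "--clear"],           # wipe the existing Bayes database
        ["sa-learn", "--ham", ham_dir],    # learn known-good ham
        ["sa-learn", "--spam", spam_dir],  # learn known-good spam
        ["sa-learn", "--dump", "magic"],   # show token/message counts afterwards
    ]


def run(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Directory names below are placeholders, not a recommended layout.
    run(retrain_commands("/var/spool/corpus/ham", "/var/spool/corpus/spam"))
```

Run it as the user that owns the Bayes database, and only pass dry_run=False once the printed commands look right for your setup.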
You *did* keep your initial Bayes training corpora, right?
This is a very good thing to have. Maybe at least keep all autolearned spam
and ham for some time, just for the possibility of retraining.
The critical part is to have base corpora of *correctly classified* (i.e.
manually reviewed) messages. If you're keeping copies of autolearned
messages (which will probably be quite a few) then you *need* to *manually
review* them before using them for retraining, otherwise you'll probably
end up simply rebuilding a mistrained database.
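
One possible way to find the messages that need that manual review, assuming the saved copies still carry the X-Spam-Status header that SpamAssassin adds by default (it includes a field such as autolearn=ham or autolearn=spam). The function names are hypothetical and this is only a sketch. It flags candidates for a human to look at; it does not decide anything by itself.

```python
import email
import re

# Matches the autolearn field in X-Spam-Status, e.g. "autolearn=ham".
# It does not match the separate "autolearn_force=no" field.
AUTOLEARN_RE = re.compile(r"\bautolearn=(\w+)")


def autolearn_verdict(raw_message):
    """Return the autolearn value ('ham', 'spam', 'no', ...) or None if absent."""
    msg = email.message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    match = AUTOLEARN_RE.search(status)
    return match.group(1) if match else None


def needs_review(raw_message):
    """True if the message was autolearned and so should be checked by hand."""
    return autolearn_verdict(raw_message) in ("ham", "spam")
```

Anything for which needs_review() is true would go into a review queue rather than straight into the retraining corpora.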
If you have users submitting FP/FN messages for training, and the admin
verifies them before training with them (which should be done unless the
judgement and responsibility of the user in question is trusted), that's a
good source for part of your base retraining corpora.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
The difference between ignorance and stupidity is that the stupid
desire to remain ignorant. -- Jim Bacon
-----------------------------------------------------------------------
848 days since the first successful private support mission to ISS (SpaceX)