John Hardin wrote:
> How varied is the character of your message traffic? Is manual learning
> an option, especially with larger autolearn thresholds?
What is this... "manual learning"... you speak of? <g>
Not really an option in the short term, although in the long term I'd
*like* to have a system similar to what I've mostly trained users to do
on the much smaller systems - forward misclassified mail to a suitable
role account as an attachment for manual processing (whitelist,
blacklist, feed to Bayes, write/adjust rules, etc). Of course, that
requires someone to *do* the manual processing.... :(
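
As a rough illustration of the "forward as an attachment" workflow described above, here is a minimal Python sketch that pulls the original messages back out of a forwarded report. It assumes the mail client attaches the original as a message/rfc822 part; the function name extract_forwarded is hypothetical and nothing here is part of SpamAssassin itself. The recovered bytes could then be reviewed by hand and passed to something like sa-learn --spam or sa-learn --ham.

```python
from email import message_from_bytes, policy


def extract_forwarded(raw: bytes) -> list[bytes]:
    """Return the raw bytes of each message attached to a forwarded report.

    Hypothetical helper: assumes originals arrive as message/rfc822 parts.
    """
    report = message_from_bytes(raw, policy=policy.default)
    originals = []
    stack = [report]
    while stack:
        part = stack.pop()
        if part.get_content_type() == "message/rfc822":
            # A message/rfc822 part holds exactly one attached message.
            # Do not descend further, so nested forwards stay intact.
            originals.append(part.get_payload(0).as_bytes())
        elif part.is_multipart():
            # Push children in reverse so they are visited in document order.
            stack.extend(reversed(part.get_payload()))
    return originals
```

Keeping the attachment intact matters because an inline forward rewrites the headers, which are part of what the filter actually scored.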
I've been taking my own FNs and feeding them back in; that's really the
only misclassified mail I have easy access to. No FPs noticed so far....
> Then at least you'd be able to reseed your bayes with a known-good corpus.
*nod* I've thought about exporting the Bayes database from the smaller
system and importing it into the cluster to see how accurate it is there.
"Tokens don't get expired according to my understanding of the expiry
algorithm" about sums up the immediate problem; overall filter accuracy
is pretty good on the whole.
-kgd