On 13-Mar-2009, at 21:21, decoder wrote:
John Hardin wrote:
If you want it to be dynamic, then the plugin could do the
appending. However, the model cannot be extended; that means that to
incorporate new lines, the whole model must be recalculated. So this
can't be done per message, but only maybe on a daily basis.
I don't see any need for the model to be dynamic. Periodic
recalculation of it should be just fine. I bet even daily
reprocessing will prove to be overzealous. Weekly, perhaps even
monthly.
That implies that people are indeed using bayes training, but it
might be a suitable idea. However, I don't think that FPs and FNs
spoil the SVM result anyway. SVMs are quite robust to outliers (which
FPs and FNs essentially are) and if their number is low compared to
the total amount of mail, the algorithm will have no problem
predicting them properly anyway :)
I'm thinking that FPs and FNs are a bayes problem anyway. This tool
needs to concentrate on seeing just what rules hit and building off
that. I'd go so far as to say that as far as SVM is concerned, there
is no such thing as a false positive or negative.
So if the dataset is sufficiently large but has _some_ wrongly
labeled points, the chances that the result is still what you wanted
to have are high :)
That makes sense, and is sorta what I was trying to say up above.
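
For what it's worth, the "append per message, recalculate the whole
model periodically" idea discussed above could be sketched roughly as
follows. This is only an illustration under assumptions: the class
name PeriodicModel, the train_fn callback and the weekly default are
all made up here, and nothing below is the plugin's actual code.

```python
from datetime import datetime, timedelta


class PeriodicModel:
    """Hypothetical helper: record rule hits per message, retrain in bulk."""

    def __init__(self, train_fn, max_age=timedelta(days=7)):
        # train_fn is an assumed callback: it takes every recorded line
        # and returns a freshly built model (e.g. an SVM trainer).
        self.train_fn = train_fn
        self.max_age = max_age
        self.lines = []         # all (rules_hit, is_spam) lines seen so far
        self.model = None
        self.trained_at = None

    def append(self, rules_hit, is_spam):
        # Cheap per-message step: only store which rules hit and the label.
        self.lines.append((frozenset(rules_hit), is_spam))

    def is_stale(self, now):
        # Stale if never trained, or if the last training is too old.
        return self.trained_at is None or now - self.trained_at >= self.max_age

    def maybe_retrain(self, now):
        # The model cannot be extended, so rebuild it from the whole data set.
        if not self.is_stale(now) or not self.lines:
            return False
        self.model = self.train_fn(list(self.lines))
        self.trained_at = now
        return True
```

The per-message cost is just the append; the expensive recalculation
only runs when maybe_retrain() is called (from cron, say) and the
model is older than max_age.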
--
Bart: That was the worst day of my life
Homer: That was the worst day of your life SO FAR.