On Thu, 15 Feb 2018 19:24:14 +0100
Reindl Harald wrote:

> On 15.02.2018 at 19:20, RW wrote:
> > On Thu, 15 Feb 2018 17:15:47 +0100
> > You are talking about ultra-rare tokens here, the chances of these
> > dominating a classification are negligible
>
> it is not - in 2015 i had to purge "in doubt" a few days of training
> because an unreasonable amount of ham was classified as BAYES_50 or
> even tagged instead of BAYES_00 and we talk about a bayes with around
> 100.000 samples in total where with your logic you would not expect
> to get biased within a few days - yes, that was training-mistakes for
> sure - but when you are able to bias a bayes with a few years of
> corpus within a few days your examples are wrong

I have no idea what you are talking about, how it's relevant, or what
you did wrong, but it doesn't trump mathematics.