On 27.04.2013 04:54, Karsten Bräckelmann wrote:
> And it is good advice to keep the initial training corpora to a
> ratio roughly resembling your ham/spam ratio, or maybe 1/1. (At this
> point, we're approaching voodoo. Learning 10 times more ham than spam is
> most likely to be a bad choice, though.)

I don't see any problem with having a corpus like this:
    0.000          0      28252          0  non-token data: nspam
    0.000          0     187579          0  non-token data: nham

That is roughly 6.6 times as much ham as spam, and I have no problems
with Bayes whatsoever.

-- 
There's small choice in rotten apples.
  -- William Shakespeare, "The Taming of the Shrew"