27.04.2013 04:54, Karsten Bräckelmann wrote:
> And it is good advice to keep the initial training corpora to a
> ratio roughly resembling your ham/spam ratio, or maybe 1/1. (At this
> point, we're approaching voodoo. Learning 10 times more ham than spam is
> most likely to be a bad choice, though.)
I don't see any problem with having a corpus like this:

0.000          0      28252          0  non-token data: nspam
0.000          0     187579          0  non-token data: nham

I have no problems with Bayes whatsoever.
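
For anyone who wants to check their own ratio, the two counters above can be turned into a number with a few lines of Python. This is only a rough sketch: it assumes the lines are in the format printed by `sa-learn --dump magic`, and the function name `ham_spam_counts` is made up for illustration.

```python
def ham_spam_counts(dump_text):
    """Pull the nspam/nham message counts out of Bayes dump lines.

    Assumes each line ends with the counter name and carries the
    count in the third whitespace-separated column, as in the
    output quoted above.
    """
    counts = {}
    for line in dump_text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[-1] in ("nspam", "nham"):
            counts[fields[-1]] = int(fields[2])
    return counts


dump = """\
0.000          0      28252          0  non-token data: nspam
0.000          0     187579          0  non-token data: nham
"""

counts = ham_spam_counts(dump)
ratio = counts["nham"] / counts["nspam"]
print(f"ham/spam ratio: {ratio:.2f}")  # prints "ham/spam ratio: 6.64"
```

So this database has learned roughly 6.6 times more ham than spam.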

-- 

There's small choice in rotten apples.
                -- William Shakespeare, "The Taming of the Shrew"

