On Wed, 20 Jan 2016 22:21:49 -0800 Marc Perkel wrote:
> OK - Just to show you this isn't Bayesian - see if you can do this.
>
> Here is a list of 5505874 words and phrases used in the subject line
> of HAM and never seen in the subject line of SPAM
>
> http://www.junkemailfilter.com/data/subject-ham.txt
>
> Here is a list of 3494938 words and phrases used in the subject line
> of SPAM and never seen in the subject line of HAM
>
> http://www.junkemailfilter.com/data/subject-spam.txt
>
> Hope you understand it now. Not Bayesian!!!!

The only difference between

  "ambulatory care" -> only in ham
  "aall cards" -> only in spam

and

  "ambulatory care" occurs 16 times in ham and 0 times in spam
  "aall cards" occurs 0 times in ham and 3 times in spam

is that you have discarded the count information.
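
To make that concrete, here is a minimal sketch in Python. It is only an illustration, not the code behind the published lists: the function name exclusive_lists and the "meeting notes" entry are hypothetical, and only the counts 16 and 3 come from the example above. It shows that the "only in ham" and "only in spam" lists fall out of ordinary per-class counts once the numbers are thrown away.

```python
from collections import Counter


def exclusive_lists(ham_counts, spam_counts):
    """Derive (ham_only, spam_only) phrase sets from per-class counts.

    A phrase is "only in ham" when its ham count is positive and its
    spam count is zero, and vice versa. The counts themselves are
    dropped in the result.
    """
    ham_only = {
        phrase for phrase, n in ham_counts.items()
        if n > 0 and spam_counts.get(phrase, 0) == 0
    }
    spam_only = {
        phrase for phrase, n in spam_counts.items()
        if n > 0 and ham_counts.get(phrase, 0) == 0
    }
    return ham_only, spam_only


# Counts for the two quoted phrases are taken from the message above.
# "meeting notes" is a made-up phrase seen in both classes, so it
# should land in neither list.
ham_counts = Counter({"ambulatory care": 16, "meeting notes": 4})
spam_counts = Counter({"aall cards": 3, "meeting notes": 1})

ham_only, spam_only = exclusive_lists(ham_counts, spam_counts)
print(sorted(ham_only))
print(sorted(spam_only))
```

Going the other way is not possible: from the two lists alone you cannot tell 16 occurrences from 1, which is exactly the information that was discarded.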