On Wed, 20 Jan 2016 22:21:49 -0800
Marc Perkel wrote:

> OK - Just to show you this isn't Bayesian - see if you can do this.
> 
> Here is a list of 5505874 words and phrases used in the subject line
> of HAM and never seen in the subject line of SPAM
> 
> http://www.junkemailfilter.com/data/subject-ham.txt
> 
> Here is a list of 3494938 words and phrases used in the subject line
> of SPAM and never seen in the subject line of HAM
> 
> http://www.junkemailfilter.com/data/subject-spam.txt
> 
> Hope you understand it now. Not Bayesian!!!!


The only difference between

  "ambulatory care" -> only in ham
  "aall cards"      -> only in spam

and

  "ambulatory care"  occurs 16 times in ham and 0 times in spam
  "aall cards"       occurs  0 times in ham and 3 times in spam

is that you have discarded the count information.
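
To make that concrete, here is a minimal Python sketch, not anyone's actual implementation. It assumes a per-phrase table of (ham count, spam count); the names COUNTS, exclusive_lists and spam_fraction are hypothetical, the "free trial" row is invented for illustration, and add-one smoothing is just one common way to turn counts into an estimate. The two lists fall out of the table by testing for zero, while the counts also say how much evidence sits behind each entry.

```python
# Hypothetical (ham_count, spam_count) table keyed by subject phrase.
# The first two rows use the counts quoted above; the third is invented.
COUNTS = {
    "ambulatory care": (16, 0),
    "aall cards": (0, 3),
    "free trial": (2, 40),
}


def exclusive_lists(counts):
    """Reduce the count table to 'only in ham' / 'only in spam' lists."""
    ham_only = sorted(p for p, (ham, spam) in counts.items() if ham > 0 and spam == 0)
    spam_only = sorted(p for p, (ham, spam) in counts.items() if spam > 0 and ham == 0)
    return ham_only, spam_only


def spam_fraction(ham, spam, k=1.0):
    """Add-k smoothed fraction of a phrase's occurrences that were in spam."""
    return (spam + k) / (ham + spam + 2 * k)


if __name__ == "__main__":
    ham_only, spam_only = exclusive_lists(COUNTS)
    print("only in ham: ", ham_only)
    print("only in spam:", spam_only)
    for phrase, (ham, spam) in COUNTS.items():
        print(f"{phrase!r}: ham={ham} spam={spam} "
              f"spam fraction={spam_fraction(ham, spam):.3f}")
```

With add-one smoothing, "ambulatory care" comes out at 1/18 and "aall cards" at 4/5, so the 16 ham sightings count for more than the 3 spam sightings. The two lists treat both entries as equally certain, and a phrase seen on both sides, like the invented "free trial", appears in neither list even though its counts still carry evidence.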
