On Aug 22, 2016, at 10:44 AM, Marc Perkel <supp...@junkemailfilter.com> wrote:
> On 08/22/16 09:06, Dianne Skoll wrote:
>> On Mon, 22 Aug 2016 09:03:38 -0700 Marc Perkel <supp...@junkemailfilter.com> wrote:
>>> The ones that are the same are of no interest. Only where it matches one side and not the other.
>> But... but... that's exactly like Bayes if you throw out tokens whose observed probability is not 0 or 1. Also, in your list of tokens, they are all phrases ranging from 1 to 4 words, and that's why you get good results. Multiword Bayes is just as good, and I know that from experience.
> This is nothing like Bayes. Bayes is creating a mental block. When I describe it to people who don't know Bayes, they immediately get it. If I describe it to people who know Bayes, they get confused. Bayes is a probability spectrum based on a frequency match on both sets. That's not even close to what I'm doing.

I think you've copied and pasted this same paragraph half a dozen times now, and the list has tried its best to accommodate your statement that "Bayes is creating a mental block," asking you pertinent questions that either went unanswered or, when answered, produced conflicting statements; and when pressed, what you are doing turned out to be (at best) a slightly modified version.

However, I find the statement "When I describe it to people who don't know bayes they immediately get it" the most telling of all. Of course people who don't know the probability theory will look at what you are doing and go "Wow!!! This is great!!" BECAUSE THEY DON'T KNOW. People who do know will, obviously, recognize it for what it is, and you can claim as much as you like that it's NOT, but at the end of the day, if it looks like a rose and smells like a rose, then no matter what you call it, 'tis still a rose!

All you have to do is READ the Process section of the following link to see exactly how similar your explanation is (save one factor, which is using phrases vs. words), which has already been explained as a feature of SA using multi-word tokens: https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering

> Also - some of what I'm doing is all combinations, not just sequential. So it's like a system that writes and scores its own rules. I just throw data at it and it classifies it. The real magic is the feedback learning. As it identifies ham, it learns new words and phrases that then match email from other people. So it learns how normal people speak, it learns how spammers speak, and it identifies the DIFFERENCES between the two. And it's completely automated.
>
> --
> Marc Perkel - Sales/Support
> supp...@junkemailfilter.com
> http://www.junkemailfilter.com
> Junk Email Filter dot com
> 415-992-3400
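For anyone following along who hasn't read the Wikipedia "Process" section, here is a minimal sketch of what "multiword Bayes" with 1-to-4-word phrase tokens looks like, including Dianne's point about discarding tokens whose observed probability sits near 0.5 (the ones that "match both sides"). All names, corpora, and the 0.2 cutoff below are illustrative assumptions, not anyone's actual filter:

```python
from collections import Counter
import math

def phrase_tokens(text, max_n=4):
    """Emit every sequential 1..max_n-word phrase ('multiword token')."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def train(messages):
    """Count, per phrase token, how many messages contain it."""
    counts = Counter()
    for msg in messages:
        counts.update(set(phrase_tokens(msg)))
    return counts

def spam_score(msg, spam_counts, ham_counts, n_spam, n_ham, cutoff=0.2):
    """Combine per-token spam probabilities, skipping 'uninteresting'
    tokens near 0.5, i.e. those that match both corpora about equally."""
    logit = 0.0
    for tok in set(phrase_tokens(msg)):
        p_spam = (spam_counts[tok] + 1) / (n_spam + 2)  # Laplace smoothing
        p_ham = (ham_counts[tok] + 1) / (n_ham + 2)
        p = p_spam / (p_spam + p_ham)
        if abs(p - 0.5) > cutoff:  # keep only discriminating tokens
            logit += math.log(p / (1 - p))
    return 1 / (1 + math.exp(-logit))  # back to a 0..1 probability
```

With a toy corpus (spam containing "cheap pills", ham containing "meeting at noon"), a message full of spammy phrases scores above 0.5 and a hammy one below, which is all the "it matches one side and not the other" behavior amounts to.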
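As for "all combinations, not just sequential": that remark would change only the tokenizer, not the underlying statistics. A hypothetical variant that pairs non-adjacent words (my own illustration, not Marc's actual tokenizer) could be as simple as:

```python
from itertools import combinations

def combination_tokens(text, max_words=2):
    """Unordered word-combination tokens, not just adjacent phrases.
    Purely illustrative; combination counts grow combinatorially,
    so real systems would need a small max_words or pruning."""
    words = sorted(set(text.lower().split()))
    toks = set(words)
    for n in range(2, max_words + 1):
        for combo in combinations(words, n):
            toks.add(" ".join(combo))
    return toks
```

Feed these tokens into the same per-token probability bookkeeping and you still have a naive-Bayes-style classifier; only the feature set differs.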