It may come down to my understanding of Bayes and its tokens. I'm also
having a bit of a problem explaining this concept on paper...

I see this as adding an extra layer to Bayes:

Consider the following 2 basic emails:

Mail 1:
Viagra

Mail 2:
V1agra


With Bayes:

Mail 1:
<token 1>

Mail 2:
<token 2>

With Concepts & Bayes:

Mail 1:
<token 1>
<meds>

Mail 2:
<token 2>
<meds>

---

So without Concepts:

Mail 1 comes into the platform, is tokenized (token1) and is classified
and learnt as spam.
Mail 2 comes into the platform, is tokenized (token2) and has no common
tokens with Mail 1, so no association is made.

With Concepts:

Mail 1 comes into the platform, is tokenized (token1 & meds) and is
classified and learnt as spam.
Mail 2 comes into the platform, is tokenized (token2 & meds) and shares
the "meds" token already associated with Mail 1, so an association is made.
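
As a rough sketch of what I mean (Python, purely for illustration and not
the plugin's actual code; the CONCEPTS table, its regex and the tokenize()
helper are made-up names, and real Bayes tokenization is more involved):

```python
import re

# Hypothetical concept table: concept name -> pattern that triggers it.
# The pattern is an illustration only, not a rule taken from the plugin.
CONCEPTS = {
    "meds": re.compile(r"v[i1]agra", re.IGNORECASE),
}

def tokenize(body, use_concepts=False):
    """Return the set of tokens Bayes would see for a mail body."""
    # Plain tokens: lower-cased whitespace-separated words.
    tokens = {word.lower() for word in body.split()}
    if use_concepts:
        # Add one extra token per concept whose pattern matches the body.
        for name, pattern in CONCEPTS.items():
            if pattern.search(body):
                tokens.add("<" + name + ">")
    return tokens

mail1, mail2 = "Viagra", "V1agra"

# Without concepts the two mails share no tokens.
shared_plain = tokenize(mail1) & tokenize(mail2)

# With concepts both mails carry the same extra "<meds>" token.
shared_concepts = (tokenize(mail1, use_concepts=True)
                   & tokenize(mail2, use_concepts=True))

print(shared_plain)     # set()
print(shared_concepts)  # {'<meds>'}
```

So once Mail 1 has been learnt as spam, Mail 2 arrives with one token
("<meds>") that Bayes has already seen in spam, instead of none.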

Does this make sense - am I right in my assumptions?

Paul

On 25/05/16 09:02, Merijn van den Kroonenberg wrote:
>> With David's help I have tracked down the problem(s). Version 0.02 is
>> up. Would be interested to hear your thoughts - even if just
>> theoretical - about the effect on the Bayes DB.
> Just in theory, I am curious what part of the Bayes filter you hope to
> improve? I think you are not adding any *new* information to the e-mail;
> your concepts are based purely on the mail content, right?
>
> It seems you just overpower some tokens a bit more, but I am not sure
> your concepts are useful for a Bayes filter. A Bayes filter especially
> would not need this, I would say. Maybe the concepts would be useful to
> humans or to rules written by humans.

Paul
--
Paul Stead
Systems Engineer
Zen Internet


