It may come down to my understanding of Bayes and its tokens. I'm also
having a bit of a problem explaining this concept on paper...
I see this as adding an extra layer to Bayes:
Consider the following 2 basic emails:
Mail 1:
Viagra
Mail 2:
V1agra
With Bayes:
Mail 1:
<token 1>
Mail 2:
<token 2>
With Concepts & Bayes:
Mail 1:
<token 1>
<meds>
Mail 2:
<token 2>
<meds>
---
So without Concepts:
Mail 1 comes into the platform, is tokenized (token1) and is classified
and learnt as spam.
Mail 2 comes into the platform, is tokenized (token2) and has no tokens
in common with Mail 1, so no association is made.
With Concepts:
Mail 1 comes into the platform, is tokenized (token1 & meds) and is
classified and learnt as spam.
Mail 2 comes into the platform, is tokenized (token2 & meds) and shares
the "meds" token already learnt from Mail 1, so an association is made.
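
To make the comparison concrete, here is a minimal Python sketch of the
idea. It is not SpamAssassin's actual Bayes tokenizer or the plugin's
code: the CONCEPTS table, the regex and the tokenize() helper are all
hypothetical names made up for illustration, and whitespace splitting
stands in for real tokenization.

```python
import re

# Hypothetical concept table: concept tag -> pattern matching obfuscated spellings.
# (Assumption for illustration only; not taken from any real rule set.)
CONCEPTS = {
    "meds": re.compile(r"v[i1!|]agra", re.IGNORECASE),
}

def tokenize(body, use_concepts=False):
    """Return the set of tokens a Bayes learner would see for this mail."""
    # Stand-in tokenizer: lowercase, whitespace-separated words.
    tokens = {word.lower() for word in body.split()}
    if use_concepts:
        # Add one extra token per concept whose pattern matches the body.
        for name, pattern in CONCEPTS.items():
            if pattern.search(body):
                tokens.add(f"<{name}>")
    return tokens

mail1 = "Viagra"
mail2 = "V1agra"

# Tokens the two mails have in common, without and with concepts.
shared_plain = tokenize(mail1) & tokenize(mail2)
shared_concepts = tokenize(mail1, True) & tokenize(mail2, True)

print(shared_plain)     # set()
print(shared_concepts)  # {'<meds>'}
```

Without concepts the two mails share nothing, so learning Mail 1 as spam
tells Bayes nothing about Mail 2. With concepts they share the <meds>
token, which is the association described above.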
Does this make sense? Am I right in my assumptions?
Paul
On 25/05/16 09:02, Merijn van den Kroonenberg wrote:
>> With David's help I have tracked down the problem(s). Version 0.02 is
>> up. Would be interested to hear your thoughts - even if just theoretical
>> about the effect on the Bayes DB.
> Just in theory, I am curious what part of the Bayes filter you hope to
> improve? I think you are not adding any *new* information to the e-mail,
> your concepts are based purely on the mail content, right?
> It seems you just overpower some tokens a bit more, but I am not sure if
> your concepts are useful for a Bayes filter. Especially a Bayes filter
> would not need this, I would say. Maybe the concepts would be useful to
> humans or rules written by humans.
Paul
--
Paul Stead
Systems Engineer
Zen Internet