Hi guys,

Based on some information from others on the list, I have put together
a plugin for SA which canonicalises an email into its basic "concepts".
Concepts are converted to tags, which Bayes can use as tokens to further
help identify spammy/hammy characteristics.

Here are some examples of tags from some emails today:

---8<---
X-SA-Concepts: experience regards money optout time-ref dear great home
request member enjoy woman-adj important online click all-rights
email-adr please price best hot-adj
X-SA-Concepts: experience contact optout winner time-ref survey dear
home privacy prize store thankyou important click gift chance please
X-SA-Concepts: google law search-eng optout amazing order facebook
goodtime privacy lotsofmoney request enjoy details service partner
linkedin twitter trust contact time-ref great online click shop
email-adr please customer newsletter news
X-SA-Concepts: photos view-online money contact optout time-ref cost
reply2me service details online click please
X-SA-Concepts: friend hotwords trust experience regards contact time-ref
medical woman drugs consultant pill mailto woman-adj secret health earn
email-adr please security hot-adj day-of-week
X-SA-Concepts: https mailto re euros regards money youtube invoice
email-adr facebook best hair
---8<---
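
For anyone who wants a feel for the mechanics without reading the
plugin, here is a rough sketch in Python (not the plugin's own code) of
how text might be reduced to tags like those above. The CONCEPT_RULES
table, its regular expressions and the function names are all
assumptions made up for illustration; the plugin's real rules are not
reproduced here.

```python
import re

# Hypothetical concept rules (illustration only, not the plugin's rules):
# concept name -> regular expressions that trigger it.
CONCEPT_RULES = {
    "optout": [r"\bunsubscribe\b", r"\bopt[\s-]?out\b"],
    "money": [r"\bmoney\b", r"\bcash\b", r"[$£€]\s?\d"],
    "click": [r"\bclick\b"],
    "please": [r"\bplease\b"],
    "day-of-week": [r"\b(?:mon|tues|wednes|thurs|fri|satur|sun)day\b"],
}


def extract_concepts(text):
    """Return the sorted list of concept names whose patterns match text."""
    found = set()
    for concept, patterns in CONCEPT_RULES.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            found.add(concept)
    return sorted(found)


def concepts_header(text):
    """Format the matched concepts as an X-SA-Concepts style header line."""
    return "X-SA-Concepts: " + " ".join(extract_concepts(text))


body = "Please click here to claim your cash by Friday, or unsubscribe."
print(concepts_header(body))
# X-SA-Concepts: click day-of-week money optout please
```

Many different surface forms ("unsubscribe", "opt out", "opt-out")
collapse to one tag, which is the canonicalisation step.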

This plugin essentially adds an extra layer between the raw input
characteristics and the recognition types. It clusters different
characteristics into a more generic type, in effect giving Bayes
something closer to a two-layer neural network approach.

When combined with Bayes learning, these email semantics (or Concepts)
sit alongside the many other characteristics of that email, which can
then be compared against the email that came before it.
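
To show why concept tokens can help, here is a toy sketch in Python.
It is a deliberately simplified naive Bayes with add-one smoothing and
equal class priors, not SpamAssassin's actual Bayes implementation, and
the ToyBayes class, the "concept:" prefix and the training data are all
invented for illustration.

```python
import math
from collections import Counter


def tokens(words, concepts):
    # Prefix concept tags so they cannot collide with ordinary words.
    return set(words) | {"concept:" + c for c in concepts}


class ToyBayes:
    """Toy classifier for illustration; NOT how SA's Bayes works internally."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def learn(self, label, toks):
        # Count each token once per message, and count the message itself.
        self.counts[label].update(toks)
        self.totals[label] += 1

    def spam_probability(self, toks):
        # Sum per-token log-likelihood ratios, using add-one smoothing.
        log_odds = 0.0
        for t in toks:
            p_spam = (self.counts["spam"][t] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][t] + 1) / (self.totals["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


bayes = ToyBayes()
bayes.learn("spam", tokens(["cheap", "pills"], ["drugs", "optout"]))
bayes.learn("spam", tokens(["meds", "discount"], ["drugs", "optout"]))
bayes.learn("ham", tokens(["meeting", "agenda"], ["day-of-week"]))
bayes.learn("ham", tokens(["lunch", "tomorrow"], ["time-ref"]))

# None of these words were seen in training, but the concepts were.
words_only = tokens(["tablets", "bargain"], [])
with_concepts = tokens(["tablets", "bargain"], ["drugs", "optout"])
print(bayes.spam_probability(words_only))     # no evidence either way
print(bayes.spam_probability(with_concepts))  # concepts push it towards spam
```

With words alone the unseen message gives the classifier nothing to go
on; with the concept tokens added, the same message scores as spammy
because the concepts were learned from earlier mail.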

https://github.com/fmbla/spamassassin-concepts

I'd be really interested to hear your feedback/thoughts on this system
and its approach.

Paul
--
Paul Stead
Systems Engineer
Zen Internet
