On Tue, 17 Nov 2015, Reindl Harald wrote:

On 17.11.2015 at 05:15, Eric Abrahamsen wrote:
 I used "sa-learn --dump magic --dbpath ...." on several of my virtual
 users, and it's hard to tell what's going on -- they seem to have their
 own databases, but most have little or nothing in them, which makes me
 think the script is not actually recording the learning properly

Per-user Bayes doesn't work for most sites, simply because you need enough ham *and* spam to get it working properly, and most users either don't care enough or train it wrong (for example, moving newsletters they subscribed to, and are too lazy to unsubscribe from, into the spam folder).

A hand-trained site-wide Bayes works much better and doesn't demand that *every* user collect enough samples and understand how it works.
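
To judge whether a given database has "enough ham *and* spam", one option is to read the message counts out of "sa-learn --dump magic". The following is only a rough Python sketch: the sample text and its numbers are made up, the column layout is assumed to match what sa-learn prints, and the function names are hypothetical. The threshold of 200 reflects the default values of bayes_min_spam_num and bayes_min_ham_num; adjust it if your configuration overrides them.

```python
import re

# Defaults for bayes_min_spam_num / bayes_min_ham_num (assumed unchanged
# in local configuration).
MIN_SPAM = 200
MIN_HAM = 200

# Matches lines such as:
#   0.000          0         12          0  non-token data: nspam
MAGIC_LINE = re.compile(
    r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (nspam|nham|ntokens)\s*$"
)


def parse_dump_magic(text):
    """Extract nspam, nham and ntokens from `sa-learn --dump magic` output."""
    counts = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def has_enough_training(counts, min_spam=MIN_SPAM, min_ham=MIN_HAM):
    """True only if both the spam and the ham counts reach their minimums."""
    return (counts.get("nspam", 0) >= min_spam
            and counts.get("nham", 0) >= min_ham)


# Illustrative sample only; real output contains further lines.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0         12          0  non-token data: nspam
0.000          0        340          0  non-token data: nham
0.000          0       9876          0  non-token data: ntokens
"""

counts = parse_dump_magic(SAMPLE)
print(counts, has_enough_training(counts))
```

In this made-up sample the user has plenty of ham but only 12 spam, so the check fails even though the database is not empty.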

+1

Being blunt, your userbase can be broadly divided into "clueful" and "non-clueful". You probably don't have many clueful users whose judgement and responsibility you trust, but the submissions of those you do have could potentially be trained without review.

You can also set up shared misclassified-ham and misclassified-spam folders that non-clueful users can copy messages to, but you would need to review those submissions before moving them to the *real* training corpora.

Always keep your training corpora.
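
As a rough illustration of that review-then-train flow, here is a minimal Python sketch. The directory layout and the function names are hypothetical, and it assumes messages are stored one per file; the only SpamAssassin pieces it relies on are the "sa-learn --spam" and "sa-learn --ham" options given a directory. Reviewed messages are moved into a permanent corpus directory and training runs against that directory, so the corpus is retained.

```python
import shutil
import subprocess
from pathlib import Path


def promote(message, corpus_dir):
    """Move a message you have reviewed into the permanent corpus directory."""
    message = Path(message)
    corpus_dir = Path(corpus_dir)
    corpus_dir.mkdir(parents=True, exist_ok=True)
    target = corpus_dir / message.name
    if target.exists():
        # Refuse to overwrite an existing corpus message.
        raise FileExistsError(target)
    shutil.move(str(message), str(target))
    return target


def sa_learn_command(kind, corpus_dir):
    """Build the sa-learn invocation for a whole corpus directory."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, str(corpus_dir)]


def train(kind, corpus_dir):
    """Run sa-learn over the retained corpus (requires sa-learn on PATH)."""
    subprocess.run(sa_learn_command(kind, corpus_dir), check=True)


# Hypothetical usage, after reviewing a user's submission by hand:
#   promote("/srv/mail/shared/missed-spam/1447732800.eml", "/srv/corpus/spam")
#   train("spam", "/srv/corpus/spam")
```

Because the corpus directories are kept rather than emptied after training, the database can be rebuilt from them later if it is ever damaged or mistrained.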

(This model falls apart at the Large Company and ISP level, of course...)

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
