On Wed, 5 May 2021, [email protected] wrote:

> Hello
>
> I'm new to masscheck, nothing uploaded yet, and have two questions

Welcome aboard!

> As my spam corpus comes from my traps and my ham "just" from my personal
> addresses there is quite an imbalance between my spam- and ham corpus
> (300 ham and several k's of spam). Is such an imbalance a problem for
> reliable masscheck?

"Reliable"? No, the balance doesn't affect reliability. What affects reliability is the accuracy of the classification of the messages in your corpora - ham really needs to be *ham*. Misclassification has a greater impact than a poor ratio. Spend some time making sure your corpora are correctly classified.

That said, what we really need is ham in non-English languages. If there's any way you can get more good (accurately classified) non-English ham, that would be the greatest benefit.

Your masscheck corpora don't leave your machine; only the rule hit stats get uploaded, so it's not a potential privacy violation (or not much of one). Do you know anyone (perhaps family members) who would trust you with a copy of their ham emails to add to your corpus?

Is your ham corpus limited to what you've used to train Bayes? Or do you really get that little email? Put more in. About the only properly-classified ham I *wouldn't* put into masscheck corpora would be emails discussing spam (e.g. the SA users list is a big no-no).
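
One way to keep spam-discussion mail out of a ham corpus could look something like the following. This is only a rough sketch, not part of masscheck itself: it assumes the ham is available as raw message bytes, that list traffic carries a List-Id header, and that the SA users list identifies itself as users.spamassassin.apache.org. The names EXCLUDED_LIST_IDS, discusses_spam and split_ham are made up for illustration.

```python
from email import message_from_bytes
from email.message import Message

# Assumption: substrings of List-Id values for lists that discuss spam.
# Adjust to whatever lists actually show up in your own mail.
EXCLUDED_LIST_IDS = ("users.spamassassin.apache.org",)


def discusses_spam(msg: Message, excluded=EXCLUDED_LIST_IDS) -> bool:
    """Return True if the message's List-Id matches an excluded list."""
    list_id = str(msg.get("List-Id", "")).lower()
    return any(needle in list_id for needle in excluded)


def split_ham(raw_messages):
    """Partition raw message bytes into (keep, skip) lists."""
    keep, skip = [], []
    for raw in raw_messages:
        msg = message_from_bytes(raw)
        (skip if discusses_spam(msg) else keep).append(raw)
    return keep, skip
```

Anything that lands in the skip list would still deserve a quick look by hand before being dropped, since a header match says nothing about whether the rest of the corpus is correctly classified.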


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 [email protected]                         pgpk -a [email protected]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Are you a mildly tech-literate politico horrified by the level of
  ignorance demonstrated by lawmakers gearing up to regulate online
  technology they don't even begin to grasp? Cool. Now you have a
  tiny glimpse into a day in the life of a gun owner.   -- Sean Davis
-----------------------------------------------------------------------
 3 days until the 76th anniversary of VE day
