On Tue, 14 Aug 2018, micah anderson wrote:

John Hardin <jhar...@impsec.org> writes:

On Tue, 14 Aug 2018, micah anderson wrote:

but how can I tell how many messages are part of the corpus?

As RW said, hover over the percentages.

Thanks.

Also, the percentages seem very low: 1.5192% Spam, and .0005%
Ham... 1.5% seems low to me to be adding 3.5 score to this rule, but
what do I know... which is why I'm asking.

It's not so much the raw amount of spam it hits, it's that it hits spam
that few other rules hit, or that it hits spam that other rules hit but
that doesn't score high enough with those other rules.
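
One rough way to put the two percentages quoted above side by side is the S/O ratio that comes up later in this thread. The sketch below assumes S/O is computed as spam% / (spam% + ham%), which is my understanding of how the rule QA reports derive it; treat the formula and the function name `so_ratio` as assumptions rather than the project's actual code.

```python
def so_ratio(spam_pct: float, ham_pct: float) -> float:
    """Assumed S/O formula: share of the rule's (percentage-normalized) hits that are spam."""
    return spam_pct / (spam_pct + ham_pct)

# Percentages quoted earlier in the thread for this rule.
spam_pct = 1.5192
ham_pct = 0.0005

so = so_ratio(spam_pct, ham_pct)
print(f"S/O = {so:.4f}")
```

Under that assumption, a rule can hit only a small slice of the spam corpus and still be almost never wrong when it does hit.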

You also want to look at the score-map section when evaluating a rule.

Is there an explanation of the score-map section somewhere?

For this one it says:

 scoremap  ham:  0  33.33%    1 *************
 scoremap  ham:  1  66.67%    2 **************************
 scoremap spam:  1   0.08%   15
 scoremap spam:  3   0.61%  121
 scoremap spam:  4  90.24% 17791 ************************************
 scoremap spam:  5   2.69%  531 *
 scoremap spam:  6   4.54%  896 *
 scoremap spam:  7   1.10%  217
 scoremap spam:  8   0.26%   52
 scoremap spam:  9   0.40%   79
 scoremap spam: 10   0.01%    2
 scoremap spam: 11   0.05%    9
 scoremap spam: 14   0.01%    2

What are these columns and how can I interpret it?

ham/spam: which class of message the rule hit

The number after ham/spam is the score, in points, that the message earned. Unfortunately I don't know offhand whether or not that includes *this rule*; I'd have to go digging in the code to determine that. I suspect it's the total score including this rule. I also don't recall offhand which of the four possible scoresets the score here is based on. It may be the non-net scoreset for the regular weekly runs and the net scoreset for the net run on the weekend, but I don't know whether it's the Bayes or non-Bayes variant.

The percentage should be obvious; the asterisks are a visual representation of it.

The final number is the total number of messages that hit at that score.

For example, this rule hit 17791 spams scored at 4 points, which was 90.24% of the total spam hits.
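
The column reading above can be checked against the quoted output. This is a minimal sketch, not the tool's own parser: the regular expression and the helper names `parse_scoremap` and `recomputed_percentages` are made up for illustration, and it assumes each percentage is the row's count divided by the total for that class (ham or spam).

```python
import re
from collections import defaultdict

# Scoremap lines copied from the message above.
SCOREMAP = """\
scoremap  ham:  0  33.33%    1 *************
scoremap  ham:  1  66.67%    2 **************************
scoremap spam:  1   0.08%   15
scoremap spam:  3   0.61%  121
scoremap spam:  4  90.24% 17791 ************************************
scoremap spam:  5   2.69%  531 *
scoremap spam:  6   4.54%  896 *
scoremap spam:  7   1.10%  217
scoremap spam:  8   0.26%   52
scoremap spam:  9   0.40%   79
scoremap spam: 10   0.01%    2
scoremap spam: 11   0.05%    9
scoremap spam: 14   0.01%    2
"""

# class, score bucket, reported percentage, message count
LINE_RE = re.compile(r"scoremap\s+(ham|spam):\s+(\d+)\s+([\d.]+)%\s+(\d+)")

def parse_scoremap(text):
    """Return {class: {score: (reported_pct, count)}}."""
    rows = defaultdict(dict)
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            kind, score, pct, count = m.groups()
            rows[kind][int(score)] = (float(pct), int(count))
    return rows

def recomputed_percentages(rows):
    """Recompute each row's percentage as count / total hits for its class."""
    out = {}
    for kind, by_score in rows.items():
        total = sum(count for _, count in by_score.values())
        out[kind] = {score: round(100.0 * count / total, 2)
                     for score, (_, count) in by_score.items()}
    return out

rows = parse_scoremap(SCOREMAP)
recomputed = recomputed_percentages(rows)
for kind in ("ham", "spam"):
    for score, (reported, count) in sorted(rows[kind].items()):
        print(f"{kind:4} score={score:2} count={count:5} "
              f"reported={reported:5.2f}% recomputed={recomputed[kind][score]:5.2f}%")
```

If the reported and recomputed columns agree, the percentages are per-class shares of this rule's hits, not shares of the whole corpus.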

Based on the above, this rule is helping detect low-scoring spams, but a little more is still needed to push them over the threshold. *Potentially* that would be increasing the score of this rule, but it's already at ~3.5 points and bumping it any higher is edging into "poison pill" territory, which is generally a bad idea (except for rules that are very high S/O on malware, in which case yes, poison away!).
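
To put a number on "low-scoring", the spam rows of the scoremap can be summed on either side of the threshold. This is only a sketch under two assumptions: that the threshold is the stock required_score of 5.0 (a site may set its own), and that a bucket labelled N holds messages scoring below N+1. I have not checked how the report rounds scores into buckets.

```python
# Spam hit counts per score bucket, copied from the scoremap quoted earlier.
spam_hits = {1: 15, 3: 121, 4: 17791, 5: 531, 6: 896,
             7: 217, 8: 52, 9: 79, 10: 2, 11: 9, 14: 2}

# Assumption: stock required_score; adjust for a local configuration.
THRESHOLD = 5

total = sum(spam_hits.values())
below = sum(n for score, n in spam_hits.items() if score < THRESHOLD)
share_below = 100.0 * below / total

print(f"{below} of {total} spam hits ({share_below:.2f}%) "
      f"fall in buckets below {THRESHOLD}")
```

On these figures, roughly nine out of ten spam messages the rule hit sit in buckets under the assumed threshold, which is consistent with the reading above.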

I searched my pile of mail that I have from two ice ages ago, and I did
find 6 messages that were hits of this rule, one of them was spam, five
of them were this person trying to contact me.

...without a subject?

Do you happen to be seeing FPs with this rule?

Yes, it's why I am investigating it. I think it is common for people who
are sending mail from their mobiles, where they use it more like a quick
chat instead of a 'regular mail'....

In fact, this person used:
X-Mailer: iPad Mail (15F79)

OK, I can see about adding some mobile MUA exclusions. Any FP headers you can provide (directly) will be helpful. Go ahead and sanitize the recipient info, I don't think that would be relevant to tuning this one.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Efficiency can magnify good, but it magnifies evil just as well.
  So, we should not be surprised to find that modern electronic
  communication magnifies stupidity as *efficiently* as it magnifies
  intelligence.                                   -- Robert A. Matern
-----------------------------------------------------------------------
 Tomorrow: the 73rd anniversary of the end of World War II
