Re: Proposed rule for too many dots in From

Grant Taylor Thu, 20 Dec 2018 20:13:16 -0800

On 12/20/18 8:34 PM, Grant Taylor wrote:

I'm going back through my data, analyzing how I'm extracting it, and trying to satisfactorily explain some oddities.

Out of 244,921 messages there are 16,528 unique addresses. Here's how the dots in the user parts of those addresses break down:


  13,277               (no dots 80.3%)
   2,936 .             ( 1 dot  17.8%)
     281 ..            ( 2 dots  1.7%)
      29 ...           ( 3 dots  0.2%)
       3 ....          ( 4 dots  0.0%)
       1 .....         ( 5 dots  0.0%)
       1 ...........   (11 dots  0.0%)
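A minimal Python sketch of how a tally like this could be produced, assuming the From addresses have already been extracted into a list of bare address strings (the extraction step isn't shown, and `dot_histogram` / `percentages` are hypothetical names). It treats everything before the last "@" as the user part and counts only those dots, so dots in the domain are ignored.

```python
from collections import Counter


def dot_histogram(addresses):
    """Map dot count -> number of unique addresses with that many dots
    in the user (local) part.

    Assumption: each entry is a bare address such as "first.last@example.com".
    """
    # Count each distinct address once, however many messages used it.
    unique = set(addresses)
    # Everything before the last "@" is the user part; domain dots are ignored.
    counts = Counter(addr.rsplit("@", 1)[0].count(".") for addr in unique)
    return dict(sorted(counts.items()))


def percentages(hist):
    """Share of unique addresses per dot count, rounded to one decimal place."""
    total = sum(hist.values())
    return {dots: round(100 * n / total, 1) for dots, n in hist.items()}


if __name__ == "__main__":
    sample = [
        "a.b@example.com",
        "a.b@example.com",  # duplicate, counted once
        "x@example.org",
        "j.q.public@example.net",
    ]
    print(dot_histogram(sample))  # {0: 1, 1: 1, 2: 1}
```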

So, in light of this information, I would be willing to concede that 3 or more dots is possibly an indicator of spam.

My previous logarithm methodology would add the following to the spam score of messages with 3 or more dots (assuming 3 dots is where we start adding to the spam score, i.e., log base 3):


 3 dots = 1
 4 dots = 1.26
 5 dots = 1.46
11 dots = 2.18

Assuming 2 dots are allowed and 2 is the log base:

 3 dots = 1.58
 4 dots = 2.00
 5 dots = 2.32
11 dots = 3.46
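For reference, both lists above appear to come from the same formula, the logarithm of the dot count $n$ to base $b$, with $b = 3$ for the first list and $b = 2$ for the second. A worked check of one entry from each list via the change-of-base rule:

```latex
\text{score}(n) = \log_b n = \frac{\ln n}{\ln b}

\log_3 4 = \frac{\ln 4}{\ln 3} \approx \frac{1.386}{1.099} \approx 1.26,
\qquad
\log_2 11 = \frac{\ln 11}{\ln 2} \approx \frac{2.398}{0.693} \approx 3.46
```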

I think I would be comfortable blindly adding log$Base($numberOfDots) (when $numberOfDots > $Base) to the spam score. I don't even see a need to mess with a meta rule.
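A rough Python sketch of the log$Base($numberOfDots) idea, not a SpamAssassin plugin or rule. The function name `dot_score` and the `inclusive` flag are hypothetical. The flag exists because the two lists above use different cut-offs: the base-3 list scores 3 dots as 1, which needs `>=`, while "2 dots are allowed" in the base-2 list corresponds to `>`.

```python
import math


def dot_score(num_dots: int, base: int, inclusive: bool = False) -> float:
    """Spam score contribution for a From user part containing num_dots dots.

    Returns log_base(num_dots) once the threshold is met, else 0.0.
    inclusive=False scores only num_dots > base;
    inclusive=True also scores num_dots == base (giving exactly 1).
    Assumes base > 1.
    """
    threshold_met = num_dots >= base if inclusive else num_dots > base
    return math.log(num_dots, base) if threshold_met else 0.0


if __name__ == "__main__":
    # Columns: dots, base-3 score (inclusive), base-2 score (exclusive).
    for n in (3, 4, 5, 11):
        print(n, round(dot_score(n, 3, inclusive=True), 2), round(dot_score(n, 2), 2))
```

Run as a script, it prints 3, 4, 5 and 11 dots alongside the two scores, which match the lists above when rounded to two decimal places.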




--
Grant. . . .
unix || die
