Rules to Score Random Letters Sender Name

Ilan Aisic 23 Mar 2004 10:56:39 -0000

Hi,

I've noticed that a lot of the spam that is trapped here
has a "from" address that is a long random sequence of characters
before the @domain part.  For example:
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]


None of these triggers any rule (unlike the ones that contain digits).

Is there a way to write a rule that distinguishes these addresses from
legitimate long names or words joined without separators?
Perhaps scoring based on how often rare characters such as x, y and z
appear?
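
One way to try the rare-letter idea, sketched here in Python as an illustration only: count what fraction of the letters before the @ fall in a "rare" set. The RARE set and the function names are hypothetical choices, not anything SpamAssassin provides, and no threshold has been tuned against real mail.

```python
# Hypothetical heuristic: what fraction of the letters before the '@' are "rare"?
# RARE is an assumed set of letters that are uncommon in English words;
# it is illustrative, not tuned against real mail.
RARE = frozenset("jkqvxyz")


def local_part(address: str) -> str:
    """Return the text before the last '@' (the whole string if there is none)."""
    return address.rsplit("@", 1)[0]


def rare_letter_fraction(address: str) -> float:
    """Fraction of letters in the local part that belong to RARE (0.0 if no letters)."""
    letters = [c for c in local_part(address).lower() if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in RARE for c in letters) / len(letters)


print(rare_letter_fraction("johnsmith@example.com"))  # 1 of 9 letters is rare
print(rare_letter_fraction("xqzvkjyw@example.com"))   # 7 of 8 letters are rare
```

A high fraction is only a hint: a random string drawn mostly from common letters will still score low.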

I think a simple rule that gives a small score to any long sequence
(say, more than 20 characters) made only of letters before the @ part
would work for some of the cases:

header ManyLettersName  From =~ /<[a-zA-Z]{20,[EMAIL PROTECTED],50}\..{1,3}/
score ManyLettersName   0.5
describe ManyLettersName   Sender name composed of a very long string of letters

The above was never tested!
A similar rule could give a higher score to extremely long strings of letters.
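
The same length test can be sketched outside SpamAssassin, for instance to try it against addresses from a log before adding the rule. This Python version is hypothetical: it treats 20 or more letters as a match (as the {20, quantifier does), the 0.5 score follows the rule above, and the 30-letter tier scoring 1.0 is an arbitrary stand-in for "extremely long".

```python
import re
from email.utils import parseaddr

# Hypothetical tiers, checked longest first: (minimum local-part length, score).
# 0.5 mirrors the untested rule; the 30-letter / 1.0 tier is an illustrative guess.
TIERS = [(30, 1.0), (20, 0.5)]

LETTERS_ONLY = re.compile(r"[A-Za-z]+")


def many_letters_score(from_header: str) -> float:
    """Score a From: header whose local part is a long, letters-only string."""
    _name, addr = parseaddr(from_header)
    local, sep, _domain = addr.rpartition("@")
    # No '@', or anything other than ASCII letters before it: no score.
    if not sep or not LETTERS_ONLY.fullmatch(local):
        return 0.0
    for min_len, score in TIERS:
        if len(local) >= min_len:
            return score
    return 0.0


print(many_letters_score("Spammer <abcdefghijklmnopqrstuvwxyz@example.com>"))  # 26 letters
print(many_letters_score("John Smith <john.smith@example.com>"))  # has a dot
```

Parsing the header with parseaddr rather than matching on the "<" means bare addresses without angle brackets are handled too.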

This, however, does not help with this example (copied from my log file):
[EMAIL PROTECTED]

Here, it seems, some kind of AI would be needed to determine that the string
is not a real name in any language.
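
Short of AI, one cheap approximation (an assumption, not something proposed in the post) is to flag local parts containing a long run of consecutive consonants, which pronounceable names seldom have. The vowel set, the max_run default of 5 and the function names are hypothetical; a surname like Schwarzschild already has a run of five, hence that default. A random string that happens to alternate vowels and consonants will still pass, so this only narrows the problem.

```python
import re

# Assumption: a, e, i, o, u and y count as vowels; every other ASCII letter is a consonant.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]+")


def longest_consonant_run(local: str) -> int:
    """Length of the longest run of consecutive consonants in the string."""
    runs = CONSONANT_RUN.findall(local.lower())
    return max((len(run) for run in runs), default=0)


def looks_random(address: str, max_run: int = 5) -> bool:
    """True if the local part has a consonant run longer than max_run (hypothetical cutoff)."""
    local = address.rsplit("@", 1)[0]
    return longest_consonant_run(local) > max_run


print(looks_random("schwarzschild@example.com"))  # longest run is "rzsch" (5), so False
print(looks_random("qwrtplkjhgfd@example.com"))   # one 12-consonant run, so True
```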

Opinions?

--ilan
