On 10/03/2009 05:08 PM, John Hardin wrote:
> On Sat, 3 Oct 2009, Warren Togami wrote:
>> On 10/01/2009 02:36 PM, John Hardin wrote:
>>> On Thu, 1 Oct 2009, Warren Togami wrote:
>>>> The "Oddity" I was pointing out at the beginning of the thread is not
>>>> the prevalence of .cn URIs, but rather that most of them appear to be
>>>> exactly 8 characters long. Could someone please commit my T_CN_8_URL
>>>> rule to the sandbox so we can see if that trend holds beyond my own
>>>> corpora?
>>>
>>> I've put a .CN 8 URI rule into my sandbox file, but it may be a few days
>>> before it gets committed; my stuff is in flux right now...
>>>
>>> # 8-letter .cn domain, per Warren Togami
>>> uri CN_EIGHT m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/;
>>> describe CN_EIGHT .CN uri with eight-letter domain name
>>> score CN_EIGHT 0.10
>>
>> Possible bug here... Do all URIs necessarily have a trailing slash?
>
> First results are in:
> http://ruleqa.spamassassin.org/20091003-r821273-n/T_CN_EIGHT/detail
Can't trust those results yet: there's the trailing-slash bug, and John Rudd
may be right about whitespace.
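To see the trailing-slash and whitespace concerns concretely, the CN_EIGHT pattern can be exercised outside SpamAssassin. This is a rough sketch in Python rather than Perl; the constructs in this particular pattern behave the same in Python's `re`. The sample URIs and the helper name `cn_eight_hits` are made up for illustration, and SpamAssassin may canonicalize URIs before `uri` rules see them, so real hit rates could differ.

```python
import re

# CN_EIGHT pattern copied from the uri rule quoted above; the m;...;
# delimiters are Perl syntax, so only the pattern body is kept here.
CN_EIGHT = re.compile(r"^https?://(?:[^./]+\.)*[^./]{8}\.cn/")


def cn_eight_hits(uri: str) -> bool:
    """Hypothetical helper: True if the CN_EIGHT pattern matches the URI."""
    return CN_EIGHT.search(uri) is not None


# Made-up sample URIs probing the trailing-slash and whitespace concerns.
samples = [
    ("http://abcdefgh.cn/", True),        # trailing slash present: hit
    ("http://abcdefgh.cn", False),        # no trailing slash: missed
    ("http://abcdefgh.cn:8080/", False),  # explicit port: missed
    ("http://abcdefgh.cn?id=1", False),   # query but no path: missed
    ("http://abcd efg.cn/", True),        # [^./] also accepts a space
]

for uri, expected in samples:
    assert cn_eight_hits(uri) == expected, uri
    print(f"{uri!r}: {'hit' if cn_eight_hits(uri) else 'miss'}")
```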
[^./]{8}\.cn
Actually, doesn't this match other characters that shouldn't be in a domain
name?
Then there are "valid" URLs like http://password:usern...@example.com/ that
this rule doesn't match.
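The character-class and userinfo points can be checked the same way. The `TIGHTENED` pattern below is a hypothetical variant, not a rule anyone in this thread has proposed or committed: it skips an optional `user:pass@` prefix, restricts labels to letters, digits and hyphens, and lets `.cn` be followed by `/`, `:`, `?`, `#` or the end of the string. It runs on raw, made-up URI strings in Python and has not been tested against SpamAssassin's own URI extraction.

```python
import re

# Original CN_EIGHT pattern from the thread.
ORIGINAL = re.compile(r"^https?://(?:[^./]+\.)*[^./]{8}\.cn/")

# Hypothetical tightened variant (an assumption, not a committed rule):
#  * optional userinfo ("user:pass@") is skipped before the host,
#  * labels are limited to letters, digits and hyphens,
#  * ".cn" may be followed by "/", ":", "?", "#" or end of string.
TIGHTENED = re.compile(
    r"^https?://(?:[^/@]*@)?(?:[a-z0-9-]+\.)*[a-z0-9-]{8}\.cn(?:[/:?#]|$)",
    re.IGNORECASE,
)

cases = [
    # (made-up uri, ORIGINAL matches?, TIGHTENED matches?)
    ("http://abcdefgh.cn/", True, True),
    ("http://www.abcdefgh.cn/", True, True),
    ("http://abcdefgh.cn", False, True),             # no trailing slash
    ("http://user:pass@abcdefgh.cn/", False, True),  # userinfo present
    ("http://ab_cd!ef.cn/", True, False),            # non-hostname characters
    ("http://abcdefghi.cn/", False, False),          # nine-character label
]

for uri, want_orig, want_tight in cases:
    assert (ORIGINAL.search(uri) is not None) == want_orig, uri
    assert (TIGHTENED.search(uri) is not None) == want_tight, uri
```

With the class narrowed to `[a-z0-9-]`, the eight-character count only includes characters that can appear in a hostname label, which is presumably what the original observation was about.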
Could you please add the following to the sandbox before tomorrow?
# from http://www.apnic.net/db/ranges.html at 20091002, meta bits added 20090930
# copied from khop-bl.sa.khopesh.com
header __RCVD_VIA_APNIC Received =~ /(?-xism:[^0-9.](?:2(?:0(?:2(?:\.1(?:2(?:3\.(?:0?(?:[4-9][0-9]|3[2-9])|[12][0-9]{2})\.[012]?[0-9]{1,2}|[^3](?:\.[012]?[0-9]{1,2}){2})|[^2]3(?:\.[012]?[0-9]{1,2}){2})|(?:\.[02]?[0-9]{1,2}){3})|3(?:\.[012]?[0-9]{1,2}){3})|(?:1[0189]|2[012])(?:\.[012]?[0-9]{1,2}){3})|1(?:(?:2[0123456]|8[023]|1\d|75)(?:\.[012]?[0-9]{1,2}){3}|69\.2(?:1[0-9]|2[0-3]|0[89])(?:\.[012]?[0-9]{1,2}){2})|(?:5[89]|6[01])(?:\.[012]?[0-9]{1,2}){3})(?:[\]\)\s]))/
describe __RCVD_VIA_APNIC Received through a relay in Asia/Pacific Network
meta CN_EIGHT_NOAPNIC CN_EIGHT && !__RCVD_VIA_APNIC && !ALL_TRUSTED
describe CN_EIGHT_NOAPNIC .cn URI exactly 8 characters long, excluding APNIC
One silly, arbitrary rule, with an exclusion based on a prejudiced rule. This
is still unsafe, but it should show us some interesting numbers.
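The `__RCVD_VIA_APNIC` regex encodes address ranges as text. As a point of comparison only, here is a sketch of the same idea using Python's `ipaddress` module (a CIDR membership test swapped in for the regex), plus the boolean logic of the `CN_EIGHT_NOAPNIC` meta rule. The block list is a hypothetical, partial subset (only the /8s the regex matches outright), the function names and header strings are made up, and the address extraction is looser than the regex's `[^0-9.]` and `[\]\)\s]` delimiters.

```python
import ipaddress
import re
from typing import Iterable

# Hypothetical, deliberately partial list: the 58-61 and 203 /8 blocks that
# the __RCVD_VIA_APNIC regex matches outright. A real check would load the
# full list from APNIC's published data instead.
APNIC_BLOCKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("58.0.0.0/8", "59.0.0.0/8", "60.0.0.0/8",
                 "61.0.0.0/8", "203.0.0.0/8")
]

# A dotted quad not glued to other digits or dots, e.g. "[203.0.113.5]".
IPV4 = re.compile(r"(?<![0-9.])(\d{1,3}(?:\.\d{1,3}){3})(?![0-9.])")


def received_via_apnic(received_headers: Iterable[str]) -> bool:
    """True if any Received header mentions an IPv4 address in APNIC_BLOCKS."""
    for header in received_headers:
        for candidate in IPV4.findall(header):
            try:
                addr = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # not a valid address, e.g. an octet above 255
            if any(addr in block for block in APNIC_BLOCKS):
                return True
    return False


def cn_eight_noapnic(cn_eight: bool, received_headers: Iterable[str],
                     all_trusted: bool) -> bool:
    """Mirrors: meta CN_EIGHT_NOAPNIC CN_EIGHT && !__RCVD_VIA_APNIC && !ALL_TRUSTED"""
    return (cn_eight
            and not received_via_apnic(received_headers)
            and not all_trusted)
```

For example, `received_via_apnic(["from a.example ([203.0.113.5]) by b.example"])` is true because 203.0.113.5 falls inside 203.0.0.0/8, so `CN_EIGHT_NOAPNIC` would not fire for that message even if `CN_EIGHT` did.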
Warren Togami
wtog...@redhat.com