RE: .cn Oddity

2009-10-13 Thread Chris Santerre


> -Original Message-
> From: jdow [mailto:j...@earthlink.net]

> 
> {^_-}   (Some of the ninjas are burned out. I have one such to my
> back when we're both in the room beating away at our CPUs.)
> 

+1 burnout. Too many things going on. Will eventually get my 232nd wind and
be back in the fight. Although I want to approach antispam in a whole
different way next time. I still have a few ideas up my sleeve ;)

--Chris 


Re: .cn Oddity

2009-10-11 Thread jdow

From: "MySQL Student" 
Sent: Sunday, 2009/October/11 09:08



Hi,


We use some rules. If we talk openly about them and say "hey, this spammer
is stupid, look here", then it will take less than 12 hours before that gap
is closed and we lose a valuable trick.


Yes, that's the way it is; spammers can also read mailing lists and adapt
their spam to bypass the rules.


It sounds like social engineering needs to be part of the attack
rules/strategy that we employ on these spammers :-)


They say you can't con a conman. The same "They" say a lot of things
that are not strictly true. I'd still love a good ham corpus from
China. It may be that most Chinese domains are 8 characters. That
would make a "not 8 characters plus .cn" rule devastating to the .cn
spammers. That is aside from the fact that they tend to trigger so
many effective (yet old) rules and Bayes that none of them have
gotten through the filters.

{^_-}   (Some of the ninjas are burned out. I have one such to my
   back when we're both in the room beating away at our CPUs.)


Re: .cn Oddity

2009-10-11 Thread MySQL Student
Hi,

>> We use some rules. If we talk openly about them and say "hey, this spammer
>> is stupid, look here", then it will take less than 12 hours before that gap
>> is closed and we lose a valuable trick.
>
> Yes, that's the way it is; spammers can also read mailing lists and adapt
> their spam to bypass the rules.

It sounds like social engineering needs to be part of the attack
rules/strategy that we employ on these spammers :-)

Regards,
Alex


Re: .cn Oddity

2009-10-11 Thread Benny Pedersen

On søn 11 okt 2009 12:12:20 CEST, jdow wrote

could squeeze his spam decreased. It's still decreasing, although at a
slower rate due to the relative inactivity of the SARE ninjas.


The SARE rules are not maintained now, but they could still go through
masscheck so that the best of them get re-added to SA.


--
xpoint



Re: .cn Oddity

2009-10-11 Thread Benny Pedersen

On søn 11 okt 2009 11:48:11 CEST, Raymond Dijkxhoorn wrote

We use some rules. If we talk openly about them and say "hey, this spammer
is stupid, look here", then it will take less than 12 hours before that gap
is closed and we lose a valuable trick.


Yes, that's the way it is; spammers can also read mailing lists and adapt
their spam to bypass the rules.



Fighting spam is more than just airing ideas; it's much more than that.


Let's make a whitelist of .cn domains that are not seen in spam; more fun
for the spammers now. Can we still talk about 8-character .cn domain rules? :)


What if the sender or envelope is hotmail.*? Meta it with that. .cn domains
could just as well have email on their own domain. Not tested.


If I see an email with more than one domain, it's basically spam.

--
xpoint



Re: .cn Oddity

2009-10-11 Thread Raymond Dijkxhoorn

Hi!


So I am quite aware of losing good rules. HOWEVER, as he found out WE
keep the old rules and add new ones and his keyhole through which he
could squeeze his spam decreased. It's still decreasing, although at a
slower rate due to the relative inactivity of the SARE ninjas.


Most ninjas, including me, are idle due to this same exposure problem. We
share within the SARE group internally, but most rules are not published as
they were in the past. Some are added by Alex to the generic SA updates, however.


Bye,
Raymond.


Re: .cn Oddity

2009-10-11 Thread jdow

From: "Raymond Dijkxhoorn" 
Sent: Sunday, 2009/October/11 02:48



Hi!


7263 T_CN_URL hits in 15517 spam corpus
7200 T_CN_8_URL hits in 15517 spam corpus

Does this make any sense?  This is funny.  Could someone add this rule 
to the sandbox?  I'm just curious.


I have to admire one thing about spammers. They respond very rapidly to 
"threats" to their ability to break through spam protection software. You 
became curious and mentioned this on the date above. Spammers are already 
using <7 character names>.cn.


That's why I said earlier in the thread: if you see something, test it
silently and add it silently. That's the only way to get any use out of it.


We use some rules. If we talk openly about them and say "hey, this spammer
is stupid, look here", then it will take less than 12 hours before that gap
is closed and we lose a valuable trick.


Fighting spam is more than just airing ideas; it's much more than
that.


Bye,
Raymond.


Some years ago, Raymond, I "used" this list to bait a specific spammer
about how pathetic his scores were. They were high but he didn't break
100. Within a week he found a way. (His spams had (have) many features
that are very characteristic of his work but hard to use for anti-spam.
This involved a specific portion of a name he'd use for registering his
phony domains.)

So I am quite aware of losing good rules. HOWEVER, as he found out WE
keep the old rules and add new ones and his keyhole through which he
could squeeze his spam decreased. It's still decreasing, although at a
slower rate due to the relative inactivity of the SARE ninjas.

{^_^} 



Re: .cn Oddity

2009-10-11 Thread Raymond Dijkxhoorn

Hi!


7263 T_CN_URL hits in 15517 spam corpus
7200 T_CN_8_URL hits in 15517 spam corpus

Does this make any sense?  This is funny.  Could someone add this rule to 
the sandbox?  I'm just curious.


I have to admire one thing about spammers. They respond very rapidly to 
"threats" to their ability to break through spam protection software. 
You became curious and mentioned this on the date above. Spammers are 
already using <7 character names>.cn.


That's why I said earlier in the thread: if you see something, test it
silently and add it silently. That's the only way to get any use out of it.


We use some rules. If we talk openly about them and say "hey, this spammer
is stupid, look here", then it will take less than 12 hours before that gap
is closed and we lose a valuable trick.


Fighting spam is more than just airing ideas; it's much more than
that.


Bye,
Raymond.


Re: .cn Oddity

2009-10-10 Thread Warren Togami

On 10/11/2009 02:07 AM, jdow wrote:


I have to admire one thing about spammers. They respond very rapidly to
"threats" to their ability to break through spam protection software. You
became curious and mentioned this on the date above. Spammers are already
using <7 character names>.cn.

{^_-}


Yes, I see they began registering \w{7}.cn domains around October 3rd 
and the \w{8}.cn spam is a lot less now.
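The shift from 8-character to 7-character names can be tracked by tallying label lengths. A minimal Python sketch of that tally follows, assuming the URIs have already been extracted from the corpus; the pattern `CN_HOST`, the function `label_lengths`, and the sample URIs are hypothetical and not part of any SpamAssassin tooling.

```python
import re
from collections import Counter

# Capture the label directly left of ".cn" when ".cn" ends the host part.
# The lookahead keeps "foo.cn.example.com" from being counted as a .cn host.
CN_HOST = re.compile(
    r"^https?://(?:[^./]+\.)*([^./]+)\.cn(?=[/?:#]|$)", re.IGNORECASE
)

def label_lengths(uris):
    """Count how often each second-level label length occurs among .cn URIs."""
    counts = Counter()
    for uri in uris:
        match = CN_HOST.search(uri)
        if match:
            counts[len(match.group(1))] += 1
    return counts

# Made-up sample URIs, for illustration only.
sample = [
    "http://abcdefgh.cn/",
    "http://www.abcdefg.cn",
    "http://abcdefg.cn/x",
    "http://example.com/",
]
print(label_lengths(sample))
```

Running this per day over a dated corpus would show whether the 8-character peak really moves to 7 around a given date.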


Warren


Re: .cn Oddity

2009-10-10 Thread jdow

From: "Warren Togami" 
Sent: Wednesday, 2009/September/30 21:40



uri T_CN_URL  /[^\/]+\.cn(?:$|\/|\?)/i
describe T_CN_URL Contains a URL in the .cn domain

uri T_CN_8_URL  /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
describe T_CN_8_URL Contains a URL in the .cn domain of exactly 8 characters long
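As a rough illustration of what the two patterns above accept, here is a minimal Python sketch. It assumes that Python's `re` module handles these particular constructs the same way Perl does; the sample URIs and the helper `hits` are made up for the example.

```python
import re

# The two patterns from the rules above, transcribed for Python's re module.
T_CN_URL = re.compile(r"[^/]+\.cn(?:$|/|\?)", re.IGNORECASE)
T_CN_8_URL = re.compile(r"[/.]+\w{8}\.cn(?:$|/|\?)", re.IGNORECASE)

def hits(uri):
    """Return the names of the rules whose pattern is found in the URI."""
    names = []
    if T_CN_URL.search(uri):
        names.append("T_CN_URL")
    if T_CN_8_URL.search(uri):
        names.append("T_CN_8_URL")
    return names

# Hypothetical sample URIs, not taken from any corpus.
for uri in [
    "http://abcdefgh.cn/",      # 8-character label
    "http://www.abcdefgh.cn",   # 8-character label behind a subdomain
    "http://abcdefg.cn/",       # 7-character label
    "http://example.com/page",  # no .cn at all
]:
    print(uri, hits(uri))
```

Because the 8-character pattern requires a `/` or `.` directly before the label, a 9-character label does not slip through as an 8-character suffix.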


http://ruleqa.spamassassin.org/20090930-r820211-n/T_CN_URL/detail
Last night's masscheck.  63243 out of 124241 spam hits T_CN_URL, nearly 
51%.


7263 T_CN_URL hits in 15517 spam corpus
7200 T_CN_8_URL hits in 15517 spam corpus

Does this make any sense?  This is funny.  Could someone add this rule to 
the sandbox?  I'm just curious.


Warren Togami
wtog...@redhat.com


I have to admire one thing about spammers. They respond very rapidly to
"threats" to their ability to break through spam protection software. You
became curious and mentioned this on the date above. Spammers are already
using <7 character names>.cn.

{^_-} 



Re: .cn Oddity

2009-10-04 Thread John Hardin

On Sun, 4 Oct 2009, Warren Togami wrote:


On 10/04/2009 04:07 PM, John Hardin wrote:

 On Thu, 1 Oct 2009, Warren Togami wrote:

>  The "Oddity" I was pointing out at the beginning of the thread is not
>  prevalence of .cn URI's, but rather most of them appear to be exactly
>  8 characters long.

 Are there any other .cn domain formats (like {8}.com.cn) that would be
 of interest? I was trolling through a spam quarantine I'd forgotten about
 and found a message containing this:

 {domain}.cn
 {domain}.com.cn
 {domain}.net.cn



I wouldn't bother.  I only wanted to check the relative % of CN_EIGHT to 
CN_URL because I found it strange that the majority of CN_URL had exactly 8 
characters.


In the end this rule is unsafe to use in production so it doesn't matter much 
to check for even less prevalent matches that we can't use either.


OK

BTW, I have commit access now.  Mind if I move these rules from your sandbox 
into my own sandbox?


Go ahead.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  I'm seriously considering getting one of those bright-orange prison
  overalls and stencilling PASSENGER on the back. Along with the paper
  slippers, I ought to be able to walk right through security.
 -- Brian Kantor in a.s.r
---
 Approximately 9164580 firearms legally purchased in the U.S. this year


Re: .cn Oddity

2009-10-04 Thread Warren Togami

On 10/04/2009 04:07 PM, John Hardin wrote:

On Thu, 1 Oct 2009, Warren Togami wrote:


The "Oddity" I was pointing out at the beginning of the thread is not
prevalence of .cn URI's, but rather most of them appear to be exactly
8 characters long.


Are there any other .cn domain formats (like {8}.com.cn) that would be
of interest? I was trolling through a spam quarantine I'd forgotten about
and found a message containing this:

{domain}.cn
{domain}.com.cn
{domain}.net.cn



I wouldn't bother.  I only wanted to check the relative % of CN_EIGHT to 
CN_URL because I found it strange that the majority of CN_URL had 
exactly 8 characters.


In the end this rule is unsafe to use in production so it doesn't matter 
much to check for even less prevalent matches that we can't use either.


BTW, I have commit access now.  Mind if I move these rules from your 
sandbox into my own sandbox?


Warren Togami
wtog...@redhat.com


Re: .cn Oddity

2009-10-04 Thread John Hardin

On Thu, 1 Oct 2009, Warren Togami wrote:

The "Oddity" I was pointing out at the beginning of the thread is not 
prevalence of .cn URI's, but rather most of them appear to be exactly 8 
characters long.


Are there any other .cn domain formats (like {8}.com.cn) that would be of 
interest? I was trolling through a spam quarantine I'd forgotten about and 
found a message containing this:


{domain}.cn
{domain}.com.cn
{domain}.net.cn

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  One difference between a liberal and a pickpocket is that if you
  demand your money back from a pickpocket he will not question your
  motives.  -- William Rusher
---
 Approximately 9157680 firearms legally purchased in the U.S. this year


Re: .cn Oddity

2009-10-04 Thread John Hardin

On Sun, 4 Oct 2009, Karsten Bräckelmann wrote:


On Sun, 2009-10-04 at 09:59 -0400, Warren Togami wrote:

On 10/04/2009 12:21 AM, John Hardin wrote:



Right, in adding things to the sandbox it does not necessarily mean I
suggest they should become rules. I am mainly curious to see what the
results say.


Warning: autopromotion


Is there a way to prevent autopromotion for a particular rule?


Yep, using tflags nopublish,


Done. Will be committed momentarily.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 Warning Labels we'd like to see #1: "If you are a stupid idiot while
 using this product you may hurt yourself. And it won't be our fault."
---
 Approximately 9152160 firearms legally purchased in the U.S. this year

Re: .cn Oddity

2009-10-04 Thread Karsten Bräckelmann
On Sun, 2009-10-04 at 09:59 -0400, Warren Togami wrote:
> On 10/04/2009 12:21 AM, John Hardin wrote:

> > > Right, in adding things to the sandbox it does not necessarily mean I
> > > suggest they should become rules. I am mainly curious to see what the
> > > results say.
> >
> > Warning: autopromotion
> 
> Is there a way to prevent autopromotion for a particular rule?

Yep, using tflags nopublish, or explicitly naming the rule with a T_
prefix. Also see bug 5545 [1].


[1] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5545

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: [SA] .cn Oddity

2009-10-04 Thread Warren Togami

On 10/04/2009 12:21 AM, John Hardin wrote:

On Sat, 3 Oct 2009, Warren Togami wrote:


On 10/03/2009 07:50 PM, Adam Katz wrote:


8 is *extremely* important in Chinese culture. When running these
tests, make sure that there is a good quantity of .cn TLD URIs in the
ham before drawing any conclusions.


Right, in adding things to the sandbox it does not necessarily mean I
suggest they should become rules. I am mainly curious to see what the
results say.


Warning: autopromotion



Is there a way to prevent autopromotion for a particular rule?

Warren


Re: .cn Oddity

2009-10-03 Thread John Hardin

On Sat, 3 Oct 2009, Warren Togami wrote:


On 10/03/2009 07:11 PM, John Hardin wrote:

>  [^./]{8}\.cn
> 
>  Actually, doesn't this match other characters that shouldn't be in a

>  domain name?

 ...is _anything_ (apart from periods) excluded from domain names these
 days? :)

 Changed to \w{8} for testing. Can you provide examples of needing more
 than \w?


I doubt it matters for this particular rule, but dash characters are valid in 
domain names too right?


\w seems to be alpha, numeric and underscore.  Underscore isn't valid in a 
domain name.


True.

Let's let this version go through a masscheck cycle and then I'll change 
it to [-\w]{8}
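The difference between the two character classes can be checked in isolation. The sketch below uses Python's `re` as a stand-in for Perl, with `re.ASCII` so that `\w` stays limited to letters, digits and underscore; the names `LABEL_W`, `LABEL_DASH_W` and `accepts` and the sample labels are hypothetical.

```python
import re

# \w alone: letters, digits and underscore (ASCII only with re.ASCII).
LABEL_W = re.compile(r"\w{8}", re.ASCII)
# The proposed class: the same set plus the dash.
LABEL_DASH_W = re.compile(r"[-\w]{8}", re.ASCII)

def accepts(pattern, label):
    """True if the whole label is matched by the 8-character pattern."""
    return pattern.fullmatch(label) is not None

for label in ["abcdefgh", "abcd-efg", "abcd_efg"]:
    print(label, accepts(LABEL_W, label), accepts(LABEL_DASH_W, label))
```

Both classes still accept the underscore, which is not valid in a registered domain name, so neither is a strict validity check; the second merely stops missing hyphenated labels.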


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Vista "security improvements" consist of attempting to shift blame
  onto the user when things go wrong.
---
 Approximately 9134220 firearms legally purchased in the U.S. this year


Re: [SA] .cn Oddity

2009-10-03 Thread John Hardin

On Sat, 3 Oct 2009, Warren Togami wrote:


On 10/03/2009 07:50 PM, Adam Katz wrote:


 8 is *extremely* important in Chinese culture.  When running these
 tests, make sure that there is a good quantity of .cn TLD URIs in the
 ham before drawing any conclusions.


Right, in adding things to the sandbox it does not necessarily mean I suggest 
they should become rules.  I am mainly curious to see what the results say.


Warning: autopromotion

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Vista "security improvements" consist of attempting to shift blame
  onto the user when things go wrong.
---
 Approximately 9134220 firearms legally purchased in the U.S. this year


Re: .cn Oddity

2009-10-03 Thread Warren Togami

On 10/03/2009 07:11 PM, John Hardin wrote:

[^./]{8}\.cn

Actually, doesn't this match other characters that shouldn't be in a
domain name?


...is _anything_ (apart from periods) excluded from domain names these
days? :)

Changed to \w{8} for testing. Can you provide examples of needing more
than \w?


I doubt it matters for this particular rule, but dash characters are 
valid in domain names too right?


\w seems to be alpha, numeric and underscore.  Underscore isn't valid in 
a domain name.


Warren


Re: [SA] .cn Oddity

2009-10-03 Thread Warren Togami

On 10/03/2009 07:50 PM, Adam Katz wrote:


8 is *extremely* important in Chinese culture.  When running these
tests, make sure that there is a good quantity of .cn TLD URIs in the
ham before drawing any conclusions.


Right, in adding things to the sandbox it does not necessarily mean I 
suggest they should become rules.  I am mainly curious to see what the 
results say.


Warren


Re: [SA] .cn Oddity

2009-10-03 Thread Adam Katz
Warren Togami wrote:
>>> The "Oddity" I was pointing out at the beginning of the thread is not
>>> prevalence of .cn URI's, but rather most of them appear to be exactly 8
>>> characters long. Could someone please commit my T_CN_8_URL rule to the
>>> sandbox so we can see if that trend holds beyond my own corpora?
>>
>> (And yes I'm fully aware even this narrowed rule is prejudiced and
>> unsafe. This is is partly out of curiosity, and also wondering if it
>> could be made useful if meta booleaned with something else.)

jdow then mused:
> I just had a thought, Warren. Look up Chinese numerology. 8 signifies
> wealth or sudden prosperity. Conversely, I suspect few Chinese names
> are four characters. Four is a pun on death. Some social sites might
> like 5 letters - me. 7 is right out, it's a vulgar word in Cantonese.
> 9 is also slang or vulgar in Cantonese.
> 
> I wonder how many companies that deal with China have figured out that
> an "888" toll free number is WONDERFUL, "Wealth, wealth, wealth."
> 
> I understand numerology is quite important to the Chinese. (Of course,
> I am not claiming to be an expert. The above is mostly Wikipoodle and
> surmise.)

8 is *extremely* important in Chinese culture.  When running these
tests, make sure that there is a good quantity of .cn TLD URIs in the
ham before drawing any conclusions.


Re: .cn Oddity

2009-10-03 Thread John Hardin

On Sat, 3 Oct 2009, Warren Togami wrote:

Can't trust those results yet.  The trailing slash bug, and John Rudd 
might be correct about whitespace?


I doubt whitespace will be a problem. That would break the parser before 
it even got to the rule, and while "dom%20name.cn" might be syntactically 
valid would a registrar ever _accept_ such a domain name?


Examples solicited.


[^./]{8}\.cn

Actually, doesn't this match other characters that shouldn't be in a 
domain name?


...is _anything_ (apart from periods) excluded from domain names these 
days? :)


Changed to \w{8} for testing. Can you provide examples of needing more 
than \w?


Then there are "valid" URL's like http://password:usern...@example.com/ 
not matched by this rule.


The URI parser apparently discards username:password@ from URIs:

[6788] dbg: rules: ran body rule ALL_BODY ==> got hit: "http://fnord:b...@87654321.cn"
[6788] dbg: rules: ran uri rule CN_EIGHT ==> got hit: "http://87654321.cn"
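The effect seen in the debug output, where the credentials are gone by the time the uri rule runs, can be imitated outside SpamAssassin. This sketch uses Python's standard `urllib.parse` purely as a stand-in for SpamAssassin's own URI parser, which is not shown in the thread; the helper names and the sample credentials are invented.

```python
from urllib.parse import urlsplit

def host_of(uri):
    """Return the hostname of a URI, ignoring any user:password@ prefix."""
    return urlsplit(uri).hostname

def second_level_length(uri):
    """Length of the label directly left of the TLD, or 0 if there is none."""
    labels = (host_of(uri) or "").split(".")
    return len(labels[-2]) if len(labels) >= 2 else 0

# Hypothetical URI with embedded credentials.
uri = "http://someuser:secret@87654321.cn/"
print(host_of(uri), second_level_length(uri))
```

Matching against the parsed hostname rather than the raw string means a `user:password@` prefix neither hides the domain nor gets counted as part of it.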


Could you please add the following to the sandbox before tomorrow?

#  from http://www.apnic.net/db/ranges.html at 20091002, meta bits added 
#  20090930

#  copied from khop-bl.sa.khopesh.com
header __RCVD_VIA_APNIC Received =~ /(?-xism:[^0-9.](?:2(?:0(?:2(?:\.1(?:2(?:3\.(?:0?(?:[4-9][0-9]|3[2-9])|[12][0-9]{2})\.[012]?[0-9]{1,2}|[^3]\.(?:012]?[0-9]{1,2}){2})|[^2]3\.(?:012]?[0-9]{1,2}){2})|(?:\.[02]?[0-9]{1,2}){3})|3(?:\.[012]?[0-9]{1,2}){3})|(?:1[0189]|2[012])(?:\.[012]?[0-9]{1,2}){3})|1(?:(?:2[0123456]|8[023]|1\d|75)(?:\.[012]?[0-9]{1,2}){3}|69\.2(?:1[0-9]|2[0-3]|0[89])(?:\.[012]?[0-9]{1,2}){2})|(?:5[89]|6[01])(?:\.[012]?[0-9]{1,2}){3})(?:[\]\)\s]))/

describe __RCVD_VIA_APNIC Received through a relay in Asia/Pacific Network

meta CN_EIGHT_NOAPNIC CN_EIGHT && !__RCVD_VIA_APNIC && !ALL_TRUSTED
describe CN_EIGHT_NOAPNIC .cn URI exactly 8 characters long, excluding APNIC

One silly arbitrary rule, excluding prejudiced rule.  This is still unsafe 
but should show us some interesting numbers.
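The meta rule is a plain boolean combination of three other rules. A small Python sketch of that logic, assuming the set of rule names that fired on a message is already known (the function name and the sample hit sets are hypothetical):

```python
def cn_eight_noapnic(hits):
    """Mirror of: CN_EIGHT && !__RCVD_VIA_APNIC && !ALL_TRUSTED.

    `hits` is the set of rule names that fired on one message.
    """
    return (
        "CN_EIGHT" in hits
        and "__RCVD_VIA_APNIC" not in hits
        and "ALL_TRUSTED" not in hits
    )

# Made-up examples of which rules fired on three messages.
print(cn_eight_noapnic({"CN_EIGHT"}))                      # True
print(cn_eight_noapnic({"CN_EIGHT", "__RCVD_VIA_APNIC"}))  # False
print(cn_eight_noapnic({"CN_EIGHT", "ALL_TRUSTED"}))       # False
```

The meta only fires for 8-character .cn URIs in mail that was neither relayed through the listed APNIC ranges nor received entirely from trusted hosts.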


Done. Not sure if the nightly is already running or not...

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  USMC Rules of Gunfighting #6: If you can choose what to bring to a
  gunfight, bring a long gun and a friend with a long gun.
---
 Approximately 9127320 firearms legally purchased in the U.S. this year


Re: .cn Oddity

2009-10-03 Thread John Rudd
On Sat, Oct 3, 2009 at 15:55, John Hardin  wrote:
> On Sat, 3 Oct 2009, John Rudd wrote:
>
>> On Sat, Oct 3, 2009 at 11:06, Warren Togami  wrote:
>>
>>>
>>> # 8-letter .cn domain, per Warren Togami
>>> uri            CN_EIGHT            m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/;
>>> describe       CN_EIGHT            .CN uri with eight-letter domain name
>>> score          CN_EIGHT            0.10
>>>
>>> Possible bug here... Do all URI's necessarily have a trailing slash?
>>
>>
>> And, don't you want to omit whitespace from the 8 characters?  Or am I
>> missing something that takes care of that for you?
>
> I don't think a parsed URI would have whitespace in the hostname part. This
> isn't a body rule.

That would be the part I was missing :-)


Re: .cn Oddity

2009-10-03 Thread John Hardin

On Sat, 3 Oct 2009, John Rudd wrote:


On Sat, Oct 3, 2009 at 11:06, Warren Togami  wrote:



# 8-letter .cn domain, per Warren Togami
uri            CN_EIGHT            m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/;
describe       CN_EIGHT            .CN uri with eight-letter domain name
score          CN_EIGHT            0.10

Possible bug here... Do all URI's necessarily have a trailing slash?



And, don't you want to omit whitespace from the 8 characters?  Or am I
missing something that takes care of that for you?


I don't think a parsed URI would have whitespace in the hostname part. 
This isn't a body rule.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  USMC Rules of Gunfighting #6: If you can choose what to bring to a
  gunfight, bring a long gun and a friend with a long gun.
---
 Approximately 9127320 firearms legally purchased in the U.S. this year

Re: .cn Oddity

2009-10-03 Thread Warren Togami

On 10/03/2009 05:08 PM, John Hardin wrote:

On Sat, 3 Oct 2009, Warren Togami wrote:


On 10/01/2009 02:36 PM, John Hardin wrote:

On Thu, 1 Oct 2009, Warren Togami wrote:

> The "Oddity" I was pointing out at the beginning of the thread is not
> prevalence of .cn URI's, but rather most of them appear to be exactly
> 8 characters long. Could someone please commit my T_CN_8_URL rule to
> the sandbox so we can see if that trend holds beyond my own corpora?

I've put a .CN 8 URI rule into my sandbox file but it may be a few days
before it gets committed, my stuff is in flux right now...



# 8-letter .cn domain, per Warren Togami
uri CN_EIGHT m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/;
describe CN_EIGHT .CN uri with eight-letter domain name
score CN_EIGHT 0.10

Possible bug here... Do all URI's necessarily have a trailing slash?


First results are in:

http://ruleqa.spamassassin.org/20091003-r821273-n/T_CN_EIGHT/detail



Can't trust those results yet.  The trailing slash bug, and John Rudd might be 
correct about whitespace?

[^./]{8}\.cn

Actually, doesn't this match other characters that shouldn't be in a domain 
name?

Then there are "valid" URL's like http://password:usern...@example.com/  not 
matched by this rule.

Could you please add the following to the sandbox before tomorrow?

# from http://www.apnic.net/db/ranges.html at 20091002, meta bits added 20090930
# copied from khop-bl.sa.khopesh.com
header __RCVD_VIA_APNIC Received =~ /(?-xism:[^0-9.](?:2(?:0(?:2(?:\.1(?:2(?:3\.(?:0?(?:[4-9][0-9]|3[2-9])|[12][0-9]{2})\.[012]?[0-9]{1,2}|[^3]\.(?:012]?[0-9]{1,2}){2})|[^2]3\.(?:012]?[0-9]{1,2}){2})|(?:\.[02]?[0-9]{1,2}){3})|3(?:\.[012]?[0-9]{1,2}){3})|(?:1[0189]|2[012])(?:\.[012]?[0-9]{1,2}){3})|1(?:(?:2[0123456]|8[023]|1\d|75)(?:\.[012]?[0-9]{1,2}){3}|69\.2(?:1[0-9]|2[0-3]|0[89])(?:\.[012]?[0-9]{1,2}){2})|(?:5[89]|6[01])(?:\.[012]?[0-9]{1,2}){3})(?:[\]\)\s]))/
describe __RCVD_VIA_APNIC Received through a relay in Asia/Pacific Network

meta CN_EIGHT_NOAPNIC CN_EIGHT && !__RCVD_VIA_APNIC && !ALL_TRUSTED
describe CN_EIGHT_NOAPNIC .cn URI exactly 8 characters long, excluding APNIC

One silly arbitrary rule, excluding prejudiced rule.  This is still unsafe but 
should show us some interesting numbers.

Warren Togami
wtog...@redhat.com


Re: .cn Oddity

2009-10-03 Thread John Rudd
On Sat, Oct 3, 2009 at 11:06, Warren Togami  wrote:

>
> # 8-letter .cn domain, per Warren Togami
> uri            CN_EIGHT            m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/;
> describe       CN_EIGHT            .CN uri with eight-letter domain name
> score          CN_EIGHT            0.10
>
> Possible bug here... Do all URI's necessarily have a trailing slash?


And, don't you want to omit whitespace from the 8 characters?  Or am I
missing something that takes care of that for you?


Re: .cn Oddity

2009-10-03 Thread John Hardin

On Sat, 3 Oct 2009, Warren Togami wrote:


On 10/01/2009 02:36 PM, John Hardin wrote:

 On Thu, 1 Oct 2009, Warren Togami wrote:

>  The "Oddity" I was pointing out at the beginning of the thread is not
>  prevalence of .cn URI's, but rather most of them appear to be exactly
>  8 characters long. Could someone please commit my T_CN_8_URL rule to
>  the sandbox so we can see if that trend holds beyond my own corpora?

 I've put a .CN 8 URI rule into my sandbox file but it may be a few days
 before it gets committed, my stuff is in flux right now...



# 8-letter .cn domain, per Warren Togami
uri            CN_EIGHT            m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/;
describe       CN_EIGHT            .CN uri with eight-letter domain name
score          CN_EIGHT            0.10

Possible bug here... Do all URI's necessarily have a trailing slash?


First results are in:

http://ruleqa.spamassassin.org/20091003-r821273-n/T_CN_EIGHT/detail

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  We are hell-bent and determined to allocate the talent, the
  resources, the money, the innovation to absolutely become a
  powerhouse in the ad business.   -- Microsoft CEO Steve Ballmer
  ...because allocating talent to securing Windows isn't profitable?
---
 Approximately 9125940 firearms legally purchased in the U.S. this year


Re: .cn Oddity

2009-10-03 Thread John Hardin

On Sat, 3 Oct 2009, Ned Slider wrote:


Warren Togami wrote:

 On 10/01/2009 02:36 PM, John Hardin wrote:
>  On Thu, 1 Oct 2009, Warren Togami wrote:
> 
> >  The "Oddity" I was pointing out at the beginning of the thread is 
> >  not prevalence of .cn URI's, but rather most of them appear to be 
> >  exactly 8 characters long. Could someone please commit my 
> >  T_CN_8_URL rule to the sandbox so we can see if that trend holds 
> >  beyond my own corpora?
> 
>  I've put a .CN 8 URI rule into my sandbox file but it may be a few 
>  days before it gets committed, my stuff is in flux right now...

>

 # 8-letter .cn domain, per Warren Togami
 uri            CN_EIGHT            m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/;
 describe       CN_EIGHT            .CN uri with eight-letter domain name
 score          CN_EIGHT            0.10

 Possible bug here... Do all URI's necessarily have a trailing slash?


\b might be better?


Yes. I didn't use \b because I had a temporary attack of the stupids.

Fixed.
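The trailing-slash problem and the effect of `\b` can be reproduced in a few lines. This is a sketch only: the thread does not show the committed fix, so `WITH_BOUNDARY` is an assumed form of it, and Python's `re` stands in for Perl. The sample URIs are made up.

```python
import re

# The rule as first posted: requires a "/" right after ".cn".
ORIGINAL = re.compile(r"^https?://(?:[^./]+\.)*[^./]{8}\.cn/")
# Assumed form of the fix, ending on a word boundary instead of "/".
WITH_BOUNDARY = re.compile(r"^https?://(?:[^./]+\.)*[^./]{8}\.cn\b")

def matches(pattern, uri):
    """True if the pattern is found in the URI."""
    return pattern.search(uri) is not None

for uri in [
    "http://abcdefgh.cn",             # no trailing slash
    "http://abcdefgh.cn/index.html",  # path present
    "http://www.abcdefgh.cn?x=1",     # query directly after the host
    "http://abcdefg.cn/",             # 7-character label
]:
    print(uri, matches(ORIGINAL, uri), matches(WITH_BOUNDARY, uri))
```

The original form misses any URI that ends right after `.cn` or continues with `?`, which is exactly the case the question about trailing slashes raised.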

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  It's easy to be noble with other people's money.
   -- John McKay, _The Welfare State:
  No Mercy for the Middle Class_
---
 Approximately 9123180 firearms legally purchased in the U.S. this year


Re: .cn Oddity

2009-10-03 Thread Ned Slider

Warren Togami wrote:

On 10/01/2009 02:36 PM, John Hardin wrote:

On Thu, 1 Oct 2009, Warren Togami wrote:


The "Oddity" I was pointing out at the beginning of the thread is not
prevalence of .cn URI's, but rather most of them appear to be exactly
8 characters long. Could someone please commit my T_CN_8_URL rule to
the sandbox so we can see if that trend holds beyond my own corpora?


I've put a .CN 8 URI rule into my sandbox file but it may be a few days
before it gets committed, my stuff is in flux right now...



# 8-letter .cn domain, per Warren Togami
uri            CN_EIGHT            m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/;
describe       CN_EIGHT            .CN uri with eight-letter domain name
score          CN_EIGHT            0.10

Possible bug here... Do all URI's necessarily have a trailing slash?

Warren Togami
wtog...@redhat.com



\b might be better?




Re: .cn Oddity

2009-10-03 Thread Warren Togami

On 10/01/2009 02:36 PM, John Hardin wrote:

On Thu, 1 Oct 2009, Warren Togami wrote:


The "Oddity" I was pointing out at the beginning of the thread is not
prevalence of .cn URI's, but rather most of them appear to be exactly
8 characters long. Could someone please commit my T_CN_8_URL rule to
the sandbox so we can see if that trend holds beyond my own corpora?


I've put a .CN 8 URI rule into my sandbox file but it may be a few days
before it gets committed, my stuff is in flux right now...



# 8-letter .cn domain, per Warren Togami
uri            CN_EIGHT            m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/;
describe       CN_EIGHT            .CN uri with eight-letter domain name
score          CN_EIGHT            0.10

Possible bug here... Do all URI's necessarily have a trailing slash?

Warren Togami
wtog...@redhat.com


Re: .cn Oddity

2009-10-02 Thread MySQL Student
Hi All,

Regarding the .cn oddity, I added these to my rules, and of about 79k
messages today so far, I have the following:

uri LOC_URI_CN  m;^https?://[^/?]+\.cn\b;
uri T_CN_8_URL  /[\/.]+\w{8}\.cn(?:$|\/|\?)/i

LOC_URI_CN: 2926
T_CN_8_URL: 1634

HTH,
Alex


Re: .cn Oddity

2009-10-01 Thread John Hardin

On Thu, 1 Oct 2009, Warren Togami wrote:

The "Oddity" I was pointing out at the beginning of the thread is not 
prevalence of .cn URI's, but rather most of them appear to be exactly 8 
characters long.  Could someone please commit my T_CN_8_URL rule to the 
sandbox so we can see if that trend holds beyond my own corpora?


I've put a .CN 8 URI rule into my sandbox file but it may be a few days 
before it gets committed, my stuff is in flux right now...


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  USMC Rules of Gunfighting #9: Accuracy is relative: most combat
  shooting standards will be more dependent on "pucker factor" than
  the inherent accuracy of the gun.
---
 Approximately 9055560 firearms legally purchased in the U.S. this year


Re: .cn Oddity

2009-10-01 Thread jdow

From: "Ned Slider" 
Sent: Thursday, 2009/October/01 10:48



Warren Togami wrote:

On 10/01/2009 01:05 PM, John Hardin wrote:

On Thu, 1 Oct 2009, jdow wrote:


From: "John Hardin" 


Yours may still hit .cn in the path part. May I suggest:

m;^https?://[^/?]+\.cn\b;
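To see why anchoring to the scheme and host matters, the suggested pattern can be compared with a looser one of the kind posted earlier in the thread. A minimal Python sketch, with Perl's `m;...;` delimiters dropped; the names `LOOSE`, `ANCHORED` and `compare` and the sample URIs are hypothetical.

```python
import re

# Looser pattern: ".cn" may appear anywhere, including the path.
LOOSE = re.compile(r"[^/]+\.cn(?:$|/|\?)", re.IGNORECASE)
# Suggested pattern: ".cn" must occur inside the host part of the URI.
ANCHORED = re.compile(r"^https?://[^/?]+\.cn\b")

def compare(uri):
    """Return (loose_hit, anchored_hit) for one URI."""
    return bool(LOOSE.search(uri)), bool(ANCHORED.search(uri))

for uri in [
    "http://example.com/files/report.cn",  # ".cn" only in the path
    "http://abcdefgh.cn/",                 # genuine .cn host
    "http://abcdefgh.cn",                  # same, no trailing slash
]:
    print(uri, compare(uri))
```

One caveat of the anchored form as written: `\b` is also satisfied before a dot, so a host such as `foo.cn.example.com` would still match.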


Regardless of their correctness, would you care to expound on the success
of these two rules, John? I like what works not political correctness.
I think these are two interesting observations. Of course, they won't
work very well for somebody doing business with China or embedded
within the .cn TLD.


"what works" is based on the accuracy of the corpora. If the corpora
show lots of spam with .cn TLD URIs and little or no ham with such, then
that rule will hit often, and have a good S/O, and get a high score.

I too am surprised that .cn TLDs appear in 51% of the spam corpus but I
haven't looked into it in any detail. I can certainly check it against
my own corpora and see if it's reasonable - but then again, I don't do
any business with anyone in china, and I _do_ get a fair amount of bulk
emails from manufacturers in china purportedly looking for business
partners.



The "Oddity" I was pointing out at the beginning of the thread is not the
prevalence of .cn URIs, but rather that most of them appear to be exactly 8
characters long.  Could someone please commit my T_CN_8_URL rule to the
sandbox so we can see if that trend holds beyond my own corpora?


Warren



Warren,

Seems to hold true here to an extent. From my recent confirmed spam 
archive I see:


# cat spam* | grep '\.cn\b' | grep http | wc -l
1088

# cat spam* | grep '\.\w\{8\}\.cn\b' | grep http | wc -l
908

# cat spam* | grep '\/\w\{8\}\.cn\b' | grep http | wc -l
23


so about 85% of .cn URIs (908 + 23 = 931 of 1088) also match the {8}.cn
pattern. Not quite as high as your findings, but very high nevertheless.


Based on my last note about Chinese numerology I bet if you have a large
Chinese ham corpus you'd pick up on 8 as a magic number there, too. I am
intrigued enough I'd LOVE to know if that's right.

{^_^} 



Re: .cn Oddity

2009-10-01 Thread jdow

From: "Warren Togami" 
Sent: Thursday, 2009/October/01 10:24



On 10/01/2009 01:16 PM, Warren Togami wrote:

On 10/01/2009 01:05 PM, John Hardin wrote:

On Thu, 1 Oct 2009, jdow wrote:


From: "John Hardin" 


Yours may still hit .cn in the path part. May I suggest:

m;^https?://[^/?]+\.cn\b;


Regardless of their correctness, would you care to expound on the
success
of these two rules, John? I like what works not political correctness.
I think these are two interesting observations. Of course, they won't
work very well for somebody doing business with China or embedded
within the .cn TLD.


"what works" is based on the accuracy of the corpora. If the corpora
show lots of spam with .cn TLD URIs and little or no ham with such, then
that rule will hit often, and have a good S/O, and get a high score.

I too am surprised that .cn TLDs appear in 51% of the spam corpus but I
haven't looked into it in any detail. I can certainly check it against
my own corpora and see if it's reasonable - but then again, I don't do
any business with anyone in china, and I _do_ get a fair amount of bulk
emails from manufacturers in china purportedly looking for business
partners.



The "Oddity" I was pointing out at the beginning of the thread is not the
prevalence of .cn URIs, but rather that most of them appear to be exactly 8
characters long. Could someone please commit my T_CN_8_URL rule to the
sandbox so we can see if that trend holds beyond my own corpora?

Warren


(And yes, I'm fully aware even this narrowed rule is prejudiced and unsafe.
This is partly out of curiosity, and partly wondering if it could be made
useful if meta booleaned with something else.)


Warren


I just had a thought, Warren. Look up Chinese numerology. 8 signifies
wealth or sudden prosperity. Conversely, I suspect few Chinese names
are four characters. Four is a pun on death. Some social sites might
like 5 letters - me. 7 is right out, it's a vulgar word in Cantonese.
9 is also slang or vulgar in Cantonese.

I wonder how many companies that deal with China have figured out that
an "888" toll free number is WONDERFUL, "Wealth, wealth, wealth."

I understand numerology is quite important to the Chinese. (Of course,
I am not claiming to be an expert. The above is mostly Wikipoodle and
surmise.)

{^_-} 



Re: .cn Oddity

2009-10-01 Thread Ned Slider

Warren Togami wrote:

On 10/01/2009 01:05 PM, John Hardin wrote:

On Thu, 1 Oct 2009, jdow wrote:


From: "John Hardin" 


Yours may still hit .cn in the path part. May I suggest:

m;^https?://[^/?]+\.cn\b;


Regardless of their correctness, would you care to expound on the success
of these two rules, John? I like what works not political correctness.
I think these are two interesting observations. Of course, they won't
work very well for somebody doing business with China or embedded
within the .cn TLD.


"what works" is based on the accuracy of the corpora. If the corpora
show lots of spam with .cn TLD URIs and little or no ham with such, then
that rule will hit often, and have a good S/O, and get a high score.

I too am surprised that .cn TLDs appear in 51% of the spam corpus but I
haven't looked into it in any detail. I can certainly check it against
my own corpora and see if it's reasonable - but then again, I don't do
any business with anyone in china, and I _do_ get a fair amount of bulk
emails from manufacturers in china purportedly looking for business
partners.



The "Oddity" I was pointing out at the beginning of the thread is not the
prevalence of .cn URIs, but rather that most of them appear to be exactly 8
characters long.  Could someone please commit my T_CN_8_URL rule to the
sandbox so we can see if that trend holds beyond my own corpora?


Warren



Warren,

Seems to hold true here to an extent. From my recent confirmed spam 
archive I see:


# cat spam* | grep '\.cn\b' | grep http | wc -l
1088

# cat spam* | grep '\.\w\{8\}\.cn\b' | grep http | wc -l
908

# cat spam* | grep '\/\w\{8\}\.cn\b' | grep http | wc -l
23


so about 85% of .cn URIs (908 + 23 = 931 of 1088) also match the {8}.cn
pattern. Not quite as high as your findings, but very high nevertheless.
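
The arithmetic behind that figure can be checked with a short Python sketch. It uses only the three grep counts quoted above and assumes the two {8} greps (dot-prefixed and slash-prefixed) do not count the same line twice, which the thread does not confirm.

```python
# Line counts from the three grep runs quoted above.
total_cn = 1088     # lines containing an http .cn URI
dot_eight = 908     # "." + 8 word characters + ".cn"
slash_eight = 23    # "/" + 8 word characters + ".cn"

# Assumption: no line is counted by both {8} greps.
eight_total = dot_eight + slash_eight
share = eight_total / total_cn

print(f"{eight_total} of {total_cn} lines = {share:.1%}")
```

The combined share rounds to 85.6%, which is consistent with the "85%" quoted if the two counts were summed.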






Re: .cn Oddity

2009-10-01 Thread Warren Togami

On 10/01/2009 01:16 PM, Warren Togami wrote:

On 10/01/2009 01:05 PM, John Hardin wrote:

On Thu, 1 Oct 2009, jdow wrote:


From: "John Hardin" 


Yours may still hit .cn in the path part. May I suggest:

m;^https?://[^/?]+\.cn\b;


Regardless of their correctness, would you care to expound on the
success
of these two rules, John? I like what works not political correctness.
I think these are two interesting observations. Of course, they won't
work very well for somebody doing business with China or embedded
within the .cn TLD.


"what works" is based on the accuracy of the corpora. If the corpora
show lots of spam with .cn TLD URIs and little or no ham with such, then
that rule will hit often, and have a good S/O, and get a high score.

I too am surprised that .cn TLDs appear in 51% of the spam corpus but I
haven't looked into it in any detail. I can certainly check it against
my own corpora and see if it's reasonable - but then again, I don't do
any business with anyone in china, and I _do_ get a fair amount of bulk
emails from manufacturers in china purportedly looking for business
partners.



The "Oddity" I was pointing out at the beginning of the thread is not the
prevalence of .cn URIs, but rather that most of them appear to be exactly 8
characters long. Could someone please commit my T_CN_8_URL rule to the
sandbox so we can see if that trend holds beyond my own corpora?

Warren


(And yes, I'm fully aware even this narrowed rule is prejudiced and
unsafe.  This is partly out of curiosity, and partly wondering if it
could be made useful if meta booleaned with something else.)


Warren


Re: .cn Oddity

2009-10-01 Thread Warren Togami

On 10/01/2009 01:05 PM, John Hardin wrote:

On Thu, 1 Oct 2009, jdow wrote:


From: "John Hardin" 


Yours may still hit .cn in the path part. May I suggest:

m;^https?://[^/?]+\.cn\b;


Regardless of their correctness, would you care to expound on the success
of these two rules, John? I like what works not political correctness.
I think these are two interesting observations. Of course, they won't
work very well for somebody doing business with China or embedded
within the .cn TLD.


"what works" is based on the accuracy of the corpora. If the corpora
show lots of spam with .cn TLD URIs and little or no ham with such, then
that rule will hit often, and have a good S/O, and get a high score.

I too am surprised that .cn TLDs appear in 51% of the spam corpus but I
haven't looked into it in any detail. I can certainly check it against
my own corpora and see if it's reasonable - but then again, I don't do
any business with anyone in china, and I _do_ get a fair amount of bulk
emails from manufacturers in china purportedly looking for business
partners.



The "Oddity" I was pointing out at the beginning of the thread is not the
prevalence of .cn URIs, but rather that most of them appear to be exactly 8
characters long.  Could someone please commit my T_CN_8_URL rule to the
sandbox so we can see if that trend holds beyond my own corpora?


Warren


Re: .cn Oddity

2009-10-01 Thread John Hardin

On Thu, 1 Oct 2009, jdow wrote:


From: "John Hardin" 


 Yours may still hit .cn in the path part. May I suggest:

   m;^https?://[^/?]+\.cn\b;


Regardless of their correctness, would you care to expound on the success
of these two rules, John? I like what works not political correctness.
I think these are two interesting observations. Of course, they won't 
work very well for somebody doing business with China or embedded within 
the .cn TLD.


"what works" is based on the accuracy of the corpora. If the corpora show 
lots of spam with .cn TLD URIs and little or no ham with such, then that 
rule will hit often, and have a good S/O, and get a high score.
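
As a rough illustration of the S/O point, here is a minimal Python sketch. It assumes S/O is the spam/overall ratio computed from hit percentages, spam% / (spam% + ham%); that formula is my assumption about how ruleqa reports it, not something stated in this thread. The spam counts are the masscheck figures quoted elsewhere in the thread; the ham counts are invented purely for illustration.

```python
def so_ratio(spam_hits, spam_total, ham_hits, ham_total):
    """Spam/overall ratio from hit percentages (assumed formula)."""
    spam_pct = 100.0 * spam_hits / spam_total
    ham_pct = 100.0 * ham_hits / ham_total
    if spam_pct + ham_pct == 0:
        return None  # the rule never fired, so the ratio is undefined
    return spam_pct / (spam_pct + ham_pct)

# Spam side: T_CN_URL masscheck figures quoted in this thread.
spam_hits, spam_total = 63243, 124241
# Ham side: hypothetical numbers, not from any real corpus.
ham_hits, ham_total = 50, 100000

ratio = so_ratio(spam_hits, spam_total, ham_hits, ham_total)
print(f"S/O = {ratio:.4f}")
```

A corpus with little .cn ham pushes the ratio toward 1.0, which is exactly why the result is only as good as the ham corpus behind it.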


I too am surprised that .cn TLDs appear in 51% of the spam corpus but I 
haven't looked into it in any detail. I can certainly check it against my 
own corpora and see if it's reasonable - but then again, I don't do any 
business with anyone in china, and I _do_ get a fair amount of bulk emails 
from manufacturers in china purportedly looking for business partners.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  If "healthcare is a Right" means that the government is obligated
  to provide the people with hospitals, physicians, treatments and
  medications at low or no cost, then the right to free speech means
  the government is obligated to provide the people with printing
  presses and public address systems, the right to freedom of
  religion means the government is obligated to build churches for the
  people, and the right to keep and bear arms means the government is
  obligated to provide the people with guns, all at low or no cost.
---
 Approximately 9052800 firearms legally purchased in the U.S. this year


Re: .cn Oddity

2009-10-01 Thread John Hardin

On Thu, 1 Oct 2009, Benny Pedersen wrote:


On tor 01 okt 2009 18:26:01 CEST, John Hardin wrote

m;^https?://[^/?]+\.cn\b;


replace ; with / no ?

m/\bhttps?://[^/?]+\.cn\b/i


No. The point to m; is so that you can embed / in the RE without escaping 
them. You are changing the RE delimiters.


m{...} is fine _if_ you don't use {m,n} syntax, in which case it becomes 
confusing.
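
Python has no regex delimiters, but a small sketch can still show what the alternate delimiter buys: the m;...; form and the form with every "/" escaped (as it would have to be written between /.../) are the same pattern. The sample URIs are hypothetical.

```python
import re

# The pattern as written with m;...; (no escaping needed for "/").
plain = re.compile(r"^https?://[^/?]+\.cn\b")
# The same pattern as it would have to be written between /.../ delimiters,
# with every "/" escaped.
escaped = re.compile(r"^https?:\/\/[^\/?]+\.cn\b")

# Hypothetical URIs, invented for illustration only.
uris = [
    "http://abcdefgh.cn/",
    "https://www.abcdefgh.cn?x=1",
    "http://www.example.com/a.cn/",
]

# Each tuple is (plain result, escaped result) for one URI.
verdicts = [(bool(plain.search(u)), bool(escaped.search(u))) for u in uris]
for uri, verdict in zip(uris, verdicts):
    print(uri, verdict)
```

Both forms agree on every URI; the unescaped one is simply easier to read.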




Re: .cn Oddity

2009-10-01 Thread jdow

From: "John Hardin" 
Sent: Thursday, 2009/October/01 09:26



On Thu, 1 Oct 2009, Ned Slider wrote:


John Hardin wrote:

 On Thu, 1 Oct 2009, Warren Togami wrote:

>  uri T_CN_URL  /[^\/]+\.cn(?:$|\/|\?)/i
>  describe T_CN_URL Contains a URL in the .cn domain
>
>  uri T_CN_8_URL  /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
>  describe T_CN_8_URL Contains a URL in the .cn domain of exactly 8
>  characters long
>
>  http://ruleqa.spamassassin.org/20090930-r820211-n/T_CN_URL/detail
>  Last night's masscheck.  63243 out of 124241 spam hits T_CN_URL,
>  nearly 51%.
>
>  7263 T_CN_URL hits in 15517 spam corpus
>  7200 T_CN_8_URL hits in 15517 spam corpus
>
>  Does this make any sense?  This is funny.  Could someone add this
>  rule to the sandbox?  I'm just curious.


 I note that neither is anchored at the beginning of the URI, so they may
 be hitting on .cn embedded somewhere within the path part.

 That doesn't explain 51%, though.


I run my own custom .cn tld URI rule, and whilst it's right down in 
percentage terms atm, in the past it has certainly hit on around 50% plus 
of all spam containing a URI. So depending on the corpus, I'm not 
surprised by the 51%.


uri LOCAL_URI_CN m{https?://.{1,40}\.cn\b}
describe LOCAL_URI_CN contains link to Chinese tld


Yours may still hit .cn in the path part. May I suggest:

  m;^https?://[^/?]+\.cn\b;


Regardless of their correctness, would you care to expound on the success
of these two rules, John? I like what works not political correctness. I
think these are two interesting observations. Of course, they won't work
very well for somebody doing business with China or embedded within the
.cn TLD.

{^_-} 



Re: .cn Oddity

2009-10-01 Thread Benny Pedersen

On tor 01 okt 2009 18:26:01 CEST, John Hardin wrote

m;^https?://[^/?]+\.cn\b;


replace ; with / no ?

m/\bhttps?://[^/?]+\.cn\b/i

--
xpoint



Re: .cn Oddity

2009-10-01 Thread John Hardin

On Thu, 1 Oct 2009, Ned Slider wrote:


John Hardin wrote:

 On Thu, 1 Oct 2009, Warren Togami wrote:

>  uri T_CN_URL  /[^\/]+\.cn(?:$|\/|\?)/i
>  describe T_CN_URL Contains a URL in the .cn domain
> 
>  uri T_CN_8_URL  /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
>  describe T_CN_8_URL Contains a URL in the .cn domain of exactly 8 
>  characters long
> 
>  http://ruleqa.spamassassin.org/20090930-r820211-n/T_CN_URL/detail
>  Last night's masscheck.  63243 out of 124241 spam hits T_CN_URL, nearly 
>  51%.
> 
>  7263 T_CN_URL hits in 15517 spam corpus
>  7200 T_CN_8_URL hits in 15517 spam corpus
> 
>  Does this make any sense?  This is funny.  Could someone add this rule 
>  to the sandbox?  I'm just curious.


 I note that neither is anchored at the beginning of the URI, so they may
 be hitting on .cn embedded somewhere within the path part.

 That doesn't explain 51%, though.


I run my own custom .cn tld URI rule, and whilst it's right down in 
percentage terms atm, in the past it has certainly hit on around 50% plus of 
all spam containing a URI. So depending on the corpus, I'm not surprised by 
the 51%.


uri LOCAL_URI_CN m{https?://.{1,40}\.cn\b}
describe LOCAL_URI_CN contains link to Chinese tld


Yours may still hit .cn in the path part. May I suggest:

  m;^https?://[^/?]+\.cn\b;
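
A minimal Python sketch of the difference, assuming each pattern is applied to one URI at a time (as a uri rule would be). The two patterns are transcribed from this message; the sample URIs are hypothetical.

```python
import re

# Ned's rule and the suggested anchored replacement, transcribed from above.
unanchored = re.compile(r"https?://.{1,40}\.cn\b")
anchored = re.compile(r"^https?://[^/?]+\.cn\b")

# Hypothetical URIs, invented for illustration only.
host_in_cn = "http://abcdefgh.cn/index.html"
cn_in_path = "http://www.example.com/mirror/site.cn/index.html"

def hits(pattern, uri):
    """Return True if the pattern matches anywhere in the URI."""
    return pattern.search(uri) is not None

for uri in (host_in_cn, cn_in_path):
    print(uri, "unanchored:", hits(unanchored, uri), "anchored:", hits(anchored, uri))
```

The unanchored pattern fires on both URIs, while the anchored one fires only when the hostname itself ends in .cn, because [^/?]+ cannot cross into the path or query.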



Re: .cn Oddity

2009-10-01 Thread Ned Slider

John Hardin wrote:

On Thu, 1 Oct 2009, Warren Togami wrote:


uri T_CN_URL  /[^\/]+\.cn(?:$|\/|\?)/i
describe T_CN_URL Contains a URL in the .cn domain

uri T_CN_8_URL  /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
describe T_CN_8_URL Contains a URL in the .cn domain of exactly 8 characters long


http://ruleqa.spamassassin.org/20090930-r820211-n/T_CN_URL/detail
Last night's masscheck.  63243 out of 124241 spam hits T_CN_URL, 
nearly 51%.


7263 T_CN_URL hits in 15517 spam corpus
7200 T_CN_8_URL hits in 15517 spam corpus

Does this make any sense?  This is funny.  Could someone add this rule 
to the sandbox?  I'm just curious.


I note that neither is anchored at the beginning of the URI, so they may 
be hitting on .cn embedded somewhere within the path part.


That doesn't explain 51%, though.



I run my own custom .cn tld URI rule, and whilst it's right down in 
percentage terms atm, in the past it has certainly hit on around 50% 
plus of all spam containing a URI. So depending on the corpus, I'm not 
surprised by the 51%.


uri LOCAL_URI_CN m{https?://.{1,40}\.cn\b}
describe LOCAL_URI_CN contains link to Chinese tld



Re: .cn Oddity

2009-10-01 Thread John Hardin

On Thu, 1 Oct 2009, Warren Togami wrote:


uri T_CN_URL  /[^\/]+\.cn(?:$|\/|\?)/i
describe T_CN_URL Contains a URL in the .cn domain

uri T_CN_8_URL  /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
describe T_CN_8_URL Contains a URL in the .cn domain of exactly 8 characters long


http://ruleqa.spamassassin.org/20090930-r820211-n/T_CN_URL/detail
Last night's masscheck.  63243 out of 124241 spam hits T_CN_URL, nearly 51%.

7263 T_CN_URL hits in 15517 spam corpus
7200 T_CN_8_URL hits in 15517 spam corpus

Does this make any sense?  This is funny.  Could someone add this rule to the 
sandbox?  I'm just curious.


I note that neither is anchored at the beginning of the URI, so they may 
be hitting on .cn embedded somewhere within the path part.


That doesn't explain 51%, though.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Therapeutic Phrenologist - send email for affordable rate schedule.
---
 Approximately 9051420 firearms legally purchased in the U.S. this year


.cn Oddity

2009-09-30 Thread Warren Togami

uri T_CN_URL  /[^\/]+\.cn(?:$|\/|\?)/i
describe T_CN_URL Contains a URL in the .cn domain

uri T_CN_8_URL  /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
describe T_CN_8_URL Contains a URL in the .cn domain of exactly 8 characters long
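
For anyone who wants to see what the two patterns accept, here is a minimal Python sketch. The regexes are transcribed from the rules above (the /i flag becomes re.I); the sample URIs are hypothetical, and Python's \w may differ slightly from Perl's on non-ASCII input.

```python
import re

# The two rule patterns, transcribed from the rules above.
T_CN_URL = re.compile(r"[^/]+\.cn(?:$|/|\?)", re.I)
T_CN_8_URL = re.compile(r"[/.]+\w{8}\.cn(?:$|/|\?)", re.I)

# Hypothetical URIs, invented for illustration only.
samples = [
    "http://abcdefgh.cn/",        # 8-character label
    "http://www.abcdefgh.cn",     # 8-character label behind a subdomain
    "http://abc.cn/",             # 3-character label
    "http://abcdefghij.cn/page",  # 10-character label
]

def hits(pattern, uri):
    """Return True if the pattern matches anywhere in the URI."""
    return pattern.search(uri) is not None

# Map each URI to (T_CN_URL hit, T_CN_8_URL hit).
results = {uri: (hits(T_CN_URL, uri), hits(T_CN_8_URL, uri)) for uri in samples}
for uri, (any_cn, eight_cn) in results.items():
    print(f"{uri:28} T_CN_URL={any_cn} T_CN_8_URL={eight_cn}")
```

All four samples hit T_CN_URL, but only the two with an 8-character label directly before .cn hit T_CN_8_URL.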


http://ruleqa.spamassassin.org/20090930-r820211-n/T_CN_URL/detail
Last night's masscheck.  63243 out of 124241 spam hits T_CN_URL, nearly 51%.

7263 T_CN_URL hits in 15517 spam corpus
7200 T_CN_8_URL hits in 15517 spam corpus

Does this make any sense?  This is funny.  Could someone add this rule 
to the sandbox?  I'm just curious.
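
The ratios behind those counts, as a quick Python sketch. It uses only the numbers quoted above; the variable names are mine.

```python
# Counts quoted in the message above.
masscheck_hits, masscheck_spam = 63243, 124241       # T_CN_URL, last night's masscheck
cn_hits, cn8_hits, corpus_spam = 7263, 7200, 15517   # the second spam corpus

masscheck_rate = masscheck_hits / masscheck_spam  # share of masscheck spam hitting T_CN_URL
corpus_rate = cn_hits / corpus_spam               # share of the 15517-message corpus hitting T_CN_URL
eight_share = cn8_hits / cn_hits                  # share of T_CN_URL hits that also hit T_CN_8_URL

print(f"masscheck T_CN_URL rate: {masscheck_rate:.1%}")
print(f"corpus T_CN_URL rate:    {corpus_rate:.1%}")
print(f"T_CN_8_URL / T_CN_URL:   {eight_share:.1%}")
```

The last ratio is the oddity: roughly 99 in 100 of the .cn hits in that corpus also match the 8-character pattern.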


Warren Togami
wtog...@redhat.com