RE: .cn Oddity
> -Original Message-
> From: jdow [mailto:j...@earthlink.net]
>
> {^_-} (Some of the ninjas are burned out. I have one such to my
> back when we're both in the room beating away at our CPUs.)

+1 burnout. Too many things going on. I will eventually get my 232nd wind and be back in the fight, although I want to approach antispam in a whole different way next time. I still have a few ideas up my sleeve ;)

--Chris
Re: .cn Oddity
From: "MySQL Student"
Sent: Sunday, 2009/October/11 09:08

> Hi,
>
> > > We use some rules. If we talk openly about it and say "hey, this spammer
> > > is stupid, look here", then it will take less than 12 hours and that gap
> > > is closed and we lose a valuable trick.
> >
> > Yes, that's the way it is. Spammers can also read mailing lists and adapt
> > their spam to bypass the rules.
>
> It sounds like social engineering needs to be part of the attack
> rules/strategy that we employ on these spammers :-)

They say you can't con a conman. The same "They" say a lot of things that are not strictly true.

I'd still love a good ham corpus from China. It may be that most Chinese domains are 8 characters. That would make a "not 8 characters plus .cn" rule devastating to the .cn spammers. That is aside from the fact that they tend to trigger so many effective (yet old) rules and Bayes that none of them have gotten through the filters.

{^_-} (Some of the ninjas are burned out. I have one such to my back when we're both in the room beating away at our CPUs.)
Re: .cn Oddity
Hi,

>> We use some rules. If we talk openly about it and say "hey, this spammer
>> is stupid, look here", then it will take less than 12 hours and that gap
>> is closed and we lose a valuable trick.
>
> Yes, that's the way it is. Spammers can also read mailing lists and adapt
> their spam to bypass the rules.

It sounds like social engineering needs to be part of the attack rules/strategy that we employ on these spammers :-)

Regards,
Alex
Re: .cn Oddity
On Sun, 11 Oct 2009 12:12:20 CEST, jdow wrote:
> could squeeze his spam decreased. It's still decreasing, although at a
> slower rate due to the relative inactivity of the SARE ninjas.

The SARE rules are not maintained now, but they could still go through masscheck so that the best of them get re-added to SA.

--
xpoint
Re: .cn Oddity
On Sun, 11 Oct 2009 11:48:11 CEST, Raymond Dijkxhoorn wrote:
> We use some rules. If we talk openly about it and say "hey, this spammer
> is stupid, look here", then it will take less than 12 hours and that gap
> is closed and we lose a valuable trick.

Yes, that's the way it is. Spammers can also read mailing lists and adapt their spam to bypass the rules.

> Fighting spam is more than just ventilating ideas; it's much more than that.

Let's make a whitelist of .cn domains that are not seen in spam. More fun for the spammers now: can we still say "8 char .cn domain" rules? :)

What if the sender or envelope is hotmail.*? Meta it; .cn domains could just as well have email on their own domain. Not tested. If I see an email with more than one domain, it's basically spam.

--
xpoint
Re: .cn Oddity
Hi!

> So I am quite aware of losing good rules. HOWEVER, as he found out WE
> keep the old rules and add new ones and his keyhole through which he
> could squeeze his spam decreased. It's still decreasing, although at a
> slower rate due to the relative inactivity of the SARE ninjas.

Most ninjas, including me, are idle due to this same exposure thing. We share within the SARE group internally, but most rules are not published like in the past. Some are added by Alex to the generic SA updates, however.

Bye,
Raymond.
Re: .cn Oddity
From: "Raymond Dijkxhoorn" Sent: Sunday, 2009/October/11 02:48 Hi! 7263 T_CN_URL hits in 15517 spam corpus 7200 T_CN_8_URL hits in 15517 spam corpus Does this make any sense? This is funny. Could someone add this rule to the sandbox? I'm just curious. I have to admire one thing about spammers. They respond very rapidly to "threats" to their ability to break through spam protection software. You became curious and mentioned this on the date above. Spammers are already using <7 character names>.cn. Thats why i said earlier in the thread if you see something, test it silently and add it silently. Thats the only way to get use of it. We use some rules if we talk open about it and say hey this spammer is stupid look here, then it will take less then 12 hours and that gap is closed and we loose a valuable trick. Fighting spam is more then just ventilating idea's its much more then that. Bye, Raymond. Some years ago, Raymond, I "used" this list to bait a specific spammer about how pathetic his scores were. They were high but he didn't break 100. Within a week he found a way. (His spams had (have) many features that very characteristic of his work but hard to use for anti-spam. This involved a specific portion of a name he'd use for registering his phony domains.) So I am quite aware of losing good rules. HOWEVER, as he found out WE keep the old rules and add new ones and his keyhole through which he could squeeze his spam decreased. It's still decreasing, although at a slower rate due to the relative inactivity of the SARE ninjas. {^_^}
Re: .cn Oddity
Hi!

> > 7263 T_CN_URL hits in 15517 spam corpus
> > 7200 T_CN_8_URL hits in 15517 spam corpus
> >
> > Does this make any sense? This is funny. Could someone add this rule to
> > the sandbox? I'm just curious.
>
> I have to admire one thing about spammers. They respond very rapidly to
> "threats" to their ability to break through spam protection software.
> You became curious and mentioned this on the date above. Spammers are
> already using <7 character names>.cn.

That's why I said earlier in the thread: if you see something, test it silently and add it silently. That's the only way to get use out of it.

We use some rules. If we talk openly about it and say "hey, this spammer is stupid, look here", then it will take less than 12 hours and that gap is closed and we lose a valuable trick.

Fighting spam is more than just ventilating ideas; it's much more than that.

Bye,
Raymond.
Re: .cn Oddity
On 10/11/2009 02:07 AM, jdow wrote:
> I have to admire one thing about spammers. They respond very rapidly to
> "threats" to their ability to break through spam protection software.
> You became curious and mentioned this on the date above. Spammers are
> already using <7 character names>.cn.
>
> {^_-}

Yes, I see they began registering \w{7}.cn domains around October 3rd and the \w{8}.cn spam is a lot less now.

Warren
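One way to watch for the kind of shift Warren describes (8-character labels giving way to 7) is to tally label lengths across the .cn URIs in a corpus. The following is only a rough Python sketch: the `CN_LABEL` pattern, the `label_length_histogram` helper and the sample URIs are all made up for illustration, and the pattern simply assumes that `[-\w]+` is a close-enough stand-in for a domain label.

```python
import re
from collections import Counter

# Captures the label directly left of ".cn" in the host part of a URI.
# Assumption: [-\w]+ is a close-enough stand-in for a domain label.
CN_LABEL = re.compile(
    r"^https?://(?:[^./?]+\.)*([-\w]+)\.cn(?:$|[/?:])", re.IGNORECASE
)

def label_length_histogram(uris):
    """Count how many .cn URIs have a label of each length."""
    counts = Counter()
    for uri in uris:
        match = CN_LABEL.match(uri)
        if match:
            counts[len(match.group(1))] += 1
    return counts

# Hypothetical URIs, invented for illustration; not taken from any corpus.
sample_uris = [
    "http://abcdefgh.cn/",
    "http://qwertyui.cn/a",
    "http://abcdefg.cn",
    "http://www.zxcvbnm.cn/?x=1",
    "http://example.com/",
]

histogram = label_length_histogram(sample_uris)
for length in sorted(histogram):
    print(length, histogram[length])
```

Run daily against fresh spam, a tally like this would show the 8 bucket shrinking and the 7 bucket growing, which is the trend reported above.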
Re: .cn Oddity
From: "Warren Togami"
Sent: Wednesday, 2009/September/30 21:40

> uri T_CN_URL /[^\/]+\.cn(?:$|\/|\?)/i
> describe T_CN_URL Contains a URL in the .cn domain
>
> uri T_CN_8_URL /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
> describe T_CN_8_URL Contains a URL in the .cn domain of exactly 8 characters long
>
> http://ruleqa.spamassassin.org/20090930-r820211-n/T_CN_URL/detail
> Last night's masscheck. 63243 out of 124241 spam hits T_CN_URL, nearly 51%.
>
> 7263 T_CN_URL hits in 15517 spam corpus
> 7200 T_CN_8_URL hits in 15517 spam corpus
>
> Does this make any sense? This is funny. Could someone add this rule to
> the sandbox? I'm just curious.
>
> Warren Togami
> wtog...@redhat.com

I have to admire one thing about spammers. They respond very rapidly to "threats" to their ability to break through spam protection software. You became curious and mentioned this on the date above. Spammers are already using <7 character names>.cn.

{^_-}
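For anyone who wants to poke at the two patterns outside SpamAssassin, here is a minimal Python sketch. It assumes the patterns are applied to one already-extracted URI string at a time (which is roughly what a `uri` rule sees); the sample URIs and the `classify` helper are invented for illustration. The last lines just redo the arithmetic on the quoted hit counts.

```python
import re

# Python translations of the two SpamAssassin uri patterns quoted above.
T_CN_URL = re.compile(r"[^/]+\.cn(?:$|/|\?)", re.IGNORECASE)
T_CN_8_URL = re.compile(r"[/.]+\w{8}\.cn(?:$|/|\?)", re.IGNORECASE)

def classify(uri):
    """Return (hits T_CN_URL, hits T_CN_8_URL) for one URI string."""
    return bool(T_CN_URL.search(uri)), bool(T_CN_8_URL.search(uri))

# Hypothetical sample URIs, invented for illustration only.
samples = [
    "http://abcdefgh.cn/",      # 8-character label
    "http://www.abcdefgh.cn",   # 8-character label behind a subdomain
    "http://abcdefg.cn/",       # 7-character label
    "http://example.com/",      # not .cn at all
]

for uri in samples:
    print(uri, classify(uri))

# Share of T_CN_URL hits that were also T_CN_8_URL hits in the quoted counts.
ratio = 7200 / 7263
print(f"{ratio:.1%}")
```

On the quoted numbers, roughly 99% of the .cn hits in that 15517-message corpus were also 8-character hits, which is the oddity the thread is about.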
Re: .cn Oddity
On Sun, 4 Oct 2009, Warren Togami wrote: On 10/04/2009 04:07 PM, John Hardin wrote: On Thu, 1 Oct 2009, Warren Togami wrote: > The "Oddity" I was pointing out at the beginning of the thread is not > prevalence of .cn URI's, but rather most of them appear to be exactly > 8 characters long. Are there any other .cn domain formats (like {8}.com.cn) that would be of interest? I was trolling through a spam quarantine I'd forgoten about and found a message containing this: {domain}.cn {domain}.com.cn {domain}.net.cn I wouldn't bother. I only wanted to check the relative % of CN_EIGHT to CN_URL because I found it strange that the majority of CN_URL had exactly 8 characters. In the end this rule is unsafe to use in production so it doesn't matter much to check for even less prevalent matches that we can't use either. OK BTW, I have commit access now. Mind if I move these rules from your sandbox into my own sandbox? Go ahead. -- John Hardin KA7OHZhttp://www.impsec.org/~jhardin/ jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- I'm seriously considering getting one of those bright-orange prison overalls and stencilling PASSENGER on the back. Along with the paper slippers, I ought to be able to walk right through security. -- Brian Kantor in a.s.r --- Approximately 9164580 firearms legally purchased in the U.S. this year
Re: .cn Oddity
On 10/04/2009 04:07 PM, John Hardin wrote:
> On Thu, 1 Oct 2009, Warren Togami wrote:
> > The "Oddity" I was pointing out at the beginning of the thread is not
> > prevalence of .cn URI's, but rather most of them appear to be exactly
> > 8 characters long.
>
> Are there any other .cn domain formats (like {8}.com.cn) that would be of
> interest? I was trolling through a spam quarantine I'd forgotten about and
> found a message containing this:
>
> {domain}.cn {domain}.com.cn {domain}.net.cn

I wouldn't bother. I only wanted to check the relative % of CN_EIGHT to CN_URL because I found it strange that the majority of CN_URL had exactly 8 characters. In the end this rule is unsafe to use in production, so it doesn't matter much to check for even less prevalent matches that we can't use either.

BTW, I have commit access now. Mind if I move these rules from your sandbox into my own sandbox?

Warren Togami
wtog...@redhat.com
Re: .cn Oddity
On Thu, 1 Oct 2009, Warren Togami wrote:
> The "Oddity" I was pointing out at the beginning of the thread is not
> prevalence of .cn URI's, but rather most of them appear to be exactly
> 8 characters long.

Are there any other .cn domain formats (like {8}.com.cn) that would be of interest? I was trolling through a spam quarantine I'd forgotten about and found a message containing this:

{domain}.cn {domain}.com.cn {domain}.net.cn

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
One difference between a liberal and a pickpocket is that if you demand your money back from a pickpocket he will not question your motives. -- William Rusher
---
Approximately 9157680 firearms legally purchased in the U.S. this year
Re: .cn Oddity
On Sun, 4 Oct 2009, Karsten Bräckelmann wrote:
> On Sun, 2009-10-04 at 09:59 -0400, Warren Togami wrote:
> > On 10/04/2009 12:21 AM, John Hardin wrote:
> > > > Right, in adding things to the sandbox it does not necessarily mean I
> > > > suggest they should become rules. I am mainly curious to see what the
> > > > results say.
> > >
> > > Warning: autopromotion
> >
> > Is there a way to prevent autopromotion for a particular rule?
>
> Yep, using tflags nopublish,

Done. Will be committed momentarily.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Warning Labels we'd like to see #1: "If you are a stupid idiot while using this product you may hurt yourself. And it won't be our fault."
---
Approximately 9152160 firearms legally purchased in the U.S. this year
Re: .cn Oddity
On Sun, 2009-10-04 at 09:59 -0400, Warren Togami wrote:
> On 10/04/2009 12:21 AM, John Hardin wrote:
> > > Right, in adding things to the sandbox it does not necessarily mean I
> > > suggest they should become rules. I am mainly curious to see what the
> > > results say.
> >
> > Warning: autopromotion
>
> Is there a way to prevent autopromotion for a particular rule?

Yep, using tflags nopublish, or explicitly naming the rule with a T_ prefix. Also see bug 5545 [1].

[1] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5545
Re: [SA] .cn Oddity
On 10/04/2009 12:21 AM, John Hardin wrote:
> On Sat, 3 Oct 2009, Warren Togami wrote:
> > On 10/03/2009 07:50 PM, Adam Katz wrote:
> > > 8 is *extremely* important in Chinese culture. When running these tests,
> > > make sure that there is a good quantity of .cn TLD URIs in the ham
> > > before drawing any conclusions.
> >
> > Right, in adding things to the sandbox it does not necessarily mean I
> > suggest they should become rules. I am mainly curious to see what the
> > results say.
>
> Warning: autopromotion

Is there a way to prevent autopromotion for a particular rule?

Warren
Re: .cn Oddity
On Sat, 3 Oct 2009, Warren Togami wrote:
> On 10/03/2009 07:11 PM, John Hardin wrote:
> > > [^./]{8}\.cn
> > >
> > > Actually, doesn't this match other characters that shouldn't be in a
> > > domain name?
> >
> > ...is _anything_ (apart from periods) excluded from domain names these
> > days? :)
> >
> > Changed to \w{8} for testing. Can you provide examples of needing more
> > than \w?
>
> I doubt it matters for this particular rule, but dash characters are valid
> in domain names too, right? \w seems to be alpha, numeric and underscore.
> Underscore isn't valid in a domain name.

True. Let's let this version go through a masscheck cycle and then I'll change it to [-\w]{8}

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Vista "security improvements" consist of attempting to shift blame onto the user when things go wrong.
---
Approximately 9134220 firearms legally purchased in the U.S. this year
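The three character classes discussed in this exchange differ only at the edges, which is easier to see side by side. A small Python sketch, assuming ASCII-only `\w` (hence `re.ASCII`) and using invented 8-character labels; `CANDIDATES` and `accepted_by` are illustrative names, not anything from SpamAssassin.

```python
import re

# The three candidate classes from the thread, each applied to a bare label.
# re.ASCII is an assumption: it keeps \w to [A-Za-z0-9_].
CANDIDATES = {
    "[^./]{8}": re.compile(r"[^./]{8}"),
    r"\w{8}": re.compile(r"\w{8}", re.ASCII),
    r"[-\w]{8}": re.compile(r"[-\w]{8}", re.ASCII),
}

def accepted_by(label):
    """Names of the candidate classes that accept the whole label."""
    return [name for name, rx in CANDIDATES.items() if rx.fullmatch(label)]

# Invented labels: plain, hyphenated, underscored, and one with a space.
for label in ["abcdefgh", "abcd-fgh", "abcd_fgh", "abcd fgh"]:
    print(repr(label), accepted_by(label))
```

Two things fall out of this: `\w{8}` misses hyphenated labels, as Warren suspected, and `[-\w]{8}` still lets the underscore through, so it is slightly looser than real hostname syntax. For a spam-sign rule that is probably harmless.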
Re: [SA] .cn Oddity
On Sat, 3 Oct 2009, Warren Togami wrote:
> On 10/03/2009 07:50 PM, Adam Katz wrote:
> > 8 is *extremely* important in Chinese culture. When running these tests,
> > make sure that there is a good quantity of .cn TLD URIs in the ham
> > before drawing any conclusions.
>
> Right, in adding things to the sandbox it does not necessarily mean I
> suggest they should become rules. I am mainly curious to see what the
> results say.

Warning: autopromotion

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Vista "security improvements" consist of attempting to shift blame onto the user when things go wrong.
---
Approximately 9134220 firearms legally purchased in the U.S. this year
Re: .cn Oddity
On 10/03/2009 07:11 PM, John Hardin wrote:
> > [^./]{8}\.cn
> >
> > Actually, doesn't this match other characters that shouldn't be in a
> > domain name?
>
> ...is _anything_ (apart from periods) excluded from domain names these
> days? :)
>
> Changed to \w{8} for testing. Can you provide examples of needing more
> than \w?

I doubt it matters for this particular rule, but dash characters are valid in domain names too, right? \w seems to be alpha, numeric and underscore. Underscore isn't valid in a domain name.

Warren
Re: [SA] .cn Oddity
On 10/03/2009 07:50 PM, Adam Katz wrote:
> 8 is *extremely* important in Chinese culture. When running these tests,
> make sure that there is a good quantity of .cn TLD URIs in the ham before
> drawing any conclusions.

Right, in adding things to the sandbox it does not necessarily mean I suggest they should become rules. I am mainly curious to see what the results say.

Warren
Re: [SA] .cn Oddity
Warren Togami wrote: >>> The "Oddity" I was pointing out at the beginning of the thread is not >>> prevalence of .cn URI's, but rather most of them appear to be exactly 8 >>> characters long. Could someone please commit my T_CN_8_URL rule to the >>> sandbox so we can see if that trend holds beyond my own corpa? >> >> (And yes I'm fully aware even this narrowed rule is prejudiced and >> unsafe. This is is partly out of curiosity, and also wondering if it >> could be made useful if meta booleaned with something else.) jdow then mused: > I just had a thought, Warren. Look up Chinese numerology. 8 signifies > wealth or sudden prosperity. Conversely, I suspect few Chinese names > are four characters. Four is a pun on death. Some social sites might > like 5 letters - me. 7 is right out, it's a vulgar word in Cantonese. > 9 is also slang or vulgar in Cantonese. > > I wonder how many companies that deal with China have figured out that > an "888" toll free number is WONDERFUL, "Wealth, wealth, wealth." > > I understand numerology is quite important to the Chinese. (Of course, > I am not claiming to be an expert. The above is mostly Wikipoodle and > surmise.) 8 is *extremely* important in Chinese culture. When running these tests, make sure that there is a good quantity of .cn TLD URIs in the ham before drawing any conclusions.
Re: .cn Oddity
On Sat, 3 Oct 2009, Warren Togami wrote: Can't trust those results yet. The trailing slash bug, and John Rudd might be correct about whitespace? I doubt whitespace will be a problem. That would break the parser before it even got to the rule, and while "dom%20name.cn" might be syntactically valid would a registrar ever _accept_ such a domain name? Examples solicited. [^./]{8}\.cn Actually, doesn't this match other characters that shouldn't be in a domain name? ...is _anything_ (apart from periods) excluded from domain names these days? :) Changed to \w{8} for testing. Can you provide examples of needing more than \w? Then there are "valid" URL's like http://password:usern...@example.com/ not matched by this rule. The URI parser apparently discards username:password@ from URIs: [6788] dbg: rules: ran body rule ALL_BODY ==> got hit: "http://fnord:b...@87654321.cn"; [6788] dbg: rules: ran uri rule CN_EIGHT ==> got hit: "http://87654321.cn"; Could you please add the following to the sandbox before tomorrow? # from http://www.apnic.net/db/ranges.html at 20091002, meta bits added # 20090930 # copied from khop-bl.sa.khopesh.com header __RCVD_VIA_APNIC Received =~ /(?-xism:[^0-9.](?:2(?:0(?:2(?:\.1(?:2(?:3\.(?:0?(?:[4-9][0-9]|3[2-9])|[12][0-9]{2})\.[012]?[0-9]{1,2}|[^3]\.(?:012]?[0-9]{1,2}){2})|[^2]3\.(?:012]?[0-9]{1,2}){2})|(?:\.[02]?[0-9]{1,2}){3})|3(?:\.[012]?[0-9]{1,2}){3})|(?:1[0189]|2[012])(?:\.[012]?[0-9]{1,2}){3})|1(?:(?:2[0123456]|8[023]|1\d|75)(?:\.[012]?[0-9]{1,2}){3}|69\.2(?:1[0-9]|2[0-3]|0[89])(?:\.[012]?[0-9]{1,2}){2})|(?:5[89]|6[01])(?:\.[012]?[0-9]{1,2}){3})(?:[\]\)\s]))/ describe __RCVD_VIA_APNIC Received through a relay in Asia/Pacific Network meta CN_EIGHT_NOAPNIC CN_EIGHT && !__RCVD_VIA_APNIC && !ALL_TRUSTED describe CN_EIGHT_NOAPNIC .cn URI exactly 8 characters long, excluding APNIC One silly arbitrary rule, excluding prejudiced rule. This is still unsafe but should show us some interesting numbers. Done. 
Not sure if the nightly is already running or not...

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
USMC Rules of Gunfighting #6: If you can choose what to bring to a gunfight, bring a long gun and a friend with a long gun.
---
Approximately 9127320 firearms legally purchased in the U.S. this year
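The debug output quoted in this message shows the uri rule being run against a URI with the userinfo part already removed. As a loose analogy only (this is Python's `urllib.parse`, not SpamAssassin's URI parser), the sketch below shows why that matters for a `\w{8}` host pattern. The credentials in `raw`, the `CN_EIGHT_LIKE` pattern and the `host_only_uri` helper are all hypothetical.

```python
import re
from urllib.parse import urlsplit

# A \w{8} variant of the sandbox rule with a word boundary; an assumption,
# since the exact committed pattern is not shown in the thread.
CN_EIGHT_LIKE = re.compile(r"^https?://(?:[^./]+\.)*\w{8}\.cn\b")

def host_only_uri(uri):
    """Rebuild scheme://host, dropping any user:password@ part, port and path."""
    parts = urlsplit(uri)
    return f"{parts.scheme}://{parts.hostname or ''}"

# Hypothetical URI with made-up credentials in front of the host.
raw = "http://user:secret@87654321.cn/"

raw_hit = CN_EIGHT_LIKE.search(raw) is not None
stripped_hit = CN_EIGHT_LIKE.search(host_only_uri(raw)) is not None
print(raw_hit, stripped_hit)
```

Against the raw string the pattern does not fire, because `user:secret@87654321` is swallowed as a single dot-free chunk; once only the host is left, the 8-character label matches.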
Re: .cn Oddity
On Sat, Oct 3, 2009 at 15:55, John Hardin wrote:
> On Sat, 3 Oct 2009, John Rudd wrote:
>
>> On Sat, Oct 3, 2009 at 11:06, Warren Togami wrote:
>>
>>>
>>> # 8-letter .cn domain, per Warren Togami
>>> uri CN_EIGHT m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/;
>>> describe CN_EIGHT .CN uri with eight-letter domain name
>>> score CN_EIGHT 0.10
>>>
>>> Possible bug here... Do all URI's necessarily have a trailing slash?
>>
>>
>> And, don't you want to omit whitespace from the 8 characters? Or am I
>> missing something that takes care of that for you?
>
> I don't think a parsed URI would have whitespace in the hostname part. This
> isn't a body rule.

That would be the part I was missing :-)
Re: .cn Oddity
On Sat, 3 Oct 2009, John Rudd wrote: On Sat, Oct 3, 2009 at 11:06, Warren Togami wrote: # 8-letter .cn domain, per Warren Togami uri CN_EIGHT m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/; describe CN_EIGHT .CN uri with eight-letter domain name score CN_EIGHT 0.10 Possible bug here... Do all URI's necessarily have a trailing slash? And, don't you want to omit whitespace from the 8 characters? Or am I missing something that takes care of that for you? I don't think a parsed URI would have whitespace in the hostname part. This isn't a body rule. -- John Hardin KA7OHZhttp://www.impsec.org/~jhardin/ jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- USMC Rules of Gunfighting #6: If you can choose what to bring to a gunfight, bring a long gun and a friend with a long gun. --- Approximately 9127320 firearms legally purchased in the U.S. this year
Re: .cn Oddity
On 10/03/2009 05:08 PM, John Hardin wrote: On Sat, 3 Oct 2009, Warren Togami wrote: On 10/01/2009 02:36 PM, John Hardin wrote: On Thu, 1 Oct 2009, Warren Togami wrote: > The "Oddity" I was pointing out at the beginning of the thread is not > prevalence of .cn URI's, but rather most of them appear to be exactly > 8 characters long. Could someone please commit my T_CN_8_URL rule to > the sandbox so we can see if that trend holds beyond my own corpa? I've put a .CN 8 URI rule into my sandbox file but it may be a few days before it gets committed, my stuff is in flux right now... # 8-letter .cn domain, per Warren Togami uri CN_EIGHT m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/; describe CN_EIGHT .CN uri with eight-letter domain name score CN_EIGHT 0.10 Possible bug here... Do all URI's necessarily have a trailing slash? First results are in: http://ruleqa.spamassassin.org/20091003-r821273-n/T_CN_EIGHT/detail Can't trust those results yet. The trailing slash bug, and John Rudd might be correct about whitespace? [^./]{8}\.cn Actually, doesn't this match other characters that shouldn't be in a domain name? Then there are "valid" URL's like http://password:usern...@example.com/ not matched by this rule. Could you please add the following to the sandbox before tomorrow? 
# from http://www.apnic.net/db/ranges.html at 20091002, meta bits added 20090930 # copied from khop-bl.sa.khopesh.com header __RCVD_VIA_APNIC Received =~ /(?-xism:[^0-9.](?:2(?:0(?:2(?:\.1(?:2(?:3\.(?:0?(?:[4-9][0-9]|3[2-9])|[12][0-9]{2})\.[012]?[0-9]{1,2}|[^3]\.(?:012]?[0-9]{1,2}){2})|[^2]3\.(?:012]?[0-9]{1,2}){2})|(?:\.[02]?[0-9]{1,2}){3})|3(?:\.[012]?[0-9]{1,2}){3})|(?:1[0189]|2[012])(?:\.[012]?[0-9]{1,2}){3})|1(?:(?:2[0123456]|8[023]|1\d|75)(?:\.[012]?[0-9]{1,2}){3}|69\.2(?:1[0-9]|2[0-3]|0[89])(?:\.[012]?[0-9]{1,2}){2})|(?:5[89]|6[01])(?:\.[012]?[0-9]{1,2}){3})(?:[\]\)\s]))/ describe __RCVD_VIA_APNIC Received through a relay in Asia/Pacific Network meta CN_EIGHT_NOAPNIC CN_EIGHT && !__RCVD_VIA_APNIC && !ALL_TRUSTED describe CN_EIGHT_NOAPNIC .cn URI exactly 8 characters long, excluding APNIC One silly arbitrary rule, excluding prejudiced rule. This is still unsafe but should show us some interesting numbers. Warren Togami wtog...@redhat.com
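The `meta` line in this message is plain boolean logic over three other rule results. A tiny Python sketch of the same expression, with a hypothetical function name, prints the full truth table so it is clear which single combination fires:

```python
from itertools import product

def cn_eight_noapnic(cn_eight, rcvd_via_apnic, all_trusted):
    """Mirror of: CN_EIGHT && !__RCVD_VIA_APNIC && !ALL_TRUSTED"""
    return cn_eight and not rcvd_via_apnic and not all_trusted

# Walk every combination of the three inputs.
for flags in product([False, True], repeat=3):
    print(flags, cn_eight_noapnic(*flags))
```

Only an 8-character .cn URI in a message that was neither relayed through an APNIC range nor fully trusted triggers the meta rule, which is the "excluding prejudiced rule" idea described above.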
Re: .cn Oddity
On Sat, Oct 3, 2009 at 11:06, Warren Togami wrote:
>
> # 8-letter .cn domain, per Warren Togami
> uri CN_EIGHT m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/;
> describe CN_EIGHT .CN uri with eight-letter domain name
> score CN_EIGHT 0.10
>
> Possible bug here... Do all URI's necessarily have a trailing slash?

And, don't you want to omit whitespace from the 8 characters? Or am I missing something that takes care of that for you?
Re: .cn Oddity
On Sat, 3 Oct 2009, Warren Togami wrote:
> On 10/01/2009 02:36 PM, John Hardin wrote:
> > On Thu, 1 Oct 2009, Warren Togami wrote:
> > > The "Oddity" I was pointing out at the beginning of the thread is not
> > > prevalence of .cn URI's, but rather most of them appear to be exactly
> > > 8 characters long. Could someone please commit my T_CN_8_URL rule to
> > > the sandbox so we can see if that trend holds beyond my own corpora?
> >
> > I've put a .CN 8 URI rule into my sandbox file but it may be a few days
> > before it gets committed, my stuff is in flux right now...
> >
> > # 8-letter .cn domain, per Warren Togami
> > uri CN_EIGHT m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/;
> > describe CN_EIGHT .CN uri with eight-letter domain name
> > score CN_EIGHT 0.10
>
> Possible bug here... Do all URI's necessarily have a trailing slash?

First results are in:

http://ruleqa.spamassassin.org/20091003-r821273-n/T_CN_EIGHT/detail

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
We are hell-bent and determined to allocate the talent, the resources, the money, the innovation to absolutely become a powerhouse in the ad business. -- Microsoft CEO Steve Ballmer
...because allocating talent to securing Windows isn't profitable?
---
Approximately 9125940 firearms legally purchased in the U.S. this year
Re: .cn Oddity
On Sat, 3 Oct 2009, Ned Slider wrote:
> Warren Togami wrote:
> > On 10/01/2009 02:36 PM, John Hardin wrote:
> > > On Thu, 1 Oct 2009, Warren Togami wrote:
> > >
> > > > The "Oddity" I was pointing out at the beginning of the thread is
> > > > not prevalence of .cn URI's, but rather most of them appear to be
> > > > exactly 8 characters long. Could someone please commit my
> > > > T_CN_8_URL rule to the sandbox so we can see if that trend holds
> > > > beyond my own corpora?
> > >
> > > I've put a .CN 8 URI rule into my sandbox file but it may be a few
> > > days before it gets committed, my stuff is in flux right now...
> > >
> > > # 8-letter .cn domain, per Warren Togami
> > > uri CN_EIGHT m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/;
> > > describe CN_EIGHT .CN uri with eight-letter domain name
> > > score CN_EIGHT 0.10
> >
> > Possible bug here... Do all URI's necessarily have a trailing slash?
>
> \b might be better?

Yes. I didn't use \b because I had a temporary attack of the stupids. Fixed.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
It's easy to be noble with other people's money. -- John McKay, _The Welfare State: No Mercy for the Middle Class_
---
Approximately 9123180 firearms legally purchased in the U.S. this year
Re: .cn Oddity
Warren Togami wrote:
> On 10/01/2009 02:36 PM, John Hardin wrote:
> > On Thu, 1 Oct 2009, Warren Togami wrote:
> > > The "Oddity" I was pointing out at the beginning of the thread is not
> > > prevalence of .cn URI's, but rather most of them appear to be exactly
> > > 8 characters long. Could someone please commit my T_CN_8_URL rule to
> > > the sandbox so we can see if that trend holds beyond my own corpora?
> >
> > I've put a .CN 8 URI rule into my sandbox file but it may be a few days
> > before it gets committed, my stuff is in flux right now...
> >
> > # 8-letter .cn domain, per Warren Togami
> > uri CN_EIGHT m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/;
> > describe CN_EIGHT .CN uri with eight-letter domain name
> > score CN_EIGHT 0.10
>
> Possible bug here... Do all URI's necessarily have a trailing slash?
>
> Warren Togami
> wtog...@redhat.com

\b might be better?
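The trailing-slash concern and the `\b` suggestion can be checked quickly outside SpamAssassin. The Python sketch below keeps the sandbox pattern as quoted and swaps only the final `/` for `\b`; that second pattern is a guess at the shape of the fix, since the committed version is not shown in the thread, and the URIs are invented.

```python
import re

# Sandbox pattern as quoted: requires a literal "/" right after ".cn".
CN_EIGHT_SLASH = re.compile(r"^https?://(?:[^./]+\.)*[^./]{8}\.cn/")
# Same pattern with a word boundary instead (assumed form of the fix).
CN_EIGHT_BOUNDARY = re.compile(r"^https?://(?:[^./]+\.)*[^./]{8}\.cn\b")

def hits(uri):
    """Return (slash variant hit, boundary variant hit) for one URI."""
    return bool(CN_EIGHT_SLASH.search(uri)), bool(CN_EIGHT_BOUNDARY.search(uri))

# Invented URIs covering the cases raised in the thread.
for uri in [
    "http://abcdefgh.cn/",       # trailing slash present
    "http://abcdefgh.cn",        # bare host, no slash
    "http://abcdefgh.cn?x=1",    # query string directly after the host
    "http://abcdefgh.cnn.com/",  # ".cn" is only a prefix of the next label
]:
    print(uri, hits(uri))
```

The slash variant only fires on the first URI, while the boundary variant also catches the bare host and the query-string form, and neither is fooled by `.cnn.com`.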
Re: .cn Oddity
On 10/01/2009 02:36 PM, John Hardin wrote:
> On Thu, 1 Oct 2009, Warren Togami wrote:
> > The "Oddity" I was pointing out at the beginning of the thread is not
> > prevalence of .cn URI's, but rather most of them appear to be exactly
> > 8 characters long. Could someone please commit my T_CN_8_URL rule to
> > the sandbox so we can see if that trend holds beyond my own corpora?
>
> I've put a .CN 8 URI rule into my sandbox file but it may be a few days
> before it gets committed, my stuff is in flux right now...
>
> # 8-letter .cn domain, per Warren Togami
> uri CN_EIGHT m;^https?://(?:[^./]+\.)*[^./]{8}\.cn/;
> describe CN_EIGHT .CN uri with eight-letter domain name
> score CN_EIGHT 0.10

Possible bug here... Do all URI's necessarily have a trailing slash?

Warren Togami
wtog...@redhat.com
Re: .cn Oddity
Hi All,

Regarding the .cn oddity, I added these to my rules, and of about 79k messages today so far, I have the following:

uri LOC_URI_CN m;^https?://[^/?]+\.cn\b;
uri T_CN_8_URL /[\/.]+\w{8}\.cn(?:$|\/|\?)/i

LOC_URI_CN: 2926
T_CN_8_URL: 1634

HTH,
Alex
Re: .cn Oddity
On Thu, 1 Oct 2009, Warren Togami wrote:
> The "Oddity" I was pointing out at the beginning of the thread is not
> prevalence of .cn URI's, but rather most of them appear to be exactly
> 8 characters long. Could someone please commit my T_CN_8_URL rule to
> the sandbox so we can see if that trend holds beyond my own corpora?

I've put a .CN 8 URI rule into my sandbox file but it may be a few days before it gets committed, my stuff is in flux right now...

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
USMC Rules of Gunfighting #9: Accuracy is relative: most combat shooting standards will be more dependent on "pucker factor" than the inherent accuracy of the gun.
---
Approximately 9055560 firearms legally purchased in the U.S. this year
Re: .cn Oddity
From: "Ned Slider" Sent: Thursday, 2009/October/01 10:48 Warren Togami wrote: On 10/01/2009 01:05 PM, John Hardin wrote: On Thu, 1 Oct 2009, jdow wrote: From: "John Hardin" Yours may still hit .cn in the path part. May I suggest: m;^https?://[^/?]+\.cn\b; Regardless of their correctness, would you care to expound on the success of these two rules, John? I like what works not political correctness. I think these are two interesting observations. Of course, they won't work very well for somebody doing business with China or embedded within the .cn TLD. "what works" is based on the accuracy of the corpora. If the corpora show lots of spam with .cn TLD URIs and little or no ham with such, then that rule will hit often, and have a good S/O, and get a high score. I too am surprised that .cn TLDs appear in 51% of the spam corpus but I haven't looked into it in any detail. I can certainly check it against my own corpora and see if it's reasonable - but then again, I don't do any business with anyone in china, and I _do_ get a fair amount of bulk emails from manufacturers in china purportedly looking for business partners. The "Oddity" I was pointing out at the beginning of the thread is not prevalence of .cn URI's, but rather most of them appear to be exactly 8 characters long. Could someone please commit my T_CN_8_URL rule to the sandbox so we can see if that trend holds beyond my own corpa? Warren Warren, Seems to hold true here to an extent. From my recent confirmed spam archive I see: # cat spam* | grep '\.cn\b' | grep http | wc -l 1088 # cat spam* | grep '\.\w\{8\}\.cn\b' | grep http | wc -l 908 # cat spam* | grep '\/\w\{8\}\.cn\b' | grep http | wc -l 23 so 85% of .cn URIs also match the {8}.cn pattern. Not quite as high as your findings, but very high nevertheless. Based on my last note about Chinese numerology I bet if you have a large Chinese ham corpus you'd pick up on 8 as a magic number there, too. I am intrigued enough I'd LOVE to know if that's right. {^_^}
Re: .cn Oddity
From: "Warren Togami"
Sent: Thursday, 2009/October/01 10:24

> On 10/01/2009 01:16 PM, Warren Togami wrote:
>> On 10/01/2009 01:05 PM, John Hardin wrote:
>>> On Thu, 1 Oct 2009, jdow wrote:
>>>> From: "John Hardin"
>>>>> Yours may still hit .cn in the path part. May I suggest:
>>>>>
>>>>> m;^https?://[^/?]+\.cn\b;
>>>>
>>>> Regardless of their correctness, would you care to expound on the success of these two rules, John? I like what works not political correctness. I think these are two interesting observations. Of course, they won't work very well for somebody doing business with China or embedded within the .cn TLD.
>>>
>>> "what works" is based on the accuracy of the corpora. If the corpora show lots of spam with .cn TLD URIs and little or no ham with such, then that rule will hit often, and have a good S/O, and get a high score.
>>>
>>> I too am surprised that .cn TLDs appear in 51% of the spam corpus but I haven't looked into it in any detail. I can certainly check it against my own corpora and see if it's reasonable - but then again, I don't do any business with anyone in China, and I _do_ get a fair amount of bulk emails from manufacturers in China purportedly looking for business partners.
>>
>> The "Oddity" I was pointing out at the beginning of the thread is not the prevalence of .cn URIs, but rather that most of them appear to be exactly 8 characters long.
>>
>> Could someone please commit my T_CN_8_URL rule to the sandbox so we can see if that trend holds beyond my own corpora?
>>
>> Warren
>
> (And yes I'm fully aware even this narrowed rule is prejudiced and unsafe. This is partly out of curiosity, and also wondering whether it could be made useful if combined in a boolean meta rule with something else.)
>
> Warren

I just had a thought, Warren. Look up Chinese numerology. 8 signifies wealth or sudden prosperity. Conversely, I suspect few Chinese names are four characters. Four is a pun on death. Some social sites might like 5 letters - me. 7 is right out, it's a vulgar word in Cantonese. 9 is also slang or vulgar in Cantonese.

I wonder how many companies that deal with China have figured out that an "888" toll free number is WONDERFUL, "Wealth, wealth, wealth." I understand numerology is quite important to the Chinese. (Of course, I am not claiming to be an expert. The above is mostly Wikipoodle and surmise.)

{^_-}
Re: .cn Oddity
Warren Togami wrote:
> On 10/01/2009 01:05 PM, John Hardin wrote:
>> On Thu, 1 Oct 2009, jdow wrote:
>>> From: "John Hardin"
>>>> Yours may still hit .cn in the path part. May I suggest:
>>>>
>>>> m;^https?://[^/?]+\.cn\b;
>>>
>>> Regardless of their correctness, would you care to expound on the success of these two rules, John? I like what works not political correctness. I think these are two interesting observations. Of course, they won't work very well for somebody doing business with China or embedded within the .cn TLD.
>>
>> "what works" is based on the accuracy of the corpora. If the corpora show lots of spam with .cn TLD URIs and little or no ham with such, then that rule will hit often, and have a good S/O, and get a high score.
>>
>> I too am surprised that .cn TLDs appear in 51% of the spam corpus but I haven't looked into it in any detail. I can certainly check it against my own corpora and see if it's reasonable - but then again, I don't do any business with anyone in China, and I _do_ get a fair amount of bulk emails from manufacturers in China purportedly looking for business partners.
>
> The "Oddity" I was pointing out at the beginning of the thread is not the prevalence of .cn URIs, but rather that most of them appear to be exactly 8 characters long.
>
> Could someone please commit my T_CN_8_URL rule to the sandbox so we can see if that trend holds beyond my own corpora?
>
> Warren

Warren,

Seems to hold true here to an extent. From my recent confirmed spam archive I see:

# cat spam* | grep '\.cn\b' | grep http | wc -l
1088
# cat spam* | grep '\.\w\{8\}\.cn\b' | grep http | wc -l
908
# cat spam* | grep '\/\w\{8\}\.cn\b' | grep http | wc -l
23

so 85% of .cn URIs also match the {8}.cn pattern. Not quite as high as your findings, but very high nevertheless.
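For anyone without a shell handy, the three grep counts above can be approximated in Python. This is only a sketch: the sample lines are invented for illustration, `count_lines` and `PATTERNS` are hypothetical names, and the final figures simply redo the arithmetic on the counts reported in the message. The 85% appears to come from adding the dot-prefixed and slash-prefixed counts together, which assumes few lines match both patterns.

```python
import re

# Python equivalents of the three grep patterns; re.ASCII keeps \w close to grep's [A-Za-z0-9_].
PATTERNS = {
    "any_cn": re.compile(r"\.cn\b", re.ASCII),
    "dot_8_cn": re.compile(r"\.\w{8}\.cn\b", re.ASCII),
    "slash_8_cn": re.compile(r"/\w{8}\.cn\b", re.ASCII),
}

def count_lines(lines):
    """Count lines that contain "http" and match each pattern (grep | grep http | wc -l)."""
    counts = dict.fromkeys(PATTERNS, 0)
    for line in lines:
        if "http" not in line:
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts

# Invented sample lines, not taken from any real corpus.
sample_lines = [
    "Visit http://www.abcdefgh.cn/ today",
    "Visit http://qwertyui.cn/ today",
    "Visit http://abc.cn/ today",
    "No link here, just .cn text",
]
sample_counts = count_lines(sample_lines)
print(sample_counts)

# Counts reported in the message above.
total, dot8, slash8 = 1088, 908, 23
dot_only_share = round(100 * dot8 / total, 1)
combined_share = round(100 * (dot8 + slash8) / total, 1)
print(dot_only_share, combined_share)
```

On the reported numbers the dot-prefixed pattern alone is about 83.5% and the two together about 85.6%, in line with the quoted 85%.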
Re: .cn Oddity
On 10/01/2009 01:16 PM, Warren Togami wrote:
> On 10/01/2009 01:05 PM, John Hardin wrote:
>> On Thu, 1 Oct 2009, jdow wrote:
>>> From: "John Hardin"
>>>> Yours may still hit .cn in the path part. May I suggest:
>>>>
>>>> m;^https?://[^/?]+\.cn\b;
>>>
>>> Regardless of their correctness, would you care to expound on the success of these two rules, John? I like what works not political correctness. I think these are two interesting observations. Of course, they won't work very well for somebody doing business with China or embedded within the .cn TLD.
>>
>> "what works" is based on the accuracy of the corpora. If the corpora show lots of spam with .cn TLD URIs and little or no ham with such, then that rule will hit often, and have a good S/O, and get a high score.
>>
>> I too am surprised that .cn TLDs appear in 51% of the spam corpus but I haven't looked into it in any detail. I can certainly check it against my own corpora and see if it's reasonable - but then again, I don't do any business with anyone in China, and I _do_ get a fair amount of bulk emails from manufacturers in China purportedly looking for business partners.
>
> The "Oddity" I was pointing out at the beginning of the thread is not the prevalence of .cn URIs, but rather that most of them appear to be exactly 8 characters long.
>
> Could someone please commit my T_CN_8_URL rule to the sandbox so we can see if that trend holds beyond my own corpora?
>
> Warren

(And yes I'm fully aware even this narrowed rule is prejudiced and unsafe. This is partly out of curiosity, and also wondering whether it could be made useful if combined in a boolean meta rule with something else.)

Warren
Re: .cn Oddity
On 10/01/2009 01:05 PM, John Hardin wrote:
> On Thu, 1 Oct 2009, jdow wrote:
>> From: "John Hardin"
>>> Yours may still hit .cn in the path part. May I suggest:
>>>
>>> m;^https?://[^/?]+\.cn\b;
>>
>> Regardless of their correctness, would you care to expound on the success of these two rules, John? I like what works not political correctness. I think these are two interesting observations. Of course, they won't work very well for somebody doing business with China or embedded within the .cn TLD.
>
> "what works" is based on the accuracy of the corpora. If the corpora show lots of spam with .cn TLD URIs and little or no ham with such, then that rule will hit often, and have a good S/O, and get a high score.
>
> I too am surprised that .cn TLDs appear in 51% of the spam corpus but I haven't looked into it in any detail. I can certainly check it against my own corpora and see if it's reasonable - but then again, I don't do any business with anyone in China, and I _do_ get a fair amount of bulk emails from manufacturers in China purportedly looking for business partners.

The "Oddity" I was pointing out at the beginning of the thread is not the prevalence of .cn URIs, but rather that most of them appear to be exactly 8 characters long.

Could someone please commit my T_CN_8_URL rule to the sandbox so we can see if that trend holds beyond my own corpora?

Warren
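One way to check whether the 8-character trend holds on another corpus, without committing a rule first, is to tabulate the length of the label just left of ".cn" in each URI's host. The following is a rough sketch only: `cn_label_lengths` is a hypothetical helper, the sample URIs are invented, and second-level zones such as com.cn would be counted as a 3-character label unless handled separately.

```python
from collections import Counter
from urllib.parse import urlsplit

def cn_label_lengths(uris):
    """Count lengths of the label immediately left of ".cn" in each URI's host."""
    lengths = Counter()
    for uri in uris:
        host = (urlsplit(uri).hostname or "").lower()
        labels = host.split(".")
        # Only hosts whose final label is "cn" are counted; ".cn" in the path is ignored.
        if len(labels) >= 2 and labels[-1] == "cn":
            lengths[len(labels[-2])] += 1
    return lengths

# Invented URIs for illustration, not real spam samples.
sample_uris = [
    "http://abcdefgh.cn/",
    "http://www.qwertyui.cn/x",
    "http://abc.cn/",
    "http://example.com/abcdefgh.cn",
]
length_histogram = cn_label_lengths(sample_uris)
print(length_histogram)
```

Run over URIs extracted from both spam and ham, a strong peak at 8 in spam only would support the observation; a peak in both would suggest the length is a property of .cn registrations generally.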
Re: .cn Oddity
On Thu, 1 Oct 2009, jdow wrote:

> From: "John Hardin"
>> Yours may still hit .cn in the path part. May I suggest:
>>
>> m;^https?://[^/?]+\.cn\b;
>
> Regardless of their correctness, would you care to expound on the success of these two rules, John? I like what works not political correctness. I think these are two interesting observations. Of course, they won't work very well for somebody doing business with China or embedded within the .cn TLD.

"what works" is based on the accuracy of the corpora. If the corpora show lots of spam with .cn TLD URIs and little or no ham with such, then that rule will hit often, and have a good S/O, and get a high score.

I too am surprised that .cn TLDs appear in 51% of the spam corpus but I haven't looked into it in any detail. I can certainly check it against my own corpora and see if it's reasonable - but then again, I don't do any business with anyone in China, and I _do_ get a fair amount of bulk emails from manufacturers in China purportedly looking for business partners.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
If "healthcare is a Right" means that the government is obligated to provide the people with hospitals, physicians, treatments and medications at low or no cost, then the right to free speech means the government is obligated to provide the people with printing presses and public address systems, the right to freedom of religion means the government is obligated to build churches for the people, and the right to keep and bear arms means the government is obligated to provide the people with guns, all at low or no cost.
---
Approximately 9052800 firearms legally purchased in the U.S. this year
Re: .cn Oddity
On Thu, 1 Oct 2009, Benny Pedersen wrote:

> On Thu 01 Oct 2009 18:26:01 CEST, John Hardin wrote:
>> m;^https?://[^/?]+\.cn\b;
>
> replace ; with / no ?
>
> m/\bhttps?://[^/?]+\.cn\b/i

No. The point of m; is so that you can embed / in the RE without escaping them. You are changing the RE delimiters.

m{...} is fine _if_ you don't use {m,n} syntax, in which case it becomes confusing.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
If "healthcare is a Right" means that the government is obligated to provide the people with hospitals, physicians, treatments and medications at low or no cost, then the right to free speech means the government is obligated to provide the people with printing presses and public address systems, the right to freedom of religion means the government is obligated to build churches for the people, and the right to keep and bear arms means the government is obligated to provide the people with guns, all at low or no cost.
---
Approximately 9052800 firearms legally purchased in the U.S. this year
Re: .cn Oddity
From: "John Hardin"
Sent: Thursday, 2009/October/01 09:26

> On Thu, 1 Oct 2009, Ned Slider wrote:
>> John Hardin wrote:
>>> On Thu, 1 Oct 2009, Warren Togami wrote:
>>>> uri T_CN_URL /[^\/]+\.cn(?:$|\/|\?)/i
>>>> describe T_CN_URL Contains a URL in the .cn domain
>>>>
>>>> uri T_CN_8_URL /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
>>>> describe T_CN_8_URL Contains a URL in the .cn domain of exactly 8 characters long
>>>>
>>>> http://ruleqa.spamassassin.org/20090930-r820211-n/T_CN_URL/detail
>>>> Last night's masscheck. 63243 out of 124241 spam hits T_CN_URL, nearly 51%.
>>>>
>>>> 7263 T_CN_URL hits in 15517 spam corpus
>>>> 7200 T_CN_8_URL hits in 15517 spam corpus
>>>>
>>>> Does this make any sense? This is funny. Could someone add this rule to the sandbox? I'm just curious.
>>>
>>> I note that neither is anchored at the beginning of the URI, so they may be hitting on .cn embedded somewhere within the path part. That doesn't explain 51%, though.
>>
>> I run my own custom .cn tld URI rule, and whilst it's right down in percentage terms atm, in the past it has certainly hit on around 50% plus of all spam containing a URI. So depending on the corpus, I'm not surprised by the 51%.
>>
>> uri LOCAL_URI_CN m{https?://.{1,40}\.cn\b}
>> describe LOCAL_URI_CN contains link to Chinese tld
>
> Yours may still hit .cn in the path part. May I suggest:
>
> m;^https?://[^/?]+\.cn\b;

Regardless of their correctness, would you care to expound on the success of these two rules, John? I like what works not political correctness. I think these are two interesting observations. Of course, they won't work very well for somebody doing business with China or embedded within the .cn TLD.

{^_-}
Re: .cn Oddity
On Thu 01 Oct 2009 18:26:01 CEST, John Hardin wrote:

> m;^https?://[^/?]+\.cn\b;

Replace ; with /, no?

m/\bhttps?://[^/?]+\.cn\b/i

--
xpoint
Re: .cn Oddity
On Thu, 1 Oct 2009, Ned Slider wrote:

> John Hardin wrote:
>> On Thu, 1 Oct 2009, Warren Togami wrote:
>>> uri T_CN_URL /[^\/]+\.cn(?:$|\/|\?)/i
>>> describe T_CN_URL Contains a URL in the .cn domain
>>>
>>> uri T_CN_8_URL /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
>>> describe T_CN_8_URL Contains a URL in the .cn domain of exactly 8 characters long
>>>
>>> http://ruleqa.spamassassin.org/20090930-r820211-n/T_CN_URL/detail
>>> Last night's masscheck. 63243 out of 124241 spam hits T_CN_URL, nearly 51%.
>>>
>>> 7263 T_CN_URL hits in 15517 spam corpus
>>> 7200 T_CN_8_URL hits in 15517 spam corpus
>>>
>>> Does this make any sense? This is funny. Could someone add this rule to the sandbox? I'm just curious.
>>
>> I note that neither is anchored at the beginning of the URI, so they may be hitting on .cn embedded somewhere within the path part. That doesn't explain 51%, though.
>
> I run my own custom .cn tld URI rule, and whilst it's right down in percentage terms atm, in the past it has certainly hit on around 50% plus of all spam containing a URI. So depending on the corpus, I'm not surprised by the 51%.
>
> uri LOCAL_URI_CN m{https?://.{1,40}\.cn\b}
> describe LOCAL_URI_CN contains link to Chinese tld

Yours may still hit .cn in the path part. May I suggest:

m;^https?://[^/?]+\.cn\b;

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
If "healthcare is a Right" means that the government is obligated to provide the people with hospitals, physicians, treatments and medications at low or no cost, then the right to free speech means the government is obligated to provide the people with printing presses and public address systems, the right to freedom of religion means the government is obligated to build churches for the people, and the right to keep and bear arms means the government is obligated to provide the people with guns, all at low or no cost.
---
Approximately 9052800 firearms legally purchased in the U.S. this year
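The difference between Ned's rule and the anchored suggestion can be tried out in Python, whose regex syntax is close enough to Perl's for these two patterns. This is an illustrative sketch, not a SpamAssassin test: the URIs are made up, and `LOCAL_URI_CN` and `ANCHORED_CN` are just local variable names for the two expressions.

```python
import re

LOCAL_URI_CN = re.compile(r"https?://.{1,40}\.cn\b")  # Ned's rule: ".{1,40}" may cross "/"
ANCHORED_CN = re.compile(r"^https?://[^/?]+\.cn\b")   # John's suggestion: stays inside the host

# Invented URIs for illustration only.
samples = [
    "http://shop.example.cn/offer",    # genuine .cn host
    "http://example.com/a/b.cn",       # .cn only in the path
    "http://example.com/?u=foo.cn",    # .cn only in the query string
    "http://example.cn.example.com/",  # ".cn" label inside a non-.cn host
]

results = {
    uri: (bool(LOCAL_URI_CN.search(uri)), bool(ANCHORED_CN.search(uri)))
    for uri in samples
}
for uri, (loose, anchored) in results.items():
    print(f"{uri:35} loose={loose} anchored={anchored}")
```

On these samples the anchored form drops the path and query hits. It does still match the last sample, because `\b` is satisfied by the dot that follows "cn", so a stricter variant might end with something like `(?:[:/?]|$)` instead; that refinement is an assumption of this note, not something proposed in the thread.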
Re: .cn Oddity
John Hardin wrote:
> On Thu, 1 Oct 2009, Warren Togami wrote:
>> uri T_CN_URL /[^\/]+\.cn(?:$|\/|\?)/i
>> describe T_CN_URL Contains a URL in the .cn domain
>>
>> uri T_CN_8_URL /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
>> describe T_CN_8_URL Contains a URL in the .cn domain of exactly 8 characters long
>>
>> http://ruleqa.spamassassin.org/20090930-r820211-n/T_CN_URL/detail
>> Last night's masscheck. 63243 out of 124241 spam hits T_CN_URL, nearly 51%.
>>
>> 7263 T_CN_URL hits in 15517 spam corpus
>> 7200 T_CN_8_URL hits in 15517 spam corpus
>>
>> Does this make any sense? This is funny. Could someone add this rule to the sandbox? I'm just curious.
>
> I note that neither is anchored at the beginning of the URI, so they may be hitting on .cn embedded somewhere within the path part. That doesn't explain 51%, though.

I run my own custom .cn tld URI rule, and whilst it's right down in percentage terms atm, in the past it has certainly hit on around 50% plus of all spam containing a URI. So depending on the corpus, I'm not surprised by the 51%.

uri LOCAL_URI_CN m{https?://.{1,40}\.cn\b}
describe LOCAL_URI_CN contains link to Chinese tld
Re: .cn Oddity
On Thu, 1 Oct 2009, Warren Togami wrote:

> uri T_CN_URL /[^\/]+\.cn(?:$|\/|\?)/i
> describe T_CN_URL Contains a URL in the .cn domain
>
> uri T_CN_8_URL /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
> describe T_CN_8_URL Contains a URL in the .cn domain of exactly 8 characters long
>
> http://ruleqa.spamassassin.org/20090930-r820211-n/T_CN_URL/detail
> Last night's masscheck. 63243 out of 124241 spam hits T_CN_URL, nearly 51%.
>
> 7263 T_CN_URL hits in 15517 spam corpus
> 7200 T_CN_8_URL hits in 15517 spam corpus
>
> Does this make any sense? This is funny. Could someone add this rule to the sandbox? I'm just curious.

I note that neither is anchored at the beginning of the URI, so they may be hitting on .cn embedded somewhere within the path part. That doesn't explain 51%, though.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Therapeutic Phrenologist - send email for affordable rate schedule.
---
Approximately 9051420 firearms legally purchased in the U.S. this year
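The anchoring point is easy to see by running the two expressions against a few URIs. The sketch below translates them to Python (Perl's /i becomes re.IGNORECASE); the URIs are invented for illustration and `hits` is a hypothetical helper that mimics an unanchored match anywhere in the URI.

```python
import re

# Python translations of T_CN_URL and T_CN_8_URL as posted.
T_CN_URL = re.compile(r"[^/]+\.cn(?:$|/|\?)", re.IGNORECASE)
T_CN_8_URL = re.compile(r"[/.]+\w{8}\.cn(?:$|/|\?)", re.IGNORECASE)

# Invented URIs for illustration only.
samples = [
    "http://abcdefgh.cn/",                # .cn host with an 8-character label
    "http://example.com/files/page.cn",   # ".cn" appears only in the path
    "http://example.com/dl/abcdefgh.cn",  # 8-character ".cn" file name in the path
]

def hits(rule: re.Pattern, uri: str) -> bool:
    """True if the pattern matches anywhere in the URI, as an unanchored rule would."""
    return rule.search(uri) is not None

results = {uri: (hits(T_CN_URL, uri), hits(T_CN_8_URL, uri)) for uri in samples}
for uri, (plain, eight) in results.items():
    print(f"{uri:38} T_CN_URL={plain} T_CN_8_URL={eight}")
```

Both rules fire on the last sample even though its host is example.com, which is the path-part hit described above. How often such URIs occur in a real corpus is a separate question this sketch cannot answer.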
.cn Oddity
uri T_CN_URL /[^\/]+\.cn(?:$|\/|\?)/i
describe T_CN_URL Contains a URL in the .cn domain

uri T_CN_8_URL /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
describe T_CN_8_URL Contains a URL in the .cn domain of exactly 8 characters long

http://ruleqa.spamassassin.org/20090930-r820211-n/T_CN_URL/detail
Last night's masscheck. 63243 out of 124241 spam hits T_CN_URL, nearly 51%.

7263 T_CN_URL hits in 15517 spam corpus
7200 T_CN_8_URL hits in 15517 spam corpus

Does this make any sense? This is funny. Could someone add this rule to the sandbox? I'm just curious.

Warren Togami
wtog...@redhat.com
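For reference, the ratios behind these figures work out as follows. The numbers are copied from the post; the variable names and the `pct` helper are only for this sketch, and nothing is re-measured.

```python
# Figures quoted in the post above.
masscheck_hits, masscheck_spam = 63243, 124241
cn_hits, cn8_hits, local_spam = 7263, 7200, 15517

def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

masscheck_rate = pct(masscheck_hits, masscheck_spam)  # masscheck spam hitting T_CN_URL
local_rate = pct(cn_hits, local_spam)                  # 15517-message corpus hitting T_CN_URL
eight_share = pct(cn8_hits, cn_hits)                   # T_CN_8_URL hits relative to T_CN_URL hits

print(masscheck_rate, local_rate, eight_share)
```

That gives roughly 50.9% for the masscheck, 46.8% for the 15517-message corpus, and about 99.1% of the T_CN_URL hits also hitting the 8-character rule, which is the oddity in question.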