Re: bayes, numbers of tokens and performance

2010-03-18 Thread Matus UHLAR - fantomas
On 17.03.10 16:47, tonjg wrote: I've done a bayes learn on 500 spams and 400 hams, my spam hit threshold is 5 and I'm getting a success rate of about 40% in identifying spam, and that's after doing an sa-update too. I was hoping to get better results than this. How many spam and ham tokens

Re: sa-update channels

2010-03-18 Thread Kai Schaetzl
Micah Anderson wrote on Wed, 17 Mar 2010 18:20:40 -0400: saupdates.openprotect.com It's been said repeatedly on this list: don't use it. Kai -- Get your web at Conactive Internet Services: http://www.conactive.com

Re: Checking to see if multiple types of headers exist

2010-03-18 Thread Ned Slider
Julian Yap wrote: I'm trying to consolidate some rules I have. I'm wondering if there's a way to see if multiple types of headers exist. eg. Currently separate rules: header CMN_LIST_1 exists:X-Campid describe CMN_LIST_1 Mail comes from a common campaign list mailer score CMN_LIST_1 0.5
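A minimal sketch of one way to consolidate such checks, assuming a meta rule over unscored sub-rules. X-Campid and the describe/score values come from the post above; X-Example-Listid and the rule names __CMN_HAS_CAMPID, __CMN_HAS_LISTID and CMN_LIST are hypothetical placeholders, since the original rule list is cut off here.

```
# Sub-rules whose names start with "__" carry no score of their own.
header   __CMN_HAS_CAMPID   exists:X-Campid
# Hypothetical second header, for illustration only.
header   __CMN_HAS_LISTID   exists:X-Example-Listid

# Fires when at least one of the headers is present.
meta     CMN_LIST           (__CMN_HAS_CAMPID || __CMN_HAS_LISTID)
describe CMN_LIST           Mail comes from a common campaign list mailer
score    CMN_LIST           0.5
```

Using && instead of || in the meta expression would require all of the headers to be present.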

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Kai Schaetzl
So, how many tokens do you have in your db now? Kai

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Matus UHLAR - fantomas wrote: do you have network checks enabled? Do you have network plugins (razor, pyzor, dcc, uribl) loaded? Do you have other plugins (like textcat) loaded? no, I am unfamiliar with these plugins. which version of SA do you have installed? version 3.2.5-1.el4.rf

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Kai Schaetzl wrote: So, how many tokens do you have in your db now? I hope this command gives the correct answer... # sa-learn --dump magic 0.000 0 3 0 non-token data: bayes db version 0.000 0514 0 non-token data: nspam 0.000 0

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Mikael Syska wrote: Does it help when you sa-learn the spams ? Does it change the BAYES_ score for that mail ? I'm going to do another sa-learn when I hit 100 more spams and I'll see then if it makes a difference. In the meantime I've lowered my hit threshold to 4. DNS available? no --

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Bowie Bailey
tonjg wrote: Kai Schaetzl wrote: So, how many tokens do you have in your db now? I hope this command gives the correct answer... # sa-learn --dump magic 0.000 0 3 0 non-token data: bayes db version 0.000 0514 0 non-token data:

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Kai Schaetzl
Tonjg wrote on Thu, 18 Mar 2010 05:17:21 -0700 (PDT): I hope this command gives the correct answer... # sa-learn --dump magic 0.000 0 3 0 non-token data: bayes db version 0.000 0514 0 non-token data: nspam 0.000 0402
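For context on the message counts in the dump: Bayes only starts scoring once the database has learned a minimum number of spam and ham messages. A sketch of the relevant local.cf settings, assuming a stock configuration; the values shown are believed to be the stock defaults and are not taken from this thread.

```
# Enable the Bayes classifier (on by default).
use_bayes           1
# Bayes is not used until at least this many spam and ham
# messages have been learned.
bayes_min_spam_num  200
bayes_min_ham_num   200
```

If the dump is read as roughly 500 spam and 400 ham messages, matching what tonjg reported learning, both counts are already past these thresholds.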

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Kai Schaetzl
Tonjg wrote on Thu, 18 Mar 2010 05:20:45 -0700 (PDT): I'm going to do another sa-learn when I hit 100 more spams and I'll see then if it makes a difference. In the meantime I've lowered my hit threshold to 4. Don't do that. Kai

RE: Upgrading to SpamAssassin 3.3

2010-03-18 Thread Kaleb Hosie
I restored my VM back to its original snapshot and installed the module you uploaded to your website but it's giving me problems. Here's what I get: [r...@mailgate ~]# rpm -ivh ./perl-NetAddr-IP-4.004-2.el5.src.rpm warning: ./perl-NetAddr-IP-4.004-2.el5.src.rpm: Header V3 DSA signature: NOKEY,

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Matus UHLAR - fantomas
Mikael Syska wrote: Does it help when you sa-learn the spams ? Does it change the BAYES_ score for that mail ? On 18.03.10 05:20, tonjg wrote: I'm going to do another sa-learn when I hit 100 more spams and I'll see then if it makes a difference. learn whenever possible, mostly on

Re: sa-update channels

2010-03-18 Thread Jason Bertoch
On 2010/03/17 6:20 PM, Micah Anderson wrote: I'm trying to find out what the current state of the art is for plugins and channel updates. For channels I've been using: updates.spamassassin.org sought.rules.yerp.org saupdates.openprotect.com But I wonder if the last two are still relevant, or

Re: sa-update channels

2010-03-18 Thread Yet Another Ninja
On 2010-03-18 15:02, Jason Bertoch wrote: On 2010/03/17 6:20 PM, Micah Anderson wrote: I'm trying to find out what the current state of the art is for plugins and channel updates. For channels I've been using: updates.spamassassin.org sought.rules.yerp.org saupdates.openprotect.com But I

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Kai Schaetzl wrote: Don't do that. why not? -- View this message in context: http://old.nabble.com/bayes%2C-numbers-of-tokens-and-performance-tp27940005p27946788.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Matus UHLAR - fantomas wrote: DNS available? no well, why? DNS helps very much for catching spam. all blacklists use DNS (afaik) sorry, when you said dns I didn't know you were referring to the dnsbl's. I know the black lists are excellent for filtering spam but I've got those switched
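The DNS-based checks discussed here are controlled from local.cf. A minimal sketch, assuming the site wants SpamAssassin itself (rather than only the MTA) to query blacklists; whether that suits tonjg's setup is not stated in the thread.

```
# Tell SpamAssassin that DNS lookups are available.
dns_available    yes
# 0 = run DNS blacklist (DNSBL) checks, 1 = skip them.
skip_rbl_checks  0
```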

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Kai Schaetzl wrote: I've been running with 2 million with no problems. based on this I see you're right that my db's are tiny and not enough for the success rate I'm aiming for.

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Jason Bertoch
On 2010/03/18 10:56 AM, tonjg wrote: Kai Schaetzl wrote: Don't do that. why not? Rule scores are generated based on a default required_score of 5. Fiddling with the required_score should be the _last_ thing you do, if at all. You should really try to determine why your system isn't
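The setting in question is a single line in local.cf. A sketch, assuming a site-wide configuration file:

```
# Keep the stock threshold; the default rule scores are generated
# against a required_score of 5.
required_score  5.0
```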

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Jason Bertoch-2 wrote: You should really try to determine why your system isn't performing well first. ok I've changed it back to 5

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
update: after doing some reading on google I found init.pre and added: loadplugin Mail::SpamAssassin::Plugin::Razor2 and loadplugin Mail::SpamAssassin::Plugin::Pyzor and restarted spamassassin.

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Yet Another Ninja
On 2010-03-18 16:36, tonjg wrote: update: after doing some reading on google I found init.pre and added: loadplugin Mail::SpamAssassin::Plugin::Razor2 and loadplugin Mail::SpamAssassin::Plugin::Pyzor and restarted spamassassin. Did you also install the plugins? These two are not
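Laid out as configuration, the change described above might look like the sketch below. The loadplugin lines are the ones quoted in the post; the use_razor2 and use_pyzor lines are an addition for illustration. The plugins only call out to the external Razor and Pyzor clients, which have to be installed separately.

```
# Load the plugins (the lines quoted in the post above).
loadplugin Mail::SpamAssassin::Plugin::Razor2
loadplugin Mail::SpamAssassin::Plugin::Pyzor

# Both checks are on by default once the plugin is loaded;
# shown explicitly here for clarity.
use_razor2  1
use_pyzor   1
```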

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Mikael Syska
Hi On Thu, Mar 18, 2010 at 1:20 PM, tonjg t...@freeuk.com wrote: Mikael Syska wrote: Does it help when you sa-learn the spams ? Does it change the BAYES_ score for that mail ? I'm going to do another sa-learn when I hit 100 more spams and I'll see then if it makes a difference. In the

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Yet Another Ninja wrote: Did you also install the plugins? These two are not delivered with SA. I thought they were. In my system I've got: /var/lib/spamassassin/3.002005/updates_spamassassin_org/25_razor2.cf /usr/share/spamassassin/25_razor2.cf and

RE: Upgrading to SpamAssassin 3.3

2010-03-18 Thread Kaleb Hosie
I have figured it out. I downloaded the source for an old version of NetAddr::IP from: http://search.cpan.org/~luismunoz/NetAddr-IP-4.007/IP.pm#INSTALLATION After compiling that code, it worked. I have one issue with SpamAssassin right now though. I use a program called SpamAssassin Quarantine

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Bowie Bailey
tonjg wrote: Yet Another Ninja wrote: Did you also install the plugins? These two are not delivered with SA. I thought they were. In my system I've got: /var/lib/spamassassin/3.002005/updates_spamassassin_org/25_razor2.cf /usr/share/spamassassin/25_razor2.cf and

unwhitelist from_dkim?

2010-03-18 Thread Michael Scheidell
how do you unwhitelist from these? def_whitelist_from_dkim *...@facebookmail.com would it be unwhitelist_from_dkim *...@facebookmail.com? (guess not: su vscan -c 'spamassassin --lint' Mar 18 16:42:35.367 [15557] warn: config: failed to parse line, skipping, in

Pathological messages causing long scan times

2010-03-18 Thread Kris Deugau
Every so often I get nudged to check into a message stuck on one of our inbound MXes. So far, every one of them has been spam, but a few have caused some odd behaviour with spamc/spamd. Here's one pretty much guaranteed to peg a CPU core for ~130 seconds (or more):

RE: unwhitelist from_dkim?

2010-03-18 Thread Chris Richman
Hi, Michael. If there is an email address that you'd like to never receive email from LinkedIn, let me know and I can add it to our suppression list. Sorry for the troubles. Chris -Original Message- From: Michael Scheidell [mailto:scheid...@secnap.net] Sent: Thursday, March 18, 2010

Re: Pathological messages causing long scan times

2010-03-18 Thread Matt Garretson
On 3/18/2010 5:15 PM, Kris Deugau wrote: Here's one pretty much guaranteed to peg a CPU core for ~130 seconds (or more): http://pastebin.com/2ssy2YEk Interesting. I see the same thing as you on that message. There's a two-minute gap between these two debug lines: rules: ran body rule

Re: Pathological messages causing long scan times

2010-03-18 Thread Justin Mason
On Thu, Mar 18, 2010 at 21:56, Matt Garretson ma...@assembly.state.ny.us wrote: On 3/18/2010 5:15 PM, Kris Deugau wrote: Here's one pretty much guaranteed to peg a CPU core for ~130 seconds (or more): http://pastebin.com/2ssy2YEk Interesting. I see the same thing as you on that message.

Yahoo/URL spam

2010-03-18 Thread Alex
Hi, I'm having a real problem with this persistent spam that contains just a URL as the body, and is always from yahoo. I've got an example here: http://pastebin.com/UqzhDHEu 'example.com' is my change. I'm using SA v3.2.5 with postfix/amavis. I'm concerned that the bayes score is always low. I

Re: Pathological messages causing long scan times

2010-03-18 Thread Matt Garretson
On 3/18/2010 5:56 PM, Matt Garretson wrote: On 3/18/2010 5:15 PM, Kris Deugau wrote: Here's one pretty much guaranteed to peg a CPU core for ~130 seconds (or more): http://pastebin.com/2ssy2YEk Interesting. I see the same thing as you on that message. There's a two-minute gap between these two

Re: Pathological messages causing long scan times

2010-03-18 Thread Matt Garretson
On 3/18/2010 6:06 PM, Matt Garretson wrote: It looks like a dns call (or two?) for URI-A took 120 seconds to return. Is that a mere coincidence, or could that be causing a spin of some sort? FWIW, strace shows spamassassin doing this about twice a second (with varying arguments) during the

Re: Pathological messages causing long scan times

2010-03-18 Thread Justin Mason
that's CPU-bound, no system calls = regexp matching. body, rawbody or full rules. On Thu, Mar 18, 2010 at 22:16, Matt Garretson ma...@assembly.state.ny.us wrote: On 3/18/2010 6:06 PM, Matt Garretson wrote: It looks like a dns call (or two?) for URI-A took 120 seconds to return. Is that a mere

RE: Pathological messages causing long scan times

2010-03-18 Thread Gary Smith
Here's one pretty much guaranteed to peg a CPU core for ~130 seconds (or more): http://pastebin.com/2ssy2YEk I'm not seeing your 130 sec CPU issue on my end. As mentioned by Matt, are you running into some DNS issue? These are stock rules + other house rules in place. I'm not

Re: Yahoo/URL spam

2010-03-18 Thread Martin Gregorie
On Thu, 2010-03-18 at 18:05 -0400, Alex wrote: Hi, I'm having a real problem with this persistent spam that contains just a URL as the body, and is always from yahoo. I've got an example here: http://pastebin.com/UqzhDHEu 'example.com' is my change. I'm using SA v3.2.5 with

Re: Yahoo/URL spam

2010-03-18 Thread RW
On Thu, 18 Mar 2010 22:31:04 + Martin Gregorie mar...@gregorie.org wrote: There's something odd about the message as posted: I'm getting hits on MISSING_SUBJECT and MISSING_DATE (SA 3.3.0). Some of the wrapped headers aren't properly indented. Probably happened on editing.

Re: unwhitelist from_dkim?

2010-03-18 Thread Mark Martinec
Michael, how do you unwhitelist from these? def_whitelist_from_dkim *...@facebookmail.com would it be unwhitelist_from_dkim *...@facebookmail.com? Should have been, but it was never implemented. Here is a patch to implement it (against 3.3.0 or 3.3.1 or trunk). Please open a feature request

Re: Checking to see if multiple types of headers exist

2010-03-18 Thread Matt Kettler
On 3/18/2010 6:41 AM, Ned Slider wrote: Julian Yap wrote: I'm trying to consolidate some rules I have. I'm wondering if there's a way to see if multiple types of headers exist. eg. Currently separate rules: header CMN_LIST_1 exists:X-Campid describe CMN_LIST_1 Mail comes from a common

Re: Bayes and Plugin Tokens

2010-03-18 Thread Matt Kettler
On 3/17/2010 8:32 PM, RW wrote: The ASN and RelayCountry plugins are supposed to add extra tokens to Bayes. However, I don't see any evidence that this is happening (in 3.3.0). In a test message I see: bayes: header tokens for X-Relay-Countries = GB GB ** ** GB GB but don't see any

Re: Pathological messages causing long scan times

2010-03-18 Thread John Hardin
On Fri, 19 Mar 2010, Mark Martinec wrote: On Thursday March 18 2010 23:18:56 Justin Mason wrote: that's CPU-bound, no system calls = regexp matching. body, rawbody or full rules. Yes, it's terrible, takes 4 minutes here (SA 3.3, perl 5.10.1). The offending rule is FILL_THIS_FORM_LONG from
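A possible stopgap while the rule is being looked at, offered as an assumption and not taken from this thread: a rule scored 0 is disabled and not evaluated, so a local.cf override like the sketch below should keep the offending regexp from running.

```
# Setting a rule's score to 0 disables it.
score FILL_THIS_FORM_LONG 0
```

The override would need to be removed once a corrected rule is available; other rules that depend on this one may be affected.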

Re: Pathological messages causing long scan times

2010-03-18 Thread John Hardin
On Thu, 18 Mar 2010, John Hardin wrote: On Fri, 19 Mar 2010, Mark Martinec wrote: On Thursday March 18 2010 23:18:56 Justin Mason wrote: that's CPU-bound, no system calls = regexp matching. body, rawbody or full rules. Yes, it's terrible, takes 4 minutes here (SA 3.3, perl 5.10.1).