Re: Number of rules
On Jul 30, 2009, at 18:12, Dennis B. Hopp <dh...@coreps.com> wrote:
> Yeah I knew that. I have a few negative scoring rules but not many (outside of what might be in the misc rule sets I have). What is a good threshold for ham then?

5.0 is the score SA is designed for. It's a very good number in almost all cases.
Parallelizing Spam Assassin
Hi, I was measuring how quickly SA (SpamAssassin) can process spam when several SA processes are run in parallel over separate mbox files. I used an 8-core machine. Below are the numbers for different numbers of forked processes:

Fork = 8; Rate = 57 msgs/sec
Fork = 4; Rate = 44 msgs/sec
Fork = 1; Rate = 22 msgs/sec

I ran a freshly built SA with Bayes and DNSBL turned off. Why am I not seeing a linear increase in throughput? Is file locking creating the bottleneck? If yes, which particular file is being locked? If not, what could be the reason? Thanks
-- 
View this message in context: http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24751958.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
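The scaling behind the numbers above can be expressed as figures. The sketch below is plain arithmetic on the three reported rates. Reading the result through Amdahl's law is an editorial assumption: the law models a fixed serial fraction and was not measured in the post. The function names are illustrative.

```python
# Throughput figures from the post above (messages per second).
RATES = {1: 22.0, 4: 44.0, 8: 57.0}


def speedup(n: int) -> float:
    """Throughput with n processes relative to one process."""
    return RATES[n] / RATES[1]


def efficiency(n: int) -> float:
    """Fraction of ideal linear scaling that was achieved."""
    return speedup(n) / n


def serial_fraction(n: int) -> float:
    """Solve Amdahl's law, S = 1 / (s + (1 - s) / n), for s."""
    return (n / speedup(n) - 1) / (n - 1)


if __name__ == "__main__":
    for n in (4, 8):
        print(f"{n} processes: speedup {speedup(n):.2f}x, "
              f"efficiency {efficiency(n):.0%}, "
              f"implied serial fraction {serial_fraction(n):.2f}")
```

This prints a 2.00x speedup (50% efficiency) for 4 processes and 2.59x (32%) for 8. A serial or shared fraction of about 0.3 would cap the speedup near 1/0.3 ≈ 3.3x regardless of the number of cores. The model cannot say *what* is shared; the replies in this thread try to pin that down.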
AutoWhiteList
Hi, where can I find sa-awlUtil? It does not appear to be in the download file. Best Regards,
-- 
SplatNIX IT Services :: Innovation through collaboration
Re: Parallelizing Spam Assassin
Hi -- turn off Bayes and AWL.

On Fri, Jul 31, 2009 at 07:55, poifgh <abhinav.pat...@gmail.com> wrote:
> [...] I ran a freshly built SA with Bayes and DNSBL turned off. Why am I not seeing a linear increase in throughput? Is file locking creating the bottleneck? [...]

-- 
--j.
Re: Parallelizing Spam Assassin
On Thu, Jul 30, 2009 at 11:55:21PM -0700, poifgh wrote:
> Why am I not seeing a linear increase in throughput? Is file locking creating the bottleneck?

Maybe the auto-whitelist.
--
Re: Parallelizing Spam Assassin
On Thu, 2009-07-30 at 23:55 -0700, poifgh wrote:
> [...] Fork = 8; Rate = 57 msgs/sec
> Fork = 4; Rate = 44 msgs/sec
> Fork = 1; Rate = 22 msgs/sec
> [...] Why am I not seeing a linear increase in throughput? [...]

Wow! That's a real flying machine! Imagine what Barracuda Networks could do with that if they did not fill their gay little boxes with hardware rubbish from the floors of MSI and Supermicro. Jesus, try and process that many messages with a $30,000 Barracuda and watch support bitch 'You are fully scanning too much mail and making our rubbish hardware wet the bed.' LOL. Well done you!
Re: Parallelizing Spam Assassin
On Fri, Jul 31, 2009 at 09:32, rich...@buzzhost.co.uk <rich...@buzzhost.co.uk> wrote:
> Imagine what Barracuda Networks could do with that if they did not fill their gay little boxes with hardware rubbish from the floors of MSI and Supermicro. Jesus, try and process that many messages with a $30,000 Barracuda and watch support bitch 'You are fully scanning too much mail and making our rubbish hardware wet the bed.' LOL.

Richard -- please watch your language. This is a public mailing list, and offensive language here is inappropriate.

-- 
--j.
Re: Parallelizing Spam Assassin
On Fri, Jul 31, 2009 at 09:32:42AM +0100, rich...@buzzhost.co.uk wrote:
> On Thu, 2009-07-30 at 23:55 -0700, poifgh wrote:
>> [...] Fork = 8; Rate = 57 msgs/sec
>> Fork = 4; Rate = 44 msgs/sec
>> Fork = 1; Rate = 22 msgs/sec
>> [...]
> Wow! That's a real flying machine!

Yeah, given that my 4x3 GHz box peaks at 22 msgs/sec in a mass-check, without Net/AWL/Bayes. But that's with the 3.3 SVN ruleset. I wonder what version was used, and whether any non-default rules or settings were in play? It certainly sounds strange that a single core could top out at the same rate as my whole box. Anyone else have figures? Maybe I've borked something myself.
Re: Parallelizing Spam Assassin
On Fri, 2009-07-31 at 09:53 +0100, Justin Mason wrote:
> On Fri, Jul 31, 2009 at 09:32, rich...@buzzhost.co.uk <rich...@buzzhost.co.uk> wrote:
>> [...]
> Richard -- please watch your language. This is a public mailing list, and offensive language here is inappropriate.

I apologise for any language deemed offensive. Whilst 'Jesus', 'Bitch' and 'Wet the bed' are mostly acceptable, I offer no defence for openly swearing and using the filthy phrase 'Barracuda Networks'. For this I apologise.
Re: Parallelizing Spam Assassin
On Thu, 2009-07-30 at 23:55 -0700, poifgh wrote:
[...]
> I was measuring how quickly SA can process spam when several SA processes are run in parallel over separate mbox files. I used an 8-core machine. Below are the numbers for different numbers of forked processes:
> Fork = 8; Rate = 57 msgs/sec
> Fork = 4; Rate = 44 msgs/sec
> Fork = 1; Rate = 22 msgs/sec
> I ran a freshly built SA with Bayes and DNSBL turned off. Why am I not seeing a linear increase in throughput?

Because the bottleneck is not (only) the CPUs? Run `vmstat 1` or similar to see (or at least get an idea ;-) whether the workload is I/O-bound or CPU-bound or ...

> Is file locking creating the bottleneck? If yes, which particular file is being locked?

Maybe. The default file-based storage drivers lock the DBs exclusively for each access.

> If not, what could be the reason?

Switch the DB backend to MySQL or PostgreSQL (or whichever of the supported backends you prefer). Run that on the very same machine and compare the numbers with the above.

Bernd
-- 
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services
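Bernd's `vmstat 1` suggestion can be turned into a rough check. This is a minimal sketch that assumes the procps `vmstat` column layout (`us`, `sy`, `id`, `wa`). The helper names and the 20% iowait / 90% busy thresholds are hypothetical choices, not established cut-offs.

```python
from statistics import mean


def summarize_vmstat(text: str) -> dict:
    """Average the CPU columns of `vmstat 1` output.

    Assumes the procps layout: a header row containing 'us', 'sy', 'id'
    and 'wa', followed by one row of integers per sample.
    """
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if {"us", "sy", "id", "wa"} <= set(fields):
            header = fields  # vmstat repeats the header periodically
        elif header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    if not rows:
        raise ValueError("no vmstat samples found")
    # The first sample reports averages since boot, so drop it when possible.
    samples = rows[1:] if len(rows) > 1 else rows
    return {col: mean(row[col] for row in samples) for col in ("us", "sy", "id", "wa")}


def guess_bottleneck(summary: dict, iowait_limit: float = 20, busy_limit: float = 90) -> str:
    """Very rough heuristic; the thresholds are arbitrary assumptions."""
    if summary["wa"] >= iowait_limit:
        return "I/O-bound"
    if summary["us"] + summary["sy"] >= busy_limit:
        return "CPU-bound"
    return "neither saturated: look at locking or other contention"
```

Feed it the captured output of, for example, `vmstat 1 30`, taken while the parallel run is in progress. A process sleeping on a file lock generally shows up as idle time rather than iowait. That is why the third case points at locking rather than at the disk.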
Re: AutoWhiteList
--[ UxBoD ]-- wrote:
> Hi, where can I find sa-awlUtil? It does not appear to be in the download file. Best Regards,

Hmm, it looks like someone has been editing the wiki in ways that don't match anything in any released or unreleased version of SA. The tool is named check_whitelist. There's been talk of changing the AWL stuff to not reference the word "whitelist", but AFAIK this hasn't even been done in the unreleased 3.3 code. Regardless, you can fetch check_whitelist from SVN: http://svn.apache.org/repos/asf/spamassassin/branches/3.2/tools/
Re: Parallelizing Spam Assassin
rich...@buzzhost.co.uk wrote:
> On Fri, 2009-07-31 at 09:53 +0100, Justin Mason wrote:
>> [...] Richard -- please watch your language. This is a public mailing list, and offensive language here is inappropriate.
> I apologise for any language deemed offensive. Whilst 'Jesus', 'Bitch' and 'Wet the bed' are mostly acceptable, I offer no defence for openly swearing and using the filthy phrase 'Barracuda Networks'. For this I apologise.

Richard, we are not joking. Please watch your language on this mailing list, or you will be banned from it. You have now been warned by 2 members of the Project Management Committee. You will not be warned again.
Cant Post Message
I have a post that I have tried to send to this forum several times over the last week, and it never seems to get posted. I don't understand why. There is nothing exotic about it: just text, a question, and some email header info I pasted. Any idea what's up? Thanks, Wes
Re: Cant Post Message
twofers <twof...@yahoo.com> wrote:
> I have a post that I have tried to send to this forum several times over the last week, and it never seems to get posted. [...] There is nothing exotic about it: just text, a question, and some email header info I pasted. Any idea what's up? Thanks, Wes

Obfuscate the header, as it may be tripping SA :) or, even better, use pastebin. Best Regards,
-- 
SplatNIX IT Services :: Innovation through collaboration
Re: Cant Post Message
Quoting twofers <twof...@yahoo.com>:
> I have a post that I have tried to send to this forum several times over the last week, and it never seems to get posted. [...] Any idea what's up? Thanks, Wes

Try putting the header on a site like www.pastebin.com and then put the link in your e-mail rather than the actual header. --Dennis
Re: Number of rules
Quoting LuKreme <krem...@kreme.com>:
> On Jul 30, 2009, at 18:12, Dennis B. Hopp <dh...@coreps.com> wrote:
>> Yeah I knew that. I have a few negative scoring rules but not many (outside of what might be in the misc rule sets I have). What is a good threshold for ham then?
> 5.0 is the score SA is designed for. It's a very good number in almost all cases.

I meant the threshold at which Bayes auto-learn learns a message as ham. I'll try switching back to the default values.
Re: Number of rules
On Fri, 31 Jul 2009 03:55:48 +0200 Karsten Bräckelmann <guent...@rudersport.de> wrote:
> The default of 0.1. It's a default for a reason. But that *really* is not your problem. Your problem is with learning spam, not learning even more ham. Just as you mentioned in your original report. See my previous response for a solution. You want to learn more spam.

What he actually wrote was that 3.7% of _all_ messages were hitting BAYES_00, and 1.7% were hitting BAYES_99. If he actually meant what he wrote and doesn't have an extraordinary spam/ham ratio, then he clearly has a problem with both spam and ham.
Re: Problem with whitelist_from_rcvd and forged reverse lookup
On 30.07.09 21:06, Karsten Bräckelmann wrote:
> On Thu, 2009-07-30 at 16:46 +0200, Sebastian Wiesinger wrote:
>> * Matus UHLAR - fantomas <uh...@fantomas.sk> [2009-07-30 16:35]:
>>> On 30.07.09 14:03, Sebastian Wiesinger wrote:
>>>> I was under the impression that whitelist_from_rcvd checks if the reverse lookup is forged. But still with the following rule
>>>> [...]
> SA does not do the DNS lookup, but depends on the MTA doing so and recording the result in the Received header.

The MTA (sendmail?) did put a "may be forged" into the Received: line, in which case SA should ignore the hostname. Now the question is whether it does. The reported X-Spam headers indicate it does not, which would be a bug.
-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
The box said 'Requires Windows 95 or better', so I bought a Macintosh.
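The rule Matus describes, "if the MTA marked the reverse lookup as possibly forged, don't trust the hostname", can be illustrated with a small parser. This is a hypothetical helper and not SpamAssassin's own Received-header parsing, which is considerably more involved. It assumes a sendmail-style `from helo (rdns [ip])` clause.

```python
import re
from typing import Optional

# Sendmail-style "from" clause, e.g.
#   from helo.example.com (rdns.example.net [192.0.2.10] (may be forged))
_FROM_CLAUSE = re.compile(
    r"from\s+\S+\s+\((?P<rdns>[^\s\[()]+)?\s*\[(?P<ip>[^\]]+)\]"
)


def usable_rdns(received: str) -> Optional[str]:
    """Return the reverse-DNS name recorded by the MTA, or None when the MTA
    flagged it as possibly forged or recorded no name at all."""
    if "may be forged" in received:
        return None
    match = _FROM_CLAUSE.search(received)
    return match.group("rdns") if match else None
```

A whitelist_from_rcvd-style check would then compare only a non-None result against the configured domain.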
Re: Number of rules
Quoting RW <rwmailli...@googlemail.com>:
> [...] What he actually wrote was that 3.7% of _all_ messages were hitting BAYES_00, and 1.7% were hitting BAYES_99. If he actually meant what he wrote and doesn't have an extraordinary spam/ham ratio, then he clearly has a problem with both spam and ham.

I cleared my Maia statistics a couple of days ago. Since then BAYES_00 has triggered 4510 times, BAYES_99 2366 times and BAYES_50 1568 times (all the other BAYES_XX are under 1000 hits). In those same couple of days we have processed about 45,000 messages (this is the number of messages that actually reached SpamAssassin and weren't outright rejected).

So my initial percentages were way off (I was going by Maia Mailguard's SA rule statistics). So roughly 10% of mail is hitting BAYES_00 and 5% is hitting BAYES_99. It seems to me that BAYES_99 should probably be triggered more often than BAYES_00. If there is a better way to get SA statistics I'd be happy to know.

I know that the Bayes success rate comes down to training, but like every other administrator I can't possibly check every message for accuracy, and I was hoping to make the auto-learn a little better. I thought maybe I just didn't have enough rules (both negative and positive scoring) to trigger the auto-learn often enough. Thanks, --Dennis
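Here is a quick check of those percentages. It uses only the counts given above; the helper name is illustrative.

```python
# Hit counts reported above for roughly two days of mail.
TOTAL = 45_000
HITS = {"BAYES_00": 4510, "BAYES_99": 2366, "BAYES_50": 1568}


def percent_of_total(count: int, total: int = TOTAL) -> float:
    """Share of all messages, in percent, that hit a rule."""
    return 100.0 * count / total


if __name__ == "__main__":
    for rule, count in HITS.items():
        print(f"{rule}: {percent_of_total(count):.1f}%")
    print(f"these three combined: {percent_of_total(sum(HITS.values())):.1f}%")
```

The first two lines (10.0% and 5.3%) match the "roughly 10%" and "5%" figures. The combined line (18.8%) shows how small a share of the 45,000 messages these three rules account for.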
Re: Network Tests / Rule Files Directories
On Thu, 2009-07-30 at 19:30 -0700, Stefan Malte Schumacher wrote:
> Hello

A Nabble user with a name. Hooray! :)

> :0fw: spamassassin.lock
> | spamassassin

I suggest running the spamd daemon and then changing that recipe to call spamc rather than plain spamassassin. That eliminates the start-up penalty of loading Perl and SA for each incoming message.

> :0
> * ^X-Spam-Status: Yes
> spam

A delivery recipe with an mbox-format destination. You want locking. (The default is perfectly fine; just make that first line :0: with a trailing colon.)

> My first problem is that there is still a lot of spam coming through. I have enabled and configured Razor, DCC and Pyzor, but even though most spam is recognized by DCC it doesn't give enough points to classify the mail as spam.

If this doesn't help, you might be better off uploading a raw sample including all headers somewhere (own server, or a pastebin) and sending a link. Spam coming through can have a lot of reasons. Your stabbing at these particular 3 rules might or might not be the real cause.

> I have tried adding the appropriate lines, which I believe should be
> score DCC_CHECK 5.0
> if I want all emails which pass the DCC check to get 5 points. Unfortunately this is not working, neither for DCC nor for Razor.

Yes, that should do it. Evidence that it's not working? Show us some SA headers; in this case, from a spam sample that triggered DCC, because the Report header does show the rule's score.

> So which lines do I have to add in order for all mails which are recognized by either DCC, Razor or Pyzor to be classified as spam?

Keep in mind that DCC lists *bulk*, not necessarily spam. Mailing-list traffic is one example that is usually listed by DCC.

> Locate lists two directories with SpamAssassin rules:
> /var/lib/spamassassin/3.002005/updates_spamassassin_org/

The sa-update channels' rule sets.

> /usr/share/spamassassin

Stock rules shipped with SA, put there at install time, either by a package manager or from a source install. These are used by default, but ignored if there is an sa-update dir.

> Running spamassassin -D sample-spam.txt seems to indicate that only the directory under /var/lib is used. Can I delete the old files in /usr/share/spamassassin or are they still needed?

They will not be used, as long as there's *always* an sa-update dir with a version matching your current SA version. As a fallback, and so as not to mess with your install process, I do not recommend deleting them. It's just 620 kB anyway.

> Why does SpamAssassin place the update rules in a different directory than the one in which the original rules are installed?

Because the update ones are versioned. Because there may be multiple channels. Because package managers generally don't like messing with their install base. ;) And because it is a safe fallback.
-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
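Karsten's description of which rule directory wins can be sketched as a small lookup. The helpers are hypothetical and simplify what SA does internally, which also considers the channel subdirectories. The `3.002005` form for version 3.2.5 is taken from the path quoted above.

```python
import os


def update_dir_name(version: str) -> str:
    """Turn a version such as '3.2.5' into the '3.002005' form seen in the
    sa-update path quoted above."""
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    return f"{major}.{minor:03d}{patch:03d}"


def rules_dir_in_use(version: str,
                     update_root: str = "/var/lib/spamassassin",
                     stock_dir: str = "/usr/share/spamassassin") -> str:
    """Prefer the versioned sa-update directory; fall back to stock rules."""
    candidate = os.path.join(update_root, update_dir_name(version))
    return candidate if os.path.isdir(candidate) else stock_dir
```

The fallback branch is why keeping the stock directory is harmless. If the matching sa-update directory ever goes missing, the stock rules are still available.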
Re: Number of rules
On Fri, 31 Jul 2009, Dennis B. Hopp wrote:
> I cleared my Maia statistics a couple of days ago. Since then BAYES_00 has triggered 4510 times, BAYES_99 2366 times and BAYES_50 1568 times (all the other BAYES_XX are under 1000 hits).

Do they all add up to about 45,000?

> In those same couple of days we have processed about 45,000 messages (this is the number of messages that actually reached SpamAssassin and weren't outright rejected). [...] If there is a better way to get SA statistics I'd be happy to know.

sa_stats.pl from the SARE website: http://www.rulesemporium.com/programs/
-- 
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
A sword is never a killer, it is but a tool in the killer's hands. -- Lucius Annaeus Seneca (Martial) 4BC-65AD
---
5 days until the 274th anniversary of John Peter Zenger's acquittal
Re: Any one interested in using a proper forum?
Michael Hutchinson wrote:
> Gidday Peter,
>
>> I don't know about anyone else, but I'm getting a bit hacked off with this 1980s-style forum. I'm trying to get to the bottom of an SA issue and this list/forum thing is giving me a bigger headache than SA!
>
> It's a bit like that when you're using mailing lists, just another thing to get used to in IT life!
>
>> SpamAssassin has more than one or two users now and I personally think that it should have a support forum to match the class of software, which is now world class. I know it's free and all that, but even so, if this is the only form of support they provide, I'm thinking that I'll just start an alternative support forum, using standard, full-featured forum software (like SMF). Is there any support for this? (I already know there will be opposition from those who are 'resident' here. Sorry guys, I just want to do something to help those who just dive in when they have an urgent problem. No hard feelings I hope.)
>
> FWIW I think you're driving at creating a forum that would be easier to use or understand for the average Joe Bloggs user. This is all very well, but mailing lists aren't exactly hard to stay on top of. As for using e-mail to discuss problems with SpamAssassin, I can think of nothing more applicable. Anyone who is an admin of a SpamAssassin-enabled mail server should be familiar enough with e-mail to handle mailing lists without too much fuss. If this is such a big problem, perhaps they shouldn't be administering a mail filtering system at all. Just my 2 cents. Michael Hutchinson.

I did not subscribe to the mailing list. I am using news.gmane.org, and for me this is by far the best way to read it. No forum software needed, no rules needed; I only need a newsreader (Thunderbird does this job quite well for me). Not everything that looks old-fashioned is less comfortable than a Teletubby web interface ;-) Just to add my 2 cents. Ralph Bornefeld-Ettmann
Re: Number of rules
On Fri, 2009-07-31 at 07:53 -0500, Dennis B. Hopp wrote:
> I know that the Bayes success rate comes down to training, but like every other administrator I can't possibly check every message for accuracy, and I was hoping to make the auto-learn a little better. I thought maybe I just didn't have enough rules (both negative and positive scoring) to trigger the auto-learn often enough.

As I wrote before, you particularly need to train spam with a low-ish Bayes confidence, regardless of the overall SA score or the number of rules hit. This does need some supervision.

One way to help with this would be to set up some (server-side) rules to deliver all spam triggering BAYES_80 or lower into a dedicated folder. Sort by Subject, and go through the list. If the Subject alone isn't sufficient evidence, have a quick look at the mail. Confirmed spam can then be learned.
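Karsten describes this as a server-side delivery rule. The sketch below covers only the decision step, written in Python for illustration. It assumes the usual `Yes, score=... required=... tests=...` layout of the `X-Spam-Status` header, and `needs_review` is a hypothetical name. In procmail or Sieve, the same test would be a header match that files the message into the review folder.

```python
import re

# BAYES_00 ... BAYES_99 style rule names inside an X-Spam-Status value.
_BAYES_RULE = re.compile(r"\bBAYES_(\d\d)\b")


def needs_review(x_spam_status: str, max_bucket: int = 80) -> bool:
    """True if SA flagged the message as spam but Bayes was only at
    BAYES_<max_bucket> or lower, i.e. a candidate for manual checking
    and then `sa-learn --spam`."""
    value = x_spam_status
    if value.lower().startswith("x-spam-status:"):
        value = value.split(":", 1)[1]
    is_spam = value.strip().lower().startswith("yes")
    buckets = [int(n) for n in _BAYES_RULE.findall(value)]
    return is_spam and bool(buckets) and max(buckets) <= max_bucket
```

Messages that pass this test and survive the manual look are then fed to `sa-learn --spam`.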
Re: Number of rules
On Fri, 2009-07-31 at 06:07 -0700, John Hardin wrote:
> On Fri, 31 Jul 2009, Dennis B. Hopp wrote:
>> I cleared my Maia statistics a couple of days ago. Since then BAYES_00 has triggered 4510 times, BAYES_99 2366 times and BAYES_50 1568 times (all the other BAYES_XX are under 1000 hits).
> Do they all add up to about 45,000?

Doh! Good catch, John. No, they cannot possibly. Do the math. These 3 rules account for less than 10k, leaving 35k. With each of the rest under 1k hits, we would need another 35 rules. However, there are merely 6 left.

$ grep -c BAYES_ 50_scores.cf
9

The stats are incorrect. Well, unless the lion's share is processed with Bayes disabled, or otherwise not processed by SA.
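Karsten's back-of-the-envelope check is written out below. The figures come from the thread. The sketch makes two assumptions: "less than 1000" means at most 999 per rule, and each message scanned with Bayes active hits exactly one BAYES_xx rule.

```python
# Figures from the thread: three known hit counts, and the remaining
# BAYES_ rules each reported at "less than 1000" hits.
KNOWN_HITS = {"BAYES_00": 4510, "BAYES_99": 2366, "BAYES_50": 1568}
BAYES_RULES = 9        # output of: grep -c BAYES_ 50_scores.cf
PER_RULE_CAP = 999     # "less than 1000"
MESSAGES = 45_000

# Upper bound on messages with any BAYES_xx hit, assuming each message
# scanned with Bayes active hits exactly one BAYES_xx rule.
max_bayes_hits = sum(KNOWN_HITS.values()) + (BAYES_RULES - len(KNOWN_HITS)) * PER_RULE_CAP

print(max_bayes_hits, f"{max_bayes_hits / MESSAGES:.0%}")
```

This prints `14438 32%`. Even at the cap, at most about a third of the 45,000 messages can carry a BAYES_xx hit.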
Re: Parallelizing Spam Assassin
On Fri, 2009-07-31 at 07:26 -0400, Matt Kettler wrote:
> [...] Richard, we are not joking. Please watch your language on this mailing list, or you will be banned from it. You have now been warned by 2 members of the Project Management Committee. You will not be warned again.

I have already apologised. I will not use the words you appear to have found offensive again. Can I ask, is this actually about the words I used *or* because of my comments regarding Barracuda Networks? I ask because I note they made a 'monetary donation' to Apache: http://www.barracudanetworks.com/ns/company/open-source.php

If you want to ban me I will understand - you need to keep the wheels greased. It would give me more time to concentrate on leaking all the Barracuda code into the public domain, along with the various 'warez' tools I've written for it. This would probably be more beneficial to Barracuda customers than dropping in here and making jokes at such low-hanging fruit. If any Barracuda customer would like to know how to unlock their Barracuda without lifting the lid, or change the model serial number and get free e.u., email me off list, as I've just been banned for upsetting a sponsor LOL
Re: Number of rules
Quoting John Hardin <jhar...@impsec.org>:
> On Fri, 31 Jul 2009, Dennis B. Hopp wrote:
>> I cleared my Maia statistics a couple of days ago. Since then BAYES_00 has triggered 4510 times, BAYES_99 2366 times and BAYES_50 1568 times (all the other BAYES_XX are under 1000 hits).
> Do they all add up to about 45,000?

No, they don't. I see some messages that trigger no rules at all (Bayes or otherwise). I thought that was odd, since I thought a Bayes rule should trigger pretty much all the time.

>> In those same couple of days we have processed about 45,000 messages (this is the number of messages that actually reached SpamAssassin and weren't outright rejected). If there is a better way to get SA statistics I'd be happy to know.
> sa_stats.pl from the SARE website: http://www.rulesemporium.com/programs/

I'll take a look. Will this work with logs that are written by amavisd-new? Thanks, --Dennis
Re: Number of rules
On Fri, 31 Jul 2009 07:53:00 -0500 Dennis B. Hopp <dh...@coreps.com> wrote:
> I cleared my Maia statistics a couple of days ago. Since then BAYES_00 has triggered 4510 times, BAYES_99 2366 times and BAYES_50 1568 times (all the other BAYES_XX are under 1000 hits). In those same couple of days we have processed about 45,000 messages (this is the number of messages that actually reached SpamAssassin and weren't outright rejected).

4510+2366+1568+1000 is a lot less than 45,000.

> So my initial percentages were way off (I was going by Maia Mailguard's SA rule statistics). So roughly 10% of mail is hitting BAYES_00 and 5% is hitting BAYES_99. It seems to me that BAYES_99 should probably be triggered more often than BAYES_00.

The ratio of BAYES_99 to BAYES_00 should mostly reflect the overall spam-to-ham ratio; it's not a figure of merit. Your percentages aren't consistent with your numbers: over 70% of the Bayes results are at BAYES_99 or BAYES_00, which isn't all that bad. The main issue here is that your numbers don't add up: only about 1 in 10 of your 45,000 messages processed by SpamAssassin are accounted for in the BAYES statistics.

> If there is a better way to get SA statistics I'd be happy to know. I know that the Bayes success rate comes down to training, but like every other administrator I can't possibly check every message for accuracy, and I was hoping to make the auto-learn a little better. I thought maybe I just didn't have enough rules (both negative and positive scoring) to trigger the auto-learn often enough.

With the number of extra rules and plugins you have, you should have no trouble autolearning all the spam you need. You might even want to raise the threshold above 8 to avoid mislearning.
Re: Number of rules
On Fri, 31 Jul 2009, RW wrote:
> The main issue here is that your numbers don't add up: only about 1 in 10 of your 45,000 messages processed by SpamAssassin are accounted for in the BAYES statistics.

...which was my point. Rather than troubleshooting learning, at this point Dennis should be troubleshooting why messages are not being processed by Bayes at all. Once _that_ is fixed, he can look at whether or not the scores it's producing are reasonable.
-- 
John Hardin
Re: Parallelizing Spam Assassin
On Fri, 31 Jul 2009, rich...@buzzhost.co.uk wrote:
> ... dropping in here and making jokes at such low-hanging fruit.

Make all the jokes at Barracuda's expense that you like, complain about them all you like, just avoid offensive language. Vitriol is more impressive if you are creative enough to avoid using profanity and vulgarity while still blasting your target to pieces.
-- 
John Hardin
Re: Number of rules
On Fri, 31 Jul 2009, Dennis B. Hopp wrote:
> Quoting John Hardin <jhar...@impsec.org>:
>> On Fri, 31 Jul 2009, Dennis B. Hopp wrote:
>>> [...]
>> Do they all add up to about 45,000?
> No, they don't. I see some messages that trigger no rules at all (Bayes or otherwise). I thought that was odd, since I thought a Bayes rule should trigger pretty much all the time.

It should.

>> sa_stats.pl from the SARE website: http://www.rulesemporium.com/programs/
> I'll take a look. Will this work with logs that are written by amavisd-new?

That I don't know.
-- 
John Hardin
Re: Number of rules
Quoting Karsten Bräckelmann <guent...@rudersport.de>:
> [...] The stats are incorrect. Well, unless the lion's share is processed with Bayes disabled, or otherwise not processed by SA.

I do have Sanesecurity rules in ClamAV which may be filtering messages before SpamAssassin sees them, which would account for some of the difference between the total BAYES hits and the messages received. We also relay all outbound mail through these same servers but do not send outbound mail through SpamAssassin, which again would make for some difference. I should have thought to mention that before.

I couldn't get sa-stats to give me any useful information. I did get amavis-logwatch, and I am not sure I like what it's showing me. I ran it against the last few maillogs I have, so it covers basically the last month. Here are the relevant parts of the output: http://pastebin.com/m59ddaf1d

If I'm reading that correctly, less than 50% of mail is actually being filtered (it seems like it should be higher than that). Those stats don't count the messages we completely reject. We don't reject solely on one RBL but use policyd-weight to reject messages. I guess I could just let all messages through to SA for a few days to see how things change, but I don't see the point of wasting CPU and memory on messages that are pretty much guaranteed spam.

Here are the stats on my Postfix: http://pastebin.com/m15d2533e

Maybe I'm worried about nothing, but some of the spam forwarded to me that gets through is very obvious, and seeing which rules it hits makes me think that something isn't quite right. --Dennis
Re: Parallelizing Spam Assassin
On Fri, 2009-07-31 at 08:25 -0700, John Hardin wrote:
> On Fri, 31 Jul 2009, rich...@buzzhost.co.uk wrote:
>> ... dropping in here and making jokes at such low-hanging fruit.
> Make all the jokes at Barracuda's expense that you like, complain about them all you like, just avoid offensive language. Vitriol is more impressive if you are creative enough to avoid using profanity and vulgarity while still blasting your target to pieces.

Received and understood.
Re: Number of rules
On Fri, 2009-07-31 at 10:36 -0500, Dennis B. Hopp wrote: I couldn't get sa-stats to give me any useful information. AFAIK it understands spamd logs, not Amavis logs. You would need to adjust the script for that -- as discussed just a few days ago. If I'm reading that correctly, less than 50% of mail is actually being filtered (it seems like it should be higher than that). Those stats [...] Actually, the numbers you gave for the last couple of days are even lower. About one third, 15k out of 45k, have a BAYES_xx hit and thus are scanned by SA. I told you how to train your Bayes if you're not satisfied with the result. Whether you like it or not, there really isn't another way. FWIW, blocking the obvious offenders early seems like a reasonable explanation for Bayes not showing a lot of high hitters. Anyway, considering the back and forth -- IMHO, you *first* should get a clear picture of how exactly your mail is being processed. I don't feel like stabbing in the dark. Maybe I'm worrying about nothing, but seeing some of the spam that gets forwarded to me after getting through (some very obvious spam), and then seeing which rules it hits, makes me think that something isn't quite right. Forwarded -- as in reports by your users, or forwarded from external MXs to yours? In the latter case, the obvious thing to check is your internal and trusted network settings.
Re: Number of rules
Quoting Karsten Bräckelmann guent...@rudersport.de: If I'm reading that correctly, less than 50% of mail is actually being filtered (it seems like it should be higher than that). Those stats [...] Actually, the numbers you gave for the last couple of days are even lower. About one third, 15k out of 45k, have a BAYES_xx hit and thus are scanned by SA. I told you how to train your Bayes if you're not satisfied with the result. Whether you like it or not, there really isn't another way. FWIW, blocking the obvious offenders early seems like a reasonable explanation for Bayes not showing a lot of high hitters. Yes, you did, and I'm going to set something up to copy the messages that trigger BAYES_20 through BAYES_80 into a separate mailbox that I can inspect periodically for a while (while still letting the messages be delivered to the users). Anyway, considering the back and forth -- IMHO, you *first* should get a clear picture of how exactly your mail is being processed. I don't feel like stabbing in the dark. And I don't expect you to take a stab in the dark. The 45K figure was the total processed, inbound and outbound, and I didn't consider that outbound mail is not funneled through SA and so would not show up in the BAYES counts. So I admit it was a poor analysis on my part. Maybe I'm worrying about nothing, but seeing some of the spam that gets forwarded to me after getting through (some very obvious spam), and then seeing which rules it hits, makes me think that something isn't quite right. Forwarded -- as in reports by your users, or forwarded from external MXs to yours? In the latter case, the obvious thing to check is your internal and trusted network settings. Forwarded from internal users asking how it got through the spam filters. I rarely get reports to our abuse/postmaster addresses (with the exception of AOL users who mark messages as spam when they clearly are not spam).
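The review mailbox Dennis describes could also be assembled offline from an existing mbox. A minimal Python sketch, assuming each scanned message carries an X-Spam-Status header that names the BAYES_NN test it hit (the header name and layout depend on how SpamAssassin/Amavis is configured); `bayes_bucket` and `copy_uncertain` are hypothetical names:

```python
import mailbox
import re

# Bayes rules are named BAYES_00 ... BAYES_99; 20..80 is the "uncertain" band.
BAYES_RE = re.compile(r"\bBAYES_(\d{2})\b")


def bayes_bucket(msg):
    """Return NN from the first BAYES_NN test in X-Spam-Status, or None."""
    status = msg.get("X-Spam-Status", "")
    match = BAYES_RE.search(status)
    return int(match.group(1)) if match else None


def copy_uncertain(src_path, dst_path, low=20, high=80):
    """Copy messages whose Bayes bucket lies in [low, high] into dst_path.

    The source mbox is only read; matching messages are appended to dst_path.
    Returns the number of messages copied.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    copied = 0
    dst.lock()
    try:
        for msg in src:
            bucket = bayes_bucket(msg)
            if bucket is not None and low <= bucket <= high:
                dst.add(msg)
                copied += 1
    finally:
        dst.close()  # flushes pending writes and releases the lock
        src.close()
    return copied
```

Doing the copy at delivery time (for example from the MTA or Amavis) would avoid the batch step; the offline version is simply easier to try first.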
Re: Number of rules
Dennis, On 7/31/2009 8:36 AM, Dennis B. Hopp wrote: I couldn't get sa-stats to give me any useful information. I did get amavis-logwatch and I am not sure if I like what it's showing me. I ran it against the last few maillogs I have, so it covers basically the last month. Here are the relevant parts of the output: http://pastebin.com/m59ddaf1d If I'm reading that correctly, less than 50% of mail is actually being filtered (it seems like it should be higher than that). Those stats don't count the messages we completely reject. We don't reject solely on one RBL but use policyd-weight to reject messages. Correct. Amavis-logwatch will only show you what it saw. It does not poke into your MTA's reject stats. It's a *good* thing that the major junk isn't hitting amavis. Think in terms of reject *layers*. I guess I could just let all messages through to SA for a few days to see how things change, but I don't see the point of wasting CPU/memory on messages that are pretty much guaranteed spam. No, don't do that. What's the point of letting in clearly forged, bogus, or other junk? It will just slow/hinder delivery to your customers. Here are the stats for my Postfix: http://pastebin.com/m15d2533e You have a 90% MTA reject rate. That's a pretty good first cut. Maybe I'm worrying about nothing, but seeing some of the spam that gets forwarded to me after getting through (some very obvious spam), and then seeing which rules it hits, makes me think that something isn't quite right. Just start fine-tuning your rules, and monitor what types of things are getting past your MTA. I don't see any unverified client host rejects - you might want to consider that as a safe method of culling out more at the front door. Mine cuts out about 15.5% (Reject unverified client host: 15.47%), but some of this may ultimately overlap with another reject category such as an RBL. man 5 postconf | less +/check_reverse_client_hostname_access Mike
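Mike doesn't say which restriction produces his "Reject unverified client host" count. A minimal main.cf sketch, assuming it corresponds to Postfix's reject_unknown_client_hostname (Postfix 2.3 or later), which rejects clients whose IP address has no reverse name, or whose name does not resolve back to that address. The warn_if_reject prefix makes Postfix only log what it would have rejected, so the effect can be measured before enforcing it:

```
# /etc/postfix/main.cf (sketch: merge into your existing restrictions,
# do not replace them wholesale)
smtpd_client_restrictions =
    permit_mynetworks,
    warn_if_reject reject_unknown_client_hostname
```

Once the logged would-be rejects look safe, dropping the warn_if_reject prefix turns the check into a real reject; `postfix reload` applies either change.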
Re: Parallelizing Spam Assassin
Henrik K wrote: Yeah, given that my 4x3GHz box masscheck peaks at 22 msgs/sec, without Net/AWL/Bayes. But that's the 3.3 SVN ruleset.. wonder what version was used and any nondefault rules/settings? Certainly sounds strange that 1 core could top out the same. Anyone else have figures? Maybe I've borked something myself.. The rule sets were the defaults: 1. Took a fresh SA download. 2. Ran [the configured number of parallel] SA processes, each on a [different giant] mbox file, without DNSBL and with 'use_bayes 0' and 'bayes_auto_learn 0'.
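For anyone trying to reproduce this benchmark, the conditions above correspond roughly to the following local.cf settings. This is a sketch: skip_rbl_checks is an assumption about how DNSBL was turned off (the post doesn't say), and none of these lines affects the auto-whitelist, which stays active as long as its plugin is loaded:

```
# local.cf: approximate benchmark conditions described above
use_bayes          0    # no Bayes classification
bayes_auto_learn   0    # no Bayes auto-learning either
skip_rbl_checks    1    # assumed way of disabling DNSBL lookups
```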
Re: Parallelizing Spam Assassin
Bernd Petrovitsch wrote: On Thu, 2009-07-30 at 23:55 -0700, poifgh wrote: [...] I ran freshly built SA with Bayes and DNSBL turned off. Why am I not seeing a linear increase in the throughput? Because the bottleneck is not (only) the CPUs? Run `vmstat 1` or similar to see (or at least get an idea ;-) whether the workload is I/O-bound or CPU-bound or ... Is file locking creating the bottleneck? If yes, which particular file is being locked? Maybe. The default store-in-files drivers lock the DBs exclusively for each access. If no, what could be the reason for this? Switch the DB backend to MySQL or PostgreSQL (or whichever of the supported ones you like). Run that on the very same machine and compare the numbers with the above. Running 'top' with a single SA process gives 12.5% CPU utilization, which makes sense since one core out of 8 is fully utilized at this point; the SA process reports 100% utilization for that CPU. When the fork count goes to 8, each individual CPU is utilized at 30-70%, mostly staying around 30% with only a few reaching 70%. I can run vmstat to check the I/O, but I don't think it should be a problem - the disks are fast enough to deliver orders of magnitude more reads than 50 msgs/sec requires. Can you elaborate on 'store in files'? What are these files, what are they used for - can they be turned off? Thanks
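The per-core numbers from `top` point at the same question Bernd raises: are the processes on-CPU, or waiting? A minimal Python sketch (Unix only, since it relies on the `resource` module) that starts several commands at once and compares their total CPU time with the wall-clock time; `run_parallel` is a hypothetical helper, not part of SpamAssassin:

```python
import resource
import subprocess
import time


def run_parallel(cmds):
    """Start all commands at once, wait for them, and measure CPU use.

    Returns (wall_seconds, cpu_seconds, utilization). `utilization` is the
    average fraction of the wall-clock interval each process spent on a CPU:
    close to 1.0 means CPU-bound; well below 1.0 means the processes spent
    time waiting (disk I/O, locks, ...).
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    procs = [subprocess.Popen(cmd, stdout=subprocess.DEVNULL) for cmd in cmds]
    for proc in procs:
        proc.wait()
    wall = time.monotonic() - start
    # RUSAGE_CHILDREN only covers children that have exited and been waited for.
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    utilization = cpu / (wall * len(cmds)) if cmds and wall > 0 else 0.0
    return wall, cpu, utilization
```

Usage might look like `run_parallel([["spamassassin", "--mbox", f"part{i}.mbox"] for i in range(8)])`, with hypothetical pre-split mbox files. A result well below 1.0 for 8 processes would match the 30-70% per-core figures above and suggest contention rather than a shortage of CPU.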
Re: Parallelizing Spam Assassin
c. r. wrote: On Thu, Jul 30, 2009 at 11:55:21PM -0700, poifgh wrote: Why am I not seeing a linear increase in the throughput? Is file locking creating the bottleneck? Maybe the auto-whitelist. I can try turning off AWL and report back. Thanks
Re: Parallelizing Spam Assassin
Henrik K wrote: Yeah, given that my 4x3GHz box masscheck peaks at 22 msgs/sec, without Net/AWL/Bayes. But that's the 3.3 SVN ruleset.. wonder what version was used and any nondefault rules/settings? Certainly sounds strange that 1 core could top out the same. Anyone else have figures? Maybe I've borked something myself.. The problem is not that 22 is a low number, but that when we have other free cores to run separate SA processes in parallel, the throughput doesn't scale linearly. With 8 SA processes running simultaneously on 8 cores I expect the number to be 150+ msgs/sec, but it is a third of that, at 50 msgs/sec.
Re: Parallelizing Spam Assassin
I'm assuming you run a tad more messages than I do, but on a quad with a failover I have never seen the failover kick in in 4 years. This is not disputing your observations, just noting mine. I claim absolutely no knowledge about the core processing/stacking, though I would assume (perhaps incorrectly) that the parsing would be part of the software (MTA). I freely admit I only picked up what seems to be the tail end of this thread, but having used SA for so many years I think I have at least a handle on how it plays (hence the failover). My failover SA is in place to handle slow queries from the primary SA. Assuming (again) that mail size has been factored in and any AV is running remotely? Just a few thoughts based on a very cursory read of a few posts; sadly - or happily - work makes my contributions here limited. I'd be interested in the results of this though. Kind regards Nigel PS - apologies if I'm repeating prior observations. On Fri, 31 Jul 2009 10:41:47 -0700 (PDT), poifgh abhinav.pat...@gmail.com wrote: The problem is not that 22 is a low number, but that when we have other free cores to run separate SA processes in parallel, the throughput doesn't scale linearly. With 8 SA processes running simultaneously on 8 cores I expect the number to be 150+ msgs/sec, but it is a third of that, at 50 msgs/sec.
Re: Parallelizing Spam Assassin
In my tests there was no MTA. The mails/spam were collected from some server in mbox format and fed to SA using the --mbox switch. The size of the messages was not altered in any fashion - just the usual size of incoming spam/mail. There is no AV [you mean anti-virus, right?] running on the machine. Will be back with results.
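Running one SA process per mbox file, as described above, needs the corpus split into several files first. A minimal Python sketch that deals messages round-robin from one mbox into N smaller ones, so each process gets a similar mix of messages; `split_mbox` and the `partN.mbox` names are hypothetical:

```python
import mailbox
import os


def split_mbox(src_path, out_dir, parts):
    """Deal messages from one mbox round-robin into `parts` smaller mbox files.

    Returns (paths, counts). Existing files with the same names in out_dir
    are appended to, so start from an empty directory.
    """
    if parts < 1:
        raise ValueError("parts must be >= 1")
    os.makedirs(out_dir, exist_ok=True)
    src = mailbox.mbox(src_path, create=False)
    paths = [os.path.join(out_dir, f"part{i}.mbox") for i in range(parts)]
    outs = [mailbox.mbox(path) for path in paths]
    counts = [0] * parts
    try:
        for index, msg in enumerate(src):
            target = index % parts
            outs[target].add(msg)
            counts[target] += 1
    finally:
        for box in outs:
            box.close()  # flushes pending writes to disk
        src.close()
    return paths, counts
```

Round-robin rather than contiguous chunks keeps one worker from getting, say, all of the large messages from a single burst.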
Bogus Data within style tags poisoning SA results
This seems to be a newer tactic, and a lot of email with content poisoning seems to be slipping through our spam filters. The reason is all the legitimate-looking words between style tags. Most email apps don't display the data between style tags, so it goes unseen by the recipient, but SA seems to be looking at it, and it poisons the scoring system. Here's an example of what we're seeing within the message source. <style> creatures quickly produce approve crevice nuclear moping esoteric pernicious motion faith does embodies does purify testament maximum exceeding centralism intellect prey tidying welcomed traal impress tuneless athwart mansions endures flames echo motion rooms alcohol rituals etc.. etc.. etc.. </style> That is usually followed by common image spam with a link such as "Click for more" or "Go to details". Anyone have a solution for this? Can SA be trained to ignore what's between style tags? What would that break? Thanks, - N
running two versions of spamd?
Hi, I have set up SpamAssassin to run as a daemon and run as the user spamd instead of root. When I run ps xafu | grep spamd I get this output:
root 2892 0.0 0.0 3116 716 pts/0 S+ 20:34 0:00 \_ grep spamd
root 2389 0.0 2.5 29288 26852 ? Ss 17:15 0:02 /usr/sbin/spamd --create-prefs --max-children 5 --username spamd --helper-home-dir /var/lib/spamassassin/ -s /var/lib/spamassassin/spamd.log -d --pidfile=/var/lib/spamassassin/spamd.pid
spamd 2581 0.0 2.8 32304 29732 ? S 17:16 0:07 \_ spamd child
spamd 2582 0.0 2.6 30432 27920 ? S 17:16 0:00 \_ spamd child
Is this normal, or is spamd running both as root and as spamd? Another question: when I run sa-update, should I run it as root or as spamd? Thanks!!
Re: Parallelizing Spam Assassin
OK - I can see what metrics you are trying to ascertain - I think. I'm not sure that your test and real life are 'right'. For obvious reasons I don't want to carry this one on via the list - I would suggest you ask Justin, and I will be happy to give info on my local setup (this assumes Justin can grab time away from toxic nappies/diapers). There is a lot you can do to ameliorate load. On bad days my quad does 50 a second, so it's doable. I will freely admit I have no clue quite how this came to be, but it is (a case of having colleagues who know more than I do - for which I am eternally grateful; the usual culprits know who they are). Kind regards Nigel On Fri, 31 Jul 2009 11:41:14 -0700 (PDT), poifgh abhinav.pat...@gmail.com wrote: In my tests there was no MTA. The mails/spam were collected from some server in mbox format and fed to SA using the --mbox switch. The size of the messages was not altered in any fashion - just the usual size of incoming spam/mail. There is no AV [you mean anti-virus, right?] running on the machine. Will be back with results.
privacy policy updates?
I've gotten a message from realage-privacypolicy.com which looks like a typical corporate HTML-heavy message. This one is telling me that their privacy policy has changed. The reason I am suspicious is that I've received at least 3 others this week that look very similar, from various other sites. I think the others were from something called Kaboose and another was Harmony (or something similar, not eHarmony) and the third was... can't remember the third and it's already been deleted. I haven't gone to any of the sites, and it could all be coincidence, but it seemed a little suspicious to me. Over-reaction? The realage one really does have a lot of spam signs (the domain name for one, though it is real), the content-type text/html with no plain alternative, obviously tracked URLs like http://link.realage-mail.com/u.d?B4GsbPkLdHyrFL8gOixE=914 and the fact that I have no idea who these people are. -- Hi, I'm Gary Cooper, but not the Gary Cooper that's dead.
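One of the signs listed, text/html with no plain-text alternative, is easy to test mechanically. A minimal Python sketch using the standard `email` package; `is_html_only` is a hypothetical helper, and SpamAssassin's stock ruleset already carries a test along these lines (MIME_HTML_ONLY), so this is only meant to show the check itself:

```python
from email.message import Message


def is_html_only(msg: Message) -> bool:
    """True if the message has a text/html part but no text/plain part.

    walk() visits the message itself and every nested MIME part, so this
    works for both single-part and multipart messages.
    """
    types = {part.get_content_type() for part in msg.walk()}
    return "text/html" in types and "text/plain" not in types
```

On its own this is a weak signal: plenty of legitimate newsletters are HTML-only, which is why it is better used as one small score contribution than as a verdict.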
Re: Any one interested in using a proper forum?
Profanity, no. Even if you cannot think properly and use your brain, the people here have brains that function. {^_^} - Original Message - From: snowweb pe...@snowweb.co.uk Sent: Tuesday, 2009/July/28 04:07 I don't know about anyone else, but I'm getting a bit hacked off with this 1980s-style forum. I'm trying to get to the bottom of an SA issue and this list/forum thing is giving me a bigger headache than SA! SpamAssassin has more than one or two users now and I personally think that it should have a support forum to match the class of software, which is now world class. I know it's free and all that, but even so, if this is the only form of support they provide, I'm thinking that I'll just start an alternative support forum, using standard, full-featured forum software (like SMF). Is there any support for this? (I already know there will be opposition from those who are 'resident' here. Sorry guys, I just want to do something to help those who just dive in when they have an urgent problem. No hard feelings I hope.) Peter Snow
Re: Parallelizing Spam Assassin
In my tests there was no MTA. The mails/spam were collected from some server in mbox format and fed to SA using the --mbox switch. The size of the messages was not altered in any fashion - just the usual size of incoming spam/mail. If you're interested in testing/tuning SpamAssassin for heavy loads, you should consider using the spamd daemon. Then you can use SLAMD [1] as a performance evaluation platform [2]. It takes some effort to set up the environment, but SLAMD helps with repetitive testing and keeping track of the results (comparison, history, charts). [1] http://www.slamd.com [2] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5689 -- Pawel Sasin
Re: Parallelizing Spam Assassin
On Jul 31, 2009, at 1:55 AM, poifgh wrote: I ran freshly built SA with Bayes and DNSBL turned off. Why am I not seeing a linear increase in the throughput? Is file locking creating the bottleneck? If yes, which particular file is being locked? If no, what could be the reason for this? There could be many reasons; check out my talk (admittedly a little out of date, but it should still be mostly relevant) on High Performance Apache SpamAssassin at the following link: http://people.apache.org/~parker/presentations/index.html Keep in mind that you might also be seeing other factors like memory and disk I/O contention. You don't really spell out your testing infrastructure, so it's not really clear if you're even performing a valid test. Also, I wouldn't necessarily expect to see a linear increase, although you might be able to take some easy steps to increase your overall performance. Michael
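One standard way to see why a linear increase isn't guaranteed is Amdahl's law, which Michael doesn't mention but which fits his point: if a fraction p of the work parallelizes and the rest is serialized (for example behind a shared, exclusively locked file), the speedup on n processes is 1 / ((1 - p) + p / n). A minimal Python sketch, assuming that simple model holds, which inverts the law to estimate p from the throughputs reported at the start of this thread:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n workers when a fraction p of the work parallelizes."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(speedup, n):
    """Invert Amdahl's law: the p that would produce `speedup` on n workers."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


# Throughputs reported at the start of this thread (msgs/sec per process count).
observed = {1: 22, 4: 44, 8: 57}
base = observed[1]
for n, rate in observed.items():
    if n == 1:
        continue
    s = rate / base
    print(f"{n} procs: speedup {s:.2f}, implied parallel fraction {parallel_fraction(s, n):.2f}")
```

For the original runs this gives an implied parallel fraction of only about 0.67 (4 processes) and 0.70 (8 processes). The AWL-off figures reported later in the thread (24 and 166 msgs/sec) would imply about 0.98, consistent with a serialized component having been removed. The model ignores memory-bandwidth and cache effects, so treat these numbers as a rough diagnostic.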
Re: Parallelizing Spam Assassin
On Jul 31, 2009, at 2:53 AM, Justin Mason wrote: On Fri, Jul 31, 2009 at 09:32, rich...@buzzhost.co.uk wrote: Imagine what Barracuda Networks could do with that if they did not fill their gay little boxes with hardware rubbish from the floors of MSI and Supermicro. Jesus, try and process that many messages with a $30,000 Barracuda and watch support bitch 'You are fully scanning too much mail and making our rubbish hardware wet the bed.' LOL. Richard -- please watch your language. This is a public mailing list, and offensive language here is inappropriate. I dunno, 'gay' isn't that offensive. -- Overhead, without any fuss, the stars were going out.
Re: Parallelizing Spam Assassin
On Jul 31, 2009, at 9:25 AM, John Hardin wrote: On Fri, 31 Jul 2009, rich...@buzzhost.co.uk wrote: ... dropping in here and making jokes at such low hanging fruit. Make all the jokes at Barracuda's expense that you like, complain about them all you like, just avoid offensive language. Really? Referring to gay hardware is THAT offensive that someone would need to be banned over it? -- Is a vegetarian permitted to eat animal crackers?
Re: Parallelizing Spam Assassin
From: Matt Kettler mkettler...@verizon.net Sent: Friday, 2009/July/31 04:26 rich...@buzzhost.co.uk wrote: On Fri, 2009-07-31 at 09:53 +0100, Justin Mason wrote: On Fri, Jul 31, 2009 at 09:32, rich...@buzzhost.co.uk wrote: ... Richard -- please watch your language. This is a public mailing list, and offensive language here is inappropriate. ... Richard, we are not joking. Please watch your language on this mailing list, or you will be banned from it. You have now been warned by 2 members of the Project Management Committee. You will not be warned again. Given that profanity is the effort of a small mind to express itself, I have a feeling he's going to receive his third and final warning any time now, Matt. {^_-}
Re: Parallelizing Spam Assassin
On Jul 31, 2009, at 1:33 PM, jdow wrote: Given that profanity is the effort of a small mind to express itself, I have a feeling he's going to receive his third and final warning any time now, Matt Given that nothing that Richard said is not anything I've heard on, say, prime time TV or... a committee meeting, I am really curious now as to what was considered 'obscene'. I'm quite serious. Have I stumbled into a list run by religious freaks? -- Clark's Law: Sufficiently advanced cluelessness is indistinguishable from malice Clark Slaw: Anything that has been severely damaged or destroyed by application of Clark's Law
Re: Parallelizing Spam Assassin
On Fri, Jul 31, 2009 at 12:37, LuKreme krem...@kreme.com wrote: On Jul 31, 2009, at 1:33 PM, jdow wrote: Given that profanity is the effort of a small mind to express itself, I have a feeling he's going to receive his third and final warning any time now, Matt Given that nothing that Richard said is not anything I've heard on, say, prime time TV or... a committee meeting, I am really curious now as to what was considered 'obscene'. I'm quite serious. Have I stumbled into a list run by religious freaks? (mods: sorry if this also falls into the verboten category, I'm more trying to explore/catalog than perpetuate) Maybe it was using the word bitch, where he could have used the word complain. (and, religious freaks aren't the only freaks that don't like to see the word Jesus used in that kind of context ... saying words like Jesus around atheist freaks can also result in them claiming offence ... luckily religious freaks and atheist freaks aren't as common as merely religious people and merely atheist people)
Re: Bogus Data within style tags poisoning SA results
On Fri, 31 Jul 2009, Nathan M wrote: Here's an example of what we're seeing within the message source. <style> creatures quickly produce approve crevice nuclear moping esoteric pernicious motion faith does embodies does purify testament maximum exceeding centralism intellect prey tidying welcomed traal impress tuneless athwart mansions endures flames echo motion rooms alcohol rituals etc.. etc.. etc.. </style> Style tags have some format requirements. It might be reasonable (though expensive) to try to detect style tags that do not have any of those syntactic elements... For now, though, this is just more Bayes poison. Train it as spam and the scores will go up. -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/
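John's idea of flagging style blocks that lack CSS syntax can be prototyped outside SpamAssassin before anyone writes a rule for it. A minimal Python sketch, assuming that real CSS inside `<style>` always contains at least one brace, colon or semicolon, and using a hypothetical 20-word threshold; regex-based HTML scanning like this is a rough heuristic, not a parser:

```python
import re

# Non-greedy match of each <style ...> ... </style> body, across newlines.
STYLE_RE = re.compile(r"<style\b[^>]*>(.*?)</style\s*>", re.IGNORECASE | re.DOTALL)
CSS_SYNTAX = set("{}:;")


def suspicious_style_blocks(html, min_words=20):
    """Return <style> bodies that hold many words but no CSS punctuation.

    CSS rule sets use braces and declarations use colons and semicolons;
    a long run of plain words with none of them matches the poisoning
    pattern quoted above.
    """
    hits = []
    for body in STYLE_RE.findall(html):
        if len(body.split()) >= min_words and not (CSS_SYNTAX & set(body)):
            hits.append(body.strip())
    return hits
```

Running it over a folder of recent spam and ham would show how often legitimate mail trips it, which speaks to John's "expensive" caveat before any score is attached.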
Re: running two versions of spamd?
On Fri, 2009-07-31 at 11:59 -0700, an anonymous Nabble user wrote: I have set up SpamAssassin to run as a daemon and run as the user spamd instead of root. When I run ps xafu | grep spamd I get this output: root 2389 0.0 2.5 29288 26852 ? Ss 17:15 0:02 /usr/sbin/spamd --create-prefs --max-children 5 --username spamd --helper-home-dir /var/lib/spamassassin/ -s /var/lib/spamassassin/spamd.log -d --pidfile=/var/lib/spamassassin/spamd.pid spamd 2581 0.0 2.8 32304 29732 ? S 17:16 0:07 \_ spamd child spamd 2582 0.0 2.6 30432 27920 ? S 17:16 0:00 \_ spamd child Is this normal or is spamd running both as root and spamd? You are starting the daemon as root and telling it to setuid to the user spamd. I believe this is perfectly normal. Btw, see 'man spamd' for the -u option. Only the child processes, which have correctly setuid'd, will process messages. Another question: When I run sa-update should I run it as root or spamd? The master process (which does not scan messages, but cares about its busy children) will read that data. So you want to ensure it's readable by that user. FWIW, if you did not explicitly specify the -u option, the child spamds would setuid to the user calling spamc.
Re: Parallelizing Spam Assassin
LuKreme said the following on 7/31/09 3:27 PM: Richard -- please watch your language. This is a public mailing list, and offensive language here is inappropriate. I dunno, 'gay' isn't that offensive. Gay is *not* a synonym for stupid. I do take offense to the term being used in that manner. --Glenn
Re: Parallelizing Spam Assassin
rich...@buzzhost.co.uk wrote: email me off list as I've just been banned for upsetting a sponsor LOL Richard, this has nothing to do with Barracuda. They have no influence over my opinions whatsoever. I don't work for Apache or Barracuda, or any company sponsored by either. Neither Apache nor Barracuda has complained. At the time I warned you, I didn't even remember that Barracuda ever donated to Apache. I don't think any member of the PMC has any regular contact with Barracuda, although we've had occasional contact about using their RBL. Your warning is about using foul language, and then choosing to thumb your nose at the warning Justin gave you. You're behaving like an impudent and foul-mouthed child, and that's unwelcome here. That said, I really don't appreciate you using this list to rant about Barracuda's products, or discuss them at all. This is the SpamAssassin list, not the Barracuda list. Barracuda may use SpamAssassin, and SpamAssassin may support the Barracuda public RBL, but beyond that, any discussion of them is, quite frankly, off-topic. I don't care how good or bad their commercial product or its support is, because it is off-topic here. I don't welcome people praising Barracuda any more than I welcome complaints. It simply doesn't matter to SpamAssassin, so it doesn't belong here. You may as well be ranting about Ford cars for all I care; it still doesn't belong here. This list is about SpamAssassin, nothing more, nothing less. Continue with the foul language, and you'll find the door very quickly. Keep harping on the same off-topic subject and we will eventually get tired of it. You've said your piece about Barracuda, now give it a rest, because frankly I don't care about their products, I care about our product. Is that difficult to understand?
Re: Parallelizing Spam Assassin
On Fri, Jul 31, 2009 at 10:41:47AM -0700, poifgh wrote: Henrik K wrote: Yeah, given that my 4x3GHz box masscheck peaks at 22 msgs/sec, without Net/AWL/Bayes. But that's the 3.3 SVN ruleset.. wonder what version was used and any nondefault rules/settings? Certainly sounds strange that 1 core could top out the same. Anyone else have figures? Maybe I've borked something myself.. The problem is not that 22 is a low number, but that when we have other free I did not say it was a problem. I was just wondering how fast your CPU/memory is, since my 3GHz AMD doesn't seem to keep up. I just tested with a fresh 3.2.5 install, and running a 500-message mbox on a single core resulted in 11 msgs/sec. Then I used sa-compile, and it rose to 15. Did you use it too? Of course your mailbox could be a lot different, so it's hard to compare. cores to run separate SA processes in parallel, the throughput doesn't scale linearly. With 8 SA processes running simultaneously on 8 cores I expect the number to be 150+ msgs/sec, but it is a third of that, at 50 msgs/sec. Anyway, as people have already said here, disable AWL: use_auto_whitelist 0
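Henrik's two suggestions as configuration: a sketch assuming a 3.2.x layout, where sa-compile's output is only used once the Rule2XSBody plugin is loaded (as far as we know, that loadplugin line ships commented out in v320.pre). File locations vary by distribution, and spamd needs a restart to pick up either change:

```
# local.cf: turn off the auto-whitelist
use_auto_whitelist 0

# v320.pre: load the plugin that uses rules compiled by `sa-compile`
loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
```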
Re: Parallelizing Spam Assassin
I am sorry, I did not provide any statistics for the machine involved. CPU: 8 cores at 2327 MHz each. RAM: 16 GB. Disk: AFAIR a 7200 RPM, 2 TB disk. Yes, people were right in pointing to AWL as the problem. Turning off AWL results in near-linear scaling of SA as we increase the number of processes. My input is more than 100K [mostly] spam messages, which let each run last for several minutes; I then took an average to get msgs/sec. With AWL, Bayes and DNSBL turned off, I get about 24 msgs/sec for 1 fork and 166 msgs/sec for 8 forks. With AWL on and Bayes and DNSBL off, I get about 22 msgs/sec for 1 fork and 50 msgs/sec for 8 forks. Thanks everyone for helping out.
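The two result sets are easier to compare as speedup and parallel efficiency (speedup divided by the number of processes). A minimal Python sketch using only the figures reported above; `scaling` is a hypothetical helper name:

```python
def scaling(rate_1, rate_n, n):
    """Return (speedup, parallel efficiency) for n processes vs. one process."""
    speedup = rate_n / rate_1
    return speedup, speedup / n


# msgs/sec with 1 and 8 forks, as reported above (Bayes and DNSBL off in both).
runs = {
    "AWL on": (22, 50),
    "AWL off": (24, 166),
}
for label, (one, eight) in runs.items():
    s, e = scaling(one, eight, 8)
    print(f"{label}: {s:.1f}x speedup on 8 processes, {e:.0%} efficiency")
```

With AWL on, 8 processes run at roughly 28% efficiency; with AWL off, roughly 86%. That jump is what you would expect if the AWL database, shared by all processes, was the serialized step.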
Re: Parallelizing Spam Assassin
I haven't tried sa-compile yet; I can give it a shot.
Re: Parallelizing Spam Assassin
On Fri, 2009-07-31 at 17:37 -0400, Glenn Sieb wrote: LuKreme said the following on 7/31/09 3:27 PM: Richard -- please watch your language. This is a public mailing list, and offensive language here is inappropriate. I dunno, 'gay' isn't that offensive. Gay is *not* a synonym for stupid. I do take offense to the term being used in that manner. --Glenn I find it deeply offensive that the word 'gay' is used as a synonym for homosexual in an attempt to stop people from using 'queer' - but hey, 'gays' are not the only ones with opinions that 'matter'. Gay **is** a synonym for 'stupid' (silly) as far as I am concerned. Its original meanings of 'carefree', 'happy', 'silly' and 'showy' are clearly being used with sarcasm. The fact is 'queers' hijacked the word, as per this: USAGE: Gay is now a standard term for ‘homosexual’, and is the term preferred by homosexual men to describe themselves. As a result, it is now very difficult to use gay in its earlier meanings ‘carefree’ or ‘bright and showy’ without arousing a sense of double entendre. Gay in its modern sense typically refers to men, lesbian being the standard term for homosexual women. http://www.askoxford.com/concise_oed/gay?view=uk So please *quit* with the sympathetic pink preaching and learn what the word actually means. Just because it is the term preferred by homosexual men to describe themselves does not mean a minority has the right to slate people who use the word properly. With regard to the dig about Barracuda - this *WAS* OT. There were some benchmark tests discussed here that were impressive. My experience of SA in daily production is on Barracuda appliances that STRUGGLE to push 6-8 messages a second through, so it was relevant as a comparison. The wording could have been chosen with more care, and I apologise to Christians or dog lovers who found the use of the messiah or female form offensive.
However, the use of gay in a sarcastic context clearly fits with the original meaning of the word, not the one given to it by that section of society who have stolen it and made it OT and OM. For that I make ***NO*** apology. I appreciate that using 'gay' in its real meaning may hurt the feelings of some 'homosexuals', but as I have to respect their choices and views, they should show *me* the same respect for *my* views and choices. You may not like who I am and what I do; I may not like who you are and what you do. Now, do we need to continue this, or throw little tin God banning threats around some more, or can we just *get along*, knowing we are all different but frequenting this list for SpamAssassin information?
Re: Parallelizing Spam Assassin
From: LuKreme krem...@kreme.com Sent: Friday, 2009/July/31 12:30 On Jul 31, 2009, at 9:25 AM, John Hardin wrote: On Fri, 31 Jul 2009, rich...@buzzhost.co.uk wrote: ... dropping in here and making jokes at such low-hanging fruit. Make all the jokes at Barracuda's expense that you like, complain about them all you like, just avoid offensive language. Really? Referring to gay hardware is THAT offensive that someone would need to be banned over it? No, it's the word expensive. {+_+}
Re: Parallelizing Spam Assassin
From: LuKreme krem...@kreme.com Sent: Friday, 2009/July/31 12:37 On Jul 31, 2009, at 1:33 PM, jdow wrote: Given that profanity is the effort of a small mind to express itself, I have a feeling he's going to receive his third and final warning any time now, Matt Given that nothing richard said is anything I haven't heard on, say, prime-time TV or... a committee meeting, I am really curious now as to what was considered 'obscene'. I'm quite serious. Have I stumbled into a list run by religious freaks? Not me. I can happily go several whole days without hearing the B word. When I hear it I get B...y. {^_^} Joanne
Re: Parallelizing Spam Assassin
From: poifgh abhinav.pat...@gmail.com Sent: Friday, 2009/July/31 19:47 I am sorry, I did not provide any statistics of the machine involved. CPU - 8 cores with each core 2327 MHz, RAM - 16GB, AFAIR it has a 7200RPM disk - 2TB. With one disk, you might consider a striped array to get more disk speed. 50 megabytes per second stresses most disks pretty hard, though not to the limit. But if there is a lot of seeking involved, as well as multiple copies of the files being made as they pass through the system, I can see how it'd be a little rough on the disk throughput. {^_^}