Re: Parallelizing Spam Assassin

2009-08-03 Thread Dan Schaefer
This whole time I thought the subject line was Paralyzing Spam 
Assassin and the original poster was having trouble with SA locking up. 
Oops. ;-)


--
Dan Schaefer
Web Developer/Systems Analyst
Performance Administration Corp.



Re: Parallelizing Spam Assassin

2009-08-03 Thread jp
I would run a tcpdump on the ethernet interface while doing this, just 
in case there are network tests happening that you are not aware of.

On Thu, Jul 30, 2009 at 11:55:21PM -0700, poifgh wrote:
 
 Hi
 
 I was measuring how quickly could SA [spam assassin] process spams when
 several SA processes are run in parallel over separate mbox files. I used a
 8 core machine. Below are the numbers when I forked different number of
 processes.
 
 Fork = 8;
 Rate = 57 msgs/sec
 
 Fork = 4;
 Rate = 44 msgs/sec
 
 Fork = 1;
 Rate = 22 msgs/sec
 
 
 I ran freshly build SA with Bayes and DNSBL turned off. Why am I not seeing
 a linear increase in the throughput? Is a file locking creating the
 bottleneck? If yes, which particular file is being locked? If no, what could
 be the reason for this?
 
 thnx
 -- 
 View this message in context: 
 http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24751958.html
 Sent from the SpamAssassin - Users mailing list archive at Nabble.com.

-- 
/*
Jason Philbrook   |   Midcoast Internet Solutions - Wireless and DSL
KB1IOJ            |   Broadband Internet Access, Dialup, and Hosting
 http://f64.nu/   |   for Midcoast Maine  http://www.midcoast.com/
*/


Re: Parallelizing Spam Assassin

2009-08-03 Thread poifgh

I did that - with DNSBL off there are no port 53 communications from SA

--


Jason Philbrook wrote:
 
 I would run a tcpdump on the ethernet interface while doing this, just 
 in case there are network tests happening that you are not aware of.
 
 On Thu, Jul 30, 2009 at 11:55:21PM -0700, poifgh wrote:
 
 Hi
 
 I was measuring how quickly could SA [spam assassin] process spams when
 several SA processes are run in parallel over separate mbox files. I used a
 8 core machine. Below are the numbers when I forked different number of
 processes.
 
 Fork = 8;
 Rate = 57 msgs/sec
 
 Fork = 4;
 Rate = 44 msgs/sec
 
 Fork = 1;
 Rate = 22 msgs/sec
 
 
 I ran freshly build SA with Bayes and DNSBL turned off. Why am I not seeing
 a linear increase in the throughput? Is a file locking creating the
 bottleneck? If yes, which particular file is being locked? If no, what could
 be the reason for this?
 
 thnx
 -- 
 View this message in context:
 http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24751958.html
 Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
 
 -- 
 /*
 Jason Philbrook   |   Midcoast Internet Solutions - Wireless and DSL
 KB1IOJ|   Broadband Internet Access, Dialup, and Hosting 
  http://f64.nu/   |   for Midcoast Mainehttp://www.midcoast.com/
 */
 
 

-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24796555.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Parallelizing Spam Assassin

2009-08-01 Thread Linda Walsh

It's an American thing.  Things that are normal speech for UK blokes, get
Americans all disturbed.

Funny, used to be the other way around...but well...times change.



Justin Mason wrote:

On Fri, Jul 31, 2009 at 09:32,
rich...@buzzhost.co.uk <rich...@buzzhost.co.uk> wrote:

Imagine what Barracuda Networks could do with that if they did not fill
their gay little boxes with hardware rubbish from the floors of MSI and
supermicro. Jesus, try and process that many messages with a $30,000
Barracuda and watch support bitch 'You are fully scanning to much mail
and making our rubbish hardware wet the bed.' LOL.


Richard -- please watch your language.   This is a public mailing
list, and offensive language here is inappropriate.



Re: Parallelizing Spam Assassin

2009-08-01 Thread Patrick Ben Koetter
* Linda Walsh <sa-u...@tlinx.org>:
 It's an American thing.  Things that are normal speech for UK blokes, get
 Americans all disturbed.

Sloppy language is sloppy language everywhere! I took offense at the message,
too, and I am neither American nor from the UK.

But what annoys me the most is that the comments were simply off-topic. I can
go and meet some friends and happily spend the whole night cracking one
joke after another - PC or not PC.

There's a place for everything. This is the place for SpamAssassin. I wish we
could get back to what this thread was all about: Parallelizing
SpamAssassin.

p...@rick

 Funny, used to be the other way around...but well...times change.
 
 Justin Mason wrote:
 On Fri, Jul 31, 2009 at 09:32,
 rich...@buzzhost.co.uk <rich...@buzzhost.co.uk> wrote:
 Imagine what Barracuda Networks could do with that if they did not fill
 their gay little boxes with hardware rubbish from the floors of MSI and
 supermicro. Jesus, try and process that many messages with a $30,000
 Barracuda and watch support bitch 'You are fully scanning to much mail
 and making our rubbish hardware wet the bed.' LOL.
 
 Richard -- please watch your language.   This is a public mailing
 list, and offensive language here is inappropriate.
 

-- 
state of mind
Digitale Kommunikation

http://www.state-of-mind.de

Franziskanerstraße 15      Telefon +49 89 3090 4664
81669 München              Telefax +49 89 3090 4666

Amtsgericht München        Partnerschaftsregister PR 563



Re: Parallelizing Spam Assassin

2009-08-01 Thread Linda Walsh

May I point out that, while you may find the language crude, it isn't
language that would violate FTC standards by using any of the
7 or so 'unmentionable words'...


People -- these standards of 'crude language' really need to be strongly
held 'in check' -- the US is 'supposed' to be the society of 'free speech'
unless it is obscene or threatening.

I don't think his posting was either (BTW, I've never even 'heard' or seen
his name before this post.  All I saw was his 'uk' addr -- and I've known
a few 'uk' types, and many of them sound very crude to an American ear
these days).

So in addition to applying strictures in a conservative manner, we must,
hopefully, try to be sensitive to different cultural backgrounds.

If I was talking with a black teen from downtown SF/Oakland, I'd have to
translate from Ebonics -- which can sound rather crude and might contain
an F-word every other sentence.  I just apply my linguistic filter and
attempt to get the meaning.  I hardly think this list is aimed at a young
audience -- and any kid 13+ is going to have heard quite an ear-full of
'colorful expletives', from ST4: The Voyage Home (a family movie) to
everyday peer talk.

Yes -- it sounded crude... more than I normally hear in America -- but not
more than I'd hear in London.


Just my 2-cents on cultural sensitivity, and the ability to be amused at 
cultural differences (rather than choosing to be offended by them).

p.s. - Most Commercial vendor products are Bantha Poodoo -- especially for
Virus/Security and Spam protection, but NOT all.  Usually the highest 
advertised profile are the worst -- they put more budget into advertising than 
engineering.

Yeah, I still think SA is a bit slow, but I put much of that down to it being
written in an interpreted language and its wide flexibility and extensibility
with plug-ins.  Whatcha gonna do?  Maybe we should rewrite it in Forth?
*grin*...


Re: Parallelizing Spam Assassin

2009-08-01 Thread Linda Walsh

Well -- it's not just the cores -- what was the usage of the cores that
were being used?  Were 3 out of the 8 'pegged'?  Are these 'real' cores, or
HT cores?  In the Core2 and P4 archs, HT's actually slowed down a good
many workloads unless they were tightly constructed to work on the same
data in cache.  Else, those HT's did just enough extra work to block cache
contents more than anything else.

What's the disk I/O look like?  I mean don't just focus on idle cores --
if the wait is on disk, maybe the cores can't get the data fast enough.

If the network is involved, well, that's a drag on any message checking.
I'm seeing times of .3msgs/sec, but I think that's with networking turned
on.  Pretty Ugly.



poifgh wrote:



Henrik K wrote:

Yeah, given that my 4x3Ghz box masscheck peaks at 22 msgs/sec, without
Net/AWL/Bayes. But that's the 3.3 SVN ruleset.. wonder what version was
used
and any nondefault rules/settings? Certainly sounds strange that 1 core
could top out the same. Anyone else have figures? Maybe I've borked
something myself..



The problem is not with 22 being a low number, but when we have other free
cores to run different SA parallely why doesnt the throughput scale linearly
.. I expect for 8 cores with 8 SA running simultaneously the number to be
150+ msgs/sec but it is 1/3rd at 50 msgs/sec





Re: Parallelizing Spam Assassin

2009-08-01 Thread rich...@buzzhost.co.uk
On Fri, 2009-07-31 at 23:40 -0700, Linda Walsh wrote:
 It's an American thing.  Things that are normal speech for UK blokes, get
 Americans all disturbed.

I'm sure that is mostly it, Linda. They don't seem to 'get' it.
Two things I observe in this whole 'barracuda-gate' posting:

1. Being 'offended' is not terminal; it does not kill, disable or have any
side effects. Can you imagine going to a doctor and saying "You've got to
treat me, Doctor, I got offended, my feelings are hurt"?

2. Cultural differences exist. If I am expected to respect the 'diversity'
that has people jumping up and down about the use of 'gay' because *they*
have a different meaning for it, it is not unreasonable to expect *them* to
respect my diversity in using it in its original context. I'm tired of being
told not to offend or upset people who don't show my views and beliefs equal
respect.

Anyway, it's all OT and pointless in any context of processing spam - the
point I made was factual, love it or hate it. That was a poor hardware spec
used in a well-known retail anti-spam appliance = 6-8 MPS 'fully scanned'.





Re: Parallelizing Spam Assassin

2009-08-01 Thread Henrik K

On Sat, Aug 01, 2009 at 12:04:08AM -0700, Linda Walsh wrote:
 Well -- it's not just the cores -- what was the usage of the cores that
 were being used?  were 3 out the 8 'pegged'?  Are these 'real' cores, or
 HT cores?  In the Core2 and P4 archs, HT's actually slowed down a good  
 many workloads unless they were tightly constructed to work on the same
 data in cache.  Else, those HT's did just enough extra work to block cache
 contents more than anything else.

I really doubt there's HT involved in a recent looking 8 core 16GB machine..

 What's the disk I/O look like?  I mean don't just focus on idle cores --
 if the wait is on disk, maybe the cores can't get the data fast enough.

As we already guessed, AWL (BerkeleyDB) caused disk I/O and slowness. For
heavy loads you need to use SQL (or maybe the better BDB plugin in 3.3 if we
get it working).

 If the network is involved, well, that's a drag on any message checking.
 I'm seeing times of .3msgs/sec, but I think that's with networking turned
 on.  Pretty Ugly.

It affects single messages, but not total throughput. With network checks
you just dedicate a lot more children. Waiting for network responses takes no
CPU time, thus you can process more messages simultaneously.
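
[Editor's sketch] The point about waiting costing no CPU time can be illustrated with a small, self-contained Python model. This is not SpamAssassin code: `fake_network_check` is a hypothetical stand-in for a DNSBL lookup, the delay is an arbitrary assumption, and threads are used here only as a convenient substitute for spamd child processes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_network_check(delay):
    """Hypothetical stand-in for a network test: blocks without using CPU."""
    time.sleep(delay)
    return True


def throughput(workers, messages=16, delay=0.02):
    """Return messages/sec when up to `workers` checks may wait at once."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each "message" spends its whole life waiting on the fake network.
        results = list(pool.map(lambda _: fake_network_check(delay),
                                range(messages)))
    elapsed = time.perf_counter() - start
    assert all(results)
    return messages / elapsed


if __name__ == "__main__":
    for n in (1, 4, 8):
        print(f"{n} workers: {throughput(n):.0f} msgs/sec")
```

Per-message latency stays at roughly the same delay in every run; only the number of messages waiting concurrently changes, which is why adding workers raises total throughput in this model.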



Re: Parallelizing Spam Assassin

2009-08-01 Thread Per Jessen
Henrik K wrote:

 On Sat, Aug 01, 2009 at 12:04:08AM -0700, Linda Walsh wrote:
 Well -- it's not just the cores -- what was the usage of the cores
 that
 were being used?  were 3 out the 8 'pegged'?  Are these 'real' cores,
 or
 HT cores?  In the Core2 and P4 archs, HT's actually slowed down a
 good many workloads unless they were tightly constructed to work on
 the same
 data in cache.  Else, those HT's did just enough extra work to block
 cache contents more than anything else.
 
 I really doubt there's HT involved in a recent looking 8 core 16GB
 machine..

Why not?  I have a couple of brand-new Intel Core i7 (Nehalem) systems
with 8 GB RAM - they have 1 physical CPU with 4 cores and HT =
8 logical cores.  And they've got room for more RAM :-)


/Per Jessen, Zürich



Re: Parallelizing Spam Assassin

2009-08-01 Thread Justin Mason
On Sat, Aug 1, 2009 at 10:04, Henrik K <h...@hege.li> wrote:

 On Sat, Aug 01, 2009 at 12:04:08AM -0700, Linda Walsh wrote:
 Well -- it's not just the cores -- what was the usage of the cores that
 were being used?  were 3 out the 8 'pegged'?  Are these 'real' cores, or
 HT cores?  In the Core2 and P4 archs, HT's actually slowed down a good
 many workloads unless they were tightly constructed to work on the same
 data in cache.  Else, those HT's did just enough extra work to block cache
 contents more than anything else.

 I really doubt there's HT involved in a recent looking 8 core 16GB machine..

 What's the disk I/O look like?  I mean don't just focus on idle cores --
 if the wait is on disk, maybe the cores can't get the data fast enough.

 As we already guessed, AWL (BerkeleyDB) caused disk I/O and slowness. For
 heavy loads you need to use SQL (or maybe the better BDB plugin in 3.3 if we
 get it working).

 If the network is involved, well, that's a drag on any message checking.
 I'm seeing times of .3msgs/sec, but I think that's with networking turned
 on.  Pretty Ugly.

 It affects single messages, but not total throughput. With network checks
 you just dedicate a lot more childs. Waiting for network responses takes no
 CPU time, thus you can process more messages simultaneously.

Although you will also need to allocate more memory, to
ensure that no swapping takes place.

-- 
--j.


Re: Parallelizing Spam Assassin

2009-08-01 Thread Henrik K
On Sat, Aug 01, 2009 at 11:46:57AM +0200, Per Jessen wrote:
 Henrik K wrote:
 
  On Sat, Aug 01, 2009 at 12:04:08AM -0700, Linda Walsh wrote:
  Well -- it's not just the cores -- what was the usage of the cores
  that
  were being used?  were 3 out the 8 'pegged'?  Are these 'real' cores,
  or
  HT cores?  In the Core2 and P4 archs, HT's actually slowed down a
  good many workloads unless they were tightly constructed to work on
  the same
  data in cache.  Else, those HT's did just enough extra work to block
  cache contents more than anything else.
  
  I really doubt there's HT involved in a recent looking 8 core 16GB
  machine..
 
 Why not?  I have a couple of brandnew Intel Core i7 (Nehalem) systems
 with 8Gb RAM - they have 1 physical CPU with 4 cores and HT =
 8 cores.  And they've got room for more RAM :-)

Ah, a comeback.. I guess it's at least better than the P4 stuff? That reminds
me, gotta test how SA runs on a Sun T5240 with 16 cores / 128 threads..



Re: Parallelizing Spam Assassin

2009-08-01 Thread Per Jessen
Henrik K wrote:

 On Sat, Aug 01, 2009 at 11:46:57AM +0200, Per Jessen wrote:
 Henrik K wrote:
 
  On Sat, Aug 01, 2009 at 12:04:08AM -0700, Linda Walsh wrote:
  Well -- it's not just the cores -- what was the usage of the cores
  that
  were being used?  were 3 out the 8 'pegged'?  Are these 'real'
  cores, or
  HT cores?  In the Core2 and P4 archs, HT's actually slowed down a
  good many workloads unless they were tightly constructed to work
  on the same
  data in cache.  Else, those HT's did just enough extra work to
  block cache contents more than anything else.
  
  I really doubt there's HT involved in a recent looking 8 core 16GB
  machine..
 
 Why not?  I have a couple of brandnew Intel Core i7 (Nehalem) systems
 with 8Gb RAM - they have 1 physical CPU with 4 cores and HT =
 8 cores.  And they've got room for more RAM :-)
 
 Ah a comeback.. I guess it's atleast better than the P4 stuff?  

Not sure about that - AFAICT, it's exactly the same technology. (I
haven't done any exhaustive tests though.)


/Per Jessen, Zürich



Re: Parallelizing Spam Assassin

2009-08-01 Thread Karsten Bräckelmann
On Fri, 2009-07-31 at 23:56 -0700, Linda Walsh wrote:
 May I point out, that while you may find the language crude -- it isn't
 language that would violate FTC standards in that in used any of the 
 7 or so 'unmentionable words'...

It's not about words on their own -- it's about how they are being used,
and their meaning in context.

 BTW, I've never even 'heard' or seen his name before this post.

Must have been a warm and cozy place, the rock you've been hiding
under. ;)  You missed a 3 digit figure of posts and uncalled-for
off-topic rants within a few weeks.

 If I was talking with [...]  I just apply my linguistic filter and
 attempt to get the meaning.

Sic.


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Parallelizing Spam Assassin

2009-08-01 Thread Matt Kettler
Um, Linda.. I'm pretty positive Justin is Irish, not American.

Linda Walsh wrote:
 It's an American thing.  Things that are normal speech for UK blokes, get
 Americans all disturbed.

 Funny, used to be the other way around...but well...times change.



 Justin Mason wrote:
 On Fri, Jul 31, 2009 at 09:32,
 rich...@buzzhost.co.uk <rich...@buzzhost.co.uk> wrote:
 Imagine what Barracuda Networks could do with that if they did not fill
 their gay little boxes with hardware rubbish from the floors of MSI and
 supermicro. Jesus, try and process that many messages with a $30,000
 Barracuda and watch support bitch 'You are fully scanning to much mail
 and making our rubbish hardware wet the bed.' LOL.

 Richard -- please watch your language.   This is a public mailing
 list, and offensive language here is inappropriate.






Some benchmarks (Re: Parallelizing Spam Assassin)

2009-08-01 Thread Henrik K
On Sat, Aug 01, 2009 at 01:34:34PM +0300, Henrik K wrote:

 That reminds me, gotta test how SA runs on a Sun T5240 with 16 core 128
 cores..

Well not that impressive for SA, price/speed wise..

T2+ 2x8x1.4Ghz, 144 msgs/sec @ 128 processes
AMD X4 4x3Ghz, 43 msgs/sec @ 4 processes

Note that this is 3.3 SVN with all the rulesrc included, perl 5.10. I saved
the used stuff at http://sa.hege.li/bench/ to be able to make real
comparisons, if someone has interesting servers. And this is as scientific
as I can bother. :)



Re: Parallelizing Spam Assassin

2009-07-31 Thread Justin Mason
hi -- turn off Bayes and AWL.

On Fri, Jul 31, 2009 at 07:55, poifgh <abhinav.pat...@gmail.com> wrote:

 Hi

 I was measuring how quickly could SA [spam assassin] process spams when
 several SA processes are run in parallel over separate mbox files. I used a
 8 core machine. Below are the numbers when I forked different number of
 processes.

 Fork = 8;
 Rate = 57 msgs/sec

 Fork = 4;
 Rate = 44 msgs/sec

 Fork = 1;
 Rate = 22 msgs/sec


 I ran freshly build SA with Bayes and DNSBL turned off. Why am I not seeing
 a linear increase in the throughput? Is a file locking creating the
 bottleneck? If yes, which particular file is being locked? If no, what could
 be the reason for this?

 thnx
 --
 View this message in context: 
 http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24751958.html
 Sent from the SpamAssassin - Users mailing list archive at Nabble.com.





-- 
--j.


Re: Parallelizing Spam Assassin

2009-07-31 Thread Christian Recktenwald
On Thu, Jul 30, 2009 at 11:55:21PM -0700, poifgh wrote:
 Why am I not seeing a linear increase in the throughput? 
 Is a file locking creating the bottleneck?

Maybe the auto white list.

-- 


Re: Parallelizing Spam Assassin

2009-07-31 Thread rich...@buzzhost.co.uk
On Thu, 2009-07-30 at 23:55 -0700, poifgh wrote:
 Hi
 
 I was measuring how quickly could SA [spam assassin] process spams when
 several SA processes are run in parallel over separate mbox files. I used a
 8 core machine. Below are the numbers when I forked different number of
 processes.
 
 Fork = 8;
 Rate = 57 msgs/sec
 
 Fork = 4;
 Rate = 44 msgs/sec
 
 Fork = 1;
 Rate = 22 msgs/sec
 
 
 I ran freshly build SA with Bayes and DNSBL turned off. Why am I not seeing
 a linear increase in the throughput? Is a file locking creating the
 bottleneck? If yes, which particular file is being locked? If no, what could
 be the reason for this?
 
 thnx
Wow! That's a real flying machine!

Imagine what Barracuda Networks could do with that if they did not fill
their gay little boxes with hardware rubbish from the floors of MSI and
supermicro. Jesus, try and process that many messages with a $30,000
Barracuda and watch support bitch 'You are fully scanning to much mail
and making our rubbish hardware wet the bed.' LOL.

Well done you!







Re: Parallelizing Spam Assassin

2009-07-31 Thread Justin Mason
On Fri, Jul 31, 2009 at 09:32,
rich...@buzzhost.co.uk <rich...@buzzhost.co.uk> wrote:
 Imagine what Barracuda Networks could do with that if they did not fill
 their gay little boxes with hardware rubbish from the floors of MSI and
 supermicro. Jesus, try and process that many messages with a $30,000
 Barracuda and watch support bitch 'You are fully scanning to much mail
 and making our rubbish hardware wet the bed.' LOL.

Richard -- please watch your language.   This is a public mailing
list, and offensive language here is inappropriate.

-- 
--j.


Re: Parallelizing Spam Assassin

2009-07-31 Thread Henrik K
On Fri, Jul 31, 2009 at 09:32:42AM +0100, rich...@buzzhost.co.uk wrote:
 On Thu, 2009-07-30 at 23:55 -0700, poifgh wrote:
  Hi
  
  I was measuring how quickly could SA [spam assassin] process spams when
  several SA processes are run in parallel over separate mbox files. I used a
  8 core machine. Below are the numbers when I forked different number of
  processes.
  
  Fork = 8;
  Rate = 57 msgs/sec
  
  Fork = 4;
  Rate = 44 msgs/sec
  
  Fork = 1;
  Rate = 22 msgs/sec
  
  
  I ran freshly build SA with Bayes and DNSBL turned off. Why am I not seeing
  a linear increase in the throughput? Is a file locking creating the
  bottleneck? If yes, which particular file is being locked? If no, what could
  be the reason for this?
  
  thnx
 Wow! That's a real flying machine!

Yeah, given that my 4x3Ghz box masscheck peaks at 22 msgs/sec, without
Net/AWL/Bayes. But that's the 3.3 SVN ruleset.. wonder what version was used
and any nondefault rules/settings? Certainly sounds strange that 1 core
could top out the same. Anyone else have figures? Maybe I've borked
something myself..



Re: Parallelizing Spam Assassin

2009-07-31 Thread rich...@buzzhost.co.uk
On Fri, 2009-07-31 at 09:53 +0100, Justin Mason wrote:
 On Fri, Jul 31, 2009 at 09:32,
 rich...@buzzhost.co.uk <rich...@buzzhost.co.uk> wrote:
  Imagine what Barracuda Networks could do with that if they did not fill
  their gay little boxes with hardware rubbish from the floors of MSI and
  supermicro. Jesus, try and process that many messages with a $30,000
  Barracuda and watch support bitch 'You are fully scanning to much mail
  and making our rubbish hardware wet the bed.' LOL.
 
 Richard -- please watch your language.   This is a public mailing
 list, and offensive language here is inappropriate.
 
I apologise for any language deemed offensive. Whilst 'Jesus',
'Bitch' and 'Wet the bed' are mostly acceptable, I offer no defence for
openly swearing and using the filthy phrase 'Barracuda Networks'. For
this I apologise.





Re: Parallelizing Spam Assassin

2009-07-31 Thread Bernd Petrovitsch
On Thu, 2009-07-30 at 23:55 -0700, poifgh wrote:
[...]
 I was measuring how quickly could SA [spam assassin] process spams when
 several SA processes are run in parallel over separate mbox files. I used a
 8 core machine. Below are the numbers when I forked different number of
 processes.
 
 Fork = 8;
 Rate = 57 msgs/sec
 
 Fork = 4;
 Rate = 44 msgs/sec
 
 Fork = 1;
 Rate = 22 msgs/sec
 
 
 I ran freshly build SA with Bayes and DNSBL turned off. Why am I not seeing
 a linear increase in the throughput? Is a file locking creating the
Because the bottleneck is not (only) the CPUs?
Run `vmstat 1` or similar to see (or at least get an idea;-) if the
workload is I/O bound or CPU-bound or 

 bottleneck? If yes, which particular file is being locked? If no, what could
Maybe. The default store-in-files driver locks the DBs exclusively
for each access.

 be the reason for this?
Switch the DB backend to some MySQL or PostgreSQL (or whatever you like
using from the supported ones). Run that on the very same machine and
compare the numbers with the above.

Bernd
-- 
Firmix Software GmbH   http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
  Embedded Linux Development and Services
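
[Editor's sketch] The `vmstat 1` suggestion above can be turned into a rough check. The Python below is an illustration only: it assumes the Linux procps column layout, in which the header row names CPU columns such as `us`, `sy`, `id` and `wa`; the sample row is made up, and the threshold of 20 is an arbitrary assumption, not a recommended value.

```python
# Header as printed by Linux procps vmstat; the sample row is invented.
HEADER = " r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st"
SAMPLE = " 2  6      0 812345  10234 456789    0    0  5120   300 1200 2400 10  5 40 45  0"


def parse_vmstat(header, values):
    """Pair vmstat column names with one row of integer samples."""
    names = header.split()
    nums = [int(v) for v in values.split()]
    if len(names) != len(nums):
        raise ValueError("header and sample row have different column counts")
    return dict(zip(names, nums))


def classify(sample, threshold=20):
    """Very rough guess at the bottleneck from the 'wa' and 'id' columns."""
    if sample["wa"] >= threshold:   # much time spent waiting on I/O
        return "I/O-bound"
    if sample["id"] <= threshold:   # hardly any idle CPU left
        return "CPU-bound"
    return "neither saturated (look for locks or other waits)"


if __name__ == "__main__":
    print(classify(parse_vmstat(HEADER, SAMPLE)))
```

A run that shows plenty of idle CPU and little I/O wait would fall into the third branch, which is the case where lock contention becomes the more likely suspect.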




Re: Parallelizing Spam Assassin

2009-07-31 Thread Matt Kettler
rich...@buzzhost.co.uk wrote:
 On Fri, 2009-07-31 at 09:53 +0100, Justin Mason wrote:
   
 On Fri, Jul 31, 2009 at 09:32,
 rich...@buzzhost.co.uk <rich...@buzzhost.co.uk> wrote:
 
 Imagine what Barracuda Networks could do with that if they did not fill
 their gay little boxes with hardware rubbish from the floors of MSI and
 supermicro. Jesus, try and process that many messages with a $30,000
 Barracuda and watch support bitch 'You are fully scanning to much mail
 and making our rubbish hardware wet the bed.' LOL.
   
 Richard -- please watch your language.   This is a public mailing
 list, and offensive language here is inappropriate.

 
 I apologise for the any language deemed offensive. Whilst 'Jesus',
 'Bitch' and 'Wet the bed' are mostly acceptable, I offer no defence for
 openly swearing and using the filty phrase  'Barracuda Networks'. For
 this I apologise.



   
Richard, we are not joking. Please watch your language on this mailing
list, or you will be banned from it.

You have now been warned by 2 members of the Project Management
Committee. You will not be warned again.





Re: Parallelizing Spam Assassin

2009-07-31 Thread rich...@buzzhost.co.uk
On Fri, 2009-07-31 at 07:26 -0400, Matt Kettler wrote:
 rich...@buzzhost.co.uk wrote:
  On Fri, 2009-07-31 at 09:53 +0100, Justin Mason wrote:

  On Fri, Jul 31, 2009 at 09:32,
  rich...@buzzhost.co.uk <rich...@buzzhost.co.uk> wrote:
  
  Imagine what Barracuda Networks could do with that if they did not fill
  their gay little boxes with hardware rubbish from the floors of MSI and
  supermicro. Jesus, try and process that many messages with a $30,000
  Barracuda and watch support bitch 'You are fully scanning to much mail
  and making our rubbish hardware wet the bed.' LOL.

  Richard -- please watch your language.   This is a public mailing
  list, and offensive language here is inappropriate.
 
  
  I apologise for the any language deemed offensive. Whilst 'Jesus',
  'Bitch' and 'Wet the bed' are mostly acceptable, I offer no defence for
  openly swearing and using the filty phrase  'Barracuda Networks'. For
  this I apologise.
 
 
 

 Richard, we are not joking. Please watch your language on this mailing
 list, or you will be banned from it.
 
 You have now been warned by 2 members of the Project Management
 Committee. You will not be warned again.
 
 
 
I have already apologised. I will not use the words you appear to have
found offensive again. Can I ask, is this actually about the words I
used *or* because of my comments regarding Barracuda Networks? I ask
because I note they made a 'monetary donation' to Apache:

http://www.barracudanetworks.com/ns/company/open-source.php

If you want to ban me I will understand - you need to keep the wheels
greased. It would give me more time to concentrate on leaking all the
Barracuda code into the public domain, along with the various 'warez'
tools I've written for it. This would probably be more beneficial to
Barracuda Customers than dropping in here and making jokes at such low
hanging fruit. If any Barracuda Customer would like to know how to
unlock their barracuda without lifting the lid, or change the model
serial number and get free e.u., email me off list, as I've just been
banned for upsetting a sponsor LOL





Re: Parallelizing Spam Assassin

2009-07-31 Thread John Hardin

On Fri, 31 Jul 2009, rich...@buzzhost.co.uk wrote:


... dropping in here and making jokes at such low hanging fruit.


Make all the jokes at Barracuda's expense that you like, complain about 
them all you like, just avoid offensive language. Vitriol is more 
impressive if you are creative enough to avoid using profanity and 
vulgarity while still blasting your target to pieces.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Vista is at best mildly annoying and at worst makes you want to
  rush to Redmond, Wash. and rip somebody's liver out.  -- Forbes
---
 5 days until the 274th anniversary of John Peter Zenger's acquittal


Re: Parallelizing Spam Assassin

2009-07-31 Thread rich...@buzzhost.co.uk
On Fri, 2009-07-31 at 08:25 -0700, John Hardin wrote:
 On Fri, 31 Jul 2009, rich...@buzzhost.co.uk wrote:
 
  ... dropping in here and making jokes at such low hanging fruit.
 
 Make all the jokes at Barracuda's expense that you like, complain about 
 them all you like, just avoid offensive language. Vitriol is more 
 impressive if you are creative enough to avoid using profanity and 
 vulgarity while still blasting your target to pieces.
 
Received and understood.



Re: Parallelizing Spam Assassin

2009-07-31 Thread poifgh



Henrik K wrote:
 
 Yeah, given that my 4x3Ghz box masscheck peaks at 22 msgs/sec, without
 Net/AWL/Bayes. But that's the 3.3 SVN ruleset.. wonder what version was
 used
 and any nondefault rules/settings? Certainly sounds strange that 1 core
 could top out the same. Anyone else have figures? Maybe I've borked
 something myself..
 

The rule sets were the defaults:
1. Took a fresh SA download
2. Ran [configured number of parallel] SA processes, each on a [different giant]
mbox file, without DNSBL and with 'use_bayes 0' and 'bayes_auto_learn 0'


-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24760106.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Parallelizing Spam Assassin

2009-07-31 Thread poifgh


Bernd Petrovitsch wrote:
 
 On Thu, 2009-07-30 at 23:55 -0700, poifgh wrote:
 [...]
 I ran freshly build SA with Bayes and DNSBL turned off. Why am I not
 seeing
 a linear increase in the throughput? Is a file locking creating the
 Because the bottleneck is not (only) the CPUs?
 Run `vmstat 1` or similar to see (or at least get an idea;-) if the
 workload is I/O bound or CPU-bound or 
 
 bottleneck? If yes, which particular file is being locked? If no, what
 could
 Maybe. The default store in files drivers locks the DBs exclusively
 for each access.
 
 be the reason for this?
 Switch the DB backend to some MySQL or PostgreSQL (or whatever you like
 using from the supported ones). Run that on the very same machine and
 compare the numbers with the above.
 

Running 'top' with a single SA process running gives 12.5% CPU utilization
which makes sense since one core is fully utilized at this point out of 8
cores. The SA process reports 100% util for that CPU

When fork goes to 8, each individual CPU is utilized at 30-70%, mostly
staying around 30 with only a few reaching 70.

I can run vmstat to check the I/O, which I don't think should be a problem -
the disks are fast enough to deliver orders of magnitude more reads than 50
msgs/sec.


Can you elaborate on 'store in files'? What are these files, what are they
used for - can they be turned off?

Thnx
-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24760163.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
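
[Editor's sketch] Why an exclusive lock taken on every access can flatten scaling is easy to see in a toy model. The Python below is not SpamAssassin's locking code: a `threading.Lock` stands in for the exclusive DB file lock, threads stand in for SA processes, and the hold time is an arbitrary assumption.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run(workers, accesses=8, hold=0.01, shared_lock=True):
    """Time `accesses` simulated DB accesses spread over `workers` threads."""
    lock = threading.Lock()

    def access(_):
        if shared_lock:
            with lock:            # exclusive: only one worker inside at a time
                time.sleep(hold)
        else:
            time.sleep(hold)      # no lock: accesses can overlap

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(access, range(accesses)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"4 workers, exclusive lock: {run(4, shared_lock=True):.3f} s")
    print(f"4 workers, no lock:        {run(4, shared_lock=False):.3f} s")
```

With the lock, the total time is bounded below by the sum of all hold times regardless of the worker count; without it, the accesses overlap. A real workload holds the lock for only part of each message, so the effect would be partial rather than total.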



Re: Parallelizing Spam Assassin

2009-07-31 Thread poifgh



c. r. wrote:
 
 On Thu, Jul 30, 2009 at 11:55:21PM -0700, poifgh wrote:
 Why am I not seeing a linear increase in the throughput? 
 Is a file locking creating the bottleneck?
 
 Maybe the auto white list.
 
 -- 
 

I can try turning off AWL and get back here..

Thnx
-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24760203.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Parallelizing Spam Assassin

2009-07-31 Thread poifgh



Henrik K wrote:
 
 Yeah, given that my 4x3Ghz box masscheck peaks at 22 msgs/sec, without
 Net/AWL/Bayes. But that's the 3.3 SVN ruleset.. wonder what version was
 used
 and any nondefault rules/settings? Certainly sounds strange that 1 core
 could top out the same. Anyone else have figures? Maybe I've borked
 something myself..
 

The problem is not that 22 is a low number. With other free cores available
to run separate SA processes in parallel, why doesn't the throughput scale
linearly? For 8 cores with 8 SA processes running simultaneously I expect
150+ msgs/sec, but I get a third of that, 50 msgs/sec.



-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24760294.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
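
Editorial sketch: to put numbers on the gap described above, this short Python snippet compares the measured rates with ideal linear scaling. The rates are the ones reported at the start of this thread (22, 44 and 57 msgs/sec for 1, 4 and 8 forks); the function name is only illustrative.

```python
def scaling_efficiency(rate_1, rate_n, n):
    """Fraction of ideal linear scaling achieved with n processes."""
    ideal = rate_1 * n  # what perfectly linear scaling would deliver
    return rate_n / ideal


# msgs/sec by number of forked processes, as reported in the thread
measured = {1: 22.0, 4: 44.0, 8: 57.0}

for n, rate in measured.items():
    ideal = measured[1] * n
    eff = scaling_efficiency(measured[1], rate, n)
    print(f"{n} forks: {rate:.0f} msgs/sec measured, "
          f"{ideal:.0f} ideal, efficiency {eff:.0%}")
```

An efficiency well below 100% at 8 forks is consistent with the processes waiting on something shared rather than on CPU.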



Re: Parallelizing Spam Assassin

2009-07-31 Thread Nigel Frankcom
I'm assuming you run a tad more messages than I, but on a quad with a
failover I have never seen the failover kick in in 4 years. This is not
disputing your observations, just noting mine.

I claim absolutely no knowledge about the core processing/stacking
though I would assume (perhaps incorrectly) that the parsing would be
part of the software (MTA).

I freely admit I only picked up what seems the tail end of this thread
but having used SA for so many years I think I have at least a handle
on how it plays (hence the failover). My failover SA is in place to
handle slow queries from the primary SA. Assuming (again) that mail
size has been factored and any AV is running remotely?

Just a few thoughts based on a very cursory read of a few posts; sadly
- or happily - work makes my contributions here limited.

I'd be interested in the results of this though.

Kind regards

Nigel

PS - apologies if I'm repeating prior observations.



Re: Parallelizing Spam Assassin

2009-07-31 Thread poifgh

In my tests there was no MTA. The mails/spam were collected from some
server in mbox format and fed to SA using the --mbox switch. The size of the
msgs was not altered in any fashion - just the usual size of incoming spam/mails.

There is no AV [you mean antivirus, right?] running on the machine.

Will be back with results.


-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24761236.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
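
Editorial sketch: the test setup described above (no MTA, separate mbox files, one process per file) could be timed with a small harness like the following. This is an assumption about how such a test might be scripted, not the poster's actual script; all function names are hypothetical, and the command to run is passed in by the caller.

```python
import mailbox
import subprocess
import time


def count_messages(mbox_path):
    """Count messages in an mbox file using the standard library parser."""
    return len(mailbox.mbox(mbox_path))


def run_parallel(command, mbox_paths):
    """Start one process per mbox file, wait for all, return elapsed seconds.

    `command` is the argv prefix; each mbox path is appended as the last
    argument. Output is discarded because only the timing matters here.
    """
    start = time.monotonic()
    procs = [
        subprocess.Popen(command + [path],
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL)
        for path in mbox_paths
    ]
    for proc in procs:
        proc.wait()
    return time.monotonic() - start


def throughput(command, mbox_paths):
    """Messages per second summed over all parallel processes."""
    total = sum(count_messages(path) for path in mbox_paths)
    elapsed = run_parallel(command, mbox_paths)
    return total / elapsed
```

A call might look like throughput(["spamassassin", "--mbox"], ["part1.mbox", "part2.mbox"]), where the file names are placeholders and the exact command line should be checked against the installed version's documentation. Runs of several minutes give steadier averages than short ones.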



Re: Parallelizing Spam Assassin

2009-07-31 Thread Nigel Frankcom
OK - I can see what metrics you are trying to ascertain - I think. I'm
not sure that your test and real life are 'right'. For obvious reasons
I don't want to carry this one on via list - I would suggest you ask
Justin and I will be happy to give info on my local setup (this
assumes Justin can grab time away from toxic nappies/diapers)

There is a lot you can do to ameliorate load. On bad days my quad does
50 a second so it's doable. I will freely admit I have no clue quite
how this came to be, but it is (a case of having colleagues knowing
more than I do - for which I am eternally grateful; the usual culprits
know who they are)

Kind regards

Nigel





Re: Parallelizing Spam Assassin

2009-07-31 Thread Paweł Sasin
 In my tests - there was not MTA. The mails/spam were collected from
 some server in mbox format and fed to SA using --mbox switch. The
 size of msgs was not altered in any fashion - just the usual size of
 incoming spam/mails

If you're interested in testing/tuning spamassassin for heavy loads you
should consider using the spamd daemon. Then you may use SLAMD [1] as
a performance evaluation platform [2].

It takes some effort to set up the environment, but SLAMD helps in
repetitive testing and keeping track of the results (comparison,
history, charts).

[1] http://www.slamd.com
[2] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5689

-- 
Pawel Sasin



Re: Parallelizing Spam Assassin

2009-07-31 Thread Michael Parker


On Jul 31, 2009, at 1:55 AM, poifgh wrote:


 I ran freshly build SA with Bayes and DNSBL turned off. Why am I not seeing
 a linear increase in the throughput? Is a file locking creating the
 bottleneck? If yes, which particular file is being locked? If no, what could
 be the reason for this?


There could be many reasons. Check out my talk on High Performance
Apache SpamAssassin (admittedly a little out of date, but it should
still be mostly relevant) at the following link:

http://people.apache.org/~parker/presentations/index.html

Keep in mind that you might also be seeing other factors like memory
and disk I/O contention. You don't really spell out your testing
infrastructure, so it's not really clear if you're even performing a
valid test.

Also, I wouldn't necessarily expect to see a linear increase, although
you might be able to take some easy steps to increase your overall
performance.


Michael



Re: Parallelizing Spam Assassin

2009-07-31 Thread LuKreme

On Jul 31, 2009, at 2:53 AM, Justin Mason wrote:
On Fri, Jul 31, 2009 at 09:32, rich...@buzzhost.co.uk wrote:
Imagine what Barracuda Networks could do with that if they did not  
fill
their gay little boxes with hardware rubbish from the floors of MSI  
and

supermicro. Jesus, try and process that many messages with a $30,000
Barracuda and watch support bitch 'You are fully scanning to much  
mail

and making our rubbish hardware wet the bed.' LOL.


Richard -- please watch your language.   This is a public mailing
list, and offensive language here is inappropriate.


I dunno, 'gay' isn't that offensive.


--
Overhead, without any fuss, the stars were going out.



Re: Parallelizing Spam Assassin

2009-07-31 Thread LuKreme

On Jul 31, 2009, at 9:25 AM, John Hardin wrote:

On Fri, 31 Jul 2009, rich...@buzzhost.co.uk wrote:


... dropping in here and making jokes at such low hanging fruit.


Make all the jokes at Barracuda's expense that you like, complain  
about them all you like, just avoid offensive language.


Really? Referring to gay hardware is THAT offensive that someone would  
need to be banned over it?


--
Is a vegetarian permitted to eat animal crackers?



Re: Parallelizing Spam Assassin

2009-07-31 Thread jdow

From: Matt Kettler mkettler...@verizon.net
Sent: Friday, 2009/July/31 04:26



rich...@buzzhost.co.uk wrote:

On Fri, 2009-07-31 at 09:53 +0100, Justin Mason wrote:
  

On Fri, Jul 31, 2009 at 09:32, rich...@buzzhost.co.uk wrote:


...
  

Richard -- please watch your language.   This is a public mailing
list, and offensive language here is inappropriate.



...




  

Richard, we are not joking. Please watch your language on this mailing
list, or you will be banned from it.

You have now been warned by 2 members of the Project Management
Committee. You will not be warned again.


Given that profanity is the effort of a small mind to express itself
I have a feeling he's going to receive his third and final warning any
time now, Matt.

{^_-}


Re: Parallelizing Spam Assassin

2009-07-31 Thread LuKreme

On Jul 31, 2009, at 1:33 PM, jdow wrote:

Given that profanity is the effort of a small mind to express itself
I have a feeling he's going to receive his third and final warning any
time now, Matt


Given that nothing that richard said is not anything I've heard on,  
say, prime time TV or... a committee meeting I am really curious now  
as to what was considered 'obscene'.


I'm quite serious.

Have I stumbled into a list run by religious freaks?

--
Clark's Law: Sufficiently advanced cluelessness is
indistinguishable from malice
Clark Slaw: Anything that has been severely damaged or destroyed
by application of Clark's Law



Re: Parallelizing Spam Assassin

2009-07-31 Thread John Rudd
On Fri, Jul 31, 2009 at 12:37, LuKreme krem...@kreme.com wrote:
 On Jul 31, 2009, at 1:33 PM, jdow wrote:

 Given that profanity is the effort of a small mind to express itself
 I have a feeling he's going to receive his third and final warning any
 time now, Matt

 Given that nothing that richard said is not anything I've heard on, say,
 prime time TV or... a committee meeting I am really curious now as to what
 was considered 'obscene'.

 I'm quite serious.

 Have I stumbled into a list run by religious freaks?

(mods: sorry if this also falls into the verboten category, I'm more
trying to explore/catalog than perpetuate)

Maybe it was using the word bitch, where he could have used the word
complain.

(and, religious freaks aren't the only freaks that don't like to see
the word Jesus used in that kind of context ... saying words like
Jesus around atheist freaks can also result in them claiming offence
... luckily religious freaks and atheist freaks aren't as common as
merely religious people and merely atheist people)


Re: Parallelizing Spam Assassin

2009-07-31 Thread Glenn Sieb
LuKreme said the following on 7/31/09 3:27 PM:
 Richard -- please watch your language.   This is a public mailing
 list, and offensive language here is inappropriate.

 I dunno, 'gay' isn't that offensive.



Gay is *not* a synonym for stupid.

I do take offense to the term being used in that manner.

--Glenn



Re: Parallelizing Spam Assassin

2009-07-31 Thread Matt Kettler
rich...@buzzhost.co.uk wrote:
 email me off list as I've just been
 banned for upsetting a sponsor LOL
   
Richard, this has nothing to do with Barracuda. They have no influence
over my opinions whatsoever. I don't work for Apache or Barracuda, or
any company sponsored by either. Neither Apache nor Barracuda has
complained. At the time I warned you, I didn't even remember that
Barracuda ever donated to Apache. I don't think any member of the PMC
has any regular contact with Barracuda, although we've had occasional
contact about using their RBL.

Your warning is about using foul language, and then choosing to thumb
your nose at the warning Justin gave you. You're behaving like an
impudent and foul mouthed child, and that's unwelcome here.

That said, I really don't appreciate you using this list to rant about
Barracuda's products, or discuss them at all. This is the SpamAssassin
list, not the Barracuda list. Barracuda may use SpamAssassin, and
SpamAssassin may support the Barracuda public RBL, but beyond that, any
discussion of them is, quite frankly, off-topic. I don't care how good
or bad their commercial product, or its support is, because it is
off-topic here. I don't welcome people praising Barracuda any more than
I welcome complaints. It simply doesn't matter to SpamAssassin, so it
doesn't belong here.

You may as well be ranting about Ford cars for all I care, it still
doesn't belong here.

This list is about SpamAssassin, nothing more, nothing less.

Continue with the foul language, and you'll find the door very quickly.
Keep harping on the same off-topic subject and we will eventually get
tired of it. You've said your piece about Barracuda, now give it a rest,
because frankly I don't care about their products, I care about our product.

Is that difficult to understand?



Re: Parallelizing Spam Assassin

2009-07-31 Thread Henrik K
On Fri, Jul 31, 2009 at 10:41:47AM -0700, poifgh wrote:

 Henrik K wrote:
  
  Yeah, given that my 4x3Ghz box masscheck peaks at 22 msgs/sec, without
  Net/AWL/Bayes. But that's the 3.3 SVN ruleset.. wonder what version was
  used
  and any nondefault rules/settings? Certainly sounds strange that 1 core
  could top out the same. Anyone else have figures? Maybe I've borked
  something myself..
  
 
 The problem is not with 22 being a low number, but when we have other free

I did not say it was a problem. I was just wondering how fast your CPU/memory
is, since my 3Ghz AMD doesn't seem to keep up.

I just tested with a fresh 3.2.5 install, and running a 500-mail mbox on a
single core resulted in 11 msgs/sec. Then I used sa-compile, and it rose
to 15. Did you use it also?

Of course your mailbox could be a lot different, so hard to compare.

 cores to run different SA parallely why doesnt the throughput scale linearly
 .. I expect for 8 cores with 8 SA running simultaneously the number to be
 150+ msgs/sec but it is 1/3rd at 50 msgs/sec

Anyway as people have already said here, disable AWL:

use_auto_whitelist 0



Re: Parallelizing Spam Assassin

2009-07-31 Thread poifgh

I am sorry, I did not provide any statistics for the machine involved.
CPU - 8 cores, each core at 2327 MHz
RAM - 16GB
AFAIR it has a 7200RPM disk - 2TB.

Yes, people were right in indicating AWL could be the problem. Turning off
AWL results in near-linear scaling of SA as we increase the number of
processes. My input is more than 100K messages [mostly spam], which allowed
me to have each run last for several minutes and then take an average to get
#msgs/sec.


With AWL, Bayes and DNSBL turned off, I get about 24 msgs/sec for 1 fork
and 166 msgs/sec for 8 forks.

With AWL on and Bayes and DNSBL off, I get about 22 msgs/sec for 1 fork and
50 msgs/sec for 8 forks.

Thnx everyone for helping out.


-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24765545.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
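
Editorial sketch: the before/after numbers reported above can be turned into an estimate of how much of the work behaved as if it were serialized, using the Karp-Flatt metric, e = (1/speedup - 1/n) / (1 - 1/n). The rates come from this thread; reading the result as "time spent waiting on the AWL lock" is an interpretation, not something the thread measured.

```python
def karp_flatt(speedup, n):
    """Experimentally determined serial fraction for n processes."""
    return (1.0 / speedup - 1.0 / n) / (1.0 - 1.0 / n)


# (msgs/sec with 1 fork, msgs/sec with 8 forks), as reported in the thread
runs = {
    "AWL on": (22.0, 50.0),
    "AWL off": (24.0, 166.0),
}

for label, (rate_1, rate_8) in runs.items():
    speedup = rate_8 / rate_1
    fraction = karp_flatt(speedup, 8)
    print(f"{label}: speedup {speedup:.2f}x, "
          f"estimated serial fraction {fraction:.3f}")
```

With AWL on, the estimate comes out near 0.36; with AWL off, it drops to roughly 0.02, which matches the near-linear scaling described.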



Re: Parallelizing Spam Assassin

2009-07-31 Thread poifgh

I haven't tried with sa-compile yet - I can give it a shot


-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24765570.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Parallelizing Spam Assassin

2009-07-31 Thread rich...@buzzhost.co.uk
On Fri, 2009-07-31 at 17:37 -0400, Glenn Sieb wrote:
 LuKreme said the following on 7/31/09 3:27 PM:
  Richard -- please watch your language.   This is a public mailing
  list, and offensive language here is inappropriate.
 
  I dunno, 'gay' isn't that offensive.
 
 
 
 Gay is *not* a synonym for stupid.
 
 I do take offense to the term being used in that manner.
 
 --Glenn
 
I find it deeply offensive that the word 'gay' is used as a synonym for
homosexual in an attempt to stop people from using 'queer' - but hey
'gays' are not the only ones with opinions that 'matter'.

Gay **is** a synonym for 'stupid' (silly) as far as I am concerned. Its
original meanings of 'carefree', 'happy', 'silly' and 'showy' are clearly
being used with sarcasm. The fact is 'queers' hijacked the word as per
this;

— USAGE Gay is now a standard term for ‘homosexual’, and is the term
preferred by homosexual men to describe themselves. As a result, it is
now very difficult to use gay in its earlier meanings ‘carefree’ or
‘bright and showy’ without arousing a sense of double entendre. Gay in
its modern sense typically refers to men, lesbian being the standard
term for homosexual women.
http://www.askoxford.com/concise_oed/gay?view=uk

So please *quit* with the sympathetic pink preaching and learn what the
word actually means. Just because it is the term preferred by
homosexual men to describe themselves does not mean a minority have the
right to slate people who use the word properly.

With regards to the dig about Barracuda - this *WAS* OT. There were some
benchmark tests discussed here that were impressive. My experience of SA
in daily production is on Barracuda Appliances that STRUGGLE to
push 6-8 messages a second through, so it was relevant as comparison.
The wording could have been chosen with more care and I apologise to
Christians or dog lovers who found the use of the messiah or female form
offensive. However, the use of gay in a sarcastic context clearly fits
with the original origin of the word, not by that section of the society
who have stolen it and made it OT and OM. For that I make ***NO***
apology. I appreciate that using 'gay' in its real meaning may hurt the
feelings of some 'homosexuals' but as I have to respect their choices
and views, they should show *me* the same respect for *my* views and
choices. You may not like who I am and what I do, I may not like who you
are and what you do.

Now do we need to continue this or throw little tin God banning threats
around more or can we just *get along* knowing we are all different but
frequenting this list for Spamassassin information ?





Re: Parallelizing Spam Assassin

2009-07-31 Thread jdow

From: LuKreme krem...@kreme.com
Sent: Friday, 2009/July/31 12:30



On Jul 31, 2009, at 9:25 AM, John Hardin wrote:

On Fri, 31 Jul 2009, rich...@buzzhost.co.uk wrote:


... dropping in here and making jokes at such low hanging fruit.


Make all the jokes at Barracuda's expense that you like, complain  
about them all you like, just avoid offensive language.


Really? Referring to gay hardware is THAT offensive that someone would  
need to be banned over it?


No, it's the word expensive.

{+_+}


Re: Parallelizing Spam Assassin

2009-07-31 Thread jdow

From: LuKreme krem...@kreme.com
Sent: Friday, 2009/July/31 12:37



On Jul 31, 2009, at 1:33 PM, jdow wrote:

Given that profanity is the effort of a small mind to express itself
I have a feeling he's going to receive his third and final warning any
time now, Matt


Given that nothing that richard said is not anything I've heard on,  
say, prime time TV or... a committee meeting I am really curious now  
as to what was considered 'obscene'.


I'm quite serious.

Have I stumbled into a list run by religious freaks?


Not me. I can happily go several whole days without hearing the
B word. When I hear it I get B...y.

{^_^}   Joanne


Re: Parallelizing Spam Assassin

2009-07-31 Thread jdow

From: poifgh abhinav.pat...@gmail.com
Sent: Friday, 2009/July/31 19:47




I am sorry, I did not provide any statistics of the machine involved.
CPU - 8 cores with each core 2327 MHz
RAM - 16GB
Afair its has 7200RPM disk - 2TB.


One disk? You might consider a striped array to get disk speed.
50 megabytes per second stresses most disks pretty hard - not to the
limit. But if there is a lot of seeking involved as well as multiple
copies of the files being made as they pass through the system I can
see how it'd be a little rough on the disk throughput.

{^_^}