Re: Number of rules

2009-07-31 Thread LuKreme

On Jul 30, 2009, at 18:12, Dennis B. Hopp dh...@coreps.com wrote:
Yeah, I knew that.  I have a few negative-scoring rules, but not many
(outside of what might be in the misc rule sets I have).  What is a
good threshold for ham then?


5.0 is the score SA is designed for. It's a very good number in almost
all cases.




Parallelizing Spam Assassin

2009-07-31 Thread poifgh

Hi

I was measuring how quickly SA [SpamAssassin] could process spam when
several SA processes are run in parallel over separate mbox files. I used an
8-core machine. Below are the numbers when I forked different numbers of
processes.

Fork = 8;
Rate = 57 msgs/sec

Fork = 4;
Rate = 44 msgs/sec

Fork = 1;
Rate = 22 msgs/sec


I ran a freshly built SA with Bayes and DNSBL turned off. Why am I not seeing
a linear increase in the throughput? Is file locking creating the
bottleneck? If yes, which particular file is being locked? If no, what could
be the reason for this?

thnx
-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24751958.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
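For a rough sense of how far from linear these numbers are, here is a small Python sketch (not something posted in the thread) that computes speedup and per-process efficiency relative to the single-process run. The `RATES` dict simply restates the three figures above; ideal scaling would keep efficiency near 100%.

```python
# Throughput figures reported in the post: processes -> msgs/sec.
RATES = {1: 22, 4: 44, 8: 57}

def scaling(rates):
    """Return {n: (speedup, efficiency)} relative to the 1-process run."""
    base = rates[1]
    result = {}
    for n, rate in sorted(rates.items()):
        speedup = rate / base
        result[n] = (speedup, speedup / n)
    return result

for n, (s, e) in scaling(RATES).items():
    print(f"{n} procs: speedup {s:.2f}x, efficiency {e:.0%}")
```

With these figures, 4 processes give 2.0x (50% efficiency) and 8 give about 2.6x (roughly 32%), which suggests something other than raw CPU is limiting throughput.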



AutoWhiteList

2009-07-31 Thread --[ UxBoD ]--
Hi, 

Where can I find sa-awlUtil? It does not appear to be in the download file.

Best Regards, 

-- 
SplatNIX IT Services :: Innovation through collaboration 


Re: Parallelizing Spam Assassin

2009-07-31 Thread Justin Mason
hi -- turn off Bayes and AWL.

On Fri, Jul 31, 2009 at 07:55, poifghabhinav.pat...@gmail.com wrote:

 [...]

-- 
--j.


Re: Parallelizing Spam Assassin

2009-07-31 Thread Christian Recktenwald
On Thu, Jul 30, 2009 at 11:55:21PM -0700, poifgh wrote:
 Why am I not seeing a linear increase in the throughput? 
 Is a file locking creating the bottleneck?

Maybe the auto-whitelist (AWL).

-- 


Re: Parallelizing Spam Assassin

2009-07-31 Thread rich...@buzzhost.co.uk
On Thu, 2009-07-30 at 23:55 -0700, poifgh wrote:
 [...]
Wow! That's a real flying machine!

Imagine what Barracuda Networks could do with that if they did not fill
their gay little boxes with hardware rubbish from the floors of MSI and
supermicro. Jesus, try and process that many messages with a $30,000
Barracuda and watch support bitch 'You are fully scanning too much mail
and making our rubbish hardware wet the bed.' LOL.

Well done you!







Re: Parallelizing Spam Assassin

2009-07-31 Thread Justin Mason
On Fri, Jul 31, 2009 at 09:32,
rich...@buzzhost.co.ukrich...@buzzhost.co.uk wrote:
 Imagine what Barracuda Networks could do with that if they did not fill
 their gay little boxes with hardware rubbish from the floors of MSI and
 supermicro. Jesus, try and process that many messages with a $30,000
 Barracuda and watch support bitch 'You are fully scanning too much mail
 and making our rubbish hardware wet the bed.' LOL.

Richard -- please watch your language.   This is a public mailing
list, and offensive language here is inappropriate.

-- 
--j.


Re: Parallelizing Spam Assassin

2009-07-31 Thread Henrik K
On Fri, Jul 31, 2009 at 09:32:42AM +0100, rich...@buzzhost.co.uk wrote:
 On Thu, 2009-07-30 at 23:55 -0700, poifgh wrote:
  [...]
 Wow! That's a real flying machine!

Yeah, given that my 4x3GHz box peaks at 22 msgs/sec in mass-check, without
Net/AWL/Bayes. But that's the 3.3 SVN ruleset... I wonder what version was used,
and whether there were any non-default rules/settings? It certainly sounds strange
that 1 core could top out at the same rate. Anyone else have figures? Maybe I've
borked something myself...



Re: Parallelizing Spam Assassin

2009-07-31 Thread rich...@buzzhost.co.uk
On Fri, 2009-07-31 at 09:53 +0100, Justin Mason wrote:
 On Fri, Jul 31, 2009 at 09:32,
 rich...@buzzhost.co.ukrich...@buzzhost.co.uk wrote:
  Imagine what Barracuda Networks could do with that if they did not fill
  their gay little boxes with hardware rubbish from the floors of MSI and
  supermicro. Jesus, try and process that many messages with a $30,000
  Barracuda and watch support bitch 'You are fully scanning too much mail
  and making our rubbish hardware wet the bed.' LOL.
 
 Richard -- please watch your language.   This is a public mailing
 list, and offensive language here is inappropriate.
 
I apologise for any language deemed offensive. Whilst 'Jesus',
'Bitch' and 'Wet the bed' are mostly acceptable, I offer no defence for
openly swearing and using the filthy phrase 'Barracuda Networks'. For
this I apologise.





Re: Parallelizing Spam Assassin

2009-07-31 Thread Bernd Petrovitsch
On Thu, 2009-07-30 at 23:55 -0700, poifgh wrote:
[...]
 I was measuring how quickly SA [SpamAssassin] could process spam when
 several SA processes are run in parallel over separate mbox files. I used an
 8-core machine. Below are the numbers when I forked different numbers of
 processes.
 
 Fork = 8;
 Rate = 57 msgs/sec
 
 Fork = 4;
 Rate = 44 msgs/sec
 
 Fork = 1;
 Rate = 22 msgs/sec
 
 
 I ran a freshly built SA with Bayes and DNSBL turned off. Why am I not seeing
 a linear increase in the throughput? Is file locking creating the
Because the bottleneck is not (only) the CPUs?
Run `vmstat 1` or similar to see (or at least get an idea ;-) whether the
workload is I/O-bound or CPU-bound or something else.
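As an illustration of what `vmstat 1` is measuring, here is a Linux-only Python sketch. It assumes the aggregate `cpu` line of `/proc/stat`, whose first eight counters are user, nice, system, idle, iowait, irq, softirq and steal jiffies; the function names are hypothetical. A high iowait share points at disk, busy time near 100% points at CPU, and lots of idle time with flat throughput is consistent with processes waiting on something else, such as a lock.

```python
# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into {field: jiffies}."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))

def cpu_shares(before, after):
    """Fraction of elapsed CPU time spent in each state between two samples."""
    delta = {k: after.get(k, 0) - before.get(k, 0) for k in FIELDS}
    total = sum(delta.values()) or 1  # avoid dividing by zero on identical samples
    return {k: v / total for k, v in delta.items()}

def sample(path="/proc/stat"):
    """Read the current aggregate CPU counters (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

Taking `sample()` before and after a `time.sleep(1)` and passing both to `cpu_shares` gives roughly what the us/sy/id/wa columns of `vmstat 1` show.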

 bottleneck? If yes, which particular file is being locked? If no, what could
Maybe. The default file-based storage drivers lock the DBs exclusively
for each access.

 be the reason for this?
Switch the DB backend to MySQL or PostgreSQL (or whichever of the
supported ones you like). Run that on the very same machine and
compare the numbers with the above.
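One way to reason about the locking hypothesis (an added framing, not Bernd's) is Amdahl's law: if a fraction s of each message's work is serialized, for example behind an exclusive DB lock, n processes can reach at most 1 / (s + (1 - s) / n) speedup. The Python sketch below inverts that formula for the reported figures. It is a crude model; disk or memory-bandwidth contention would produce similar curves, so treat it as a consistency check rather than a diagnosis.

```python
def serial_fraction(speedup, n):
    """Invert Amdahl's law, S = 1 / (s + (1 - s) / n), to solve for s."""
    return (1 / speedup - 1 / n) / (1 - 1 / n)

BASE_RATE = 22  # msgs/sec with a single process, from the original post

for n, rate in ((4, 44), (8, 57)):
    s = serial_fraction(rate / BASE_RATE, n)
    print(f"{n} processes: implied serialized fraction ~{s:.2f}")
```

Both runs imply that roughly 30 to 33% of the per-message work behaves as if serialized, which is why comparing against a SQL backend on the same machine is a useful experiment.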

Bernd
-- 
Firmix Software GmbH   http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
  Embedded Linux Development and Services




Re: AutoWhiteList

2009-07-31 Thread Matt Kettler
--[ UxBoD ]-- wrote:
 Hi, 

 Where can I find sa-awlUtil? It does not appear to be in the download file.

 Best Regards, 

   
Hmmm, it looks like someone has been editing the wiki in ways that don't
match anything in any released or unreleased version of SA.

The tool is named check_whitelist.

There's been talk of changing AWL stuff to not reference the word
whitelist, but AFAIK, this hasn't even been done in the unreleased 3.3
code.

Regardless, you can fetch check_whitelist from SVN:

http://svn.apache.org/repos/asf/spamassassin/branches/3.2/tools/







Re: Parallelizing Spam Assassin

2009-07-31 Thread Matt Kettler
rich...@buzzhost.co.uk wrote:
 On Fri, 2009-07-31 at 09:53 +0100, Justin Mason wrote:
   
 [...]
   
 Richard -- please watch your language.   This is a public mailing
 list, and offensive language here is inappropriate.

 
 I apologise for any language deemed offensive. Whilst 'Jesus',
 'Bitch' and 'Wet the bed' are mostly acceptable, I offer no defence for
 openly swearing and using the filthy phrase 'Barracuda Networks'. For
 this I apologise.



   
Richard, we are not joking. Please watch your language on this mailing
list, or you will be banned from it.

You have now been warned by 2 members of the Project Management
Committee. You will not be warned again.





Cant Post Message

2009-07-31 Thread twofers
I have tried several times over the last week to post a message to this
forum, and it never seems to get posted. I don't understand why.
 
There is nothing exotic about it: just text, a question, and some email
header info I pasted.
 
Any idea what's up?
 
Thanks,
 
Wes


  

Re: Cant Post Message

2009-07-31 Thread --[ UxBoD ]--
- twofers twof...@yahoo.com wrote: 
 
[...]
 
Obfuscate the header, as it may be tripping SA :) Or, even better, use pastebin.
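If you do paste headers, scrubbing the addresses first can help. A minimal Python sketch using a deliberately rough regex (an assumption: it is not RFC 5322 complete, and it leaves IP addresses and hostnames untouched); `obfuscate_addresses` is a hypothetical helper name.

```python
import re

# Rough pattern for things that look like email addresses; not RFC 5322 complete.
ADDR_RE = re.compile(r"[\w.+-]+@(?:[\w-]+\.)+[\w-]+")

def obfuscate_addresses(header_text, replacement="user@example.com"):
    """Replace every address-like token in pasted header text."""
    return ADDR_RE.sub(replacement, header_text)
```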



Best Regards, 



Re: Cant Post Message

2009-07-31 Thread Dennis B. Hopp

Quoting twofers twof...@yahoo.com:

[...]





Try putting the header on a site like www.pastebin.com and then put
the link in your e-mail rather than the actual header.


--Dennis


Re: Number of rules

2009-07-31 Thread Dennis B. Hopp

Quoting LuKreme krem...@kreme.com:


On Jul 30, 2009, at 18:12, Dennis B. Hopp dh...@coreps.com wrote:
Yeah, I knew that.  I have a few negative-scoring rules, but not many
 (outside of what might be in the misc rule sets I have).  What is
 a good threshold for ham then?


5.0 is the score SA is designed for. It's a very good number in almost
all cases.


I meant the threshold for Bayes autolearn to learn a message as ham.  I'll
try switching back to the default values.


Re: Number of rules

2009-07-31 Thread RW
On Fri, 31 Jul 2009 03:55:48 +0200
Karsten Bräckelmann guent...@rudersport.de wrote:


 The default of 0.1. It's a default for a reason.
 
 But that *really* is not your problem. Your problem is with learning
 spam, not learning even more ham. Just as you mentioned in your
 original report. See my previous response for a solution. You want to
 learn more spam.

What he actually wrote was that 3.7% of _all_messages_ were hitting
BAYES_00, and 1.7% were hitting BAYES_99.

If he actually meant what he wrote and doesn't have an extraordinary
spam/ham ratio, then he clearly has a problem with both spam and
ham.


Re: Problem with whitelist_from_rcvd and forged reverse lookup

2009-07-31 Thread Matus UHLAR - fantomas
 On Thu, 2009-07-30 at 16:46 +0200, Sebastian Wiesinger wrote:
  * Matus UHLAR - fantomas uh...@fantomas.sk [2009-07-30 16:35]:
   On 30.07.09 14:03, Sebastian Wiesinger wrote:
 
I was under the impression that whitelist_from_rcvd checks if the
reverse lookup is forged. But still with the following rule

On 30.07.09 21:06, Karsten Bräckelmann wrote:
 SA does not do the DNS lookup, but depends on the MTA doing so and
 recording the result in the Received header.

The MTA (sendmail?) did put "may be forged" into the Received: line, in which
case SA should ignore the hostname.

Now the question is whether it does; the reported X-Spam headers indicate it
does not, which would be a bug.
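To make the expected behaviour concrete, here is an illustrative Python model. It is not SpamAssassin's own Received-header parser, which handles many more MTA formats; it assumes a sendmail-style `from HELO (RDNS [IP])` clause and treats the reverse-DNS name as untrusted whenever the line carries the `(may be forged)` annotation. `trusted_rdns` is a hypothetical name.

```python
import re

FORGED_RE = re.compile(r"\(may be forged\)", re.IGNORECASE)
# "from HELO (RDNS [IP])" as written by sendmail-style MTAs (assumed layout).
RDNS_RE = re.compile(r"^from\s+\S+\s+\((\S+)\s+\[(?:IPv6:)?([0-9A-Fa-f.:]+)\]")

def trusted_rdns(received):
    """Return the reverse-DNS name from a Received value, or None if the MTA
    marked it as possibly forged (or no such clause is present)."""
    value = " ".join(received.split())  # unfold continuation lines
    if FORGED_RE.search(value):
        return None
    m = RDNS_RE.search(value)
    return m.group(1) if m else None
```

Under this model, a whitelist_from_rcvd-style check would simply fail to match when `trusted_rdns` returns None.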

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Warning: I do NOT wish to receive any advertising mail at this address.
The box said 'Requires Windows 95 or better', so I bought a Macintosh.


Re: Number of rules

2009-07-31 Thread Dennis B. Hopp

Quoting RW rwmailli...@googlemail.com:


On Fri, 31 Jul 2009 03:55:48 +0200
Karsten Bräckelmann guent...@rudersport.de wrote:



[...]


What he actually wrote was that 3.7% of _all_messages_ were hitting
BAYES_00, and 1.7% were hitting BAYES_99.

If he actually meant what he wrote and doesn't have an extraordinary
spam/ham ratio, then he clearly has a problem with both spam and
ham.



I cleared my Maia statistics a couple of days ago.  Since then
BAYES_00 has triggered 4510 times, BAYES_99 2366 times and BAYES_50
1568 times (all the other BAYES_XX rules fired fewer than 1000 times).  In
those same couple of days we have processed about 45,000 messages (this is
the number of messages that actually reached SpamAssassin and weren't
outright rejected).  So my initial percentages were way off (I was going
by Maia Mailguard's SA rule statistics).  So roughly 10% of mail is
hitting BAYES_00 and 5% is hitting BAYES_99.  It seems to me that
BAYES_99 should probably be triggered more often than BAYES_00.
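A quick check of the arithmetic behind the "roughly 10% / 5%" estimate, dividing each rule's hit count by the 45,000 messages quoted above. This is a Python sketch; `HITS` and `TOTAL` just restate the post's numbers.

```python
TOTAL = 45_000  # messages that reached SpamAssassin, per the post
HITS = {"BAYES_00": 4510, "BAYES_99": 2366, "BAYES_50": 1568}

def share_of_total(hits, total):
    """Fraction of all processed messages on which each rule fired."""
    return {rule: count / total for rule, count in hits.items()}

for rule, frac in share_of_total(HITS, TOTAL).items():
    print(f"{rule}: {frac:.1%}")
```

This gives about 10.0%, 5.3% and 3.5%, so the estimate holds, and the three rules together cover under a fifth of the processed traffic.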


If there is a better way to get SA statistics I'd be happy to know.

I know that the Bayes success rate comes down to training, but like
every other administrator I can't possibly check every message for
accuracy, and I was hoping to make autolearn a little better.  I
thought maybe I just didn't have enough rules (both negative- and
positive-scoring) to trigger autolearn often enough.


Thanks,

--Dennis



Re: Network Tests / Rule Files Directories

2009-07-31 Thread Karsten Bräckelmann
On Thu, 2009-07-30 at 19:30 -0700, Stefan Malte Schumacher wrote:
 Hello

A Nabble user with a name. Hooray! :)

 :0fw: spamassassin.lock
 | spamassassin

I suggest running the spamd daemon, and changing that to call spamc
rather than plain spamassassin. That eliminates the start-up penalty of
starting Perl and SA for each incoming message.

 :0
 * ^X-Spam-Status: Yes
 spam

A delivery recipe, mbox format destination. You want locking. (The default
is perfectly fine; just make that first line :0: with a trailing colon.)


 My first problem is that there is still a lot of spam coming through.
 I have enabled and configured Razor, DCC and Pyzor but even though
 most spam is recognized by DCC it doesn't give enough points to
 classify the mail as spam.

If this doesn't help, you might be better off uploading a raw sample,
including all headers, somewhere (own server, or a pastebin) and sending a
link.

Spam coming through can have a lot of reasons. Your stabbing at these
particular 3 rules might or might not be the real cause.

 I have tried adding the appropriate lines, which I believe should be
 score DCC_CHECK 5.0 if I want all emails which pass the DCC-Check
 to get 5 points. Unfortunately this is not working, neither for DCC
 nor for Razor.

Yes, that should do it.

Evidence that it's not working? Show us some SA headers. In this case, a
spam sample that triggered DCC, because the Report header does show the
rule's score.
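When checking headers, the `tests=` list in `X-Spam-Status` is a quick way to see whether `DCC_CHECK` (or any rule) fired; the per-rule scores are in the Report header, as noted. A small Python sketch for pulling that list out of a header value, assuming the stock 3.x layout `Yes, score=... required=... tests=A,B autolearn=... version=...` and `tests=none` when nothing hit. `parse_spam_status` is a hypothetical helper, and it expects the score fields to be present.

```python
import re

def parse_spam_status(value):
    """Return (is_spam, score, required, tests) from an X-Spam-Status value."""
    flat = " ".join(value.split())       # unfold folded header lines
    flat = re.sub(r",\s+", ",", flat)    # folded lines usually break after a comma
    is_spam = flat.lower().startswith("yes")
    score = float(re.search(r"score=(-?[\d.]+)", flat).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", flat).group(1))
    m = re.search(r"tests=(\S+)", flat)
    tests = [] if m is None or m.group(1) == "none" else m.group(1).split(",")
    return is_spam, score, required, tests
```

An empty `tests` list on mail that reached SA would also flag the "no rules at all" cases mentioned elsewhere in this digest.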

 So which lines do I have to add in order for all mails which are
 recognized by either DCC, Razor or Pyzor to be classified as Spam?

Keep in mind that DCC lists *bulk*, not necessarily spam. Mailing-list
traffic is one example, usually listed by DCC.


 Locate lists two directories with SpamAssassin-Rules:
 /var/lib/spamassassin/3.002005/updates_spamassassin_org/

sa-update channels' rule-sets.

 /usr/share/spamassassin

Stock rules shipped with SA, put there at install time, either by a
package manager or from source. These will be used by default, but are
ignored if there is an sa-update dir.

 Running spamassassin -D < sample-spam.txt seems to indicate that only
 the directory under /var/lib is used. Can I delete the old files in
 /usr/share/spamassassin or are they still needed? Why does
They will not be used, as long as there's *always* an sa-update dir with
a version matching your current SA version. As a fallback, and so as not to
mess with your install process, I do not recommend deleting them. It's
just 620 kB anyway.

 SpamAssassin place the updated rules in a different directory than the
 one in which the original rules are installed?

Because the update ones are versioned. Because there may be multiple
channels. Because package managers generally don't like messing with
their install base. ;)  And because it is a safe fallback.
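For illustration only, here is a simplified Python model of the choice described above: prefer the versioned sa-update directory if it exists, otherwise fall back to the stock rules. The `3.2.5` to `3.002005` mapping follows the directory name in the question. The function names and the `is_dir` hook are hypothetical, and SpamAssassin's real lookup has more cases than this.

```python
import os

def version_dir_name(version):
    """Map '3.2.5' to '3.002005', the layout of the sa-update directory above."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor:03d}{patch:03d}"

def pick_rules_dir(version,
                   update_root="/var/lib/spamassassin",
                   stock_dir="/usr/share/spamassassin",
                   is_dir=os.path.isdir):
    """Return the sa-update dir for this version if present, else the stock dir."""
    candidate = os.path.join(update_root, version_dir_name(version))
    return candidate if is_dir(candidate) else stock_dir
```

This also shows why the stock files are worth keeping: if no directory matching the running version exists (for example right after an upgrade, before sa-update has run), the fallback is what gets used.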


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Number of rules

2009-07-31 Thread John Hardin

On Fri, 31 Jul 2009, Dennis B. Hopp wrote:

I cleared my maia statistics a couple of days ago.  Since then BAYES_00 has 
triggered 4510 times, BAYES_99 2366 times and BAYES_50 1568 (all the other 
BAYES_XX are less than 1000 times).


Do they all add up to about 45,000?

In those same couple of days we have processed about 45,000 messages 
(this is the number of messages that actually reached SpamAssassin and 
weren't outright rejected).


If there is a better way to get sa statistics I'd be happy to know.


sa_stats.pl from the SARE website.

http://www.rulesemporium.com/programs/

--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  A sword is never a killer, it is but a tool in the killer's hands.
  -- Lucius Annaeus Seneca (Martial) 4BC-65AD
---
 5 days until the 274th anniversary of John Peter Zenger's acquittal


Re: Any one interested in using a proper forum?

2009-07-31 Thread Ralph Bornefeld-Ettmann

Michael Hutchinson wrote:

Gidday Peter,


I don't know about anyone else, but I'm getting a bit hacked off with this
1980's style forum. I'm trying to get to the bottom of an SA issue and this
list/forum thing is giving me a bigger headache than SA!


It's a bit like that when you're using mailing lists; just another thing
to get used to in IT life!
 

SpamAssassin has more than one or two users now and I personally think that
it should have a support forum to match the class of software, which is now
world class.

I know it's free and all that, but even so, if this is the only form of
support they provide, I'm thinking that I'll just start an alternative
support forum, using standard, full-featured forum software (like SMF).

Is there any support for this? (I already know there will be opposition from
those who are 'resident' here. Sorry guys, I just want to do something to help
those who just dive in when they have an urgent problem. No hard feelings, I
hope.)


FWIW I think you're driving at creating a forum that would be easier to
use or understand for the average Joe Bloggs user. This is all very
well, but mailing lists aren't exactly hard to stay on top of. As for
using e-mail to discuss problems with SpamAssassin, I can think of
nothing more applicable. Anyone who administers a SpamAssassin-enabled
mail server should be familiar enough with e-mail to be able to
handle mailing lists without too much fuss. If this is such a big
problem, perhaps they shouldn't be administering a mail filtering system
at all.

Just my 2cents.
Michael Hutchinson.



I did not subscribe to the mailing list. I am using news.gmane.org, and
for me this is by far the best way to read it. No forum software needed, no
rules needed; I only need a newsreader (Thunderbird does this job quite well
for me).


Not everything that looks old-fashioned is less comfortable than a
teletubby web interface ;-)


Just to add my 2cents.
Ralph Bornefeld-Ettmann



Re: Number of rules

2009-07-31 Thread Karsten Bräckelmann
On Fri, 2009-07-31 at 07:53 -0500, Dennis B. Hopp wrote:
 I know that the bayes success rate comes down to training, but like  
 every other administrator I can't possibly check every message for
 accuracy and I was hoping to make the auto learn a little better.  I  
 thought maybe I just didn't have enough rules (both negative and  
 positive scoring) to trigger the auto learn often enough.

As I wrote before, you particularly need to train spam with a low-ish
Bayes confidence, regardless of the overall SA score or the number of
rules hit. This does need some supervision.

One way to help this would be to set up some (server-side) rules to
deliver all spam triggering BAYES_80 or lower into a dedicated folder.
Sort by Subject, and go through the list. If the Subject alone isn't
sufficient evidence, have a quick look at the mail. Confirmed spam then
can be learned.
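The review step could also be done offline. Here is a minimal Python sketch. It assumes messages are available as header mappings (for instance from the standard library's `mailbox.mbox`) and that the Bayes rule appears in `X-Spam-Status` as in stock headers. It keeps spam whose Bayes rule is BAYES_80 or lower and sorts it by Subject. The function names are hypothetical.

```python
import mailbox
import re

BAYES_RE = re.compile(r"\bBAYES_(\d\d)\b")

def low_confidence_spam(items, max_bayes=80):
    """items: iterable of (key, message) where message.get() returns header
    values. Returns (subject, key) pairs for spam with a Bayes rule <= max_bayes."""
    picked = []
    for key, msg in items:
        status = " ".join((msg.get("X-Spam-Status") or "").split())
        if not status.lower().startswith("yes"):
            continue  # only messages SA already flagged as spam
        m = BAYES_RE.search(status)
        if m and int(m.group(1)) <= max_bayes:
            picked.append((msg.get("Subject") or "", key))
    return sorted(picked)  # sorted by Subject, then mailbox key

def review_list_from_mbox(path, max_bayes=80):
    """Convenience wrapper that reads a local mbox file."""
    box = mailbox.mbox(path)
    try:
        return low_confidence_spam(box.items(), max_bayes)
    finally:
        box.close()
```

Confirmed spam from the resulting list can then be fed to sa-learn as usual.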





Re: Number of rules

2009-07-31 Thread Karsten Bräckelmann
On Fri, 2009-07-31 at 06:07 -0700, John Hardin wrote:
 On Fri, 31 Jul 2009, Dennis B. Hopp wrote:
 
  I cleared my maia statistics a couple of days ago.  Since then BAYES_00 has 
  triggered 4510 times, BAYES_99 2366 times and BAYES_50 1568 (all the other 
  BAYES_XX are less than 1000 times).
 
 Do they all add up to about 45,000?

Doh!  Good catch, John.

No, they cannot possibly.  Do the math. These 3 rules account for less than
10k, leaving 35k. With each of the others under 1k hits, we would need more
than 35 more rules. However, there are merely 6 left.

  $ grep -c BAYES_ 50_scores.cf
  9

The stats are incorrect.  Well, unless the lion's share is processed with
Bayes disabled, or otherwise not processed by SA.
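The same argument as a few lines of Python: each message can hit at most one BAYES_* rule, so with 9 such rules, the three known counts, and the remaining six each under 1000 hits, the total is bounded well below 45,000. The constants restate figures from the thread.

```python
KNOWN = {"BAYES_00": 4510, "BAYES_99": 2366, "BAYES_50": 1568}
TOTAL_RULES = 9      # from `grep -c BAYES_ 50_scores.cf`
OTHER_MAX = 1000     # "all the other BAYES_XX are less than 1000 times"
PROCESSED = 45_000   # messages that reached SpamAssassin

def max_bayes_hits(known, total_rules, other_max):
    """Upper bound on messages with any BAYES_* hit (at most one per message)."""
    others = total_rules - len(known)
    return sum(known.values()) + others * other_max

upper = max_bayes_hits(KNOWN, TOTAL_RULES, OTHER_MAX)
print(upper, f"{upper / PROCESSED:.0%}")
```

Even in the most generous case, at most about a third of the processed messages could carry a Bayes result.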





Re: Parallelizing Spam Assassin

2009-07-31 Thread rich...@buzzhost.co.uk
On Fri, 2009-07-31 at 07:26 -0400, Matt Kettler wrote:
 rich...@buzzhost.co.uk wrote:
  [...]
 Richard, we are not joking. Please watch your language on this mailing
 list, or you will be banned from it.
 
 You have now been warned by 2 members of the Project Management
 Committee. You will not be warned again.
 
 
 
I have already apologised. I will not use the words you appear to have
found offensive again. Can I ask, is this actually about the words I
used *or* because of my comments regarding Barracuda Networks? I ask
because I note they made a 'monetary donation' to Apache:

http://www.barracudanetworks.com/ns/company/open-source.php

If you want to ban me I will understand - you need to keep the wheels
greased. It would give me more time to concentrate on leaking all the
Barracuda code into the public domain, along with the various 'warez'
tools I've written for it. This would probably be more beneficial to
Barracuda Customers than dropping in here and making jokes at such low
hanging fruit. If any Barracuda Customer would like to know how to
unlock their barracuda without lifting the lid, or change the model
serial number and get free e.u., email me off list, as I've just been
banned for upsetting a sponsor LOL





Re: Number of rules

2009-07-31 Thread Dennis B. Hopp

Quoting John Hardin jhar...@impsec.org:


On Fri, 31 Jul 2009, Dennis B. Hopp wrote:

I cleared my maia statistics a couple of days ago.  Since then   
BAYES_00 has triggered 4510 times, BAYES_99 2366 times and BAYES_50  
 1568 (all the other BAYES_XX are less than 1000 times).


Do they all add up to about 45,000?



No, they don't.  I see some messages that trigger no rules at all
(Bayes or otherwise).  I thought that was odd, since I thought a Bayes
rule should trigger pretty much all the time.


In those same couple of days we have processed about 45,000   
messages (this is the number of messages that actually reached
SpamAssassin and weren't outright rejected).


If there is a better way to get sa statistics I'd be happy to know.


sa_stats.pl from the SARE website.

http://www.rulesemporium.com/programs/


I'll take a look.  Will this work with logs that are written by amavisd-new?

Thanks,

--Dennis



Re: Number of rules

2009-07-31 Thread RW
On Fri, 31 Jul 2009 07:53:00 -0500
Dennis B. Hopp dh...@coreps.com wrote:

 I cleared my maia statistics a couple of days ago.  Since then  
 BAYES_00 has triggered 4510 times, BAYES_99 2366 times and BAYES_50  
 1568 (all the other BAYES_XX are less than 1000 times).  In those
 same couple of days we have processed about 45,000 messages (this is
 the number of messages that actually reached SpamAssassin and weren't
 outright rejected).

4510+2366+1568+1000 is a lot less than 45,000

 So my initial percentages were way off (I was
 going by Maia Mailguard's SA rule statistics).  So roughly 10% of mail
 is hitting BAYES_00 and 5% is hitting BAYES_99.  It seems to me that
 BAYES_99 should probably be triggered more often than BAYES_00.

The ratio of BAYES_99 to BAYES_00 should mostly reflect the overall
spam-to-ham ratio; it's not a figure of merit. Your percentages
aren't consistent with your numbers; over 70% of the Bayes results
are at BAYES_99 or BAYES_00, which isn't all that bad.
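The "over 70%" figure can be reproduced from the counts, lumping all the other BAYES_XX hits together as about 1000, the same rough estimate used in the sum above. A small Python sketch:

```python
def extreme_share(b00, b99, b50, others):
    """Fraction of Bayes-classified messages that landed in BAYES_00 or BAYES_99."""
    return (b00 + b99) / (b00 + b99 + b50 + others)

# Counts from the thread; 'others' uses the rough ~1000 lumped estimate.
share = extreme_share(4510, 2366, 1568, 1000)
print(f"{share:.1%}")  # prints 72.8%
```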

The main issue here is that your numbers don't add up, only about 1 in
10 of your 45,000 messages processed by spamassassin are accounted for
in the BAYES statistics.

 If there is a better way to get sa statistics I'd be happy to know.
 
 I know that the bayes success rate comes down to training, but like  
 every other administrator I can't possibly check every message for
 accuracy and I was hoping to make the auto learn a little better.  I  
 thought maybe I just didn't have enough rules (both negative and  
 positive scoring) to trigger the auto learn often enough.

With the number of extra rules and plugins you have, you should
have no trouble autolearning all the spam you need; you might even
want to increase the threshold from 8 to avoid mislearning.
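As a rough illustration of how the two thresholds interact, here is a simplified Python model of the autolearn decision. The defaults shown (0.1 for ham, 12.0 for spam) are SpamAssassin's stock `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam` values. The real plugin applies further conditions (for example, it requires a minimum number of header and body points before learning spam) and computes the score it compares somewhat differently, so this is a sketch, not the actual logic.

```python
# Stock defaults for the two autolearn thresholds (SpamAssassin 3.x).
NONSPAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0

def autolearn_decision(learn_score, spam_threshold=SPAM_THRESHOLD,
                       nonspam_threshold=NONSPAM_THRESHOLD):
    """Simplified: 'spam' at/above spam_threshold, 'ham' below nonspam_threshold."""
    if learn_score >= spam_threshold:
        return "spam"
    if learn_score < nonspam_threshold:
        return "ham"
    return "no"
```

With the spam threshold lowered to 8, a 9-point message becomes eligible for spam learning; raising it again trades fewer autolearned spams for less risk of mislearning.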



Re: Number of rules

2009-07-31 Thread John Hardin

On Fri, 31 Jul 2009, RW wrote:

The main issue here is that your numbers don't add up, only about 1 in 
10 of your 45,000 messages processed by spamassassin are accounted for 
in the BAYES statistics.


...which was my point. Rather than troubleshooting learning, at this point 
Dennis should be troubleshooting why messages are not being processed by 
BAYES at all. Once _that_ is fixed, he can look at whether or not the 
scores it's producing are reasonable.




Re: Parallelizing Spam Assassin

2009-07-31 Thread John Hardin

On Fri, 31 Jul 2009, rich...@buzzhost.co.uk wrote:


... dropping in here and making jokes at such low hanging fruit.


Make all the jokes at Barracuda's expense that you like, complain about 
them all you like, just avoid offensive language. Vitriol is more 
impressive if you are creative enough to avoid using profanity and 
vulgarity while still blasting your target to pieces.




Re: Number of rules

2009-07-31 Thread John Hardin

On Fri, 31 Jul 2009, Dennis B. Hopp wrote:


Quoting John Hardin jhar...@impsec.org:


On Fri, 31 Jul 2009, Dennis B. Hopp wrote:

 I cleared my maia statistics a couple of days ago.  Since then 
 BAYES_00 has triggered 4510 times, BAYES_99 2366 times and BAYES_50 
 1568 (all the other BAYES_XX are less than 1000 times).


Do they all add up to about 45,000?


No, they don't.  I see some messages that trigger no rules at all (Bayes
or otherwise).  I thought that was odd, since I thought a Bayes rule
should trigger pretty much all the time.


It should.

 In those same couple of days we have processed about 45,000 messages 
 (this is the number of messages that actually reached SpamAssassin and
 weren't outright rejected).
 
 If there is a better way to get sa statistics I'd be happy to know.


sa_stats.pl from the SARE website.

http://www.rulesemporium.com/programs/


I'll take a look.  Will this work with logs that are written by
amavisd-new?


That I don't know.



Re: Number of rules

2009-07-31 Thread Dennis B. Hopp

Quoting Karsten Bräckelmann guent...@rudersport.de:


On Fri, 2009-07-31 at 06:07 -0700, John Hardin wrote:

On Fri, 31 Jul 2009, Dennis B. Hopp wrote:

 I cleared my maia statistics a couple of days ago.  Since then BAYES_00 has
 triggered 4510 times, BAYES_99 2366 times and BAYES_50 1568 (all the other
 BAYES_XX are less than 1000 times).

Do they all add up to about 45,000?


Doh!  Good catch, John.

No, they cannot possibly.  Do the math. These 3 rules account for less
than 10k hits, leaving 35k. With each remaining rule under 1k hits, we
would need at least another 35 rules. However, there are merely 6 left.

  $ grep -c BAYES_ 50_scores.cf
  9

The stats are incorrect.  Well, unless the lion's share is processed with
Bayes disabled, or otherwise not processed by SA.
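The arithmetic behind that argument, as a quick sketch. It assumes Bayes is active and sufficiently trained for every scanned message, so each one hits exactly one of the nine mutually exclusive BAYES_xx rules; 999 is used as a generous ceiling for the six rules reported as under 1000 hits.

```python
# Hit counts reported earlier in the thread.
known_hits = {"BAYES_00": 4510, "BAYES_99": 2366, "BAYES_50": 1568}
total_bayes_rules = 9        # output of `grep -c BAYES_ 50_scores.cf`
messages_reported = 45_000   # approximate message count from the Maia stats

# The remaining six rules each hit fewer than 1000 times; 999 is a ceiling.
other_rules = total_bayes_rules - len(known_hits)
known_total = sum(known_hits.values())
upper_bound = known_total + other_rules * 999

# Assumption: with Bayes active, each scanned message hits exactly one
# BAYES_xx rule, so the sum of all BAYES_xx hits approximates the number
# of messages that SA actually scanned.
print(f"known BAYES_xx hits:          {known_total}")
print(f"upper bound on BAYES_xx hits: {upper_bound}")
print(f"share of reported messages:   {upper_bound / messages_reported:.0%}")
```

This prints an upper bound of 14438 hits, about 32% of the 45,000 messages, so at most roughly a third of them can have gone through SA with Bayes enabled.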


I do have sanesecurity rules in clamav which may be filtering messages
before spamassassin sees them which would account for some of the
difference between the total BAYES triggered and messages received.
We also relay all outbound mail through these same servers but do not
send outbound mail through spamassassin which again would make for  
some difference.  I should have thought to mention that before.


I couldn't get sa-stats to give me any useful information.  I did get
amavis-logwatch and I am not sure if I like what it's showing me.  I ran it
against the last few maillogs I have so it encompasses basically the  
last month.  Here are the relevant parts of the output:


http://pastebin.com/m59ddaf1d

If I'm reading that correctly, less than 50% of mail is actually
being filtered (seems like it should be higher than that). Those stats
don't count the messages we completely reject.  We don't reject solely
on one RBL but use policyd-weight to reject messages.  I guess I could
just let all messages through to SA for a few days to see how things
change, but I don't see the point of wasting CPU/Memory for messages
that are pretty much guaranteed spam.


Here are the stats on my Postfix:

http://pastebin.com/m15d2533e

Maybe I'm worried about nothing but given some of the spam that I get  
forwarded that gets through (some very obvious spam) and then to see  
what rules it hits just makes me think that something isn't quite right.


--Dennis




Re: Parallelizing Spam Assassin

2009-07-31 Thread rich...@buzzhost.co.uk
On Fri, 2009-07-31 at 08:25 -0700, John Hardin wrote:
 On Fri, 31 Jul 2009, rich...@buzzhost.co.uk wrote:
 
  ... dropping in here and making jokes at such low hanging fruit.
 
 Make all the jokes at Barracuda's expense that you like, complain about 
 them all you like, just avoid offensive language. Vitriol is more 
 impressive if you are creative enough to avoid using profanity and 
 vulgarity while still blasting your target to pieces.
 
Received and understood.



Re: Number of rules

2009-07-31 Thread Karsten Bräckelmann
On Fri, 2009-07-31 at 10:36 -0500, Dennis B. Hopp wrote:
 I couldn't get sa-stats to give me any useful information.

AFAIK it understands spamd logs, not Amavis logs. You would need to
adjust the script for that -- as discussed just a few days ago.
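Adjusting the script for Amavis is mostly a parsing problem. A minimal sketch, assuming amavisd-new is logging lines that carry a `tests=[RULE=score, ...]` field (it does so at higher log levels; check the exact format in your own maillog before relying on it). This is not sa-stats.pl, and `count_rule_hits` and `bayes_hits` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed shape of an amavisd-new line carrying the SpamAssassin test list:
#   ... SPAM-TAG, <a@x> -> <b@y>, No, score=-1.9 ... tests=[BAYES_00=-2.599, SPF_PASS=-0.001] ...
TESTS_FIELD = re.compile(r"tests=\[([^\]]*)\]")


def count_rule_hits(lines):
    """Count how many log lines list each SpamAssassin rule in tests=[...]."""
    hits = Counter()
    for line in lines:
        match = TESTS_FIELD.search(line)
        if not match:
            continue  # not a line with a test list (e.g. postfix/smtpd chatter)
        for item in match.group(1).split(","):
            name = item.strip().split("=", 1)[0]
            if name:
                hits[name] += 1
    return hits


def bayes_hits(hits):
    """Keep only the BAYES_xx buckets, sorted by rule name."""
    return {name: n for name, n in sorted(hits.items()) if name.startswith("BAYES_")}
```

For example, `with open("/var/log/maillog") as f: print(bayes_hits(count_rule_hits(f)))`; the log path is an assumption and varies by distribution.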


 If I'm reading that correctly, less than 50% of mail is actually
 being filtered (seems like it should be higher than that). Those stats

Actually, the numbers you gave for the last couple days are even
lower. About one third, 15k out of 45k do have a BAYES_xx hit and thus
are scanned by SA.

I told you how to train your Bayes, if you're not satisfied with the
result. Whether you like it or not, there really isn't another way. FWIW,
blocking the obvious offenders early seems like a proper explanation for
Bayes not showing a lot of high hitters.

Anyway, considering the back and forth -- IMHO, you *first* should get a
clear picture how exactly your mail is being processed. I don't feel
like stabbing in the dark.


 Maybe I'm worried about nothing but given some of the spam that I get  
 forwarded that gets through (some very obvious spam) and then to see  
 what rules it hits just makes me think that something isn't quite right.

Forwarded -- as in reports by your users, or forwarded from external MXs
to yours? In the latter case, the obvious thing to check is your
internal and trusted network settings.


-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Number of rules

2009-07-31 Thread Dennis B. Hopp

Quoting Karsten Bräckelmann guent...@rudersport.de:


If I'm reading that correctly, less than 50% of mail is actually
being filtered (seems like it should be higher than that). Those stats


Actually, the numbers you gave for the last couple days are even
lower. About one third, 15k out of 45k do have a BAYES_xx hit and thus
are scanned by SA.

I told you how to train your Bayes, if you're not satisfied with the
result. Whether you like it or not, there really isn't another way. FWIW,
blocking the obvious offenders early seems like a proper explanation for
Bayes not showing a lot of high hitters.


Yes you did and I'm going to set something up to make a copy of the  
messages that trigger BAYES_20 through BAYES_80 into a separate  
mailbox that I can then inspect periodically for a while (while still  
letting the message be delivered to the user)
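One way to pick out those messages is to look at the `X-Spam-Status` header that SpamAssassin (or amavisd-new) adds. A minimal sketch, assuming that header is present on delivered mail; the copy-to-mailbox step itself (procmail, an amavis/Maia hook, etc.) is left out, and `REVIEW_BUCKETS`, `bayes_rules` and `needs_review` are hypothetical names.

```python
import re
from email import message_from_string

# The "unsure" Bayes buckets worth reviewing by hand.
REVIEW_BUCKETS = {"BAYES_20", "BAYES_40", "BAYES_50", "BAYES_60", "BAYES_80"}
# Matches BAYES_xx both in "tests=BAYES_50,..." (spamassassin/spamd style)
# and in "tests=[BAYES_50=0.001, ...]" (amavisd-new style).
BAYES_RULE = re.compile(r"\bBAYES_\d\d\b")


def bayes_rules(raw_message: str) -> set:
    """Return the BAYES_xx rule names listed in the X-Spam-Status header."""
    msg = message_from_string(raw_message)
    status = str(msg.get("X-Spam-Status", ""))
    return set(BAYES_RULE.findall(status))


def needs_review(raw_message: str) -> bool:
    """True if the message landed in one of the BAYES_20..BAYES_80 buckets."""
    return bool(bayes_rules(raw_message) & REVIEW_BUCKETS)
```

Matching on the rule name alone, rather than parsing the whole test list, keeps the check working for both header styles and for headers folded across several lines.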




Anyway, considering the back and forth -- IMHO, you *first* should get a
clear picture how exactly your mail is being processed. I don't feel
like stabbing in the dark.



And I don't expect you to take a stab in the dark.  The 45K messages
were the total processed inbound and outbound; I didn't consider that
outbound mail is not funneled through SA and so would not be seen in
BAYES.  So I admit, it was a poor analysis on my part.





Maybe I'm worried about nothing but given some of the spam that I get
forwarded that gets through (some very obvious spam) and then to see
what rules it hits just makes me think that something isn't quite right.


Forwarded -- as in reports by your users, or forwarded from external MXs
to yours? In the latter case, the obvious thing to check is your
internal and trusted network settings.



Forwarded from internal users asking how it got through the spam  
filters.  I rarely get reports to our abuse/postmaster addresses (with  
the exception of AOL users who mark messages as spam when they clearly  
are not spam).


Re: Number of rules

2009-07-31 Thread Mike Cappella

Dennis,

On 7/31/2009 8:36 AM, Dennis B. Hopp wrote:



I couldn't get sa-stats to give me any useful information. I did get
amavis-logwatch and I am not sure if I like what it's showing me. I ran it
against the last few maillogs I have so it encompasses basically the
last month. Here is the relevant parts of the output:

http://pastebin.com/m59ddaf1d

If I'm reading that correctly, less than 50% of mail is actually
being filtered (seems like it should be higher than that). Those stats
don't count the messages we completely reject. We don't reject solely on


Correct.  Amavis-logwatch will only show you what it saw.  It does not
poke into your MTA's reject stats.  It's a *good* thing that the major
junk isn't hitting amavis.  Think in terms of reject *layers*.



one RBL but use policyd-weight to reject messages. I guess I could just
let all messages through to SA for a few days to see how things change,
but I don't see the point of wasting CPU/Memory for messages that are
pretty much guaranteed spam.


No, don't do that.  What's the point of letting in clearly forged, 
bogus, or other junk?  It will just slow/hinder delivery to your customers.




Here are the stats on my Postfix:

http://pastebin.com/m15d2533e


You have a 90% MTA reject rate.  That's a pretty good first cut.



Maybe I'm worried about nothing but given some of the spam that I get
forwarded that gets through (some very obvious spam) and then to see
what rules it hits just makes me think that something isn't quite right.



Just start fine-tuning your rules, and monitor what types of things are
getting past your MTA.  I don't see any unverified client host rejects
- you might want to consider that safe method of culling out more at the
front door.  Mine cuts out about 15.5%:


  Reject unverified client host   15.47%

but some of this may ultimately be overlap into another reject area such 
as an RBL.


   man 5 postconf | less +/check_reverse_client_hostname_access
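Assuming "unverified client host" corresponds to Postfix's `reject_unknown_client_hostname` restriction, the check behind it is forward-confirmed reverse DNS: the client IP must have a PTR name, that name must resolve, and the resolved addresses must include the client IP. The sketch below only illustrates that logic; it is not Postfix's implementation, it handles IPv4 only (via `socket.gethostbyname_ex`), and it takes the resolver functions as parameters so it can be exercised without DNS. `fcrdns_ok` is a hypothetical name.

```python
import socket
from typing import Callable, Iterable, Optional


def _ptr_name(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def _a_records(name: str) -> Iterable[str]:
    return socket.gethostbyname_ex(name)[2]  # (hostname, aliases, addresses)


def fcrdns_ok(
    ip: str,
    reverse: Optional[Callable[[str], str]] = None,
    forward: Optional[Callable[[str], Iterable[str]]] = None,
) -> bool:
    """Forward-confirmed reverse DNS: IP -> PTR name -> addresses include IP."""
    reverse = reverse or _ptr_name
    forward = forward or _a_records
    try:
        name = reverse(ip)          # 1) IP -> name must succeed
        addresses = forward(name)   # 2) name -> addresses must succeed
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses
        return False
    return ip in set(addresses)     # 3) the name must map back to the IP
```

A dynamic-pool host whose PTR name resolves to some other address fails step 3, which is the kind of client this layer of rejects is meant to catch.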

Mike


Re: Parallelizing Spam Assassin

2009-07-31 Thread poifgh



Henrik K wrote:
 
 Yeah, given that my 4x3Ghz box masscheck peaks at 22 msgs/sec, without
 Net/AWL/Bayes. But that's the 3.3 SVN ruleset.. wonder what version was
 used
 and any nondefault rules/settings? Certainly sounds strange that 1 core
 could top out the same. Anyone else have figures? Maybe I've borked
 something myself..
 

The rule sets were the defaults:
1. Took a fresh SA download.
2. Ran [the configured number of parallel] SA processes, each on a [different giant] mbox file,
without DNSBL and with 'use_bayes 0' and 'bayes_auto_learn 0'.


-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24760106.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Parallelizing Spam Assassin

2009-07-31 Thread poifgh


Bernd Petrovitsch wrote:
 
 On Thu, 2009-07-30 at 23:55 -0700, poifgh wrote:
 [...]
 > I ran freshly built SA with Bayes and DNSBL turned off. Why am I not seeing
 > a linear increase in the throughput? Is a file locking creating the
 Because the bottleneck is not (only) the CPUs?
 Run `vmstat 1` or similar to see (or at least get an idea ;-) whether the
 workload is I/O bound or CPU-bound or
 
 > bottleneck? If yes, which particular file is being locked? If no, what could
 Maybe. The default "store in files" drivers lock the DBs exclusively
 for each access.
 
 > be the reason for this?
 Switch the DB backend to MySQL or PostgreSQL (or whichever of the
 supported ones you like). Run that on the very same machine and
 compare the numbers with the above.
 

Running 'top' with a single SA process gives 12.5% CPU utilization,
which makes sense since one core out of 8 is fully utilized at this point.
The SA process reports 100% utilization for that CPU.

When fork goes to 8, each individual CPU is utilized from 30-70%, mostly
staying around 30% and only a few reaching 70%.

I can run vmstat to check the I/O, which I don't think should be a problem:
the disks are fast enough to deliver orders of magnitude more reads than 50
msgs/sec.
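Before ruling I/O out, it is cheap to measure it. `vmstat 1` shows this directly; the sketch below does the same arithmetic per core from two snapshots of Linux's /proc/stat, assuming the usual column order (user, nice, system, idle, iowait, irq, softirq, steal) described in proc(5). High iowait points at the disks; low busy and low iowait on every core while 8 processes run suggests they are waiting on something else, such as lock contention. The function names are hypothetical.

```python
import time


def cpu_times(stat_text: str) -> dict:
    """Parse the cpu/cpuN lines of /proc/stat into lists of jiffy counters."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            times[fields[0]] = [int(v) for v in fields[1:]]
    return times


def busy_and_iowait(before: str, after: str) -> dict:
    """Return {cpu: (busy_fraction, iowait_fraction)} between two snapshots."""
    old, new = cpu_times(before), cpu_times(after)
    result = {}
    for cpu, counters in new.items():
        delta = [n - o for n, o in zip(counters, old[cpu])]
        # Only user..steal are summed: guest time is already counted in user.
        total = sum(delta[:8]) or 1
        idle, iowait = delta[3], delta[4]
        result[cpu] = ((total - idle - iowait) / total, iowait / total)
    return result


def sample(interval: float = 1.0, path: str = "/proc/stat") -> dict:
    """Take two snapshots `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = f.read()
    time.sleep(interval)
    with open(path) as f:
        second = f.read()
    return busy_and_iowait(first, second)
```

Calling `sample()` while the 8 SA processes run gives a busy and an iowait fraction for each core, plus the aggregate "cpu" line.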


Can you elaborate on 'store in files'? What are these files, what are they
used for - can they be turned off?

Thnx
-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24760163.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Parallelizing Spam Assassin

2009-07-31 Thread poifgh



c. r. wrote:
 
 On Thu, Jul 30, 2009 at 11:55:21PM -0700, poifgh wrote:
 Why am I not seeing a linear increase in the throughput? 
 Is a file locking creating the bottleneck?
 
 Maybe the auto white list.
 
 -- 
 

I can try turning off AWL and get back here..

Thnx
-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24760203.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Parallelizing Spam Assassin

2009-07-31 Thread poifgh



Henrik K wrote:
 
 Yeah, given that my 4x3Ghz box masscheck peaks at 22 msgs/sec, without
 Net/AWL/Bayes. But that's the 3.3 SVN ruleset.. wonder what version was
 used
 and any nondefault rules/settings? Certainly sounds strange that 1 core
 could top out the same. Anyone else have figures? Maybe I've borked
 something myself..
 

The problem is not with 22 being a low number, but that when we have other free
cores to run separate SA processes in parallel, the throughput doesn't scale linearly.
I expect 8 SA processes running simultaneously on 8 cores to reach
150+ msgs/sec, but it is a third of that, at 50 msgs/sec.
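Putting the thread's own numbers (22, 44 and 57 msgs/sec for 1, 4 and 8 processes) through Amdahl's law shows how far from linear this is. This is only a model: it lumps every non-scaling cost (lock contention, memory bandwidth, disk) into a single "serial fraction" and says nothing about which one is responsible.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: speedup on n workers if a fraction p of the work scales."""
    return 1.0 / ((1.0 - p) + p / n)


def implied_parallel_fraction(speedup: float, n: int) -> float:
    """Invert Amdahl's law: p = (1 - 1/S) / (1 - 1/n), valid for n > 1."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


# Throughput reported at the start of this thread (processes -> msgs/sec).
measured = {1: 22, 4: 44, 8: 57}

for n, rate in sorted(measured.items()):
    s = rate / measured[1]
    line = f"{n} procs: {rate} msgs/s, speedup {s:.2f}x, efficiency {s / n:.0%}"
    if n > 1:
        line += f", implied parallel fraction {implied_parallel_fraction(s, n):.2f}"
    print(line)
```

This reports 2.00x (50% efficiency) at 4 processes and 2.59x (32% efficiency) at 8, with implied parallel fractions of 0.67 and 0.70. If that model held, even unlimited cores would top out near 1/(1 - 0.7), roughly 3.3x or about 73 msgs/sec.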



-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24760294.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Parallelizing Spam Assassin

2009-07-31 Thread Nigel Frankcom
I'm assuming you run a tad more messages than I, but on a quad with a
failover I have not seen the failover kick in once in 4 years. This is not
disputing your observations, just noting mine.

I claim absolutely no knowledge about the core processing/stacking
though I would assume (perhaps incorrectly) that the parsing would be
part of the software (MTA).

I freely admit I only picked up what seems the tail end of this thread
but having used SA for so many years I think I have at least a handle
on how it plays (hence the failover). My failover SA is in place to
handle slow queries from the primary SA. Assuming (again) that mail
size has been factored and any AV is running remotely?

Just a few thoughts based on a very cursory read of a few posts, sadly
- or happily - work makes my contributions here limited.

I'd be interested in the results of this though.

Kind regards

Nigel

PS - apologies if I'm repeating prior observations.

On Fri, 31 Jul 2009 10:41:47 -0700 (PDT), poifgh
abhinav.pat...@gmail.com wrote:




Henrik K wrote:
 
 Yeah, given that my 4x3Ghz box masscheck peaks at 22 msgs/sec, without
 Net/AWL/Bayes. But that's the 3.3 SVN ruleset.. wonder what version was
 used
 and any nondefault rules/settings? Certainly sounds strange that 1 core
 could top out the same. Anyone else have figures? Maybe I've borked
 something myself..
 

The problem is not with 22 being a low number, but that when we have other free
cores to run separate SA processes in parallel, the throughput doesn't scale linearly.
I expect 8 SA processes running simultaneously on 8 cores to reach
150+ msgs/sec, but it is a third of that, at 50 msgs/sec.


Re: Parallelizing Spam Assassin

2009-07-31 Thread poifgh

In my tests there was no MTA. The mails/spam were collected from some
server in mbox format and fed to SA using the --mbox switch. The size of the
messages was not altered in any fashion; just the usual size of incoming spam/mail.
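For repeatable numbers, the same test can be scripted so that the process count is the only thing that changes. A minimal sketch: it starts one command per mbox file concurrently, feeds the file on stdin, and reports messages per second. The default command (`spamassassin --mbox`, with the mbox on stdin) mirrors the description above but is an assumption; substitute whatever invocation you actually used. `benchmark`, `run_one` and `count_messages` are hypothetical helpers.

```python
import mailbox
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

DEFAULT_CMD = ("spamassassin", "--mbox")  # assumed invocation; mbox read from stdin


def count_messages(path: str) -> int:
    box = mailbox.mbox(path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def run_one(path: str, cmd) -> int:
    """Run cmd with the mbox file on stdin; return its exit status."""
    with open(path, "rb") as stdin:
        proc = subprocess.run(list(cmd), stdin=stdin,
                              stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return proc.returncode


def benchmark(paths, cmd=DEFAULT_CMD):
    """Process every mbox in parallel, one OS process each; return stats."""
    total = sum(count_messages(p) for p in paths)
    start = time.monotonic()
    # The threads only wait on child processes, so the GIL is not a bottleneck.
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        codes = list(pool.map(lambda p: run_one(p, cmd), paths))
    elapsed = time.monotonic() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate, codes
```

Splitting one corpus into 1, 4 and 8 files and calling `benchmark` on each set gives directly comparable msgs/sec figures.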

There is no AV [you mean anti-virus, right?] running on the machine.

Would be back with results

--




Nigel Frankcom-2 wrote:
 
 I'm assuming you run a tad more messages than I, but on a quad with a
 failover I have not seen the failover kick in once in 4 years. This is not
 disputing your observations, just noting mine.
 
 I claim absolutely no knowledge about the core processing/stacking
 though I would assume (perhaps incorrectly) that the parsing would be
 part of the software (MTA).
 
 I freely admit I only picked up what seems the tail end of this thread
 but having used SA for so many years I think I have at least a handle
 on how it plays (hence the failover). My failover SA is in place to
 handle slow queries from the primary SA. Assuming (again) that mail
 size has been factored and any AV is running remotely?
 
 Just a few thoughts based on a very cursory read of a few posts, sadly
 - or happily - work makes my contributions here limited.
 
 I'd be interested in the results of this though.
 
 Kind regards
 
 Nigel
 
 PS - apologies if I'm repeating prior observations.
 

-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24761236.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Bogus Data within style tags poisoning SA results

2009-07-31 Thread Nathan M
This seems to be a newer tactic, and a lot of email with content
poisoning seems to be slipping through our spam filters.  The reason
is all the legitimate-looking content between style tags.  Most email apps
don't show the data between style tags, so it goes ignored
and unseen by the recipient, but SA seems to be looking at it and
using it to poison the scoring system.

Here's an example of what we're seeing within the message source.

<style>
  creatures quickly produce approve crevice nuclear moping
esoteric pernicious motion faith does embodies does
purify testament maximum exceeding centralism intellect prey
tidying welcomed traal impress tuneless athwart mansions
endures flames echo motion rooms alcohol rituals
etc.. etc.. etc..
</style>

That is usually followed by common image spam with a link such as "Click for
more" or "Go to details".

Does anyone have a solution for this?  Can SA be trained to ignore what's in
between style tags?  What would that break?

Thanks,

- N


running two versions of spamd?

2009-07-31 Thread torleif

Hi
I have set up SpamAssassin to run as a daemon and run as the user spamd
instead of root.
When I run  ps xafu | grep spamd I get this output:

  root      2892  0.0  0.0   3116   716 pts/0    S+   20:34   0:00  \_ grep spamd
  root      2389  0.0  2.5  29288 26852 ?        Ss   17:15   0:02 /usr/sbin/spamd --create-prefs --max-children 5 --username spamd --helper-home-dir /var/lib/spamassassin/ -s /var/lib/spamassassin/spamd.log -d --pidfile=/var/lib/spamassassin/spamd.pid
  spamd     2581  0.0  2.8  32304 29732 ?        S    17:16   0:07  \_ spamd child
  spamd     2582  0.0  2.6  30432 27920 ?        S    17:16   0:00  \_ spamd child


Is this normal or is spamd running both as root and spamd?


Another question: When I run sa-update should I run it as root or spamd?

Thanks!!
-- 
View this message in context: 
http://www.nabble.com/running-two-versions-of-spamd--tp24761508p24761508.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Parallelizing Spam Assassin

2009-07-31 Thread Nigel Frankcom
OK - I can see what metrics you are trying to ascertain - I think. I'm
not sure that your test and real life are 'right'. For obvious reasons
I don't want to carry this one on via list - I would suggest you ask
Justin and I will be happy to give info on my local setup (this
assumes Justin can grab time away from toxic nappies/diapers)

There is a lot you can do to ameliorate load. On bad days my quad does
50 a second so it's doable. I will freely admit I have no clue quite
how this came to be, but it is (a case of having colleagues knowing
more than I do - for which I am eternally grateful; the usual culprits
know who they are)

Kind regards

Nigel



On Fri, 31 Jul 2009 11:41:14 -0700 (PDT), poifgh
abhinav.pat...@gmail.com wrote:


In my tests there was no MTA. The mails/spam were collected from some
server in mbox format and fed to SA using the --mbox switch. The size of the
messages was not altered in any fashion; just the usual size of incoming spam/mail.

There is no AV [you mean anti-virus, right?] running on the machine.

Would be back with results



privacy policy updates?

2009-07-31 Thread LuKreme
I've gotten a message from realage-privacypolicy.com which looks like  
it is a typical corporate html-heavy message. This one is updating me  
that their privacy policy has changed. The reason I am suspicious is  
that I've received at least 3 others this week that look very similar  
from various other sites. I think the others were from something  
called Kaboose and another was Harmony (or something similar, not  
eHarmony) and the third was... can't remember the third and it's  
already been deleted.


I haven't gone to any of the sites, and it could all be coincidence,  
but it seemed a little suspicious to me.


Over-reaction?

The realage one really does have a lot of spam sign (the domain name  
for one, though it is real), the content-type text/html with no plain  
alternative, obviously tracked URLS like


http://link.realage-mail.com/u.d?B4GsbPkLdHyrFL8gOixE=914

and the fact I have no idea who these people are.
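The missing plain-text alternative is easy to test mechanically; it is roughly the signal behind SpamAssassin's MIME_HTML_ONLY rule, though this sketch is not SpamAssassin's code. It uses only Python's standard `email` package, and `is_html_only` is a hypothetical name.

```python
from email import message_from_string


def is_html_only(raw_message: str) -> bool:
    """True if the message has text/html leaf parts but no text/plain part."""
    msg = message_from_string(raw_message)
    leaf_types = [part.get_content_type()
                  for part in msg.walk() if not part.is_multipart()]
    return "text/html" in leaf_types and "text/plain" not in leaf_types
```

A message without any Content-Type header defaults to text/plain, so it is not flagged.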

--
Hi, I'm Gary Cooper, but not the Gary Cooper that's dead.



Re: Any one interested in using a proper forum?

2009-07-31 Thread jdow

profanity no. Even if you cannot think properly and use your brain
the people here have brains that function.

{^_^}
- Original Message - 
From: snowweb pe...@snowweb.co.uk

Sent: Tuesday, 2009/July/28 04:07




I don't know about anyone else, but I'm getting a bit hacked off with this
1980's style forum. I'm trying to get to the bottom of an SA issue and this
list/forum thing is giving me a bigger headache than SA!

SpamAssassin has more than one or two users now and I personally think that
it should have a support forum to match the class of software, which is now
world class.

I know it's free and all that, but even so, if this is the only form of
support they provide, I'm thinking that I'll just start an alternative
support forum, using standard, full-featured forum software (like SMF).

Is there any support for this? (I already know there will be opposition from
those who are 'resident' here. Sorry guys, I just want to do something to help
those who just dive in when they have an urgent problem. No hard feelings, I
hope.)

Peter Snow


--
View this message in context: 
http://www.nabble.com/Any-one-interested-in-using-a-proper-forum--tp24697144p24697144.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com. 




Re: Parallelizing Spam Assassin

2009-07-31 Thread Paweł Sasin
 In my tests there was no MTA. The mails/spam were collected from
 some server in mbox format and fed to SA using the --mbox switch. The
 size of the messages was not altered in any fashion; just the usual size
 of incoming spam/mail.

If you're interested in testing/tuning spamassassin for heavy loads you
should consider using spamd daemon. Then you may use SLAMD [1] as
performance evaluation platform [2].

It takes some effort to set up the environment, but SLAMD helps in
repetitive testing and keeping track of the results (comparison,
history, charts).
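A load generator for spamd only needs to speak its small text protocol. A minimal sketch, based on my reading of the spamd protocol description shipped with SpamAssassin (a `CHECK SPAMC/1.2` request with a `Content-length` header, and a reply containing a `Spam: True ; score / threshold` line); verify it against the protocol document for your version. 127.0.0.1:783 is spamd's default listen address; the function names are hypothetical.

```python
import socket
from typing import Optional, Tuple


def build_check_request(message: bytes, user: Optional[str] = None) -> bytes:
    """Build a CHECK request: headers, blank line, then the raw message."""
    headers = [b"CHECK SPAMC/1.2", b"Content-length: %d" % len(message)]
    if user:
        headers.append(b"User: " + user.encode())
    return b"\r\n".join(headers) + b"\r\n\r\n" + message


def parse_spam_header(response: bytes) -> Tuple[bool, float, float]:
    """Extract (is_spam, score, threshold) from the 'Spam:' response line."""
    for line in response.split(b"\r\n"):
        if line.lower().startswith(b"spam:"):
            verdict, _, scores = line[5:].partition(b";")
            score, _, threshold = scores.partition(b"/")
            return (verdict.strip().lower() == b"true",
                    float(score.decode().strip()),
                    float(threshold.decode().strip()))
    raise ValueError("no Spam: header in spamd response")


def check(message: bytes, host: str = "127.0.0.1", port: int = 783,
          timeout: float = 30.0) -> Tuple[bool, float, float]:
    """Send one message to spamd and return its verdict."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_check_request(message))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return parse_spam_header(b"".join(chunks))
```

Calling `check()` from several threads at once approximates concurrent clients; SLAMD adds the scheduling, history and comparison around such runs.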

[1] http://www.slamd.com
[2] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5689

-- 
Pawel Sasin

WIRTUALNA POLSKA Spolka Akcyjna (joint-stock company), with its registered
office in Gdansk at ul. Traugutta 115 C, entered in the National Court
Register (Register of Entrepreneurs) maintained by the District Court
Gdansk-Polnoc in Gdansk under KRS number 068548, with share capital of
PLN 67,980,024.00 paid in full, and Tax Identification Number (NIP)
957-07-51-216.


Re: Parallelizing Spam Assassin

2009-07-31 Thread Michael Parker


On Jul 31, 2009, at 1:55 AM, poifgh wrote:


I ran freshly built SA with Bayes and DNSBL turned off. Why am I not
seeing a linear increase in the throughput? Is a file locking creating the
bottleneck? If yes, which particular file is being locked? If no,
what could be the reason for this?


There could be many reasons, check out my talk (admittedly out of date  
a little but should still be mostly relevant) on High Performance  
Apache SpamAssassin at the following link:


http://people.apache.org/~parker/presentations/index.html

Keep in mind that you might also be seeing other factors like memory  
and disk I/O contention.  You don't really spell out your testing  
infrastructure, so it's not really clear if you're even performing a valid
test.


Also, I wouldn't necessarily expect to see a linear increase, although  
you might be able to take some easy steps for increasing your overall  
performance.


Michael



Re: Parallelizing Spam Assassin

2009-07-31 Thread LuKreme

On Jul 31, 2009, at 2:53 AM, Justin Mason wrote:
On Fri, Jul 31, 2009 at 09:32,
rich...@buzzhost.co.uk wrote:
Imagine what Barracuda Networks could do with that if they did not fill
their gay little boxes with hardware rubbish from the floors of MSI and
supermicro. Jesus, try and process that many messages with a $30,000
Barracuda and watch support bitch 'You are fully scanning too much mail
and making our rubbish hardware wet the bed.' LOL.


Richard -- please watch your language.   This is a public mailing
list, and offensive language here is inappropriate.


I dunno, 'gay' isn't that offensive.


--
Overhead, without any fuss, the stars were going out.



Re: Parallelizing Spam Assassin

2009-07-31 Thread LuKreme

On Jul 31, 2009, at 9:25 AM, John Hardin wrote:

On Fri, 31 Jul 2009, rich...@buzzhost.co.uk wrote:


... dropping in here and making jokes at such low hanging fruit.


Make all the jokes at Barracuda's expense that you like, complain  
about them all you like, just avoid offensive language.


Really? Referring to gay hardware is THAT offensive that someone would  
need to be banned over it?


--
Is a vegetarian permitted to eat animal crackers?



Re: Parallelizing Spam Assassin

2009-07-31 Thread jdow

From: Matt Kettler mkettler...@verizon.net
Sent: Friday, 2009/July/31 04:26



rich...@buzzhost.co.uk wrote:

On Fri, 2009-07-31 at 09:53 +0100, Justin Mason wrote:
  

On Fri, Jul 31, 2009 at 09:32,
rich...@buzzhost.co.uk wrote:


...
  

Richard -- please watch your language.   This is a public mailing
list, and offensive language here is inappropriate.



...




  

Richard, we are not joking. Please watch your language on this mailing
list, or you will be banned from it.

You have now been warned by 2 members of the Project Management
Committee. You will not be warned again.


Given that profanity is the effort of a small mind to express itself
I have a feeling he's going to receive his third and final warning any
time now, Matt.

{^_-}


Re: Parallelizing Spam Assassin

2009-07-31 Thread LuKreme

On Jul 31, 2009, at 1:33 PM, jdow wrote:

Given that profanity is the effort of a small mind to express itself
I have a feeling he's going to receive his third and final warning any
time now, Matt


Given that nothing that richard said is not anything I've heard on,  
say, prime time TV or... a committee meeting I am really curious now  
as to what was considered 'obscene'.


I'm quite serious.

Have I stumbled into a list run by religious freaks?

--
Clark's Law: Sufficiently advanced cluelessness is
indistinguishable from malice
Clark Slaw: Anything that has been severely damaged or destroyed
by application of Clark's Law



Re: Parallelizing Spam Assassin

2009-07-31 Thread John Rudd
On Fri, Jul 31, 2009 at 12:37, LuKremekrem...@kreme.com wrote:
 On Jul 31, 2009, at 1:33 PM, jdow wrote:

 Given that profanity is the effort of a small mind to express itself
 I have a feeling he's going to receive his third and final warning any
 time now, Matt

 Given that nothing that richard said is not anything I've heard on, say,
 prime time TV or... a committee meeting I am really curious now as to what
 was considered 'obscene'.

 I'm quite serious.

 Have I stumbled into a list run by religious freaks?

(mods: sorry if this also falls into the verboten category, I'm more
trying to explore/catalog than perpetuate)

Maybe it was using the word bitch, where he could have used the word
complain.

(and, religious freaks aren't the only freaks that don't like to see
the word Jesus used in that kind of context ... saying words like
Jesus around atheist freaks can also result in them claiming offence
... luckily religious freaks and atheist freaks aren't as common as
merely religious people and merely atheist people)


Re: Bogus Data within style tags poisoning SA results

2009-07-31 Thread John Hardin

On Fri, 31 Jul 2009, Nathan M wrote:


Here's an example of what we're seeing within the message source.

<style>
 creatures quickly produce approve crevice nuclear moping
esoteric pernicious motion faith does embodies does
purify testament maximum exceeding centralism intellect prey
tidying welcomed traal impress tuneless athwart mansions
endures flames echo motion rooms alcohol rituals
etc.. etc.. etc..
</style>


Style tags have some format requirements. It might be reasonable (though 
expensive) to try to detect style tags that do not have any of those 
syntactic elements...
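A minimal sketch of that idea, outside SpamAssassin: flag `<style>` blocks that contain no CSS rule at all, i.e. nothing of the form `selector { property: value }`. The regexes are simplistic assumptions (no real CSS parser; a block holding only an `@import` line would be flagged), so treat it as a starting point for a rule or plugin, not a finished check. `suspicious_style_blocks` is a hypothetical name.

```python
import re

STYLE_BLOCK = re.compile(r"<style\b[^>]*>(.*?)</style\s*>", re.IGNORECASE | re.DOTALL)
# A CSS rule needs at least "{ property: value }" after its selector.
CSS_RULE = re.compile(r"\{[^{}]*:[^{}]*\}")


def suspicious_style_blocks(html: str) -> list:
    """Return the text of non-empty <style> blocks that contain no CSS rule."""
    suspicious = []
    for body in STYLE_BLOCK.findall(html):
        # Old-style "<!-- ... -->" wrappers around CSS are common; drop them.
        text = re.sub(r"<!--|-->", "", body).strip()
        if text and not CSS_RULE.search(text):
            suspicious.append(text)
    return suspicious
```

On the sample above, the whole word list comes back as one suspicious block, while an ordinary `<style>p { color: red; }</style>` does not.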


For now, though, this is just more bayes poison. Train it as spam and the 
scores will go up.


--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  False is the idea of utility that sacrifices a thousand real
  advantages for one imaginary or trifling inconvenience; that would
  take fire from men because it burns, and water because one may drown
  in it; that has no remedy for evils except destruction. The laws
  that forbid the carrying of arms are laws of such a nature. They
  disarm only those who are neither inclined nor determined to commit
  crime.   -- Cesare Beccaria, quoted by Thomas Jefferson
---
 5 days until the 274th anniversary of John Peter Zenger's acquittal


Re: running two versions of spamd?

2009-07-31 Thread Karsten Bräckelmann
On Fri, 2009-07-31 at 11:59 -0700, an anonymous Nabble user wrote:
 I have set up SpamAssassin to run as a daemon and run as the user spamd
 instead of root.
 When I run  ps xafu | grep spamd I get this output:

 root      2389  0.0  2.5  29288 26852 ?        Ss   17:15   0:02 /usr/sbin/spamd --create-prefs --max-children 5 --username spamd --helper-home-dir /var/lib/spamassassin/ -s /var/lib/spamassassin/spamd.log -d --pidfile=/var/lib/spamassassin/spamd.pid

 spamd     2581  0.0  2.8  32304 29732 ?        S    17:16   0:07  \_ spamd child
 spamd     2582  0.0  2.6  30432 27920 ?        S    17:16   0:00  \_ spamd child
 
 Is this normal or is spamd running both as root and spamd?

You are starting the daemon as root, and telling it to setuid to the user
spamd. I believe this is perfectly normal. Btw, see 'man spamd' for
the -u option.

Only the child processes, which have correctly setuid'd, will process
messages.


 Another question: When I run sa-update should I run it as root or spamd?

The master process (which does not scan messages, but cares about its
busy children) will read that data. So you want to ensure it's readable
by that user.

FWIW, if you would not explicitly specify the -u option, the child
spamds would setuid to the user calling spamc.


-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Parallelizing Spam Assassin

2009-07-31 Thread Glenn Sieb
LuKreme said the following on 7/31/09 3:27 PM:
 Richard -- please watch your language.   This is a public mailing
 list, and offensive language here is inappropriate.

 I dunno, 'gay' isn't that offensive.



Gay is *not* a synonym for stupid.

I do take offense to the term being used in that manner.

--Glenn



Re: Parallelizing Spam Assassin

2009-07-31 Thread Matt Kettler
rich...@buzzhost.co.uk wrote:
 email me off list as I've just been
 banned for upsetting a sponsor LOL
   
Richard, this has nothing to do with Barracuda. They have no influence
over my opinions whatsoever. I don't work for Apache or Barracuda, or
any company sponsored by either. Neither Apache nor Barracuda has
complained. At the time I warned you, I didn't even remember that
Barracuda ever donated to Apache. I don't think any member of the PMC
has any regular contact with Barracuda, although we've had occasional
contact about using their RBL.

Your warning is about using foul language, and then choosing to thumb
your nose at the warning Justin gave you. You're behaving like an
impudent and foul-mouthed child, and that's unwelcome here.

That said, I really don't appreciate you using this list to rant about
Barracuda's products, or discuss them at all. This is the SpamAssassin
list, not the Barracuda list. Barracuda may use SpamAssassin, and
SpamAssassin may support the Barracuda public RBL, but beyond that, any
discussion of them is, quite frankly, off-topic. I don't care how good
or bad their commercial product, or its support is, because it is
off-topic here. I don't welcome people praising Barracuda any more than
I welcome complaints. It simply doesn't matter to SpamAssassin, so it
doesn't belong here.

You may as well be ranting about Ford cars for all I care, it still
doesn't belong here.

This list is about SpamAssassin, nothing more, nothing less.

Continue with the foul language, and you'll find the door very quickly.
Keep harping on the same off-topic subject and we will eventually get
tired of it. You've said your piece about Barracuda; now give it a rest,
because frankly I don't care about their products, I care about our product.

Is that difficult to understand?



Re: Parallelizing Spam Assassin

2009-07-31 Thread Henrik K
On Fri, Jul 31, 2009 at 10:41:47AM -0700, poifgh wrote:

 Henrik K wrote:
  
  Yeah, given that my 4x3GHz box's masscheck peaks at 22 msgs/sec without
  Net/AWL/Bayes. But that's the 3.3 SVN ruleset... I wonder what version was
  used, and any non-default rules/settings? It certainly sounds strange that
  1 core could top out the same. Anyone else have figures? Maybe I've borked
  something myself...
  
 
 The problem is not with 22 being a low number, but when we have other free

I did not say it was a problem. I was just wondering how fast your CPU/memory
is, since my 3 GHz AMD doesn't seem to keep up.

I just tested with a fresh 3.2.5 install, running a 500-message mbox on a
single core, and got 11 msgs/sec. Then I used sa-compile, and it rose to 15.
Did you use it too?

Of course your mailbox could be a lot different, so it's hard to compare.

 cores to run different SA instances in parallel, why doesn't the throughput
 scale linearly? I would expect 8 cores running 8 SA instances simultaneously
 to reach 150+ msgs/sec, but it is a third of that, at 50 msgs/sec.

Anyway, as people have already said here, disable AWL:

use_auto_whitelist 0



Re: Parallelizing Spam Assassin

2009-07-31 Thread poifgh

I am sorry, I did not provide any statistics of the machine involved.
CPU - 8 cores, each at 2327 MHz
RAM - 16 GB
AFAIR it has a 7200 RPM disk - 2 TB.

Yes, people were right in pointing out that AWL could be the problem. Turning
off AWL results in near-linear scaling of SA as we increase the number of
processes. My input is more than 100K messages (mostly spam), which allowed
me to have each run last for several minutes and then take an average to get
#msgs/sec.
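The measurement described above (separate mbox files, one per worker process, total messages divided by wall-clock time) can be sketched roughly as follows. This is a hypothetical harness, not the poster's script: `measure_rate` and `scan_mbox` are made-up names, and `scan` stands in for whatever actually checks a message (for example, a function that pipes the bytes to spamc). It assumes a Unix system, since it uses the "fork" start method.

```python
import mailbox
import multiprocessing
import time
from typing import Callable, List, Tuple

# A scanner takes the raw bytes of one message; its return value is ignored.
Scanner = Callable[[bytes], object]


def scan_mbox(job: Tuple[str, Scanner]) -> int:
    """Scan every message in one mbox file and return how many were scanned."""
    path, scan = job
    box = mailbox.mbox(path, create=False)
    try:
        count = 0
        for msg in box:
            scan(msg.as_bytes())  # stand-in for the real per-message check
            count += 1
        return count
    finally:
        box.close()


def measure_rate(paths: List[str], scan: Scanner, workers: int) -> Tuple[int, float]:
    """Scan each mbox in its own worker process, `workers` at a time.

    Returns (total messages scanned, messages per second of wall-clock time).
    """
    ctx = multiprocessing.get_context("fork")  # "fork" is Unix-only
    start = time.perf_counter()
    with ctx.Pool(processes=workers) as pool:
        counts = pool.map(scan_mbox, [(path, scan) for path in paths])
    elapsed = time.perf_counter() - start
    total = sum(counts)
    return total, total / elapsed
```

Calling it with workers=1 and then workers=8 over the same files gives the two rates to compare. The scanner is pickled to the worker processes, so it has to be a module-level function (or a builtin).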


With AWL, Bayes and DNSBL turned off, I get about 24 msgs/sec for 1 fork
and 166 msgs/sec for 8 forks.

With AWL on and Bayes and DNSBL off, I get about 22 msgs/sec for 1 fork and
50 msgs/sec for 8 forks.
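One way to put these numbers side by side is speedup (8-fork rate divided by 1-fork rate) and parallel efficiency (speedup divided by 8). Reading the gap through Amdahl's law, i.e. treating whatever AWL serializes as a fixed serial share of the work, is a simplifying assumption added here for illustration; the function names are made up.

```python
def speedup(rate_1: float, rate_n: float) -> float:
    """Throughput with n workers relative to one worker."""
    return rate_n / rate_1


def efficiency(rate_1: float, rate_n: float, n: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 = perfectly linear)."""
    return speedup(rate_1, rate_n) / n


def serial_fraction(s: float, n: int) -> float:
    """Invert Amdahl's law, s = 1 / ((1 - p) + p / n), for the serial share 1 - p."""
    if n < 2:
        raise ValueError("need at least two workers to estimate a serial fraction")
    p = (1 - 1 / s) / (1 - 1 / n)
    return 1 - p


# Figures reported above: (label, rate with 1 fork, rate with 8 forks).
for label, r1, r8 in [("AWL off", 24, 166), ("AWL on", 22, 50)]:
    s = speedup(r1, r8)
    print(f"{label}: speedup {s:.2f}x, efficiency {efficiency(r1, r8, 8):.0%}, "
          f"serial fraction ~{serial_fraction(s, 8):.2f}")
```

With these figures, AWL off gives about a 6.92x speedup (86% efficiency, serial share about 0.02), while AWL on gives about 2.27x (28%, serial share about 0.36), which is consistent with something in the AWL path serializing the workers.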

Thanks, everyone, for helping out.

-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24765545.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Parallelizing Spam Assassin

2009-07-31 Thread poifgh

I haven't tried sa-compile yet; I can give it a shot.

-- 
View this message in context: 
http://www.nabble.com/Parallelizing-Spam-Assassin-tp24751958p24765570.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Parallelizing Spam Assassin

2009-07-31 Thread rich...@buzzhost.co.uk
On Fri, 2009-07-31 at 17:37 -0400, Glenn Sieb wrote:
 LuKreme said the following on 7/31/09 3:27 PM:
  Richard -- please watch your language.   This is a public mailing
  list, and offensive language here is inappropriate.
 
  I dunno, 'gay' isn't that offensive.
 
 
 
 Gay is *not* a synonym for stupid.
 
 I do take offense to the term being used in that manner.
 
 --Glenn
 
I find it deeply offensive that the word 'gay' is used as a synonym for
homosexual in an attempt to stop people from using 'queer' - but hey,
'gays' are not the only ones with opinions that 'matter'.

Gay **is** a synonym for 'stupid' (silly) as far as I am concerned. Its
original meanings of 'carefree', 'happy', 'silly' and 'showy' are clearly
being used with sarcasm. The fact is 'queers' hijacked the word, as per
this:

— USAGE Gay is now a standard term for ‘homosexual’, and is the term
preferred by homosexual men to describe themselves. As a result, it is
now very difficult to use gay in its earlier meanings ‘carefree’ or
‘bright and showy’ without arousing a sense of double entendre. Gay in
its modern sense typically refers to men, lesbian being the standard
term for homosexual women.
http://www.askoxford.com/concise_oed/gay?view=uk

So please *quit* with the sympathetic pink preaching and learn what the
word actually means. Just because it is the term preferred by
homosexual men to describe themselves does not mean a minority have the
right to slate people who use the word properly.

With regards to the dig about Barracuda - this *WAS* OT. There were some
benchmark tests discussed here that were impressive. My experience of SA
in daily production is on Barracuda Appliances that STRUGGLE to
push 6-8 messages a second through, so it was relevant as comparison.
The wording could have been chosen with more care and I apologise to
Christians or dog lovers who found the use of the messiah or female form
offensive. However, the use of gay in a sarcastic context clearly fits
with the original meaning of the word, not the one given to it by that
section of society who have stolen it and made it OT and OM. For that I
make ***NO*** apology. I appreciate that using 'gay' in its real meaning may hurt the
feelings of some 'homosexuals' but as I have to respect their choices
and views, they should show *me* the same respect for *my* views and
choices. You may not like who I am and what I do, I may not like who you
are and what you do.

Now do we need to continue this or throw little tin God banning threats
around more or can we just *get along* knowing we are all different but
frequenting this list for SpamAssassin information?





Re: Parallelizing Spam Assassin

2009-07-31 Thread jdow

From: LuKreme krem...@kreme.com
Sent: Friday, 2009/July/31 12:30



On Jul 31, 2009, at 9:25 AM, John Hardin wrote:

On Fri, 31 Jul 2009, rich...@buzzhost.co.uk wrote:


... dropping in here and making jokes at such low hanging fruit.


Make all the jokes at Barracuda's expense that you like, complain  
about them all you like, just avoid offensive language.


Really? Referring to gay hardware is THAT offensive that someone would  
need to be banned over it?


No, it's the word expensive.

{+_+}


Re: Parallelizing Spam Assassin

2009-07-31 Thread jdow

From: LuKreme krem...@kreme.com
Sent: Friday, 2009/July/31 12:37



On Jul 31, 2009, at 1:33 PM, jdow wrote:

Given that profanity is the effort of a small mind to express itself,
I have a feeling he's going to receive his third and final warning any
time now, Matt


Given that nothing Richard said is anything I haven't heard on,
say, prime time TV or... a committee meeting, I am really curious now
as to what was considered 'obscene'.


I'm quite serious.

Have I stumbled into a list run by religious freaks?


Not me. I can happily go several whole days without hearing the
B word. When I hear it I get B...y.

{^_^}   Joanne


Re: Parallelizing Spam Assassin

2009-07-31 Thread jdow

From: poifgh abhinav.pat...@gmail.com
Sent: Friday, 2009/July/31 19:47




I am sorry, I did not provide any statistics of the machine involved.
CPU - 8 cores with each core 2327 MHz
RAM - 16GB
AFAIR it has a 7200 RPM disk - 2 TB.


With only one disk, you might consider a striped array to get more disk
speed. 50 megabytes per second stresses most disks pretty hard, though not
to the limit. But if there is a lot of seeking involved, as well as multiple
copies of the files being made as they pass through the system, I can
see how it'd be a little rough on the disk throughput.
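To relate a message rate to a disk bandwidth figure like the one above, multiply the rate by the average message size and by the number of times each message is written. A minimal sketch; the function name, the 5 kB average size and the copy count below are hypothetical placeholders, not measurements from this thread.

```python
def disk_mb_per_sec(msgs_per_sec: float, avg_msg_bytes: float, copies: int = 1) -> float:
    """Estimated disk write load in MB/s (1 MB = 1,000,000 bytes).

    copies -- how many times each message is written on its way through
              the system (a hypothetical figure; count your own pipeline).
    """
    return msgs_per_sec * avg_msg_bytes * copies / 1_000_000


# Hypothetical inputs: 50 msgs/sec, 5 kB average message, written 3 times.
print(disk_mb_per_sec(50, 5_000, copies=3))
```

Plugging in the average size of your own corpus shows whether the bottleneck is likely to be raw bandwidth or, as suggested above, seeking.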

{^_^}