Re: Stability of spamassassin command-line tool

2010-07-16 Thread Gnanam

Thank you, all experts, for your valuable ideas and opinions on this topic.
-- 
View this message in context: 
http://old.nabble.com/Stability-of-spamassassin-command-line-tool-tp29171831p29189632.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Stability of spamassassin command-line tool

2010-07-15 Thread Gnanam

Hi,

In continuation of my original posting here:
http://old.nabble.com/SpamAssassin-Integration-ts28903365.html


Gnanam wrote:
 I want to integrate SpamAssassin into my web-based application to test the
 spam score of the email content that our application users wish to send, on
 the mail composing page itself, even before sending.  When I say mail
 composing page here, I do not mean an email client like Outlook, Outlook
 Express, etc., but rather a regular web-based form with an HTML
 editor.
 
 How do I integrate SpamAssassin for the use case explained above?  Links to
 relevant documentation are appreciated.

As I'm integrating the SpamAssassin command-line tool into our web-based
application to test the spam score of email messages, hundreds of application
users may run a spam score test at the same time.

My question is: Will the command-line tool spamassassin or spamc be
stable/reliable enough to test hundreds of different email messages at the
same time?

Experts' ideas, advice, opinions and comments are appreciated.

Regards,
Gnanam



Re: Stability of spamassassin command-line tool

2010-07-15 Thread Martin Gregorie
On Thu, 2010-07-15 at 04:31 -0700, Gnanam wrote:
 As I'm integrating SpamAssassin command-line tool in our web-based
 application to test spam score of the email message, hundreds of application
 Users may perform spam score test at the same time.
 
I'd say suck it and see initially, with your web app calling spamc to
pass messages to spamd - that will give the fastest per-message
processing time because there are no startup/teardown overheads for each
message.
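
A minimal sketch of that call path, assuming spamd is already running, spamc
is on the PATH, and the web app hands over a complete raw message (headers
plus body). It relies on `spamc -c`, which prints `score/threshold` and exits
with status 1 when the message is classed as spam. The function names are
illustrative only.

```python
import subprocess
from typing import Tuple


def parse_spamc_check(output: str) -> Tuple[float, float]:
    """Parse the 'score/threshold' line printed by `spamc -c`."""
    score, threshold = output.strip().split("/", 1)
    return float(score), float(threshold)


def check_message(raw_message: bytes, timeout: float = 30.0) -> Tuple[float, float, bool]:
    """Pipe one raw message through spamc; return (score, threshold, is_spam).

    Assumption: spamd is reachable with spamc's default settings.
    """
    proc = subprocess.run(
        ["spamc", "-c"],
        input=raw_message,
        stdout=subprocess.PIPE,
        timeout=timeout,
        check=False,  # exit status 1 means "spam", not a failed call
    )
    score, threshold = parse_spamc_check(proc.stdout.decode("ascii", "replace"))
    return score, threshold, proc.returncode == 1
```

Note that spamc reports `0/0` when it cannot get a result, so a caller would
probably want to treat a zero threshold as "check failed" rather than "clean".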

If the service gets as much use as you're guessing, concurrent requests
will collide, causing slower responses. If this can't be sufficiently
minimised by adjusting the spamd child process population, you can
always change the web app to accept and queue messages to be checked and
e-mail the results back to the submitter.
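
The queueing idea can be sketched as a bounded worker pool: submissions wait
in a queue and at most as many checks run at once as spamd has children. This
is only an illustration, assuming a `scan_one` callable supplied by the web
app (for example a wrapper around spamc); `MAX_SPAMD_CHILDREN` and `scan_all`
are hypothetical names, and mailing the results back is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# Assumption: keep this equal to the child limit spamd was started with.
MAX_SPAMD_CHILDREN = 5


def scan_all(messages: Iterable[bytes],
             scan_one: Callable[[bytes], float],
             workers: int = MAX_SPAMD_CHILDREN) -> List[float]:
    """Scan messages with at most `workers` checks in flight.

    Messages beyond that limit wait in the executor's internal queue.
    Results come back in the same order as the input messages.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scan_one, messages))
```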

 My question is: Will the command-line tool spamassassin or spamc be
 stable/reliable enough to test hundreds of different email messages at the
 same time?
 
It's reliable enough, but concurrency will be limited by the number of
child processes you allow spamd to run - on normal MTAs this limit is in
single or low double figures. To allow 'hundreds' of simultaneous tests
you'd have to launch a copy of spamassassin as each message is
submitted, which means also providing enough hardware to run 'hundreds'
of copies of spamassassin in parallel.
  
 Experts ideas/advice/opinions/comments are appreciated.
 
Take a good look at your existing SA installation(s) and scale the mail
checking installation accordingly: it's just a normal hardware sizing
exercise based around an application that requires fairly significant
resources (memory and CPU) to process each submitted message.
 

Martin




Re: Stability of spamassassin command-line tool

2010-07-15 Thread Karsten Bräckelmann
On Thu, 2010-07-15 at 04:31 -0700, Gnanam wrote:
  I want to integrate SpamAssassin in my web-based application to test spam
  score of the email content that our application User's wish to send in
  mail composing page itself - even before sending.

 As I'm integrating SpamAssassin command-line tool in our web-based
 application to test spam score of the email message, hundreds of application
 Users may perform spam score test at the same time.
 
 My question is: Will the command-line tool spamassassin or spamc be
 stable/reliable enough to test hundreds of different email messages at the
 same time?

No stability concerns with either.

However, with anything other than a trivial load, do not use the plain
spamassassin script, but the spamd daemon with the light-weight spamc
client. The daemon is much faster and consumes fewer resources, because
SA does not have to compile all rules and start a full Perl process each
time -- unlike the spamassassin script, which does.


However, you will most likely *not* be able to scan hundreds of
messages at the same time. Your machine simply doesn't have the
resources for that.

Do you really expect them to be scanned simultaneously, at the very same
moment!? A continuous, steady stream of a hundred messages every few
seconds? Or are you referring to "same time" in the human sense,
actually covering minutes or even more?

Depending on your hardware and rules / DNS BL optimization for your
specific case, scanning a message should take less than a few seconds,
while some of them actually can be processed simultaneously. Think RAM.
So yes, hundreds of messages per minute definitely is possible.


But these are human generated messages, right? Exactly how many monkeys
do you have typing, to expect anything even close to that throughput?

I guess I'm still not clear on your actual intention and environment.
I hope you understand that the receiver is likely to perform his own
spam filtering, with criteria differing from yours. In particular, all
those DNS BL and reputation based tests you simply cannot perform before
sending the mail.


-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Stability of spamassassin command-line tool

2010-07-15 Thread Emin Akbulut
Testing hundreds of different email messages at the same time is a bit
excessive: RAM usage, hard disk I/O bottlenecks, etc. In my case, if there
are more than 16 threads, the server may become non-responsive because
virtual memory usage gets too high.
SA is not a CPU-hungry application, but it uses quite a lot of memory,
especially spamassassin.exe (50 MB on average and about 5 seconds per
session).







Re: Stability of spamassassin command-line tool

2010-07-15 Thread Gnanam

I'm posting a reply which I received from Emin Akbulut here:

Testing hundreds of different email messages at the same time is a bit
excessive: RAM usage, hard disk I/O bottlenecks, etc. In my case, if there
are more than 16 threads, the server may become non-responsive because
virtual memory usage gets too high.
SA is not a CPU-hungry application, but it uses quite a lot of memory,
especially spamassassin.exe (50 MB avg. per session).




Re: Stability of spamassassin command-line tool

2010-07-15 Thread Emin Akbulut
Oops, sorry. I use Gmail; it threads messages well, but when I hit Reply
the message is sent only to the last person on the thread. I have to modify
the To: field. :)


On Thu, Jul 15, 2010 at 3:51 PM, Gnanam gna...@zoniac.com wrote:


 I'm posting a reply which I received from Emin Akbulut here:




Re: Stability of spamassassin command-line tool

2010-07-15 Thread Gnanam


Martin Gregorie-2 wrote:
 
 It's reliable enough, but concurrency will be limited by the number of
 child processes you allow spamd to run - on normal MTAs this limit is in
 single or low double figures. To allow 'hundreds' of simultaneous tests
 you'd have to launch a copy of spamassassin as each message is
 submitted, which means also providing enough hardware to run 'hundreds'
 of copies of spamassassin in parallel.

Where do I limit/configure the number of child processes that spamd can run?
Can you point me to the documentation for this?
Can you tell me the normal limit used on a typical MTA?


Martin Gregorie-2 wrote:
 
 Take a good look at your existing SA installation(s) and scale the mail
 checking installation accordingly: it's just a normal hardware sizing
 exercise based round an application that requires fairly significant
 resources (memory and cpu) to process each submitted message.

If you were in my place, what would you recommend I check in the
SA installation?




Re: Stability of spamassassin command-line tool

2010-07-15 Thread Karsten Bräckelmann
On Thu, 2010-07-15 at 06:09 -0700, Gnanam wrote:
 Martin Gregorie wrote:
  It's reliable enough, but concurrency will be limited by the number of
  child processes you allow spamd to run - on normal MTAs this limit is in
  single or low double figures. To allow 'hundreds' of simultaneous tests
  you'd have to launch a copy of spamassassin as each message is
  submitted, which means also providing enough hardware to run 'hundreds'
  of copies of spamassassin in parallel.
 
 Where do I limit/configure the number of child processes that spamd can run? 
 Can you provide me documentation link for the same?

man spamd




Re: Stability of spamassassin command-line tool

2010-07-15 Thread Jari Fredriksson
On 15.7.2010 16:09, Gnanam wrote:
 
 Where do I limit/configure the number of child processes that spamd can run? 
 Can you provide me documentation link for the same?
 Can you share with me the normal limit imposed by a typical MTA?
 

It depends. If you are using *nix, it depends on the *nix flavour or
Linux distribution. For example, on Debian it is in /etc/default/spamassassin.

My config: default --max-children

---(8)---
# NOTE: version 3.0.x has switched to a preforking model, so you
# need to make sure --max-children is not set to anything higher than
# 5, unless you know what you're doing.

OPTIONS="--siteconfigpath=/etc/spamassassin --nouser-config -q \
 -i 0.0.0.0 -A 127.0.0.1,10.0.0.0/8 -u spam -l --allow-tell"

---(8)---

The config location can be seen in /etc/init.d/spamassassin (or spamd),
where it reads the config settings. The settings may be in that script
itself, as well.


-- 
http://www.iki.fi/jarif/
I use PGP. If there is an incompatibility problem with your mail
client, please contact me.

You will be audited by the Internal Revenue Service.





Re: Stability of spamassassin command-line tool

2010-07-15 Thread Gnanam


Karsten Bräckelmann-2 wrote:
 
 No stability concerns with either.
 
 However, with anything other than a trivial load, do not use the plain
 spamassassin script, but the spamd daemon with the light-weight spamc
 client. The daemon is much faster and consumes less resources, because
 SA does not have to compile all rules and start a full Perl process each
 time -- unlike the spamassassin script, which does.

Thanks for helping me understand this important and critical difference.  But
why, then, does the spamassassin script exist? I ask just for my own
understanding.


Karsten Bräckelmann-2 wrote:
 
 However, you will most likely *not* be able to scan hundreds of
 messages at the same time. Your machine simply doesn't have the
 resources for that.
 
 Do you really expect them to be scanned simultaneously, at the very same
 moment!? A continuous, steady stream of a hundred messages every few
 seconds? Or are you referring to same time as in a human meaning,
 actually covering minutes or even more.

We usually benchmark any software/tool that is integrated within our
web-based application for a maximum of 100 concurrent users.  As you
rightly pointed out, requests may be a few seconds or minutes apart.


Karsten Bräckelmann-2 wrote:
 
 But these are human generated messages, right? Exactly how many monkeys
 do you have typing, to expect anything even close to that throughput?

I hope my reply just above answers this.


Karsten Bräckelmann-2 wrote:
 
 I guess I'm still not clear on your actual intention and environment.
 Hope you do understand, that the receiver is likely to perform his own
 spam filtering, with criteria differing from yours. In particular, all
 those DNS BL and reputation based tests you simply cannot perform before
 sending the mail.

Yes, I totally understand and agree with your points.  To improve email
deliverability for our application users, we let them test the spam score of
the email content they wish to send.  Hence, it's just a spam score test
purely on the email content, from the sender's point of view.  I hope this
makes things clear.



Re: Stability of spamassassin command-line tool

2010-07-15 Thread Daniel Lemke


Gnanam wrote:
 
 
 Karsten Bräckelmann-2 wrote:
 
 No stability concerns with either.
 
 However, with anything other than a trivial load, do not use the plain
 spamassassin script, but the spamd daemon with the light-weight spamc
 client. The daemon is much faster and consumes less resources, because
 SA does not have to compile all rules and start a full Perl process each
 time -- unlike the spamassassin script, which does.
 
 Thanks for making me understand this important and critical difference. 
 But why then spamassassin script should exist - just for my understanding?
 
 


As already mentioned, spamd needs a lot of memory and runs as a daemon,
so it uses some of your system resources all the time. No need for
that if you're only receiving a few mails per day (and for this, we've got
SpamAssassin).



Re: Stability of spamassassin command-line tool

2010-07-15 Thread Karsten Bräckelmann
On Thu, 2010-07-15 at 06:40 -0700, Gnanam wrote:
 Karsten Bräckelmann wrote:
  No stability concerns with either.
  
  However, with anything other than a trivial load, do not use the plain
  spamassassin script, but the spamd daemon with the light-weight spamc
  client. The daemon is much faster and consumes less resources, because
  SA does not have to compile all rules and start a full Perl process each
  time -- unlike the spamassassin script, which does.
 
 Thanks for making me understand this important and critical difference.  But
 why then spamassassin script should exist - just for my understanding?

One particularly good reason would be rule development, or even simple
configuration changes -- and the ability to test and lint them, without
restarting the daemon. A lint error in that case can have quite drastic
consequences. There are more reasons.
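
A minimal sketch of that lint-first workflow, assuming the spamassassin
script is on the PATH and that restarting spamd is handled separately by your
own deployment. It uses `spamassassin --lint`, which exits with 0 when the
rules and configuration are clean and with a non-zero status otherwise.
`rules_are_clean` and its `run` parameter are hypothetical names.

```python
import subprocess
from typing import Callable, Sequence


def rules_are_clean(run: Callable[[Sequence[str]], int] = subprocess.call) -> bool:
    """Return True when `spamassassin --lint` reports no errors.

    `run` must execute the given argv and return its exit status;
    it is injectable so the check can be exercised without SA installed.
    """
    return run(["spamassassin", "--lint"]) == 0
```

A deployment script would call this after editing rules and only restart
spamd when it returns True.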


 Karsten Bräckelmann wrote:
  However, you will most likely *not* be able to scan hundreds of
  messages at the same time. Your machine simply doesn't have the
  resources for that.
  
  Do you really expect them to be scanned simultaneously, at the very same
  moment!? A continuous, steady stream of a hundred messages every few
  seconds? Or are you referring to same time as in a human meaning,
  actually covering minutes or even more.
 
 We usually benchmark any software/tool that is integrated within our
 web-based application for a maximum of 100 concurrent users.  As you'd
 pointed rightly, there may be few seconds/minutes apart between each
 requests.

100 concurrent users. How long does it take each user to write the
message on average? That's the time span you want to process the
messages in. Assuming the users spend minutes writing, while scanning
takes seconds, the actual load is almost entirely serial. No
concurrency. After all, there are 60 seconds in a minute...

Rule of thumb: 2 minutes writing, 2 seconds scanning. You can process 60
messages in that time with a single SA child.
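
The same rule of thumb as arithmetic. This is a rough sketch, assuming each
user submits one message per writing period and submissions are spread evenly
over time; it estimates average load only and ignores bursts. The function
name and parameters are illustrative, not part of SpamAssassin.

```python
import math


def children_needed(users: int, writing_seconds: float, scanning_seconds: float) -> int:
    """Estimate how many spamd children are kept busy on average."""
    arrival_rate = users / writing_seconds          # messages per second
    offered_load = arrival_rate * scanning_seconds  # average number of busy children
    return max(1, math.ceil(offered_load))
```

With 2 minutes of writing and 2 seconds of scanning, `children_needed(60, 120, 2)`
gives 1 and `children_needed(100, 120, 2)` gives 2.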

Not even close to how I understood your original question. Something
like that is possible even with very moderate hardware. And I kind of
hope your users spend more than 2 minutes per message... ;)





Re: Stability of spamassassin command-line tool

2010-07-15 Thread Karsten Bräckelmann
On Thu, 2010-07-15 at 07:02 -0700, Daniel Lemke wrote:
  Thanks for making me understand this important and critical difference. 
  But why then spamassassin script should exist - just for my understanding?
 
 Like already mentioned, Spamd needs a lot of memory and runs as a Daemon,
 therefore it uses some of your system resources all the time. No need for
 that if you're only receiving a few mails per day (and for this, we've got
 SpamAssassin).

Heh, that came out wrong. :)

Nitpicking. You meant spamassassin there (as in the script's name).
SpamAssassin is much more, includes the daemon, and we do *not* have it
just for a few mails per day. ;)





Re: Stability of spamassassin command-line tool

2010-07-15 Thread Daniel Lemke


Karsten Bräckelmann-2 wrote:
 
 On Thu, 2010-07-15 at 07:02 -0700, Daniel Lemke wrote:
  Thanks for making me understand this important and critical difference. 
  But why then spamassassin script should exist - just for my
 understanding?
 
 Like already mentioned, Spamd needs a lot of memory and runs as a Daemon,
 therefore it uses some of your system resources all the time. No need for
 that if you're only receiving a few mails per day (and for this, we've
 got
 SpamAssassin).
 
 Heh, that came out wrong. :)
 
 Nitpicking. You meant spamassassin there (as in the script's name).
 SpamAssassin is much more, includes the daemon, and we do *not* have it
 just for a few mails per day. ;)
 


Errm, sorry for that :P