Re: Stability of spamassassin command-line tool
Thank you all, experts, for your valuable ideas and opinions on this topic. -- View this message in context: http://old.nabble.com/Stability-of-spamassassin-command-line-tool-tp29171831p29189632.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Stability of spamassassin command-line tool
Hi, As a follow-up to my original posting here, http://old.nabble.com/SpamAssassin-Integration-ts28903365.html Gnanam wrote: I want to integrate SpamAssassin into my web-based application to test the spam score of the email content that our application's users wish to send, on the mail composing page itself, even before sending. By mail composing page I don't mean an email client like Outlook or Outlook Express, but a regular web-based form with an HTML editor. How do I integrate SpamAssassin for the use case explained above? Links to relevant documentation are appreciated. As I'm integrating the SpamAssassin command-line tool into our web-based application to test the spam score of email messages, hundreds of application users may perform a spam score test at the same time. My question is: will the command-line tool spamassassin or spamc be stable/reliable enough to test hundreds of different email messages at the same time? Experts' ideas, advice, opinions and comments are appreciated. Regards, Gnanam
Re: Stability of spamassassin command-line tool
On Thu, 2010-07-15 at 04:31 -0700, Gnanam wrote: As I'm integrating the SpamAssassin command-line tool into our web-based application to test the spam score of email messages, hundreds of application users may perform a spam score test at the same time. I'd say suck it and see initially, with your web app calling spamc to pass messages to spamd: that will give the fastest per-message processing time because there are no startup/teardown overheads for each message. If the service gets as much use as you're guessing, concurrent requests will collide, causing slower responses. If this can't be sufficiently minimised by adjusting the spamd child process population, you can always change the web app to accept and queue messages to be checked and e-mail the results back to the submitter. My question is: will the command-line tool spamassassin or spamc be stable/reliable enough to test hundreds of different email messages at the same time? It's reliable enough, but concurrency will be limited by the number of child processes you allow spamd to run; on normal MTAs this limit is in single or low double figures. To allow 'hundreds' of simultaneous tests you'd have to launch a copy of spamassassin as each message is submitted, which means also providing enough hardware to run 'hundreds' of copies of spamassassin in parallel. Experts' ideas, advice, opinions and comments are appreciated. Take a good look at your existing SA installation(s) and scale the mail checking installation accordingly: it's just a normal hardware sizing exercise around an application that requires fairly significant resources (memory and CPU) to process each submitted message. Martin
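Martin's fallback of accepting and queueing checks, with concurrency capped at roughly the spamd child count, could look something like the Python sketch below. It assumes a `scan` callable that scores one raw message (for example by piping it through spamc); `ScanQueue` and `MAX_CONCURRENT_SCANS` are hypothetical names, and the value 5 is only a placeholder that should be kept in line with whatever --max-children you give spamd.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

# Placeholder: keep this at or below the --max-children value given to spamd,
# so extra requests wait in this queue instead of piling up at spamd.
MAX_CONCURRENT_SCANS = 5


class ScanQueue:
    """Hypothetical helper: queue score requests, run at most max_workers at once."""

    def __init__(self, scan: Callable[[bytes], float],
                 max_workers: int = MAX_CONCURRENT_SCANS) -> None:
        self._scan = scan
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak_in_flight = 0  # handy when sizing the spamd child pool

    def _run(self, raw_message: bytes) -> float:
        # Track how many scans are running right now, then do the real work
        with self._lock:
            self._in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
        try:
            return self._scan(raw_message)
        finally:
            with self._lock:
                self._in_flight -= 1

    def submit(self, raw_message: bytes) -> "Future[float]":
        # Returns immediately; the caller can wait on the Future or attach a
        # callback, e.g. one that e-mails the score back to the submitter.
        return self._pool.submit(self._run, raw_message)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

A web request handler can either wait on the returned Future when the queue is short, or return at once and deliver the score later, as Martin suggests for heavier load.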
Re: Stability of spamassassin command-line tool
On Thu, 2010-07-15 at 04:31 -0700, Gnanam wrote: I want to integrate SpamAssassin into my web-based application to test the spam score of the email content that our application's users wish to send, on the mail composing page itself, even before sending. As I'm integrating the SpamAssassin command-line tool into our web-based application to test the spam score of email messages, hundreds of application users may perform a spam score test at the same time. My question is: will the command-line tool spamassassin or spamc be stable/reliable enough to test hundreds of different email messages at the same time? No stability concerns with either. However, with anything other than a trivial load, do not use the plain spamassassin script, but the spamd daemon with the lightweight spamc client. The daemon is much faster and consumes fewer resources, because SA does not have to compile all rules and start a full Perl process each time, unlike the spamassassin script, which does. However, you will most likely *not* be able to scan hundreds of messages at the same time. Your machine simply doesn't have the resources for that. Do you really expect them to be scanned simultaneously, at the very same moment!? A continuous, steady stream of a hundred messages every few seconds? Or are you referring to "same time" in a human sense, actually covering minutes or even more? Depending on your hardware and rules / DNSBL optimization for your specific case, scanning a message should take less than a few seconds, and some messages can actually be processed simultaneously. Think RAM. So yes, hundreds of messages per minute definitely is possible. But these are human-generated messages, right? Exactly how many monkeys do you have typing, to expect anything even close to that throughput? I guess I'm still not clear on your actual intention and environment. I hope you do understand that the receiver is likely to perform his own spam filtering, with criteria differing from yours.
In particular, all those DNSBL and reputation-based tests you simply cannot perform before sending the mail.
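For the spamd + spamc route Karsten recommends, the web app only has to pipe each message into spamc. The Python sketch below assumes spamd is already running where spamc connects by default (localhost, port 783) and that the compose form has already been turned into a complete raw message (headers plus body). It uses `spamc -c`, which prints a `score/threshold` pair and exits with 1 for spam and 0 otherwise (including when the check failed, in which case it prints 0/0). The function names are hypothetical.

```python
import subprocess


def parse_spamc_check(output: str) -> tuple[float, float]:
    """Parse the 'score/threshold' line printed by `spamc -c`, e.g. '4.3/5.0'."""
    line = output.strip()
    score_text, sep, threshold_text = line.partition("/")
    if not sep:
        raise ValueError(f"unexpected spamc output: {line!r}")
    score, threshold = float(score_text), float(threshold_text)
    # spamc prints 0/0 when the check could not be performed
    if score == 0 and threshold == 0:
        raise ValueError("spamc could not check the message (0/0)")
    return score, threshold


def check_message(raw_message: bytes, timeout: float = 30.0) -> tuple[float, float]:
    """Hypothetical helper: pipe one complete message through spamc to spamd."""
    result = subprocess.run(
        ["spamc", "-c"],
        input=raw_message,
        capture_output=True,
        timeout=timeout,
    )
    # With -c, exit status 1 means spam and 0 means not spam (or a failure,
    # which the 0/0 check in the parser catches); anything else is an error.
    if result.returncode not in (0, 1):
        raise RuntimeError(f"spamc exited with status {result.returncode}")
    return parse_spamc_check(result.stdout.decode("ascii", errors="replace"))
```

Passing the message on stdin avoids temporary files, and the subprocess timeout keeps a slow or overloaded spamd from hanging a web request indefinitely.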
Re: Stability of spamassassin command-line tool
Testing hundreds of different email messages at the same time is a bit excessive: RAM usage, hard disk I/O bottlenecks, etc. In my case, with more than 16 threads the server may become unresponsive because virtual memory usage gets too high. SA is not a CPU-hungry application, but it uses quite a lot of memory, especially spamassassin.exe (about 50 MB and 5 seconds per session on average).
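Emin's figures allow a quick back-of-the-envelope sizing check. The Python sketch below just multiplies them out; it assumes every concurrent scan costs a full ~50 MB and ~5 seconds, as he reports for spamassassin.exe, so the results are rough estimates for that setup, not measurements of spamd. `estimate` is a hypothetical helper.

```python
def estimate(concurrent_scans: int, mb_per_scan: float,
             seconds_per_scan: float) -> tuple[float, float]:
    """Return (memory in MB, messages per minute) for a fixed number of parallel scans."""
    memory_mb = concurrent_scans * mb_per_scan
    # Each scan slot finishes 60 / seconds_per_scan messages per minute
    messages_per_minute = concurrent_scans * 60 / seconds_per_scan
    return memory_mb, messages_per_minute


# Emin's numbers: about 50 MB and 5 seconds per run
print(estimate(16, 50, 5))   # (800, 192.0)
print(estimate(100, 50, 5))  # (5000, 1200.0)
```

Under these assumptions, Emin's limit of 16 parallel scans already corresponds to roughly 800 MB of memory and about 192 messages per minute, while 100 truly simultaneous scans would need on the order of 5 GB.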
Re: Stability of spamassassin command-line tool
I'm posting here a reply I received from Emin Akbulut: Testing hundreds of different email messages at the same time is a bit excessive: RAM usage, hard disk I/O bottlenecks, etc. In my case, with more than 16 threads the server may become unresponsive because virtual memory usage gets too high. SA is not a CPU-hungry application, but it uses quite a lot of memory, especially spamassassin.exe (about 50 MB per session on average).
Re: Stability of spamassassin command-line tool
Oops, sorry. I use Gmail; it threads messages well, but when I hit Reply, the message goes only to the last person in the thread. I have to modify the To: field. :)
Re: Stability of spamassassin command-line tool
Martin Gregorie-2 wrote: It's reliable enough, but concurrency will be limited by the number of child processes you allow spamd to run; on normal MTAs this limit is in single or low double figures. To allow 'hundreds' of simultaneous tests you'd have to launch a copy of spamassassin as each message is submitted, which means also providing enough hardware to run 'hundreds' of copies of spamassassin in parallel. Where do I limit/configure the number of child processes that spamd can run? Can you point me to the documentation for this? Can you also tell me the normal limit used on a typical MTA? Martin Gregorie-2 wrote: Take a good look at your existing SA installation(s) and scale the mail checking installation accordingly: it's just a normal hardware sizing exercise around an application that requires fairly significant resources (memory and CPU) to process each submitted message. If you were in my place, what would you recommend I check in the SA installation?
Re: Stability of spamassassin command-line tool
On Thu, 2010-07-15 at 06:09 -0700, Gnanam wrote: Martin Gregorie wrote: It's reliable enough, but concurrency will be limited by the number of child processes you allow spamd to run; on normal MTAs this limit is in single or low double figures. To allow 'hundreds' of simultaneous tests you'd have to launch a copy of spamassassin as each message is submitted, which means also providing enough hardware to run 'hundreds' of copies of spamassassin in parallel. Where do I limit/configure the number of child processes that spamd can run? Can you point me to the documentation for this? man spamd
Re: Stability of spamassassin command-line tool
On 15.7.2010 16:09, Gnanam wrote: Where do I limit/configure the number of child processes that spamd can run? Can you point me to the documentation for this? Can you also tell me the normal limit used on a typical MTA? It depends. On *nix it depends on the *nix variant or Linux distribution. For example, in Debian it is in /etc/default/spamassassin. My config (default --max-children):
---8<---
# NOTE: version 3.0.x has switched to a preforking model, so you
# need to make sure --max-children is not set to anything higher than
# 5, unless you know what you're doing.
OPTIONS="--siteconfigpath=/etc/spamassassin --nouser-config -q \
 -i 0.0.0.0 -A 127.0.0.1,10.0.0.0/8 -u spam -l --allow-tell"
---8<---
The config location can be seen in /etc/init.d/spamassassin (or spamd), where the config settings are read. The settings may also be in that script itself. -- http://www.iki.fi/jarif/
Re: Stability of spamassassin command-line tool
Karsten Bräckelmann-2 wrote: No stability concerns with either. However, with anything other than a trivial load, do not use the plain spamassassin script, but the spamd daemon with the lightweight spamc client. The daemon is much faster and consumes fewer resources, because SA does not have to compile all rules and start a full Perl process each time, unlike the spamassassin script, which does. Thanks for helping me understand this important and critical difference. Just for my understanding: why does the spamassassin script exist, then? Karsten Bräckelmann-2 wrote: However, you will most likely *not* be able to scan hundreds of messages at the same time. Your machine simply doesn't have the resources for that. Do you really expect them to be scanned simultaneously, at the very same moment!? A continuous, steady stream of a hundred messages every few seconds? Or are you referring to "same time" in a human sense, actually covering minutes or even more? We usually benchmark any software/tool integrated into our web-based application for a maximum of 100 concurrent users. As you rightly pointed out, there may be a few seconds or minutes between requests. Karsten Bräckelmann-2 wrote: But these are human-generated messages, right? Exactly how many monkeys do you have typing, to expect anything even close to that throughput? I hope my reply just above answers this. Karsten Bräckelmann-2 wrote: I guess I'm still not clear on your actual intention and environment. I hope you do understand that the receiver is likely to perform his own spam filtering, with criteria differing from yours. In particular, all those DNSBL and reputation-based tests you simply cannot perform before sending the mail. Yes, I totally understand and agree with your points. To improve email deliverability, our application's users can themselves test the spam score of the email content they wish to send.
Hence, it's purely a spam score test on the email content, from the sender's point of view. I hope this makes things clear.
Re: Stability of spamassassin command-line tool
Gnanam wrote: Karsten Bräckelmann-2 wrote: No stability concerns with either. However, with anything other than a trivial load, do not use the plain spamassassin script, but the spamd daemon with the lightweight spamc client. The daemon is much faster and consumes fewer resources, because SA does not have to compile all rules and start a full Perl process each time, unlike the spamassassin script, which does. Thanks for helping me understand this important and critical difference. Just for my understanding: why does the spamassassin script exist, then? As already mentioned, spamd needs a lot of memory and runs as a daemon, so it uses some of your system resources all the time. There's no need for that if you're only receiving a few mails per day (and for this, we've got SpamAssassin).
Re: Stability of spamassassin command-line tool
On Thu, 2010-07-15 at 06:40 -0700, Gnanam wrote: Karsten Bräckelmann wrote: No stability concerns with either. However, with anything other than a trivial load, do not use the plain spamassassin script, but the spamd daemon with the lightweight spamc client. The daemon is much faster and consumes fewer resources, because SA does not have to compile all rules and start a full Perl process each time, unlike the spamassassin script, which does. Thanks for helping me understand this important and critical difference. Just for my understanding: why does the spamassassin script exist, then? One particularly good reason would be rule development, or even simple configuration changes, and the ability to test and lint them without restarting the daemon. A lint error in that case can have quite drastic consequences. There are more reasons. Karsten Bräckelmann wrote: However, you will most likely *not* be able to scan hundreds of messages at the same time. Your machine simply doesn't have the resources for that. Do you really expect them to be scanned simultaneously, at the very same moment!? A continuous, steady stream of a hundred messages every few seconds? Or are you referring to "same time" in a human sense, actually covering minutes or even more? We usually benchmark any software/tool integrated into our web-based application for a maximum of 100 concurrent users. As you rightly pointed out, there may be a few seconds or minutes between requests. 100 concurrent users. How long does it take each user to write a message, on average? That's the time span you want to process the messages in. Assuming the users spend minutes writing, while scanning takes seconds, the actual load is almost entirely serial. No concurrency. After all, there are 60 seconds in a minute... Rule of thumb: 2 minutes writing, 2 seconds scanning. You can process 60 messages with a single SA child. Not even close to how I understood your original question.
Something like that is possible even with very moderate hardware. And I kind of hope your users spend more than 2 minutes per message... ;)
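Karsten's rule of thumb can be written out as a small calculation. This Python sketch assumes each user submits one message per writing period and that submissions are spread evenly over that period; `children_needed` is a hypothetical helper, and real traffic will be burstier, so leave some headroom.

```python
import math


def children_needed(users: int, writing_seconds: float, scan_seconds: float) -> int:
    """spamd children needed if every user submits one message per writing period."""
    # One child can complete writing_seconds / scan_seconds scans per writing period
    scans_per_child = writing_seconds / scan_seconds
    return math.ceil(users / scans_per_child)


# Rule of thumb from the thread: 2 minutes writing, 2 seconds scanning
print(120 / 2)                       # 60.0 messages per child
print(children_needed(100, 120, 2))  # 2
```

With Gnanam's benchmark of 100 concurrent users, this comes out at two children, well within the single-figure child counts mentioned earlier in the thread.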
Re: Stability of spamassassin command-line tool
On Thu, 2010-07-15 at 07:02 -0700, Daniel Lemke wrote: Thanks for helping me understand this important and critical difference. Just for my understanding: why does the spamassassin script exist, then? As already mentioned, spamd needs a lot of memory and runs as a daemon, so it uses some of your system resources all the time. There's no need for that if you're only receiving a few mails per day (and for this, we've got SpamAssassin). Heh, that came out wrong. :) Nitpicking: you meant spamassassin there (as in the script's name). SpamAssassin is much more, includes the daemon, and we do *not* have it just for a few mails per day. ;)
Re: Stability of spamassassin command-line tool
Karsten Bräckelmann-2 wrote: On Thu, 2010-07-15 at 07:02 -0700, Daniel Lemke wrote: Thanks for helping me understand this important and critical difference. Just for my understanding: why does the spamassassin script exist, then? As already mentioned, spamd needs a lot of memory and runs as a daemon, so it uses some of your system resources all the time. There's no need for that if you're only receiving a few mails per day (and for this, we've got SpamAssassin). Heh, that came out wrong. :) Nitpicking: you meant spamassassin there (as in the script's name). SpamAssassin is much more, includes the daemon, and we do *not* have it just for a few mails per day. ;) Errm, sorry for that :P