On Wednesday, March 16, 2005, 9:01:34 AM, Nick wrote:

NM> Pete

NM> OK, I now have much more information on this problem with
NM> Declude/Sniffer/SmarterMail.

NM> It seems the current version of Declude does not have an Overflow Directory
NM> for SmarterMail, which therefore allows unlimited Declude processes to be
NM> spawned at any time. At our peak we were seeing a surge of more than 1,000
NM> declude.exe instances running at the same time! This of course flattened the
NM> server, and seems the reason why Sniffer was dropping out of its perpetual
NM> mode, unfortunately compounding the problem when the server had least
NM> resources.

Thanks for this --- I think this proves one of my theories on the
problem.

This is what I think happened to SNF: a burst of 1000 or more active
requests arrives, which is more than the server can really handle at
one time.

The persistent server queues (internally) all of the jobs and begins
processing them all at once. As a result, it does not report to the
.stat file for an extended period.

Also, due to the very large number of jobs, many of the client
instances do not hear back from the server instance until their
maximum wait time has expired. As a result, they abandon the wait and
begin processing the messages themselves, each one now loading the
rulebase.
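
To make the client side of this concrete, here is a rough sketch in
Python. It is an illustration only, not SNF's actual code: the names
scan_message, replies, load_rulebase and MAX_WAIT_SECONDS are all
hypothetical, and a queue stands in for whatever channel the client
really uses to hear back from the persistent instance.

```python
import queue

# Hypothetical value; the real maximum wait is whatever SNF is configured with.
MAX_WAIT_SECONDS = 0.05


def scan_message(message, replies, load_rulebase, max_wait=MAX_WAIT_SECONDS):
    """Wait for the persistent server's answer; on timeout, scan locally.

    replies       -- queue.Queue the server would post its result to (assumed)
    load_rulebase -- callable returning a scanner function (assumed); this is
                     the expensive step each timed-out client ends up repeating
    """
    try:
        # Normal path: the persistent instance answers in time.
        return ("server", replies.get(timeout=max_wait))
    except queue.Empty:
        # Timeout path: abandon the wait and process the message ourselves,
        # paying for a full rulebase load in this client.
        scanner = load_rulebase()
        return ("local", scanner(message))
```

With 1000 clients all taking the timeout path at once, that is 1000
separate rulebase loads on a server that is already overloaded.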

Each client loading the rulebase individually slows the server further
and accelerates the problem until an insurmountable backlog is in
place.

Once the backlog is cleared (requiring manual intervention) the
persistent instance (which has not died) is able to respond to client
requests quickly enough so that they do not "give up" and so the
system operates normally.

---

If this fits the profile (and I think it does), then it would be
possible to give SNF an alternate fail-safe mode which would solve
the problem at the expense of letting more spam through.
Specifically, once a particular threshold is reached, the clients
would abandon a job and fail safe rather than loading the rulebase and
processing the message. This would cause spam to "leak", but it would
also provide for a more rapid, automatic recovery.

Think of it as a safety pressure valve on a boiler.

It's not a great solution, but it's better than an "explosion".
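
A minimal sketch of how such a safety valve might look, again in
Python and purely as an illustration. Nothing here is an existing SNF
option: FAIL_SAFE, pending_jobs and threshold are hypothetical, and
how the load would actually be measured (for example, a count of jobs
waiting on the server) is an assumption.

```python
import queue

# Hypothetical result meaning "not scanned, let the message through".
FAIL_SAFE = "pass"


def scan_with_safety_valve(message, replies, load_rulebase, pending_jobs,
                           threshold=100, max_wait=0.05):
    """Like the normal client, but fail safe under heavy load.

    pending_jobs -- assumed measure of current backlog (how it would be
                    obtained is not specified here)
    threshold    -- assumed cut-off above which clients stop scanning locally
    """
    try:
        # Normal path: the persistent instance answers in time.
        return replies.get(timeout=max_wait)
    except queue.Empty:
        if pending_jobs >= threshold:
            # Valve open: abandon the job without loading the rulebase.
            # Spam may leak, but the server is not loaded any further.
            return FAIL_SAFE
        # Light load: fall back to a local scan as before.
        return load_rulebase()(message)
```

The point of the design is that the expensive fallback is only taken
when the server can afford it; above the threshold a timed-out client
costs almost nothing, so the backlog can drain on its own.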

The best answer is proper overflow control, and since Declude is
working on that, I'm inclined to let that solution develop. Not only
is it a better solution, but the "safety valve" mechanism on SNF
doesn't solve the whole problem: for example, virus scanning would
still potentially cause the same problem, along with other things
that might be running in Declude.

Thanks Nick!

_M




This E-Mail came from the Message Sniffer mailing list. For information and 
(un)subscription instructions go to 
http://www.sortmonster.com/MessageSniffer/Help/Help.html