On Wednesday, March 16, 2005, 9:01:34 AM, Nick wrote:

NM> Pete
NM>
NM> OK, I now have much more information on this problem with
NM> Declude/Sniffer/SmarterMail.
NM>
NM> It seems the current version of Declude does not have an Overflow Directory
NM> for SmarterMail, which therefore allows unlimited Declude processes to be
NM> spawned at any time. At our peak we were seeing a surge of more than 1,000
NM> declude.exe instances running at the same time! This of course flattened the
NM> server, and seems the reason why Sniffer was dropping out of its perpetual
NM> mode, unfortunately compounding the problem when the server had least
NM> resources.

Thanks for this --- I think this proves one of my theories on the problem.

This is what I think happened to SNF...

A burst of 1000 or more active requests arrives which is more than the server can really handle at one time. The persistent server queues (internally) all of the jobs and begins processing them all at once. As a result, it does not report to the .stat file for an extended period.

Also, due to the very large number of jobs, many of the client instances do not hear back from the server instance until their maximum wait time has expired. As a result, they abandon the wait and begin processing the messages themselves, each one now loading the rulebase.

The clients loading the rulebase individually further slows the server and accelerates the problem until an insurmountable backlog is in place.

Once the backlog is cleared (requiring manual intervention) the persistent instance (which has not died) is able to respond to client requests quickly enough so that they do not "give up" and so the system operates normally.

---

If this fits the profile (and I think it does) then it would be possible to adjust SNF with an alternate fail-safe mode which would solve the problem at the expense of letting more spam through. Specifically, once a particular threshold is reached then the clients would abandon a job and fail safe rather than loading the rulebase and processing the message.
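To make the fail-safe idea concrete, here is a rough sketch of the decision a client instance might make once its wait is over. This is not SNF code. The names `ClientPolicy`, `decide`, `max_wait_seconds` and `backlog_threshold` are all hypothetical, and the sketch assumes the "threshold" is a count of pending jobs, which is only one way it could be measured.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    USE_SERVER_RESULT = "use_server_result"  # persistent instance answered in time
    SCAN_LOCALLY = "scan_locally"            # client loads the rulebase itself
    FAIL_SAFE = "fail_safe"                  # pass the message unscanned


@dataclass
class ClientPolicy:
    # Both fields are illustrative assumptions, not real SNF settings.
    max_wait_seconds: float   # how long a client waits for the server
    backlog_threshold: int    # pending jobs at or above which clients fail safe


def decide(policy: ClientPolicy, waited_seconds: float,
           server_replied: bool, pending_jobs: int) -> Action:
    """Pick what a client does with one message."""
    # Normal case: the persistent instance responded within the wait limit.
    if server_replied and waited_seconds <= policy.max_wait_seconds:
        return Action.USE_SERVER_RESULT
    # Overloaded case: do not add another rulebase load to a struggling
    # server; let the message through and accept that some spam leaks.
    if pending_jobs >= policy.backlog_threshold:
        return Action.FAIL_SAFE
    # Otherwise keep the existing behaviour: give up on the server and
    # scan the message in the client.
    return Action.SCAN_LOCALLY


policy = ClientPolicy(max_wait_seconds=30.0, backlog_threshold=200)
print(decide(policy, 31.0, False, 1000))  # burst: fail safe
print(decide(policy, 31.0, False, 10))    # quiet server, slow reply: scan locally
print(decide(policy, 2.0, True, 1000))    # server answered in time
```

The point of the middle branch is that it removes the feedback loop: above the threshold, a timed-out client no longer loads the rulebase, so it stops making the backlog worse.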
This would cause spam to "leak" but it would also provide for a more rapid, automatic recovery. Think of it as a safety pressure valve on a boiler. It's not a great solution, but it's better than an "explosion".

The best answer is proper overflow control and since Declude is working on that I'm inclined to let that solution develop. Not only because it's a better solution, but also because the "safety valve" mechanism on SNF doesn't solve the whole problem - for example virus scanning would still potentially cause the same problem along with other things that might be running in Declude.

Thanks Nick!

_M

This E-Mail came from the Message Sniffer mailing list. For information and (un)subscription instructions go to http://www.sortmonster.com/MessageSniffer/Help/Help.html
