Pete,
I have left all of those processes active for troubleshooting. They
are still there, and they are definitely Sniffer. Process Explorer even
shows the command line each executable was run with, so I was able to do
some digging in the logs for specifics.
I found that Declude was recording errors related to Sniffer, and
Sniffer was not logging the messages or associated events at all.
Here's what Declude is showing:
06/14/2007 09:11:01.665 q3d2b00d400009610.smd ERROR: External
program SNIFFER-IP didn't finish quick enough; terminating.
06/14/2007 09:11:01.665 q3d2b00d400009610.smd Couldn't get external
program exit code
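For anyone who wants to tally these errors rather than eyeball them, here
is a minimal Python sketch. It assumes each Declude log entry sits on a
single line in the layout shown above (timestamp, spool file name,
message). The function name and the per-minute grouping are my own
illustration, not anything Declude provides.

```python
import re
from collections import Counter

# Assumed layout of a log line: "MM/DD/YYYY HH:MM:SS.mmm <spool file> <message>"
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4}) (\d{2}:\d{2}):\d{2}\.\d{3} (\S+) (.*)$"
)
TIMEOUT_MARKER = "didn't finish quick enough"


def count_timeouts_per_minute(lines, test_name="SNIFFER-IP"):
    """Count external-program timeout errors for one test, grouped by minute."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed layout
        date, minute, _spool, message = match.groups()
        if TIMEOUT_MARKER in message and test_name in message:
            counts[f"{date} {minute}"] += 1
    return counts


sample = [
    "06/14/2007 09:11:01.665 q3d2b00d400009610.smd ERROR: External "
    "program SNIFFER-IP didn't finish quick enough; terminating.",
    "06/14/2007 09:11:01.665 q3d2b00d400009610.smd Couldn't get external "
    "program exit code",
]
timeouts = count_timeouts_per_minute(sample)
print(timeouts)
```

Grouping by minute makes it easy to see whether the timeouts cluster
around a single burst of mail or are spread across the day.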
Now it could be that Declude is causing issues by trying to terminate
Sniffer.
FYI, I don't have a backlog of leftover files in my Sniffer directory,
the service is started, and everything seems fine outside of this one
period of time when my server got walloped. What happened was that one
customer with over 1,000 addresses received 950 E-mails from a
ConstantContact customer in a spam run to harvested addresses.
ConstantContact can deliver fast enough that it likely stressed my
system for a moment and triggered this behavior. It could also be that
the content of these messages caused an issue with Sniffer. My server
has 8 cores in it, and if it reached 100% CPU, it only did so for a
moment.
This could very likely be associated with heap issues, but I did double
my heap memory the other day, and heap exhaustion normally doesn't leave
processes like this hanging in the background doing nothing. The other
application that hung about 10 times during this period is what suggests
a heap issue, because I know that app to be the first to fail under
stress (it is not a service). That app does an average of over 50 DNS
lookups and has a lot more latency than Sniffer does, so it is
remarkable that Sniffer hung 100 times and that app only hung 10 times.
That suggests to me that maybe something better could be done in terms
of cleaning up these processes.
I'll keep the server in this state until the evening in the event that
you want to take a look at it.
Thanks,
Matt
Pete McNeil wrote:
Hello Matt,
Thursday, June 14, 2007, 12:44:32 PM, you wrote:
<snip/>
> I also had about 10 errors waiting to be cleared from another
> application, but probably because of the way that Sniffer works (as a
> service or something related), the Sniffer processes are just hung
> without a prompt. I saw this last week also.
> I have Declude set for 200 processes, so it probably reached 300 when
> the first 100 hung, and then it stayed with those 100 hung. Is there
> anything that can be done in Sniffer to kill off these hung processes
> in an automated and proactive manner? I recently upgraded to the
> latest version and I was probably a version or two behind, and I don't
> recall this happening before.
It seems very unlikely that SNF instances would be hung -- they will
either time-out themselves or be killed off by Declude. Please let us
know if there are any errors in your SNF log.
Also - check the SNF working directory to make sure you don't have a
lot of old job files hanging around. That can cause SNF instances to
relax their timing based on the assumption there is a high system load
-- with relaxed timing they will stay around longer waiting for results.
If you find that you do have a lot of old job files hanging around
then you should clean them out to get things going normally again.
1. Stop SMTP
2. Wait for all jobs to finish
3. Stop your persistent instance
4. Remove all left-over job files (QUE, WRK, FIN, ABT, XXX, SVR)
5. Restart your persistent instance
6. Restart SMTP
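Before going through that procedure, it may help to see whether old job
files have actually piled up. The Python sketch below only lists
candidates and deletes nothing. It assumes the job files carry the
extensions listed above as their file suffix and that "old" can be
judged by modification time. The function name, the 10-minute
threshold, and the directory you pass in are all hypothetical.

```python
import time
from pathlib import Path

# Extensions from the cleanup list above; matching on the file suffix
# is an assumption about how SNF names its job files.
JOB_EXTENSIONS = {".que", ".wrk", ".fin", ".abt", ".xxx", ".svr"}


def find_stale_job_files(work_dir, max_age_seconds=600, now=None):
    """Return job files in work_dir last modified more than max_age_seconds ago."""
    now = time.time() if now is None else now
    stale = []
    for path in Path(work_dir).iterdir():
        if not path.is_file():
            continue
        if path.suffix.lower() not in JOB_EXTENSIONS:
            continue
        if now - path.stat().st_mtime > max_age_seconds:
            stale.append(path)
    return sorted(stale)
```

Call it with the SNF working directory, for example
find_stale_job_files(r"C:\path\to\sniffer") where that path is a
placeholder, and review the result. Actual removal should still wait
until SMTP and the persistent instance have been stopped as described.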
Also, presuming you have a persistent instance - make sure that is
still running. If that had failed for some reason then you might be
running now in peer-server mode which will be a bit slower than
persistent mode.
Hope this helps,
_M
--
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.
#############################################################
This message is sent to you because you are subscribed to
the mailing list <sniffer@sortmonster.com>.