How about running Performance Monitor to watch disk I/O, memory, CPU,
page file, etc. over time in the hopes of catching one of the events?
Darin.
*From:* Matt <mailto:for...@mailpure.com>
*Sent:* Friday, June 28, 2013 12:10 PM
*To:* Message Sniffer Community <mailto:sniffer@sortmonster.com>
*Subject:* [sniffer] Re: Slow processing times, errors
Pete,
I'm nearly positive that it isn't system resources preventing Sniffer
from _accessing the files_. I believe these errors are a symptom, not
the cause.
Keep in mind that even the messages that didn't throw errors were
taking 30-90 seconds to scan, yet immediately after a restart scans
dropped to under 1 second. The system stayed the same; only the state
of the service was off in a bad way.
I did add a larger client about a month ago, around the time this
started, which I figure inched load up by between 1% and 5%, but I
can't say for sure that the two are connected; I've seen much bigger
swings in spam volume from single spammers. I looked at my
SNFclient.exe.err log and found that the previous slowdowns were all
represented in this file, with nothing else since a smattering of
other issues in 2012. I believe I/O, or general system load, could be
the trigger, but the fault in the service, which fails to open some
files and is otherwise 100 times slower than normal, persists after
everything else is fine again. I figure this is all triggered by a
short-term lack of resources, or a killer-message type of issue that
does something like run away with memory. Certainly there were no
recent changes on the server before this started to happen, including
Sniffer itself, which had been perfectly solid up until 5/22.
Regarding the ERROR_MSG_FILE batch that I sent you in that log: it
happened exactly when I restarted Sniffer, and in fact the
SNFclient.exe.err log showed a different error while this was
happening; maybe this will point you to something else. That log says
"Could Not Connect!" where the regular Sniffer log shows
"ERROR_MSG_FILE" about 1/8th of the time while in a bad state. When I
restarted the Sniffer service, the regular log showed a bunch of
"ERROR_MSG_FILE" in a row, but the SNFclient.exe.err log below shows
"XCI Error!: FileError snf_EngineHandler::scanMessageFile()
Open/Seek". You can match the message IDs with the other log that I
provided. I believe that block of messages had already been handed to
SNFclient.exe, but the Sniffer service hadn't yet responded, so they
were dumped as a batch into both logs during shutdown of the service.
20130627183807, arg1=F:\\proc\work\D862600e600004269.smd : Could Not Connect!
20130627183808, arg1=F:\\proc\work\D864401770000431f.smd : Could Not Connect!
20130627183808, arg1=F:\\proc\work\D861200ce000041ce.smd : Could Not Connect!
20130627183809, arg1=F:\\proc\work\D8644017300004321.smd : Could Not Connect!
20130627183809, arg1=F:\\proc\work\D861400da000041e3.smd : Could Not Connect!
20130627183810, arg1=F:\\proc\work\D862600d70000425f.smd : Could Not Connect!
20130627183811, arg1=F:\\proc\work\D864a00e900004346.smd : Could Not Connect!
20130627183811, arg1=F:\\proc\work\D8615019b000041f4.smd : Could Not Connect!
20130627183813, arg1=F:\\proc\work\D862900e900004282.smd : Could Not Connect!
20130627183815, arg1=F:\\proc\work\D863d015800004306.smd : Could Not Connect!
20130627183817, arg1=F:\\proc\work\D860301580000416f.smd : Could Not Connect!
20130627183818, arg1=F:\\proc\work\D862300e900004255.smd : Could Not Connect!
20130627183819, arg1=F:\\proc\work\D862900e600004281.smd : Could Not Connect!
20130627183819, arg1=F:\\proc\work\D864b00d700004357.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D864800d70000433c.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8619017300004205.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D861d017700004230.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8641016d00004310.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D865000e600004363.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D865000e100004361.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D85fe00e600004152.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8610016d000041c5.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D861f00ce00004231.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8644018b00004320.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D86150173000041ef.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D864100da00004314.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8644009500004329.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8633016d000042c1.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D861200da000041d1.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D865700d700004376.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D862c01770000429a.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D86360170000042ce.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D86360183000042d0.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8655019700004373.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D863600e9000042cd.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D86350177000042c6.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D862500ce0000425b.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D864400da00004327.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8644016d00004325.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D864800e10000433d.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D862600e10000426e.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D865c00e600004391.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8638018b000042e1.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D865d009500004394.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8628017300004273.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8657018000004378.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D86390177000042eb.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D865d00d700004396.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8659015800004388.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D865700e10000437b.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D863500ce000042c7.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D864a015800004348.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D864800e600004343.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D865b01970000438f.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D865c016d00004392.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D865b00e10000438c.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D865b00e90000438b.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D864a017d00004347.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D86380180000042df.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D861901770000420d.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D865700950000437a.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D8629017700004280.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D8638016d000042db.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D862b00e600004292.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D86380173000042e0.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D865d018000004393.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D86310177000042b1.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D85b100e300003f7f.smd : XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
I am looking to retool presently just because it's time. So if you
are convinced that this is due to low resources, don't concern
yourself with it.
Matt
On 6/28/2013 10:36 AM, Pete McNeil wrote:
On 2013-06-27 20:01, Matt wrote:
I'm attaching a snippet of my log. About 100 lines past the start,
where you see a smattering of error messages, you then see a large
block of them while the Sniffer service is restarting, and then
after that no errors at all. There have in fact been no errors at
all in several hours since this restart of Sniffer.
I can promise you that the error you're reporting comes directly from
a problem with the file system. Here is the code where that error is
generated. Put simply, the code tries to open the file and determine
its size. If that doesn't work, it throws the ERROR_MSG_FILE
exception in one of two forms -- that's what ends up in the log.
try {                                                                  /// Try opening the message file.
    MessageFile.open(MessageFilePath.c_str(), ios::in | ios::binary);  /// Open the file, binary mode.
    MessageFile.seekg(0, ios::end);                                    /// Find the end of the file,
    MessageFileSize = MessageFile.tellg();                             /// read that position as the size,
    MessageFile.seekg(0, ios::beg);                                    /// then go back to the beginning.
    MyScanData.ScanSize = MessageFileSize;                             /// Capture the message file size.
}
catch(...) {                                                           /// Trouble? Throw FileError.
    MyRulebase->MyLOGmgr.logThisError(                                 /// Log the error.
        MyScanData, "scanMessageFile().open",
        snf_ERROR_MSG_FILE, "ERROR_MSG_FILE"
    );
    throw FileError("snf_EngineHandler::scanMessageFile() Open/Seek");
}

if (0 >= MessageFileSize) {                                            /// Handle zero length files.
    MessageFile.close();                                               /// No need to keep this open.
    MyRulebase->MyLOGmgr.logThisError(                                 /// Log the error.
        MyScanData, "scanMessageFile().isFileEmpty?",
        snf_ERROR_MSG_FILE, "ERROR_MSG_FILE"
    );
    throw FileError("snf_EngineHandler::scanMessageFile() FileEmpty!");
}
Another clue: in the log snippet you provided there are hints of a
problem brewing in the sporadic instances of this error. Then comes
the large block, during which virtually all requests to open the
files for scanning were rejected by the OS. Either something made
those files unavailable, or the OS was unable to handle the requests.
I find it interesting also that the time required to report the error
started at about 172 milliseconds and continued to climb to 406, 578,
and then 656 before the restart.
SNF does not make log entries in the classic log during a restart, by
the way.
Note also the timestamps associated with these events and you can see
that the event was precipitated by a dramatic rise in message rates.
The first part of your log indicates about 7-10 messages per second.
During the large block of errors, the rate appears to have exceeded
120 per second (I counted approximately 126 entries at timestamp
20130627183819). That's at least an order of magnitude above the
rate that was causing sporadic errors.
I suspect based on the data you have provided that something on your
system generated a very large spike of activity that your IO
subsystem was unable to manage and this caused snf scans to fail
because snf was unable to open the files it was asked to scan.
Your restart of SNF apparently coincided with the event, but since
all of the SMD file names during the event are unique, and since SNF
has no way to generate scan requests on its own, SNF does not appear
to have been the cause of the event in any way. It was able to record
the event nonetheless.
So the question in my mind now is:
* Is there a way to improve your IO subsystem so that it can gain
some headroom above 10 msg/sec?
* What caused the sudden dramatic spike that led to this episode?
On most tiny systems I monitor, scan times are generally < 100 ms. On
your system they are frequently in excess of 400 ms, which leads me
to believe your system is a bit underpowered for its current load.
Hope this helps,
_M
--
Pete McNeil
Chief Scientist
ARM Research Labs, LLC
www.armresearch.com
866-770-1044 x7010
twitter/codedweller
#############################################################
This message is sent to you because you are subscribed to
the mailing list mailto:sniffer@sortmonster.com.
This list is for discussing Message Sniffer,
Anti-spam, Anti-Malware, and related email topics.
For more information see http://www.armresearch.com
To unsubscribe, E-mail to: mailto:sniffer-...@sortmonster.com
To switch to the DIGEST mode, E-mail to mailto:sniffer-dig...@sortmonster.com
To switch to the INDEX mode, E-mail to mailto:sniffer-in...@sortmonster.com
Send administrative queries to mailto:sniffer-requ...@sortmonster.com