[sniffer] Re: XCI Error!: snf_EngineHandler::MaxEvals

Pete McNeil Fri, 02 Nov 2007 14:08:05 -0800

Hello Pi-Web,

Friday, November 2, 2007, 5:04:47 PM, you wrote:


> The SNFserver.exe is present on the task list, so it will not automatically
> restart.

> "ERROR" in today's log:

<snip/>

> <e u='20071102100835' context='SNF_NETWORK' code='99' text='ERROR_SYNC_FAILED'/>
> <e u='20071102100956' context='SNF_NETWORK' code='99' text='ERROR_SYNC_FAILED'/>
> <e u='20071102113453' context='SNF_NETWORK' code='99' text='ERROR_SYNC_FAILED'/>

<snip/>

The ERROR_SYNC_FAILED errors are caused by network congestion between
your systems and ours; ping times are currently well above 120ms, for
example. There are also periods when the connection is made without
trouble, and your current telemetry looks good, so we can ignore that
error for the time being.

Your latest SYNC took only 290ms and occurred with no retries. Here is
my telemetry on that:

<session-data time="290" standby="15" cycles="6" sent="424"
received="1930" comms="Completed Ok" success="true">

> <s u='20071102131619' m='C:\Program Files\Merak\temp\200711021416181623.tmp'
> code='72' error='ERROR_MAX_EVALS'/>

The scan above (the <s/> element) failed because it needed more
evaluators than the max eval setting allows (ERROR_MAX_EVALS).

> <s u='20071102174238' m='c:\...tmp' code='69' error='ERROR_MSG_FILE'/>
> ... cut a lot...
> <s u='20071102174358' m='c:\...tmp' code='69' error='ERROR_MSG_FILE'/>

ERROR_MSG_FILE indicates that the SNFServer program was unable to open
or read the file. Something must have removed it before it could be
processed. This error is unrelated to the SYNC and MAX_EVALS errors.

I also noted that the SYNC errors do not seem to coincide closely with
the MSG_FILE errors. For now we will need to treat all three as
separate cases.
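One way to check whether two kinds of error coincide is to compare their u timestamps. The sketch below is only an illustration: it assumes that u is a YYYYMMDDhhmmss timestamp (which matches the records quoted above), that SYNC errors carry their name in a text attribute and scan errors in an error attribute, and that "coinciding" means within five minutes. The function names and the window are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import datetime, timedelta


def parse_u(u: str) -> datetime:
    """Convert a u='YYYYMMDDhhmmss' stamp (format assumed) to a datetime."""
    return datetime.strptime(u, "%Y%m%d%H%M%S")


def error_times(fragment: str, name: str) -> list[datetime]:
    """Timestamps of <e/> or <s/> records whose text/error attribute is name."""
    # Wrap the loose records in a dummy root so they parse as one document.
    root = ET.fromstring(f"<log>{fragment}</log>")
    times = []
    for rec in root:
        if name in (rec.get("text"), rec.get("error")):
            times.append(parse_u(rec.attrib["u"]))
    return times


def coincidences(a: list[datetime], b: list[datetime],
                 window: timedelta = timedelta(minutes=5)
                 ) -> list[tuple[datetime, datetime]]:
    """Pairs (x, y), x from a and y from b, no more than window apart."""
    return [(x, y) for x in a for y in b if abs(x - y) <= window]
```

Fed the SYNC and MSG_FILE records quoted in this thread, coincidences() returns an empty list: the SYNC failures happened hours before the MSG_FILE ones.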

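On some systems we have found cases where the system becomes so busy
that scans take too long and are cancelled before they complete. If the
calling process gives up and removes the temporary message file before
SNFServer can open it, the result is an ERROR_MSG_FILE, so this
condition might account for some of the MSG_FILE errors.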

Is there a timeout on the mechanism that calls the SNFClient?
If there is, then we might be able to mitigate the ERROR_MSG_FILE
condition by extending that timeout.
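Where the caller is something you control, the general pattern is to run each scan with an explicit, generous time limit and to treat an expired limit separately from a completed scan. This is a hypothetical Python sketch, not how Merak invokes SNFClient: the function name, the timeout value and the command line mentioned below are all assumptions.

```python
from __future__ import annotations

import subprocess
import sys


def scan_with_timeout(cmd: list[str], timeout_s: float) -> int | None:
    """Run one scan command; return its exit code, or None if it timed out.

    The exit code is passed through uninterpreted, because its meaning
    depends on the scanner being called.
    """
    try:
        done = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point. If the
        # caller now discards the temporary message file, a server that is
        # still working on it may later fail to read it.
        return None
    return done.returncode


if __name__ == "__main__":
    # Stand-in command: a Python child process that exits at once with 0.
    print(scan_with_timeout([sys.executable, "-c", "pass"], timeout_s=30))
```

On a real system the command might look like ["SNFClient.exe", message_path]; that invocation is an assumption to check against your own setup, and the timeout should sit comfortably above the slowest scan times you observe.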

As for the SYNC errors, they are not critical: the SNF engine will
tolerate them provided it is able to make a connection most of the
time. When a connection is made and the SYNC session succeeds, all of
the data from previously unsuccessful sessions is transferred in that
session.
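The tolerance described above can be pictured as a backlog that only drains when a session succeeds. This is a toy model to illustrate the behaviour, not SNF's actual implementation; the class name and the string items standing in for telemetry are hypothetical.

```python
from __future__ import annotations


class SyncBacklog:
    """Toy model: data waits in a backlog until a SYNC session succeeds."""

    def __init__(self) -> None:
        self.pending: list[str] = []

    def sync(self, new_items: list[str], connected: bool) -> list[str]:
        """Attempt one SYNC session and return what it delivered."""
        self.pending.extend(new_items)
        if not connected:
            # A failed session (ERROR_SYNC_FAILED) loses nothing; the data
            # simply stays queued for the next attempt.
            return []
        # A successful session carries everything accumulated so far.
        delivered, self.pending = self.pending, []
        return delivered
```

Two failed sessions followed by one good one deliver all three sessions' data at once, which is why occasional SYNC failures do no lasting harm.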

> "<p" before/after max_evals: - What does that tell me?

> <p s='0' t='411' l='34867' d='72'/>

The <p/> element always "belongs to" an <s/> element. An <s/> element
represents a single message scan. The <p/> element describes the
system's performance during that scan.

In the case of the <p/> element above, it took 0ms to set up the scan
(read the file, etc.) and then 411ms to perform the scan. This would
usually indicate that your system is CPU bound. Normally an SNF scan
takes a very short time; this one took almost half a second.

The l attribute gives the length of the scanned message in bytes, and
the d attribute gives the scan depth, that is, the maximum number of
evaluators that were alive during the scan.
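The four attributes can be pulled out of a <p/> record with Python's standard xml.etree.ElementTree module. A minimal sketch, assuming each record is a single well-formed element like the one quoted above; the ScanPerf class and its field names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ScanPerf:
    setup_ms: int  # s: time to set up the scan (read the file, etc.)
    scan_ms: int   # t: time taken to perform the scan itself
    length: int    # l: length of the scanned message in bytes
    depth: int     # d: maximum number of evaluators alive during the scan

    @property
    def total_ms(self) -> int:
        return self.setup_ms + self.scan_ms


def parse_p(record: str) -> ScanPerf:
    """Parse one <p s='..' t='..' l='..' d='..'/> record."""
    p = ET.fromstring(record)
    if p.tag != "p":
        raise ValueError(f"expected a <p/> record, got <{p.tag}>")
    return ScanPerf(
        setup_ms=int(p.attrib["s"]),
        scan_ms=int(p.attrib["t"]),
        length=int(p.attrib["l"]),
        depth=int(p.attrib["d"]),
    )
```

For the record quoted above this gives setup_ms=0, scan_ms=411, length=34867 and depth=72.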

> ...
> <s u='20071102131619' m='C:\Program Files\Merak\temp\200711021416181623.tmp'
> code='72' error='ERROR_MAX_EVALS'/>
> ...
>         <p s='0' t='80' l='4140' d='93'/>

The <p/> element here does not belong to the <s/> element. It belongs
to a different scan.

Once the <s/> element closes (with </s>, or with /> when it is
self-closing, as in the record above), anything after that point
belongs to a different event.
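Because nesting is what ties a <p/> to its scan, a parser should only attribute a <p/> that is a child of an <s/> element. A sketch using xml.etree.ElementTree, assuming a scan record with performance data is written as <s ...><p .../></s>; the function name is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def perf_by_scan(fragment: str) -> list[tuple[str | None, dict[str, str] | None]]:
    """Pair each <s/> record's m attribute with its own <p/> data, if any."""
    # Wrap the loose records in a dummy root so they parse as one document.
    root = ET.fromstring(f"<log>{fragment}</log>")
    pairs = []
    for s in root.iter("s"):
        # find() only looks at direct children, so a <p/> that appears
        # after the <s/> has closed (a sibling) is never attributed to it.
        p = s.find("p")
        pairs.append((s.get("m"), dict(p.attrib) if p is not None else None))
    return pairs
```

Given the self-closing MAX_EVALS record quoted above followed by the loose <p s='0' t='80' .../> record, perf_by_scan() pairs the MAX_EVALS scan with None rather than with that <p/>.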

---

I don't have any other reports of MAX_EVALS errors. That doesn't mean
that they are not out there, but it does mean that they are not
usually a problem for other folks.

I'm not sure what could be causing your SNFServer to crash, but it
should not be the MAX_EVALS errors: from what I've seen so far in my
search of the code, they are handled safely.

Nonetheless, I will be increasing the max eval setting in the next
release, and I will push it out sooner rather than later. Since you
have reported this problem, I won't wait for the other features before
pushing out beta 1.6. If I can get to it tonight, I will.

In the meantime, do you have any idea what might be causing your CPU
to be so heavily loaded that your SNF scans are taking 400+
milliseconds?

Do you have many <p/> records that show high t values like that? (I do
see the 80 that you reported above; that's on the high end of normal.)
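To answer that from a log file, a quick count of <p/> records whose t value exceeds some threshold is enough. A sketch, assuming records use the single-quoted t='...' form shown in this thread; it uses a regular expression rather than an XML parser so that it also works on a partial or still-growing log file. The 100 ms default threshold is an arbitrary choice, not an SNF setting.

```python
from __future__ import annotations

import re

# Matches the t attribute inside a <p ...> tag, e.g. <p s='0' t='411' .../>.
# [^>]*? keeps the match within a single tag.
P_SCAN_TIME = re.compile(r"<p\s[^>]*?\bt='(\d+)'")


def slow_scans(log_text: str, threshold_ms: int = 100) -> list[int]:
    """Scan times (t, in ms) at or above threshold_ms, in log order."""
    times = (int(t) for t in P_SCAN_TIME.findall(log_text))
    return [t for t in times if t >= threshold_ms]
```

Run over the two <p/> records quoted in this thread, the default threshold reports only the 411ms scan; lowering it to 80 also reports the 80ms one.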

Your telemetry shows about 10 msg/minute on average, with 90% capture.
That is a low message rate for such high scan times. By contrast, I
have a generic single-CPU server that is currently handling 400-500
msg/minute with scan times consistently in the 20-30ms range.
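A rough way to compare the two servers is to multiply message rate by scan time, giving the scan time spent per minute. The inputs are assumptions for illustration: 411ms is the single <p/> record quoted above (not an average), and 450 msg/minute at 25ms are midpoints of the ranges just given.

```python
def scan_load(msgs_per_minute: float, scan_ms: float) -> float:
    """Scan time per wall-clock minute, as a fraction of that minute.

    Ignores overlap between concurrent scans and all non-SNF work.
    """
    return msgs_per_minute * scan_ms / 60_000.0


# Your server (one quoted record) vs. the generic single-CPU server.
yours = scan_load(10, 411)       # 4.11 s of scanning per minute
reference = scan_load(450, 25)   # 11.25 s of scanning per minute
print(f"{yours:.1%} vs {reference:.1%}")
```

With these numbers your server spends roughly 7% of each minute inside SNF scans against roughly 19% for the reference server, even though its individual scans are about 16 times slower. That suggests something other than SNF's own workload is competing for the CPU.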

Hope this helps,

Thanks,

_M

-- 
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.


#############################################################
This message is sent to you because you are subscribed to
  the mailing list <[email protected]>.
To unsubscribe, E-mail to: <[EMAIL PROTECTED]>
To switch to the DIGEST mode, E-mail to <[EMAIL PROTECTED]>
To switch to the INDEX mode, E-mail to <[EMAIL PROTECTED]>
Send administrative queries to  <[EMAIL PROTECTED]>
