Across the 8438 "<t" values logged today, the average t was
111.1176819 ms (min 0, max 7211). 57 scans took more than 1000 ms,
and 6384 scans took less than 101 ms.
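The Python sketch below shows one way such figures could be computed. It assumes they come from the t attribute (scan time in milliseconds) of the <p/> performance records in the SNFServer log, which look like <p s='0' t='411' l='34867' d='72'/> later in this thread. The function name summarize_scan_times and the regular expression are illustrative assumptions, not part of SNF.

```python
import re
from statistics import mean

# Assumption: scan times live in the t attribute of <p/> records,
# e.g. <p s='0' t='411' l='34867' d='72'/>
P_TIME = re.compile(r"<p\b[^>]*\bt='(\d+)'")


def summarize_scan_times(log_text: str) -> dict:
    """Hypothetical helper: summarize <p/> scan times (ms) found in log_text."""
    times = [int(m.group(1)) for m in P_TIME.finditer(log_text)]
    if not times:
        return {"count": 0}
    return {
        "count": len(times),
        "mean": mean(times),
        "min": min(times),
        "max": max(times),
        "over_1000": sum(1 for t in times if t > 1000),  # "took above 1000"
        "under_101": sum(1 for t in times if t < 101),   # "took less than 101"
    }


if __name__ == "__main__":
    sample = "<p s='0' t='411' l='34867' d='72'/>\n<p s='0' t='80' l='4140' d='93'/>"
    print(summarize_scan_times(sample))
```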

The server is rather old and serves web mail, POP3 and SMTP, and heavy
web mail usage does slow it down. This might explain the slow scans.

The long scans do not happen at one particular time; they occur from
time to time throughout the day.

Still, this should not "lock up" snfserver.

To call SNF we use a DLL we developed ourselves, plugged into the Merak
mail server.

The call to snfclient is made using WaitForSingleObject with an
INFINITE wait time (perhaps we should change this).

When it finishes (and it does), we get the SNF result using
GetExitCodeProcess. This returns zero (which is good, otherwise all
messages would be rejected) when snfserver is in the "Could Not
Connect!" state.
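The thread raises the question of replacing the INFINITE wait. Here is a minimal sketch of that idea in Python. The real DLL is native Win32 code; there, the corresponding change is to pass a finite number of milliseconds to WaitForSingleObject instead of INFINITE and to handle a WAIT_TIMEOUT return value. The wrapper name run_scanner, the command line and the timeout value are hypothetical.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_scanner(cmd: Sequence[str], timeout_s: float) -> Optional[int]:
    """Hypothetical wrapper: run a scanner client, return its exit code or None.

    None means the client did not finish within timeout_s seconds. On timeout,
    subprocess.run kills the child and waits for it before re-raising
    TimeoutExpired, so no orphaned process is left behind.
    """
    try:
        completed = subprocess.run(list(cmd), timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for the real client: a child process that exits with code 0.
    code = run_scanner([sys.executable, "-c", "raise SystemExit(0)"], timeout_s=30)
    print("timed out" if code is None else f"exit code {code}")
```

The caller should treat a None result as "not scanned" rather than mapping it to an exit code. The timeout also needs to be generous enough that slow scans on a busy server are not cut short.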




Friday, November 2, 2007, 5:04:47 PM, you wrote:

SNFserver.exe is present in the task list, so it will not restart
automatically.

"ERROR" entries in today's log:

<snip/>

<e u='20071102100835' context='SNF_NETWORK' code='99' text='ERROR_SYNC_FAILED'/>
<e u='20071102100956' context='SNF_NETWORK' code='99' text='ERROR_SYNC_FAILED'/>
<e u='20071102113453' context='SNF_NETWORK' code='99' text='ERROR_SYNC_FAILED'/>
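To see whether errors like these cluster, the <e/> records can be pulled out of the log with a regular expression. This minimal sketch makes two assumptions: each record carries the u, context, code and text attributes in the order shown above, and u is a YYYYMMDDhhmmss timestamp. Whitespace between attributes, including line breaks, is tolerated. The function names are illustrative.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: attribute order as in the excerpt: u, context, code, text.
E_RECORD = re.compile(
    r"<e\s+u='(?P<u>\d{14})'\s+context='(?P<context>[^']*)'\s+"
    r"code='(?P<code>\d+)'\s+text='(?P<text>[^']*)'\s*/>"
)


def parse_error_records(log_text: str) -> list:
    """Return (timestamp, context, code, text) tuples for each <e/> record."""
    return [
        (datetime.strptime(m["u"], "%Y%m%d%H%M%S"), m["context"], int(m["code"]), m["text"])
        for m in E_RECORD.finditer(log_text)
    ]


def errors_per_hour(records: list) -> Counter:
    """Count records per hour of day, to show whether they bunch together."""
    return Counter(ts.hour for ts, _context, _code, _text in records)


if __name__ == "__main__":
    log = ("<e u='20071102100835' context='SNF_NETWORK' code='99' text='ERROR_SYNC_FAILED'/>\n"
           "<e u='20071102113453' context='SNF_NETWORK' code='99' text='ERROR_SYNC_FAILED'/>")
    records = parse_error_records(log)
    print(len(records), errors_per_hour(records))
```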

<snip/>

The ERROR_SYNC_FAILED errors are caused by network congestion between
your systems and ours. Ping times are well above 120ms at the moment,
for example. I note that there are periods when there is no trouble
making the connection, and your current telemetry also looks good, so
we can ignore that error for the time being.

Your latest SYNC took only 290ms and occurred with no retries. Here is
my telemetry on that:

<session-data time="290" standby="15" cycles="6" sent="424"
received="1930" comms="Completed Ok" success="true">

<s u='20071102131619' m='C:\Program Files\Merak\temp\200711021416181623.tmp' code='72' error='ERROR_MAX_EVALS'/>

The above scan <s/> failed due to too many evaluators.

<s u='20071102174238' m='c:\...tmp' code='69' error='ERROR_MSG_FILE'/>
... cut a lot...
<s u='20071102174358' m='c:\...tmp' code='69' error='ERROR_MSG_FILE'/>
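One quick way to see how often each scan error occurs is to tally the code and error attributes of the <s/> records. This minimal sketch assumes failed scans are logged as <s .../> elements with code appearing before error, as in the examples above. The helper name is hypothetical.

```python
import re
from collections import Counter

# Assumption: code= appears before error= inside a failed <s .../> record.
S_ERROR = re.compile(r"<s\b[^>]*\bcode='(\d+)'[^>]*\berror='([A-Z_]+)'")


def scan_error_counts(log_text: str) -> Counter:
    """Count failed scans by (code, error name)."""
    return Counter((int(code), name) for code, name in S_ERROR.findall(log_text))


if __name__ == "__main__":
    log = "\n".join([
        r"<s u='20071102131619' m='C:\Program Files\Merak\temp\200711021416181623.tmp' code='72' error='ERROR_MAX_EVALS'/>",
        r"<s u='20071102174238' m='c:\...tmp' code='69' error='ERROR_MSG_FILE'/>",
        r"<s u='20071102174358' m='c:\...tmp' code='69' error='ERROR_MSG_FILE'/>",
    ])
    for (code, name), n in scan_error_counts(log).most_common():
        print(code, name, n)
```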

ERROR_MSG_FILE indicates that the SNFServer program was unable to open
or read the file. Something must have removed it before it could be
processed. This error is unrelated to the SYNC and MAX_EVALS errors.

I also noted that the SYNC errors do not seem to coincide closely with
the MSG_FILE errors. For now we will need to treat all three as
separate cases.
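To check whether two kinds of errors coincide, one approach is to measure, for each error of the first kind, the gap to the nearest error of the second kind. This minimal sketch assumes the u attributes are YYYYMMDDhhmmss timestamps, as in the excerpts above. The function name is hypothetical.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List

U_FORMAT = "%Y%m%d%H%M%S"  # e.g. u='20071102100835'


def nearest_gaps(a_stamps: List[str], b_stamps: List[str]) -> List[int]:
    """For each u timestamp in a_stamps, seconds to the closest one in b_stamps."""
    if not b_stamps:
        raise ValueError("b_stamps must not be empty")
    b = sorted(datetime.strptime(s, U_FORMAT) for s in b_stamps)
    gaps = []
    for stamp in a_stamps:
        t = datetime.strptime(stamp, U_FORMAT)
        i = bisect_left(b, t)
        # The closest entry is either the first one >= t or the one just before it.
        neighbours = b[max(i - 1, 0):i + 1]
        gaps.append(int(min(abs((n - t).total_seconds()) for n in neighbours)))
    return gaps


if __name__ == "__main__":
    sync = ["20071102100835", "20071102100956", "20071102113453"]
    msg_file = ["20071102174238", "20071102174358"]
    print(nearest_gaps(sync, msg_file))
```

Applied only to the entries quoted in this message (the MSG_FILE list was cut), even the closest SYNC error is about six hours away from the nearest MSG_FILE error.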

On some systems we have found cases where the system becomes so busy
that scans take too long and are then cancelled before they are
complete. This condition might account for some of the MSG_FILE
errors.

Is there a timeout on the mechanism that calls the SNFClient?
If there is, then we might be able to mitigate the ERROR_MSG_FILE
condition by extending that timeout.

As for the SYNC errors, they are not critical because the SNF engine
will tolerate them provided it is able to make a connection most of
the time. When a connection is made and the SYNC session is
successful, all of the data from previously unsuccessful sessions is
transferred in the process.

"<p" before/after max_evals: - What does that tell me?

<p s='0' t='411' l='34867' d='72'/>

The <p/> element always "belongs to" an <s/> element. An <s/> element
represents a single message scan. The <p/> element describes the
system's performance during that scan.

In the case of the <p/> element above, it took 0ms to set up the scan
(read the file, etc.) and then 411ms to perform the scan. This would
usually indicate that your system is CPU bound. Normally an SNF scan
takes a very short time; this one took almost half a second.

The l attribute gives the length of the scanned message in bytes, and
the d attribute gives the scan depth: the maximum number of evaluators
that were alive during the scan.
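The attribute meanings described above map naturally onto a small record type. This minimal Python sketch assumes all four attributes are present and integer-valued. The class and field names are illustrative, not SNF's own.

```python
import re
from dataclasses import dataclass

# Matches the s, t, l and d attributes of a <p/> element.
P_ATTR = re.compile(r"\b([stld])='(\d+)'")


@dataclass(frozen=True)
class PerfRecord:
    setup_ms: int      # s: time to set up the scan (read the file, etc.)
    scan_ms: int       # t: time spent performing the scan
    length_bytes: int  # l: length of the scanned message in bytes
    depth: int         # d: maximum number of evaluators alive during the scan


def parse_perf(element: str) -> PerfRecord:
    """Parse a single <p .../> element such as <p s='0' t='411' l='34867' d='72'/>."""
    attrs = {key: int(value) for key, value in P_ATTR.findall(element)}
    return PerfRecord(attrs["s"], attrs["t"], attrs["l"], attrs["d"])


if __name__ == "__main__":
    print(parse_perf("<p s='0' t='411' l='34867' d='72'/>"))
```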

...
<s u='20071102131619' m='C:\Program Files\Merak\temp\200711021416181623.tmp' code='72' error='ERROR_MAX_EVALS'/>
...
        <p s='0' t='80' l='4140' d='93'/>

The <p/> element here does not belong to the <s/> element. It belongs
to a different scan.

Once the <s/> element closes (with </s>) anything after that point
belongs to a different event.
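If the log is parsed as XML rather than line by line, this ownership rule falls out of the element nesting. A <p/> belongs to the <s/> that encloses it, and a self-closed <s .../> (like the ERROR_MAX_EVALS one) has no <p/> at all. The minimal sketch below uses Python's xml.etree.ElementTree and assumes the fragment is well-formed XML. It wraps the fragment in a hypothetical <log> root so that several top-level records can be parsed at once. The second <s> element in the example, including its u value, is a simplified, hypothetical record.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def perf_by_scan(fragment: str) -> List[Tuple[Optional[str], Optional[str], Optional[dict]]]:
    """Return (u, error, <p/> attributes or None) for each <s> element."""
    root = ET.fromstring(f"<log>{fragment}</log>")  # hypothetical wrapper root
    result = []
    for scan in root.iter("s"):
        perf = scan.find("p")  # only a <p> nested inside this <s> counts
        result.append((scan.get("u"), scan.get("error"),
                       dict(perf.attrib) if perf is not None else None))
    return result


if __name__ == "__main__":
    fragment = (
        "<s u='20071102131619' code='72' error='ERROR_MAX_EVALS'/>"
        "<s u='20071102131620'><p s='0' t='80' l='4140' d='93'/></s>"  # hypothetical
    )
    for row in perf_by_scan(fragment):
        print(row)
```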

---

I don't have any other reports of MAX_EVAL errors. That doesn't mean
that they are not out there, but it does mean that they are not
usually a problem for other folks.

I'm not sure what could be causing your SNFServer to crash, but it
should not be the MAX_EVAL errors. From what I've seen so far in my
search, the code handles them safely.

Nonetheless, I will be increasing the max eval setting in the next
release and I will push it out sooner rather than later. Since you
have reported this problem, I won't wait for the other features before
pushing out beta 1.6. If I can get to it tonight, I will.

In the meantime, do you have any idea what might be loading your CPU
so heavily that your SNF scans are taking 400+ milliseconds?

Do you have many <p/> records that show high t values like that? (I do
see the 80 that you reported above. That's on the high end of normal.)

Your telemetry shows about 10 msg/minute on average, with 90% capture.
That seems a low message rate for such high scan times. In contrast, I
have a generic single-CPU server that is currently handling 400-500
msg/minute with scan times consistently in the 20-30ms range.
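A rough back-of-the-envelope comparison makes the contrast concrete. It rests on four assumptions: about 10 msg/minute from the telemetry, the 111 ms average t reported at the top of this message, the midpoints of the quoted ranges for the other server (450 msg/minute, 25 ms), and scans that do not overlap. On that basis, the share of each minute spent scanning is:

```latex
\begin{align*}
\text{this server:} \quad & 10\,\tfrac{\text{msg}}{\text{min}} \times 111\,\tfrac{\text{ms}}{\text{msg}}
  = 1110\,\tfrac{\text{ms}}{\text{min}} = \tfrac{1110}{60000} = 1.85\% \\
\text{reference server:} \quad & 450\,\tfrac{\text{msg}}{\text{min}} \times 25\,\tfrac{\text{ms}}{\text{msg}}
  = 11250\,\tfrac{\text{ms}}{\text{min}} = \tfrac{11250}{60000} = 18.75\%
\end{align*}
```

By this estimate, SNF scanning accounts for only a small share of this server's time. That is consistent with the suggestion that something else on the machine is competing for the CPU.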

Hope this helps,

Thanks,

_M



--
Best regards, Frank Jensen
[EMAIL PROTECTED]
www.pi.dk





#############################################################
This message is sent to you because you are subscribed to
 the mailing list <sniffer@sortmonster.com>.
To unsubscribe, E-mail to: <[EMAIL PROTECTED]>
To switch to the DIGEST mode, E-mail to <[EMAIL PROTECTED]>
To switch to the INDEX mode, E-mail to <[EMAIL PROTECTED]>
Send administrative queries to  <[EMAIL PROTECTED]>
