Hello Pi-Web,

Friday, November 2, 2007, 5:04:47 PM, you wrote:
> The SNFserver.exe is present on the task list, so it will not automatically
> restart.
>
> "ERROR" in today's log:
> <snip/>
> <e u='20071102100835' context='SNF_NETWORK' code='99' text='ERROR_SYNC_FAILED'/>
> <e u='20071102100956' context='SNF_NETWORK' code='99' text='ERROR_SYNC_FAILED'/>
> <e u='20071102113453' context='SNF_NETWORK' code='99' text='ERROR_SYNC_FAILED'/>
> <snip/>

The ERROR_SYNC_FAILED errors are caused by network congestion between your systems and ours. For example, ping times are well above 120ms at the moment.

I note that there are periods of time when there is no trouble making the connection, and your current telemetry also looks good, so we can ignore that error for the time being. Your latest SYNC took only 290ms and completed with no retries. Here is my telemetry on that:

<session-data time="290" standby="15" cycles="6" sent="424" received="1930" comms="Completed Ok" success="true">

> <s u='20071102131619' m='C:\Program Files\Merak\temp\200711021416181623.tmp' code='72' error='ERROR_MAX_EVALS'/>

The above scan <s/> failed because there were too many evaluators.

> <s u='20071102174238' m='c:\...tmp' code='69' error='ERROR_MSG_FILE'/>
> ... cut a lot...
> <s u='20071102174358' m='c:\...tmp' code='69' error='ERROR_MSG_FILE'/>

ERROR_MSG_FILE indicates that the SNFServer program was unable to open or read the file. Something must have removed it before it could be processed.

This error is unrelated to the SYNC and MAX_EVALS errors. I also noticed that the SYNC errors do not seem to coincide closely with the MSG_FILE errors. For now we will need to treat all three as separate cases.

On some systems we have found that the machine becomes so busy that scans take too long and are cancelled before they complete. This condition might account for some of the MSG_FILE errors.

Is there a timeout on the mechanism that calls the SNFClient? If there is, we might be able to mitigate the ERROR_MSG_FILE condition by extending that timeout.
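One way to check that the SYNC errors do not line up with the MSG_FILE errors is to group the log lines by error name and compare their timestamps. The following is a minimal Python sketch of that check. It is not part of SNF, and it rests on several assumptions:

- There is one event element per line.
- The `u` attribute is a timestamp in YYYYMMDDhhmmss form, which is how it appears in the excerpts above.
- The error name comes from the `text` attribute on <e/> records or the `error` attribute on <s/> records.

The helper names `parse_events` and `near_misses` and the five-minute window are hypothetical choices.

```python
# Hypothetical helper, not part of SNF: group log events by error name and
# look for pairs of events that happen close together in time.
# Assumes one XML element per line and a 'u' timestamp in YYYYMMDDhhmmss form.
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime, timedelta

# Event lines copied from this thread (the "..." paths are elided in the original).
SAMPLE_LOG = r"""<e u='20071102100835' context='SNF_NETWORK' code='99' text='ERROR_SYNC_FAILED'/>
<e u='20071102100956' context='SNF_NETWORK' code='99' text='ERROR_SYNC_FAILED'/>
<e u='20071102113453' context='SNF_NETWORK' code='99' text='ERROR_SYNC_FAILED'/>
<s u='20071102131619' m='C:\Program Files\Merak\temp\200711021416181623.tmp' code='72' error='ERROR_MAX_EVALS'/>
<s u='20071102174238' m='c:\...tmp' code='69' error='ERROR_MSG_FILE'/>
<s u='20071102174358' m='c:\...tmp' code='69' error='ERROR_MSG_FILE'/>"""


def parse_events(lines):
    """Return {error_name: [datetime, ...]} for every line that parses as an event."""
    events = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line.startswith("<"):
            continue  # skip notes such as "... cut a lot..."
        try:
            elem = ET.fromstring(line)
        except ET.ParseError:
            continue  # skip partial or malformed lines
        name = elem.get("text") or elem.get("error")  # <e/> vs <s/> records
        stamp = elem.get("u")
        if name and stamp:
            events[name].append(datetime.strptime(stamp, "%Y%m%d%H%M%S"))
    return events


def near_misses(first, second, window=timedelta(minutes=5)):
    """Return (a, b) pairs from two timestamp lists that are within `window`."""
    return [(a, b) for a in first for b in second if abs(a - b) <= window]


if __name__ == "__main__":
    events = parse_events(SAMPLE_LOG.splitlines())
    for name, stamps in sorted(events.items()):
        print(name, len(stamps))
    print(near_misses(events["ERROR_SYNC_FAILED"], events["ERROR_MSG_FILE"]))
```

For the excerpts quoted in this thread, no SYNC failure falls within five minutes of a MSG_FILE failure, which is consistent with treating them as separate cases. A full day's log would give a more meaningful answer than these few lines.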
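Occasional ERROR_SYNC_FAILED events are tolerable because data from a failed session is not lost: it is carried into the next successful session. The Python sketch that follows illustrates that general backlog-and-flush pattern. It is an illustration under stated assumptions, not SNF's actual code. The `BacklogSync` class and its `send` callback are hypothetical names.

```python
# Illustrative sketch only, NOT SNF's implementation: queue telemetry while
# sync sessions fail and send the whole backlog on the next successful one.
# `send` is a hypothetical transport callback that returns True on success.
from typing import Callable, List


class BacklogSync:
    def __init__(self, send: Callable[[List[str]], bool]) -> None:
        self._send = send
        self._pending: List[str] = []
        self.failures = 0  # count of failed sessions (cf. ERROR_SYNC_FAILED)

    def record(self, item: str) -> None:
        """Queue one telemetry item for the next sync session."""
        self._pending.append(item)

    def sync(self) -> bool:
        """Attempt one session; on success the delivered items are dropped."""
        batch = list(self._pending)
        if self._send(batch):
            del self._pending[: len(batch)]  # everything in the batch was delivered
            return True
        self.failures += 1  # keep the backlog for the next attempt
        return False

    @property
    def pending(self) -> int:
        return len(self._pending)
```

Suppose the first session fails and the second succeeds. Every item recorded before and between the two attempts is then delivered in the second batch, and `failures` ends at 1. That is why a connection that succeeds "most of the time" is enough.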
Regarding the SYNC errors: they are not critical, because the SNF engine will tolerate them as long as it can make a connection most of the time. When a connection is made and the SYNC session is successful, all of the data from previously unsuccessful sessions is transferred in the process.

> "<p" before/after max_evals - what does that tell me?
> <p s='0' t='411' l='34867' d='72'/>

The <p/> element always "belongs to" an <s/> element. An <s/> element represents a single message scan. The <p/> element describes the system's performance during that scan.

For the <p/> element above:

- s='0' means it took 0ms to set up the scan (read the file, etc.).
- t='411' means it then took 411ms to perform the scan.

This would usually indicate that your system is CPU bound. Normally an SNF scan takes a very short time; this one took almost half a second.

- l='34867' is the length of the scanned message in bytes.
- d='72' is the scan depth, that is, the maximum number of evaluators that were alive during the scan.

> ...
> <s u='20071102131619' m='C:\Program Files\Merak\temp\200711021416181623.tmp' code='72' error='ERROR_MAX_EVALS'/>
> ...
> <p s='0' t='80' l='4140' d='93'/>

The <p/> element here does not belong to the <s/> element above it. It belongs to a different scan. Once the <s/> element closes (with </s>, or immediately when it is written as a self-closing <s .../> like the one above), anything after that point belongs to a different event.

---

I don't have any other reports of MAX_EVALS errors. That doesn't mean they are not out there, but it does mean they are not usually a problem for other folks.

I'm not sure what could be causing your SNFServer to crash, but it should not be the MAX_EVALS errors. According to what I've seen so far in my search, the code handles them safely.

Nonetheless, I will increase the max eval setting in the next release and push it out sooner rather than later. Since you have reported this problem, I won't wait for the other features before pushing out beta 1.6. If I can get to it tonight, I will.
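The ownership rule for <p/> records can be made concrete with a small parser. This Python sketch makes two assumptions:

- As described above, a scan's <p/> record is nested inside its <s> element.
- The events are wrapped in a hypothetical <log> root so that the fragment parses as one document.

Only the two <p/> value sets and the ERROR_MAX_EVALS line come from this thread. The scan-a.tmp and scan-b.tmp entries and the 100ms "slow" threshold are made up for illustration.

```python
# Sketch: pair each <s> scan with the <p> performance record nested inside it.
# Hypothetical sample: the <log> wrapper and scan-a/scan-b entries are invented.
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

SCAN_LOG = r"""<log>
<s m='scan-a.tmp'><p s='0' t='411' l='34867' d='72'/></s>
<s u='20071102131619' m='C:\Program Files\Merak\temp\200711021416181623.tmp' code='72' error='ERROR_MAX_EVALS'/>
<s m='scan-b.tmp'><p s='0' t='80' l='4140' d='93'/></s>
</log>"""


@dataclass
class ScanPerf:
    setup_ms: int  # s: time to set up the scan (read the file, etc.)
    scan_ms: int   # t: time spent performing the scan
    length: int    # l: message length in bytes
    depth: int     # d: max evaluators alive during the scan


@dataclass
class Scan:
    message: str
    error: Optional[str]
    perf: Optional[ScanPerf]  # None when the <s/> element has no <p/> child


def read_scans(xml_text: str) -> List[Scan]:
    root = ET.fromstring(xml_text)
    scans = []
    for s in root.iter("s"):
        p = s.find("p")  # only a <p/> *inside* this <s> belongs to it
        perf = None
        if p is not None:
            perf = ScanPerf(int(p.get("s", "0")), int(p.get("t", "0")),
                            int(p.get("l", "0")), int(p.get("d", "0")))
        scans.append(Scan(s.get("m", ""), s.get("error"), perf))
    return scans


def slow_scans(scans: List[Scan], threshold_ms: int = 100) -> List[Scan]:
    """Scans whose t value exceeds a (hypothetical) threshold."""
    return [sc for sc in scans if sc.perf is not None and sc.perf.scan_ms > threshold_ms]
```

With this sample, `read_scans` returns three scans:

- The ERROR_MAX_EVALS scan has `perf` set to None. This is the code-level version of saying that the d='93' record after it belongs to a different scan.
- `slow_scans` picks out only the 411ms scan.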
In the meantime, do you have any idea what might be loading your CPU so heavily that your SNF scans take 400+ milliseconds? Do you have many <p/> records that show high t values like that? (I do see the 80 that you reported above; that's on the high end of normal.)

Your telemetry shows about 10 msg/minute on average, with 90% capture. That seems a low volume for such high scan times. In contrast, I have a generic single-CPU server that is currently handling 400-500 msg/minute with scan times consistently in the 20-30ms range.

Hope this helps,

Thanks,

_M

--
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.
