A while back we discussed differentiating between operating system
errors and actual check results.  Is this something that might be planned,
provided the semantics can be worked out?
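
To make the distinction concrete, here is a rough sketch of what a
three-state result could look like.  It is plain Python, not Servers
Alive's actual API; the names CheckStatus and count_files_check are made up
for illustration, and the "alarm if any files are found" rule is taken from
the CountFiles example quoted below.

```python
import enum
import os


class CheckStatus(enum.Enum):
    """Hypothetical result states; not part of Servers Alive."""
    UP = 0       # the check ran and the condition is fine
    DOWN = 1     # the check ran and the condition failed
    UNKNOWN = 2  # the check could not run (OS error, access denied, ...)


def count_files_check(directory, max_allowed=0):
    """Return (status, detail) for an illustrative file-count check."""
    try:
        # Count regular files only; subdirectories are ignored.
        count = sum(1 for entry in os.scandir(directory) if entry.is_file())
    except OSError as exc:
        # Operating system error: nothing could be confirmed either way,
        # so report UNKNOWN instead of DOWN.
        return CheckStatus.UNKNOWN, f"could not count files: {exc.strerror}"
    if count > max_allowed:
        return CheckStatus.DOWN, f"{count} file(s) found"
    return CheckStatus.UP, "no files found"
```

With something along these lines, an alert rule could page someone on DOWN
and handle UNKNOWN separately (retry it, or send it to a lower-priority
channel), rather than treating both the same way.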

Jason Passow
Mississippi Welders Supply
[EMAIL PROTECTED]
ph: (507) 494-5178
fax: (507) 454-8104

"If you do everything right, nobody will realize you've done anything at all."



Dirk Bulinckx wrote:
> If a check is not OK, then it's down; that's indeed the way it works.  The
> word "down" should be interpreted as "no confirmation of it being
> UP".
> Within the alert, the %e parameter will show the reason for the down status.
> Then you could see (for example) that an NT service is seen as being down
> because of "Access Denied" (meaning that at the time of the check we got an
> access denied back from the OS and as such can't confirm that the service is
> running).
>
>
> Dirk. 
>
> -----Original Message-----
> From: Stephen Ryan [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, April 11, 2006 8:45 AM
> To: Servers Alive Discussion List
> Subject: [SA-list] False positives
>
>
> One frustration I have is that checks report a DOWN status if they
> experience any error whatsoever, a timeout for example. One example is the
> "CountFiles" external COM check, which checks for the existence of certain
> files, and in my case should alarm if any (count > 0) files are found.
>
> What it does is alarm if the files are "not not found", i.e. if for any
> reason it can't count the files it thinks "Aha! DOWN" and also sends an
> alarm. NTProcess checks seem to be like this too: if they fail to disprove
> the negative, they interpret this as a positive and send an alarm, rather
> than handling the error (timeout, logon failure, etc).
>
> Sometimes I don't have a couple of weeks to run a new check in the test
> environment before it is needed in production - any ideas on how to avoid
> false positives with CountFiles, or even generally? 
>
> To pick up on yesterday's thread of ideas around alarm management etc in
> future versions, it might be useful to make communicating via the same alarm
> mechanism possible. False positives have a large "crying wolf"
> impact on the credibility of alarms, which reduces the reaction of the team
> and the effectiveness of Servers Alive. Sometimes I rely more on external
> scripts returning errorlevels, which I can tune more finely, even if the
> check type is built in. 
>
> //Steve
> To unsubscribe send a message with UNSUBSCRIBE as subject to
> [email protected]
>
>