That is correct. Am I asking too much?
Regards,
Brett

From: Servers Alive Discussion List [mailto:[email protected]] On Behalf Of Dirk Bulinckx
Sent: Monday, December 02, 2013 1:31 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] Enhancement request - differentiate between down and timeout

So you want to be able to set, per entry, whether a timeout should be seen as down or as unavailable, AND you want the alerting engine to be able to alert after <number> of unavailable results too?

Dirk Bulinckx
Servers Alive - http://www.woodstone.nu
StellarDNS (DNS Hosting) - http://www.stellardns.com
--------------------------------------------------------------------------------
From: Servers Alive Discussion List [mailto:[email protected]] On Behalf Of Hanson, Brett
Sent: Monday, December 02, 2013 9:01 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] Enhancement request - differentiate between down and timeout

The interesting thing about Servers Alive, for me, is that it is used primarily to determine whether something is UP. I have very few checks where I am specifically looking for something bad, for example a response containing “backup failed”. Nearly all of my checks watch for a success indicator and assume that not getting a success indicator implies a course of action. For example, a service not giving a definite up response implies that the service should be restarted.

But what should you do when you want Servers Alive to automate responses only for the definitely-down cases, such as when restarting a service that wasn’t actually down would terminate sessions for hundreds of clients? Having the ability to exclude timeouts isn’t a complete solution, but it does address the number one source of false alarms that we get. I would be perfectly happy with your proposed solution of calling timeouts unavailable, provided that I can specify on the check that a timeout should mean unavailable.
I would like to be able to specify, for a particular check, that a timeout is definitely a down, and for other checks that timeouts happen and don’t necessarily mean anything. I would also agree that if I got 100 timeouts in a row, there is definitely something wrong that needs attention, although 100 seems a little high. Ideally, that number could also be configured per check.

Regards,
Brett

From: Servers Alive Discussion List [mailto:[email protected]] On Behalf Of Dirk Bulinckx
Sent: Monday, December 02, 2013 12:01 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] Enhancement request - differentiate between down and timeout

The main issue is that people see the word "down" and think that it means down (hmm, why would that be :-)), while it actually means we can't say that it's UP. For example, an NT SERVICE check gives a DOWN while the system is not down; it's saying that the service is not running. Likewise, an HTTP check that gets a 404 result (while you would expect a 200 OK) will also show a down, while the webserver is not down.

There aren't many statuses a check can have at the end of a cycle: up, down, maintenance, unchecked, or unavailable. What *could* be an idea is to be able to force a timeout to be treated as unavailable, but you should then know that an unavailable will NOT generate an alert (except with the status change option)!
And if you get the unavailable (timeout :-)) for 100 cycles, then maybe you do want to know about it :-)

Dirk Bulinckx
--------------------------------------------------------------------------------
From: Servers Alive Discussion List [mailto:[email protected]] On Behalf Of Hanson, Brett
Sent: Monday, December 02, 2013 7:01 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] Enhancement request - differentiate between down and timeout

I absolutely agree that in some circumstances a timeout is a down. I’m sure you would also agree that in some circumstances a timeout is not necessarily a down. My preference is that the person configuring Servers Alive decides what to do with a timeout and what to do with a down. Perhaps a checkbox on the alert screen would make configuring timeout versus down easier?

Regards,
Brett

From: Servers Alive Discussion List [mailto:[email protected]] On Behalf Of Dirk Bulinckx
Sent: Saturday, November 30, 2013 1:31 AM
To: Servers Alive Discussion List
Subject: RE: [SA-list] Enhancement request - differentiate between down and timeout

I understand the request, but as you already said yourself, the feature you want is already in the product. Also, how do you differentiate between a timeout that is really a timeout and a timeout that is really a down? For example, in some cases a ping to a host that is down will give the following (talking about what you would get at the command line):

Pinging x.y.z.aa with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

=> is this down or a timeout?
Dirk Bulinckx
--------------------------------------------------------------------------------
From: Servers Alive Discussion List [mailto:[email protected]] On Behalf Of Hanson, Brett
Sent: Friday, November 29, 2013 10:01 PM
To: Servers Alive Discussion List
Subject: [SA-list] Enhancement request - differentiate between down and timeout

I get a fair number of false alarms due to timeouts. In my opinion, not getting a response within the allotted time window doesn’t necessarily mean the check is down, and I don’t want a service restarted just because Servers Alive had some network issues.

For instance, I have two instances of Servers Alive running. The first instance does all the real work. The second instance watches the first. To do this, it checks whether the log file has changed in the last 30 minutes (a file properties check looking for the youngest matching file write date being more than 30 minutes ago). Both instances are located in the same data center; the first instance is a physical machine, the second a virtual machine.

A few times a day, the second instance will alert that Servers Alive Application is DOWN (ERR: didn't stop within the given timeout). The timeout is set at 15 seconds. In this case, a timeout doesn’t mean that the first instance of Servers Alive is down; it just means that the second instance couldn’t figure it out. The last thing I want is for the first instance of Servers Alive to restart and lose its state because the second instance decided “I’m not sure, so I’ll say it is down”.

I am aware that you can use the “When the Extra info field” option on the alerts, but it just seems messy and prevents you from using the extra info field for more interesting purposes.
I would like an enhancement to Servers Alive that would treat timeouts as possibly down, possibly available, or as a completely different condition. Ideally, you could have alerts specifically for timeout conditions, or (what I’d really like to see) have a different check called if a timeout occurred.

Thank you,
Brett Hanson
Systems Analyst
Agrium Inc.
--------------------------------------------------------------------------------
To unsubscribe send a message with UNSUBSCRIBE in the subject line to [email protected]
If you use auto-responders (like out-of-the-office messages), make sure that they are not sent to the list nor to individual members. Doing so will cause you to be automatically removed from the list.
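The per-check behavior discussed in this thread (a timeout is treated either as a down or as unavailable, and a run of unavailable results becomes alert-worthy only after a configurable number of consecutive cycles) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not how Servers Alive implements or configures it; `TimeoutPolicy`, `CheckState`, `timeout_means_down` and `alert_after_unavailable` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Result(Enum):
    """Raw outcome of a single probe (hypothetical model)."""
    UP = "up"
    DOWN = "down"        # definite failure, e.g. service not running, HTTP 404
    TIMEOUT = "timeout"  # no answer within the configured time window


class Status(Enum):
    """Status at the end of a cycle (subset of the statuses named in the thread)."""
    UP = "up"
    DOWN = "down"
    UNAVAILABLE = "unavailable"


@dataclass
class TimeoutPolicy:
    """Hypothetical per-check settings; not actual Servers Alive options."""
    timeout_means_down: bool = False
    alert_after_unavailable: int = 100  # consecutive unavailable cycles before alerting


@dataclass
class CheckState:
    policy: TimeoutPolicy
    consecutive_unavailable: int = 0

    def evaluate(self, result: Result) -> Tuple[Status, bool]:
        """Map one probe result to (status, alert_worthy) under this check's policy."""
        if result is Result.UP:
            self.consecutive_unavailable = 0
            return Status.UP, False
        if result is Result.DOWN or self.policy.timeout_means_down:
            # A definite failure, or a timeout that this check treats as a down.
            self.consecutive_unavailable = 0
            return Status.DOWN, True
        # Timeout treated as "unavailable": stay quiet until it persists.
        self.consecutive_unavailable += 1
        persistent = self.consecutive_unavailable >= self.policy.alert_after_unavailable
        return Status.UNAVAILABLE, persistent


if __name__ == "__main__":
    # A restart-sensitive check: timeouts are unavailable; alert after 3 in a row.
    check = CheckState(TimeoutPolicy(timeout_means_down=False, alert_after_unavailable=3))
    for r in [Result.TIMEOUT, Result.TIMEOUT, Result.UP,
              Result.TIMEOUT, Result.TIMEOUT, Result.TIMEOUT]:
        status, alert = check.evaluate(r)
        print(r.value, "->", status.value, "alert" if alert else "no alert")
```

In this run only the sixth cycle is flagged, because the UP result resets the counter. With `timeout_means_down=True` every timeout would be flagged immediately, which corresponds to the "a timeout is definitely a down" case. Whether a definite down should also reset the unavailable counter, as it does here, is a design choice left open in this sketch.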
