Gavin,

We’ve been working on this recently, so I can provide some insight here. We currently use Pingdom and BGPmon for basic reachability tests, but that’s fairly limited, so we are in the process of beefing it up and have taken a pretty belt-and-braces approach:
We’re replicating our internal monitoring (Nagios + Xymon) to an external VM hosted with Digital Ocean (but it could easily be Amazon or similar). This means our monitoring servers will themselves be monitored, which gives us some extra peace of mind. It also means that we can host a status webpage for customers to access if we have a problem. The status page is on a completely separate domain name hosted on external DNS servers, just in case we have a problem that affects DNS.

We have integrated this with our broadcaster platform so that our admins can post an alert or maintenance window, and it uses the WordPress API to post it on the status page as well as emailing the affected customers as normal.

We also have the monitoring servers using an SMS API (I’m sure you know the one) to send text alerts in case emails can’t reach us. The system sends me a ‘sanity’ text message every day at 6pm, so we know even at quiet times that all is working correctly.

Of course, the on-net monitoring server will look out for the one hosted externally and vice versa, so theoretically there would have to be multiple failures, both with us and with external parties, for there to be a problem that we don’t know about.

Hope that helps.

C

--
Charlie Boisseau
Fluency Communications Ltd.
e. [email protected]
w. http://fluency.net.uk/

On 15 Dec 2013, at 14:45, Gavin Henry <[email protected]> wrote:

Hi all,

So we're monitoring everything possible inside our network but wondered what others do to check routes that come in to your network via transit for latency/pl etc.?

With the mixture of transit and public peering, even on our startup network, it's something to think about the best way.

Also, how far out do you monitor? Just to your BGP peers or some known point after that? It's not good just pinging some public service as I'm sure they won't like it.

I hear Pingdom and others but not sure.

Thanks,
Gavin.

--
Kind Regards,
Gavin Henry.
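
For anyone trying to picture the cross-checking and the daily 'sanity' text described in the reply above, here is a minimal Python sketch. It is not the actual Nagios/Xymon setup: the function names, the plain TCP connect used as the reachability test, and the `send_sms` / `send_email` callables are all assumptions made for illustration. The idea is that the same script would run on both the on-net and the external server, each pointed at the other.

```python
import socket
from datetime import datetime, time


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def sanity_due(now, last_sent, at=time(18, 0)):
    """True at or after the `at` time if no sanity text has gone out today."""
    return now.time() >= at and last_sent != now.date()


def run_once(peer, now, last_sent, send_sms, send_email):
    """One polling pass. Returns the date of the most recent sanity text.

    `peer` is a hypothetical (host, port) tuple for the other monitoring
    server; `send_sms` and `send_email` are placeholder callables that
    each take a message string.
    """
    host, port = peer
    if not tcp_reachable(host, port):
        msg = f"ALERT: monitoring peer {host}:{port} unreachable"
        send_email(msg)  # normal alert path
        send_sms(msg)    # backup path in case email cannot get through
    if sanity_due(now, last_sent):
        send_sms("Sanity check: monitoring is alive")
        return now.date()
    return last_sent
```

Calling `run_once` from cron or a simple loop every minute or so would be enough. Passing the senders in as callables keeps the sketch independent of whichever SMS or mail provider is actually in use.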
