To rule out DNS: are the boxes running a DNS cache on themselves, or using a secondary server? What's the TTL on those A/CNAME records, and how long was your outage?
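The TTL and outage length matter because together they decide whether a cached answer outlives the outage. Below is a minimal Python model of that. It assumes the box runs a local caching resolver that does not serve stale answers once the TTL expires, and that the authoritative server for the down site was unreachable for the whole outage. The names (CachedRecord, lookup_outcome, first_failure) are hypothetical and are not part of SmokePing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedRecord:
    """Hypothetical resolver cache entry (a model, not SmokePing code)."""
    cached_at: float  # seconds: when the answer was last fetched
    ttl: float        # seconds: TTL of the A/CNAME record


def lookup_outcome(record: CachedRecord, now: float,
                   outage_start: float, outage_end: float) -> str:
    """Classify a lookup made at `now`.

    Assumes the authoritative server cannot answer between outage_start
    and outage_end, and that expired entries are never served stale.
    """
    if now < record.cached_at + record.ttl:
        return "cache hit"
    if outage_start <= now < outage_end:
        # Cache has expired and upstream is unreachable, so the
        # resolver waits until its own timeout.
        return "timeout"
    return "fresh lookup"


def first_failure(record: CachedRecord, outage_start: float,
                  outage_end: float) -> Optional[float]:
    """Earliest time a lookup would time out, or None if the cache covers the outage."""
    expiry = record.cached_at + record.ttl
    t = max(expiry, outage_start)
    return t if t < outage_end else None


# Record cached at t=0, outage from t=60s to t=1800s.
print(first_failure(CachedRecord(0, 300), 60, 1800))   # 300
print(first_failure(CachedRecord(0, 3600), 60, 1800))  # None
```

With a 5-minute TTL, lookups in this model start timing out 300 seconds in. With a 1-hour TTL, the cache outlives a 30-minute outage, which would make DNS an unlikely explanation for the timeouts.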
Josh Luthman
Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St
Suite 1337
Troy, OH 45373

"When you have eliminated the impossible, that which remains, however
improbable, must be the truth."
--- Sir Arthur Conan Doyle

On Thu, Sep 17, 2009 at 7:02 PM, David Rees <[email protected]> wrote:
> Hi,
>
> We use smokeping to monitor a number of hosts on various networks. We
> have a master with a handful of slaves which monitor various sites.
>
> This morning we had an outage which affected one of those sites, but
> the slaves which were monitoring the site that went down failed to
> report any data at all for any networks, even if they were reachable
> from that network. Communications between the master/slaves were not
> affected.
>
> The affected slaves were reporting this message:
>
> WARNING Master said 500 read timeout
>
> While the master had messages like:
>
> RRDs::update ERROR: /var/lib/smokeping/rrd/slave/slave~site1.rrd:
> illegal attempt to update using time 1253201797 when last update time
> is 1253201797 (minimum one second step)
>
> All machines are running smokeping 2.4.2. Any ideas?
>
> The only thing I can think of is that DNS for the site that went down
> was also down, so the master timed out trying to look up the site's
> IP address?
>
> Thanks
>
> Dave
>
> _______________________________________________
> smokeping-users mailing list
> [email protected]
> https://lists.oetiker.ch/cgi-bin/listinfo/smokeping-users
