Hello,

On 22-02-17 15:44, Max Grobecker wrote:
> the problem with "Don't know what the monitoring sees, but for me it's
> working" is a long topic.
> I know (Ask said it in a thread) that the monitoring can't be simply extended
> by a second monitoring station
> without massive changes to the monitoring software.

First you have to know what problem you are trying to solve. As far as I know, the monitor and its direct upstream provider have not been the source of any monitoring problem for years. Usually, the problem is very close to the server whose score drops.

You can easily check this by looking at the number of active servers per continent and per country. Even at country level, there is usually no visible drop in the number of active servers. The last visible country-level drop I can remember was in Germany a few years ago. But even then, most servers in Germany were still reachable by the monitor.

Several times I've checked the reachability of servers with problems from a machine very close to AMS-IX in The Netherlands. Most of the time (though not always) I got the same results as the monitor in LA. For example, a few years ago, the affected German servers were unreachable from my system as well, at a distance of less than 200 km. Sometimes I can connect to a server in the USA while it is unreachable from the monitor in LA!

All you do with multiple monitoring servers is check more routing options. The only logical thing to do is to reject the server from the pool as soon as it is unreachable via at least one route. From our clients' perspective this would make the pool more reliable. However, even more servers will see their score drop...

Please keep in mind the monitor is NOT about checking if a server knows the correct time. It is about checking if a server (with correct time) is able to *provide* that time to a client connecting to it from just anywhere on the internet using the DNS system.

> Would it be an option to do a second look at low-scoring pool servers by
> checking their reachability and accuracy by using
> the RIPE ATLAS [1] network?
> Then you would be able to get results from networks within that specific
> region the server is serving and maybe
> a few others worldwide and could calculate the median of all measurements.
> That, added to the original monitor's result
> could be used to distinguish between server and monitor problems.

Usually both the server AND the monitor are operating fine. The problems we see nowadays are often caused by local (regional) measures against DDoS attacks over UDP.

As said above, you can check the total number of servers in all zones to verify the monitoring server is operating correctly.

Best regards,
Arnold
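
The difference between the two ways of combining several vantage points discussed above (taking the median of all measurements versus rejecting a server as soon as one route fails) can be sketched in a few lines of Python. This is only an illustration: the function names, the probe labels and the sample results are hypothetical and are not taken from the pool monitoring software or from RIPE Atlas.

```python
from statistics import median


def median_verdict(results):
    """Keep the server if the median probe result says 'reachable'.

    `results` maps a (hypothetical) probe name to True/False.
    """
    if not results:
        raise ValueError("need at least one probe result")
    # Encode reachable as 1 and unreachable as 0, then take the median.
    return median(1 if ok else 0 for ok in results.values()) >= 0.5


def strict_verdict(results):
    """Keep the server only if every probe could reach it."""
    if not results:
        raise ValueError("need at least one probe result")
    return all(results.values())


# Hypothetical sample: the server answers on two routes but not on a third.
sample = {"monitor-LA": True, "probe-AMS": True, "probe-regional": False}

print("median rule keeps server:", median_verdict(sample))
print("strict rule keeps server:", strict_verdict(sample))
```

With this sample the median rule keeps the server while the strict rule rejects it, which is the trade-off described above: more vantage points combined strictly means more score drops, not fewer.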
