In <[EMAIL PROTECTED]> "Karel Sandler" <[EMAIL PROTECTED]> writes:
> From: "wayne" <[EMAIL PROTECTED]>
>
>> The "short term" rates returned by the NTP monitoring script are going
>> to be pretty close to the exact average over the last 15-30 minutes,
>> while the "long term" rates are going to be close to the last
>> 15-30 days.
>>
>> If there really are peaks of 1500 requests in a single second, or 6000
>> requests in a minute when the normal average is closer to 10, then
>> even the short term rate will not have a chance to jump to those
>> levels. It would probably jump to several hundred, but not 1000.
>>
>> Looking at John's logs, I do see spikes to 100req/sec for what I'm
>> guessing is the 15-30 minute "short term" average. The bursts when
>> your server got put into the pool didn't use to be that bad, but maybe
>> it is time to take a closer look.

Hmm... Ok, I've been looking into the spikes in NTP traffic.

I've been collecting the short term rates once a minute for the last
day and a half. Since then, I've only been in the DNS pool twice. Each
time, I saw an increase of 25-35 req/sec. The spike starts to grow soon
after entering the pool and peaks soon after leaving it.

The spikes are long enough that the short term rates should be pretty
accurate at measuring the times you are in the DNS pool, but as
mentioned in the quoted text above, they won't be that accurate for
events that last only a couple of minutes or a few seconds. You will
still see a jump, but the jump won't show the true magnitude.

For those collecting ntp_pool_dns stats, the peak req/sec occurs around
the same time as the peak number of clients being monitored. The number
of clients seems to grow by around 2000 or so when you enter the pool.

So, it looks like there are a couple thousand users of the pool that
are only fetching the time after querying the DNS for the current list
of pool servers. I suspect some of those are because of things like
running ntpdate on booting up and then not asking for the time again
for the rest of the day.

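As a rough illustration of why a short burst barely registers in the
short term rate while a long stay in the pool does, here is a minimal
Python sketch. It assumes the short term rate behaves like an
exponentially decaying average with a time constant of about 20
minutes. How the monitoring script really computes its averages is not
stated in this thread, so the function name, the time constant, and the
traffic numbers below are all illustrative assumptions.

```python
import math


def smoothed_rates(per_second_counts, time_constant_s=1200.0, start=0.0):
    """Return the exponentially smoothed rate after each one-second sample.

    time_constant_s is an assumed averaging window (20 minutes here),
    not a value taken from any real monitoring script.
    """
    alpha = 1.0 - math.exp(-1.0 / time_constant_s)
    rate = start
    out = []
    for count in per_second_counts:
        # Move the average a small step toward the newest sample.
        rate += alpha * (count - rate)
        out.append(rate)
    return out


# Hypothetical traffic: 10 req/s baseline, a 60 second burst at 100 req/s
# (6000 requests in a minute), then back to the baseline.
baseline, burst = 10, 100
traffic = [baseline] * 600 + [burst] * 60 + [baseline] * 600
rates = smoothed_rates(traffic, start=baseline)
peak = max(rates)
print(f"true peak: {burst} req/s, smoothed peak: {peak:.1f} req/s")

# Hypothetical stay in the pool: 30 req/s above baseline for two hours.
sustained = [baseline + 30] * 7200
final = smoothed_rates(sustained, start=baseline)[-1]
print(f"after 2 h at {baseline + 30} req/s: smoothed rate {final:.1f} req/s")
```

Under these assumptions the one-minute burst moves the smoothed rate
only a few req/sec above the baseline, while the two-hour increase is
tracked almost exactly. A different averaging method or window would
change the numbers but not the general shape of the result.
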
However, I strongly suspect that the vast majority are constantly
checking the time. This is probably not intentional, but rather some
sort of bug or configuration problem, either with a firewall or the
client software.

To put this in perspective, if the average pool server supports 1500
ntp clients on a regular basis, and there are 750 servers in the pool,
then 1,125,000 people are using the pool. My guess is there are around
2000 pool users that are abusive, which is only 0.18%. That really is
pretty good!

Looking at the logs, it appears that some of these abusive clients that
follow the DNS are querying my server several times per second. If they
are doing a DNS lookup each time and getting rotated through all the
servers in the DNS pool, these clients are doing dozens or maybe even a
hundred requests per second. *sigh*

It might be worth doing a little more detailed investigation to see if
we can contact the most abusive clients. If there is any commonality in
what they are doing, we might be able to eliminate an entire class of
abusive clients.

> Usually, the client's requests are statistically independent. Like ticks
> from a Geiger counter. But, when hundreds of crons decide to send a request
> just at two o'clock, it's something else. Although such events may be
> curious, they are short and rare at a single server

All the spikes I've seen so far have been correlated with being in the
pool DNS. As you say, it may be rare and I just haven't seen it yet, or
my server may never see it.

> and as mentioned
> elsewhere, they are not important for an overall QoS of the pool.

Yeah, my only concern is that the spikes from being in the pool DNS
could become so large that they are a problem for a significant number
of pool servers. Kind of like trying to drink from a fire hose.

-wayne
_______________________________________________
timekeepers mailing list
[email protected]
https://fortytwo.ch/mailman/cgi-bin/listinfo/timekeepers
