Re: Notify storms
I agree - we are removing half the masters in a couple of weeks to help. Slaves only talk to masters; there are no "slaves of slaves," as we refer to them. Our architectural goal has been uniformity among the configurations, and this is part of the price we pay for it. At a functional level this will persist, just with less traffic.

Because we use CVS and cannot control notifies, whenever we push out big updates the first master takes all the traffic while the others sit there unused. We've experimented with breaking up the notifies by pushing updates in chunks to various masters, but that really breaks things, both process-wise and logically. What we really need is some way to say "there is an update, and any one of these servers has an acceptable version of it," instead of "hey, there's an update; hey, there's an update; hey, there's an update," with each slave then going to each of the masters.

While it seems trivial operationally to handle these loads, and we're not concerned about network bandwidth, a single master can only manage so many transactions at once. I doubt we're even in the top 50% of deployments zone-count-wise, so I am confident that our number of zones isn't the issue. But I suspect that 80+ slaves is a little out there. With 80 slaves and 1800 zones, each master sends out 144,000 notifies (for major changes or a master reload), which very quickly triggers 144,000 SOA queries back to the master. That is bound to cause delays.

One option we've considered is pointing our MASTERS and NS records at an anycast IP/load balancer, so that multiple masters can answer for the same notify. Another would be to stop all notifies altogether and then figure out a way to trigger them manually (generating notifies via a Perl script or something clever), so we can control where the notifies come from.
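A minimal sketch of the "manual notify" idea: suppress automatic NOTIFYs on the masters (`notify no;` in named.conf) and resend them on demand with `rndc notify` (available in BIND 9.3 and later). The zone names below are placeholders, and the rndc command defaults to a dry-run echo so the script can be exercised without a running named:

```shell
#!/bin/sh
# Sketch: trigger NOTIFYs for specific zones from a chosen master,
# instead of letting every master announce every zone on reload.
# RNDC defaults to a dry-run echo; set RNDC=rndc in production.
# ($RNDC is expanded unquoted on purpose, so "echo rndc" word-splits.)
RNDC="${RNDC:-echo rndc}"

notify_zones() {
  for zone in "$@"; do
    $RNDC notify "$zone"
  done
}

# Example: only announce the zones we just committed (placeholder names).
notify_zones example.com example.net
```

Run against a real server this would resend a NOTIFY for each listed zone; as shipped it only prints the commands it would run.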
When all the EU DNS servers get notifies first from an NA master, they grab the data from there, so being able to control notifies would be nice sometimes. Thankfully we're mid-rearchitecture, and this will (hopefully) be torn out soon; but until it is, we need to make sure that our users can manage their changes in a reasonable manner. A for loop doing rndc retransfer for the changed zones, which seems to bypass all the congestion, is a short-term fix until we can figure out how to make things a little smoother.

Apologies for the wall of text - this is a frequent discussion with very little in the way of conclusion around here :)

Todd.

On Wed, Jan 20, 2010 at 10:33 PM, Joseph S D Yao j...@tux.org wrote:

On Wed, Jan 20, 2010 at 03:52:33PM -0500, Todd wrote: serial-query-rate While this appears to be helping in the lab, it's still taking between 2 and 3 minutes for each slave to even finish receiving the NOTIFYs from the master. They then start hitting the master(s) with SOA queries, which seems to take a really long time.

Your NOTIFY tree sounds like it's many-to-many. Maybe you should be using a sparser tree.

--
/*\ ** ** Joe Yao j...@tux.org - Joseph S. D. Yao ** \*/

___
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
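The short-term fix Todd describes could look something like the sketch below: force a transfer of just the changed zones on each slave with `rndc retransfer`, bypassing the NOTIFY/SOA congestion entirely. The zone list is a placeholder, and the rndc command defaults to a dry-run echo so the loop can be tested without a running named:

```shell
#!/bin/sh
# Sketch: on a slave, force re-transfer of only the zones that changed,
# instead of waiting for the NOTIFY/SOA-query storm to drain.
# RNDC defaults to a dry-run echo; set RNDC=rndc in production.
# ($RNDC is expanded unquoted on purpose, so "echo rndc" word-splits.)
RNDC="${RNDC:-echo rndc}"

retransfer_zones() {
  for zone in "$@"; do
    $RNDC retransfer "$zone"
  done
}

# Example: the zones from the last CVS commit (placeholder names).
retransfer_zones example.com example.net
```

In practice the zone list would come from the CVS commit itself (e.g. the output of a `cvs diff` or a changed-files hook), which is exactly what makes this bypass targeted.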
Re: Notify storms
On 2010/01/20, at 13:03, Dave Sparro wrote:

We would like to make this better. Can anyone help with ideas on this? Are we missing something obvious?

In that situation I'd consider using CVS on all of the servers to maintain the DNS data. Just make all of the servers masters, and forget about slaves.

Agreed .. that's definitely one solution. With your data already in a version control system, and that many name servers, you might benefit from replacing zone transfers with a configuration management tool (cfengine, bcfg2, etc.) which can take care of noticing that there's new data in the version control system, getting it onto the slaves, and then telling them to reload or reconfig as appropriate (depending on whether it's zone files or named.conf that changed).

Another option, if you want to stick with the master/slave approach, is to tier your slaves. Reduce the masters to just two or three, and then assign 10 or so of the slaves to be intermediate masters. The intermediates slave from the real masters, and every other server slaves from, at most, two or three of the intermediates each. If you group these appropriately, you can get it down to a maximum of 10 or so slaves talking to any one upstream master, with a nice mesh to maintain redundancy. How you divide them up is up to you ... regionally works well though.

Matt
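A minimal named.conf sketch of the tiering Matt describes. The addresses are illustrative assumptions (192.0.2.1-2 for the real masters, 192.0.2.11-12 for two intermediates), the zone name is a placeholder, and the `leaf-slaves` ACL is a hypothetical address-match list you would define yourself:

```
// On an intermediate master: slave from the real masters only.
zone "example.com" {
    type slave;
    masters { 192.0.2.1; 192.0.2.2; };    // the two real masters
    allow-transfer { leaf-slaves; };      // hypothetical ACL for the leaf tier
    file "slaves/example.com";
};

// On a leaf slave: slave from two or three nearby intermediates,
// never from the real masters directly.
zone "example.com" {
    type slave;
    masters { 192.0.2.11; 192.0.2.12; };
    file "slaves/example.com";
};
```

With ~10 intermediates and each leaf pointed at two or three of them, no upstream server ever faces more than a handful of slaves per NOTIFY, which is the whole point of the sparser tree.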
Re: Notify storms
serial-query-rate

While this appears to be helping in the lab, it's still taking between 2 and 3 minutes for each slave to even finish receiving the NOTIFYs from the master. They then start hitting the master(s) with SOA queries, which seems to take a really long time. We're going to keep tuning, but it looks like we've reached some sort of tipping point where inefficiencies in our methodology, our architecture, and the underlying protocol might be combining to make for less-than-ideal conditions for fast changes.

Thanks for this tip ... big 'ah-ha' moment for us.

Cheers,
Todd.
Re: Notify storms
On Wed, Jan 20, 2010 at 03:52:33PM -0500, Todd wrote:

serial-query-rate

While this appears to be helping in the lab, it's still taking between 2 and 3 minutes for each slave to even finish receiving the NOTIFYs from the master. They then start hitting the master(s) with SOA queries, which seems to take a really long time.

Your NOTIFY tree sounds like it's many-to-many. Maybe you should be using a sparser tree.

--
/*\ ** ** Joe Yao j...@tux.org - Joseph S. D. Yao ** \*/
Notify storms
Good day all,

We've run into a problem with our DNS servers. The way we update our masters is via a CVS checkout and a reload of the zones modified. Sometimes, though, we need to reload the whole config for big changes, etc. When that happens, all 6 masters (I know, we're getting rid of some) send notifies to all 80+ slaves (I know, we're getting rid of some) for all 1800 zones. This causes all the slaves to verify all 1800 zones against 6 masters, which then delays the changes we made from actually getting to the slaves. Right now it takes about 2.5 hours for all slaves to do all zones.

We would like to make this better. We're trying to figure out what mechanism might be limiting the rate at which a slave does SOA checks against the master, so it can perform that step more quickly. We have looked at the zone transfer limits on the master/slave, but those relate to the transfer mechanism, not the SOA query. Can anyone help with ideas on this? Are we missing something obvious?

Cheers,
Todd.
Re: Notify storms
On Mon, Jan 18, 2010 at 1:27 PM, Todd canada...@gmail.com wrote:

Good day all, We've run into a problem with our DNS servers. The way we update our masters is via a CVS checkout and a reload of the zones modified. Sometimes, though, we need to reload the whole config for big changes, etc. When that happens, all 6 masters (I know, we're getting rid of some) send notifies to all 80+ slaves (I know, we're getting rid of some) for all 1800 zones. This causes all the slaves to verify all 1800 zones against 6 masters, which then delays the changes we made from actually getting to the slaves. Right now it takes about 2.5 hours for all slaves to do all zones. We would like to make this better. We're trying to figure out what mechanism might be limiting the rate at which a slave does SOA checks against the master, so it can perform that step more quickly. We have looked at the zone transfer limits on the master/slave, but those relate to the transfer mechanism, not the SOA query. Can anyone help with ideas on this? Are we missing something obvious?

Might not be what you are looking for, but it sounds like some of the ideas presented at infrastructures.org might help.

-B
Re: Notify storms
In message 91aa34af1001181327q7f5de882vf47052ed39d87...@mail.gmail.com, Todd writes:

Good day all, We've run into a problem with our DNS servers. The way we update our masters is via a CVS checkout and a reload of the zones modified. Sometimes, though, we need to reload the whole config for big changes, etc. When that happens, all 6 masters (I know, we're getting rid of some) send notifies to all 80+ slaves (I know, we're getting rid of some) for all 1800 zones. This causes all the slaves to verify all 1800 zones against 6 masters, which then delays the changes we made from actually getting to the slaves. Right now it takes about 2.5 hours for all slaves to do all zones. We would like to make this better. We're trying to figure out what mechanism might be limiting the rate at which a slave does SOA checks against the master, so it can perform that step more quickly. We have looked at the zone transfer limits on the master/slave, but those relate to the transfer mechanism, not the SOA query. Can anyone help with ideas on this? Are we missing something obvious?

serial-query-rate

Cheers,
Todd.

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742  INTERNET: ma...@isc.org
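Mark's one-word answer refers to a named.conf option that caps how fast a slave sends the SOA (serial) queries it issues when refreshing zones. A minimal sketch of where it goes; the value 100 is purely illustrative, not a recommendation (my understanding is that the BIND 9 default of that era was 20, so a flood of NOTIFYs could take minutes to work through):

```
options {
    // Maximum number of outgoing SOA serial queries per second.
    // Raising this lets a slave work through a NOTIFY flood faster,
    // at the cost of more load on the master(s).
    serial-query-rate 100;
};
```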