On Wed, Dec 01, 2010 at 12:40:40PM -0600, Jeremy Utley wrote:
> Good afternoon,
>
> We've been working on upgrading our recursors from pdns-recursor-3.1.7.1-1
> to pdns-recursor-3.3-1, and have seen some oddities I wanted to ask the
> list about. First, a basic rundown of our environment:
>
> Our existing production servers are running pdns-recursor-3.1.7.1-1
> installed via RPMs downloaded from your website. The recursor itself is
> run within a Xen PV virtual machine on a CentOS 5.5 base. To ensure we
> utilize all 4 cores of the processors in those machines, 2 instances of the
> recursor are launched simultaneously, listening on different IP addresses,
> and we utilize the fork option. We have a total of 6 machines configured
> this way, behind a Foundry load balancer which handles sharing the load
> between them. This implementation has been in place for about a year with
> no issues. We also use Cacti graphs for collecting performance data, by
> extending SNMP with output from the rec_control command.
>
> The new test server is pdns-recursor-3.3-1 installed via RPM downloaded
> from your website, and also running within a Xen PV virtual machine on a
> CentOS 5.5 base. Rather than launching multiple instances, we are
> launching 4 recursor threads (machines have 4 CPU cores). Most other
> settings are configured identically between old and new servers. This test
> server was added to the load balancer on Monday afternoon, taking a
> fraction of the traffic that would have gone to the 6 old machines.
>
> The problem I'm seeing is the caching does not seem to be working properly,
> which is causing a performance hit. To document this effect, the following
> graph images were taken a little while ago from our Cacti installation:
>
> http://www.jutley.org/DNS
>
> Looking at the 4th graph down, which is the cache statistics on the old
> version recursor, you will see that around 90% of all questions are cache
> hits, with around 10% as cache misses. And, looking at the third graph
> (showing how fast queries are answered), you'll see that over 90% of all
> queries are answered in less than 1 ms.
>
> However, looking at the bottom graph, which is the cache statistics on the
> new recursor, the statistics are totally different. Only 1.1% of the total
> questions are cache hits, while 6.8% are cache misses, which to me makes no
> sense, since a question *HAS* to be either a cache hit or cache miss. And,
> looking at the 7th graph (answer speed on the new recursor version), most
> queries are taking more than 10ms to answer.
>
> Just as additional info, the data collected by cacti to generate these
> graphs comes from the following command:
>
> /usr/bin/rec_control get questions cache-entries cache-hits cache-misses
> concurrent-queries resource-limits unauthorized-tcp unauthorized-udp
> spoof-prevents answers-slow client-parse-errors answers0-1 answers1-10
> answers10-100 answers100-1000 qa-latency
>
> Am I misinterpreting this, or is there something definitely going on?
>
> Thanks for your time,
>
> Jeremy

Hi Jeremy,

You are not including the statistics for packetcache-hits/misses. If a
query hits there, the recursor answers it straight from the packet cache
and does not check the record cache at all, so it is counted as neither a
cache-hit nor a cache-miss. I would bet that your packetcache-hits are
pretty substantial. Ours are almost 3X the cache-hits.

Cheers,
Ken
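
To make the arithmetic concrete, here is a rough Python sketch of how the three counters could be turned into shares of total questions once packetcache-hits and packetcache-misses are added to the rec_control call. The helper names (parse_rec_control, answer_breakdown) are hypothetical, the sample numbers are made up for illustration, and the code assumes that rec_control prints one integer per requested statistic in the order requested. It also assumes the three counters roughly add up to questions, which may not hold exactly on a live server.

```python
# Hypothetical helper names; only the statistic names come from rec_control.
STAT_NAMES = ["questions", "packetcache-hits", "packetcache-misses",
              "cache-hits", "cache-misses"]


def parse_rec_control(names, output):
    """Pair each requested statistic with the value printed for it.

    Assumes `rec_control get a b c` prints one integer per requested name,
    in the order requested, separated by whitespace.
    """
    values = [int(token) for token in output.split()]
    if len(values) != len(names):
        raise ValueError("expected %d values, got %d" % (len(names), len(values)))
    return dict(zip(names, values))


def answer_breakdown(stats):
    """Return the share of questions handled at each caching layer."""
    questions = stats["questions"]
    if questions == 0:
        return {}
    return {
        # Answered from the packet cache; the record cache is never consulted.
        "packetcache-hits": stats["packetcache-hits"] / questions,
        # Packet cache missed, record cache had the answer.
        "cache-hits": stats["cache-hits"] / questions,
        # Both caches missed; the recursor had to go out and resolve.
        "cache-misses": stats["cache-misses"] / questions,
    }


if __name__ == "__main__":
    # Made-up sample output, not measurements from any real server.
    sample = "1000\n921\n79\n11\n68\n"
    stats = parse_rec_control(STAT_NAMES, sample)
    for name, share in answer_breakdown(stats).items():
        print("%-18s %5.1f%%" % (name, 100 * share))
```

Graphing packetcache-hits alongside cache-hits and cache-misses in Cacti should give a picture that is comparable to the old 3.1.7.1 graphs, where every question was either a cache hit or a cache miss.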
