Hi Jakub,

I cannot know for sure since the log seems to have been truncated by syslog, but I think you may have the same problem that Haw Loeung reported. I have since corresponded with him and I think we have resolved the issue. A patch is in svn for unbound.

The issue seems to be caused by the parent-child disagreement fix in 1.4.1, worsened when IPv6 is not available (either not in the kernel or because of a do-ip6: no statement). Just turning IPv6 back on (even if you have no IPv6 connectivity) may stop this bug.

Sorry for the inconvenience,
Wouter

On 01/08/2010 10:36 PM, Jakub Heichman wrote:
> Hello there,
>
> I have upgraded from Unbound 1.3.0 to 1.4.1 recently on a couple of
> servers. The servers are doing about 3-7k requests per second.
> I have noticed that after some time of operation, servers
> intermittently stop responding to queries to certain domains from
> cache (return SERVFAIL). It particularly doesn't like webex.com,
> sometimes vmware.com, dell.com.
>
> # unbound-host webex.com
> webex.com has address 64.68.120.99
> webex.com mail is handled by 50 mx3.webex.com.
> webex.com mail is handled by 10 mx1.webex2.iphmx.com.
> webex.com mail is handled by 10 mx2.webex2.iphmx.com.
> webex.com mail is handled by 50 mx2.webex.com.
>
> But the A record doesn't seem to get cached:
>
> # dig webex.com @localhost
>
> ; <<>> DiG 9.3.3rc2 <<>> webex.com @localhost
> ; (1 server found)
> ;; global options: printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 49004
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;webex.com. IN A
>
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#53(127.0.0.1)
> ;; WHEN: Fri Jan 8 19:16:01 2010
> ;; MSG SIZE rcvd: 27
>
> # unbound-control dump_cache | grep ^webex.com
> webex.com. 4589 IN NS ns01.webex.com.
> webex.com. 4589 IN NS ns05.webex.com.
> webex.com. 4589 IN NS ns07.webex.com.
> webex.com. 4589 IN NS ns03.webex.com.
> webex.com. 4589 IN NS ns08.webex.com.
> webex.com. 4589 IN NS ns06.webex.com.
>
> It doesn't respond to NS queries either (No A records for
> nsxx.webex.com in cache)
>
> # dig ns webex.com @localhost
>
> ; <<>> DiG 9.3.3rc2 <<>> ns webex.com @localhost
> ; (1 server found)
> ;; global options: printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 54946
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;webex.com. IN NS
>
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#53(127.0.0.1)
> ;; WHEN: Fri Jan 8 20:53:18 2010
> ;; MSG SIZE rcvd: 27
>
>
>
> After flushing webex.com I can query the cache again:
>
> # unbound-control flush webex.com
> ok
> # dig +short webex.com @localhost
> 64.68.120.99
>
> Log with increased verbosity:
>
> Jan 8 21:05:32 cns01j unbound: [25052:3] info: validator operate:
> query <webex.com. A IN>
> Jan 8 21:05:32 cns01j unbound: [25052:3] info: resolving <webex.com. A IN>
> Jan 8 21:05:32 cns01j unbound: [25052:3] info:
> DelegationPoint<webex.com.>: 6 names (6 missing), 6 addrs (0 result, 6
> avail)
> Jan 8 21:05:32 cns01j unbound: [25052:3] info: ns08.webex.com. A
> Jan 8 21:05:32 cns01j unbound: [25052:3] info: ns05.webex.com. A
> Jan 8 21:05:32 cns01j unbound: [25052:3] info: ns01.webex.com. A
> Jan 8 21:05:32 cns01j unbound: [25052:3] info: ns06.webex.com. A
> Jan 8 21:05:32 cns unbound: [25052:3] info: ns07.webex.com. A
> Jan 8 21:05:32 cns unbound: [25052:3] info: ns03.webex.com. A
> Jan 8 21:05:32 cns unbound: [25052:3] info: resolving (init part 2):
> <webex.com. A IN>
> Jan 8 21:05:32 cns unbound: [25052:3] info: resolving (init part 3):
> <webex.com. A IN>
> Jan 8 21:05:32 cns unbound: [25052:3] info: processQueryTargets:
> <webex.com. A IN>
> Jan 8 21:05:32 cns unbound: [25052:3] info:
> DelegationPoint<webex.com.>: 6 names (6 missing), 6 addrs (0 result, 6
> avail)
> Jan 8 21:05:32 cns unbound: [25052:3] info: ns08.webex.com. A
> Jan 8 21:05:32 cns unbound: [25052:3] info: ns05.webex.com. A
> Jan 8 21:05:32 cns unbound: [25052:3] info: ns01.webex.com. A
> Jan 8 21:05:32 cns unbound: [25052:3] info: ns06.webex.com. A
> Jan 8 21:05:32 cns unbound: [25052:3] info: ns07.webex.com. A
> Jan 8 21:05:32 cns unbound: [25052:3] info: ns03.webex.com. A
> Jan 8 21:05:32 cns unbound: [25052:3] info: validator operate: query
> <webex.com. A IN>
> Jan 8 21:05:32 cns unbound: [25052:3] info: mesh_run: end 8 recursion
> states (8 with reply, 0 detached), 28 waiting replies, 84 recursion
> replies sent, 0 replies dropped, 0 states jostled out
> Jan 8 21:05:32 cns unbound: [25052:3] info: average recursion
> processing time 4.172319 sec
> Jan 8 21:05:32 cns unbound: [25052:3] info: histogram of recursion
> processing times
> Jan 8 21:05:32 cns unbound: [25052:3] info: [25%]=0.00107789
> median[50%]=0.00744727 [75%]=0.0524288
> Jan 8 21:05:32 cns unbound: [25052:3] info: lower(secs) upper(secs)
> recursions
> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.000000 0.000001 9
> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.000256 0.000512 1
> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.001024 0.002048 1
> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.002048 0.004096 19
> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.004096 0.008192 2
> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.008192 0.016384 11
> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.016384 0.032768 3
> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.032768 0.065536 5
> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.065536 0.131072 15
> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.131072 0.262144 10
> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.262144 0.524288 5
> Jan 8 21:05:32 cns unbound: [25052:3] info: 0.524288 1.000000 2
> Jan 8 21:05:32 cns unbound: [25052:3] info: 256.000000 512.000000 1
>
>
>
> I have downgraded one of the servers back to 1.3.0 (libev 3.53), where
> I don't see this problem.
>
> Unbound 1.4.1 compiled with libev-3.9, without threads - 4 processes running
>
> num-threads: 4
> outgoing-range: 10000
> msg-cache-size: 500m
> msg-cache-slabs: 1
> num-queries-per-thread: 10000
> jostle-timeout: 4000
> so-rcvbuf: 8m
> rrset-cache-size: 1024m
> rrset-cache-slabs: 1
> cache-max-ttl: 86400
>
> Any help would be greatly appreciated :-)
>
> Thanks,
>

_______________________________________________
Unbound-users mailing list
[email protected]
http://unbound.nlnetlabs.nl/mailman/listinfo/unbound-users
