Subject: DNSMASQ failing to return SRV records with loss of communication to a single DNS server
Issue: We have SIP SRV records for a domain that can be served by two DNS servers in our environment. During testing we noticed that if one of the DNS servers is unreachable, the request for the SRV records via dnsmasq times out. This only happens when the query originates from outside the box where dnsmasq is running. That is, if we issue the SRV query from the dnsmasq server itself, the SRV records are returned. If we issue the request from a client VM that is set to resolve queries against our dnsmasq host, the request times out.

Note: some of the information below, such as IP addresses and domain names, has been changed or replaced with xxx for security reasons.

Dnsmasq.conf has the following entries, which forward requests for labdomain.net to 10.xx.xx.12 and 10.xx.xx.20:

    server=/labdomain.net/10.xx.xx.12
    server=/labdomain.net/10.xx.xx.20

The VM making the SRV queries is 10.xx.xx.99.

Case 1: When we query for an SRV record with 10.xx.xx.5 as our dnsmasq server, and the non-reachable DNS server (10.xx.xx.12) is commented out, we receive a response to the SRV query.

    #server=/labdomain.net/10.xx.xx.12
    server=/labdomain.net/10.xx.xx.20

    [labuser@f5-test ~]$ dig srv _sip._udp.scscf.sprout.lp.labdomain.net @10.xx.xx.5
    ;; Truncated, retrying in TCP mode.

    ; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.62.rc1.el6_9.5 <<>> srv _sip._udp.scscf.sprout.lp.labdomain.net @10.xx.xx.5
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14584
    ;; flags: qr aa; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 5

    ;; QUESTION SECTION:
    ;_sip._udp.scscf.sprout.lp.labdomain.net. IN SRV

    ;; ANSWER SECTION:
    _sip._udp.scscf.sprout.lp.labdomain.net. 15 IN SRV 10 50 5054 ovpklp-viscscf-spn-05.labdomain.net.
    _sip._udp.scscf.sprout.lp.labdomain.net. 15 IN SRV 10 50 5054 ovpklp-viscscf-spn-01.labdomain.net.
    _sip._udp.scscf.sprout.lp.labdomain.net. 15 IN SRV 10 50 5054 ovpklp-viscscf-spn-02.labdomain.net.
    _sip._udp.scscf.sprout.lp.labdomain.net. 15 IN SRV 10 50 5054 ovpklp-viscscf-spn-03.labdomain.net.
    _sip._udp.scscf.sprout.lp.labdomain.net. 15 IN SRV 10 50 5054 ovpklp-viscscf-spn-04.labdomain.net.

    ;; ADDITIONAL SECTION:
    ovpklp-viscscf-spn-05.labdomain.net. 43200 IN A 10.xx.xx.18
    ovpklp-viscscf-spn-01.labdomain.net. 43200 IN A 10.xx.xx.14
    ovpklp-viscscf-spn-02.labdomain.net. 43200 IN A 10.xx.xx.15
    ovpklp-viscscf-spn-03.labdomain.net. 43200 IN A 10.xx.xx.16
    ovpklp-viscscf-spn-04.labdomain.net. 43200 IN A 10.xx.xx.17

    ;; Query time: 2 msec
    ;; SERVER: 10.xx.xx.5#53(10.xx.xx.5)
    ;; WHEN: Mon Aug 13 16:34:40 2018
    ;; MSG SIZE rcvd: 528

Case 2: When we query for an SRV record with 10.xx.xx.5 as our dnsmasq server, and both the good and the non-reachable DNS servers are configured, the SRV query times out. In this case 10.xx.xx.20 is fully capable of responding to the SRV query.

    server=/labdomain.net/10.xx.xx.12    <-- not reachable
    server=/labdomain.net/10.xx.xx.20

    [labuser@f5-test ~]$ dig srv _sip._udp.scscf.sprout.lp.labdomain.net @10.xx.xx.5
    ;; Truncated, retrying in TCP mode.

    ; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.62.rc1.el6_9.5 <<>> srv _sip._udp.scscf.sprout.lp.labdomain.net @10.xx.xx.5
    ;; global options: +cmd
    ;; connection timed out; no servers could be reached

Dnsmasq logging shows:

    Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
    Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: forwarded _sip._udp.scscf.sprout.lp.labdomain.net to 10.xx.xx.12
    Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: forwarded _sip._udp.scscf.sprout.lp.labdomain.net to 10.xx.xx.20
    Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: nameserver 10.xx.xx.20 refused to do a recursive query
    Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5172]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
    Aug 14 16:22:24 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5173]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
    Aug 14 16:22:34 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5174]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99

I could use some ideas on how to further troubleshoot this issue.

Andy Warner
Telecom Design Engineer
O: 406-752-3330 / M: 913-972-7521
andrew.c.war...@sprint.com
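One possible next check, offered as a sketch and not a confirmed diagnosis: dig reports "Truncated, retrying in TCP mode", and the three later log entries (PIDs 5172, 5173, 5174) show a query arriving with no "forwarded" line after it. That may mean the TCP retry is where the time is lost. The Python helper below only measures how long a plain TCP connect to port 53 takes for each upstream, so the result can be compared with dig's timeout. The function names `tcp_connect_time` and `report` and the 3-second default timeout are illustrative choices, not anything taken from dnsmasq.

```python
import socket
import time


def tcp_connect_time(host, port=53, timeout=3.0):
    """Attempt a TCP connect and return (connected, seconds_elapsed).

    A refused, unreachable, or timed-out connect all count as
    connected=False; only the elapsed time tells them apart.
    """
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            connected = True
    except OSError:
        connected = False
    return connected, time.monotonic() - start


def report(upstreams, port=53, timeout=3.0):
    """Return one human-readable line per upstream server."""
    lines = []
    for host in upstreams:
        ok, elapsed = tcp_connect_time(host, port, timeout)
        lines.append(f"{host}: connected={ok} after {elapsed:.2f}s")
    return lines
```

Run on the dnsmasq host, `report([...])` with the two real upstream addresses would show whether a connect to the unreachable server fails immediately or hangs until the timeout. A hang longer than the client's own timeout would be consistent with the behaviour seen from the client VM, though it would not prove it.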
_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss