Re: [systemd-devel] systemd-resolved continuously switching DNS servers
Am Wed, 10 May 2017 22:12:37 +0200 schrieb Kai Krakow : > Am Tue, 9 May 2017 20:37:16 +0200 > schrieb Lennart Poettering : > > > On Tue, 09.05.17 00:42, Kai Krakow (hurikha...@gmail.com) wrote: > > > > > Am Sat, 6 May 2017 14:22:21 +0200 > > > schrieb Kai Krakow : > > > > [...] > > [...] > > [...] > > [...] > [...] > > > > > > Fixed by restarting the router. The cable modem seems to be buggy > > > with UDP packets after a lot of uptime: it simply silently drops > > > UDP packets at regular intervals, WebUI was also very slow, > > > probably a CPU issue. > > > > > > I'll follow up on this with the cable provider. > > > > > > When the problem starts to show up, systemd-resolved is affected > > > more by this than direct resolving. I don't know if there's > > > something that could be optimized in systemd-resolved to handle > > > such issues better but I don't consider it a bug in > > > systemd-resolved, it was a local problem. > > > > Normally configured DNS servers should be equivalent, and hence > > switching them for each retry should not come at any cost, hence, > > besides the extra log output, do you experience any real issues? > > Since I restarted the router, there are no longer any such logs except > maybe a few per day (less than 4). > > But when I got those logs spammed to the journal, the real problem was > the DNS resolver taking 10s about once per minute to resolve a website > address - which really was a pita. > > But well, what could systemd-resolved have done about it when the real > problem was some network equipment? > > I just wonder why it was less visible when directly using those DNS > servers. Since DNS must have been designed with occassional packet > loss in mind (because it uses UDP), there must be a way to handle this > better. So I read a little bit in > https://www.ietf.org/rfc/rfc1035.txt. > > RFC1035 section 4.2.1 suggests that the retransmission interval for > queries should be 2-5 seconds, depending on statistics of previous > queries. To me, "retransmissions" means the primary DNS server should > not be switched for each query timeout it got (while still allowing to > transfer the same request to the next available server). > > RFC1035 section 7 discusses the suggested implementation of the > resolver and covers retransmission and server selection algorithms: > > It suggests to record average response time for each server it queries > to select the ones which respond faster first. Without query history, > the selection algorithm should pretend a response time of 5-10 > seconds. > > It also suggests to switch the primary server only when there was some > "bizarre" error or a server error reply. However, I don't think it > should actually remove them from the list as suggested there since we > are a client resolver, not a server resolver which can update its peer > lists from neighbor servers. However, we could reset query time > statistics to move it to the end of the list, and/or blacklist it for > a while. > > Somewhere else in that document I've read that it is well permitted to > interleave multiple parallel requests to multiple DNS servers in the > list. So I guess it would be nice and allowed if systemd-resolved used > more than only one DNS server at the same time by alternating between > them each request - maybe taking the two best according to query time > statistics. > > I also guess that it should maybe use shorter timeouts for queries as > you could have more than one DNS server and the initial query time > statistic should pretend 5-10 seconds, while the rotation interval > suggests 2-5 seconds. > > I think it would work to have "10 seconds divided by servers count" or > 2 seconds, whatever is bigger, as a timeout for query rotation. But a > late reply should still be accepted as pointed out in section 7.3, > even when the query was already rotated to the next DNS server. Using > only a single DNS server can skip all this logic as there's no > rotation and would work with timeouts of 10 seconds. > > That way, systemd-resolved would "learn" to use only the fastest DNS > server and when it becomes too slow (which is 5-10 seconds based on > the RFC), it would switch to the next server. If parallel requests > come in, it would use more DNS servers from the list in parallel, > auto-sorted by query reply time. The startup order is the one given > by the administrator (or whatever provides the DNS server list). > > Applied to my UDP packet loss (which seem to be single packet losses > as an immediate next request would've got a reply), it would mean > that the systemd resolver gives me an address after 2-3 seconds > instead of 5 or 10 because I had 4 DNS servers on that link. This is > more or less what I've seen previously in my situation when I > switched back to direct resolving instead of through systemd-resolved. > > What do you think? I could think of this being an implementation > improvement project in the Git
Re: [systemd-devel] systemd-resolved continuously switching DNS servers
Am Tue, 9 May 2017 20:37:16 +0200 schrieb Lennart Poettering : > On Tue, 09.05.17 00:42, Kai Krakow (hurikha...@gmail.com) wrote: > > > Am Sat, 6 May 2017 14:22:21 +0200 > > schrieb Kai Krakow : > > > > > Am Fri, 5 May 2017 20:18:41 +0200 > > > schrieb Lennart Poettering : > > > > [...] > [...] > [...] > > > > > > It looks like this all has to do with timeouts: > > > > Fixed by restarting the router. The cable modem seems to be buggy > > with UDP packets after a lot of uptime: it simply silently drops UDP > > packets at regular intervals, WebUI was also very slow, probably a > > CPU issue. > > > > I'll follow up on this with the cable provider. > > > > When the problem starts to show up, systemd-resolved is affected > > more by this than direct resolving. I don't know if there's > > something that could be optimized in systemd-resolved to handle > > such issues better but I don't consider it a bug in > > systemd-resolved, it was a local problem. > > Normally configured DNS servers should be equivalent, and hence > switching them for each retry should not come at any cost, hence, > besides the extra log output, do you experience any real issues? Since I restarted the router, there are no longer any such logs except maybe a few per day (less than 4). But when I got those logs spammed to the journal, the real problem was the DNS resolver taking 10s about once per minute to resolve a website address - which really was a pita. But well, what could systemd-resolved have done about it when the real problem was some network equipment? I just wonder why it was less visible when directly using those DNS servers. Since DNS must have been designed with occassional packet loss in mind (because it uses UDP), there must be a way to handle this better. So I read a little bit in https://www.ietf.org/rfc/rfc1035.txt. RFC1035 section 4.2.1 suggests that the retransmission interval for queries should be 2-5 seconds, depending on statistics of previous queries. To me, "retransmissions" means the primary DNS server should not be switched for each query timeout it got (while still allowing to transfer the same request to the next available server). RFC1035 section 7 discusses the suggested implementation of the resolver and covers retransmission and server selection algorithms: It suggests to record average response time for each server it queries to select the ones which respond faster first. Without query history, the selection algorithm should pretend a response time of 5-10 seconds. It also suggests to switch the primary server only when there was some "bizarre" error or a server error reply. However, I don't think it should actually remove them from the list as suggested there since we are a client resolver, not a server resolver which can update its peer lists from neighbor servers. However, we could reset query time statistics to move it to the end of the list, and/or blacklist it for a while. Somewhere else in that document I've read that it is well permitted to interleave multiple parallel requests to multiple DNS servers in the list. So I guess it would be nice and allowed if systemd-resolved used more than only one DNS server at the same time by alternating between them each request - maybe taking the two best according to query time statistics. I also guess that it should maybe use shorter timeouts for queries as you could have more than one DNS server and the initial query time statistic should pretend 5-10 seconds, while the rotation interval suggests 2-5 seconds. I think it would work to have "10 seconds divided by servers count" or 2 seconds, whatever is bigger, as a timeout for query rotation. But a late reply should still be accepted as pointed out in section 7.3, even when the query was already rotated to the next DNS server. Using only a single DNS server can skip all this logic as there's no rotation and would work with timeouts of 10 seconds. That way, systemd-resolved would "learn" to use only the fastest DNS server and when it becomes too slow (which is 5-10 seconds based on the RFC), it would switch to the next server. If parallel requests come in, it would use more DNS servers from the list in parallel, auto-sorted by query reply time. The startup order is the one given by the administrator (or whatever provides the DNS server list). Applied to my UDP packet loss (which seem to be single packet losses as an immediate next request would've got a reply), it would mean that the systemd resolver gives me an address after 2-3 seconds instead of 5 or 10 because I had 4 DNS servers on that link. This is more or less what I've seen previously in my situation when I switched back to direct resolving instead of through systemd-resolved. What do you think? I could think of this being an implementation improvement project in the Github bug tracker. I would be willing to work on such an improvement given that the existing code is not too difficult to understand since I'm not a
Re: [systemd-devel] systemd-resolved continuously switching DNS servers
On Tue, 09.05.17 00:42, Kai Krakow (hurikha...@gmail.com) wrote: > Am Sat, 6 May 2017 14:22:21 +0200 > schrieb Kai Krakow : > > > Am Fri, 5 May 2017 20:18:41 +0200 > > schrieb Lennart Poettering : > > > > > On Fri, 05.05.17 01:01, Kai Krakow (hurikha...@gmail.com) wrote: > > > > > > > Hello! > > > > > > > > Why is systemd-resolved switching DNS servers all day long? This > > > > doesn't seem to be right... > > > > > > If you turn on debug logging, you should see an explanation right > > > before each switch. I figure we should choose the log levels more > > > carefully, so that whenever we switch we also log the reason at the > > > same level... > > > > It looks like this all has to do with timeouts: > > Fixed by restarting the router. The cable modem seems to be buggy with > UDP packets after a lot of uptime: it simply silently drops UDP > packets at regular intervals, WebUI was also very slow, probably a CPU > issue. > > I'll follow up on this with the cable provider. > > When the problem starts to show up, systemd-resolved is affected more > by this than direct resolving. I don't know if there's something that > could be optimized in systemd-resolved to handle such issues better but > I don't consider it a bug in systemd-resolved, it was a local problem. Normally configured DNS servers should be equivalent, and hence switching them for each retry should not come at any cost, hence, besides the extra log output, do you experience any real issues? Lennart -- Lennart Poettering, Red Hat ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd-resolved continuously switching DNS servers
Am Sat, 6 May 2017 14:22:21 +0200 schrieb Kai Krakow : > Am Fri, 5 May 2017 20:18:41 +0200 > schrieb Lennart Poettering : > > > On Fri, 05.05.17 01:01, Kai Krakow (hurikha...@gmail.com) wrote: > > > > > Hello! > > > > > > Why is systemd-resolved switching DNS servers all day long? This > > > doesn't seem to be right... > > > > If you turn on debug logging, you should see an explanation right > > before each switch. I figure we should choose the log levels more > > carefully, so that whenever we switch we also log the reason at the > > same level... > > It looks like this all has to do with timeouts: Fixed by restarting the router. The cable modem seems to be buggy with UDP packets after a lot of uptime: it simply silently drops UDP packets at regular intervals, WebUI was also very slow, probably a CPU issue. I'll follow up on this with the cable provider. When the problem starts to show up, systemd-resolved is affected more by this than direct resolving. I don't know if there's something that could be optimized in systemd-resolved to handle such issues better but I don't consider it a bug in systemd-resolved, it was a local problem. Thanks, Kai > Mai 06 14:17:09 jupiter systemd-resolved[5585]: Cache miss for > ssl.gstatic.com IN Mai 06 14:17:09 jupiter > systemd-resolved[5585]: Transaction 54375 for > scope dns on enp5s0/*. Mai 06 14:17:09 jupiter > > systemd-resolved[5585]: Using feature level UDP for transaction > > 54375. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Using > > DNS server fe80::b248:7aff:fee7:f438%2 for transaction 54375. > Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sending query packet > with id 54375. Mai 06 14:17:09 jupiter systemd-resolved[5585]: > Timeout reached on transaction 33004. Mai 06 14:17:09 jupiter > systemd-resolved[5585]: Retrying transaction 33004. Mai 06 14:17:09 > jupiter systemd-resolved[5585]: Switching to DNS server > 2a02:8109:1ec0:6f5:5667:51ff:feea:385f for interface enp5s0. Mai 06 > 14:17:09 jupiter systemd-resolved[5585]: Cache miss for > ssl.gstatic.com IN A Mai 06 14:17:09 jupiter systemd-resolved[5585]: > Transaction 33004 for scope dns on enp5s0/*. > Mai 06 14:17:09 jupiter systemd-resolved[5585]: Using feature level > UDP for transaction 33004. Mai 06 14:17:09 jupiter > systemd-resolved[5585]: Using DNS server > 2a02:8109:1ec0:6f5:5667:51ff:feea:385f for transaction 33004. Mai 06 > 14:17:09 jupiter systemd-resolved[5585]: Sending query packet with id > 33004. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Processing > incoming packet on transaction 33004. (rcode=SUCCESS) Mai 06 14:17:09 > jupiter systemd-resolved[5585]: Not validating response for 33004, > used server feature level does not support DNSSEC. Mai 06 14:17:09 > jupiter systemd-resolved[5585]: Added positive unauthenticated cache > entry for ssl.gstatic.com IN A 143s on > */INET6/2a02:8109:1ec0:6f5:5667:51ff:feea:385f Mai 06 14:17:09 > jupiter systemd-resolved[5585]: Transaction 33004 for > on scope dns on enp5s0/* now complete with > from network (unsigned). Mai 06 14:17:09 jupiter > systemd-resolved[5585]: Sending response packet with id 42127 on > interface 1/AF_INET. Mai 06 14:17:09 jupiter systemd-resolved[5585]: > Sending response packet with id 22131 on interface 1/AF_INET. Mai 06 > 14:17:09 jupiter systemd-resolved[5585]: Processing incoming packet > on transaction 54375. (rcode=SUCCESS) Mai 06 14:17:09 jupiter > systemd-resolved[5585]: Not validating response for 54375, used > server feature level does not support DNSSEC. Mai 06 14:17:09 jupiter > systemd-resolved[5585]: Added positive unauthenticated cache entry > for ssl.gstatic.com IN 203s on > enp5s0/INET6/fe80::b248:7aff:fee7:f438 Mai 06 14:17:09 jupiter > systemd-resolved[5585]: Transaction 54375 for > on scope dns on enp5s0/* now complete with from > > network (unsigned). Mai 06 14:17:09 jupiter > > systemd-resolved[5585]: Freeing transaction 33004. Mai 06 > > 14:17:09 jupiter systemd-resolved[5585]: Sent message > > type=method_return sender=n/a destination=:1.352 object=n/a > > interface=n/a member=n/a cookie=234 reply_cookie=2 error=n/a > > Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sent message > > type=method_call sender=n/a destination=org.freedesktop.DBus > > object=/org/freedesktop/DBus interface=org.freedesktop.DBus > > member=RemoveMatch cookie=235 reply_cookie=0 erro Mai 06 > > 14:17:09 jupiter systemd-resolved[5585]: Got message > > type=method_return sender=org.freedesktop.DBus > > destination=:1.273 object=n/a interface=n/a member=n/a > > cookie=181 reply_cookie=235 error=n/a Mai 06 14:17:09 jupiter > > systemd-resolved[5585]: Freeing transaction 54375. > > I just don't unterstand why, because all these nameservers work > perfectly well when used directly and not through the stub resolver. -- Regards, Kai Replies to list-only preferred. ___
Re: [systemd-devel] systemd-resolved continuously switching DNS servers
Am Fri, 5 May 2017 20:18:41 +0200 schrieb Lennart Poettering : > On Fri, 05.05.17 01:01, Kai Krakow (hurikha...@gmail.com) wrote: > > > Hello! > > > > Why is systemd-resolved switching DNS servers all day long? This > > doesn't seem to be right... > > If you turn on debug logging, you should see an explanation right > before each switch. I figure we should choose the log levels more > carefully, so that whenever we switch we also log the reason at the > same level... It looks like this all has to do with timeouts: Mai 06 14:17:09 jupiter systemd-resolved[5585]: Cache miss for ssl.gstatic.com IN Mai 06 14:17:09 jupiter systemd-resolved[5585]: Transaction 54375 for scope dns on enp5s0/*. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Using feature level UDP for transaction 54375. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Using DNS server fe80::b248:7aff:fee7:f438%2 for transaction 54375. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sending query packet with id 54375. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Timeout reached on transaction 33004. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Retrying transaction 33004. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Switching to DNS server 2a02:8109:1ec0:6f5:5667:51ff:feea:385f for interface enp5s0. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Cache miss for ssl.gstatic.com IN A Mai 06 14:17:09 jupiter systemd-resolved[5585]: Transaction 33004 for scope dns on enp5s0/*. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Using feature level UDP for transaction 33004. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Using DNS server 2a02:8109:1ec0:6f5:5667:51ff:feea:385f for transaction 33004. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sending query packet with id 33004. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Processing incoming packet on transaction 33004. (rcode=SUCCESS) Mai 06 14:17:09 jupiter systemd-resolved[5585]: Not validating response for 33004, used server feature level does not support DNSSEC. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Added positive unauthenticated cache entry for ssl.gstatic.com IN A 143s on */INET6/2a02:8109:1ec0:6f5:5667:51ff:feea:385f Mai 06 14:17:09 jupiter systemd-resolved[5585]: Transaction 33004 for on scope dns on enp5s0/* now complete with from network (unsigned). Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sending response packet with id 42127 on interface 1/AF_INET. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sending response packet with id 22131 on interface 1/AF_INET. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Processing incoming packet on transaction 54375. (rcode=SUCCESS) Mai 06 14:17:09 jupiter systemd-resolved[5585]: Not validating response for 54375, used server feature level does not support DNSSEC. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Added positive unauthenticated cache entry for ssl.gstatic.com IN 203s on enp5s0/INET6/fe80::b248:7aff:fee7:f438 Mai 06 14:17:09 jupiter systemd-resolved[5585]: Transaction 54375 for on scope dns on enp5s0/* now complete with from network (unsigned). Mai 06 14:17:09 jupiter systemd-resolved[5585]: Freeing transaction 33004. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sent message type=method_return sender=n/a destination=:1.352 object=n/a interface=n/a member=n/a cookie=234 reply_cookie=2 error=n/a Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sent message type=method_call sender=n/a destination=org.freedesktop.DBus object=/org/freedesktop/DBus interface=org.freedesktop.DBus member=RemoveMatch cookie=235 reply_cookie=0 erro Mai 06 14:17:09 jupiter systemd-resolved[5585]: Got message type=method_return sender=org.freedesktop.DBus destination=:1.273 object=n/a interface=n/a member=n/a cookie=181 reply_cookie=235 error=n/a Mai 06 14:17:09 jupiter systemd-resolved[5585]: Freeing transaction 54375. I just don't unterstand why, because all these nameservers work perfectly well when used directly and not through the stub resolver. -- Regards, Kai Replies to list-only preferred. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd-resolved continuously switching DNS servers
On Fri, 05.05.17 01:01, Kai Krakow (hurikha...@gmail.com) wrote: > Hello! > > Why is systemd-resolved switching DNS servers all day long? This > doesn't seem to be right... If you turn on debug logging, you should see an explanation right before each switch. I figure we should choose the log levels more carefully, so that whenever we switch we also log the reason at the same level... Lennart -- Lennart Poettering, Red Hat ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] systemd-resolved continuously switching DNS servers
Hello! Why is systemd-resolved switching DNS servers all day long? This doesn't seem to be right... Mai 05 00:52:46 jupiter systemd-resolved[658]: Switching to DNS server 192.168.4.254 for interface enp5s0. Mai 05 00:52:53 jupiter systemd-resolved[658]: Switching to DNS server fe80::b248:7aff:fee7:f438%2 for interface enp5s0. Mai 05 00:52:53 jupiter systemd-resolved[658]: Switching to DNS server 2a02:8109:1ec0:6f5:5667:51ff:feea:385f for interface enp5s0. Mai 05 00:52:58 jupiter systemd-resolved[658]: Switching to DNS server 192.168.4.254 for interface enp5s0. Mai 05 00:52:59 jupiter systemd-resolved[658]: Switching to DNS server fe80::b248:7aff:fee7:f438%2 for interface enp5s0. Mai 05 00:53:02 jupiter systemd-resolved[658]: Switching to DNS server 2a02:8109:1ec0:6f5:5667:51ff:feea:385f for interface enp5s0. Mai 05 00:53:06 jupiter systemd-resolved[658]: Switching to DNS server 192.168.4.254 for interface enp5s0. Mai 05 00:53:07 jupiter systemd-resolved[658]: Switching to DNS server fe80::b248:7aff:fee7:f438%2 for interface enp5s0. Mai 05 00:53:12 jupiter systemd-resolved[658]: Switching to DNS server 2a02:8109:1ec0:6f5:5667:51ff:feea:385f for interface enp5s0. Mai 05 00:53:33 jupiter systemd-resolved[658]: Switching to DNS server 192.168.4.254 for interface enp5s0. Mai 05 00:53:35 jupiter systemd-resolved[658]: Switching to DNS server fe80::b248:7aff:fee7:f438%2 for interface enp5s0. Mai 05 00:53:40 jupiter systemd-resolved[658]: Switching to DNS server 2a02:8109:1ec0:6f5:5667:51ff:feea:385f for interface enp5s0. Mai 05 00:54:01 jupiter systemd-resolved[658]: Switching to DNS server 192.168.4.254 for interface enp5s0. Mai 05 00:54:02 jupiter systemd-resolved[658]: Switching to DNS server fe80::b248:7aff:fee7:f438%2 for interface enp5s0. Mai 05 00:54:08 jupiter systemd-resolved[658]: Switching to DNS server 2a02:8109:1ec0:6f5:5667:51ff:feea:385f for interface enp5s0. Also, name resolving seems to be very slow. When I directly use these name servers in resolv.conf instead of going through the stub resolver, there are no slow-downs and these logs stop. -- Regards, Kai Replies to list-only preferred. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel