Re: BUG/MINOR: dns: false positive downgrade of accepted_payload_size
On Thu, Feb 22, 2018 at 2:04 AM, Lukas Tribus wrote:
> Hello Baptiste,
>
> On 21 February 2018 at 19:59, Lukas Tribus wrote:
>> Baptiste, I don't think you'd find the symptoms I have in mind
>> acceptable on a load-balancer, so there has to be a misunderstanding
>> here. I would like to do some tests, maybe I can come up with a simple
>> testcase that shows the behavior and then we can review the situation
>> based on that testcase; I will probably need a few days for this
>> though.
>
> So this is what I did: I pulled the current haproxy master (5e64286bab)
> and applied your patch on top of it. I also added "hold obsolete 30s"
> to the configuration in all of those tests.
>
> Two things that I noticed:
> - Google DNS and recent Bind instances (and probably many others) don't
> actually truncate the response; they don't add any A records to the
> response when they set TC - so the TC response is not incomplete but
> actually completely empty (repro: use the testcase vs 8.8.8.8, max
> payload 1280)
> - OpenDNS (208.67.222.222) actually truncates the response (just like
> old Bind instances); however, haproxy is unable to parse that response,
> so a TC response from OpenDNS is always rejected (repro: use the
> testcase vs 208.67.222.222, max payload 1280)
>
> So, surprisingly enough, in both of those two cases the "auto-downgrade"
> does not reduce the number of servers in the backend; instead it kills
> the backend completely. With your patch and with "hold obsolete 30s",
> of course.
>
> What I was actually looking for is a testcase that reduces the number
> of servers in the backend, but I guess that would require a DNS server
> that actually truncates the reply "old-style" and at the same time
> does not cause haproxy to reject the response; I don't know what
> haproxy dislikes about the OpenDNS TC response.
>
> Back to the original testcase though:
> - testcase config attached
> - "100_pointing_to.localhost.ltri.eu" returns 100 A records in the
> localhost range; it requires approx. 1600 bytes of payload size
> - we can trigger the "auto-downgrade" very easily by briefly
> interrupting DNS traffic via an iptables rule (iptables -A INPUT -i
> eth0 -s 8.8.8.8 -j DROP && sleep 10 && iptables -D INPUT -i eth0 -s
> 8.8.8.8 -j DROP)
> - after we trigger the auto-downgrade, haproxy does not recover and
> no backend servers will be alive until we reload
>
> Auto-downgrade behaves exactly as expected from our previous
> conversation. The exact end result depends on the behavior of the DNS
> server, but none of these cases is desirable:
>
> Case 1 (testcase with Google DNS, recent Bind):
> - when the auto-downgrade fires, the response will be TC without any
> records; haproxy will disable all servers and the entire backend will
> be down (fix: restart haproxy)
>
> Case 2 (testcase with OpenDNS):
> - when the auto-downgrade fires, the response will be TC, which haproxy
> is unable to parse; haproxy will disable all servers and the entire
> backend will be down (fix: restart haproxy)
>
> Case 3 (assumption based on what ASA on Discourse reports, likely old
> Bind):
> - when the auto-downgrade fires and the response is TC, TC is ignored,
> which means the reply is considered, downgrading the number of servers
> in the backend to a lower number (whatever fitted in the 1280-byte
> reply), which will most likely overload the remaining backend servers
> (after all, there is probably a reason a certain number of servers is
> in the DNS)
>
> "hold obsolete" can only help if haproxy is able to recover; but the
> auto-downgrade makes sure no future DNS request works as expected, so
> whatever value "hold obsolete" is set to, once "hold obsolete" is over
> the problem will show up.
>
> Let's talk about the likelihood of an admin configuring a payload size
> above 1280: I think it's safe to assume that this is configured based
> on actual needs, so an admin would hit one of the three cases above,
> unless I'm missing something. I completely fail to understand the
> benefit of this feature in haproxy.
>
> So based on these tests and cases, I would ask you again to consider
> removing this altogether.
>
> cheers,
> lukas

Hi Lukas,

(I was off last week with limited mail access)

First, thanks a lot for all your testing! Your use case is right, I
perfectly understand it, and it makes sense to me.

That said, in my use case I was using (and meaning) SRV records, with
consul / kubernetes as backend servers. What I saw is that when the
response is too big, the server will send only the SRV records with the
TC flag and no "ADDITIONAL" section. So in such a case it was still
acceptable for HAProxy.

In your opinion, what would be the best option:
- entirely remove this "feature" and consider that the admins know what
they do (I include the copy/paste admins who simply copy content from
different blog articles until "it works")
- enable this feature for a single "retry" of the same query? (that
would happen after we did a
Re: BUG/MINOR: dns: false positive downgrade of accepted_payload_size
Hello Baptiste,

On 21 February 2018 at 19:59, Lukas Tribus wrote:
> Baptiste, I don't think you'd find the symptoms I have in mind
> acceptable on a load-balancer, so there has to be a misunderstanding
> here. I would like to do some tests, maybe I can come up with a simple
> testcase that shows the behavior and then we can review the situation
> based on that testcase; I will probably need a few days for this
> though.

So this is what I did: I pulled the current haproxy master (5e64286bab)
and applied your patch on top of it. I also added "hold obsolete 30s"
to the configuration in all of those tests.

Two things that I noticed:
- Google DNS and recent Bind instances (and probably many others) don't
actually truncate the response; they don't add any A records to the
response when they set TC - so the TC response is not incomplete but
actually completely empty (repro: use the testcase vs 8.8.8.8, max
payload 1280)
- OpenDNS (208.67.222.222) actually truncates the response (just like
old Bind instances); however, haproxy is unable to parse that response,
so a TC response from OpenDNS is always rejected (repro: use the
testcase vs 208.67.222.222, max payload 1280)

So, surprisingly enough, in both of those two cases the "auto-downgrade"
does not reduce the number of servers in the backend; instead it kills
the backend completely. With your patch and with "hold obsolete 30s",
of course.

What I was actually looking for is a testcase that reduces the number
of servers in the backend, but I guess that would require a DNS server
that actually truncates the reply "old-style" and at the same time
does not cause haproxy to reject the response; I don't know what
haproxy dislikes about the OpenDNS TC response.

Back to the original testcase though:
- testcase config attached
- "100_pointing_to.localhost.ltri.eu" returns 100 A records in the
localhost range; it requires approx. 1600 bytes of payload size
- we can trigger the "auto-downgrade" very easily by briefly
interrupting DNS traffic via an iptables rule (iptables -A INPUT -i
eth0 -s 8.8.8.8 -j DROP && sleep 10 && iptables -D INPUT -i eth0 -s
8.8.8.8 -j DROP)
- after we trigger the auto-downgrade, haproxy does not recover and
no backend servers will be alive until we reload

Auto-downgrade behaves exactly as expected from our previous
conversation. The exact end result depends on the behavior of the DNS
server, but none of these cases is desirable:

Case 1 (testcase with Google DNS, recent Bind):
- when the auto-downgrade fires, the response will be TC without any
records; haproxy will disable all servers and the entire backend will
be down (fix: restart haproxy)

Case 2 (testcase with OpenDNS):
- when the auto-downgrade fires, the response will be TC, which haproxy
is unable to parse; haproxy will disable all servers and the entire
backend will be down (fix: restart haproxy)

Case 3 (assumption based on what ASA on Discourse reports, likely old
Bind):
- when the auto-downgrade fires and the response is TC, TC is ignored,
which means the reply is considered, downgrading the number of servers
in the backend to a lower number (whatever fitted in the 1280-byte
reply), which will most likely overload the remaining backend servers
(after all, there is probably a reason a certain number of servers is
in the DNS)

"hold obsolete" can only help if haproxy is able to recover; but the
auto-downgrade makes sure no future DNS request works as expected, so
whatever value "hold obsolete" is set to, once "hold obsolete" is over
the problem will show up.

Let's talk about the likelihood of an admin configuring a payload size
above 1280: I think it's safe to assume that this is configured based
on actual needs, so an admin would hit one of the three cases above,
unless I'm missing something. I completely fail to understand the
benefit of this feature in haproxy.

So based on these tests and cases, I would ask you again to consider
removing this altogether.

cheers,
lukas

Attachment: auto-payload-size-downgrade-testcase.cfg
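[The attachment survives only as "Binary data" in the archive. A minimal resolvers setup matching the description above would look roughly like this; the hostname, the need for a >1600-byte payload, and "hold obsolete 30s" come from the message, while the section names, server name, port, and backend layout are guesses, not the attached file:]

```
resolvers mydns
    nameserver google 8.8.8.8:53
    accepted_payload_size 8192   # must exceed the ~1600 bytes the reply needs; exact value is a guess
    hold obsolete 30s            # keep records missing from a response for 30s

backend test
    # one server slot per potential A record returned by the test zone
    server-template srv 100 100_pointing_to.localhost.ltri.eu:80 resolvers mydns check
```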
Re: BUG/MINOR: dns: false positive downgrade of accepted_payload_size
Hello Baptiste,

I'm sorry if my comments are blunt, but I think this discussion is
important and I do not want my messages to be ambiguous. I do appreciate
all the work you are doing in the DNS subsystem.

On 21 February 2018 at 18:05, Baptiste wrote:
>> However, in Haproxy the administrator *explicitly* configures a higher
>> payload size, because this higher payload size is probably actually
>> *required* to get all backend servers. Silently downgrading the
>> payload size is harmful in my opinion here.
>
> Maybe there is a misunderstanding here.
> HAProxy already downgrades the accepted payload size "silently". I
> mean, this "feature" already exists.
> This patch simply ensures the downgrade happens in timeout cases only.

I am aware that this patch merely addresses a corner case of this
already existing feature; I'm not saying the patch doesn't fix that
corner case. I am saying that I have a strong opinion about this feature
in the first place, and I responded to this thread merely because I was
not previously aware of this feature.

>> Ok, but you can see how ignoring the TC flag, combined with the
>> automatic payload size downgrade, will make the number of backend
>> servers fluctuate with a little bit of packet loss? So with 40 servers
>> in DNS and the loss of a single packet we will downgrade the entire
>> backend to whatever fitted in the 1280-byte downgraded response.
>>
>> I would call this behavior highly undesirable.
>
> You can play with "hold obsolete " to ensure that unseen records are
> kept for some time.

I don't see how this is supposed to address the problem. The payload
size is downgraded permanently (as I've found out below), not only for
the retry request, so we will forever use 1280, which will not contain
the complete server set (after all, that's why the admin raised the
payload size in the first place). So we will be missing - possibly a lot
of - backend servers until we reload haproxy, which will work only until
the first DNS packet is lost, at which point haproxy will degrade again.

>> Failing over to TCP on TC responses is the proper solution to all of
>> this, but in the meantime I would argue that we should make the
>> behavior as predictable and stable as we can.
>>
>> Let me know what you think,
>
> I think that until we can do DNS over TCP, it is safer to downgrade the
> announced accepted payload size in case of a timeout and ensure we'll
> still receive answers than to never ever downgrade and stop receiving
> answers.

I disagree (strongly). We can't do anything about TC and DNS over TCP in
haproxy 1.8. But it is my opinion that auto-downgrading the accepted
payload size, while ignoring TC, is problematic in *a lot* of
situations, with a very questionable benefit.

When is the auto-downgrade supposed to help? When we have fluctuating
PathMTU issues in a datacenter, I think we should fail hard and fast,
not hide the problem. Other than that (hiding PathMTU problems), it will
kneecap the load balancer when a single IP fragment of a DNS response is
lost, by reducing the number of available servers.

Sure, if we were writing lookup code for a recursive DNS server we
should implement this fallback in every case. But we would also have
proper TC handling in that case, so we would not work with truncated
responses and we would certainly not permanently downgrade our ability
to get large responses.

> As I explained above, this "feature" (the downgrade) already exists in
> HAProxy, but is applied in too many cases.
> This patch simply ensures that it is applied to timeouts only.
> So I don't see what's wrong with this patch.

I don't disagree with the patch, I disagree with the feature; I am fully
aware that it already exists, I was just not aware of it before you sent
the patch.

You mentioned an RFC suggestion, and I believe RFC 6891 #6.2.5 may be
what you are talking about:
https://tools.ietf.org/html/rfc6891#section-6.2.5

It indeed suggests falling back to a lower payload size, however:
- it is a MAY: "A requestor MAY choose to implement a fallback"
- it is kind of implied (when reading the entire paragraph) that this is
only relevant when the *default* payload size is above 1280 (so not
relevant to haproxy: we don't default above 1280, we don't even enable
EDNS0 by default); it's imo irrelevant when the admin is actually
configuring the payload size
- it's obvious that the TCP fallback on TC responses has to work
- it imo also implies a fallback for the next retry request, not a
fallback for the entire runtime of the application

> - create a new command on the CLI to set the accepted payload to a
> value decided by the admin (so he can perform an upgrade in case a
> downgrade happened)

I don't understand; are you saying the feature currently downgrades the
payload size for the resolver *permanently*, not just for the next
retry? Therefore, when haproxy downgrades *currently*, we have to
actually reload haproxy to restore the accepted payload
Re: BUG/MINOR: dns: false positive downgrade of accepted_payload_size
On Wed, Feb 21, 2018 at 11:07 AM, Lukas Tribus wrote:
> Hello Baptiste,
>
> On 21 February 2018 at 08:45, Baptiste wrote:
>>> Is this downgrade a good thing in the first place? Doesn't it hide
>>> configuration and network issues, make troubleshooting more complex
>>> and the haproxy behavior less predictable?
>>
>> It is an RFC recommendation (the RFC number is commented somewhere in
>> the source code, but I am on a mobile and can't access it).
>> Its purpose is to hide networking issues when responses have to cross
>> the internet and behavior is not predictable.
>
> And I can see how this would be useful in end-user situations:
> browsers, smartphone apps, etc.
>
> However, in Haproxy the administrator *explicitly* configures a higher
> payload size, because this higher payload size is probably actually
> *required* to get all backend servers. Silently downgrading the
> payload size is harmful in my opinion here.

Maybe there is a misunderstanding here.
HAProxy already downgrades the accepted payload size "silently". I mean,
this "feature" already exists.
This patch simply ensures the downgrade happens in timeout cases only.

>>> When we see a response with the TC flag set, do we drop it or do we
>>> still consider the DNS response?
>>
>> Haproxy ignores the TC flag for now.
>
> Ok, but you can see how ignoring the TC flag, combined with the
> automatic payload size downgrade, will make the number of backend
> servers fluctuate with a little bit of packet loss? So with 40 servers
> in DNS and the loss of a single packet we will downgrade the entire
> backend to whatever fitted in the 1280-byte downgraded response.
>
> I would call this behavior highly undesirable.

You can play with "hold obsolete " to ensure that unseen records are
kept for some time.

> Note: networks/firewalls/routers may be more likely to drop fragmented
> IP packets than "normal" IP packets (as they may try to reassemble them
> to actually apply layer 4+ ACLs, which requires buffers, which can
> overflow), and this makes it even worse.
>
>> Later, it will have to trigger a failover to TCP.
>
> Failing over to TCP on TC responses is the proper solution to all of
> this, but in the meantime I would argue that we should make the
> behavior as predictable and stable as we can.
>
> Let me know what you think,

I think that until we can do DNS over TCP, it is safer to downgrade the
announced accepted payload size in case of a timeout and ensure we'll
still receive answers than to never ever downgrade and stop receiving
answers.

As I explained above, this "feature" (the downgrade) already exists in
HAProxy, but is applied in too many cases. This patch simply ensures
that it is applied to timeouts only. So I don't see what's wrong with
this patch.

Now, I agree the "silent" application of the downgrade could be
problematic, and we could imagine some solutions to fix this:
- emit a WARNING message / log when it happens
- create a new command on the CLI to display the current accepted
payload
- create a new command on the CLI to set the accepted payload to a value
decided by the admin (so he can perform an upgrade in case a downgrade
happened)

Any other suggestion is welcome.

Baptiste
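[The policy the patch argues for can be summarized as: downgrade only when a query times out, never on a response that was actually received (valid, NXDOMAIN, or even unparsable). A sketch of that decision logic, with invented names; this is not haproxy's actual code:]

```python
DNS_FALLBACK_PAYLOAD = 1280  # size we fall back to, per the discussion


def next_payload_size(current: int, outcome: str) -> int:
    """Pick the announced payload size for the next query.

    Invented helper illustrating the patch's intent: only a timeout is
    evidence that large responses may not be getting through, so only a
    timeout triggers the downgrade. Any received response (valid,
    "nxdomain", even "parse_error") keeps the current size.
    """
    if outcome == "timeout" and current > DNS_FALLBACK_PAYLOAD:
        return DNS_FALLBACK_PAYLOAD
    return current


size = next_payload_size(1600, "nxdomain")
print(size)  # → 1600 (no downgrade: we did get a response)
size = next_payload_size(size, "timeout")
print(size)  # → 1280 (downgrade after a timeout)
```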
Re: BUG/MINOR: dns: false positive downgrade of accepted_payload_size
Hello Baptiste,

On 21 February 2018 at 08:45, Baptiste wrote:
>> Is this downgrade a good thing in the first place? Doesn't it hide
>> configuration and network issues, make troubleshooting more complex
>> and the haproxy behavior less predictable?
>
> It is an RFC recommendation (the RFC number is commented somewhere in
> the source code, but I am on a mobile and can't access it).
> Its purpose is to hide networking issues when responses have to cross
> the internet and behavior is not predictable.

And I can see how this would be useful in end-user situations:
browsers, smartphone apps, etc.

However, in Haproxy the administrator *explicitly* configures a higher
payload size, because this higher payload size is probably actually
*required* to get all backend servers. Silently downgrading the payload
size is harmful in my opinion here.

>> When we see a response with the TC flag set, do we drop it or do we
>> still consider the DNS response?
>
> Haproxy ignores the TC flag for now.

Ok, but you can see how ignoring the TC flag, combined with the
automatic payload size downgrade, will make the number of backend
servers fluctuate with a little bit of packet loss? So with 40 servers
in DNS and the loss of a single packet we will downgrade the entire
backend to whatever fitted in the 1280-byte downgraded response.

I would call this behavior highly undesirable.

Note: networks/firewalls/routers may be more likely to drop fragmented
IP packets than "normal" IP packets (as they may try to reassemble them
to actually apply layer 4+ ACLs, which requires buffers, which can
overflow), and this makes it even worse.

> Later, it will have to trigger a failover to TCP.

Failing over to TCP on TC responses is the proper solution to all of
this, but in the meantime I would argue that we should make the behavior
as predictable and stable as we can.

Let me know what you think,
Lukas
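[For reference, the TC bit discussed above lives in the flags word of the fixed 12-byte DNS header (RFC 1035), so checking it takes only a couple of lines; a generic stdlib sketch, not haproxy's parser:]

```python
import struct


def is_truncated(response: bytes) -> bool:
    """Return True if a DNS response has the TC (truncation) bit set.

    TC is bit 0x0200 of the 16-bit flags field that follows the 16-bit
    query ID in the fixed 12-byte DNS header (RFC 1035, section 4.1.1).
    """
    if len(response) < 12:
        raise ValueError("not a full DNS header")
    (flags,) = struct.unpack_from("!H", response, 2)
    return bool(flags & 0x0200)


# A minimal fabricated header: ID=0x1234, QR=1 and TC=1, ANCOUNT=0 -
# the "truncated but completely empty" shape described above.
hdr = struct.pack("!HHHHHH", 0x1234, 0x8200, 1, 0, 0, 0)
print(is_truncated(hdr))  # → True
```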
Re: BUG/MINOR: dns: false positive downgrade of accepted_payload_size
Hi Lukas,

On 19 Feb 2018 at 23:37, Lukas Tribus wrote:
> Hello Baptiste,
>
> On 19 February 2018 at 18:59, Baptiste wrote:
>> Hi guys,
>>
>> While working with consul, I discovered a "false positive" corner case
>> which triggers a downgrade of the accepted_payload_size.
>
> Is this downgrade a good thing in the first place? Doesn't it hide
> configuration and network issues, make troubleshooting more complex
> and the haproxy behavior less predictable?

It is an RFC recommendation (the RFC number is commented somewhere in
the source code, but I am on a mobile and can't access it). Its purpose
is to hide networking issues when responses have to cross the internet
and behavior is not predictable.

By default, haproxy announces 512 bytes of accepted payload, which can
be too low for some use cases. Hence we can also announce acceptance of
big payloads, and we must provide a failover to a safe value if we never
ever receive any response. In my case, I received many responses for
multiple backends and was receiving NX domains, and unfortunately
triggered the downgrade to lower accepted buffers.

> When we see a response with the TC flag set, do we drop it or do we
> still consider the DNS response?

Haproxy ignores the TC flag for now. Later, it will have to trigger a
failover to TCP.

> Also, do any of those code paths set srv_admin_state to 0x60 (a
> combination of the SRV_ADMF_RMAINT and SRV_ADMF_HMAINT states)? I'm
> thinking about this thread:
> https://discourse.haproxy.org/t/haproxy-1-8-2-1-8-3-dns-auto-discover-stop-working/2014/12

I don't think so. It's purely the DNS layer. I will confirm once I have
access to a computer.

Baptiste
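[Mechanically, announcing a larger accepted payload just means appending an EDNS0 OPT pseudo-record to the query's additional section, with the advertised UDP payload size carried in the record's CLASS field (RFC 6891). A generic sketch, not haproxy's encoder:]

```python
import struct


def edns0_opt_record(payload_size: int) -> bytes:
    """Build an EDNS0 OPT pseudo-record (RFC 6891, section 6.1.2):
    root name (0x00), TYPE=41 (OPT), CLASS=advertised UDP payload size,
    TTL=0 (no extended rcode/flags), empty RDATA."""
    return b"\x00" + struct.pack("!HHIH", 41, payload_size, 0, 0)


opt = edns0_opt_record(1280)
# The advertised size sits in the CLASS field, bytes 3-4 of the record.
(advertised,) = struct.unpack_from("!H", opt, 3)
print(advertised)  # → 1280
```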
Re: BUG/MINOR: dns: false positive downgrade of accepted_payload_size
Hello Baptiste,

On 19 February 2018 at 18:59, Baptiste wrote:
> Hi guys,
>
> While working with consul, I discovered a "false positive" corner case
> which triggers a downgrade of the accepted_payload_size.

Is this downgrade a good thing in the first place? Doesn't it hide
configuration and network issues, make troubleshooting more complex and
the haproxy behavior less predictable?

When we see a response with the TC flag set, do we drop it or do we
still consider the DNS response?

Also, do any of those code paths set srv_admin_state to 0x60 (a
combination of the SRV_ADMF_RMAINT and SRV_ADMF_HMAINT states)? I'm
thinking about this thread:
https://discourse.haproxy.org/t/haproxy-1-8-2-1-8-3-dns-auto-discover-stop-working/2014/12

Thanks,
Lukas
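[For context on the 0x60 value above: it is simply the bitwise OR of the two admin flags named in the question. The numeric values below are assumptions chosen to match that combination, not taken from haproxy's headers:]

```python
# Assumed flag values consistent with the 0x60 combination above; the
# meanings in the comments are a reading of the names, not verified.
SRV_ADMF_RMAINT = 0x20  # server put in maintenance by DNS resolution
SRV_ADMF_HMAINT = 0x40  # maintenance state kept across reloads

srv_admin_state = 0x60
print(srv_admin_state == SRV_ADMF_RMAINT | SRV_ADMF_HMAINT)  # → True
```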