Re: BUG/MINOR: dns: false positive downgrade of accepted_payload_size

2018-02-26 Thread Baptiste
On Thu, Feb 22, 2018 at 2:04 AM, Lukas Tribus  wrote:

> Hello Baptiste,
>
>
>
> On 21 February 2018 at 19:59, Lukas Tribus  wrote:
> > Baptiste, I don't think you'd find the symptoms I have in mind
> > acceptable on a load-balancer, so there has to be a misunderstanding
> > here. I would like to do some tests, maybe I can come up with a simple
> > testcase that shows the behavior and then we can review the situation
> > based on that testcase; I will probably need a few days for this
> > though.
>
> So this is what I did: I pulled current haproxy master (5e64286bab)
> and applied your patch on top of it. I also added "hold obsolete 30s"
> to the configuration in all those tests.
>
>
> Two things that I noticed:
> - GoogleDNS and recent Bind instances (and probably many others) don't
> actually truncate the response; they don't add any A records to the
> response when they set TC - so the TC response is not incomplete but
> actually completely empty (repro: use testcase vs 8.8.8.8 max payload
> 1280)
> - OpenDNS (208.67.222.222) actually truncates the response (just like
> old bind instances), however haproxy is unable to parse that response,
> so a TC response from OpenDNS is always rejected (repro: use testcase
> vs 208.67.222.222 max payload 1280)
>
> So, surprisingly enough, in both of those cases the "auto-downgrade"
> does not reduce the number of servers in the backend; instead it kills
> the backend completely. With your patch and with "hold obsolete 30s",
> of course.
>
> What I was actually looking for is a testcase that reduces the number
> of servers in the backend, but I guess that would require a DNS server
> that actually truncates the reply "old-style" and at the same time
> does not cause haproxy to reject the response; but I don't know what
> haproxy doesn't like about the OpenDNS TC response.
>
>
> Back to the original testcase though:
> - testcase config attached
> - "100_pointing_to.localhost.ltri.eu" returns 100 A records in the
> localhost range, it requires aprox. 1600 byte in payload size
> - we can trigger the "auto-downgrade" very easily by shortly
> interrupting DNS traffic via a iptables rule (iptables -A INPUT -i
> eth0 -s 8.8.8.8 -j DROP && sleep 10 && iptables -D INPUT -i eth0 -s
> 8.8.8.8 -j DROP)
> - after we triggered the auto-downgrade, haproxy does not recover and
> no backend servers will be alive, until we reload
>
>
> Auto-downgrade behaves exactly as discussed in our previous
> conversation. The exact end result depends on the behavior of the DNS
> server, but none of these outcomes is desirable:
>
> Case 1 (Testcase Google DNS, recent Bind):
> - when auto-downgrade fires the response will be TC without any
> records; Haproxy will disable all servers and the entire backend will
> be down (fix: restart haproxy)
>
> Case 2 (Testcase OpenDNS):
> - when auto-downgrade fires the response will be TC, which haproxy is
> unable to parse;  Haproxy will disable all servers and the entire
> backend will be down (fix: restart haproxy)
>
> Case 3 (assumption based on what ASA on discourse reports, likely old
> Bind):
> - when auto-downgrade fires and the response is TC, TC is ignored,
> which means the reply is still processed, downgrading the number of
> servers in the backend to a lower number (whatever fitted in the
> 1280-byte reply), which will most likely overload the remaining
> backend servers (after all, there is probably a reason a certain
> number of servers is in the DNS)
>
>
> "hold obsolete" can only help if haproxy is able to recover; but the
> auto-downgrade makes sure no future DNS requests works as expected so
> whatever value "hold obsolete" is set to, once "hold obsolete" is
> over, the problem will show up.
>
>
> Let's talk about the likelihood of an admin configuring a payload size
> above 1280: I think it's safe to assume that this is configured based
> on actual needs, so an admin would hit one of the three cases above,
> unless I'm missing something. I completely fail to understand the
> benefit of this feature in haproxy.
>
>
> So based on these tests and cases, I would ask you again to consider
> removing this altogether.
>
>
>
> cheers,
> lukas
>


Hi Lukas,

(I was off last week with limited mail access)

First, thanks a lot for all your testing!

Your use case is valid, I understand it perfectly, and it makes sense to
me.
That said, in my use case I was using (and meaning) SRV records, with
consul / kubernetes providing the backend servers.
What I saw is that when the response is too big, the server sends only
the SRV records with the TC flag set and no "ADDITIONAL" section. So in
such a case, the response was still acceptable for HAProxy.
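
This is easy to observe against a local Consul agent (a sketch; the
service name is illustrative and 8600 is Consul's default DNS port):

    dig +bufsize=1280 -p 8600 @127.0.0.1 _myservice._tcp.service.consul SRV

With a service large enough to overflow the advertised buffer, the answer
carries the SRV records plus the TC flag, while the ADDITIONAL section
(which would normally hold the A records of the targets) comes back empty.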

According to you, what would be the best option:
- entirely remove this "feature" and assume the admins know what they are
doing (I include the copy/paste admins who simply reuse content from
different blog articles until "it works")
- enable this feature for a single "retry" of the same query? (that would
happen after we did a 

Re: BUG/MINOR: dns: false positive downgrade of accepted_payload_size

2018-02-21 Thread Lukas Tribus
Hello Baptiste,



On 21 February 2018 at 19:59, Lukas Tribus  wrote:
> Baptiste, I don't think you'd find the symptoms I have in mind
> acceptable on a load-balancer, so there has to be a misunderstanding
> here. I would like to do some tests, maybe I can come up with a simple
> testcase that shows the behavior and then we can review the situation
> based on that testcase; I will probably need a few days for this
> though.

So this is what I did: I pulled current haproxy master (5e64286bab)
and applied your patch on top of it. I also added "hold obsolete 30s"
to the configuration in all those tests.
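
For context, the resolvers setup in these tests boils down to something
like the following (a minimal sketch, not the attached testcase; the
section names and the backend are illustrative):

    resolvers mydns
        nameserver google 8.8.8.8:53
        accepted_payload_size 8192
        hold obsolete 30s

    backend be_test
        server-template srv 100 100_pointing_to.localhost.ltri.eu:80 resolvers mydns check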


Two things that I noticed:
- GoogleDNS and recent Bind instances (and probably many others) don't
actually truncate the response; they don't add any A records to the
response when they set TC - so the TC response is not incomplete but
actually completely empty (repro: use testcase vs 8.8.8.8 max payload
1280)
- OpenDNS (208.67.222.222) actually truncates the response (just like
old bind instances), however haproxy is unable to parse that response,
so a TC response from OpenDNS is always rejected (repro: use testcase
vs 208.67.222.222 max payload 1280)
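
Both behaviors can be reproduced directly with dig (a sketch; +bufsize
sets the advertised EDNS0 payload size and +ignore stops dig from
retrying over TCP on truncation):

    dig +bufsize=1280 +ignore @8.8.8.8        100_pointing_to.localhost.ltri.eu A
    dig +bufsize=1280 +ignore @208.67.222.222 100_pointing_to.localhost.ltri.eu A

Against 8.8.8.8 the reply comes back with TC set and an empty answer
section; against OpenDNS it comes back with TC set and as many A records
as fitted into 1280 bytes.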

So, surprisingly enough, in both of those cases the "auto-downgrade"
does not reduce the number of servers in the backend; instead it kills
the backend completely. With your patch and with "hold obsolete 30s",
of course.

What I was actually looking for is a testcase that reduces the number
of servers in the backend, but I guess that would require a DNS server
that actually truncates the reply "old-style" and at the same time
does not cause haproxy to reject the response; but I don't know what
haproxy doesn't like about the OpenDNS TC response.


Back to the original testcase though:
- testcase config attached
- "100_pointing_to.localhost.ltri.eu" returns 100 A records in the
localhost range, it requires aprox. 1600 byte in payload size
- we can trigger the "auto-downgrade" very easily by shortly
interrupting DNS traffic via a iptables rule (iptables -A INPUT -i
eth0 -s 8.8.8.8 -j DROP && sleep 10 && iptables -D INPUT -i eth0 -s
8.8.8.8 -j DROP)
- after we triggered the auto-downgrade, haproxy does not recover and
no backend servers will be alive, until we reload
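
The interruption itself, as a sketch (drop replies from the resolver for
10 seconds, long enough for the resolution timeout to fire, then restore
traffic):

    iptables -A INPUT -i eth0 -s 8.8.8.8 -j DROP
    sleep 10
    iptables -D INPUT -i eth0 -s 8.8.8.8 -j DROP

Watching the subsequent queries with tcpdump (verbose mode prints the
EDNS OPT UDP size) should show the advertised payload size stuck at the
downgraded value even after traffic is restored:

    tcpdump -ni eth0 -vv udp port 53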


Auto-downgrade behaves exactly as discussed in our previous
conversation. The exact end result depends on the behavior of the DNS
server, but none of these outcomes is desirable:

Case 1 (Testcase Google DNS, recent Bind):
- when auto-downgrade fires the response will be TC without any
records; Haproxy will disable all servers and the entire backend will
be down (fix: restart haproxy)

Case 2 (Testcase OpenDNS):
- when auto-downgrade fires the response will be TC, which haproxy is
unable to parse;  Haproxy will disable all servers and the entire
backend will be down (fix: restart haproxy)

Case 3 (assumption based on what ASA on discourse reports, likely old Bind):
- when auto-downgrade fires and the response is TC, TC is ignored,
which means the reply is still processed, downgrading the number of
servers in the backend to a lower number (whatever fitted in the
1280-byte reply), which will most likely overload the remaining
backend servers (after all, there is probably a reason a certain
number of servers is in the DNS)


"hold obsolete" can only help if haproxy is able to recover; but the
auto-downgrade makes sure no future DNS requests works as expected so
whatever value "hold obsolete" is set to, once "hold obsolete" is
over, the problem will show up.


Let's talk about the likelihood of an admin configuring a payload size
above 1280: I think it's safe to assume that this is configured based
on actual needs, so an admin would hit one of the three cases above,
unless I'm missing something. I completely fail to understand the
benefit of this feature in haproxy.


So based on these tests and cases, I would ask you again to consider
removing this altogether.



cheers,
lukas


auto-payload-size-downgrade-testcase.cfg
Description: Binary data


Re: BUG/MINOR: dns: false positive downgrade of accepted_payload_size

2018-02-21 Thread Lukas Tribus
Hello Baptiste,



I'm sorry if my comments are blunt, but I think this discussion is
important and I do not want my messages to be ambiguous. I do
appreciate all the work you are doing in the DNS subsystem.



On 21 February 2018 at 18:05, Baptiste  wrote:
>> However in Haproxy the administrator *explicitly* configures a higher
>> payload size, because this higher payload size is probably actually
>> *required* to get all backend servers. Silently downgrading the
>> payload size is harmful in my opinion here.
>>
>
> Maybe there is a misunderstanding here.
> HAProxy already downgrades "silently" the accepted payload size. I mean, this
> "feature" already exists.
> This patch simply ensures the downgrade happens in timeout cases only.

I am aware that this patch merely addresses a corner case of this
already existing feature, and I'm not saying the patch doesn't fix that
corner case. I am saying that I have a strong opinion about the
feature in the first place, and I responded to this thread merely
because I was not aware of this feature previously.



>> Ok, but you can see how ignoring the TC flag, combined with automatic
>> payload size downgrade, will make the number of backend servers
>> fluctuate with a little bit of packet loss? So with 40 servers in DNS
>> and the loss of a single packet, we will downgrade the entire backend
>> to whatever fitted in the 1280-byte downgraded response.
>>
>> I would call this behavior highly undesirable.
>>
>
> You can play with "hold obsolete <period>" to ensure that unseen records are
> kept for <period>.

I don't see how this is supposed to address the problem. Payload size
is downgraded permanently (as I've found out below), not only for the
next retry request, so we will forever use 1280, which will not contain
the complete server set (after all, that's why the admin raised the
payload size in the first place). So we will be missing - possibly a
lot of - backend servers until we reload haproxy, which will only work
until the next DNS packet is lost, at which point haproxy downgrades
again.



>> Failing over to TCP on TC responses is the proper solution to all of
>> this, but in the meantime I would argue that we should make the
>> behavior as predictable and stable as we can.
>>
>>
>> Let me know what you think,
>>
>
> I think that until we can do DNS over TCP, it is safer to downgrade the
> announced accepted payload size in case of a timeout and ensure we'll
> still receive answers, than to never downgrade and stop receiving
> answers.

I disagree (strongly). We can't do anything about TC and DNS over TCP
in haproxy 1.8. But it is my opinion that auto-downgrading the accepted
payload size, while ignoring TC, is problematic in *a lot* of
situations, with a very questionable benefit.

When is the auto-downgrade supposed to help? When we have fluctuating
PathMTU issues in a datacenter, I think we should fail hard and fast,
not hide the problem. Other than that (hiding PathMTU problems), it
will kneecap the loadbalancer when a single IP fragment of a DNS
response is lost, by reducing the number of available servers.


Sure, if we were writing lookup code for a recursive DNS server, we
should implement this fallback in every case. But we would also have
proper TC handling in that case, so we would not work with truncated
responses and we would certainly not permanently downgrade our ability
to get large responses.



> As I explained above, this "feature" (downgrade) already exists in HAProxy,
> but is applied to too many cases.
> This patch simply ensures that it is applied to timeouts only.
> So I don't see what's wrong with this patch.

I don't disagree with the patch, I disagree with the feature; I am
fully aware that it already exists, I was just not aware of the feature
before you sent the patch.


You mentioned an RFC suggestion, and I believe RFC6891 #6.2.5 may be
what you are talking about:
https://tools.ietf.org/html/rfc6891#section-6.2.5


And it indeed suggests falling back to a lower payload size, however:

- it is a MAY: "A requestor MAY choose to implement a fallback"
- it is kind of implied (when reading the entire paragraph) that this
is only relevant when the *default* payload size is above 1280 (so not
relevant to haproxy: we don't default above 1280, we don't even enable
edns0 by default); it's imo irrelevant when the admin is explicitly
configuring the payload size
- it's obvious the TCP fallback on TC responses has to work
- it imo also implies a fallback for the next retry-request, not a
fallback for the entire runtime of the application



> - create a new command on the CLI to set the accepted payload to a value
> decided by the admin (so he can perform an upgrade in case a downgrade
> happened)

I don't understand; are you saying the feature currently downgrades
the payload size for the resolver *permanently*, not just for the next
retry? Therefore, when haproxy downgrades *currently*, we have to
actually reload haproxy to restore the accept payload 

Re: BUG/MINOR: dns: false positive downgrade of accepted_payload_size

2018-02-21 Thread Baptiste
On Wed, Feb 21, 2018 at 11:07 AM, Lukas Tribus  wrote:

> Hello Baptiste,
>
>
> On 21 February 2018 at 08:45, Baptiste  wrote:
> >> Is this downgrade a good thing in the first place? Doesn't it hide
> >> configuration and network issues, make troubleshooting more complex
> >> and the haproxy behavior less predictable?
> >
> >
> > It is an RFC recommendation (the RFC number is commented somewhere in the
> > source code, but I am on a mobile and can't access it).
> > Its purpose is to hide networking issues when responses have to cross the
> > internet and behavior is not predictable.
>
> And I can see how this would be useful in end-user situations,
> browser, smartphone apps, etc.
>
> However in Haproxy the administrator *explicitly* configures a higher
> payload size, because this higher payload size is probably actually
> *required* to get all backend servers. Silently downgrading the
> payload size is harmful in my opinion here.
>
>
Maybe there is a misunderstanding here.
HAProxy already downgrades "silently" the accepted payload size. I mean,
this "feature" already exists.
This patch simply ensures the downgrade happens in timeout cases only.



> >> When we see a response with the TC flag set, do we drop it or do we
> >> still consider the DNS response?
> >
> > Haproxy ignores the TC flag for now.
>
> Ok, but you can see how ignoring the TC flag, combined with automatic
> payload size downgrade, will make the number of backend servers
> fluctuate with a little bit of packet loss? So with 40 servers in DNS
> and the loss of a single packet, we will downgrade the entire backend
> to whatever fitted in the 1280-byte downgraded response.
>
> I would call this behavior highly undesirable.
>
>
You can play with "hold obsolete <period>" to ensure that unseen records
are kept for <period>.



>
> Note: networks/firewalls/routers are more likely to drop fragmented IP
> packets than "normal" IP packets (as they may try to reassemble them
> to actually apply layer 4+ ACLs, which requires buffers, which can
> overflow), and this makes it even worse.
>
>
>
> > Later, it will have to trigger a failover to TCP.
>
> Failing over to TCP on TC responses is the proper solution to all of
> this, but in the meantime I would argue that we should make the
> behavior as predictable and stable as we can.
>

> Let me know what you think,
>
>
I think that until we can do DNS over TCP, it is safer to downgrade the
announced accepted payload size in case of a timeout and ensure we'll
still receive answers, than to never downgrade and stop receiving
answers.

As I explained above, this "feature" (downgrade) already exists in HAProxy,
but is applied to too many cases.
This patch simply ensures that it is applied to timeouts only.
So I don't see what's wrong with this patch.

Now, I agree the "silent" application of the downgrade could be problematic,
and we could imagine some solutions to fix this:
- emit a WARNING message / log when it happens
- create a new command on the CLI to display the current accepted payload
- create a new command on the CLI to set the accepted payload to a value
decided by the admin (so he can perform an upgrade in case a downgrade
happened)
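
Purely to illustrate the last two points, a hypothetical session on the
stats socket could look like this (none of these commands exist today;
the command names and socket path are invented):

    $ echo "show resolver payload-size mydns" | socat stdio /var/run/haproxy.sock
    accepted_payload_size: 1280 (downgraded from 8192)

    $ echo "set resolver payload-size mydns 8192" | socat stdio /var/run/haproxy.sock
    accepted_payload_size set to 8192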

Any other suggestion is welcome.

Baptiste


Re: BUG/MINOR: dns: false positive downgrade of accepted_payload_size

2018-02-21 Thread Lukas Tribus
Hello Baptiste,


On 21 February 2018 at 08:45, Baptiste  wrote:
>> Is this downgrade a good thing in the first place? Doesn't it hide
>> configuration and network issues, make troubleshooting more complex
>> and the haproxy behavior less predictable?
>
>
> It is an RFC recommendation (the RFC number is commented somewhere in the source
> code, but I am on a mobile and can't access it).
> Its purpose is to hide networking issues when responses have to cross the
> internet and behavior is not predictable.

And I can see how this would be useful in end-user situations,
browser, smartphone apps, etc.

However in Haproxy the administrator *explicitly* configures a higher
payload size, because this higher payload size is probably actually
*required* to get all backend servers. Silently downgrading the
payload size is harmful in my opinion here.



>> When we see a response with the TC flag set, do we drop it or do we
>> still consider the DNS response?
>
> Haproxy ignores the TC flag for now.

Ok, but you can see how ignoring the TC flag, combined with automatic
payload size downgrade, will make the number of backend servers
fluctuate with a little bit of packet loss? So with 40 servers in DNS
and the loss of a single packet, we will downgrade the entire backend
to whatever fitted in the 1280-byte downgraded response.

I would call this behavior highly undesirable.


Note: networks/firewalls/routers are more likely to drop fragmented IP
packets than "normal" IP packets (as they may try to reassemble them
to actually apply layer 4+ ACLs, which requires buffers, which can
overflow), and this makes it even worse.



> Later, it will have to trigger a failover to TCP.

Failing over to TCP on TC responses is the proper solution to all of
this, but in the meantime I would argue that we should make the
behavior as predictable and stable as we can.



Let me know what you think,

Lukas



Re: BUG/MINOR: dns: false positive downgrade of accepted_payload_size

2018-02-20 Thread Baptiste
Hi Lukas,




On 19 Feb 2018 at 23:37, "Lukas Tribus"  wrote:

Hello Baptiste,


On 19 February 2018 at 18:59, Baptiste  wrote:
> Hi guys,
>
> While working with consul, I discovered a "false positive" corner case
which
> triggers a downgrade of the accepted_payload_size.

Is this downgrade a good thing in the first place? Doesn't it hide
configuration and network issues, make troubleshooting more complex
and the haproxy behavior less predictable?


It is an RFC recommendation (the RFC number is commented somewhere in the
source code, but I am on a mobile and can't access it).
Its purpose is to hide networking issues when responses have to cross the
internet and behavior is not predictable.
By default, haproxy announces 512 bytes of accepted payload, which can be
too low for some use cases. Hence we can also announce acceptance of
bigger payloads, but must provide a failover to a safe value if we never
ever receive any response.
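
For illustration, the accepted payload size is what dig calls the EDNS0
buffer size; the effect is easy to see against any record set too large
for the advertised value (the record name below is hypothetical, and
+ignore stops dig from retrying over TCP):

    # Advertise a small buffer: an oversized answer comes back with the TC bit.
    dig +bufsize=512 +ignore @8.8.8.8 big-recordset.example.com A
    # Advertise a larger buffer: the same answer fits over UDP.
    dig +bufsize=4096 @8.8.8.8 big-recordset.example.com A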

In my case, I was receiving many responses for multiple backends, some
of them NXDOMAIN, and these unfortunately triggered the downgrade to
lower accepted buffers.


When we see a response with the TC flag set, do we drop it or do we
still consider the DNS response?


Haproxy ignores the TC flag for now.
Later, it will have to trigger a failover to TCP.


Also, do any of those code paths set srv_admin_state to 0x60
(combination between SRV_ADMF_RMAINT and SRV_ADMF_HMAINT state)? I'm
thinking about this thread:
https://discourse.haproxy.org/t/haproxy-1-8-2-1-8-3-dns-auto-discover-stop-working/2014/12


I don't think so. It's purely DNS layer.
I will confirm that once I have access to a computer.

Baptiste


Re: BUG/MINOR: dns: false positive downgrade of accepted_payload_size

2018-02-19 Thread Lukas Tribus
Hello Baptiste,


On 19 February 2018 at 18:59, Baptiste  wrote:
> Hi guys,
>
> While working with consul, I discovered a "false positive" corner case which
> triggers a downgrade of the accepted_payload_size.

Is this downgrade a good thing in the first place? Doesn't it hide
configuration and network issues, make troubleshooting more complex
and the haproxy behavior less predictable?

When we see a response with the TC flag set, do we drop it or do we
still consider the DNS response?


Also, do any of those code paths set srv_admin_state to 0x60
(combination between SRV_ADMF_RMAINT and SRV_ADMF_HMAINT state)? I'm
thinking about this thread:
https://discourse.haproxy.org/t/haproxy-1-8-2-1-8-3-dns-auto-discover-stop-working/2014/12



Thanks,
Lukas