SRV Records + 2.4.0 Socket Error

2021-05-27 Thread Luke Seelenbinder
Hi List,

I'm upgrading our edge servers from 2.2.x LTS to 2.4.x LTS, and the first 
server I brought up exhibited an odd Socket Error message with SRV records (for 
every server after the first). I've filed a bug on GitHub with the details: 
https://github.com/haproxy/haproxy/issues/1270.

Anyone else encountering this?

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com



Re: [Bug: 2.2.10] Regression in SRV TTL Handling + Unexpected Record Timeouts

2021-03-06 Thread Luke Seelenbinder
Hi Tim,

Ah, good eye! It does appear to be the same, but in my case it's due to short 
TTLs vs large responses.

Thanks. I'll link this over there.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com | (864) 735-8533

> On 6 Mar 2021, at 19:03, Tim Düsterhus  wrote:
> 
> Luke,
> 
> On 06.03.21 at 18:57, Luke Seelenbinder wrote:
>> I just upgraded one of our HAProxy installations to 2.2.10 (on Debian, from 
>> the HAProxy-maintained apt repo). It appears the changes made to how SRV 
>> records are expired are causing issues, at least with short-lived TTLs in 
>> the SRV records.
> 
> I believe this might be a duplicate of this existing issue:
> https://github.com/haproxy/haproxy/issues/1160
> 
> Best regards
> Tim Düsterhus



[Bug: 2.2.10] Regression in SRV TTL Handling + Unexpected Record Timeouts

2021-03-06 Thread Luke Seelenbinder
Hi List,

I just upgraded one of our HAProxy installations to 2.2.10 (on Debian, from the 
HAProxy-maintained apt repo). It appears the changes made to how SRV records 
are expired are causing issues, at least with short-lived TTLs in the SRV 
records.

The issue I'm seeing: the record resolves and the servers stay properly set 
(and serving requests) until the SRV TTL expires (which in our case could be 
any value between 0 and 60 seconds). At that point the servers are set to no 
address, but this happens *before* a new record is fetched to reset the TTLs, 
because this timeout is based on the values defined in the resolvers section. 
I can play with the timeout settings in resolvers to improve the situation, 
but that never completely fixes it, since the TTL on the SRV record can be 
quite low (I'm not fetching directly from our origin NS).

Code snippet below:

---
resolvers default
  nameserver …
  accepted_payload_size 8192
  resolve_retries 4

  hold valid     30s  # Adjusting this really low helps, but adds undue load on
                      # DNS, and records may still be expired by the new "watchdog".
  hold obsolete  60s  # Adjusting this higher means servers stay up longer, but
                      # it still fails to load the new record set in time.

  hold timeout    1s
  timeout resolve 5s
  timeout retry   1s
---

If the TTL returned for the SRV records is less than `hold valid` or `hold 
obsolete`, the servers will lose their address before it is updated.
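As a stopgap, and per the trade-offs noted in the snippet above, re-resolving more aggressively while holding obsolete records longer narrows the failure window. This is only a sketch (the nameserver address and exact values are assumptions) and does not fix the underlying regression:

```
resolvers default
  nameserver ns1 192.0.2.53:53  # placeholder address
  accepted_payload_size 8192
  resolve_retries 4

  hold valid      5s    # low: re-query well before short SRV TTLs lapse,
                        # at the cost of extra DNS load
  hold obsolete 120s    # high: keep last-known servers while a fresh
                        # record set is fetched

  hold timeout    1s
  timeout resolve 5s
  timeout retry   1s
```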

I would consider this a serious regression for short-lived SRV records.

Thanks! Happy to provide more details if this isn't easily reproducible.

Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com



Re: Segfault on 2.2.0 and UUID fetch regression

2020-07-18 Thread Luke Seelenbinder
Good thoughts, Tim.

I've created two GitHub issues:

- https://github.com/haproxy/haproxy/issues/762 (segfault)
- https://github.com/haproxy/haproxy/issues/763 (uuid config issue)

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

> On 17 Jul 2020, at 18:21, Tim Düsterhus  wrote:
> 
> Luke,
> 
> On 17.07.20 at 17:55, Luke Seelenbinder wrote:
>> To follow up on this—is it easier if I create this as a GitHub issue?
>> 
> 
> It will certainly not get lost in the depths of the email archive, and it
> allows easily linking commits to issues and issues to commits just by
> mentioning the number in the message.
> 
> Personally I create issues for everything, even if I fix them myself a
> few minutes later. In case my patch isn't correct at least the issue
> will still remain in the tracker.
> 
> Best regards
> Tim Düsterhus
> 



Re: Segfault on 2.2.0 and UUID fetch regression

2020-07-17 Thread Luke Seelenbinder
To follow up on this—is it easier if I create this as a GitHub issue?

I'm also not sure if it's related to 
https://github.com/haproxy/haproxy/issues/758 or not.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

> On 16 Jul 2020, at 20:49, Luke Seelenbinder wrote:
> 
> Hi List,
> 
> I just installed 2.2.0 (as packaged on haproxy.debian.net), and I'm 
> experiencing almost instant segfaults on all canary machines (seems to be on 
> the first handled request). I captured a core for debug purposes, but had to 
> roll back as these machines serve production traffic. I can send the cores to 
> someone over a more secure channel.
> 
> After installing debug symbols, I got this traceback in gdb:
> 
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p 
> /run/haproxy.pid -x /run/h'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x55b0131bce27 in si_cs_send (cs=cs@entry=0x55b015974530) at 
> include/haproxy/channel.h:128
> 128   include/haproxy/channel.h: No such file or directory.
> (gdb) bt
> #0  0x55b0131bce27 in si_cs_send (cs=cs@entry=0x55b015974530) at 
> include/haproxy/channel.h:128
> #1  0x55b0131be7a9 in si_cs_io_cb (t=, ctx=0x55b015972ef0, 
> state=) at src/stream_interface.c:789
> #2  0x55b01321fd0c in run_tasks_from_lists 
> (budgets=budgets@entry=0x7ffc8770b9cc) at src/task.c:448
> #3  0x55b0132204e8 in process_runnable_tasks () at src/task.c:674
> #4  0x55b0131d9bd7 in run_poll_loop () at src/haproxy.c:2905
> #5  0x55b0131d9f79 in run_thread_poll_loop (data=) at 
> src/haproxy.c:3070
> #6  0x55b0130aa8a4 in main (argc=, argv=) 
> at src/haproxy.c:3772
> 
> I also had to adjust a fetch of `uuid()` to `uuid(4)` for the configuration 
> to validate, even though the docs still state that `uuid()` means `uuid(4)` 
> in 2.2.
> 
> Best,
> Luke
> 
> —
> Luke Seelenbinder
> Stadia Maps | Founder
> stadiamaps.com



Segfault on 2.2.0 and UUID fetch regression

2020-07-16 Thread Luke Seelenbinder
Hi List,

I just installed 2.2.0 (as packaged on haproxy.debian.net), and I'm 
experiencing almost instant segfaults on all canary machines (seems to be on 
the first handled request). I captured a core for debug purposes, but had to 
roll back as these machines serve production traffic. I can send the cores to 
someone over a more secure channel.

After installing debug symbols, I got this traceback in gdb:

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p 
/run/haproxy.pid -x /run/h'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x55b0131bce27 in si_cs_send (cs=cs@entry=0x55b015974530) at 
include/haproxy/channel.h:128
128 include/haproxy/channel.h: No such file or directory.
(gdb) bt
#0  0x55b0131bce27 in si_cs_send (cs=cs@entry=0x55b015974530) at 
include/haproxy/channel.h:128
#1  0x55b0131be7a9 in si_cs_io_cb (t=, ctx=0x55b015972ef0, 
state=) at src/stream_interface.c:789
#2  0x55b01321fd0c in run_tasks_from_lists 
(budgets=budgets@entry=0x7ffc8770b9cc) at src/task.c:448
#3  0x55b0132204e8 in process_runnable_tasks () at src/task.c:674
#4  0x55b0131d9bd7 in run_poll_loop () at src/haproxy.c:2905
#5  0x55b0131d9f79 in run_thread_poll_loop (data=) at 
src/haproxy.c:3070
#6  0x55b0130aa8a4 in main (argc=, argv=) at 
src/haproxy.c:3772

I also had to adjust a fetch of `uuid()` to `uuid(4)` for the configuration to 
validate, even though the docs still state that `uuid()` means `uuid(4)` in 2.2.
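For reference, the workaround looked roughly like the following. This is a hedged sketch: the frontend name, header name, and placement are invented for illustration, not taken from the original configuration.

```
frontend fe_main
    bind :443 ssl crt /etc/haproxy/certs/
    # was: http-request set-header X-Request-ID %[uuid()]
    # The bare form failed config validation on 2.2.0, so the version
    # argument is passed explicitly:
    http-request set-header X-Request-ID %[uuid(4)]
```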

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com


Re: [ANNOUNCE] haproxy-2.2.0

2020-07-08 Thread Luke Seelenbinder
Hi Willy,

Thanks for your tome-length treatment of my ideas! I forgot how much I enjoyed 
reading them. :)

>> To dig up an old discussion--I took a look at better support for SRV records
>> (using the priority field as backup/non-backup, etc.) a few weeks ago, but
>> determined it didn't make sense in our use case. The issue is 0 weighted
>> servers are considerably less useful to us since they aren't ever used, even
>> in the condition where every other server is down.
> 
> I seem to remember a discussion about making this configurable but I
> don't seem to see any commit matching anything like that, so maybe the
> discussion ended up in "change the behavior again the previous one was
> wrong", I don't remember well.

It was quite a long time ago (March), but I didn't have a chance to test 
behavior and look at the code until a few weeks ago.

> With your approach it would be almost identical except that we would
> always have two load-balancing groups, a primary one and a secondary
> one, the first one made only of the active servers and the second one
> made only of the backup servers.

Great! I'm glad it isn't a huge departure from the present code.

> We would then pick from the first
> list and if it's empty, then the next one.

This slightly concerns me. Hopefully I'm just not quite understanding the 
behavior.

Would that imply request A would pick from the primary server group for all 
backend requests (including retries) unless the primary is 100% down / empty? 
An ideal path for us (as odd as it may sound) is to allow request A to go to 
the primary group first, then optionally redispatch to the secondary group. 
This isn't currently possible, and is the source of most of our remaining 5xx 
errors.

> We'd just document that the keyword "backup" means "server of the
> secondary group", and probably figure new actions or decisions to
> force to use one group over the other one.

I think if these actions are capable of changing the group picked by retries, 
that addresses my concerns.

> I'm dumping all that in case it can help you get a better idea of the
> various mid-term possibilities and what the steps could be (and also what
> not to do if we don't want to shoot ourselves in the foot).

That helps my understanding quite a bit, too!

Regarding queues, LB algorithms, and such, this is of lesser concern for us. We 
want to reasonably fairly pick backends, but beyond that, we don't much care 
(perhaps therein lies the rub). I was a bit surprised to read that requests are 
queued for particular servers vs for a particular group at the moment, which 
has some interesting implications for L7 retries based on 5xx errors which in 
turn result in the server being marked down. It could explain why we're seeing 
occasional edge cases of errors that don't make complete sense. (Request D 
comes in, is scheduled for a server, the server goes down along with the rest 
of the group due to Requests A, B, and C failing, Request D then fails by 
default, since the group is empty.)

A first step towards this would be to allow requests to be redispatched to the 
backup group. That would eliminate many of our issues. We're fine with a few 
slower requests if we know they'll likely succeed the second time around 
(because the slow region is not handling both). It'd likely help our 99p and 
999p times a good bit.

I was hoping 0 weighted servers would allow for this, but I was mistaken, since 
0 weighted servers are even less used than backup servers. :-)

I hope this helps clarify our needs.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

> On 8 Jul 2020, at 19:34, Willy Tarreau  wrote:
> 
> Hi Luke!
> 
> On Wed, Jul 08, 2020 at 11:57:15AM +0200, Luke Seelenbinder wrote:
>> I've been following along the torturous road, and I'm happy to see all the
>> issues resolved and the excellent results.
> 
> You can imagine how I am as well :-)
> 
>> Personally, I'm excited about the
>> performance gains. I'll deploy this soon on our network.
> 
> OK!
> 
>> To dig up an old discussion--I took a look at better support for SRV records
>> (using the priority field as backup/non-backup, etc.) a few weeks ago, but
>> determined it didn't make sense in our use case. The issue is 0 weighted
>> servers are considerably less useful to us since they aren't ever used, even
>> in the condition where every other server is down.
> 
> I seem to remember a discussion about making this configurable but I
> don't seem to see any commit matching anything like that, so maybe the
> discussion ended up in "change the behavior again the previous one was
> wrong", I don't remember well.
> 
>> That raises the next question: is the idea of server groups (with the ability
>> for a reques

Re: [ANNOUNCE] haproxy-2.2.0

2020-07-08 Thread Luke Seelenbinder
Congrats on the release, Willy & the rest of the team!

I've been following along the torturous road, and I'm happy to see all the 
issues resolved and the excellent results. Personally, I'm excited about the 
performance gains. I'll deploy this soon on our network.

To dig up an old discussion—I took a look at better support for SRV records 
(using the priority field as backup/non-backup, etc.) a few weeks ago, but 
determined it didn't make sense in our use case. The issue is 0 weighted 
servers are considerably less useful to us since they aren't ever used, even in 
the condition where every other server is down.

That raises the next question: is the idea of server groups (with the ability 
for a request to try group 1, then group 2, etc. on retries) in the development 
plans at some point? Would that be something I could tinker as a longer term 
project?

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

> On 7 Jul 2020, at 19:20, Willy Tarreau  wrote:
> 
> Hi,
> 
> HAProxy 2.2.0 was released on 2020/07/07. It added 24 new commits
> after version 2.2-dev12.
> 
> There were very few last-minute changes since dev12, just as I hoped,
> that's pretty fine.
> 
> We're late by about 1 month compared to the initial planning, which is
> not terrible and should be seen instead as an investment on the debugging
> cycle since almost only bug fixes were merged during that period. In the
> end you get a better version later.
> 
> While I was initially worried that this version didn't seem to contain
> any outstanding changes, looking back in the mirror tells me it's another
> awesome one instead:
> 
>  - dynamic content emission:
> - "http-request return" directive to build dynamic responses ;
> - rewrite of headers (including our own) after the response ;
> - dynamic error files (errorfiles can be used as templates to
>   deliver personalized pages)
> 
>  - further improvements to TLS runtime certificates management:
> - insertion of new certificates
> - split of key and cert
> - manipulation and creation of crt-lists
> - even directories can be handled
> 
>And by the way now TLSv1.2 is set as the default minimum version.
> 
>  - significant reduction of server-side resources by sharing idle
>connection pools between all threads ; till 2.1 if you had 64 threads,
>each of them had its own connections, so the reuse rate was lower, and
>the idle connection count was very high. This is not the case anymore.
> 
>  - health-checks were rewritten to all rely on tcp-check rules behind the
>curtains. This allowed us to get rid of all the dirt we had accumulated over
>18 years and to write extensible checks. New ones are much easier to add.
>In addition we now have http-checks which support header and body
>addition, and which pass through muxes (HTTP/1 and HTTP/2).
> 
>  - ring buffer creation with ability to forward any event to any log server
>including over TCP. This means that it's now possible to log over a TCP
>syslog server, and that adding new protocols should be fairly easy.
> 
>  - further refined and improved debugging (symbols in panic dumps, malloc
>debugging, more activity counters)
> 
>  - the default security was improved. For example fork() is forbidden by
>default, which will block against any potential code execution (and
>will also block external checks by default unless explicitly unblocked).
> 
>  - new performance improvements in the scheduler and I/O layers, reducing
>the cost of I/O processing and overall latency. I've known from private
>discussions that some noticed tremendous gains there.
> 
> I'm pretty sure there are many other things but I don't remember, I'm
> looking at my notes. I'm aware that HaproxyTech will soon post an in-depth
> review on the haproxy.com blog so just have a look there for all the details.
> (edit: it's already there: 
> https://www.haproxy.com/blog/announcing-haproxy-2-2/ ).
> 
> There are three things I noted during the development of this version.
> 
> The first one is that with the myriad of new tools we're using to help
> users and improve our code quality (discourse, travis, cirrus, oss-fuzz,
> mailing-list etc), some people really found their role in the project and
> are becoming more autonomous. This definitely scales much better and helps
> me spend less time on things that are not directly connected to my code
> activities, so thank you very much for this (Lukas, Tim, Ilya, Cyril).
> 
> The second one is that this is the first version that has been tortured
> in production long before the release. And when I'm saying "tortured", I
> really mean it, because several of us were suffering as well. But

Re: SRV Record Priority Values

2020-02-28 Thread Luke Seelenbinder
Hi Baptiste,

> What this means is that backup status would use priority 0 or 1 or some kind 
> of. But we burn the remaining 65534 values from this field.

That's a concern, for sure.

> I also think we wanted to have "server groups" first in HAProxy before using 
> the priority. The idea before server groups is that a bunch of server should 
> be used all together until they fail (or enough have failed), and in such 
> case, we want to fail over to the next group, and so on (unless first group 
> recovers, of course).

This would be amazing for us! We're struggling with occasionally having all 
servers "up" in a pool (but struggling), and requests not getting moved to the 
next (backup) pool when they fail. Having groups we could use to control 
failover more closely would be really nice for us. SRV records, or not. :)

> What we can do for now, is consider "active" a priority 0 and backup, any 
> value greater than 0.

I think that's perfectly acceptable for us. I'm not sure of anyone else on the 
mailing list using SRV records, so I don't know who else we could ask about 
that.

Would I have all I need to begin a patch for this in src/dns.c or will it 
require bringing in more pieces to accomplish the task? If it's going to be 
involved, a few pointers before I dive in would be helpful. My C is rusty 
(using mostly Rust now, anyways ;-) ), and my knowledge of the HAProxy codebase 
is weak right now.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

> On 28 Feb 2020, at 09:44, Baptiste  wrote:
> 
>> I suspect that it's more a property of the resolvers than the servers.
>> I mean, if you know that you're using your DNS servers this way, this
>> should really have the same meaning for all servers. So you shouldn't
>> have a per-server option to adjust this behavior but a per-resolvers
>> section.
> 
> That's even better! And probably more easily implemented. I'll wait for 
> Baptiste's response.
> 
> Hi There,
> 
> When we first designed support for SRV record, we thought about use cases for 
> this "priority" field.
> That said, at that time, the conclusion was some kind of "it is not possible 
> to match a 'backup' state with an integer, or it is a "waste" of information".
> What this means is that backup status would use priority 0 or 1 or some kind 
> of. But we burn the remaining 65534 values from this field.
> I also think we wanted to have "server groups" first in HAProxy before using 
> the priority. The idea before server groups is that a bunch of server should 
> be used all together until they fail (or enough have failed), and in such 
> case, we want to fail over to the next group, and so on (unless first group 
> recovers, of course). Then, priority could be used to set up the groups, 
> cause HAProxy would assign al server with same priority in the same group.
> 
> What we can do for now, is consider "active" a priority 0 and backup, any 
> value greater than 0.
> 
> Baptiste



Re: SRV Record Priority Values

2020-02-27 Thread Luke Seelenbinder
Hi Willy,

> Yes it is! They're typically used to drain old user sessions while
> progressively taking a server off. Some also use them to let an
> overloaded server cool down for a moment with no extra session. This
> is completely unrelated to backup servers in fact, which have their
> own weights and which can even be load balanced when all active servers
> are dead.

This makes sense. I'm glad I know (now) I can use 0 weights to drain servers.

> I suspect that it's more a property of the resolvers than the servers.
> I mean, if you know that you're using your DNS servers this way, this
> should really have the same meaning for all servers. So you shouldn't
> have a per-server option to adjust this behavior but a per-resolvers
> section.

That's even better! And probably more easily implemented. I'll wait for 
Baptiste's response.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

> On 27 Feb 2020, at 16:11, Willy Tarreau  wrote:
> 
> Hi Luke,
> 
> On Thu, Feb 27, 2020 at 03:07:35PM +0100, Luke Seelenbinder wrote:
>> Hello List,
>> 
>> We use SRV records extensively (for internal service discovery, etc.).
>> 
>> When the patch was integrated to support a 0 weighted SRV records, I thought
>> that would simplify our setup, because at the time, I thought 0 weight meant
>> "backup server" without a "backup" flag on the server. Unfortunately for our
>> simplicity, that is not the case. A 0 weight means "will never be used unless
>> explicitly chosen".
>> 
>> That leads me to my questions:
>> 
>> - Is that the intended behaviors of 0 weight servers: to not function as a
>>  backup if all other servers are down?
> 
> Yes it is! They're typically used to drain old user sessions while
> progressively taking a server off. Some also use them to let an
> overloaded server cool down for a moment with no extra session. This
> is completely unrelated to backup servers in fact, which have their
> own weights and which can even be load balanced when all active servers
> are dead.
> 
>> - Would you (Willy?) accept a patch that used the Priority field of SRV
>> records to determine backup/non-backup status? Or perhaps an additional
>> server option to specify 0 weighted SRV records means "backup"?
> 
> I suspect that it's more a property of the resolvers than the servers.
> I mean, if you know that you're using your DNS servers this way, this
> should really have the same meaning for all servers. So you shouldn't
> have a per-server option to adjust this behavior but a per-resolvers
> section. I'm personally not opposed to having more flexibility, and I
> even find that it is a good idea. however I'm really not skilled at all
> in the DNS area and Baptiste is the maintainer so I'm CCing him and
> will let him decide.
> 
> Cheers,
> Willy



SRV Record Priority Values

2020-02-27 Thread Luke Seelenbinder
Hello List,

We use SRV records extensively (for internal service discovery, etc.).

When the patch was integrated to support 0-weighted SRV records, I thought 
that would simplify our setup, because at the time, I thought 0 weight meant 
"backup server" without a "backup" flag on the server. Unfortunately for our 
simplicity, that is not the case: a 0 weight means "will never be used unless 
explicitly chosen".

That leads me to my questions:

- Is that the intended behavior of 0 weight servers: to not function as a 
backup if all other servers are down?
- Would you (Willy?) accept a patch that used the Priority field of SRV records 
to determine backup/non-backup status? Or perhaps an additional server option 
to specify 0 weighted SRV records means "backup"?
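For context, here is a hypothetical SRV record set (names and values invented for illustration) showing the priority and weight fields under discussion. SRV RDATA is ordered priority, weight, port, target:

```
; Priority 0 entries would be "active"; the priority 10 entry would be
; treated as "backup" under the proposal above. Weight (third field)
; still balances traffic within a priority level.
_http._tcp.example.com. 60 IN SRV 0  50 8080 a1.example.com.
_http._tcp.example.com. 60 IN SRV 0  50 8080 a2.example.com.
_http._tcp.example.com. 60 IN SRV 10  0 8080 b1.example.com.
```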

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

Re: [ANNOUNCE] haproxy-2.1.3

2020-02-20 Thread Luke Seelenbinder
Hi Willy,

>  - the H2 mux had an incorrect buffer full detection causing the send
>phase to stop on a fragment boundary then to immediately wake up all
>waiting threads to go on, resulting in an excessive CPU usage in some
>tricky situations. It is possible that those using H2 with many streams
>per connection and moderately large objects, like Luke's maps servers,
>could observe a CPU usage drop (maybe Luke on his map servers).

We just deployed 2.1.3 across our PoP network last night, and I can indeed 
verify we're seeing better CPU usage—anywhere from 40-50% aggregate reduction!

Once we have a few more days of data, I'll send a pretty chart so you can enjoy 
the fruits of your hard work.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

> On 12 Feb 2020, at 17:44, Willy Tarreau  wrote:
> 
> Hi,
> 
> HAProxy 2.1.3 was released on 2020/02/12. It added 86 new commits
> after version 2.1.2.
> 
> It's clear that 2.1 has been one of the calmest releases in a while, to
> the point of making us forget that it still had a few fixes pending that
> would be pleasant to have in a released version! So after accumulating
> fixes for 7 weeks, it's about time to have another one!
> 
> Here are the most relevant fixes:
> 
>  - pools: there is an ABA race condition in pool_flush() (which is called
>when stopping as well as under memory pressure) which can lead to a
>crash. It's been there since 1.9 and is very hard to trigger, but if
>you run with many threads and reload very often you may occasionally
>hit it, seeing a trace of the old process crashing in your system
>logs.
> 
>  - there was a bug in the way our various hashes were calculated, some
>of them were considering the inputs as signed chars instead of
>unsigned ones, so some non-ASCII characters would hash differently
>across different architectures and wouldn't match another component's
>calculation (e.g. a CRC32 inserted in a header would differ when given
>values with the 8th bit set, or applied to the PROXY protocol header).
>The bug has been there since 1.5-dev20 but became visible since it
>affected Postfix's validation of the PROXY protocol's CRC32. It's
>unlikely that anyone will ever witness it if it didn't happen already,
>but I tagged it "major" to make sure it is properly backported to
>distro packages, since not having it on certain nodes may sometimes
>result in hash inconsistencies which can be very hard to diagnose.
> 
>  - the addition of the Early-Data header when using 0rtt could wrongly
>be emitted during SSL handshake as well.
> 
>  - health checks could crash if using handshakes (e.g. SSL) mixed with
>DNS that takes time to retrieve an address, causing an attempt to
>use an incompletely initialized connection.
> 
>  - the peers listening socket was missing from the seamless reload,
>possibly causing some failed bindings when not using reuseport,
>resulting in the new process giving up.
> 
>  - splicing could often end up on a timeout because after the last block
>we did not switch back to HTX to complete the message.
> 
>  - fixed a small race affecting idle connections, allowing one thread to
>pick a connection at the same moment another one would decide to free
>it because there are too many idle.
> 
>  - response redirects were appended to the actual response instead of
>replacing it. This could cause various errors, including data
>corruption on the client if the entire response didn't fit into the
>buffer at once.
> 
>  - when stopping or when releasing a few connections after a listener's
>maxconn was reached, we could distribute some work to inexistent
>threads if the listener had "1/odd" or "1/even" while the process
>had less than 64 threads. An easy workaround for this is to explicitly
>reference the thread numbers instead.
> 
>  - when proxying an HTTP/1 client to an HTTP/2 server, make sure to clean
>up the "TE" header from anything but "trailers", otherwise the server
>may reject a request if it came from a browser placing "gzip" there.
> 
>  - the H2 mux had an incorrect buffer full detection causing the send
>phase to stop on a fragment boundary then to immediately wake up all
>waiting threads to go on, resulting in an excessive CPU usage in some
>tricky situations. It is possible that those using H2 with many streams
>per connection and moderately large objects, like Luke's maps servers,
>could observe a CPU usage drop (maybe Luke on his map servers).
> 
>  - it was possible to lose the master-worker status after a failed reload
>

Documentation clarification: option redispatch

2020-02-19 Thread Luke Seelenbinder
Hello list,

I'm working on improving our error rates (the elusive 0 is rather close), and, 
as a result, working on tightening up our HAProxy configuration. Based on some 
testing I'm doing, I realized there's a bit of a documentation hole around the 
exact behavior of `option redispatch`.

In the part I'm currently debugging, I have two servers: one is the main server 
and one is the backup. Does `option redispatch 1` retry on a backup server if 
the request to the main server fails, or does it redispatch to the same (main) 
backend server? Ideally a redispatch could operate across normal/backup server 
pools, but based on observed behavior, I'm rather convinced it does not. My 
next step is to configure the backup server as a normal server but assign it a 
weight of 0, to make it act as a backup and also allow redispatches.
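The setup in question, sketched as a minimal config. Server names and addresses are placeholders, and the commented-out line reflects the workaround being considered, not confirmed behavior:

```
backend be_tiles
    option redispatch 1   # allow moving a failed request to another server
    retries 3
    server main   10.0.0.10:8080 check
    server backup 10.0.0.11:8080 check backup
    # Alternative under test: declare the backup as a normal server with
    # weight 0, so a redispatch can reach it:
    # server backup 10.0.0.11:8080 check weight 0
```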

Is anyone able to shed some light on the specifics of this behavior?

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com



Re: [PATCH] BUG/MINOR: dns: allow srv record weight set to 0

2019-10-21 Thread Luke Seelenbinder
Thank you for this bug fix…we're more than a little excited!

When I initially found it, I was under the assumption it was on purpose. :-)

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

> On 21 Oct 2019, at 16:35, Christopher Faulet  wrote:
> 
> On 21/10/2019 at 16:20, Baptiste wrote:
>> Thanks to the two people who spotted a bug in my patch (a missing 
>> parenthesis); here is the updated version.
>> On Mon, Oct 21, 2019 at 3:59 PM Baptiste wrote:
>> Hi there,
>> Following up the recent discussion about SRV record weights and server 
>> weights in HAProxy, we spotted a bug in the current code: when the weight 
>> in an SRV record is set to 0, the server weight in HAProxy was 1...
>> Thanks to Willy for proposing the solution applied in that patch.
>> Baptiste
> 
> Baptiste,
> 
> I don't know if the comment is wrong or not. But with your patch, the weight 
> is now between 0 and 256. The function server_parse_weight_change_request() 
> is ok with that. So I can amend your comment if you want. I just want to have 
> a confirmation.
> 
> -- 
> Christopher Faulet
> 



Re: [PATCH] BUG/MEDIUM: dns: Correctly use weight specified in SRV record

2019-10-17 Thread Luke Seelenbinder
Hi Baptiste,

> The only "bug" I can see here now is that a server's weight can never be 0. 
> But nobody reported this as an issue yet.

I can confirm this as a bug; I've just never thought to report it. We've worked 
around it in our own setup by having separate backup and active SRV records. 
Not needing explicit backup records and simply setting weights to 0 would be 
ideal.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

> On 17 Oct 2019, at 07:47, Baptiste  wrote:
> 
> 
> 
> On Thu, Oct 17, 2019 at 5:35 AM Daniel Corbett wrote:
> Hello,
> 
> 
> In #48 it was reported that when using the server-template
> 
> directive combined with an SRV record that HAProxy would
> always set the weight to "1" regardless of what the SRV record
> contains.
> 
> It was found that an attempt to force a minimum value of "1"
> actually ended up forcing "1" in all situations.  This was due to
> an improper equation: ( x / 256 ) + 1
> 
> This patch should be backported to 1.8 and 1.9
> 
> 
> 
> Thanks,
> 
> -- Daniel
> 
> 
> 
> 
> Hi Daniel,
> 
> Thanks for the patch, but I don't think it's accurate.
> What this part of the code aims to do is to "map" a DNS weight into an 
> HAProxy weight.
> There is a ratio of 256 between both: DNS being in range "0-65535" and 
> HAProxy in range "0-255".
> What your code does is ignore any DNS weight above 256 and force 
> them to 1...
> 
> The only "bug" I can see here now is that a server's weight can never be 0. 
> But nobody reported this as an issue yet.
> 
> I'll check what question is asked into #48 and answer it.
> 
> As a conclusion, please don't apply this patch.
> 
> Baptiste
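The mapping Baptiste describes can be sketched in a few lines. This is illustrative only: the function names are mine, and the corrected formula is an assumption chosen to preserve weight 0 and match the ranges described above (the real change lives in HAProxy's C resolver code).

```python
# Sketch of mapping a DNS SRV weight (0-65535) down to an HAProxy
# server weight, with a ratio of 256 between the two ranges.

def buggy_weight(dns_weight: int) -> int:
    """The pre-fix mapping: (x / 256) + 1 with integer division.

    Any SRV weight below 256 collapses to 1, and a weight of 0 can
    never be represented -- the behavior reported in issue #48.
    """
    return (dns_weight // 256) + 1

def fixed_weight(dns_weight: int) -> int:
    """One plausible corrected mapping (ceiling division): it keeps
    0 -> 0 so SRV weight 0 can mark a server as unused, at the cost
    of the output range becoming 0-256 rather than 0-255."""
    return (dns_weight + 255) // 256

print(buggy_weight(0), buggy_weight(128), buggy_weight(65535))  # 1 1 256
print(fixed_weight(0), fixed_weight(128), fixed_weight(65535))  # 0 1 256
```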



Re: Case Sensitive Headers

2019-07-26 Thread Luke Seelenbinder
Hi Christopher,

That's great! Thank you. It looks exactly like what we need.

Best,
Luke

—
Luke Seelenbinder
SermonAudio.com <http://sermonaudio.com/> | Senior Software Engineer






> On Jul 25, 2019, at 09:18, Christopher Faulet  wrote:
> 
> Le 12/07/2019 à 13:26, Luke Seelenbinder a écrit :
>> Hi Christopher,
>> That definitely is ugly—but it works. Thanks! I'll look for improvements in 
>> 2.1.
> 
> Hi Luke,
> 
> FYI, a feature has been added in the 2.1 to change the case of header names. 
> Take a look on the commit 98fbe953:
> 
>  http://git.haproxy.org/?p=haproxy.git;a=commitdiff;h=98fbe953
> 
> Now, you may decide to change the case of specific header names using global 
> directives "h1-case-adjust" or "h1-case-adjust-file". It can be enabled in 
> both directions, client or server, with 2 options, the first for the 
> frontends ("option h1-case-adjust-bogus-client") and the other one for the 
> backends ("option h1-case-adjust-bogus-server").
> 
> Best,
> -- 
> Christopher Faulet
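Based on Christopher's description, the Wowza case would look roughly like the following (a sketch; the frontend name is a placeholder, and adjusting only Content-Length is an assumption about what Wowza needs):

```
global
    # Map the lower-case wire name back to the casing the bogus peer expects
    h1-case-adjust content-length Content-Length

frontend wowza_fe
    # Wowza is the client here, so fix up headers sent towards clients
    option h1-case-adjust-bogus-client
```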



Re: Case Sensitive Headers

2019-07-12 Thread Luke Seelenbinder
Hi Christopher,

That definitely is ugly—but it works. Thanks! I'll look for improvements in 2.1.

Best,
Luke
—
Luke Seelenbinder
SermonAudio.com <http://sermonaudio.com/> | Senior Software Engineer

> On Jul 10, 2019, at 14:53, Christopher Faulet  wrote:
> 
> Le 10/07/2019 à 13:08, Luke Seelenbinder a écrit :
>> Hi Patrick,
>> 
>> That didn't work (in a few different forms I tried)—thanks for the 
>> suggestion 
>> though!
>> 
>> It seems HTX is pretty picky about how those headers get emitted. :)
>> 
>> I'm still looking for a solution to this that doesn't involve disabling HTX.
>> 
> 
> Hi Luke,
> 
> It is pretty ugly but you may hide the full "Content-Length" header in the 
> value of
> another one, for instance:
> 
>http-response set-header x-custom-cl "1\r\nContent-Length: 
> %[res.fhdr(content-length)]" if { res.fhdr(content-length) -m found }
>http-response del-header content-length
> 
> As said, it is ugly. But it does the trick for now. I will probably try to 
> work
> on a solution for the 2.1. Even more so the legacy HTTP will be removed for
> this release.
> 
> -- 
> Christopher Faulet
> 



Re: Case Sensitive Headers

2019-07-10 Thread Luke Seelenbinder
Hi Patrick,

That didn't work (in a few different forms I tried)—thanks for the suggestion 
though!

It seems HTX is pretty picky about how those headers get emitted. :)

I'm still looking for a solution to this that doesn't involve disabling HTX.

Best,
Luke

—
Luke Seelenbinder
SermonAudio.com <http://sermonaudio.com/> | Senior Software Engineer


> On Jun 27, 2019, at 17:11, Patrick Hemmer  wrote:
> 
> 
> 
> From: Luke Seelenbinder [mailto:l...@sermonaudio.com 
> <mailto:l...@sermonaudio.com>]
> Sent: Wednesday, June 26, 2019, 10:07 EDT
> To: HAProxy  <mailto:haproxy@formilux.org>
> Subject: Case Sensitive Headers
> 
>> Hello List,
>> 
>> I have a painful case of noncompliance to report and figure out how to fix.
>> 
>> When HTX is enabled, all headers are returned in lower case (e.g., 
>> content-length, date, etc.). This is obviously fine and within spec. 
>> Unfortunately, I'm using a rather frustrating piece of software (Wowza) that 
>> talks to an haproxy instance and Wowza requires that the content-length 
>> header is always camel case, i.e., Content-Length, otherwise requests fail.
>> 
>> I tried using http-response set-header Content-Length 
>> %[res.hdr(content-length)] if { res.hdr(content-length) -m found } to force 
>> the value to upper case, but that didn't help.
>> 
>> This is very obviously a case of badly behaving software and not a problem 
>> with HAProxy, but I'm wondering if there's any other way to force that 
>> header to Content-Length without turning HTX off.
>> 
>> Thanks for any ideas!
>> 
>> Best,
>> Luke
>> 
>> —
>> Luke Seelenbinder
>> SermonAudio.com <http://sermonaudio.com/> | Senior Software Engineer
>> 
>> 
> This is just a stab in the dark, but try deleting the header, then adding it 
> back. For example
> 
> http-response set-var(res.conlen) res.hdr(content-length)
> http-response del-header content-length
> http-response set-header Content-Length %[var(res.conlen)] if { 
> var(res.conlen) -m found }
> 
> -Patrick



Re: Config Segmentation Fault [2.0.1]

2019-06-28 Thread Luke Seelenbinder
Hi Olivier,

That makes sense. I figured it was one of my various odd settings not being 
tested with the other (this config is rather…complex), and I hoped your eyes 
would be better than mine. Glad they were!

Thanks for getting this fixed up. I'll pull the latest git when I have the 
chance and confirm it fixes it.

Best,
Luke
—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

> On Jun 28, 2019, at 14:14, Olivier Houchard  wrote:
> 
> Hi Luke,
> 
> On Fri, Jun 28, 2019 at 07:05:32AM +0200, Luke Seelenbinder wrote:
>> Hello all,
>> 
>> I've found a segfault in v2.0.1. I believe the issue is a no-ssl directive 
>> on a server line after seeing check ssl on default-server in defaults. 
>> Here are the snips of my config. I haven't been able to create a minimal 
>> config that recreates it, since my config is rather complex.
>> 
>> defaults
>>  log  global
>>  mode http
>>  default-server ca-file ca-certificates.crt resolvers default inter 5s 
>> fastinter 2s downinter 10s init-addr libc,last check ssl check-alpn http/1.1 
>> pool-purge-delay 60s max-reuse 1500 alpn http/1.1
>> […snip…]
>> backend varnish
>>  server varnish_local   unix@/path-to-socket.sock no-check-ssl no-ssl
>> 
>> If I remove no-ssl, it starts up, but the check naturally fails. If I add it 
>> back, I get a segmentation fault. I've tried this with and without unix 
>> sockets to verify it wasn't something related to IP binding.
>> 
>> I'm happy to try alternatives / test things a bit.
>> 
>> Best,
> 
> Indeed, "check-alpn" failed to make sure we were really using a SSL connection
> before attempting to change the ALPN. This should be fixed by commit
> c50eb73b85f80ac1ac6e519fcab2ba6807f5de65, and should be backported to 2.0
> soon.
> 
> Thanks a lot !
> 
> Olivier
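Until a build containing that commit is available, a workaround consistent with the diagnosis is to keep the SSL and check-alpn options off default-server and set them only on the servers that actually use TLS (a sketch; addresses and names are placeholders):

```
defaults
    default-server resolvers default inter 5s fastinter 2s downinter 10s init-addr libc,last

backend varnish
    # Plain-text unix socket: nothing SSL-related inherited from defaults
    server varnish_local unix@/path-to-socket.sock check

backend www
    # TLS and check-alpn only where they are actually wanted
    server www-00 192.0.2.10:30993 check ssl check-alpn http/1.1 alpn http/1.1 ca-file ca-certificates.crt
```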



Config Segmentation Fault [2.0.1]

2019-06-27 Thread Luke Seelenbinder
Hello all,

I've found a segfault in v2.0.1. I believe the issue is a no-ssl directive on a 
server line after seeing check ssl on default-server in defaults. Here are the 
snips of my config. I haven't been able to create a minimal config that 
recreates it, since my config is rather complex.

defaults
  log  global
  mode http
  default-server ca-file ca-certificates.crt resolvers default inter 5s 
fastinter 2s downinter 10s init-addr libc,last check ssl check-alpn http/1.1 
pool-purge-delay 60s max-reuse 1500 alpn http/1.1
[…snip…]
backend varnish
  server varnish_local   unix@/path-to-socket.sock no-check-ssl no-ssl

If I remove no-ssl, it starts up, but the check naturally fails. If I add it 
back, I get a segmentation fault. I've tried this with and without unix sockets 
to verify it wasn't something related to IP binding.

I'm happy to try alternatives / test things a bit.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com



Re: Case Sensitive Headers

2019-06-27 Thread Luke Seelenbinder
Hi Lukas,

Wowza actually talks to a service behind HAProxy, so it should be 
http-response, because Wowza expects to see "Content-Length" vs 
"content-length" header from the response served through HAProxy.

Best,
Luke

—
Luke Seelenbinder
SermonAudio.com <http://sermonaudio.com/> | Senior Software Engineer

> On Jun 27, 2019, at 15:41, Lukas Tribus  wrote:
> 
> Hello Luke
> 
> On Wed, 26 Jun 2019 at 16:08, Luke Seelenbinder  wrote:
>> I tried using http-response set-header Content-Length 
>> %[res.hdr(content-length)] if { res.hdr(content-length) -m found } to force 
>> the value to upper case, but that didn't help.
> 
> This should be http-request though, not http-response, as only the
> request headers are impacted, right?
> 
> Can you confirm this doesn't work with http-request?
> 
> 
> Thanks,
> Lukas
> 



Re: Case Sensitive Headers

2019-06-26 Thread Luke Seelenbinder
Hello Lukas,

Thanks for getting back to me.

I'm game to try anything if you all have ideas with Lua or otherwise. We're 
looking at a server that handles less than 1% of our requests, but we need it 
for legacy device support. :(

Best,
Luke

—
Luke Seelenbinder
SermonAudio.com <http://sermonaudio.com/> | Senior Software Engineer


> On Jun 26, 2019, at 16:17, Lukas Tribus  wrote:
> 
> Hello,
> 
> 
> On Wed, 26 Jun 2019 at 16:08, Luke Seelenbinder  wrote:
>> 
>> Hello List,
>> 
>> I have a painful case of noncompliance to report and figure out how to fix.
>> 
>> When HTX is enabled, all headers are returned in lower case (e.g., 
>> content-length, date, etc.). This is obviously fine and within spec. 
>> Unfortunately, I'm using a rather frustrating piece of software (Wowza) that 
>> talks to an haproxy instance and Wowza requires that the content-length 
>> header is always camel case, i.e., Content-Length, otherwise requests fail.
>> 
>> I tried using http-response set-header Content-Length 
>> %[res.hdr(content-length)] if { res.hdr(content-length) -m found } to force 
>> the value to upper case, but that didn't help.
>> 
>> This is very obviously a case of badly behaving software and not a problem 
>> with HAProxy, but I'm wondering if there's any other way to force that 
>> header to Content-Length without turning HTX off.
>> 
>> Thanks for any ideas!
> 
> Yeah I think we are gonna see more of such interoperability issues as
> people migrate to 2.0 with htx enabled:
> 
> https://discourse.haproxy.org/t/haproxy-2-0-0-header/3930/4
> 
> This is only the second time this has come up though.
> 
> 
> Maybe we can workaround this with some LUA magic?
> 
> 
> cheers,
> lukas
> 



Case Sensitive Headers

2019-06-26 Thread Luke Seelenbinder
Hello List,

I have a painful case of noncompliance to report, and I need to figure out how to fix it.

When HTX is enabled, all headers are returned in lower case (e.g., 
content-length, date, etc.). This is obviously fine and within spec. 
Unfortunately, I'm using a rather frustrating piece of software (Wowza) that 
talks to an haproxy instance and Wowza requires that the content-length header 
is always camel case, i.e., Content-Length, otherwise requests fail.

I tried using http-response set-header Content-Length 
%[res.hdr(content-length)] if { res.hdr(content-length) -m found } to force the 
value to upper case, but that didn't help.

This is very obviously a case of badly behaving software and not a problem with 
HAProxy, but I'm wondering if there's any other way to force that header to 
Content-Length without turning HTX off.

Thanks for any ideas!

Best,
Luke

—
Luke Seelenbinder
SermonAudio.com <http://sermonaudio.com/> | Senior Software Engineer








Re: http-request do-resolve for rDNS queries

2019-06-25 Thread Luke Seelenbinder
Hi Baptiste,

Thanks for the information.

Would you mind pointing me towards the documentation for a slow agent? I'm having 
difficulty finding what you're referring to in the docs. Thanks!

Best,
Luke

—
Luke Seelenbinder
SermonAudio.com <http://sermonaudio.com/> | Senior Software Engineer






> On Jun 23, 2019, at 14:26, Baptiste  wrote:
> 
> Hi Luke,
> 
> It is not yet doable with do-resolve.
> That said, you can easily write a slow agent to do this.
> I can help if you need to.
> 
> Baptiste
> 
> Le ven. 21 juin 2019 à 15:25, Luke Seelenbinder  <mailto:l...@sermonaudio.com>> a écrit :
> Hello all,
> 
> Is it possible to use the new `http-request do-resolve` to do reverse DNS 
> lookups? It's left unspecified in the documentation, and I think it'd be 
> helpful to clarify for posterity.
> 
> I'd like to integrate this as part of an IP blocking methodology, but that 
> would depend on rDNS being supported.
> 
> Thanks!
> 
> Luke
> 
> —
> Luke Seelenbinder
> SermonAudio.com <http://sermonaudio.com/> | Senior Software Engineer
> 
> 
> 
> 
> 
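Whether or not do-resolve grows PTR support, the reverse-lookup name an external agent would have to query is mechanical to build. A generic Python sketch (not an HAProxy API; the addresses are examples):

```python
import ipaddress

def ptr_name(ip: str) -> str:
    """Build the reverse-DNS (PTR) query name for an IPv4/IPv6 address."""
    return ipaddress.ip_address(ip).reverse_pointer

# An external agent could then resolve this name, e.g. via the system
# resolver with socket.gethostbyaddr(), which raises if no PTR exists.
print(ptr_name("192.0.2.1"))  # 1.2.0.192.in-addr.arpa
```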



http-request do-resolve for rDNS queries

2019-06-21 Thread Luke Seelenbinder
Hello all,

Is it possible to use the new `http-request do-resolve` to do reverse DNS 
lookups? It's left unspecified in the documentation, and I think it'd be 
helpful to clarify for posterity.

I'd like to integrate this as part of an IP blocking methodology, but that would 
depend on rDNS being supported.

Thanks!

Luke

—
Luke Seelenbinder
SermonAudio.com <http://sermonaudio.com/> | Senior Software Engineer







Re: Odd HTX Behavior [2.0.0]

2019-06-19 Thread Luke Seelenbinder
> Ah you really want to find another bug! :-)

But of course! :-)

Actually, I think I did find one while narrowing down the cause for the 
previous bug. I can trigger a segfault with this config:

default-server ca-file ca-bundle.crt resolvers default inter 5s fastinter 2s 
downinter 10s init-addr libc,last check ssl verifyhost sermonaudio.com 
pool-purge-delay 20s max-reuse 900 alpn h2,http/1.1 check-alpn http/1.1
backend problematic_precursor
  option httpchk GET /healthz
  mode tcp

  # Static for now
  server master-00 :6443 no-ssl check-ssl verify none
  server master-01 :6443 no-ssl check-ssl verify none
  server master-02 :6443 no-ssl check-ssl verify none

backend problematic
  option httpchk GET /healthz
  mode tcp

  # Static for now
  server master-2-00 :6443 no-ssl check-ssl verify none
  server master-2-01 :6443 no-ssl check-ssl verify none
  server master-2-02 :6443 no-ssl check-ssl verify none


The log:

Jun 19 08:47:19 ha-balancer-01 haproxy[10320]: Proxy problematic started.
Jun 19 08:47:19 ha-balancer-01 haproxy: [ALERT] 169/084719 (10320) : Current 
worker #1 (10322) exited with code 139 (Segmentation fault)

I suspect this should probably be reported as a config error, since alpn or 
check-alpn is being applied to a TCP backend via the defaults and causing a 
segfault, but I'm not sure.

This is definitely not priority, as removing it from defaults and putting it 
directly on the servers that need it works.

Best,
Luke

—
Luke Seelenbinder
SermonAudio.com <http://sermonaudio.com/> | Senior Software Engineer



> On Jun 19, 2019, at 10:03, Willy Tarreau  wrote:
> 
> On Wed, Jun 19, 2019 at 09:57:43AM +0200, Luke Seelenbinder wrote:
>> Hi Willy,
>> 
>> Just updated, and I can confirm, those fixes worked.
> 
> Great, thanks for the quick feedback.
> 
>> I had seen the commits in git, but didn't think they'd apply--thanks for 
>> pointing me back to them!
>> 
>> I'm looking forward to all the other goodies we can deploy in 2.0.0! :-)
> 
> Ah you really want to find another bug! :-)
> 
> Willy
> 



Re: Odd HTX Behavior [2.0.0]

2019-06-19 Thread Luke Seelenbinder
Hi Willy,

Just updated, and I can confirm, those fixes worked.

I had seen the commits in git, but didn't think they'd apply—thanks for 
pointing me back to them!

I'm looking forward to all the other goodies we can deploy in 2.0.0! :-)

Best,
Luke

—
Luke Seelenbinder
SermonAudio.com <http://sermonaudio.com/> | Senior Software Engineer

> On Jun 19, 2019, at 09:50, Willy Tarreau  wrote:
> 
> Hi Luke,
> 
> On Wed, Jun 19, 2019 at 09:03:04AM +0200, Luke Seelenbinder wrote:
>> Hi Willy, List,
>> 
>> I seem to have great luck with HTX and finding obscure bugs. I upgraded to
>> 2.0.0 on our main deployment recently (reasonably heavy load: 5k simultaneous
>> conns, upwards of 1Gbps at times), and whenever HTX is enabled, I see very
>> weird behavior:
>> 
>> - First load typically works for primary domain, www. (html + assets, maybe
>> 150 requests max; we're working on reducing this number ;) )
>> - First load never works for secondary domain, media.
>> - All failed requests on secondary domain (same haproxy instance, different 
>> backend) time out with a server data timeout (SD--)
>> - A second (or third or fourth) reload of the primary page typically fails or
>> hangs for a long time. If I wait ~2 minutes (roughly our timeout), it'll act
>> like a first load.
> 
> Christopher found that a bug used to affect H2 with the way cookies are
> reassembled, from the early days of HTX. It's just that the recent updates
> to HTX have emphasized the bug and made it actually affect traffic while
> it didn't necessarily in the past. This explains the fact that it depends
> on the request number.
> 
> Please update to latest git, as Christopher has already backported the
> fixes there.
> 
> Thanks,
> Willy



Odd HTX Behavior [2.0.0]

2019-06-19 Thread Luke Seelenbinder
Hi Willy, List,

I seem to have great luck with HTX and finding obscure bugs. I upgraded to 
2.0.0 on our main deployment recently (reasonably heavy load: 5k simultaneous 
conns, upwards of 1Gbps at times), and whenever HTX is enabled, I see very 
weird behavior:

- First load typically works for primary domain, www. (html + assets, maybe 150 
requests max; we're working on reducing this number ;) )
- First load never works for secondary domain, media.
- All failed requests on secondary domain (same haproxy instance, different 
backend) time out with a server data timeout (SD--)
- A second (or third or fourth) reload of the primary page typically fails or 
hangs for a long time. If I wait ~2 minutes (roughly our timeout), it'll act 
like a first load.

Important details:
www.   -> h2/h1.1 FE -> BE nginx h2+ssl (local network, three backend servers, 
using private IP)
media. -> h2/h1.1 FE -> BE IIS h1.1+ssl (separate network, one backend server, 
using public IP)

If I disable HTX, it works. Our config is not complex, but I've included 
relevant bits below:

default-server ca-file ca-bundle.crt resolvers default inter 5s fastinter 2s 
downinter 10s init-addr libc,last check ssl verifyhost sermonaudio.com 
pool-purge-delay 20s max-reuse 900
[snip]
frontend sermonaudio
  bind 0.0.0.0:443,[::]:443 alpn h2,http/1.1 ssl crt ssl.file
  bind 0.0.0.0:80,[::]:80

[snip]

backend media
  server media :443 verify none

[snip]

backend www
  option httpchk GET /nginx-health

  # Static for now
  server www-00 :30993 alpn h2,http/1.1 check-alpn http/1.1
  server www-01 :30993 alpn h2,http/1.1 check-alpn http/1.1
  server www-02 :30993 alpn h2,http/1.1 check-alpn http/1.1

Let me know if you want me to try a patch or latest git or something else! Also 
happy to provide additional information if it's helpful.

Best,
Luke

—
Luke Seelenbinder
SermonAudio.com <http://sermonaudio.com/> | Senior Software Engineer








Re: [ANNOUNCE] haproxy-2.0-dev6 + Conf reminder

2019-06-07 Thread Luke Seelenbinder
I just realized we have both referred to two different variants of the email 
address, one with an s and one without:

submissi...@haproxy.com and submiss...@haproxy.com

I sent mine to submiss...@haproxy.com (the first one you linked).

Maybe you want to clarify for everyone which is the canonical before all the 
submissions end up accidentally hitting a junk drawer somewhere? :-)

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

> On Jun 7, 2019, at 10:14, Willy Tarreau  wrote:
> 
> On Fri, Jun 07, 2019 at 10:03:59AM +0200, Luke Seelenbinder wrote:
>>> On another point, I was told that very few proposals for talks at the
>>> HAProxyConf were sent to date. That doesn't surprise me and I haven't
>>> even sent mine yet either :-)  Please keep in mind that the deadline for
>>> proposals submissions is in two weeks and that time flies. So instead of
>>> wasting your time reading my lengthy boring e-mails, please take a few
>>> minutes to scratch an idea of something that could be of interest to
>>> others, such as "how we deal with logs at XXX", "how we keep our configs
>>> in sync", "how we built our own CDN", "12 things to take care of for
>>> scalability", "how we perform A/B testing", etc. Also if you contribute
>>> to other OSS projects and see certain things done well there that you
>>> think haproxy could benefit from, that could bring interesting discussions
>>> as well! Anyway I'll ping again and more aggressively next week!
>> 
>> 
>> Well, I guess now I *have* to submit that talk idea about how we built a
>> global PoP network, saved 80-90% auth overhead per request, and saved money
>> at the same time... :-)
> 
> Oh yes definitely!
> 
>> If we just use the form on the website, does that suffice or should we also
>> submit to submissions@?
> 
> I don't know :-)  I simply sent an email to submissi...@haproxy.com 
> <mailto:submissi...@haproxy.com> with
> my talk proposal (how to most efficiently contribute), I guess it will work
> fine.
> 
> Thanks,
> Willy



Re: [ANNOUNCE] haproxy-2.0-dev6 + Conf reminder

2019-06-07 Thread Luke Seelenbinder
> On another point, I was told that very few proposals for talks at the
> HAProxyConf were sent to date. That doesn't surprise me and I haven't
> even sent mine yet either :-)  Please keep in mind that the deadline for
> proposals submissions is in two weeks and that time flies. So instead of
> wasting your time reading my lengthy boring e-mails, please take a few
> minutes to scratch an idea of something that could be of interest to
> others, such as "how we deal with logs at XXX", "how we keep our configs
> in sync", "how we built our own CDN", "12 things to take care of for
> scalability", "how we perform A/B testing", etc. Also if you contribute
> to other OSS projects and see certain things done well there that you
> think haproxy could benefit from, that could bring interesting discussions
> as well! Anyway I'll ping again and more aggressively next week!


Well, I guess now I *have* to submit that talk idea about how we built a global 
PoP network, saved 80-90% auth overhead per request, and saved money at the 
same time… :-)

If we just use the form on the website, does that suffice or should we also 
submit to submissions@?

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

Re: H2 Protocol Errors in HTX Mode (1.9.4 & 1.9.4-dev)

2019-03-23 Thread Luke Seelenbinder
Hi Willy,

I just upgraded to 1.9.5, and this bug is still present (but seems to be 
somewhat diminished). On 1.9.4, approximately 5 of these images failed to load; 
on 1.9.5, it's usually 1 or 2. So overall it seems there is improvement, but 
something is still a bit wonky. :)

Best,
Luke

—
Luke Seelenbinder
SermonAudio.com <http://sermonaudio.com/> | Senior Software Engineer


> On Mar 1, 2019, at 05:35, Willy Tarreau  wrote:
> 
> On Fri, Feb 22, 2019 at 01:35:19PM +0100, Luke Seelenbinder wrote:
>> Hi List,
>> 
>> We recently started using HAProxy to act as a first point of entry for most
>> of our traffic. We initially set it up with H2 + HTX frontend and H1.1
>> backend; however, this led to some strange behavior consistently reproducible
>> on one page.
>> 
>> Whenever we loaded this page
>> (https://www.sermonaudio.com/search.asp?currsection=new=MP4) we
>> would get a series of H2 protocol errors (SPDY_PROTOCOL_ERROR, as reported by
>> Chrome) for images stored on media-cloud.sermonaudio.com (running on the same
>> server / ip as www.sermonaudio.com). Disabling HTX fixed the problem. Our
>> configuration is quite simple, just a few ACLs for redirection and forcing
>> SSL. I initially thought the issue could be related to the number of objects
>> being loaded (just over 100), so I tuned h2.max-concurrent-streams, but that
>> had no effect. This is observable in both 1.9.4 and the latest (as of two
>> days ago) 1.9 git branch.
> 
> That's a bit weird. I don't see what could cause a decoding error in HTX
> that doesn't happen without. I'll check if I can spot different code paths
> between HTX and legacy that could cause such errors to be emitted. One
> thing that could happen would be if in one of the htx decoding functions
> we end up reading a wrong amount of data sometimes, causing the stream to
> be desynchronized, but I really don't see where that would happen.
> 
>> I did not observe the issue on any other page but my search was not
>> exhaustive. It also never occurs on an individual request when made with
>> curl, for example.
> 
> OK that's useful info, it avoids the usual "strange, works for me" after the
> first successful test :-)
> 
>> I would be able to stand up a server that has the same configuration with HTX
>> enabled for testing, if that would be helpful, you would have to provide the
>> /etc/hosts entries, though. :)
> 
> Might be. Let me take a look at the code first. Maybe I'll ask you to
> retry with a different version or to try a patch.
> 
> Thanks,
> Willy



Re: Status Codes in H2 Mode

2019-03-19 Thread Luke Seelenbinder
Hi Willy,

> Yes definitely. There's no emergency but any extra info you can provide
> will help us of course.

I tested with HTTP/1.1 on the client side and HAProxy was flawless, so this is 
very, very likely limited to h2.

> That could indeed. No need to have something very advanced, if you figure
> that by having a dummy server delivering certain sizes after a certain
> delay and issuing a few curl requests then Ctrl-C is enough to trigger
> the issue often enough, it will help us start to inspect the code live
> when the problem is expected to happen. But I know how painful this can
> be to do so really, I'm not going to ask you to spend too much time on
> this.

I've responded to you directly with a replication script. I'm happy to continue 
helping however I can.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

On Tue, Mar 19, 2019, at 14:19, Willy Tarreau wrote:
> On Tue, Mar 19, 2019 at 08:59:38AM -0400, Luke Seelenbinder wrote:
> > Makes sense, Willy. Thanks for continuing to investigate this.
> > 
> > > I'm assuming that this is always reproducible with H2 on the front and
> > > H1 on the back.
> > 
> > I have not tried it with H1 -> H1, but I assume that case works correctly.
> > Would it be helpful if I proved it one way or the other?
> 
> Yes definitely. There's no emergency but any extra info you can provide
> will help us of course.
> 
> > > I'll see if we can find a reliable reproducer for such
> > > situations, that will help us nail down this issues.
> > 
> > Would it be helpful if I try to work up a test case? (Bash script with curl
> > or python or something?) I imagine the request cancellation part would be 
> > the
> > hard part.
> 
> That could indeed. No need to have something very advanced, if you figure
> that by having a dummy server delivering certain sizes after a certain
> delay and issuing a few curl requests then Ctrl-C is enough to trigger
> the issue often enough, it will help us start to inspect the code live
> when the problem is expected to happen. But I know how painful this can
> be to do so really, I'm not going to ask you to spend too much time on
> this.
> 
> Many thanks,
> Willy
> 
>



Re: Status Codes in H2 Mode

2019-03-19 Thread Luke Seelenbinder
Makes sense, Willy. Thanks for continuing to investigate this.

> I'm assuming that this is always reproducible with H2 on the front and
> H1 on the back.

I have not tried it with H1 -> H1, but I assume that case works correctly. 
Would it be helpful if I proved it one way or the other?

> I'll see if we can find a reliable reproducer for such
> situations, that will help us nail down this issues.

Would it be helpful if I try to work up a test case? (Bash script with curl or 
python or something?) I imagine the request cancellation part would be the hard 
part.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

On Tue, Mar 19, 2019, at 04:19, Willy Tarreau wrote:
> Hi Luke,
> 
> On Mon, Mar 18, 2019 at 11:14:12AM -0400, Luke Seelenbinder wrote:
> (...)
> > If I disable HTX, everything flows per normal and the status codes are even
> > correctly -1.
> > 
> > I've replicated this on 1.9.4, 1.9.x master, and 2.0-dev master branches. 
> > The
> > global "this will work" and "this will not work" switch is HTX mode. Anytime
> > it's enabled, I see bad behavior. Anytime it's disabled, I see flawless
> > behavior.
> > 
> > Any thoughts? I've tried this with and without http-reuse, abortonclose,
> > various settings for pool-purge-delay.
> 
> That's useful information. Christopher has been working on fixing some
> issues related to abortonclose and ended up having to touch a large
> number of places. We figured that we need to make deeper changes to
> make this thing more reliable. I still need to check with him which of
> his patches could be merged now (some are not suitable unfortunately).
> 
> I'm assuming that this is always reproducible with H2 on the front and
> H1 on the back. I'll see if we can find a reliable reproducer for such
> situations, that will help us nail down this issues.
> 
> Thanks,
> Willy
> 
>



Re: Status Codes in H2 Mode

2019-03-18 Thread Luke Seelenbinder
Hi Willy,

Unfortunately, I spoke too soon in my last email. After hitting send, I went 
down the rabbit hole again and uncovered some behaviors I thought we'd rooted 
out. Namely, any time I use HTX mode with an H2 fe -> H1 or H2 backend and have 
frequent request cancellation as discussed previously, I'm seeing hung 
requests. It's not every request nor is it every cycle of requests, but I'd say 
at least 10% of requests end up hanging indefinitely until they eventually 
timeout according to HAProxy. (So perhaps this is an indicator itself of what 
might be wrong?) HAProxy reports retries / redispatches and maxes out the 
timeouts then the request dies. Here's two example log lines, the second one I 
killed the request myself:

[18/Mar/2019:15:02:49.723] stadiamaps~ tile/tile1 0/37204/-1/-1/49606 503 0 - - 
sC-- 2/1/2/2/3 0/0 {} "GET /tiles/osm_bright/10/565/3...@2x.png HTTP/2.0"
[18/Mar/2019:15:03:39.507] stadiamaps~ tile/tile1 0/24804/-1/-1/29123 503 0 - - 
CC-- 2/1/0/0/2 0/0 {} "GET /tiles/osm_bright/10/565/3...@2x.png HTTP/2.0"

If I disable HTX, everything flows per normal and the status codes are even 
correctly -1.

I've replicated this on 1.9.4, 1.9.x master, and 2.0-dev master branches. The 
global "this will work" and "this will not work" switch is HTX mode. Anytime 
it's enabled, I see bad behavior. Anytime it's disabled, I see flawless 
behavior.

Any thoughts? I've tried this with and without http-reuse, abortonclose, 
various settings for pool-purge-delay.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

On Mon, Mar 18, 2019, at 13:46, Luke Seelenbinder wrote:
> Hi Willy,
> 
> I finally had the opportunity to try out `option abortonclose`.
> 
> Initially, it made the problem much worse. Instead of occasionally 
> incorrect status codes in the logs, I saw requests fail in the 
> following manner:
> 
> [18/Mar/2019:12:30:08.040] stadiamaps~ tile/tile1 0/18603/-1/-1/24804 
> 503 0 - - sC-- 2/1/1/1/3 0/0 {} "GET /tiles/osm_bright/6/31/20.png 
> HTTP/2.0"
> [18/Mar/2019:12:30:08.041] stadiamaps~ tile/tile1 0/18602/-1/-1/24803 
> 503 0 - - sC-- 2/1/0/0/3 0/0 {} "GET /tiles/osm_bright/6/34/20.png 
> HTTP/2.0"
> 
> What's further interesting, it is was consistently 2 out of 18 
> requests. That led me down the road of checking queue timeouts 
> (noticing the timing correlation in the logs). I adjusted `timeout 
> connect` up from 6200ms to 12400ms and added pool-purge-delay to 60s.
> 
> After adjusting those timeouts and pool purges and re-enabling 
> `abortonclose`, the request errors I was seeing magically went away. 
> I'll push this config to production and see if we see a reduction in 
> 503s. I also suspect we'll see a marginal improvement in throughput and 
> response time due to keeping backend connections open longer.
> 
> I'll also keep an eye out for inconsistencies between our backend 
> accept capability and timeouts and see if perhaps we're overrunning 
> some buffer somewhere in HAProxy, NGINX, or somewhere else.
> 
> Thanks for your help so far!
> 
> Best,
> Luke
> 
> —
> Luke Seelenbinder
> Stadia Maps | Founder
> stadiamaps.com
> 
> On Mon, Mar 4, 2019, at 14:08, Willy Tarreau wrote:
> > On Mon, Mar 04, 2019 at 11:45:53AM +, Luke Seelenbinder wrote:
> > > Hi Willy,
> > >
> > > > Do you have "option abortonclose" in your config ?
> > >
> > > We do not have abortonclose. Do you recommend this if we have a lot of
> > > client-side request aborts (but not connection level closes)? From reading
> > > the docs, I came away conflicted as to the implications. :-)
> > 
> > It will help, especially if you have maxconn configured on your server
> > lines, as it will allow the requests to be aborted while still in queue.
> > 
> > That said, we still don't know exactly what causes your logs.
> > 
> > Willy
> >
> 
>



Re: Status Codes in H2 Mode

2019-03-18 Thread Luke Seelenbinder
Hi Willy,

I finally had the opportunity to try out `option abortonclose`.

Initially, it made the problem much worse. Instead of occasionally incorrect 
status codes in the logs, I saw requests fail in the following manner:

[18/Mar/2019:12:30:08.040] stadiamaps~ tile/tile1 0/18603/-1/-1/24804 503 0 - - 
sC-- 2/1/1/1/3 0/0 {} "GET /tiles/osm_bright/6/31/20.png HTTP/2.0"
[18/Mar/2019:12:30:08.041] stadiamaps~ tile/tile1 0/18602/-1/-1/24803 503 0 - - 
sC-- 2/1/0/0/3 0/0 {} "GET /tiles/osm_bright/6/34/20.png HTTP/2.0"

What's further interesting, it was consistently 2 out of 18 requests. That 
led me down the road of checking queue timeouts (noticing the timing 
correlation in the logs). I adjusted `timeout connect` up from 6200ms to 
12400ms and added pool-purge-delay to 60s.
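
Sketched as configuration, the changes above would look roughly like this 
(section layout and values are illustrative, not the actual production file):

```haproxy
# Sketch of the timeout / pool-purge changes described above.
global
    pool-purge-delay 60s         # keep idle server connections around longer

defaults
    timeout connect 12400ms      # raised from 6200ms
```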

After adjusting those timeouts and pool purges and re-enabling `abortonclose`, 
the request errors I was seeing magically went away. I'll push this config to 
production and see if we see a reduction in 503s. I also suspect we'll see a 
marginal improvement in throughput and response time due to keeping backend 
connections open longer.

I'll also keep an eye out for inconsistencies between our backend accept 
capability and timeouts and see if perhaps we're overrunning some buffer 
somewhere in HAProxy, NGINX, or somewhere else.

Thanks for your help so far!

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

On Mon, Mar 4, 2019, at 14:08, Willy Tarreau wrote:
> On Mon, Mar 04, 2019 at 11:45:53AM +, Luke Seelenbinder wrote:
> > Hi Willy,
> >
> > > Do you have "option abortonclose" in your config ?
> >
> > We do not have abortonclose. Do you recommend this if we have a lot of
> > client-side request aborts (but not connection level closes)? From reading
> > the docs, I came away conflicted as to the implications. :-)
> 
> It will help, especially if you have maxconn configured on your server
> lines, as it will allow the requests to be aborted while still in queue.
> 
> That said, we still don't know exactly what causes your logs.
> 
> Willy
>



Re: Status Codes in H2 Mode

2019-03-04 Thread Luke Seelenbinder
Hi Willy,

> Do you have "option abortonclose" in your config ?

We do not have abortonclose. Do you recommend this if we have a lot of 
client-side request aborts (but not connection level closes)? From reading the 
docs, I came away conflicted as to the implications. :-)

Best,
Luke


—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Friday, March 1, 2019 5:28 AM, Willy Tarreau  wrote:

> Hi Luke,
> 

> On Fri, Feb 22, 2019 at 10:03:12AM +0000, Luke Seelenbinder wrote:
> 

> > Hi List, Willy,
> > After transitioning to 1.9.4, I can say things are much more stable when
> > using h2 on the frontend. Thanks for all the bug fixes and patches since
> > 1.9.0! I'll be upgrading to 1.9.5 when it comes out, so I'm looking forward
> > to that.
> 

> Great!
> 

> > I have one question: we track error rates across our fleet via response 
> > codes
> > (5xx being considered errors) and we're noticing something a bit interesting
> > in h2 mode. Given the following log lines:
> > fra-magellan-tosaq haproxy[8845]: 178.23.xxx.xxx:49243 
> > [22/Feb/2019:06:58:29.986] stadiamaps~ tile/tile2 0/0/-1/-1/9 503 0 - - 
> > CC-- 66/66/3/1/0 0/0 {...} "GET /tiles/osm_bright/9/263/1...@2x.png 
> > HTTP/2.0"
> > fra-magellan-tosaq haproxy[8845]: 178.23.xxx.xxx:49243 
> > [22/Feb/2019:06:58:30.001] stadiamaps~ tile/tile3 0/0/-1/-1/10 503 0 - - 
> > CC-- 66/66/2/1/0 0/0 {...} "GET /tiles/osm_bright/9/264/1...@2x.png 
> > HTTP/2.0"
> > fra-magellan-tosaq haproxy[8845]: 178.23.xxx.xxx:49243 
> > [22/Feb/2019:06:58:30.019] stadiamaps~ tile/tile2 0/0/-1/-1/7 503 0 - - 
> > CC-- 66/66/3/1/0 0/0 {...} "GET /tiles/osm_bright/9/265/1...@2x.png 
> > HTTP/2.0"
> > fra-magellan-tosaq haproxy[8845]: 195.143.xxx.xxx:36064 
> > [22/Feb/2019:07:14:52.759] stadiamaps~ tile/tile3 0/0/-1/-1/12 503 0 - - 
> > CC-- 78/78/1/1/0 0/0 {...} "GET /data/openmaptiles/3/1/2.pbf HTTP/2.0"
> > fra-magellan-tosaq haproxy[8845]: 195.143.xxx.xxx:36064 
> > [22/Feb/2019:07:14:52.961] stadiamaps~ tile/tile3 0/0/-1/-1/853 503 0 - - 
> > CC-- 77/77/0/0/0 0/0 {...} "GET /data/openmaptiles/3/2/2.pbf HTTP/2.0"
> > Why is the response code recorded as 503? If I'm interpreting the logs
> > correctly, the client connected and disconnected before the request could
> > even be passed to the backend, so shouldn't that be a -1 response code?
> 

> It really depends how/where/when the error was triggered. When forwarding
> a connection, aborting the output can have the effect of an error being
> triggered on the connection, which is immediately reported and detected
> as such. That doesn't mean we shouldn't improve this, but this also means
> trying to figure the exact sequence of events and trying to hack them to
> consider alternate error reports.
> 

> > mainly want to know if we can safely ignore these errors or if perhaps it's 
> > a
> > bug / undocumented behavior in h2 mode.
> 

> Do you have "option abortonclose" in your config ? If you have it, it can
> indeed be a client abort that was sent and caused the outgoing request to
> be aborted. If you don't have it, it could be a real connection error
> that was misreported as a client abort because the client side technically
> is already closed once the request is received from H2. But I'm not seeing
> a non-null retry count in your logs so I have a doubt about this.
> 

> > For reference we're using H2 fe, mode htx, be H1.1 in our config currently.
> 

> OK that's useful indeed, thanks.
> 

> Willy



publickey - luke.seelenbinder@stadiamaps.com - 0xB23C1E8A.asc
Description: application/pgp-keys


signature.asc
Description: OpenPGP digital signature


H2 Protocol Errors in HTX Mode (1.9.4 & 1.9.4-dev)

2019-02-22 Thread Luke Seelenbinder
Hi List,

We recently started using HAProxy to act as a first point of entry for most of 
our traffic. We initially set it up with H2 + HTX frontend and H1.1 backend; 
however, this led to some strange behavior consistently reproducible on one 
page.

Whenever we loaded this page 
(https://www.sermonaudio.com/search.asp?currsection=new=MP4) we 
would get a series of H2 protocol errors (SPDY_PROTOCOL_ERROR, as reported by 
Chrome) for images stored on media-cloud.sermonaudio.com (running on the same 
server / IP as www.sermonaudio.com). Disabling HTX fixed the problem. Our 
configuration is quite simple, just a few ACLs for redirection and forcing SSL. 
I initially thought the issue could be related to the number of objects being 
loaded (just over 100), so I tuned h2.max-concurrent-streams, but that had no 
effect. This is observable in both 1.9.4 and the latest (as of two days ago) 
1.9 git branch.
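
For reference, the stream-limit tuning mentioned above is a one-line global 
setting; a sketch (the value 100 is illustrative, chosen to cover the ~100 
objects on the page):

```haproxy
# Illustrative tuning of the per-connection H2 stream limit.
global
    tune.h2.max-concurrent-streams 100
```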

I did not observe the issue on any other page but my search was not exhaustive. 
It also never occurs on an individual request when made with curl, for example.

I would be able to stand up a server with the same configuration and HTX 
enabled for testing, if that would be helpful; you would need the /etc/hosts 
entries from me, though. :)

Thanks for any help!

Best,
Luke

—
Luke Seelenbinder
SermonAudio.com <http://sermonaudio.com/> | Senior Software Engineer








Status Codes in H2 Mode

2019-02-22 Thread Luke Seelenbinder
Hi List, Willy,

After transitioning to 1.9.4, I can say things are much more stable when using 
h2 on the frontend. Thanks for all the bug fixes and patches since 1.9.0! I'll 
be upgrading to 1.9.5 when it comes out, so I'm looking forward to that.

I have one question: we track error rates across our fleet via response codes 
(5xx being considered errors) and we're noticing something a bit interesting in 
h2 mode. Given the following log lines:

fra-magellan-tosaq haproxy[8845]: 178.23.xxx.xxx:49243 
[22/Feb/2019:06:58:29.986] stadiamaps~ tile/tile2 0/0/-1/-1/9 503 0 - - CC-- 
66/66/3/1/0 0/0 {…} "GET /tiles/osm_bright/9/263/1...@2x.png HTTP/2.0"
fra-magellan-tosaq haproxy[8845]: 178.23.xxx.xxx:49243 
[22/Feb/2019:06:58:30.001] stadiamaps~ tile/tile3 0/0/-1/-1/10 503 0 - - CC-- 
66/66/2/1/0 0/0 {…} "GET /tiles/osm_bright/9/264/1...@2x.png HTTP/2.0"
fra-magellan-tosaq haproxy[8845]: 178.23.xxx.xxx:49243 
[22/Feb/2019:06:58:30.019] stadiamaps~ tile/tile2 0/0/-1/-1/7 503 0 - - CC-- 
66/66/3/1/0 0/0 {…} "GET /tiles/osm_bright/9/265/1...@2x.png HTTP/2.0"
fra-magellan-tosaq haproxy[8845]: 195.143.xxx.xxx:36064 
[22/Feb/2019:07:14:52.759] stadiamaps~ tile/tile3 0/0/-1/-1/12 503 0 - - CC-- 
78/78/1/1/0 0/0 {…} "GET /data/openmaptiles/3/1/2.pbf HTTP/2.0"
fra-magellan-tosaq haproxy[8845]: 195.143.xxx.xxx:36064 
[22/Feb/2019:07:14:52.961] stadiamaps~ tile/tile3 0/0/-1/-1/853 503 0 - - CC-- 
77/77/0/0/0 0/0 {…} "GET /data/openmaptiles/3/2/2.pbf HTTP/2.0"

Why is the response code recorded as 503? If I'm interpreting the logs 
correctly, the client connected and disconnected before the request could even 
be passed to the backend, so shouldn't that be a -1 response code? I mainly 
want to know if we can safely ignore these errors or if perhaps it's a bug / 
undocumented behavior in h2 mode.

For reference, we're using an H2 frontend, mode htx, and H1.1 backends in our 
config currently.

Thanks for any suggestions!

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com



Re: H2 Server Connection Resets (1.9.2 & 1.9.3)

2019-02-01 Thread Luke Seelenbinder
Hi Willy,

I just had a chance to check this—I haven't run every test I could think of 
yet, but it works! Not a single server-side error or disconnection!

I'll definitely be rolling this out when 1.9.4 lands. I will let you know if 
anything else comes up during any remaining testing.

Best,
Luke


—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Friday, February 1, 2019 3:36 PM, Luke Seelenbinder 
 wrote:

> Hi Willy,
> 

> Great news! Thank you again for your work to clean this all up.
> 

> > http://git.haproxy.org/?p=haproxy-1.9.git;a=snapshot;h=HEAD;sf=tgz
> 

> I'll pull this and test as soon as I get the chance. That is likely to be 
> Monday given how my Friday is going. :-)
> 

> Based on what you've said given your testing, it sounds very much like our 
> problems should be entirely gone with this set of bug fixes.
> 

> Best,
> Luke
> 

> —
> Luke Seelenbinder
> Stadia Maps | Founder
> stadiamaps.com
> 

> ‐‐‐ Original Message ‐‐‐
> On Friday, February 1, 2019 9:49 AM, Willy Tarreau w...@1wt.eu wrote:
> 

> > Hi Luke,
> 

> > On Wed, Jan 30, 2019 at 07:43:06PM +0100, Willy Tarreau wrote:
> 

> > > I've also found a few erroneous state transitions and an issue affecting
> > > trailers which could also break the connection. For now it's only in
> > > 2.0-dev because it'll take me a while to collect all pending patches for
> > > the next 1.9.
> 

> > So all of them are now in the 1.9 branch, you can download a preview here :
> 

> > http://git.haproxy.org/?p=haproxy-1.9.git;a=snapshot;h=HEAD;sf=tgz
> 

> > Regards,
> > Willy





Re: H2 Server Connection Resets (1.9.2 & 1.9.3)

2019-02-01 Thread Luke Seelenbinder
Hi Willy,

Great news! Thank you again for your work to clean this all up.

> http://git.haproxy.org/?p=haproxy-1.9.git;a=snapshot;h=HEAD;sf=tgz

I'll pull this and test as soon as I get the chance. That is likely to be 
Monday given how my Friday is going. :-)

Based on what you've said given your testing, it sounds very much like our 
problems should be entirely gone with this set of bug fixes.

Best,
Luke


—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Friday, February 1, 2019 9:49 AM, Willy Tarreau  wrote:

> Hi Luke,
> 

> On Wed, Jan 30, 2019 at 07:43:06PM +0100, Willy Tarreau wrote:
> 

> > I've also found a few erroneous state transitions and an issue affecting
> > trailers which could also break the connection. For now it's only in
> > 2.0-dev because it'll take me a while to collect all pending patches for
> > the next 1.9.
> 

> So all of them are now in the 1.9 branch, you can download a preview here :
> 

> http://git.haproxy.org/?p=haproxy-1.9.git;a=snapshot;h=HEAD;sf=tgz
> 

> Regards,
> Willy





Re: HTTP connection is reset after each request

2019-01-30 Thread Luke Seelenbinder
Hi Aleks,

You're correct for http/1.1, but unfortunately, nothing I found after a pretty 
long search indicated 1.8.x supports an h2 frontend with reusable backend 
connections (h1.1 or h2).

I stuck with h/1.1 until 1.9 was released because of this.
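
On 1.9, an H2 frontend with reusable HTTP/1.1 backend connections can be 
sketched like this (certificate path, names, and addresses are placeholders, 
not a tested production config):

```haproxy
# Minimal 1.9-style sketch: H2 negotiated on the frontend via ALPN,
# idle H1.1 backend connections reused across streams.
frontend fe
    mode http
    bind :443 ssl crt /etc/haproxy/site.pem alpn h2,http/1.1
    default_backend be

backend be
    mode http
    http-reuse safe              # reuse idle backend connections
    server app1 192.0.2.10:8080
```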

Best,
Luke


—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Wednesday, January 30, 2019 12:02 PM, Aleksandar Lazic  
wrote:

> Hi.
> 

> Am 30.01.2019 um 11:53 schrieb Marco Corte:
> 

> > On 2019-01-30 11:40 Luke Seelenbinder wrote:
> > 

> > > Are you on 1.9.x? 1.8.x does not support reuse of backend connections
> > > when using an h2 frontend. 1.9.x does support this and it works quite
> > > nicely.
> > 

> > Yes! I am on version 1.8.17.
> > Thank you for the explanation!
> 

> Well somehow it supports
> https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#4-http-reuse
> 

> I would play with the timeouts
> 

> https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#4-timeout 
> http-keep-alive
> https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#4-timeout 
> http-request
> 

> There are some more timeouts which starts in the doc at `timeout check` in 
> this section.
> https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#4.1
> 

> never the less 700ms is "relatively" long so I would also add a check in the 
> server line.
> 

> > .marcoc
> 

> Regards
> Aleks





Re: HTTP connection is reset after each request

2019-01-30 Thread Luke Seelenbinder
Hi Marco,

Are you on 1.9.x? 1.8.x does not support reuse of backend connections when 
using an h2 frontend. 1.9.x does support this and it works quite nicely.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Wednesday, January 30, 2019 11:30 AM, Marco Corte  
wrote:

> Hi, list!
> 

> If do not use HTTP/2 in the frontend, the connection to the real server
> is kept open.
> 

> I did not find anything about this in the documentation or in the change
> logs.
> Can you please point me to the explanation of this behaviour?
> 

> Thank you.
> 

> .marcoc





Re: H2 Server Connection Resets (1.9.2 & 1.9.3)

2019-01-30 Thread Luke Seelenbinder
Hi Willy,

I've done a good bit more digging. Here are my notes so far.

1. On occasion the old behavior (CD-- then SD--) still exhibits. It's a bit too 
consistent for me to push to our production platform yet.
2. It seems Chrome and Firefox's behavior is different enough to change the 
frequency of this being a problem. I consistently experience errors on Chrome 
(resulting in net::ERR_SPDY_PROTOCOL_ERROR or CONNECTION_ERROR), but I only 
occasionally have similar issues on Firefox. Safari is a non-issue, because it 
doesn't cancel in-flight requests at all.
3. Going directly against NGINX exhibits fault-tolerant behavior in all 
browsers. I see a smattering of 499 (client went away) and 401 (unauthorized, 
but I'm not sure why?) errors, but never valid requests erroring. It also 
reuses the same connection ID across all requests.
4. The patch definitely improved the situation. I hit an error every 4-5 double 
zoom cycles, and that's usually only 1-2 requests of 10.
5. When an error occurs, Chrome definitely renegotiates the connection, as one 
would expect.
6. Firefox seems less problematic than Chrome. I get an error maybe every 10 
zoom cycles, but that usually affects the whole connection. (Undoubtedly due to 
differences in how Firefox and Chrome handle errors.)
7. The 403 errors previously experienced are definitely due to our 
authentication / authorization ACLs; the requests are malformed or 
misrepresented enough not to pass the ACLs. Maybe I should return a 400 
instead of 403 in my previously mentioned ACL? Removing all of our ACLs removed 
all 403s and PR-- lines and turned the response code into -1.
8. Not all CD-- / SD-- pairs result in client-side failed requests. (This is a 
definite improvement as well.)
9. Most CD-- / SD-- lines are independent now. That is, some requests from a 
set fail with CD-- and some with SD--, but CD-- responses don't seem to cause 
SD-- anymore.

I don't know if any of this rings any bells, but that's what I've been able to 
determine so far. Overall the patch improves things significantly, but some 
latent issues are still around.

I would be happy to continue to assist however I can. :)

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Wednesday, January 30, 2019 10:32 AM, Luke Seelenbinder 
 wrote:

> Hi Willy,
> 

> > 403 is strange, it's forbidden. 400 I could understand. What might happen
> > is the following scenario :
> 

> > I'm really not thinking of any case resulting in this but I'll take a
> > second look.
> 

> This…is actually a configuration issue on my side, I think. Sorry for the 
> wild-goose chase. I have the following lines:
> 

> acl stadia_request hdr_dom(host) -i stadiamaps.com
> http-request deny unless stadia_request
> 

> I can well imagine this causing 403s instead of 400s on canceled requests.
> 

> > So Chrome likely sends RST_STREAM immediately while Firefox silently
> > cancels the stream and responds with RST_STREAM when it gets the response
> > back.
> 

> Even more fun: Safari doesn't actually cancel the request. It just happily 
> processes whatever it gets back. So the three major browsers process it all 
> in three different ways. Fun.
> 

> > Don't get me wrong, I'm extremely careful about killing streams only
> > as much as possible. There's even an h2spec failure due to this, where
> > it expects a connection error, but where doing so would result in too
> > many streams being caught causing this to happen. So as long as we can
> > identify the exact sequence which causes this, I'm obviously interested
> > in studying it!
> 

> Okay—great! I will keep digging to see if I can consistently replicate what 
> I'm seeing. I suspect this may be a side-effect of the http-request deny acl 
> above.
> 

> I will remove that acl and test without it to see if that improves the 
> situation, if so, I'll come up with a better way to implement this check.
> 

> Best,
> Luke
> 

> —
> Luke Seelenbinder
> Stadia Maps | Founder
> stadiamaps.com
> 

> ‐‐‐ Original Message ‐‐‐
> On Wednesday, January 30, 2019 10:21 AM, Willy Tarreau w...@1wt.eu wrote:
> 

> > Hi Luke,
> 

> > On Wed, Jan 30, 2019 at 08:41:03AM +, Luke Seelenbinder wrote:
> 

> > > It works! I'm seeing very, very few CD-- -> SD-- chains now. I did see a 
> > > few
> > > in h2<->h2 mode, but precious few, so I'm very happy to say the bug as
> > > previous manifested is remedied! Thanks for digging so wide and deep for 
> > > this
> > > bug fix--kinda funny how it ended up being half an if condition on one 
> > > line.
> > > . . software development is the best, ain't it? :-)
> 

> > Excellent, thanks for the report.
> 


Re: H2 Server Connection Resets (1.9.2 & 1.9.3)

2019-01-30 Thread Luke Seelenbinder
Hi Willy,


> 403 is strange, it's forbidden. 400 I could understand. What might happen
> is the following scenario :

> I'm really not thinking of any case resulting in this but I'll take a
> second look.

This…is actually a configuration issue on my side, I think. Sorry for the 
wild-goose chase. I have the following lines:

  acl stadia_request hdr_dom(host) -i stadiamaps.com
  http-request deny unless stadia_request

I can well imagine this causing 403s instead of 400s on canceled requests.
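
If returning 400 is preferred, the default 403 can be overridden with 
`deny_status`; a sketch reusing the ACL above (everything else unchanged):

```haproxy
# Same host check as above, but canceled/malformed requests that fail
# the ACL are logged as 400 rather than the default 403.
acl stadia_request hdr_dom(host) -i stadiamaps.com
http-request deny deny_status 400 unless stadia_request
```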

> So Chrome likely sends RST_STREAM immediately while Firefox silently
> cancels the stream and responds with RST_STREAM when it gets the response
> back.

Even more fun: Safari doesn't actually cancel the request. It just happily 
processes whatever it gets back. So the three major browsers process it all in 
three different ways. Fun.

> Don't get me wrong, I'm extremely careful about killing streams only
> as much as possible. There's even an h2spec failure due to this, where
> it expects a connection error, but where doing so would result in too
> many streams being caught causing this to happen. So as long as we can
> identify the exact sequence which causes this, I'm obviously interested
> in studying it!

Okay—great! I will keep digging to see if I can consistently replicate what I'm 
seeing. I suspect this may be a side-effect of the http-request deny acl above.

I will remove that acl and test without it to see if that improves the 
situation, if so, I'll come up with a better way to implement this check.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Wednesday, January 30, 2019 10:21 AM, Willy Tarreau  wrote:

> Hi Luke,
> 

> On Wed, Jan 30, 2019 at 08:41:03AM +, Luke Seelenbinder wrote:
> 

> > It works! I'm seeing very, very few CD-- -> SD-- chains now. I did see a few
> > in h2<->h2 mode, but precious few, so I'm very happy to say the bug as
> > previous manifested is remedied! Thanks for digging so wide and deep for 
> > this
> > bug fix--kinda funny how it ended up being half an if condition on one line.
> > . . software development is the best, ain't it? :-)
> 

> Excellent, thanks for the report.
> 

> > However, that leads to a second tier of less-important questions:
> > 

> > 1.  Should RST_STREAM (I believe that's what happens on request 
> > cancellation)
> > also reset the full h2 connection?
> > 

> 

> It should never happen, unless of course the frame is invalid protocol-wise
> and results in a protocol error, but in practice it should not happen when
> coming from a browser. Thus I suspect we have another case of GOAWAY.
> 

> > I'm seeing the following behavior:
> > 

> > -   Ensure connection established. Zoom twice. Set X is canceled. Set Y
> > succeeds, but the first request from Y must renegotiate the connection 
> > & TLS,
> > which definitely results in a performance hit on our side.
> > 

> > 

> > 2.  Sometimes a canceled request results in a 403 because the request is
> > malformed (PR-- log line); this makes sense in the usual case, but in 
> > this
> > specific case, I'm not so sure.
> > 

> 

> 403 is strange, it's forbidden. 400 I could understand. What might happen
> is the following scenario :
> 

> -   the client sends the request
> 

> -   the H2 demux decodes it, creates an H2 stream, then creates an
> application-layer stream, attaches them and wakes the latter up
> 

> -   the client sends RST_STREAM
> 

> -   the demux places an error flag on the stream (it was aborted)
> 

> -   the stream is scheduled, reads the decoded request to process it,
> and finds a request plus an error
> -> 400 bad request could be logged to indicate how this stream was
> aborted. It's not much different from what could happen in HTTP/1
> if the client had aborted the connection just before it was started
> to be parsed.
> 

> 

> > This would also result in a broken connection
> > on the client side, I'm thinking.
> 

> It should not, but as you probably know, everything lies in the difference
> between "should" and "can".
> 

> > Would there be a way to handle RST_STREAM
> > different without compromising security reasons for rejecting malformed /
> > broken requests?
> 

> I'm really not thinking of any case resulting in this but I'll take a
> second look.
> 

> > Side note: this seems to be a behavior difference between Firefox and 
> > Chrome.
> > In Chrome I always get PR-- lines. In Firefox I almost always get C(C|D)--
> > lines.
> 

>

Re: H2 Server Connection Resets (1.9.2 & 1.9.3)

2019-01-30 Thread Luke Seelenbinder
Hi Willy,

> I've merged the attached patch that fixes the problem for me, please try
> it, it should apply cleanly on top of 1.9.3.

It works! I'm seeing very, very few CD-- -> SD-- chains now. I did see a few in 
h2<->h2 mode, but precious few, so I'm very happy to say the bug as previously 
manifested is remedied! Thanks for digging so wide and deep for this bug 
fix—kinda funny how it ended up being half an if condition on one line... 
software development is the best, ain't it? :-)

However, that leads to a second tier of less-important questions:

1. Should RST_STREAM (I believe that's what happens on request cancellation) 
also reset the full h2 connection? I'm seeing the following behavior:

- Ensure connection established. Zoom twice. Set X is canceled. Set Y succeeds, 
but the first request from Y must renegotiate the connection & TLS, which 
definitely results in a performance hit on our side.

2. Sometimes a canceled request results in a 403 because the request is 
malformed (PR-- log line); this makes sense in the usual case, but in this 
specific case, I'm not so sure. This would also result in a broken connection 
on the client side, I'm thinking. Would there be a way to handle RST_STREAM 
differently without compromising the security reasons for rejecting malformed / 
broken requests?

Side note: this seems to be a behavior difference between Firefox and Chrome. 
In Chrome I always get PR-- lines. In Firefox I almost always get C(C|D)-- 
lines.

Would it be possible to remove the tie between stream failure and connection 
failures on the client side? That is, would it be possible to allow individual 
stream failures, such is the case with client-side request cancellation, while 
maintaining the integrity of the whole h2 client connection? I know this is 
delving pretty deep into the nuances of h2 vs http/1.1 and resulting issues and 
may not even be a server-side solution, but I figured it's worth asking while 
the code is already in your head. :-)

Thanks again for this fix! I will roll it out to our servers today after a bit 
more testing, since the incidence of bad results is negligible and this 
definitely should improve our e2e timings.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Tuesday, January 29, 2019 7:00 PM, Willy Tarreau  wrote:

> Hi Luke,
> 

> On Tue, Jan 29, 2019 at 10:52:15AM +, Luke Seelenbinder wrote:
> 

> > Hi Willy,
> > 

> > > By the way, how do you manage to cancel a single stream in the browser ?
> > > Pressing Esc might break all of them I guess ? Thus I'm uncertain how to
> > > achieve this.
> > 

> > So we're in a very specific use-case of delivering map tiles, which are
> > predominately requested via Leaflet.js. Leaflet.js cancels requests when a
> > particular map tile is no longer needed (e.g., zoomed/panned past), so
> > individual request cancellation is a very normal behavior in our setup.
> 

> OK so this definitely makes sense.
> 

> > Here's a demo link that you could find helpful to capture stream 
> > information:
> > https://stadiamaps.com/demo-raster.html. To replicate the issue, I zoom in
> > twice (hit the plus button twice), which creates a set of tile requests for
> > zoom X and then zoom Y. All requests for X are canceled when the Y set is
> > sent. In h2 backend mode, this means most of Y fails, because the X set 
> > broke
> > all the currently open connections/streams that Y tries to reuse. In h1.1
> > backend mode, it means a small subset of Y may fail, because HAProxy
> > improperly closes the client connection during X.
> 

> Thank you for this. I finally managed to track it down to something very
> stupid : we are supposed to cause a connection error when seeing an attempt
> to create a new stream with an already existing ID, which translates to
> seeing a HEADERS frame on a closed stream in practice. This was true on
> the frontend side, but not on the backend, because when a client aborts a
> stream, it immediately turns to the closed state (always the same problem)
> and if the server happens to respond on this stream, it will start with a
> HEADERS frame with an ID of a closed stream, hence we kill the connection.
> 

> So I've added a test for the side to this check, because obviously HEADERS
> frames only create streams from clients to servers, not the other way around.
> 

> I've merged the attached patch that fixes the problem for me, please try
> it, it should apply cleanly on top of 1.9.3.
> 

> Thanks,
> Willy





Re: Reloads do not terminate old processes (1.9.x)

2019-01-29 Thread Luke Seelenbinder
Hi William,

> The timeout client applies on inactivity, are you sure those connections are
> inactives? Try to do a "show sess" on the old process so you can see the
> remaining sessions.

I suspected the same. This confirms what I observed.

> Are you using the seamless reload feature or just the master worker?
> Looks like there is some confusion about what is the seamless reload in
> HAProxy, It's not the kill -USR2 in master-worker mode. It's the hability to
> transfer the bind socket to a new process over the unix socket using the -x
> parameter. This can be configurated with "expose-fd listeners" on the stats
> socket configuration. But this won't change the behavior you are observing.

I'm using both, so I was conflating the two in my email. Thanks for clarifying 
this!

> Once the -USR2 signal is received by the master it will reexec itself with 
> -sf,
> which will send -USR1 signals to the workers. So yes, it applies on the reload
> in the master worker mode.

Ah—I missed that string of actions in the docs. Makes sense.

> The reload of HAProxy in master worker mode or in daemon mode implies that you
> will still have old instances of HAProxy running until all their jobs are 
> done.
> If you don't want this behavior, you should do a restart of the service and
> not a reload.

Again, makes sense. Thanks for the clarification.

> This option isn't useful in master worker mode, you just have to specify
> "expose-fd listeners" in the configuration, and the master will add this 
> option
> for you.

Cool!

So I think the answer to this all is to add hard-stop-after, and it will fix 
the issue. H2 likely has longer-lived sessions on the client-side, so I only 
noticed the hole in my configuration after enabling h2.
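
Concretely, the combination discussed above looks roughly like this (a sketch; the 5m value is illustrative, not our exact production setting):

```haproxy
# haproxy.cfg (sketch)
global
    # force old workers to exit at most 5 minutes after a reload,
    # even if clients still hold long-lived H2 connections open
    hard-stop-after 5m

    # lets the master pass listening FDs to the new workers on reload
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
```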

Thanks for the help!

Best,
Luke 


—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Tuesday, January 29, 2019 12:23 PM, William Lallemand 
 wrote:

> Hi Luke,
> 

> On Tue, Jan 29, 2019 at 10:31:00AM +, Luke Seelenbinder wrote:
> 

> > Hi all,
> > I'm observing some odd behavior with seamless reloads and processes hanging
> > around. It appears when a reload is triggered with any active client
> > connections, the new process comes up with the proper -sf , but 
> > the
> > old process(es) are only terminated after the last client disconnects
> > (definitely longer than timeout client) and live for a very long time in 
> > our
> > experience or until a restart of the master process is triggered. I'm using
> > h2 on the front-end with HTX.
> 

> The normal behavior of HAProxy during a soft stop (-sf) is to wait for all
> jobs to stop before leaving. This behavior is the same with a reload in
> master worker mode.
> 

> The timeout client applies on inactivity, are you sure those connections are
> inactives? Try to do a "show sess" on the old process so you can see the
> remaining sessions.
> 

> Are you using the seamless reload feature or just the master worker?
> Looks like there is some confusion about what is the seamless reload in
> HAProxy. It's not the kill -USR2 in master-worker mode. It's the ability to
> transfer the bind socket to a new process over the unix socket using the -x
> parameter. This can be configured with "expose-fd listeners" on the stats
> socket configuration. But this won't change the behavior you are observing.
> 

> > Would hard-stop-after apply in this case since the reload signal is USR2, 
> > but
> > hard-stop-after is documented to apply to USR1?
> 

> Once the -USR2 signal is received by the master it will reexec itself with 
> -sf,
> which will send -USR1 signals to the workers. So yes, it applies on the reload
> in the master worker mode.
> 

> > This is somewhat disconcerting, since it results in an effective memory leak
> > on every reload, which can happen pretty frequently in our setup. If
> > hard-stop-after applies here, then it's not a bug, but perhaps the
> > documentation should clarify its meaning.
> 

> The reload of HAProxy in master worker mode or in daemon mode implies that you
> will still have old instances of HAProxy running until all their jobs are 
> done.
> If you don't want this behavior, you should do a restart of the service and
> not a reload.
> 

> > I first observed this with 1.9.2, it continues in 1.9.3, but the behavior 
> > may
> > exist before that.
> 

> > I've included my systemd configuration (from systemctl cat haproxy), in case
> > that's the culprit.
> 

> > Add extra flags here, see haproxy(1) for a few o

Re: H2 Server Connection Resets (1.9.2 & 1.9.3)

2019-01-29 Thread Luke Seelenbinder
Hi Willy,

> By the way, how do you manage to cancel a single stream in the browser ?
> Pressing Esc might break all of them I guess ? Thus I'm uncertain how to
> achieve this.

So we're in a very specific use-case of delivering map tiles, which are 
predominantly requested via Leaflet.js. Leaflet.js cancels requests when a 
particular map tile is no longer needed (e.g., zoomed/panned past), so 
individual request cancellation is a very normal behavior in our setup.

Here's a demo link that you could find helpful to capture stream information: 
https://stadiamaps.com/demo-raster.html. To replicate the issue, I zoom in 
twice (hit the plus button twice), which creates a set of tile requests for 
zoom X and then zoom Y. All requests for X are canceled when the Y set is sent. 
In h2 backend mode, this means most of Y fails, because the X set broke all the 
currently open connections/streams that Y tries to reuse. In h1.1 backend mode, 
it means a small subset of Y may fail, because HAProxy improperly closes the 
client connection during X.

Speaking of this, that may explain why there was a string of CD-- and then SD--. 
The CD-- set indicates the HAProxy issue on the client side (handling of 
canceled requests); the SD-- set indicates the CD-- issue made it back to the 
server connections, which are then reused, causing further issues that are 
related but somewhat tangential.

> It's a bit difficult for me to suggest anything unfortunately. With a
> multiplexed protocol, you have a myriad of possible combinations and
> the only thing you can do is try to imagine how this or that event could
> have a potential impact when mixed with this or that one either on the
> same side or on the other side of the mux. My bible here is to always
> have RFC7540 opened on my desk to compare the behaviour and sometimes
> figure how far small issues can spread :-/

Well then—godspeed! Let me know if sending a set of HAR / TCPDump / etc. would 
help.

> I think I found a solution for this, I open two tabs in the browser both 
> pointing to the same server. It reuses the connection in this case, so that I 
> can test one
> stream's interaction with the other one by pressing Escape while waiting for 
> the response. I managed to get one SD on the front side, I need to dig 
> further now.

Ah! That's clever.

Best,
Luke


—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Tuesday, January 29, 2019 11:39 AM, Willy Tarreau  wrote:

> Hi Luke,
> 

> On Tue, Jan 29, 2019 at 10:06:03AM +, Luke Seelenbinder wrote:
> 

> > I just pulled, compiled, and tested the newly minted 1.9.3, and I'm
> > experiencing the same issue with alpn h2 on the backend definition.
> 

> Ah sh*t :-(
> 

> > I also
> > strongly suspect it's not related to maximum streams per connection, because
> > the issue happens well before 1000 requests (and consistently at that).
> 

> OK, that's useful info.
> 

> > > Perhaps the client causing the issues was a red herring for
> > > the server-side bugs.
> > 

> > I believe after the fixes in 1.9.3, this has actually been proven false. I
> > can replicate this bug every single time with the following:
> > 

> > 1.  Make set of requests
> > 2.  Cancel all or subset of requests
> > 3.  Make another set of requests
> 

> I failed to produce this case, but I'm now figuring that I tried using
> curl/h2load/nghttp and that all of these break the connection and the
> stream at the same time. The tests I've ran in a browser were involving
> a single stream. I'll try to make a dummy HTML page to force the browser
> to emit multiple streams over the same connection and see if I can produce
> it like this.
> 

> By the way, how do you manage to cancel a single stream in the browser ?
> Pressing Esc might break all of them I guess ? Thus I'm uncertain how to
> achieve this.
> 

> > On step 3, every single request fails because something is getting messed up
> > by 2, causing the server stream to go away. The log lines are the same
> > pattern of C(C|D)-- / SD--.
> > Another piece of information, when this happens, Chrome drops this in the
> > console, which always correlates to a SD-- line in the haproxy logs:
> > Failed to load resource: net::ERR_SPDY_PROTOCOL_ERROR
> 

> OK.
> 

> > I also just verified this happens under similar circumstances using alpn
> > http/1.1 on the backend (this may or may not be new in 1.9.3). 4 requests
> > failed on the client side with the following error messages after using the
> > same 3 step process (all correlate to a CD-- message in the logs):
> > net::ERR_SPDY_PROTOCOL_ERROR
> > net::ERR_CONNECTION_CLOSED 200
> > net::ERR_CONNECTION_CLOSED 200
>

Reloads do not terminate old processes (1.9.x)

2019-01-29 Thread Luke Seelenbinder
Hi all,

I'm observing some odd behavior with seamless reloads and processes hanging 
around. It appears when a reload is triggered with any active client 
connections, the new process comes up with the proper -sf , but the 
old process(es) are only terminated after the last client disconnects 
(definitely longer than timeout client) and live for a very long time in our 
experience, or until a restart of the master process is triggered. I'm using h2 
on the front-end with HTX.

Would hard-stop-after apply in this case since the reload signal is USR2, but 
hard-stop-after is documented to apply to USR1?

This is somewhat disconcerting, since it results in an effective memory leak on 
every reload, which can happen pretty frequently in our setup. If 
hard-stop-after applies here, then it's not a bug, but perhaps the 
documentation should clarify its meaning.

I first observed this with 1.9.2, it continues in 1.9.3, but the behavior may 
exist before that.

I've included my systemd configuration (from systemctl cat haproxy), in case 
that's the culprit.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

# /lib/systemd/system/haproxy.service
[Unit]
Description=HAProxy Load Balancer
Documentation=man:haproxy(1)
Documentation=file:/usr/share/doc/haproxy/configuration.txt.gz
After=network.target rsyslog.service

[Service]
EnvironmentFile=-/etc/default/haproxy
Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"
ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS
ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE $EXTRAOPTS
ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS
ExecReload=/bin/kill -USR2 $MAINPID
KillMode=mixed
Restart=always
SuccessExitStatus=143
Type=notify

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/haproxy.service.d/customexec.conf
[Service]
ExecStartPre=
ExecStartPre=-/bin/rm /run/haproxy/global
ExecStartPre=/usr/local/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS
ExecStart=
ExecStart=/usr/local/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE $EXTRAOPTS
ExecReload=
ExecReload=/bin/bash -c '/usr/bin/socat /run/haproxy/admin.sock - <<< "show 
servers state" > /run/haproxy/global'
ExecReload=/usr/local/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS
ExecReload=/bin/kill -USR2 $MAINPID
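
(For completeness: the "show servers state" dump written by the ExecReload line above is only picked up if haproxy.cfg points at it. A sketch, assuming the same path:)

```haproxy
# haproxy.cfg (sketch): load the state dumped to /run/haproxy/global above
global
    server-state-file /run/haproxy/global

defaults
    load-server-state-from-file global
```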

Output of cat /etc/default/haproxy:

# Defaults file for HAProxy
#
# This is sourced by both, the initscript and the systemd unit file, so do not
# treat it as a shell script fragment.

# Change the config file location if needed
#CONFIG="/etc/haproxy/haproxy.cfg"

# Add extra flags here, see haproxy(1) for a few options
EXTRAOPTS="-x /run/haproxy/admin.sock"

haproxy -vv:

HA-Proxy version 1.9.3 2019/01/29 - https://haproxy.org/
Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -O2 -g -D_FORTIFY_SOURCE=2 -fstack-protector-strong -Wdate-time 
-Werror=format-security -fno-strict-aliasing -Wdeclaration-after-statement 
-fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter 
-Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered 
-Wno-missing-field-initializers -Wtype-limits -Wshift-negative-value 
-Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 
USE_SYSTEMD=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_NS=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.1.0j  20 Nov 2018
Running on OpenSSL version : OpenSSL 1.1.0j  20 Nov 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.3
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT 
IP_FREEBIND
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), 
raw-deflate("deflate"), gzip("gzip")
Built with PCRE2 version : 10.22 2016-07-29
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with multi-threading support.

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE
              h2 : mode=HTTP       side=FE
       <default> : mode=HTX        side=FE|BE
       <default> : mode=TCP|HTTP   side=FE|BE

Available filters :
[SPOE] spoe
[COMP] compression
[CACHE] cache
[TRACE] trace



Re: H2 Server Connection Resets (1.9.2 & 1.9.3)

2019-01-29 Thread Luke Seelenbinder
Hi Willy,

> As you like. My first rule is never to make people take risks they're not
> willing to take. It's perfectly OK to me if you don't feel confident with
> 2.0-dev in prod. I'm going to perform the 1.9 backports. If you're
> interested in testing them from the branch before I release it today, just
> let me know.

I just pulled, compiled, and tested the newly minted 1.9.3, and I'm 
experiencing the same issue with alpn h2 on the backend definition. I also 
strongly suspect it's not related to maximum streams per connection, because 
the issue happens well before 1000 requests (and consistently at that).

> Perhaps the client causing the issues was a red herring for
> the server-side bugs.

I believe after the fixes in 1.9.3, this has actually been proven false. I can 
replicate this bug every single time with the following:

1. Make set of requests
2. Cancel all or subset of requests
3. Make another set of requests

On step 3, every single request fails because something is getting messed up by 
2, causing the server stream to go away. The log lines are the same pattern of 
C(C|D)-- / SD--.

Another piece of information, when this happens, Chrome drops this in the 
console, which always correlates to a SD-- line in the haproxy logs:

Failed to load resource: net::ERR_SPDY_PROTOCOL_ERROR

I also just verified this happens under similar circumstances using alpn 
http/1.1 on the backend (this may or may not be new in 1.9.3). 4 requests 
failed on the client side with the following error messages after using the 
same 3 step process (all correlate to a CD-- message in the logs):

net::ERR_SPDY_PROTOCOL_ERROR
net::ERR_CONNECTION_CLOSED 200
net::ERR_CONNECTION_CLOSED 200
net::ERR_CONNECTION_CLOSED 200

I wonder if HAProxy is interpreting a broken request as a client error and 
going away (but not sending GOAWAY)? I don't know enough about h2 to know if 
this is in the spec or not, but perhaps that's another avenue of investigation?

I'm more than happy to help, and while my C is a bit rusty, I'm starting to get 
a feel for the HAProxy source, so I could attempt to debug as well, if you have 
any suggestions in that vein.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Friday, January 25, 2019 9:48 AM, Willy Tarreau  wrote:

> Hi Luke,
> 

> On Fri, Jan 25, 2019 at 08:08:22AM +, Luke Seelenbinder wrote:
> 

> > Hi Willy,
> > 

> > > OK so instead of sending you a boring series, I can propose you to run
> > > a test on 2.0-dev, which contains all the fixes I had to go through
> > > because of tiny issues everywhere related to this. If you're using git,
> > > just clone the master and checkout commit f7a259d46f8.
> > > you can simply wait for the next nightly snapshot.
> > 

> > Sounds good. My compilation playbook uses tarballs, so I'll just use the 
> > last
> > nightly. I assume I should wait for these fixes to be backported (1.9.3?)
> > before trying anything in production?
> 

> As you like. My first rule is never to make people take risks they're not
> willing to take. It's perfectly OK to me if you don't feel confident with
> 2.0-dev in prod. I'm going to perform the 1.9 backports. If you're
> interested in testing them from the branch before I release it today, just
> let me know.
> 

> > > But now you have a new server parameter called
> > > "max-reuse". This allows to limit the number of times a server connection
> > > is reused. For example you can set it to 990 when you know that the
> > > server limits to 1000.
> > 

> > That's great! I didn't expect to get a new configuration option. I'll
> > definitely make sure these are in sync across our infrastructure.
> 

> Even without the option it will work better than before, but the option
> is there to completely void any risk of hitting the limit too late.
> 

> > > Regarding the fact that in your case the client's close seems to cause
> > > the server-side issue, I couldn't yet reproduce it though I have a few
> > > theories about it. One of them would be an unexpected response from
> > > the server causing the connection to turn to an error state. The other
> > > one would be that we'd incorrectly abort our stream and/or session and
> > > bring the connection down with us. I'll submit these theories to Olivier
> > > once he's back so that he can tell me I'm saying crap regarding some of
> > > them and we can focus on what remains :-)
> > 

> > Sounds good. I'll report back my results from the latest snapshot and we can
> > go from there. Perhaps the client causing the issues was a red herring for
> > the server-side bugs.
> 

> I hadn't thought about it but it could

Re: H2 Server Connection Resets (1.9.2)

2019-01-25 Thread Luke Seelenbinder
Hi Willy,

> OK so instead of sending you a boring series, I can propose you to run
> a test on 2.0-dev, which contains all the fixes I had to go through
> because of tiny issues everywhere related to this. If you're using git,
> just clone the master and checkout commit f7a259d46f8.
> you can simply wait for the next nightly snapshot.

Sounds good. My compilation playbook uses tarballs, so I'll just use the last 
nightly. I assume I should wait for these fixes to be backported (1.9.3?) 
before trying anything in production?

> But now you have a new server parameter called
> "max-reuse". This allows to limit the number of times a server connection
> is reused. For example you can set it to 990 when you know that the
> server limits to 1000.

That's great! I didn't expect to get a new configuration option. I'll 
definitely make sure these are in sync across our infrastructure.
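
As a sketch, assuming a backend server that caps each H2 connection at 1000 requests (nginx's default), the server line would look something like this (backend name and address are illustrative):

```haproxy
backend tiles
    # stop reusing a backend connection just before the server's
    # per-connection request limit can trigger a late GOAWAY
    server ngx1 192.0.2.10:443 ssl verify none alpn h2 max-reuse 990
```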

> Regarding the fact that in your case the client's close seems to cause
> the server-side issue, I couldn't yet reproduce it though I have a few
> theories about it. One of them would be an unexpected response from
> the server causing the connection to turn to an error state. The other
> one would be that we'd incorrectly abort our stream and/or session and
> bring the connection down with us. I'll submit these theories to Olivier
> once he's back so that he can tell me I'm saying crap regarding some of
> them and we can focus on what remains :-)

Sounds good. I'll report back my results from the latest snapshot and we can go 
from there. Perhaps the client causing the issues was a red herring for the 
server-side bugs.

Thanks again for deep-diving and resolving this! I won't ask how many hours it 
took to find all these small edge cases. . .

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Thursday, January 24, 2019 7:55 PM, Willy Tarreau  wrote:

> Hi Luke,
> 

> On Wed, Jan 23, 2019 at 05:16:04PM +, Luke Seelenbinder wrote:
> 

> > Hi Willy,
> > This is all very good to hear. I'm glad you were able to get to the bottom 
> > of
> > it all!
> > Feel free to send along patches if you want me to test before the 1.9.3
> > release. I'm more than happy to do so.
> 

> OK so instead of sending you a boring series, I can propose you to run
> a test on 2.0-dev, which contains all the fixes I had to go through
> because of tiny issues everywhere related to this. If you're using git,
> just clone the master and checkout commit f7a259d46f8.
> you can simply wait for the next nightly snapshot.
> 

> Just let me know if that's OK for you.
> 

> I found a number of issues that were causing server aborts, mainly
> due to the late GOAWAY frame. Once we hit this one, the connection
> is quickly closed by the server, causing our output packets to be
> rejected and the connection to be in error. I have not yet investigated
> in details to see if the close happens after we got the last data or in
> the middle though. But now you have a new server parameter called
> "max-reuse". This allows to limit the number of times a server connection
> is reused. For example you can set it to 990 when you know that the
> server limits to 1000.
> 

> On the tests I've run here, I managed to address all the problems
> related to excessive use of idle connections resulting in too many
> streams being sent. In addition most of the rare cases that still
> happen when you don't have max-reuse are properly handled as a retry.
> 

> Regarding the fact that in your case the client's close seems to cause
> the server-side issue, I couldn't yet reproduce it though I have a few
> theories about it. One of them would be an unexpected response from
> the server causing the connection to turn to an error state. The other
> one would be that we'd incorrectly abort our stream and/or session and
> bring the connection down with us. I'll submit these theories to Olivier
> once he's back so that he can tell me I'm saying crap regarding some of
> them and we can focus on what remains :-)
> 

> Regards,
> Willy





Re: H2 Server Connection Resets (1.9.2)

2019-01-23 Thread Luke Seelenbinder
Hi Willy,

This is all very good to hear. I'm glad you were able to get to the bottom of 
it all!

Feel free to send along patches if you want me to test before the 1.9.3 
release. I'm more than happy to do so.

Best,
Luke


—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Wednesday, January 23, 2019 6:02 PM, Willy Tarreau  wrote:

> Hi Luke,
> 

> On Wed, Jan 23, 2019 at 10:47:33AM +0000, Luke Seelenbinder wrote:
> 

> > We were using http-reuse always and experiencing this
> > issue (as well as getting 80+% connection reuse). When I scaled it back to
> > http-reuse safe, the frequency of this issue seemed to be much lower.
> > (Perhaps because the bulk of my testing was with one client and somewhat
> > unscientific?)
> 

> It could be caused by various things. In my tests the client doesn't even
> use keep-alive so haproxy is less aggressive with connection reuse and
> that could explain some differences.
> 

> > > Thus it
> > > definitely is a matter of bad interaction between two streams, or one
> > > stream affecting the connection and hurting the other stream.
> > 

> > My debugging spidery-sense points to the same thing.
> 

> So I have more info now. There are multiple issues which stack up and
> cause this :
> 

> -   the GOAWAY frame indicating the last stream id might be in flight
> while many more streams have been added. This results in batch
> deaths once the limit is met ;
> 

> -   the last stream ID received in the GOAWAY frame was not considered
> when calculating the number of available streams, leading to more
> than acceptable by the server to be created ;
> 

> -   there is an issue with how new streams are attached to idle connections
> making them non-retryable in case of a failure such as above. I managed
> to fix this but it still requires some testing with other configs ;
> 

> -   another issue affects idle connections, some of them could remain
> in the idle list while they don't have room anymore because they
> are removed only when they deliver the last stream, thus the check
> doesn't support jumps in the number of available streams ; I suspect
> it could be related to the client aborts that cause server aborts,
> just because it allowed some excess streams to be sent to a mux which
> doesn't have room anymore, but I could be wrong ;
> 

> And a less important one : the maximum number of concurrent streams per
> connection is global. In this case it's 100 so it's lower than nginx's
> 128 thus it doesn't cause any issue. But we could run into problems with
> this and I must address this to make it per-connection.
> 

> With all these changes, I managed to run a long test with no more errors
> and only an immediate retry once in a while if nginx announced the GOAWAY
> too late. When we set the limit ourselves, there's not even any retry
> anymore. Thus I'll continue to work on this and we'll slightly delay 1.9.3
> to collect these fixes. From there we'll be able to see if you still have
> problems and iterate.
> 

> 

> > Let me know if you want me to share our config (it's quite complex) with you
> > privately or if there's anything else we can do to assist.
> 

> That's kind but now I don't need it anymore, I have everything needed to
> reproduce the whole issue it seems.
> 

> Thanks,
> Willy





Re: H2 Server Connection Resets (1.9.2)

2019-01-23 Thread Luke Seelenbinder
Hi Willy,

> When using "http-reuse always" the issue disappears and I
> can never get any issue at all. Now that I've fixed this, I'm seeing the
> issue with the SD flags.

Now that's interesting. We were using http-reuse always and experiencing this 
issue (as well as getting 80+% connection reuse). When I scaled it back to 
http-reuse safe, the frequency of this issue seemed to be much lower. (Perhaps 
because the bulk of my testing was with one client and somewhat unscientific?)

> Thus it
> definitely is a matter of bad interaction between two streams, or one
> stream affecting the connection and hurting the other stream.

My debugging spidery-sense points to the same thing. Let me know if you want me 
to share our config (it's quite complex) with you privately or if there's 
anything else we can do to assist.

> I now have something to dig into.

:-)

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Wednesday, January 23, 2019 11:39 AM, Willy Tarreau  wrote:

> On Wed, Jan 23, 2019 at 11:09:53AM +0100, Willy Tarreau wrote:
> 

> > On Wed, Jan 23, 2019 at 09:24:19AM +, Luke Seelenbinder wrote:
> > 

> > > > I've placed an nginx instance after my local haproxy dev config, and
> > > > found something which might explain what you're observing : the process
> > > > apparently leaks FDs and fails once in a while, causing 500 to be 
> > > > returned :
> > > 

> > > That's fascinating. I would have thought nginx would have had a bit better
> > > care given to things like that. . .
> > 

> > Well, it's possible I'm hitting a corner case. I don't want to blame nginx
> > for such situations, we all have our share of crap when it comes to error
> > handling :-)
> 

> Actually I have to stand corrected, the issue is with our idle connection
> management. For some reason we pile up new connections instead of reusing
> the previous ones and the nginx process fails to stand extra ones past a
> certain point. When using "http-reuse always" the issue disappears and I
> can never get any issue at all. Now that I've fixed this, I'm seeing the
> issue with the SD flags. I don't have this one in the specific case where
> I only have one client at a time, though there's still some reuse. Thus it
> definitely is a matter of bad interaction between two streams, or one
> stream affecting the connection and hurting the other stream.
> 

> I now have something to dig into.
> 

> Thanks,
> Willy





Re: H2 Server Connection Resets (1.9.2)

2019-01-23 Thread Luke Seelenbinder
Hi Willy,

Thanks for continuing to look into this. 


> 

> I've placed an nginx instance after my local haproxy dev config, and
> found something which might explain what you're observing : the process
> apparently leaks FDs and fails once in a while, causing 500 to be returned :

That's fascinating. I would have thought nginx would have had a bit better care 
given to things like that. . .

Oddly enough, I cannot find any log entries that approximate this. However, 
it's possible that, since we're primarily (99+%) using nginx as a reverse 
proxy, the fd issues wouldn't appear for us.

My next thought is to try tcpdump to try to determine what's on the wire when 
the CD-- and SD-- pairs appear, but since our stack is SSL e2e, that might 
prove difficult. Any suggestions?

One more interesting piece of data: if we use htx without h2 on the backends, 
we only see CD-- entries consistently (with a very, very few SD-- entries). 
Thus, it would seem whatever is causing the issue is directly related to h2 
backends. I further think we can safely say it is directly related to h2 
streams breaking (due to client-side request cancellations) resulting in the 
whole connection breaking in HAProxy or nginx (though determining which will be 
the trick).

There's also a strong possibility we replace nginx with HAProxy entirely for 
our SSL + H2 setup as we overhaul the backends, so this problem will probably 
be resolved by removing the problematic interaction.

I'm still working on running h2load against our nginx servers to see if that 
turns anything up.

> And at this point the connection is closed and reopened for new requests.
> There's never any GOAWAY sent.

If I'm understanding this correctly, that implies as long as nginx sends GOAWAY 
properly, HAProxy will not attempt to reuse the connection?

> I managed to work around the problem by limiting the number of total
> requests per connection. I find this extremely dirty but if it helps...
> I just need to figure how to best do it, so that we can use it as well
> for H2 as for H1.

We're pretty satisfied with our h2 fe <-> be h1.1 setup right now, so we will 
probably stick with that for now, since we don't want to have any more 
operational issues from bleeding-edge bugs. (Not a comment on HAProxy, per se, 
just a business reality. :-) ) I'm more than happy to try out anything you turn 
up on our staging setup!

Best,
Luke


—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Wednesday, January 23, 2019 8:28 AM, Willy Tarreau  wrote:

> Hi Luke,
> 

> I've placed an nginx instance after my local haproxy dev config, and
> found something which might explain what you're observing : the process
> apparently leaks FDs and fails once in a while, causing 500 to be returned :
> 

> 2019/01/23 08:22:13 [crit] 25508#0: *36705 open() 
> "/usr/local/nginx/html/index.html" failed (24: Too many open files), client: 
> 1>
> 2019/01/23 08:22:13 [crit] 25508#0: accept4() failed (24: Too many open files)
> 

> 127.0.0.1 - - [23/Jan/2019:08:22:13 +0100] "GET / HTTP/2.0" 500 579 "-" 
> "Mozilla/4.0 (compatible; MSIE 7.01; Windows)"
> 

> The ones are seen by haproxy :
> 

> 127.0.0.1:47098 [23/Jan/2019:08:22:13.589] decrypt trace/ngx 0/0/0/0/0 500 
> 701 - -  1/1/0/0/0 0/0 "GET / HTTP/1.1"
> 

> And at this point the connection is closed and reopened for new requests.
> There's never any GOAWAY sent.
> 

> I managed to work around the problem by limiting the number of total
> requests per connection. I find this extremely dirty but if it helps...
> I just need to figure how to best do it, so that we can use it as well
> for H2 as for H1.
> 

> Best regards,
> Willy



publickey - luke.seelenbinder@stadiamaps.com - 0xB23C1E8A.asc
Description: application/pgp-keys


signature.asc
Description: OpenPGP digital signature


Re: H2 Server Connection Resets (1.9.2)

2019-01-22 Thread Luke Seelenbinder
Hi Aleksandar,

Thanks for your tips.

> Do you have such a info in the nginx log?
> 

> "http2 flood detected"

I did not find this in any of the logs from when the buggy configuration was 
deployed.

> Can you try to set some timeout values for`timeout http-keep-alive`

I do have this set already:

timeout http-keep-alive 3m

> Mind you to create a issue for that if there isn't one already?

Can do!

> Isn't`unsigned int` not enought ?
> How many idle connections do you have for how long time?

We cycle through idle connections pretty quickly, so I can certainly bump the 
NGINX limit. My issue is that we have a very real possibility of reusing a 
connection many thousands of times. We pretty consistently serve hundreds of 
req/s on some of the instances of this deployment, which means it has a lot of 
opportunity to keep a backend connection around. Thus, if there is any sort of 
upper limit on our backend server, it feels like we may very well hit that 
limit.

> Can you try to increase the max-requests to 20 in nginx

I can certainly try this. I'm not certain if that will entirely eliminate the 
issue, given my last paragraph. (Which makes me somewhat reluctant to put this 
back into production with the strong possibility of affecting user requests.)

> Just for my curiosity, have you seen any changes for your solution with the 
> htx
> /H2 e2e?

I wouldn't say we've seen any particular benefits of htx/h2 e2e simply because 
we only ran it for a few hours in one region. Once we have some bugs ironed 
out, I'll be able to better answer your question. :-) I expect to see better 
overall response times, since fast requests won't be blocked by slow requests 
(in theory).

Running h2 fe and h1.1 be has definitely made our solution more performant!

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Monday, January 21, 2019 3:16 PM, Aleksandar Lazic  
wrote:

> Hi Luke.
> 

> On 21.01.2019 at 10:30, Luke Seelenbinder wrote:
> 

> > Hi all,
> > One more bug (or configuration hole) from our transition to 1.9.x using 
> > end-to-end h2 connections.
> > After enabling h2 backends (technically `server … alpn h2,http/1.1`), we 
> > began seeing a high number of backend /server/ connection resets. A 
> > reasonable number of client-side connection resets due to timeouts, etc., 
> > is normal, but the server connection resets were new.
> > I believe the root cause is that our backend servers are NGINX servers, 
> > which by default have a 1000 request limit per h2 connection 
> > (https://nginx.org/en/docs/http/ngx_http_v2_module.html#http2_max_requests).
> >  As far as I can tell there's no way to set this to unlimited. That 
> > resulted in NGINX resetting the HAProxy backend connections and thus 
> > resulted in user requests being dropped or returning 404s (oddly enough; 
> > though this may be as a result of the outstanding bug related to header 
> > manipulation and HTX mode).
> 

> Do you have such a info in the nginx log?
> 

> "http2 flood detected"
> 

> It's the message from this lines
> 

> https://trac.nginx.org/nginx/browser/nginx/src/http/v2/ngx_http_v2.c#L4517
> 

> > This wouldn't be a problem if one of the following were true:
> > 

> > -   HAProxy could limit the number of times it reused a connection
> 

> Can you try to set some timeout values for`timeout http-keep-alive`
> https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#timeout 
> http-keep-alive
> 

> I assume that this timeout could be helpful because of this block in the doc
> 

> https://cbonte.github.io/haproxy-dconv/1.9/configuration.html
> 

>   - KAL : keep alive ("option http-keep-alive") which is the default mode 
> : all
> requests and responses are processed, and connections remain open but 
> idle
> between responses and new requests.
> 

> 

> and this code part
> 

> https://github.com/haproxy/haproxy/blob/v1.9.0/src/backend.c#L1164
> 

> > -   HAProxy could retry a failed request due to backend server connection 
> > reset (possibly coming in 2.0 with L7 retries?)
> 

> Mind you to create a issue for that if there isn't one already?
> 

> > -   NGINX could set that limit to unlimited.
> 

> Isn't`unsigned int` not enought ?
> How many idle connections do you have for how long time?
> 

> > Our http-reuse is set to aggressive, but that doesn't make much difference, 
> > I don't think, since safe would result in the same behavior (the connection 
> > is reusable…but only for a limited number of requests).
> > We've worked around this by only using h/1.1 on the backends, which isn't a 
> > big problem for us, but I thought I would raise the issue, since I'm sure a 
> > lot of folks are using haproxy <-> nginx pairings, and this is a bit of a 
> > subtle result of that in full h2 mode.

Re: H2 Server Connection Resets (1.9.2)

2019-01-22 Thread Luke Seelenbinder
Hi Willy,

I just confirmed the other patchset works, so I will start going down this 
road. :-)

While testing the other issue, I discovered something fascinating. Our 
application is typically used by clients that cancel requests with reasonable 
frequency (5+%). When zooming in and out of maps quickly, it's pretty common 
for map tiles to be requested and then the request is canceled from the 
client-side, because they are no longer required (out of view, etc.).

There is a strong correlation between client connections canceling requests 
(resulting in an HTTP log string of CD--) and then a whole string of requests 
immediately afterward resulting in server connection resets (SD--).  
I've included some example log lines below. This behavior causes no issues when 
HTX & h2 mode are disabled for backends (still using h2 on the frontend). If 
the additional information triggers any ideas, let me know, otherwise I'm 
starting down the list recommended by you and Aleks.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

stadiamaps~ tile/tile1 0/0/202/-1/266 -1 0 - - CD-- 2/1/5/5/0 0/0 {} "GET 
/tiles/osm_bright/9/344/1...@2x.png HTTP/2.0"
stadiamaps~ tile/tile1 0/0/202/-1/266 -1 0 - - CD-- 2/1/4/4/0 0/0 {} "GET 
/tiles/osm_bright/9/348/1...@2x.png HTTP/2.0"
stadiamaps~ tile/tile1 0/0/202/-1/266 -1 0 - - CD-- 2/1/3/3/0 0/0 {} "GET 
/tiles/osm_bright/9/344/1...@2x.png HTTP/2.0"
stadiamaps~ tile/tile1 0/0/202/-1/266 -1 0 - - CD-- 2/1/2/2/0 0/0 {} "GET 
/tiles/osm_bright/9/348/1...@2x.png HTTP/2.0"
stadiamaps~ tile/tile1 0/0/202/-1/266 -1 0 - - CD-- 2/1/1/1/0 0/0 {} "GET 
/tiles/osm_bright/9/344/1...@2x.png HTTP/2.0"
stadiamaps~ tile/tile1 0/0/202/-1/266 -1 0 - - CD-- 2/1/0/0/0 0/0 {} "GET 
/tiles/osm_bright/9/348/1...@2x.png HTTP/2.0"
stadiamaps~ tile/tile1 0/0/0/-1/456 -1 0 - - SD-- 2/1/5/5/0 0/0 {} "GET 
/tiles/osm_bright/10/690/3...@2x.png HTTP/2.0"
stadiamaps~ tile/tile1 0/0/0/-1/456 -1 0 - - SD-- 2/1/4/4/0 0/0 {} "GET 
/tiles/osm_bright/10/695/3...@2x.png HTTP/2.0"
stadiamaps~ tile/tile1 0/0/0/-1/456 -1 0 - - SD-- 2/1/3/3/0 0/0 {} "GET 
/tiles/osm_bright/10/690/3...@2x.png HTTP/2.0"
stadiamaps~ tile/tile1 0/0/0/-1/454 -1 0 - - SD-- 2/1/2/2/0 0/0 {} "GET 
/tiles/osm_bright/10/695/3...@2x.png HTTP/2.0"
stadiamaps~ tile/tile1 0/0/0/-1/454 -1 0 - - SD-- 2/1/1/1/0 0/0 {} "GET 
/tiles/osm_bright/10/690/3...@2x.png HTTP/2.0"
stadiamaps~ tile/tile1 0/0/0/-1/454 -1 0 - - SD-- 2/1/0/0/0 0/0 {} "GET 
/tiles/osm_bright/10/695/3...@2x.png HTTP/2.0"

‐‐‐ Original Message ‐‐‐
On Tuesday, January 22, 2019 11:33 AM, Willy Tarreau  wrote:

> On Tue, Jan 22, 2019 at 09:42:53AM +, Luke Seelenbinder wrote:
> 

> > Hi Willy, Aleks,
> > I will try the things suggested this afternoon (hopefully) or tomorrow and 
> > get back to you.
> > 

> > > At least if nginx does this it should send a GOAWAY
> > > frame indicating that it will stop after stream #2001.
> > 

> > That's my understanding as well (and the docs say as much).
> 

> OK.
> 

> > I assumed HAProxy
> > would properly handle it, as well, so perhaps it's something else nefarious
> > going on in our particular setup.
> 

> Or we might have a bug there as well. I'll recheck the code just in case
> I spot anything.
> 

> > There is still the possibility that the bug
> > fixed by Aleks' patches regarding HTX & headers were causing this issue in a
> > back-handed sort of way. I will apply those patches, establish that the
> > headers bug is fixed, and then try the recommendations from this bug to rule
> > out any interactions on that side (a badly written header in our situation
> > could result in a 404, which seemed to be the worst user-facing case of this
> > bug).
> 

> Sure, let's address one problem at a time :-)
> 

> Willy





Re: HTX & tune.maxrewrite [1.9.2]

2019-01-22 Thread Luke Seelenbinder
Hi Christopher,

I can confirm the patches fixed the issue. Thanks again for fixing this up!

Best,
Luke


—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Monday, January 21, 2019 2:07 PM, Christopher Faulet  
wrote:

> On 18/01/2019 at 14:23, Luke Seelenbinder wrote:
> 

> > Quick clarification on the previous message.
> > The code emitting the warning is almost assuredly here: 
> > https://github.com/haproxy/haproxy/blob/ed7a066b454f09fee07a9ffe480407884496461b/src/proto_htx.c#L3242
> >  not in proto_http.c, seeing how this is in htx mode not http mode.
> > I've traced the issue to likely being caused by the following condition 
> > false:
> > https://github.com/haproxy/haproxy/blob/202c6ce1a27c92d21995ee82c71b2f70c636e3ea/src/htx.c#L93
> > We are dealing with a lot of larger responses (PNGs, 50-100KB/request on 
> > avg) with perhaps 10 simultaneous initial requests on the same h2 
> > connection being very common. That sounds like I may in fact need to tweak 
> > some buffer settings somewhere. In http/1.1 mode, these requests were 
> > spread out across four connections with browsers blocking until the 
> > previous connection finished.
> > The documentation is only somewhat helpful for tune.bufsize and 
> > tune.maxrewrite, http/2 and large requests. If this isn't a bug, would 
> > someone be willing to offer some guidance into good values for these buffer 
> > sizes?
> > Thanks for your help!
> > Best,
> > Luke
> 

> Hi Luke,
> 

> Could you try following patches please ?
> 

> Thanks,
> 

> --
> 

> Christopher Faulet





Re: H2 Server Connection Resets (1.9.2)

2019-01-22 Thread Luke Seelenbinder
Hi Willy, Aleks,

I will try the things suggested this afternoon (hopefully) or tomorrow and get 
back to you.

> At least if nginx does this it should send a GOAWAY
> frame indicating that it will stop after stream #2001.

That's my understanding as well (and the docs say as much). I assumed HAProxy 
would properly handle it, as well, so perhaps it's something else nefarious 
going on in our particular setup. There is still the possibility that the bug 
fixed by Aleks' patches regarding HTX & headers were causing this issue in a 
back-handed sort of way. I will apply those patches, establish that the headers 
bug is fixed, and then try the recommendations from this bug to rule out any 
interactions on that side (a badly written header in our situation could result 
in a 404, which seemed to be the worst user-facing case of this bug).

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Tuesday, January 22, 2019 9:37 AM, Willy Tarreau  wrote:

> Hi Luke,
> 

> On Mon, Jan 21, 2019 at 09:30:39AM +, Luke Seelenbinder wrote:
> 

> > After enabling h2 backends (technically `server ... alpn h2,http/1.1`), we
> > began seeing a high number of backend /server/ connection resets. A
> > reasonable number of client-side connection resets due to timeouts, etc., is
> > normal, but the server connection resets were new.
> > I believe the root cause is that our backend servers are NGINX servers, 
> > which
> > by default have a 1000 request limit per h2 connection
> > (https://nginx.org/en/docs/http/ngx_http_v2_module.html#http2_max_requests).
> > As far as I can tell there's no way to set this to unlimited. That resulted
> > in NGINX resetting the HAProxy backend connections and thus resulted in user
> > requests being dropped
> 

> That's rather strange. At least if nginx does this it should send a GOAWAY
> frame indicating that it will stop after stream #2001. We normally respect
> stream limits advertised by the server before deciding if a connection is
> still usable (but we could very well have a bug of course). If it only
> rejects new stream creation, that's extremely inefficient and unfriendly
> to clients, so I doubt it's doing something like this.
> 

> We'll need to run some interoperability tests on nginx so see what happens.
> It might indeed be that the only short-term solution would be to add an
> option to limit the total number of streams per connection. I don't see
> any value in doing something as gross, except working around some memory
> leak bugs, but we also need to be able to adapt to such servers.
> 

> Could you try h2load on your server to see if it reports errors ? Just
> use a single connection (-c 1) and a few streams (-m 10), and no more
> than 10k requests (-n 1). It could give us some hints about how it
> works and behaves.
> 

> Thanks,
> Willy
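
For reference, the h2load invocation Willy describes would look roughly like 
this (the target URL is hypothetical):

```shell
# single connection (-c 1), up to 10 concurrent streams (-m 10),
# 10k requests total; point it at the nginx backend under test
h2load -c 1 -m 10 -n 10000 https://backend.example.com/
```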





H2 Server Connection Resets (1.9.2)

2019-01-21 Thread Luke Seelenbinder
Hi all,

One more bug (or configuration hole) from our transition to 1.9.x using 
end-to-end h2 connections.

After enabling h2 backends (technically `server … alpn h2,http/1.1`), we began 
seeing a high number of backend /server/ connection resets. A reasonable number 
of client-side connection resets due to timeouts, etc., is normal, but the 
server connection resets were new.

I believe the root cause is that our backend servers are NGINX servers, which 
by default have a 1000 request limit per h2 connection 
(https://nginx.org/en/docs/http/ngx_http_v2_module.html#http2_max_requests). As 
far as I can tell there's no way to set this to unlimited. That resulted in 
NGINX resetting the HAProxy backend connections and thus resulted in user 
requests being dropped or returning 404s (oddly enough; though this may be as a 
result of the outstanding bug related to header manipulation and HTX mode).

This wouldn't be a problem if one of the following were true:

- HAProxy could limit the number of times it reused a connection
- HAProxy could retry a failed request due to backend server connection reset 
(possibly coming in 2.0 with L7 retries?)
- NGINX could set that limit to unlimited.
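
For reference, the nginx knob in question can be raised but not disabled; a 
sketch (server block details hypothetical):

```nginx
http {
    server {
        listen 443 ssl http2;
        # default is 1000; raising it lets HAProxy reuse each backend
        # connection longer before nginx resets it (no "unlimited" value)
        http2_max_requests 100000;
    }
}
```

(If memory serves, nginx 1.19.7 later dropped this directive in favor of 
keepalive_requests.)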

Our http-reuse is set to aggressive, but that doesn't make much difference, I 
don't think, since safe would result in the same behavior (the connection is 
reusable…but only for a limited number of requests).
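
(For later readers: HAProxy subsequently gained a `max-reuse` server parameter 
that caps exactly this; a sketch, with hypothetical names and addresses:)

```haproxy
backend be_tiles
    mode http
    http-reuse aggressive
    # retire each backend connection after 900 requests, safely under
    # nginx's default http2_max_requests limit of 1000
    server tile1 10.0.0.10:443 ssl verify none alpn h2,http/1.1 max-reuse 900
```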

We've worked around this by only using h/1.1 on the backends, which isn't a big 
problem for us, but I thought I would raise the issue, since I'm sure a lot of 
folks are using haproxy <-> nginx pairings, and this is a bit of a subtle 
result of that in full h2 mode.

Thanks again for such great software—I've found it pretty fantastic to run in 
production. :)

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com



Re: HTX & tune.maxrewrite [1.9.2]

2019-01-18 Thread Luke Seelenbinder
Quick clarification on the previous message.

The code emitting the warning is almost assuredly here: 
https://github.com/haproxy/haproxy/blob/ed7a066b454f09fee07a9ffe480407884496461b/src/proto_htx.c#L3242
 not in proto_http.c, seeing how this is in htx mode not http mode.

I've traced the issue to likely being caused by the following condition being false:

https://github.com/haproxy/haproxy/blob/202c6ce1a27c92d21995ee82c71b2f70c636e3ea/src/htx.c#L93

We are dealing with a lot of larger responses (PNGs, 50-100KB/request on avg) 
with perhaps 10 simultaneous initial requests on the same h2 connection being 
very common. That sounds like I may in fact need to tweak some buffer settings 
somewhere. In http/1.1 mode, these requests were spread out across four 
connections with browsers blocking until the previous connection finished.

The documentation is only somewhat helpful for tune.bufsize and 
tune.maxrewrite, http/2 and large requests. If this isn't a bug, would someone 
be willing to offer some guidance into good values for these buffer sizes?
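
For anyone tuning the same knobs, my understanding of how they relate (values 
illustrative, not a recommendation):

```haproxy
global
    # size of each request/response buffer (default 16384)
    tune.bufsize    16384
    # headroom reserved in each buffer for header rewrites; the payload
    # space available per buffer is tune.bufsize - tune.maxrewrite
    tune.maxrewrite 1024
```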

Thanks for your help!

Best,
Luke

‐‐‐ Original Message ‐‐‐
On Friday, January 18, 2019 1:10 PM, Luke Seelenbinder 
 wrote:

> Hello all,
> 

> I just rolled out 1.9.2 compiled from the official tarball to a subset of our 
> servers, and I'm observing some odd behavior in the logs. I'm seeing the 
> following warning (with accompanying warnings about failed hdr rewrites in 
> the stats page):
> 

> Proxy foo failed to add or set the response header 'bar' for request #1380. 
> You might need to increase tune.maxrewrite
> 

> I've tweaked tune.maxrewrite to 2048 with no apparent affect (it was not 
> previously set). The further odd thing is that the header is present on the 
> client side (seemingly every time). This is does not happen with an identical 
> config in 1.8.x (obviously sans h2 on both ends & option http-use-htx).
> 

> Any ideas regarding what I should investigate? The line emitting the warning 
> seems to be 
> https://github.com/haproxy/haproxy/blob/master/src/proto_http.c#L1630.
> 

> I'm happy to try patches additional configuration changes if that would 
> assist. I assume it's something slightly amiss in the HTX setup or my 
> configuration thereof.
> 

> Best,
> Luke
> 

> —
> Luke Seelenbinder
> Stadia Maps | Founder
> stadiamaps.com





HTX & tune.maxrewrite [1.9.2]

2019-01-18 Thread Luke Seelenbinder
Hello all,

I just rolled out 1.9.2 compiled from the official tarball to a subset of our 
servers, and I'm observing some odd behavior in the logs. I'm seeing the 
following warning (with accompanying warnings about failed hdr rewrites in the 
stats page):

Proxy foo failed to add or set the response header 'bar' for request #1380. You 
might need to increase tune.maxrewrite

I've tweaked tune.maxrewrite to 2048 with no apparent effect (it was not 
previously set). The further odd thing is that the header is present on the 
client side (seemingly every time). This does not happen with an identical 
config in 1.8.x (obviously sans h2 on both ends & option http-use-htx).

Any ideas regarding what I should investigate? The line emitting the warning 
seems to be 
https://github.com/haproxy/haproxy/blob/master/src/proto_http.c#L1630.

I'm happy to try patches or additional configuration changes if that would 
assist. I assume it's something slightly amiss in the HTX setup or my 
configuration thereof.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com



Re: Segfault in assign_tproxy_address with h2 and source address .[1.9.2]

2019-01-17 Thread Luke Seelenbinder
Hi Oliver,

Yes! I can confirm the patch does indeed work—thanks for the quick turnaround.

Best,
Luke

‐‐‐ Original Message ‐‐‐
On Thursday, January 17, 2019 4:01 PM, Olivier Houchard  
wrote:

> Hi Luke,
> 

> On Thu, Jan 17, 2019 at 02:35:38PM +0000, Luke Seelenbinder wrote:
> 

> > Hello all,
> > First, I wanted to say a huge thanks to the team for a producing a quality 
> > piece of software. My company just moved all of our traffic over, and the 
> > performance and nimbleness of haproxy is impressive. I'm testing 1.9.2 for 
> > migration as soon as it's stable for our use-case.
> > I'm experiencing a segfault when running in mode http, option http-use-htx, 
> > with h2 backends (alpn h2) and an assigned source address. A sanitized 
> > config is as follows:
> > defaults
> > source []
> > mode http
> > http-reuse always
> > option http-use-htx
> > listen test
> > bind :443 ssl crt  alpn h2,http/1.1
> > server backend ipv6@ check ssl crt  ca-file 
> >  verifyhost  alpn h2,http/1.1 check-alpn http/1.1
> > If I disable h2 on the backend, it works correctly. If I disable the source 
> > in defaults, it works correctly. I've attached the backtrace below.
> > Best,
> > Luke
> 

> I think I understand what's going on.
> Does the attached patch fix it for you ?
> 

> Thanks a lot !
> 

> Olivier





Segfault in assign_tproxy_address with h2 and source address .[1.9.2]

2019-01-17 Thread Luke Seelenbinder
Hello all,

First, I wanted to say a huge thanks to the team for producing a quality 
piece of software. My company just moved all of our traffic over, and the 
performance and nimbleness of haproxy is impressive. I'm testing 1.9.2 for 
migration as soon as it's stable for our use-case.

I'm experiencing a segfault when running in mode http, option http-use-htx, 
with h2 backends (alpn h2) and an assigned source address. A sanitized config 
is as follows:

defaults
   source []
   mode http
   http-reuse always
   option http-use-htx

listen test
   bind :443 ssl crt  alpn h2,http/1.1
   server backend ipv6@ check ssl crt  ca-file 
 verifyhost  alpn h2,http/1.1 check-alpn http/1.1

If I disable h2 on the backend, it works correctly. If I disable the source in 
defaults, it works correctly. I've attached the backtrace below.

Best,
Luke

#0  0x55672f2e in memset (__len=128, __ch=0, __dest=0x98) at 
/usr/include/x86_64-linux-gnu/bits/string3.h:90
#1  assign_tproxy_address (s=0x55dd6500) at src/backend.c:1047
#2  connect_server (s=s@entry=0x55dd6500) at src/backend.c:1379
#3  0x555e87dc in sess_update_stream_int (s=0x55dd6500) at 
src/stream.c:928
#4  process_stream (t=<optimized out>, context=0x55dd6500, state=<optimized out>) at src/stream.c:2305
#5  0x556b3f94 in process_runnable_tasks () at src/task.c:432
#6  0x5562d051 in run_poll_loop () at src/haproxy.c:2619
#7  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2684
#8  0x555869c2 in main (argc=<optimized out>, argv=<optimized out>) at 
src/haproxy.c:3313

—
Luke Seelenbinder
Stadia Maps | Founder
https://stadiamaps.com