Re: [ANNOUNCE] haproxy-2.9-dev10
Hi Tristan.

On 2023-11-20 (Mo.) 15:14, Tristan wrote:
> Hi Aleksandar,
>
>> On 20 Nov 2023, at 17:18, Aleksandar Lazic wrote:
>>
>> At a configuration change, the reload leaves the old processes alive until the "hard-stop-after" value, and after that the connection is terminated, which does not look like the connection was taken over by the new process. The use case was log shipping with HAProxy in mode tcp, as far as I understood the author correctly.
>
> Is that new behavior? Because I was under the impression that this is by design.

Well, I don't know, as I don't have the setup in use; I'm just the messenger and am asking whether somebody else has also seen such behavior in tcp mode.

> If the new process took over an existing L4 connection, it seems like it'd cause strange behavior in quite a few cases due to configuration changes.

Well, as there are the *_takeover functions for http and fcgi, maybe there is also such a function for tcp, but I may have overlooked it.

> Either haproxy tries to reuse all old values and essentially needs to fork the new process for takeover (which then is equivalent to the current old process living for a while), or it applies new values to the existing connection (assuming it's even possible in all cases), which is likely to just break it (removed frontend, backend, or server; or timeout changes, etc.).
>
> Seems like it's just a design choice to me [1] and that HAProxy's approach is sort of the only sane one… Of course that means potentially a lot of old processes, hence hard-stop-after and max-reloads as tunables.
>
> Now in a k8s environment I can imagine high churn in pods causing a lot of server changes and making this a problem, but the official ingress controller seems to generally mitigate it by using the runtime API when it can instead of hard reloads, and only using the latter in limited cases 🤔 Maybe they used the "community" ingress controller (bless their maintainer, it's not a jab at it), which does rely more on hard reloads.
>
> Either way, it sounds unlikely to be a fix for it?

I'm also not sure whether tcp mode has such a takeover mechanism, but it would be nice for hitless/seamless reloads.

> Tristan
>
> [1]: Also a bit off topic, but I have always found ~infinite-duration TCP connections to be a very strange idea… So many things can go wrong (and will go wrong) if you depend on them… at least it's never going to be as reliable as client-side retries or moving to UDP where possible…

Regards
Alex
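P.S.: For anyone following along, the two tunables you mention live in the global section. A minimal sketch of how they are typically set (example values only, and if I remember correctly the reload cap is actually spelled "mworker-max-reloads" in master-worker mode, so please double-check the docs):

    global
        master-worker
        # Old processes that still hold connections are killed this long
        # after a reload (example value, tune to how long you want to drain).
        hard-stop-after 30m
        # A worker that has survived this many reloads is also asked to stop
        # (keyword name as I recall it; verify against your version's doc).
        mworker-max-reloads 5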
Re: [ANNOUNCE] haproxy-2.9-dev10
Hi Aleksandar,

> On 20 Nov 2023, at 17:18, Aleksandar Lazic wrote:
>
> At a configuration change, the reload leaves the old processes alive until the "hard-stop-after" value, and after that the connection is terminated, which does not look like the connection was taken over by the new process. The use case was log shipping with HAProxy in mode tcp, as far as I understood the author correctly.

Is that new behavior? Because I was under the impression that this is by design.

If the new process took over an existing L4 connection, it seems like it'd cause strange behavior in quite a few cases due to configuration changes.

Either haproxy tries to reuse all old values and essentially needs to fork the new process for takeover (which then is equivalent to the current old process living for a while), or it applies new values to the existing connection (assuming it's even possible in all cases), which is likely to just break it (removed frontend, backend, or server; or timeout changes, etc.).

Seems like it's just a design choice to me [1] and that HAProxy's approach is sort of the only sane one… Of course that means potentially a lot of old processes, hence hard-stop-after and max-reloads as tunables.

Now in a k8s environment I can imagine high churn in pods causing a lot of server changes and making this a problem, but the official ingress controller seems to generally mitigate it by using the runtime API when it can instead of hard reloads, and only using the latter in limited cases 🤔 Maybe they used the "community" ingress controller (bless their maintainer, it's not a jab at it), which does rely more on hard reloads.

Either way, it sounds unlikely to be a fix for it?

Tristan

[1]: Also a bit off topic, but I have always found ~infinite-duration TCP connections to be a very strange idea… So many things can go wrong (and will go wrong) if you depend on them… at least it's never going to be as reliable as client-side retries or moving to UDP where possible…
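P.S.: By "runtime API" I mean the stats socket; roughly this kind of thing instead of a reload (socket path and the backend/server names are invented for the example):

    # Drain, re-address, then re-enable a server without reloading haproxy
    echo "set server be_logs/srv1 state maint" | socat stdio /var/run/haproxy.sock
    echo "set server be_logs/srv1 addr 10.0.0.42 port 5140" | socat stdio /var/run/haproxy.sock
    echo "set server be_logs/srv1 state ready" | socat stdio /var/run/haproxy.sock

    # And to check the result
    echo "show servers state be_logs" | socat stdio /var/run/haproxy.sock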
Re: [ANNOUNCE] haproxy-2.9-dev10
Hi Willy.

On 2023-11-18 (Sa.) 15:40, Willy Tarreau wrote:
> Hi,
>
> HAProxy 2.9-dev10 was released on 2023/11/18. It added 154 new commits after version 2.9-dev9.

Wow, what a release :-)

[snipp]

>    BUG/MEDIUM: mux-h2: fail earlier on malloc in takeover()
>    BUG/MEDIUM: mux-h1: fail earlier on malloc in takeover()
>    BUG/MEDIUM: mux-fcgi: fail earlier on malloc in takeover()

I have just seen these commits and asked myself whether they could have some positive effect on the hitless/seamless reload issue mentioned in this comment:

https://github.com/mholt/caddy-l4/issues/132#issuecomment-1672367076

> (I originally used HAProxy, but its promise of hitless reloads is a complete lie, whereas caddy-l4 actually does the right thing.)

I contacted the author of the comment to ask what the problem was, and the answer was that at a configuration change the reload leaves the old processes alive until the "hard-stop-after" value, and after that the connection is terminated, which does not look like the connection was taken over by the new process. The use case was log shipping with HAProxy in mode tcp, as far as I understood the author correctly. This behavior was seen with HAProxy 2.4 and 2.6.

Has anybody else faced the issue that a long-running connection, in mode tcp, was terminated by a reload of haproxy?

Regards
Alex
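P.S.: To make the use case concrete, I imagine it boils down to something like this (just a rough sketch with invented names, ports and addresses, not the author's actual config), where the shipped log stream is one long-lived TCP connection that only the old process still owns after a reload:

    defaults
        mode tcp
        timeout connect 5s
        # log shipping means long-lived, mostly idle connections
        timeout client  1h
        timeout server  1h

    frontend ft_logs
        bind :5140
        default_backend bk_logs

    backend bk_logs
        server log1 192.0.2.10:5140 check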
[ANNOUNCE] haproxy-2.9-dev10
Hi,

HAProxy 2.9-dev10 was released on 2023/11/18. It added 154 new commits after version 2.9-dev9.

It's not yet as calm as I would like it to be, but that's essentially due to a number of long-lasting bugs being worked on at the same time, and many of those fixed here were already in 2.8, so instead it's a sign of stabilization.

Rémi's updates on the cache were finally merged. The patches were clean and the review easy enough. Given that this series has been under stress test for a while now and that its main goal is to significantly reduce locking contention, there was no reason to further delay it. During the review we even noticed new opportunities for future improvements, such as changing the default block size, etc. But this can come later and even be backported if needed.

The automatic reconnection issue that I mentioned as affecting reverse-http was finally not caused by reverse-http but by an old bug which prevents connection errors from being reported immediately when ALPN is configured on a server. This was fixed, but it gave the opportunity to reverse-http to also honor the connect timeout. Speaking of reverse-http, there were issues in which some incoming connections would be considered in excess and rejected too early, sometimes causing 503s to appear. Now it seems to be working quite well.

On the QUIC front, we now have a "timeout client-hs" that permits setting a timeout for the client handshake phase without being bound by the client timeout. We indeed anticipate that in the future some users will need a large client timeout for certain applications and still want a small handshake timeout to avoid easy attacks on the early stages of the connection. The way the accept queue and handshakes are counted has been refined to better match reality: retry tokens will now only be sent when too many unconfirmed connections are seen, and the listener will stop responding once the accept queue is full, just like for TCP finally. This allows reusing existing tunables without creating new ones. Just like for TCP, the QUIC TLS handshakes are now performed in a task tagged "heavy", which allows maintaining low processing latency even when under attack. Finally, all congestion control algorithms now support an optional argument that sets their maximum window size. Just like for TCP with sysctls, this will be useful to balance memory usage versus performance.

Some changes were brought to the log backend. Initially a "log-balance" keyword had been added to select the LB algorithms without having to rework the "balance" parser to adapt to the mode, but we finally thought it was a mistake that we would risk having to maintain for a long time and justify with "this is historic". So now, in a backend in mode "log", the "balance" keyword sets the balance algorithm. Those that are not supported in "mode log" will trigger a warning, and those specific to this mode will trigger a warning when used in tcp/http. There are differences between modes for the same algorithm, which are related to how they're used, but these are documented. Right now we have "log-hash" for the hash, which differs from "hash" in that it takes a list of converters to apply to the log line. Possibly in the future we'll implement a dummy sample-fetch function returning the log line so that "hash" can be used naturally on it, I don't know. Similarly, there's a "log-sticky" which sticks to the same server as long as it's up.
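To give an idea, such a log backend could look roughly like this (I'm writing this from memory, so please refer to the updated documentation for the exact syntax of the log-hash arguments and of the backend@ log target; names and addresses are only examples):

    backend be_syslog
        mode log
        # hash the log line after applying the listed converters
        balance log-hash lower,crc32
        # or stick to the same server as long as it is up:
        # balance log-sticky
        server s1 udp@192.0.2.1:514
        server s2 udp@192.0.2.2:514

    # a regular proxy would then send its logs there with something like:
    #     log backend@be_syslog local0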
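And to come back to the QUIC changes above, both the handshake timeout and the congestion-control window cap are plain configuration keywords; something along these lines (again a rough sketch, and the parenthesized max-window argument in particular is written from memory, so double-check the doc):

    frontend fe_quic
        mode http
        # cap cubic's congestion window to bound per-connection memory
        bind quic4@:443 ssl crt /etc/haproxy/site.pem alpn h3 quic-cc-algo cubic(4m)
        timeout client     30s
        # only bounds the handshake phase, independently of "timeout client"
        timeout client-hs  5s
        default_backend be_app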
I'm just realizing while writing this that some users already do that kind of stickiness for database servers and might like it with regular servers too, so maybe this will later turn into "sticky" and be usable everywhere; again, I don't really know. Let's not try to anticipate too much; at least what I want is to make sure the config remains understandable, manageable and non-confusing.

In previous dev versions, a number of commits were added to exclude some backend keywords depending on the mode, but these were not reliable enough (it depended on whether "mode" was seen before or after). These commits were reverted and the check moved to a more reliable place.

After careful inspection of the behavior under zero-copy fast-forward transfers, we noticed the memory usage hadn't dropped as we'd anticipated, and finally found a few places with missing synchronous flushes. On large h2 transfers we've observed an increase of 60% in the forwarding bandwidth together with a reduction of 30% in the memory usage! That was worth it! The bandwidth increase is in fact caused by the lower memory usage: data don't have the time to rot in the L3 cache anymore, and that was the original goal. The gains will definitely vary depending on the CPUs and traffic patterns though. For now QUIC doesn't benefit from it.

We also found that the H2 mux was wasting a lot of memory on single-stream transfers (i.e. someone watching a video), because all of the 32 buffers a connection can have could be allocated for this single stream. This was changed so that on average t