Re: [ANNOUNCE] haproxy-2.9-dev10

2023-11-20 Thread Aleksandar Lazic

Hi Tristan.

On 2023-11-20 (Mo.) 15:14, Tristan wrote:

> Hi Aleksandar,
>
>> On 20 Nov 2023, at 17:18, Aleksandar Lazic  wrote:
>>
>> On a configuration change, the reload leaves the old processes alive until the
>> "hard-stop-after" value is reached, and after that the connection is terminated,
>> which does not look like the connection was taken over by the new process. The
>> use case was log shipping with HAProxy in mode tcp, as far as I have understood
>> the author correctly.


> Is that new behavior? Because I was under the impression that this is by design.


Well, I don't know, as I don't have that setup in use; I'm just the messenger and
am asking whether somebody else has also seen such behavior in tcp mode.


> If the new process took over an existing L4 connection, it seems like it’d cause
> strange behavior in quite a few cases due to configuration changes.


Well, as there are the *_takeover() functions for http and fcgi, maybe there is
also such a function for tcp, but I may have overlooked it.



> Either haproxy tries to reuse all old values and essentially needs to fork the new
> process for takeover (which then is equivalent to the current old process living
> for a while), or it applies new values to the existing connection (assuming it’s
> even possible in all cases) which is likely to just break it (removed frontend,
> backend, or server; or timeout changes, etc etc).
>
> Seems like it’s just a design choice to me [1] and that HAProxy’s approach is sort
> of the only sane one…
> Ofc that means potentially a lot of old processes, hence hard-stop-after and
> max-reloads as tunables.
>
> Now in a k8s environment I can imagine high churn in pods causing a lot of server
> changes and making this a problem, but the official ingress controllers seem to
> generally mitigate it by using the runtime api when they can instead of hard
> reloads, and only using the latter in limited cases 🤔
> Maybe they used the « community » ingress controller (bless their maintainer, it’s
> not a jab at it) which does rely more on hard reloads.
>
> Either way, sounds unlikely to be a fix for it?


I'm also not sure whether tcp mode has such a takeover mechanism, but it would be
nice for hitless/seamless reloads.



> Tristan
>
> [1]: Also a bit off topic, but I always found ~infinite duration TCP connections
> to be a very strange idea… So many things can go wrong (and will go wrong) if you
> depend on it… at least it’s never going to be as reliable as client side retries
> or moving to UDP where possible…


Regards
Alex



Re: [ANNOUNCE] haproxy-2.9-dev10

2023-11-20 Thread Tristan
Hi Aleksandar,

> On 20 Nov 2023, at 17:18, Aleksandar Lazic  wrote:
> 
> On a configuration change, the reload leaves the old processes alive until the
> "hard-stop-after" value is reached, and after that the connection is terminated,
> which does not look like the connection was taken over by the new process. The
> use case was log shipping with HAProxy in mode tcp, as far as I have understood
> the author correctly.

Is that new behavior? Because I was under the impression that this is by design.

If the new process took over an existing L4 connection, it seems like it’d 
cause strange behavior in quite a few cases due to configuration changes.

Either haproxy tries to reuse all old values and essentially needs to fork the 
new process for takeover (which then is equivalent to the current old process 
living for a while), or it applies new values to the existing connection 
(assuming it’s even possible in all cases) which is likely to just break it 
(removed frontend, backend, or server; or timeout changes, etc etc).

Seems like it’s just a design choice to me [1] and that HAProxy’s approach is 
sort of the only sane one…
Ofc that means potentially a lot of old processes, hence hard-stop-after and 
max-reloads as tunables.
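
For reference, a minimal sketch of those tunables in a global section; I'm
assuming here that "max-reloads" maps to the mworker-max-reloads directive, so
double-check the exact names and values against the docs for your version:

    global
        master-worker
        # give old workers at most 30s to finish their connections after a reload
        hard-stop-after 30s
        # kill a worker once it has survived more than 5 reloads
        # (assumed mapping of "max-reloads" above)
        mworker-max-reloads 5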

Now in a k8s environment I can imagine high churn in pods causing a lot of
server changes and making this a problem, but the official ingress controllers
seem to generally mitigate it by using the runtime api when they can instead of
hard reloads, and only using the latter in limited cases 🤔
Maybe they used the « community » ingress controller (bless their maintainer,
it’s not a jab at it) which does rely more on hard reloads.

Either way, sounds unlikely to be a fix for it?

Tristan 

[1]: Also a bit off topic, but I always found ~infinite duration TCP
connections to be a very strange idea… So many things can go wrong (and will go 
wrong) if you depend on it… at least it’s never going to be as reliable as 
client side retries or moving to UDP where possible…



Re: [ANNOUNCE] haproxy-2.9-dev10

2023-11-20 Thread Aleksandar Lazic

Hi Willy.

On 2023-11-18 (Sa.) 15:40, Willy Tarreau wrote:

> Hi,
>
> HAProxy 2.9-dev10 was released on 2023/11/18. It added 154 new commits
> after version 2.9-dev9.


Wow what a release :-)

[snip]


>    BUG/MEDIUM: mux-h2: fail earlier on malloc in takeover()
>    BUG/MEDIUM: mux-h1: fail earlier on malloc in takeover()
>    BUG/MEDIUM: mux-fcgi: fail earlier on malloc in takeover()


I have just seen these commits and asked myself: could they have a positive
effect on the hitless/seamless reload issue mentioned in this comment?


https://github.com/mholt/caddy-l4/issues/132#issuecomment-1672367076
> (I originally used HAProxy, but its promise of hitless reloads is a complete
> lie, whereas caddy-l4 actually does the right thing.)


I have contacted the author of the comment to ask what the problem was, and the
answer was that on a configuration change the reload leaves the old processes
alive until the "hard-stop-after" value is reached, and after that the connection
is terminated, which does not look like the connection was taken over by the new
process. The use case was log shipping with HAProxy in mode tcp, as far as I have
understood the author correctly.


This behavior was seen with HAProxy 2.4 and 2.6.

Has anybody else faced the issue that a long-running connection, in mode tcp, was
terminated by a reload of haproxy?
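
For context, my understanding of the usual "seamless reload" setup is roughly the
sketch below (paths are only examples). As far as I understand, only the listening
sockets are handed over to the new process; established TCP connections stay on
the old process until they end or "hard-stop-after" kicks in:

    global
        master-worker
        # expose the listening FDs on the stats socket so a reloading process
        # can pick them up (socket path is only an example)
        stats socket /run/haproxy/admin.sock mode 600 level admin expose-fd listeners
        hard-stop-after 30s

    # reload example: the new process fetches the listeners from the old one
    #   haproxy -W -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid \
    #           -x /run/haproxy/admin.sock -sf $(cat /run/haproxy.pid)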


Regards
Alex



[ANNOUNCE] haproxy-2.9-dev10

2023-11-18 Thread Willy Tarreau
Hi,

HAProxy 2.9-dev10 was released on 2023/11/18. It added 154 new commits
after version 2.9-dev9.

It's not yet as calm as I would like it to be but that's essentially due
to a number of long-lasting bugs being worked on at the same time, and
many of those fixed here were already in 2.8, so instead it's a sign of
stabilization.

Rémi's updates on the cache were finally merged. The patches were clean
and the review easy enough. Given that this series has been under stress
test for a while now and that its main goal is to significantly reduce
locking contention, there was no reason to further delay it. During the
review we even noticed new opportunities for future improvements, such
as changing the default block size etc. But this can come later and even
be backported if needed.

The automatic reconnection issue that I mentioned as affecting reverse-http
was finally not caused by reverse-http but by an old bug which prevents
connection errors from being reported immediately when ALPN is configured
on a server. This was fixed, but it gave the opportunity for reverse-http
to also honor the connect timeout. Speaking about reverse-http, there were
issues in which some incoming connections would be considered in excess and
rejected too early, sometimes causing 503s to appear. Now it seems to be
working quite well.
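
To illustrate the ALPN case, the affected configs are typically those with a
server line of this kind (addresses and options here are only an example):

    backend app
        # with ALPN configured on the server, connection errors were not
        # reported immediately (the old bug mentioned above)
        server s1 203.0.113.10:443 ssl verify none alpn h2,http/1.1 check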

On the QUIC front, we now have a "timeout client-hs" that permits setting
a timeout for the client handshake phase without being bound by the client
timeout. We indeed anticipate that in the future some users will need a
large client timeout for certain applications and still want a small
handshake timeout to avoid easy attacks on the early stages of the
connection. The way the accept queue and handshakes were counted has
been refined to better match reality: retry tokens will now only be
sent when too many unconfirmed connections were seen, and the listener
will stop responding once the accept queue is full, just like for TCP
finally. This allows reusing existing tunables without creating new
ones. Just like for TCP, the QUIC TLS handshakes are now performed in
a task tagged "heavy", which allows maintaining low processing latency
even when under attack. Finally, all congestion control algorithms now
support an optional argument that sets their maximum window size. Just
like for TCP with sysctls, this will be useful to balance memory usage
versus performance.
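
To give a rough idea, this could look like the following in a config; the exact
form of the window-size argument should be double-checked against the doc, so
take this as a sketch only:

    frontend quic_fe
        # handshake timeout decoupled from the (much larger) client timeout
        timeout client    300s
        timeout client-hs 5s
        # congestion control algorithm with an optional maximum window size
        bind quic4@:443 ssl crt /etc/haproxy/site.pem alpn h3 quic-cc-algo cubic(4m)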

Some changes were brought to the log backend. Initially a "log-balance"
keyword had been added to select the LB algorithms without having to
rework the "balance" parser to adapt to the mode, but we finally thought
it was a mistake that we would risk having to maintain for a long time
and justify by "this is historic". So now in a backend in mode "log",
the "balance" keyword sets the balance algorithm. Those that are not
supported in "mode log" will trigger a warning, and those specific to
this mode will trigger a warning when used in tcp/http. There are
differences between modes for the same algorithm, which are related to
how they're used, but these are documented. Right now we have "log-hash"
for the hash, which differs from "hash" in that it takes a list of
converters to apply to the log line. Possibly in the future we'll implement
a dummy sample-fetch function returning the log line so that "hash" can
be used naturally on it, I don't know. Similarly, there's a "log-sticky"
algorithm which sticks to the same server as long as it's up. I'm just
realizing while writing this that some users already do that for database
servers and might like this with regular servers, so maybe this will later
turn into "sticky" and be usable everywhere, again I don't really know. Let's
not try to anticipate too much; at least what I want is to make sure the
config remains understandable, manageable and non-confusing. In previous
dev versions there were a number of commits that were added to exclude
some backend keywords depending on the mode, but these were not reliable
enough (it depended on whether "mode" was seen before or after them). These
commits were reverted and the check moved to a more reliable place.
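
As an illustration only (addresses are made up, and the exact converter-list
syntax for "log-hash" should be taken from the doc), a log backend now looks
roughly like this:

    backend mylogs
        mode log
        balance roundrobin           # or "log-sticky", or "log-hash <converters>"
        server log1 udp@192.0.2.10:514
        server log2 udp@192.0.2.11:514

    # and somewhere in a defaults/frontend section:
    #     log backend@mylogs local0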

After careful inspection of the behavior under zero-copy fast-forward
transfers, we noticed the memory usage hadn't dropped as we'd anticipated,
and finally found a few places with missing synchronous flushes. On large
h2 transfers we've observed an increase of 60% of the forwarding bandwidth
together with a reduction of 30% of the memory usage! That was worth it!
The bandwidth increase is in fact caused by the lower memory usage: data
don't have the time to rot in the L3 cache anymore, and that was the
original goal. The gains will definitely vary depending on the CPUs and
traffic patterns though. For now QUIC doesn't benefit from it. We also
found that the H2 mux was wasting a lot of memory on single-stream
transfers (i.e. someone watching a video), because all of the 32 buffers
a connection can have could be allocated for this single stream. This
was changed so that on average t