Re: [RFC] Allow disabling abstract unix socket paths NUL-padding

2024-03-09 Thread Tristan




On 09/03/2024 18:09, Tristan wrote:

Hi Willy,

On 09/03/2024 16:51, Willy Tarreau wrote:

Hi Tristan,

On Sat, Mar 09, 2024 at 04:20:21PM +, Tristan wrote:

To be honest, I don't think this is unfixable. It's just a matter of how
much code change we think is acceptable for it.


I don't mind about the amount of changes. "we've always done it like 
this"

is never a valid excuse to keep code as it is, especially when it's wrong
and causes difficulties to add new features. Of course I prefer invasive
changes early in the dev cycle than late but we're still OK and I don't
expect that many changes for this anyway.


Agreed on not shying away from amount of changes.

I tried to follow that route, being generous in what I was willing to 
touch, but ran into the issue of protocols and socket families being a 
1:1 binding.


So adding a new socket family *requires* adding a new protocol too, 
which isn't too hard but spirals a bit out of control:
- protocol_by_family is a misnomer, and it is actually protocols *by 
socket domain* (which is still AF_UNIX, even with family AF_CUST_ABNS2, 
and ends up overriding standard socket/abns protocol)
- but you can't just change that, because then other things stop working 
as some places rely on socket domain != socket family


Actually, following up on this, I'm more and more convinced that it 
actually makes things somewhat worse to take this approach...


At least I don't see a good way to make it work without looking very 
bad, because in a few places we lookup a protocol based on a 
socket_storage reference, so we have truly only the socket domain to 
work with if no connection is established yet.


While it sounds like shying away from deeper changes, it seems to me 
that a bind/server option is truly the right way to go here.


Tristan



Re: [RFC] Allow disabling abstract unix socket paths NUL-padding

2024-03-09 Thread Tristan

Hi Willy,

On 09/03/2024 16:51, Willy Tarreau wrote:

Hi Tristan,

On Sat, Mar 09, 2024 at 04:20:21PM +, Tristan wrote:

To be honest, I don't think this is unfixable. It's just a matter of how
much code change we think is acceptable for it.


I don't mind about the amount of changes. "we've always done it like this"
is never a valid excuse to keep code as it is, especially when it's wrong
and causes difficulties to add new features. Of course I prefer invasive
changes early in the dev cycle than late but we're still OK and I don't
expect that many changes for this anyway.


Agreed on not shying away from amount of changes.

I tried to follow that route, being generous in what I was willing to 
touch, but ran into the issue of protocols and socket families being a 
1:1 binding.


So adding a new socket family *requires* adding a new protocol too, 
which isn't too hard but spirals a bit out of control:
- protocol_by_family is a misnomer, and it is actually protocols *by 
socket domain* (which is still AF_UNIX, even with family AF_CUST_ABNS2, 
and ends up overriding standard socket/abns protocol)
- but you can't just change that, because then other things stop working 
as some places rely on socket domain != socket family



If push comes to shove, one
can always add an extra field somewhere, or an extra arg in get_addr_len,
even if it'd be a little ugly.


Actually you're totally right and you make me realize that there's an
addrcmp field in the address families, and it's inconsistent to have
get_addr_len() being family-agnostic. So most likely we should first
change get_addr_len() to be per-family and appear in the proto_fam
struct. The few places where get_addr_len() is used should instead
rely on the protocol family for this. And due to this new address
having a variable length, we should support passing a pointer to the
address (like the current get_addr_len() does) in addition to the
protocol pointer.


I tried to do that yes, changing sock_addrlen to be a function pointer.
But it's actually less convenient, surprisingly, since then you end up 
with constructs like:


  listener->rx/tx->ss->proto->fam->get_addrlen(listener->rx/tx->ss)

(paraphrasing from memory; point is that it becomes somewhat unpleasant 
to look at)


Still might end up having to do that given how tricky things become 
otherwise...



Doing so would cleanly unlock all the problem. The new abns2 family
would just bring its own get_addr_len() variant that uses strlen().


Agreed on that OOP-ish approach being in principle a bit nicer though 
(sorry for the swear word ;-) )


Tristan



Re: [RFC] Allow disabling abstract unix socket paths NUL-padding

2024-03-09 Thread Willy Tarreau
Hi Tristan,

On Sat, Mar 09, 2024 at 04:20:21PM +, Tristan wrote:
> To be honest, I don't think this is unfixable. It's just a matter of how
> much code change we think is acceptable for it.

I don't mind about the amount of changes. "we've always done it like this"
is never a valid excuse to keep code as it is, especially when it's wrong
and causes difficulties to add new features. Of course I prefer invasive
changes early in the dev cycle than late but we're still OK and I don't
expect that many changes for this anyway.

> If push comes to shove, one
> can always add an extra field somewhere, or an extra arg in get_addr_len,
> even if it'd be a little ugly.

Actually you're totally right and you make me realize that there's an
addrcmp field in the address families, and it's inconsistent to have
get_addr_len() being family-agnostic. So most likely we should first
change get_addr_len() to be per-family and appear in the proto_fam
struct. The few places where get_addr_len() is used should instead
rely on the protocol family for this. And due to this new address
having a variable length, we should support passing a pointer to the
address (like the current get_addr_len() does) in addition to the
protocol pointer.

Doing so would cleanly unlock all the problem. The new abns2 family
would just bring its own get_addr_len() variant that uses strlen().

This just shows how important it is to discuss about design ;-)

Cheers,
Willy



Re: [RFC] Allow disabling abstract unix socket paths NUL-padding

2024-03-09 Thread Tristan

Hi Willy,

On 08/03/2024 18:01, Willy Tarreau wrote:

The problem with default values is that a single behavior cannot be deduced
from reading a single config statement. That's quite painful for lots of
people (including those who copy config blocks from stackoverflow), and
for API tools. And it works half of the time because internally both modes
work but they can't interoperate with other tools. Here we're really
indicating a new address format. There's nothing wrong with that and we
do have the tools to handle that.

I think that the plan should be:

   - add AF_CUST_ABNS2 to protocol-t.h before AF_CUST_MAX
   - add to str2sa_range() the case for "abns2" just after "abns" with
 address family "AF_CUST_ABNS2", and duplicate the test for
 ss.ss_family == AF_UNIX later for AF_CUST_ABNS2 removing all the
 stuff related to !abstract, or instead, just extend the "if" condition
 to the new family and add that case there as well (might be even easier).
   - duplicate "proto_fam_unix" to "proto_fam_abns2" in sock_unix.c, referencing
 "sock_abns2_addrcmp" for the address comparison
   - duplicate sock_unix_addrcmp() to sock_abns2_addrcmp() and implement only
 the abns case, basically:

if (a->ss_family != b->ss_family)
return -1;

if (a->ss_family != AF_CUST_ABNS2)
return -1;

if (au->sun_path[0])
return -1;

return memcmp(au->sun_path, bu->sun_path, sizeof(au->sun_path));
   - in sock_unix_bind_receiver(), add a case for this address family in
 the bind() size argument, replacing sizeof(addr) with the string
 length when running AF_CUST_ABNS2.

   - for get_addr_len() I need to think :-/  I actually don't know how
 that one works with socketpairs already, so there might be a trick
 I'm overlooking.


Alright. I will look at how that looks and report back shortly.


I need to leave now, but I continue to think that if there is any internal
shortcoming, we should not try to work around it at the expense of trickier
long-term maintenance, instead we should address it. And don't worry, I'm
not going to assign you the task of dealing with limited stuff. We may end
up with the workaround in the worst case if we don't find better, but that
would mean declaring a failure and having to break that stuff sometime in
the future, which is always a pain.


To be honest, I don't think this is unfixable. It's just a matter of how 
much code change we think is acceptable for it. If push comes to shove, 
one can always add an extra field somewhere, or an extra arg in 
get_addr_len, even if it'd be a little ugly.


Tristan



[ANNOUNCE] haproxy-3.0-dev5

2024-03-09 Thread Willy Tarreau
Hi,

HAProxy 3.0-dev5 was released on 2024/03/09. It added 58 new commits
after version 3.0-dev4.

Again mostly fixes for recent regressions dominate this version (ocsp
crashes, zero-copy forwarding) and for older bugs (locking issues in Lua,
QUIC freezes during handshake, initial settings for "add server").

Among the new features, we finally support draining HTTP/1 requests
when we respond early to POST requests (the typical redirect or 401 on
POST). Previously we'd send the response, drain pending data and close
if not all data were sent. But given that there remain rare cases where
this continues to cause trouble to some clients (late incoming data can
cause a reset in the TCP stack and destroy the response), and that the
mux-based architecture now makes this much easier, it was about time to
implement it to get rid of this rare but annoying case.

The rest is pretty minor, an AES encryption converter (we used to only
have the decryption side), Solaris build fixes, improved "show quic"
output to help troubleshooting, improved performance when traces are
enabled with an attached reader (previously we used to rely on a lock
to make sure to emit the dropped counter, but that approach was wrong
and causing everything to work at the speed of the slowest thread).

Ah and we got a report of a funny bug affecting the "random" balance
algorithm. Internally we have two random generators, a slow one which
is suitable for generating UUIDs and and a fast one which is only
suitable for statistical randoms. Obviously "balance random" relies on
the second one, which produces a predictable sequence for a given thread.
It just turns out that the sequence was initialized with the thread number
and that incoming connections are distributed by default in round-robin
fashion to available threads. The end result of all of this is that when
using "balance random", the first request would always be sent to the same
server, which creates a visible skew for those who reload very frequently!
This was fixed by seeding the fast one with the slow one at boot. Who would
have imagined that reloading very frequently would exhibit such design
limitations!

And as usual, cleanups, doc and CI updates close the list.

Over the last two weeks, we've participated to interesting discussions
with a few users who explained how some of the limitations regarding the
use of dynamic servers affect their usage. Some of them were quickly
addressed but what remains was written down in GitHub issues 2469, 2482
and 2483. Those who try to minimize the number of reloads might want to
have a look there and possibly feed the design discussions.

Please find the usual URLs below :
   Site index   : https://www.haproxy.org/
   Documentation: https://docs.haproxy.org/
   Wiki : https://github.com/haproxy/wiki/wiki
   Discourse: https://discourse.haproxy.org/
   Slack channel: https://slack.haproxy.org/
   Issue tracker: https://github.com/haproxy/haproxy/issues
   Sources  : https://www.haproxy.org/download/3.0/src/
   Git repository   : https://git.haproxy.org/git/haproxy.git/
   Git Web browsing : https://git.haproxy.org/?p=haproxy.git
   Changelog: https://www.haproxy.org/download/3.0/src/CHANGELOG
   Dataplane API: 
https://github.com/haproxytech/dataplaneapi/releases/latest
   Pending bugs : https://www.haproxy.org/l/pending-bugs
   Reviewed bugs: https://www.haproxy.org/l/reviewed-bugs
   Code reports : https://www.haproxy.org/l/code-reports
   Latest builds: https://www.haproxy.org/l/dev-packages

Willy
---
Complete changelog :
Amaury Denoyelle (8):
  BUG/MEDIUM: server: fix dynamic servers initial settings
  MINOR: quic: filter show quic by address
  MINOR: quic: specify show quic output fields
  MINOR: quic: add MUX output for show quic
  BUG/MEDIUM: quic: fix connection freeze on post handshake
  BUG/MINOR: mux-quic: fix crash on aborting uni remote stream
  BUG/MEDIUM: quic: fix handshake freeze under high traffic
  MINOR: quic: always use ncbuf for rx CRYPTO

Aurelien DARRAGON (15):
  LICENSE: event_hdl: fix GPL license version
  LICENSE: http_ext: fix GPL license version
  BUG/MINOR: hlua: fix unsafe lua_tostring() usage with empty stack
  BUG/MINOR: hlua: don't use lua_tostring() from unprotected contexts
  BUG/MINOR: hlua: fix possible crash in hlua_filter_new() under load
  BUG/MINOR: hlua: improper lock usage in hlua_filter_callback()
  BUG/MINOR: hlua: improper lock usage in hlua_filter_new()
  BUG/MEDIUM: hlua: improper lock usage with SET_SAFE_LJMP()
  BUG/MAJOR: hlua: improper lock usage with hlua_ctx_resume()
  BUG/MINOR: hlua: don't call ha_alert() in hlua_event_subscribe()
  MINOR: hlua: use SEND_ERR to report errors in hlua_event_runner()
  CLEANUP: hlua: txn class functions may LJMP
  CLEANUP: log: fix obsolete comment for add_sample_to_logformat_list()
  CLEANUP: tree-wide: 

Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-09 Thread Willy Tarreau
Hi Anthony,

On Wed, Feb 21, 2024 at 11:49:45AM -0500, Anthony Deschamps wrote:
> Hi Willy, thanks for the thoughtful feedback.
> 
> Here's a new patch that makes this configurable via a "hash-key" server
> argument, which defaults to "id" as you suggested.

Thanks.

> I'm struggling to test the last case you described. A quick description of
> my test setup: I have multiple instances of a simple echo server running
> on a range of ports, which I spawn/kill manually. I then have a local consul
> agent providing DNS, and multiple proxy instances running, and my actual
> test is a short Python script that makes a series of requests to each of the
> proxies and checks whether they were all routed to the same echo server
> (they identify themselves by setting a header in the response). I can share
> all the test scripts/config if it's helpful.

I'm not sure the scripts will help me (at least :-)). I was thinking that
a test could just be "set server XXX addr YYY" on the CLI to change the
server's address and verify that the hashing follows the address not the
server number.

> When I configure my proxy with long health checks -- by setting
>  "server-template ... check inter 6" -- I still find that when I
> kill/spawn an
> instance of the mock service, a server will first go down when an entry is
> removed from the SRV record, and then get rehashed when a new SRV
> entry is assigned to the server. Is there a different sequence of events I
> should be testing?

Well, if it does that and your server continues to get the traffic it
should get, then that's fine.

> Here is the updated patch.

Thank you. I'm seeing that your mailer mangled it (see below). Better
try again attaching it. Another detail, it looks like the doc entry is
located inside the "bind keywords" section, while it should be in section
4.1 "proxy keyword matrix", located between "hash-balance-factor" and
"hash-type". Please also make sure to add one entry in the keywords
table at the beginning of the section.

Oh and really do not hesitate to ping me again if I don't respond within
7 days, because it then becomes very hard for me to get back to a previous
discussion. I just remembered there was something to look for on the list,
but I could easily have lost it :-/

Thanks!
Willy

> >From 7d8a63803017c9b7630ec5cb3a154778fdff4d86 Mon Sep 17 00:00:00 2001
> From: Anthony Deschamps 
> Date: Fri, 16 Feb 2024 16:00:35 -0500
> Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server
>  address
> 
> Motivation: When services are discovered through DNS resolution, the order
> in
> which DNS records get resolved and assigned to servers is arbitrary.
> Therefore,
> even though two HAProxy instances using chash balancing might agree that a
> particular request should go to server3, it is likely the case that they
> have
> assigned different IPs and ports to the server in that slot.
> 
> This patch adds a server option, "hash-key " which can be set to "id"
> (the
> existing behaviour, defaut), "addr", or "addr-port". By deriving the keys
> for
> the chash tree nodes from a server's address and port we ensure that
> independent
> HAProxy instances will agree on routing decisions. If an address is not
> known
> then the key is derived from the server's puid as it was previously.
> 
> When adjusting a server's weight, we now unconditionally remove all nodes
> first,
> since the server's address (and therefore the desired node keys) may have
> changed.
> ---
>  doc/configuration.txt  | 21 
>  include/haproxy/server-t.h |  8 
>  src/lb_chash.c | 98 +-
>  src/server.c   | 28 +++
>  4 files changed, 132 insertions(+), 23 deletions(-)
> 
> diff --git a/doc/configuration.txt b/doc/configuration.txt
> index 1065e6098..a0710395a 100644
> --- a/doc/configuration.txt
> +++ b/doc/configuration.txt
> @@ -15763,6 +15763,27 @@ group 
>"gid" setting except that the group name is used instead of its gid. This
>setting is ignored by non UNIX sockets.
> 
> +hash-key 
> +  Specify how "hash-type consistent" node keys are computed
> +
> +  Arguments :
> +may be one of the following :
> +
> +  id The node keys will be derived from the server's position
> in
> + the server list.
> +
> +  addr   The node keys will be derived from the server's address,
> when
> + available, or else fall back on "id".
> +
> +  addr-port  The node keys will be derived from the server's address
> and
> + port, when available, or else fall back on "id".
> +
> +  The "addr" and "addr-port" options may be useful in scenarios where
> multiple
> +  HAProxy processes are balancing traffic to the same set of servers. If
> the
> +  server order of each process is different (because, for example, DNS
> records
> +  were resolved in different orders) then this will allow each independent
> +  HAProxy processes to agree on 

Re: [PATCH] MINOR: lb-chash: Respect maxconn when selecting a server

2024-03-09 Thread Willy Tarreau
Hi Anthony,

it seems I forgot about this thread, being sidetracked on other stuff...

On Wed, Feb 21, 2024 at 04:41:04PM -0500, Anthony Deschamps wrote:
> Hi Willy,
> 
> I wonder if I could accomplish what I'm looking to do by changing the
> behaviour of "maxqueue" (without making a breaking change, ideally).
> Currently, "maxqueue 0" is interpreted as "unlimited". However, the system
> I'm working with has long-running WebSocket connections, so a queued
> request is likely to sit in the queue for too long.
> 
> Is there a way to configure "maxqueue 0" for real? If not, then perhaps
> that's a patch I should make first, and then come back to this one. Since
> "maxqueue 0" already means "unlimited", maybe a "no-queue" option
> would be best in order to avoid making a breaking change.

Oh, sadly I didn't remember that maxqueue 0 was unlimited. Indeed, that's
annoying because I do see the value in totally preventing the queue from
being used. I guess we copied that from maxconn and other limits to
preserve the same logic of 0=unlimited. But that wasn't very wise for
this specific one :-/

Maybe (big maybe) we could change the default value to -1 for unlimited
as a few other settings have, and permit value zero to be used. The
problem I'm having with this is for APIs and those who automate their
config generation, and who might possibly already emit "maxqueue 0" to
make it infinite. In addition, the maxqueue feature was introduced 16
years ago in 1.3.14 by commit acafc5f88 ("[MEDIUM] add support for
"maxqueue" to limit server queue overload"), and documented a few
months later in 1.3.15 by commit 198a744e1 ("[DOC] all server parameters
have been documented"), so the risk is not that minimal.

I'm CCing Marko who develops the Dataplane API. This should be very
representative of how such values could be interpreted. Marko, do you
know if the dpapi ever emits "maxqueue 0" or makes a special case of
it when it finds it in a config ? Here the problem is that a poor
historical choice prevents one from saying that we don't want to queue
at all on a server.

I'd like to hear about others here on the list regarding this. If there's
a general consensus that it's mostly harmless to change the infinite from
0 to -1 and permit 0 to be used, we could do it and put it in the release
notes. Otherwise we'll need to find another solution (maybe an extra
keyword to disable any queuing).

Thanks!
willy



Re: [PR] BUILD: solaris: fix compilation errors

2024-03-09 Thread Willy Tarreau
On Tue, Feb 27, 2024 at 10:23:02AM +, PR Bot wrote:
> Dear list!
> 
> Author: matthias sweertvaegher <178714+mx...@users.noreply.github.com>
> Number of patches: 1
> 
> This is an automated relay of the Github pull request:
>BUILD: solaris: fix compilation errors

Now merged, thank you Matthias!

Willy