Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address
Hi Willy, Those changes are easy enough to make, so I've attached the patch again with those changes. I had to make a few small adjustments to the commit message anyway (some things that changed as a result of reviewing this path). Let me know if there's anything else that I can help with. Thank you, Anthony From 30896b2f6368cd022bfb574d4d5f647483b79a59 Mon Sep 17 00:00:00 2001 From: Anthony Deschamps Date: Fri, 16 Feb 2024 16:00:35 -0500 Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address Motivation: When services are discovered through DNS resolution, the order in which DNS records get resolved and assigned to servers is arbitrary. Therefore, even though two HAProxy instances using chash balancing might agree that a particular request should go to server3, it is likely the case that they have assigned different IPs and ports to the server in that slot. This patch adds a server option, "hash-key " which can be set to "id" (the existing behaviour, default), "addr", or "addr-port". By deriving the keys for the chash tree nodes from a server's address and port we ensure that independent HAProxy instances will agree on routing decisions. If an address is not known then the key is derived from the server's puid as it was previously. When adjusting a server's weight, we now check whether the server's hash has changed. If it has, we have to remove all its nodes first, since the node keys will also have to change. --- doc/configuration.txt | 25 +++ include/haproxy/server-t.h | 9 src/lb_chash.c | 91 +++--- src/server.c | 31 + 4 files changed, 150 insertions(+), 6 deletions(-) diff --git a/doc/configuration.txt b/doc/configuration.txt index 1065e6098..0d5649d34 100644 --- a/doc/configuration.txt +++ b/doc/configuration.txt @@ -4813,6 +4813,8 @@ error-log-format X X X - force-persist - - X X filter- X X X fullconn X - X X +hash-balance-factor X - X X +hash-key X - X X hash-type X - X X http-after-response X (!) X X X http-check commentX - X X @@ -6606,6 +6608,29 @@ hash-balance-factor See also : "balance" and "hash-type". +hash-key + Specify how "hash-type consistent" node keys are computed + + Arguments : +may be one of the following : + + id The node keys will be derived from the server's numeric + identifier as set from "id" or which defaults to its position + in the server list. + + addr The node keys will be derived from the server's address, when + available, or else fall back on "id". + + addr-port The node keys will be derived from the server's address and + port, when available, or else fall back on "id". + + The "addr" and "addr-port" options may be useful in scenarios where multiple + HAProxy processes are balancing traffic to the same set of servers. If the + server order of each process is different (because, for example, DNS records + were resolved in different orders) then this will allow each independent + HAProxy processes to agree on routing decisions. + + hash-type Specify a method to use for mapping hashes to servers diff --git a/include/haproxy/server-t.h b/include/haproxy/server-t.h index f077ff2f8..811eae712 100644 --- a/include/haproxy/server-t.h +++ b/include/haproxy/server-t.h @@ -223,6 +223,13 @@ struct pid_list { int exited; }; +/* srv methods of computing chash keys */ +enum srv_hash_key { + SRV_HASH_KEY_ID = 0, /* derived from server puid */ + SRV_HASH_KEY_ADDR, /* derived from server address */ + SRV_HASH_KEY_ADDR_PORT /* derived from server address and port */ +}; + /* A tree occurrence is a descriptor of a place in a tree, with a pointer back * to the server itself. */ @@ -366,6 +373,8 @@ struct server { struct tree_occ *lb_nodes; /* lb_nodes_tot * struct tree_occ */ unsigned lb_nodes_tot; /* number of allocated lb_nodes (C-HASH) */ unsigned lb_nodes_now; /* number of lb_nodes placed in the tree (C-HASH) */ + enum srv_hash_key hash_key; /* method to compute node hash (C-HASH) */ + unsigned lb_server_key; /* hash of the values indicated by "hash_key" (C-HASH) */ const struct netns_entr
Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address
Hi Willy, I'm just checking in to see if there's anything left I can help address here. Thank you, Anthony On Wed, Mar 13, 2024 at 4:59 PM Willy Tarreau wrote: > > Hi Anthony, > > On Wed, Mar 13, 2024 at 12:33:54PM -0400, Anthony Deschamps wrote: > > Hi Willy, > > > > My original concern was that if two servers had values of lb_server_key > > that are close to each other, then there could be some overlap in their > > values of lb_server_key + node_index;, which is why I was looking for a > > reasonable hash function to combine them. > > I know but you can't really avoid it anyway, and that's why we place > many nodes. I tried an attempt in the past where nodes were evenly > spaced by dividing by the number of entries etc. It was a disaster. > As soon as a server went down, the next one would take its extra load > for example. So anything you try to do to avoid some corner cases create > other ones, while overall, a random distribution (like a hash does) > gives great results. > > > I based it on boost::hash_combine, > > but since writing that I came across https://stackoverflow.com/a/50978188 > > explaining why boost::hash_combine isn't necessarily a good hash function. > > Hehe. We've all written our own hash functions for a given purpose at one > point in time, and they're usually fine inside their context and bad once > used outside. That's also where XXH and such stuff shines, it brings > generic functions that are good all over the space. In our case, the > full_hash() is just an avalanche non-linear 1:1 mapping between the input > and the output which suffices to avoid resonance effects that come from > repeated additions, it's perfectly suited here I think. > > > I agree that it's not too critical to have a few collisions, but I'm > > reaching > > the limits of my knowledge here. Does this change address your concerns? > > I've attached an updated patch, and here's the diff between them: > > > > diff --git a/src/lb_chash.c b/src/lb_chash.c > > index c77222854..d7a1b2539 100644 > > --- a/src/lb_chash.c > > +++ b/src/lb_chash.c > > @@ -66,9 +66,7 @@ static inline void chash_dequeue_srv(struct server *s) > > */ > > static inline u32 chash_compute_node_key(struct server *s, unsigned > > node_index) > > { > > -u32 server_key = s->lb_server_key; > > -u32 index_key = full_hash(node_index); > > -return server_key ^= index_key + 0x9e3779b9 + (server_key << 6) + > > (server_key >> 2); > > +return full_hash(s->lb_server_key + node_index); > > } > > > > /* Compute the base server key that will be used to compute node keys. > > Servers > > @@ -112,8 +110,7 @@ static inline u32 chash_compute_server_key(struct > > server *s) > > case SRV_HASH_KEY_ADDR: > > switch (srv_addr.family) { > > case AF_INET: > > -u32 addr_key = full_hash(srv_addr.addr.v4.s_addr); > > -key ^= addr_key + 0x9e3779b9 + (key << 6) + (key >> 2); > > +key = full_hash(key + srv_addr.addr.v4.s_addr); > > break; > > case AF_INET6: > > key = XXH32(srv_addr.addr.v6.s6_addr, 16, key); > > Yeay I think it addresses everything. I'll re-test your updated patch > tomorrow hoping that this time I'll merge it :-) > > Thanks for your patience! > Willy
Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address
Hi Willy, My original concern was that if two servers had values of lb_server_key that are close to each other, then there could be some overlap in their values of lb_server_key + node_index;, which is why I was looking for a reasonable hash function to combine them. I based it on boost::hash_combine, but since writing that I came across https://stackoverflow.com/a/50978188 explaining why boost::hash_combine isn't necessarily a good hash function. I agree that it's not too critical to have a few collisions, but I'm reaching the limits of my knowledge here. Does this change address your concerns? I've attached an updated patch, and here's the diff between them: diff --git a/src/lb_chash.c b/src/lb_chash.c index c77222854..d7a1b2539 100644 --- a/src/lb_chash.c +++ b/src/lb_chash.c @@ -66,9 +66,7 @@ static inline void chash_dequeue_srv(struct server *s) */ static inline u32 chash_compute_node_key(struct server *s, unsigned node_index) { -u32 server_key = s->lb_server_key; -u32 index_key = full_hash(node_index); -return server_key ^= index_key + 0x9e3779b9 + (server_key << 6) + (server_key >> 2); +return full_hash(s->lb_server_key + node_index); } /* Compute the base server key that will be used to compute node keys. Servers @@ -112,8 +110,7 @@ static inline u32 chash_compute_server_key(struct server *s) case SRV_HASH_KEY_ADDR: switch (srv_addr.family) { case AF_INET: -u32 addr_key = full_hash(srv_addr.addr.v4.s_addr); -key ^= addr_key + 0x9e3779b9 + (key << 6) + (key >> 2); +key = full_hash(key + srv_addr.addr.v4.s_addr); break; case AF_INET6: key = XXH32(srv_addr.addr.v6.s6_addr, 16, key); Thanks, Anthony From 3280a39664bf913807200f21d1b952856966d141 Mon Sep 17 00:00:00 2001 From: Anthony Deschamps Date: Fri, 16 Feb 2024 16:00:35 -0500 Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address Motivation: When services are discovered through DNS resolution, the order in which DNS records get resolved and assigned to servers is arbitrary. Therefore, even though two HAProxy instances using chash balancing might agree that a particular request should go to server3, it is likely the case that they have assigned different IPs and ports to the server in that slot. This patch adds a server option, "hash-key " which can be set to "id" (the existing behaviour, defaut), "addr", or "addr-port". By deriving the keys for the chash tree nodes from a server's address and port we ensure that independent HAProxy instances will agree on routing decisions. If an address is not known then the key is derived from the server's puid as it was previously. When adjusting a server's weight, we now unconditionally remove all nodes first, since the server's address (and therefore the desired node keys) may have changed. --- doc/configuration.txt | 24 ++ include/haproxy/server-t.h | 9 src/lb_chash.c | 90 +++--- src/server.c | 31 + 4 files changed, 148 insertions(+), 6 deletions(-) diff --git a/doc/configuration.txt b/doc/configuration.txt index 1065e6098..052d41f9f 100644 --- a/doc/configuration.txt +++ b/doc/configuration.txt @@ -4813,6 +4813,8 @@ error-log-format X X X - force-persist - - X X filter- X X X fullconn X - X X +hash-balance-factor X - X X +hash-key X - X X hash-type X - X X http-after-response X (!) X X X http-check commentX - X X @@ -6606,6 +6608,28 @@ hash-balance-factor See also : "balance" and "hash-type". +hash-key + Specify how "hash-type consistent" node keys are computed + + Arguments : +may be one of the following : + + id The node keys will be derived from the server's position in + the server list. + + addr The node keys will be derived from the server's address, when + available, or else fall back on "id". + + addr-port The node keys will be derived from the server's address and + port, when available, or else fall back on "id". + + The "addr" and "addr-port" options may be useful in scenarios where multiple + HAProxy processes are balancing traffic to the same set of servers. If the + server order of each process is different (because, for example
Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address
Hi Willy, Thanks for the feedback. I had been testing with smaller numbers of servers (usually between 4 and 32) so I hadn't noticed the performance impact. Here's an updated patch (as an attachment, until I figure out how to make sure things are formatted correctly!) that should address that. I added a "lb_server_key" to the server struct. I moved most of the hashing logic that I previously had in "chash_compute_node_key" into a new function, "chash_compute_server_key", and I store that value; nodes only need to be removed and reinserted if this value changes. To compute a node key, I just combine the server key with a hash of node_index. I considered using XXH32_round to do this, but that looks like it's intended to be an internal function, so I inlined something similar instead. Hopefully this all works. Thanks for taking the time to work through this, and please let me know if there's anything else to address. Thanks, Anthony From 82acc8ff344abde640941d15fa5217b0dfe9701a Mon Sep 17 00:00:00 2001 From: Anthony Deschamps Date: Fri, 16 Feb 2024 16:00:35 -0500 Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address Motivation: When services are discovered through DNS resolution, the order in which DNS records get resolved and assigned to servers is arbitrary. Therefore, even though two HAProxy instances using chash balancing might agree that a particular request should go to server3, it is likely the case that they have assigned different IPs and ports to the server in that slot. This patch adds a server option, "hash-key " which can be set to "id" (the existing behaviour, defaut), "addr", or "addr-port". By deriving the keys for the chash tree nodes from a server's address and port we ensure that independent HAProxy instances will agree on routing decisions. If an address is not known then the key is derived from the server's puid as it was previously. When adjusting a server's weight, we now unconditionally remove all nodes first, since the server's address (and therefore the desired node keys) may have changed. --- doc/configuration.txt | 24 ++ include/haproxy/server-t.h | 9 src/lb_chash.c | 93 +++--- src/server.c | 31 + 4 files changed, 151 insertions(+), 6 deletions(-) diff --git a/doc/configuration.txt b/doc/configuration.txt index 1065e6098..052d41f9f 100644 --- a/doc/configuration.txt +++ b/doc/configuration.txt @@ -4813,6 +4813,8 @@ error-log-format X X X - force-persist - - X X filter- X X X fullconn X - X X +hash-balance-factor X - X X +hash-key X - X X hash-type X - X X http-after-response X (!) X X X http-check commentX - X X @@ -6606,6 +6608,28 @@ hash-balance-factor See also : "balance" and "hash-type". +hash-key + Specify how "hash-type consistent" node keys are computed + + Arguments : +may be one of the following : + + id The node keys will be derived from the server's position in + the server list. + + addr The node keys will be derived from the server's address, when + available, or else fall back on "id". + + addr-port The node keys will be derived from the server's address and + port, when available, or else fall back on "id". + + The "addr" and "addr-port" options may be useful in scenarios where multiple + HAProxy processes are balancing traffic to the same set of servers. If the + server order of each process is different (because, for example, DNS records + were resolved in different orders) then this will allow each independent + HAProxy processes to agree on routing decisions. + + hash-type Specify a method to use for mapping hashes to servers diff --git a/include/haproxy/server-t.h b/include/haproxy/server-t.h index f077ff2f8..811eae712 100644 --- a/include/haproxy/server-t.h +++ b/include/haproxy/server-t.h @@ -223,6 +223,13 @@ struct pid_list { int exited; }; +/* srv methods of computing chash keys */ +enum srv_hash_key { + SRV_HASH_KEY_ID = 0, /* derived from server puid */ + SRV_HASH_KEY_ADDR, /* derived from server address */ + SRV_HASH_KEY_ADDR_PORT /* derived from server address and port */ +}; + /* A tree occurrence is a descriptor of a place in a tree, with a pointer back * to the serve
Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address
Sorry for the trouble! I'll have to sort out what's happening. Here it is as an attachment. Thanks! Anthony On Mon, Mar 11, 2024 at 11:06 PM Willy Tarreau wrote: > Hi Anthony, > > On Mon, Mar 11, 2024 at 08:58:17PM -0400, Anthony Deschamps wrote: > > > I'm not sure the scripts will help me (at least :-)). I was thinking > that > > > a test could just be "set server XXX addr YYY" on the CLI to change the > > > server's address and verify that the hashing follows the address not > the > > > server number. > > > > Oh, good point -- I wasn't testing what happens when changing the address > > via the CLI. It turns out I wasn't handling that case. I added a call in > > _srv_set_inetaddr_port() to update the hashes, and tested that it works. > > > > I also moved the docs and added an entry to the table. > > > > Here's the updated patch. > > Thanks! > > However there's still a problem with your mailer mangling the patch > see below: > > > + /* If hash-key is addr or addr-port then we need the address, but if we > > + * can't determine the address then we fall back on hashing the puid. > > + */ > > + switch (s->hash_key) { > > + case SRV_HASH_KEY_ADDR: > > + case SRV_HASH_KEY_ADDR_PORT: > > + server_get_inetaddr(s, _addr); > > + if (srv_addr.family != AF_INET && srv_addr.family != AF_INET6) { > > + hash_key = SRV_HASH_KEY_ID; > > + } > > + break; > (...) > > Can you please resend it as an attachment instead ? > > Thanks! > Willy > From 464504a3c8560a952602d6f28009727f29e7c289 Mon Sep 17 00:00:00 2001 From: Anthony Deschamps Date: Fri, 16 Feb 2024 16:00:35 -0500 Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address Motivation: When services are discovered through DNS resolution, the order in which DNS records get resolved and assigned to servers is arbitrary. Therefore, even though two HAProxy instances using chash balancing might agree that a particular request should go to server3, it is likely the case that they have assigned different IPs and ports to the server in that slot. This patch adds a server option, "hash-key " which can be set to "id" (the existing behaviour, defaut), "addr", or "addr-port". By deriving the keys for the chash tree nodes from a server's address and port we ensure that independent HAProxy instances will agree on routing decisions. If an address is not known then the key is derived from the server's puid as it was previously. When adjusting a server's weight, we now unconditionally remove all nodes first, since the server's address (and therefore the desired node keys) may have changed. --- doc/configuration.txt | 24 + include/haproxy/server-t.h | 8 +++ src/lb_chash.c | 101 +++-- src/server.c | 31 4 files changed, 138 insertions(+), 26 deletions(-) diff --git a/doc/configuration.txt b/doc/configuration.txt index 1065e6098..052d41f9f 100644 --- a/doc/configuration.txt +++ b/doc/configuration.txt @@ -4813,6 +4813,8 @@ error-log-format X X X - force-persist - - X X filter- X X X fullconn X - X X +hash-balance-factor X - X X +hash-key X - X X hash-type X - X X http-after-response X (!) X X X http-check commentX - X X @@ -6606,6 +6608,28 @@ hash-balance-factor See also : "balance" and "hash-type". +hash-key + Specify how "hash-type consistent" node keys are computed + + Arguments : +may be one of the following : + + id The node keys will be derived from the server's position in + the server list. + + addr The node keys will be derived from the server's address, when + available, or else fall back on "id". + + addr-port The node keys will be derived from the server's address and + port, when available, or else fall back on "id". + + The "addr" and "addr-port" options may be useful in scenarios where multiple + HAProxy processes are balancing traffic to the same set of servers. If the + server order of each process is different (because, for example, DNS records + were resolved in different orders) then this will allow each independent + HAProxy pro
Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address
> I'm not sure the scripts will help me (at least :-)). I was thinking that > a test could just be "set server XXX addr YYY" on the CLI to change the > server's address and verify that the hashing follows the address not the > server number. Oh, good point -- I wasn't testing what happens when changing the address via the CLI. It turns out I wasn't handling that case. I added a call in _srv_set_inetaddr_port() to update the hashes, and tested that it works. I also moved the docs and added an entry to the table. Here's the updated patch. Thanks! Anthony >From 3412e86c222ad7b231d687f1402fbe30de312974 Mon Sep 17 00:00:00 2001 From: Anthony Deschamps Date: Fri, 16 Feb 2024 16:00:35 -0500 Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address Motivation: When services are discovered through DNS resolution, the order in which DNS records get resolved and assigned to servers is arbitrary. Therefore, even though two HAProxy instances using chash balancing might agree that a particular request should go to server3, it is likely the case that they have assigned different IPs and ports to the server in that slot. This patch adds a server option, "hash-key " which can be set to "id" (the existing behaviour, defaut), "addr", or "addr-port". By deriving the keys for the chash tree nodes from a server's address and port we ensure that independent HAProxy instances will agree on routing decisions. If an address is not known then the key is derived from the server's puid as it was previously. When adjusting a server's weight, we now unconditionally remove all nodes first, since the server's address (and therefore the desired node keys) may have changed. --- doc/configuration.txt | 24 + include/haproxy/server-t.h | 8 +++ src/lb_chash.c | 101 +++-- src/server.c | 31 4 files changed, 138 insertions(+), 26 deletions(-) diff --git a/doc/configuration.txt b/doc/configuration.txt index 1065e6098..052d41f9f 100644 --- a/doc/configuration.txt +++ b/doc/configuration.txt @@ -4813,6 +4813,8 @@ error-log-format X X X - force-persist - - X X filter- X X X fullconn X - X X +hash-balance-factor X - X X +hash-key X - X X hash-type X - X X http-after-response X (!) X X X http-check commentX - X X @@ -6606,6 +6608,28 @@ hash-balance-factor See also : "balance" and "hash-type". +hash-key + Specify how "hash-type consistent" node keys are computed + + Arguments : +may be one of the following : + + id The node keys will be derived from the server's position in + the server list. + + addr The node keys will be derived from the server's address, when + available, or else fall back on "id". + + addr-port The node keys will be derived from the server's address and + port, when available, or else fall back on "id". + + The "addr" and "addr-port" options may be useful in scenarios where multiple + HAProxy processes are balancing traffic to the same set of servers. If the + server order of each process is different (because, for example, DNS records + were resolved in different orders) then this will allow each independent + HAProxy processes to agree on routing decisions. + + hash-type Specify a method to use for mapping hashes to servers diff --git a/include/haproxy/server-t.h b/include/haproxy/server-t.h index f077ff2f8..1a2735c2a 100644 --- a/include/haproxy/server-t.h +++ b/include/haproxy/server-t.h @@ -223,6 +223,13 @@ struct pid_list { int exited; }; +/* srv methods of computing chash keys */ +enum srv_hash_key { + SRV_HASH_KEY_ID = 0, /* derived from server puid */ + SRV_HASH_KEY_ADDR, /* derived from server address */ + SRV_HASH_KEY_ADDR_PORT /* derived from server address and port */ +}; + /* A tree occurrence is a descriptor of a place in a tree, with a pointer back * to the server itself. */ @@ -366,6 +373,7 @@ struct server { struct tree_occ *lb_nodes; /* lb_nodes_tot * struct tree_occ */ unsigned lb_nodes_tot; /* number of allocated lb_nodes (C-HASH) */ unsigned lb_nodes_now; /* number of lb_nodes placed in the tree (C-HASH) */ + enum srv_hash_key hash_key; /* method to compute node hash (C-HAS
Re: [PATCH] MINOR: lb-chash: Respect maxconn when selecting a server
Hi Willy, I wonder if I could accomplish what I'm looking to do by changing the behaviour of "maxqueue" (without making a breaking change, ideally). Currently, "maxqueue 0" is interpreted as "unlimited". However, the system I'm working with has long-running WebSocket connections, so a queued request is likely to sit in the queue for too long. Is there a way to configure "maxqueue 0" for real? If not, then perhaps that's a patch I should make first, and then come back to this one. Since "maxqueue 0" already means "unlimited", maybe a "no-queue" option would be best in order to avoid making a breaking change. Thanks, Anthony
Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address
Hi Willy, thanks for the thoughtful feedback. Here's a new patch that makes this configurable via a "hash-key" server argument, which defaults to "id" as you suggested. I'm struggling to test the last case you described. A quick description of my test setup: I have multiple instances of a simple echo server running on a range of ports, which I spawn/kill manually. I then have a local consul agent providing DNS, and multiple proxy instances running, and my actual test is a short Python script that makes a series of requests to each of the proxies and checks whether they were all routed to the same echo server (they identify themselves by setting a header in the response). I can share all the test scripts/config if it's helpful. When I configure my proxy with long health checks -- by setting "server-template ... check inter 6" -- I still find that when I kill/spawn an instance of the mock service, a server will first go down when an entry is removed from the SRV record, and then get rehashed when a new SRV entry is assigned to the server. Is there a different sequence of events I should be testing? Here is the updated patch. Thanks, Anthony >From 7d8a63803017c9b7630ec5cb3a154778fdff4d86 Mon Sep 17 00:00:00 2001 From: Anthony Deschamps Date: Fri, 16 Feb 2024 16:00:35 -0500 Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address Motivation: When services are discovered through DNS resolution, the order in which DNS records get resolved and assigned to servers is arbitrary. Therefore, even though two HAProxy instances using chash balancing might agree that a particular request should go to server3, it is likely the case that they have assigned different IPs and ports to the server in that slot. This patch adds a server option, "hash-key " which can be set to "id" (the existing behaviour, defaut), "addr", or "addr-port". By deriving the keys for the chash tree nodes from a server's address and port we ensure that independent HAProxy instances will agree on routing decisions. If an address is not known then the key is derived from the server's puid as it was previously. When adjusting a server's weight, we now unconditionally remove all nodes first, since the server's address (and therefore the desired node keys) may have changed. --- doc/configuration.txt | 21 include/haproxy/server-t.h | 8 src/lb_chash.c | 98 +- src/server.c | 28 +++ 4 files changed, 132 insertions(+), 23 deletions(-) diff --git a/doc/configuration.txt b/doc/configuration.txt index 1065e6098..a0710395a 100644 --- a/doc/configuration.txt +++ b/doc/configuration.txt @@ -15763,6 +15763,27 @@ group "gid" setting except that the group name is used instead of its gid. This setting is ignored by non UNIX sockets. +hash-key + Specify how "hash-type consistent" node keys are computed + + Arguments : +may be one of the following : + + id The node keys will be derived from the server's position in + the server list. + + addr The node keys will be derived from the server's address, when + available, or else fall back on "id". + + addr-port The node keys will be derived from the server's address and + port, when available, or else fall back on "id". + + The "addr" and "addr-port" options may be useful in scenarios where multiple + HAProxy processes are balancing traffic to the same set of servers. If the + server order of each process is different (because, for example, DNS records + were resolved in different orders) then this will allow each independent + HAProxy processes to agree on routing decisions. + id Fixes the socket ID. By default, socket IDs are automatically assigned, but sometimes it is more convenient to fix them to ease monitoring. This value diff --git a/include/haproxy/server-t.h b/include/haproxy/server-t.h index f077ff2f8..1a2735c2a 100644 --- a/include/haproxy/server-t.h +++ b/include/haproxy/server-t.h @@ -223,6 +223,13 @@ struct pid_list { int exited; }; +/* srv methods of computing chash keys */ +enum srv_hash_key { + SRV_HASH_KEY_ID = 0, /* derived from server puid */ + SRV_HASH_KEY_ADDR, /* derived from server address */ + SRV_HASH_KEY_ADDR_PORT /* derived from server address and port */ +}; + /* A tree occurrence is a descriptor of a place in a tree, with a pointer back * to the server itself. */ @@ -366,6 +373,7 @@ struct server { struct tree_occ *lb_nodes; /* lb_nodes_tot * struct tree_occ */ unsigned lb_nodes_tot; /* number of allocated lb_nodes (C-HASH) */ unsigned lb_nodes_now; /* number of lb_nodes placed in the tree (C-HASH) */ + enum srv_hash_key hash_key;
[PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address
>From a031cf97da759eb2c2f9b6e191065ad503f821ed Mon Sep 17 00:00:00 2001 From: Anthony Deschamps Date: Fri, 16 Feb 2024 16:00:35 -0500 Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address Motivation: When services are discovered through DNS resolution, the order in which DNS records get resolved and assigned to servers is arbitrary. Therefore, even though two HAProxy instances using chash balancing might agree that a particular request should go to server3, it is likely the case that they have assigned different IPs and ports to the server in that slot. By deriving the keys for the chash tree nodes from a server's address and port we ensure that independent HAProxy instances will agree on routing decisions. If an address is not known then the key is derived from the server's puid as it was previously. When adjusting a server's weight, we now unconditionally remove all nodes first, since the server's address (and therefore the desired node keys) may have changed. --- src/lb_chash.c | 71 ++ 1 file changed, 48 insertions(+), 23 deletions(-) diff --git a/src/lb_chash.c b/src/lb_chash.c index 1c8034d89..bd39a07f8 100644 --- a/src/lb_chash.c +++ b/src/lb_chash.c @@ -21,8 +21,9 @@ #include #include #include -#include +#include #include +#include /* Return next tree node after which must still be in the tree, or be * NULL. Lookup wraps around the end to the beginning. If the next node is the @@ -58,6 +59,37 @@ static inline void chash_dequeue_srv(struct server *s) } } +/* Compute a key that can be used to insert a node into the CHASH tree. If the + * server has a known address then the key will be determined from its address + * and port. This means that independent HAProxy instances will agree on routing + * decisions. If an address is not known then the key will be determined from + * the server's puid. + */ +static inline u32 chash_compute_node_key(struct server *s, unsigned node_index) +{ + u32 key; + struct server_inetaddr srv_addr; + + server_get_inetaddr(s, _addr); + + switch (srv_addr.family) { + case AF_INET: + key = full_hash(srv_addr.addr.v4.s_addr); + key ^= full_hash(srv_addr.port.svc); + key ^= full_hash(node_index); + break; + case AF_INET6: + unsigned seed = (srv_addr.port.svc << 16) + node_index; + key = XXH32(srv_addr.addr.v6.s6_addr, 16, seed); + break; + default: + key = full_hash(s->puid * SRV_EWGHT_RANGE + node_index); + break; + } + + return key; +} + /* Adjust the number of entries of a server in its tree. The server must appear * as many times as its weight indicates it. If it's there too often, we remove * the last occurrences. If it's not there enough, we add more occurrences. To @@ -67,39 +99,32 @@ static inline void chash_dequeue_srv(struct server *s) */ static inline void chash_queue_dequeue_srv(struct server *s) { - while (s->lb_nodes_now > s->next_eweight) { - if (s->lb_nodes_now >= s->lb_nodes_tot) // should always be false anyway - s->lb_nodes_now = s->lb_nodes_tot; - s->lb_nodes_now--; - if (s->proxy->lbprm.chash.last == >lb_nodes[s->lb_nodes_now].node) - s->proxy->lbprm.chash.last = chash_skip_node(s->lb_tree, s->proxy->lbprm.chash.last); - eb32_delete(>lb_nodes[s->lb_nodes_now].node); - } + unsigned int j; + + /* First we need to remove all of the server's entries from its tree + * because a) the realloc will change all node pointers and b) the keys + * may change. + */ + chash_dequeue_srv(s); /* Attempt to increase the total number of nodes, if the user * increased the weight beyond the original weight */ if (s->lb_nodes_tot < s->next_eweight) { struct tree_occ *new_nodes; - - /* First we need to remove all server's entries from its tree - * because the realloc will change all nodes pointers */ - chash_dequeue_srv(s); - new_nodes = realloc(s->lb_nodes, s->next_eweight * sizeof(*new_nodes)); if (new_nodes) { - unsigned int j; - s->lb_nodes = new_nodes; - memset(>lb_nodes[s->lb_nodes_tot], 0, -(s->next_eweight - s->lb_nodes_tot) * sizeof(*s->lb_nodes)); - for (j = s->lb_nodes_tot; j < s->next_eweight; j++) { - s->lb_nodes[j].server = s; - s->lb_nodes[j].node.key = full_hash(s->puid * SRV_EWGHT_RANGE + j); - } s->lb_nodes_tot = s->next_eweight; } } + + memset(s->lb_nodes, 0, s->next_eweight * sizeof(*s->lb_nodes)); + for (j = 0; j < s->next_eweight; j++) { + s->lb_nodes[j].server = s; + s->lb_nodes[j].node.key = chash_compute_node_key(s, j); + } + while (s->lb_nodes_now < s->next_eweight) { if (s->lb_nodes_now >= s->lb_nodes_tot) // should always be false anyway break; @@ -507,7 +532,7 @@ int chash_init_server_tree(struct proxy *p) } for (node = 0; node < srv->lb_nodes_tot; node++) { srv->lb_nodes[node].server = srv; - srv->lb_nodes[node].node.key = full_hash(srv->puid * SRV_EWGHT_RANGE + node); + srv->lb_nodes[node].node.key = chash_compute_node_key(srv, node); } if (srv_currently_usable(srv)) -- 2.43.2
[PATCH] MINOR: lb-chash: Respect maxconn when selecting a server
>From 3fc983b719bd4d8af80037c36e7032e0af383557 Mon Sep 17 00:00:00 2001 From: Anthony Deschamps Date: Tue, 13 Feb 2024 18:11:56 -0500 Subject: [PATCH] MINOR: lb-chash: Respect maxconn when selecting a server This is useful in a situation where hash-balance-factor isn't quite optimal. For example: If we have two servers at times of low traffic, if server A has one active request and server B has zero, we might prefer that a second request for the same resource also be routed to server A; with hash-balance-factor set to anything less than 200, that request would routed to server B, causing it to dedicate resources to a resource that is already loaded/warm in server A. Meanwhile, at times of high traffic, we still want multiple servers to be able to share the load of highly used resources. --- src/lb_chash.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/lb_chash.c b/src/lb_chash.c index 4e8fb1536..1c8034d89 100644 --- a/src/lb_chash.c +++ b/src/lb_chash.c @@ -371,7 +371,7 @@ struct server *chash_get_server_hash(struct proxy *p, unsigned int hash, const s } loop = 0; - while (nsrv == avoid || (p->lbprm.hash_balance_factor && !chash_server_is_eligible(nsrv))) { + while (nsrv == avoid || (nsrv->maxconn && nsrv->served >= nsrv->maxconn) || (p->lbprm.hash_balance_factor && !chash_server_is_eligible(nsrv))) { next = eb32_next(next); if (!next) { next = eb32_first(root); -- 2.43.0