Re: [PATCH] FEATURE: Adding MPTCP with option to disable it and fall-back to TCP
Hi! On Wed, Apr 24, 2024 at 05:45:03PM +0200, Nicolas CARPi wrote: > Hello, > > On 24 Apr, Dorian Craps wrote: > > This attached patch uses MPTCP by default instead of TCP on Linux. > The backward compatibility of MPTCP is indeed a good point toward > enabling it by default. Nonetheless, I feel your message should include > a discussion on the security implications of this change. > > As you know, v0 had security issues. v1 addresses them, and we can > likely consider that new attacks targeting this protocol will pop up as > it becomes widespread. > > In fact, that's already the case: > > See: CVE-2024-26708: mptcp: really cope with fastopen race > or CVE-2024-26826: mptcp: fix data re-injection from stale subflow > or CVE-2024-26782 kernel: mptcp: fix double-free on socket dismantle > > The three CVEs above are all from April 2024. > > Given that MPTCP v1 is relatively new (2020), and that we do not have > real assurances that it is at least as secure as plain TCP, I would > humbly suggest inverting the logic, and making it an opt-in option. > > This way, a vulnerability impacting MPTCP would only impact users that > enabled it, instead of 100% of HAProxy users. In a few years, making it > the default could be reconsidered. > > Please note that I'm simply voicing my concern as a user, and the core > dev team might have a very different view about these aspects. > > > It sounds good to have MPTCP enabled by default > Except when looking at it through the prism of the increased attack > surface! ;) > > > IPPROTO_MPTCP is defined just in case old libC are being used and > > don't have the ref. > Shouldn't it be defined with a value, as per > https://www.mptcp.dev/faq.html#why-is-ipproto_mptcp-not-defined ? > (sorry if it's a dumb remark, I'm not a C dev) Without going into all the details and comments above, I just want to say that I'll study this after 3.0 is released, since there's still a massive amount of stuff to adjust for the release and we're way past a feature freeze. I *am* interested in the feature, which has been floating around for a few years already. However I tend to agree with Nicolas that, at least for the principle of least surprise, I'd rather not have this turned on by default. There might even be security implications that some are not willing to take yet, and forcing them to guess that they need to opt out of something is not great in general (it might change in the future though, once everyone turns it on by default, like we did for keep-alive, threads, and h2 for example). I'm also concerned by the change at the socket layer that will make all new sockets cause two syscalls on systems where this is not enabled, and I'd really really prefer that we use the extended syntax for addresses that offers a lot of flexibility. We can definitely have "mptcp@address" and probably mptcp as a variant of tcp. Regarding how we'd deal with the fallback, we'll see, but overall, thinking that someone would explicitly configure mptcp and not even get an error if it cannot bind is problematic to me, because among the most common causes of trouble are things that everyone ignores (module not loaded, missing secondary IP, firewall rules etc) that usually cause problems. Those we can discover at boot time should definitely be reported. But let's discuss this in early June instead (I mean, feel free to discuss the points together, but I'm not going to invest more time on this at this moment). Thanks! Willy
Re: [PATCH] FEATURE: Adding MPTCP with option to disable it and fall-back to TCP
Hello, On 24 Apr, Dorian Craps wrote: > This attached patch uses MPTCP by default instead of TCP on Linux. The backward compatibility of MPTCP is indeed a good point toward enabling it by default. Nonetheless, I feel your message should include a discussion on the security implications of this change. As you know, v0 had security issues. v1 addresses them, and we can likely consider that new attacks targeting this protocol will pop up as it becomes widespread. In fact, that's already the case: See: CVE-2024-26708: mptcp: really cope with fastopen race or CVE-2024-26826: mptcp: fix data re-injection from stale subflow or CVE-2024-26782 kernel: mptcp: fix double-free on socket dismantle The three CVEs above are all from April 2024. Given that MPTCP v1 is relatively new (2020), and that we do not have real assurances that it is at least as secure as plain TCP, I would humbly suggest inverting the logic, and making it an opt-in option. This way, a vulnerability impacting MPTCP would only impact users that enabled it, instead of 100% of HAProxy users. In a few years, making it the default could be reconsidered. Please note that I'm simply voicing my concern as a user, and the core dev team might have a very different view about these aspects. > It sounds good to have MPTCP enabled by default Except when looking at it through the prism of the increased attack surface! ;) > IPPROTO_MPTCP is defined just in case old libC are being used and > don't have the ref. Shouldn't it be defined with a value, as per https://www.mptcp.dev/faq.html#why-is-ipproto_mptcp-not-defined ? (sorry if it's a dumb remark, I'm not a C dev) Best regards, ~Nicolas
[PATCH] FEATURE: Adding MPTCP with option to disable it and fall-back to TCP
From: Dorian Craps Multipath TCP (MPTCP), standardized in RFC8684 [1], is a TCP extension that enables a TCP connection to use different paths. Multipath TCP has been used for several use cases. On smartphones, MPTCP enables seamless handovers between cellular and Wi-Fi networks while preserving established connections. This use-case is what pushed Apple to use MPTCP since 2013 in multiple applications [2]. On dual-stack hosts, Multipath TCP enables the TCP connection to automatically use the best performing path, either IPv4 or IPv6. If one path fails, MPTCP automatically uses the other path. To benefit from MPTCP, both the client and the server have to support it. Multipath TCP is a backward-compatible TCP extension that is enabled by default on recent Linux distributions (Debian, Ubuntu, Redhat, ...). Multipath TCP is included in the Linux kernel since version 5.6 [3]. To use it on Linux, an application must explicitly enable it when creating the socket. No need to change anything else in the application. This attached patch uses MPTCP by default instead of TCP on Linux. There is a fallback if the creation of the MPTCP socket fails. A new option has been added in the config to be able to disable MPTCP support. It sounds good to have MPTCP enabled by default, so the client can decide when to use it or not. If the client didn't ask to use MPTCP, the kernel will return a "plain" TCP socket to the server application after an "accept()". [4] IPPROTO_MPTCP is defined just in case old libC are being used and don't have the ref. The running kernel is the only one who can tell if MPTCP is supported or not, it is probably best not to check that at build time. TCP_MAXSEG is currently not supported by MPTCP: is it an issue? MPTCP devs didn't add a support for it because it has not been requested with a use-case. If you think it is important, I can report that to them. Due to the limited impact within a data center environment, MPTCP support has only been added on the listening sockets, then not between the proxy and the servers. The high-speed, low-latency nature of data center networks reduces the benefits of MPTCP, making the complexity of its implementation unnecessary in this context. Developed with the help of Matthieu Baerts (matt...@kernel.org) and Olivier Bonaventure (olivier.bonavent...@uclouvain.be) Link: https://www.rfc-editor.org/rfc/rfc8684.html [1] Link: https://www.tessares.net/apples-mptcp-story-so-far/ [2] Link: https://www.mptcp.dev/ [3] Link: https://www.mptcp.dev/faq.html#why--when-should-mptcp-be-enabled-by-default [4] --- doc/configuration.txt| 4 include/haproxy/global-t.h | 1 + include/haproxy/protocol-t.h | 7 +++ src/cfgparse-global.c| 7 ++- src/cfgparse.c | 2 +- src/proto_rhttp.c| 5 + src/proto_tcp.c | 10 ++ src/protocol.c | 12 +++- src/sock_inet.c | 21 +++-- 9 files changed, 64 insertions(+), 5 deletions(-) diff --git a/doc/configuration.txt b/doc/configuration.txt index d2d654c19..85b75de33 100644 --- a/doc/configuration.txt +++ b/doc/configuration.txt @@ -1342,6 +1342,7 @@ The following keywords are supported in the "global" section : - maxsslrate - maxzlibmem - no-memory-trimming + - no-mptcp - noepoll - noevports - nogetaddrinfo @@ -2974,6 +2975,9 @@ no-memory-trimming nice with the new process. Note that advanced memory allocators usually do not suffer from such a problem. +no-mptcp + Disables Multipath TCP (MPTCP) support when the TCP protocol is requested. + noepoll Disables the use of the "epoll" event polling system on Linux. It is equivalent to the command-line argument "-de". The next polling system diff --git a/include/haproxy/global-t.h b/include/haproxy/global-t.h index 92d2c6bc1..c2b81fb50 100644 --- a/include/haproxy/global-t.h +++ b/include/haproxy/global-t.h @@ -85,6 +85,7 @@ #define GTUNE_LISTENER_MQ_OPT(1<<28) #define GTUNE_LISTENER_MQ_ANY(GTUNE_LISTENER_MQ_FAIR | GTUNE_LISTENER_MQ_OPT) #define GTUNE_QUIC_CC_HYSTART(1<<29) +#define GTUNE_NO_MPTCP (1<<30) #define NO_ZERO_COPY_FWD 0x0001 /* Globally disable zero-copy FF */ #define NO_ZERO_COPY_FWD_PT 0x0002 /* disable zero-copy FF for PT (recv & send are disabled automatically) */ diff --git a/include/haproxy/protocol-t.h b/include/haproxy/protocol-t.h index b85f29cc0..6a7d45c52 100644 --- a/include/haproxy/protocol-t.h +++ b/include/haproxy/protocol-t.h @@ -28,6 +28,12 @@ #include #include +#ifdef __linux__ +#ifndef IPPROTO_MPTCP +#define IPPROTO_MPTCP +#endif +#endif + /* some pointer types referenced below */ struct listener; struct receiver; @@ -99,6 +105,7 @@ struct protocol { enum proto_type proto_type; /* protocol type at the socket layer (PROTO_TYPE_*) */ int sock_type; /* socket
Re: Fwd: [PATCH] MEDIUM: shctx naming shared memory context
On Wed, Apr 24, 2024 at 09:53:04AM +0100, David CARLIER wrote: > -- Forwarded message - > From: David CARLIER > Date: Wed, 24 Apr 2024 at 07:56 > Subject: Re: [PATCH] MEDIUM: shctx naming shared memory context > To: Willy Tarreau > > > Here a new version. Works fine and applied, thank you. I'm seeing opportunities to use that in other places like rings. It could require some tiny API changes however to consistently pass the name. But that would be nice for ring buffers used by traces, logs, startup-logs etc. Maybe we'll then decide to re-adjust the displayed name to prepend a type. E.g "haproxy cache: my_cache_name" etc. We're large with 80 chars since our identifiers are apparently limited to 32. Cheers, Willy
Fwd: [PATCH] MEDIUM: shctx naming shared memory context
-- Forwarded message - From: David CARLIER Date: Wed, 24 Apr 2024 at 07:51 Subject: Re: [PATCH] MEDIUM: shctx naming shared memory context To: Willy Tarreau Hi On Wed, 24 Apr 2024 at 07:45, Willy Tarreau wrote: > Hi David, > > On Sat, Apr 20, 2024 at 07:33:16AM +0100, David CARLIER wrote: > > From d49d9d5966caead320f33f789578cb69f2aa3787 Mon Sep 17 00:00:00 2001 > > From: David Carlier > > Date: Sat, 20 Apr 2024 07:18:48 +0100 > > Subject: [PATCH] MEDIUM: shctx: Naming shared memory context > > > > From Linux 5.17, anonymous regions can be name via prctl/PR_SET_VMA > > so caches can be identified when looking at HAProxy process memory > > mapping. > > That's pretty cool, I wasn't aware of this feature, that's really > interesting! > > However I disagree with this point: > > > +#if defined(USE_PRCTL) && defined(PR_SET_VMA) > > + { > > + /** > > + * From Linux 5.17 (and if the `CONFIG_ANON_VMA_NAME` > kernel config is set)`, > > + * anonymous regions can be named. > > + * > > + * The naming can take up to 79 characters, accepting > valid ASCII values > > + * except [, ], \, $ and '. > > + * As a result, when looking for /proc//maps, we can > see the anonymous range > > + * as follow : > > + * `7364c4fff000-73650800 rw-s 00:01 3540 > [anon_shmem:HAProxy globalCache]` > > + */ > > + int rn; > > + char fullname[80]; > > + > > + rn = snprintf(fullname, sizeof(fullname), "HAProxy %s", > name); > > + if (rn < 0 || prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, > (uintptr_t)shctx, > > +totalsize, (uintptr_t)fullname) != 0) { > > + munmap(shctx, totalsize); > > + shctx = NULL; > > + ret = SHCTX_E_ALLOC_CACHE; > > + goto err; > > + } > > + } > > +#endif > > You're making the mapping fail on any kernel that does not support the > feature (hence apparently anything < 5.17 according to your description). > IMHO we should just silently fall back to the default behavior, Usually when using this feature, it is what is usually done indeed. However, I thought with HAProxy you would prefer this behavior I guess I was wrong :) Ok changing it. > i.e. we > couldn't assign the name, fine, we continue to run without a name, thus > something like this: > > rn = snprintf(fullname, sizeof(fullname), "HAProxy %s", name); > if (rn >= 0) > prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (uintptr_t)shctx, > totalsize, (uintptr_t)fullname); > > Could you please adjust it and verify that it's still OK for you ? > Likewise I can check it on a 5.15 here. > > Thank you! > Willy >
Fwd: [PATCH] MEDIUM: shctx naming shared memory context
-- Forwarded message - From: David CARLIER Date: Wed, 24 Apr 2024 at 07:56 Subject: Re: [PATCH] MEDIUM: shctx naming shared memory context To: Willy Tarreau Here a new version. Cheers. On Wed, 24 Apr 2024 at 07:45, Willy Tarreau wrote: > Hi David, > > On Sat, Apr 20, 2024 at 07:33:16AM +0100, David CARLIER wrote: > > From d49d9d5966caead320f33f789578cb69f2aa3787 Mon Sep 17 00:00:00 2001 > > From: David Carlier > > Date: Sat, 20 Apr 2024 07:18:48 +0100 > > Subject: [PATCH] MEDIUM: shctx: Naming shared memory context > > > > From Linux 5.17, anonymous regions can be name via prctl/PR_SET_VMA > > so caches can be identified when looking at HAProxy process memory > > mapping. > > That's pretty cool, I wasn't aware of this feature, that's really > interesting! > > However I disagree with this point: > > > +#if defined(USE_PRCTL) && defined(PR_SET_VMA) > > + { > > + /** > > + * From Linux 5.17 (and if the `CONFIG_ANON_VMA_NAME` > kernel config is set)`, > > + * anonymous regions can be named. > > + * > > + * The naming can take up to 79 characters, accepting > valid ASCII values > > + * except [, ], \, $ and '. > > + * As a result, when looking for /proc//maps, we can > see the anonymous range > > + * as follow : > > + * `7364c4fff000-73650800 rw-s 00:01 3540 > [anon_shmem:HAProxy globalCache]` > > + */ > > + int rn; > > + char fullname[80]; > > + > > + rn = snprintf(fullname, sizeof(fullname), "HAProxy %s", > name); > > + if (rn < 0 || prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, > (uintptr_t)shctx, > > +totalsize, (uintptr_t)fullname) != 0) { > > + munmap(shctx, totalsize); > > + shctx = NULL; > > + ret = SHCTX_E_ALLOC_CACHE; > > + goto err; > > + } > > + } > > +#endif > > You're making the mapping fail on any kernel that does not support the > feature (hence apparently anything < 5.17 according to your description). > IMHO we should just silently fall back to the default behavior, i.e. we > couldn't assign the name, fine, we continue to run without a name, thus > something like this: > > rn = snprintf(fullname, sizeof(fullname), "HAProxy %s", name); > if (rn >= 0) > prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (uintptr_t)shctx, > totalsize, (uintptr_t)fullname); > > Could you please adjust it and verify that it's still OK for you ? > Likewise I can check it on a 5.15 here. > > Thank you! > Willy > From 72542db6b65994f1755d161fad04828f435f7134 Mon Sep 17 00:00:00 2001 From: David Carlier Date: Sat, 20 Apr 2024 07:18:48 +0100 Subject: [PATCH] MEDIUM: shctx: Naming shared memory context From Linux 5.17, anonymous regions can be name via prctl/PR_SET_VMA so caches can be identified when looking at HAProxy process memory mapping. The most possible error is lack of kernel support, as a result we ignore it, if the naming fails the mapping of memory context ought to still occur. --- include/haproxy/shctx.h | 2 +- src/cache.c | 2 +- src/shctx.c | 34 +++--- src/ssl_sock.c | 2 +- 4 files changed, 34 insertions(+), 6 deletions(-) diff --git a/include/haproxy/shctx.h b/include/haproxy/shctx.h index a57cf1567..01bb09d32 100644 --- a/include/haproxy/shctx.h +++ b/include/haproxy/shctx.h @@ -21,7 +21,7 @@ int shctx_init(struct shared_context **orig_shctx, int maxblocks, int blocksize, unsigned int maxobjsz, - int extra); + int extra, __maybe_unused const char *name); struct shared_block *shctx_row_reserve_hot(struct shared_context *shctx, struct shared_block *last, int data_len); void shctx_row_detach(struct shared_context *shctx, struct shared_block *first); diff --git a/src/cache.c b/src/cache.c index 4a9992a19..25f20a782 100644 --- a/src/cache.c +++ b/src/cache.c @@ -2471,7 +2471,7 @@ int post_check_cache() list_for_each_entry_safe(cache_config, back, _config, list) { ret_shctx = shctx_init(, cache_config->maxblocks, CACHE_BLOCKSIZE, - cache_config->maxobjsz, sizeof(struct cache)); + cache_config->maxobjsz, sizeof(struct cache), cache_config->id); if (ret_shctx <= 0) { if (ret_shctx == SHCTX_E_INIT_LOCK) diff --git a/src/shctx.c b/src/shctx.c index be5905392..6c7ad172d 100644 --- a/src/shctx.c +++ b/src/shctx.c @@ -16,6 +16,9 @@ #include #include #include +#if defined(USE_PRCTL) +#include +#endif /* * Reserve a new row if is null, put it in the hotlist, set the refcount to 1 @@ -269,13 +272,14 @@ int shctx_row_data_get(struct shared_context *shctx, struct shared_block *first, * and 0 if cache is already allocated. */ int
Re: [PATCH] MEDIUM: shctx naming shared memory context
Hi David, On Sat, Apr 20, 2024 at 07:33:16AM +0100, David CARLIER wrote: > From d49d9d5966caead320f33f789578cb69f2aa3787 Mon Sep 17 00:00:00 2001 > From: David Carlier > Date: Sat, 20 Apr 2024 07:18:48 +0100 > Subject: [PATCH] MEDIUM: shctx: Naming shared memory context > > From Linux 5.17, anonymous regions can be name via prctl/PR_SET_VMA > so caches can be identified when looking at HAProxy process memory > mapping. That's pretty cool, I wasn't aware of this feature, that's really interesting! However I disagree with this point: > +#if defined(USE_PRCTL) && defined(PR_SET_VMA) > + { > + /** > + * From Linux 5.17 (and if the `CONFIG_ANON_VMA_NAME` kernel > config is set)`, > + * anonymous regions can be named. > + * > + * The naming can take up to 79 characters, accepting valid > ASCII values > + * except [, ], \, $ and '. > + * As a result, when looking for /proc//maps, we can see > the anonymous range > + * as follow : > + * `7364c4fff000-73650800 rw-s 00:01 3540 > [anon_shmem:HAProxy globalCache]` > + */ > + int rn; > + char fullname[80]; > + > + rn = snprintf(fullname, sizeof(fullname), "HAProxy %s", name); > + if (rn < 0 || prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, > (uintptr_t)shctx, > +totalsize, (uintptr_t)fullname) != 0) { > + munmap(shctx, totalsize); > + shctx = NULL; > + ret = SHCTX_E_ALLOC_CACHE; > + goto err; > + } > + } > +#endif You're making the mapping fail on any kernel that does not support the feature (hence apparently anything < 5.17 according to your description). IMHO we should just silently fall back to the default behavior, i.e. we couldn't assign the name, fine, we continue to run without a name, thus something like this: rn = snprintf(fullname, sizeof(fullname), "HAProxy %s", name); if (rn >= 0) prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (uintptr_t)shctx, totalsize, (uintptr_t)fullname); Could you please adjust it and verify that it's still OK for you ? Likewise I can check it on a 5.15 here. Thank you! Willy
Re: [PATCH 2/3] MINOR: Add `ha_generate_uuid_v7`
Hi Tim! On Fri, Apr 19, 2024 at 09:01:26PM +0200, Tim Duesterhus wrote: > +/* Generates a draft-ietf-uuidrev-rfc4122bis-14 version 7 UUID into chunk > + * which must be at least 37 bytes large. > + */ > +void ha_generate_uuid_v7(struct buffer *output) > +{ > + uint32_t rnd[3]; > + uint64_t last; > + uint64_t time; > + > + time = (date.tv_sec * 1000) + (date.tv_usec / 1000); > + last = ha_random64(); > + rnd[0] = last; > + rnd[1] = last >> 32; > + > + last = ha_random64(); > + rnd[2] = last; > + > + chunk_printf(output, "%8.8x-%4.4x-%4.4x-%4.4x-%12.12llx", > + (uint)(time >> 16u), > + (uint)(time & 0x), > + ((rnd[0] >> 16u) & 0xFFF) | 0x7000, // highest 4 bits > indicate the uuid version > + (rnd[1] & 0x3FFF) | 0x8000, // the highest 2 bits > indicate the UUID variant (10), > + (long long)((rnd[1] >> 14u) | ((uint64_t) rnd[2] << 18u)) > & 0xull); > +} Just thinking about all the shifts above, I think you could have gone through less efforts by acting on 64-bit randoms (less shifts). But the difference is probably not that much anyway. In any case, that looks good and I've merged it now. Many thanks! Willy