Re: [PATCH] FEATURE: Adding MPTCP with option to disable it and fall-back to TCP

2024-04-24 Thread Willy Tarreau
Hi!

On Wed, Apr 24, 2024 at 05:45:03PM +0200, Nicolas CARPi wrote:
> Hello,
> 
> On 24 Apr, Dorian Craps wrote:
> > This attached patch uses MPTCP by default instead of TCP on Linux. 
> The backward compatibility of MPTCP is indeed a good point toward 
> enabling it by default. Nonetheless, I feel your message should include 
> a discussion on the security implications of this change.
> 
> As you know, v0 had security issues. v1 addresses them, and we can 
> likely consider that new attacks targeting this protocol will pop up as 
> it becomes widespread.
> 
> In fact, that's already the case:
> 
> See: CVE-2024-26708: mptcp: really cope with fastopen race
> or CVE-2024-26826: mptcp: fix data re-injection from stale subflow
> or CVE-2024-26782 kernel: mptcp: fix double-free on socket dismantle
> 
> The three CVEs above are all from April 2024.
> 
> Given that MPTCP v1 is relatively new (2020), and that we do not have 
> real assurances that it is at least as secure as plain TCP, I would 
> humbly suggest inverting the logic, and making it an opt-in option.
> 
> This way, a vulnerability impacting MPTCP would only impact users that 
> enabled it, instead of 100% of HAProxy users. In a few years, making it 
> the default could be reconsidered.
> 
> Please note that I'm simply voicing my concern as a user, and the core 
> dev team might have a very different view about these aspects.
> 
> > It sounds good to have MPTCP enabled by default
> Except when looking at it through the prism of the increased attack 
> surface! ;)
> 
> > IPPROTO_MPTCP is defined just in case old libC are being used and 
> > don't have the ref.
> Shouldn't it be defined with a value, as per 
> https://www.mptcp.dev/faq.html#why-is-ipproto_mptcp-not-defined ?
> (sorry if it's a dumb remark, I'm not a C dev)

Without going into all the details and comments above, I just want to
say that I'll study this after 3.0 is released, since there's still a
massive amount of stuff to adjust for the release and we're way past
a feature freeze. I *am* interested in the feature, which has been
floating around for a few years already. However I tend to agree with
Nicolas that, at least for the principle of least surprise, I'd rather
not have this turned on by default. There might even be security
implications that some are not willing to take yet, and forcing them
to guess that they need to opt out of something is not great in general
(it might change in the future though, once everyone turns it on by
default, like we did for keep-alive, threads, and h2 for example).

I'm also concerned by the change at the socket layer that will make all
new sockets cause two syscalls on systems where this is not enabled,
and I'd really really prefer that we use the extended syntax for
addresses that offers a lot of flexibility. We can definitely have
"mptcp@address" and probably mptcp as a variant of tcp. Regarding how
we'd deal with the fallback, we'll see, but overall, thinking that
someone would explicitly configure mptcp and not even get an error if
it cannot bind is problematic to me, because among the most common
causes of trouble are things that everyone ignores (module not loaded,
missing secondary IP, firewall rules etc) that usually cause problems.
Those we can discover at boot time should definitely be reported. But
let's discuss this in early June instead (I mean, feel free to discuss
the points together, but I'm not going to invest more time on this at
this moment).

Thanks!
Willy



Re: [PATCH] FEATURE: Adding MPTCP with option to disable it and fall-back to TCP

2024-04-24 Thread Nicolas CARPi
Hello,

On 24 Apr, Dorian Craps wrote:
> This attached patch uses MPTCP by default instead of TCP on Linux. 
The backward compatibility of MPTCP is indeed a good point toward 
enabling it by default. Nonetheless, I feel your message should include 
a discussion on the security implications of this change.

As you know, v0 had security issues. v1 addresses them, and we can 
likely consider that new attacks targeting this protocol will pop up as 
it becomes widespread.

In fact, that's already the case:

See: CVE-2024-26708: mptcp: really cope with fastopen race
or CVE-2024-26826: mptcp: fix data re-injection from stale subflow
or CVE-2024-26782 kernel: mptcp: fix double-free on socket dismantle

The three CVEs above are all from April 2024.

Given that MPTCP v1 is relatively new (2020), and that we do not have 
real assurances that it is at least as secure as plain TCP, I would 
humbly suggest inverting the logic, and making it an opt-in option.

This way, a vulnerability impacting MPTCP would only impact users that 
enabled it, instead of 100% of HAProxy users. In a few years, making it 
the default could be reconsidered.

Please note that I'm simply voicing my concern as a user, and the core 
dev team might have a very different view about these aspects.

> It sounds good to have MPTCP enabled by default
Except when looking at it through the prism of the increased attack 
surface! ;)

> IPPROTO_MPTCP is defined just in case old libC are being used and 
> don't have the ref.
Shouldn't it be defined with a value, as per 
https://www.mptcp.dev/faq.html#why-is-ipproto_mptcp-not-defined ?
(sorry if it's a dumb remark, I'm not a C dev)

Best regards,
~Nicolas



[PATCH] FEATURE: Adding MPTCP with option to disable it and fall-back to TCP

2024-04-24 Thread Dorian Craps
From: Dorian Craps 

Multipath TCP (MPTCP), standardized in RFC8684 [1], is a TCP extension
that enables a TCP connection to use different paths.

Multipath TCP has been used for several use cases. On smartphones, MPTCP
enables seamless handovers between cellular and Wi-Fi networks while
preserving established connections. This use-case is what pushed Apple
to use MPTCP since 2013 in multiple applications [2]. On dual-stack
hosts, Multipath TCP enables the TCP connection to automatically use the
best performing path, either IPv4 or IPv6. If one path fails, MPTCP
automatically uses the other path.

To benefit from MPTCP, both the client and the server have to support
it. Multipath TCP is a backward-compatible TCP extension that is enabled
by default on recent Linux distributions (Debian, Ubuntu, Redhat, ...).
Multipath TCP is included in the Linux kernel since version 5.6 [3]. To
use it on Linux, an application must explicitly enable it when creating
the socket. No need to change anything else in the application.

This attached patch uses MPTCP by default instead of TCP on Linux. There
is a fallback if the creation of the MPTCP socket fails. A new option
has been added in the config to be able to disable MPTCP support.

It sounds good to have MPTCP enabled by default, so the client can
decide when to use it or not. If the client didn't ask to use MPTCP, the
kernel will return a "plain" TCP socket to the server application after
an "accept()". [4]

IPPROTO_MPTCP is defined just in case old libC are being used and don't
have the ref. The running kernel is the only one who can tell if MPTCP
is supported or not, it is probably best not to check that at build
time.

TCP_MAXSEG is currently not supported by MPTCP: is it an issue? MPTCP
devs didn't add a support for it because it has not been requested with
a use-case. If you think it is important, I can report that to them.

Due to the limited impact within a data center environment, MPTCP
support has only been added on the listening sockets, then not between
the proxy and the servers. The high-speed, low-latency nature of data
center networks reduces the benefits of MPTCP, making the complexity of
its implementation unnecessary in this context.

Developed with the help of Matthieu Baerts (matt...@kernel.org) and
Olivier Bonaventure (olivier.bonavent...@uclouvain.be)

Link: https://www.rfc-editor.org/rfc/rfc8684.html [1]
Link: https://www.tessares.net/apples-mptcp-story-so-far/ [2]
Link: https://www.mptcp.dev/ [3]
Link: 
https://www.mptcp.dev/faq.html#why--when-should-mptcp-be-enabled-by-default [4]
---
 doc/configuration.txt|  4 
 include/haproxy/global-t.h   |  1 +
 include/haproxy/protocol-t.h |  7 +++
 src/cfgparse-global.c|  7 ++-
 src/cfgparse.c   |  2 +-
 src/proto_rhttp.c|  5 +
 src/proto_tcp.c  | 10 ++
 src/protocol.c   | 12 +++-
 src/sock_inet.c  | 21 +++--
 9 files changed, 64 insertions(+), 5 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index d2d654c19..85b75de33 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -1342,6 +1342,7 @@ The following keywords are supported in the "global" 
section :
- maxsslrate
- maxzlibmem
- no-memory-trimming
+   - no-mptcp
- noepoll
- noevports
- nogetaddrinfo
@@ -2974,6 +2975,9 @@ no-memory-trimming
   nice with the new process. Note that advanced memory allocators usually do
   not suffer from such a problem.
 
+no-mptcp
+  Disables Multipath TCP (MPTCP) support when the TCP protocol is requested.
+
 noepoll
   Disables the use of the "epoll" event polling system on Linux. It is
   equivalent to the command-line argument "-de". The next polling system
diff --git a/include/haproxy/global-t.h b/include/haproxy/global-t.h
index 92d2c6bc1..c2b81fb50 100644
--- a/include/haproxy/global-t.h
+++ b/include/haproxy/global-t.h
@@ -85,6 +85,7 @@
 #define GTUNE_LISTENER_MQ_OPT(1<<28)
 #define GTUNE_LISTENER_MQ_ANY(GTUNE_LISTENER_MQ_FAIR | 
GTUNE_LISTENER_MQ_OPT)
 #define GTUNE_QUIC_CC_HYSTART(1<<29)
+#define GTUNE_NO_MPTCP   (1<<30)
 
 #define NO_ZERO_COPY_FWD 0x0001 /* Globally disable zero-copy FF */
 #define NO_ZERO_COPY_FWD_PT  0x0002 /* disable zero-copy FF for PT 
(recv & send are disabled automatically) */
diff --git a/include/haproxy/protocol-t.h b/include/haproxy/protocol-t.h
index b85f29cc0..6a7d45c52 100644
--- a/include/haproxy/protocol-t.h
+++ b/include/haproxy/protocol-t.h
@@ -28,6 +28,12 @@
 #include 
 #include 
 
+#ifdef __linux__
+#ifndef IPPROTO_MPTCP
+#define IPPROTO_MPTCP
+#endif
+#endif
+
 /* some pointer types referenced below */
 struct listener;
 struct receiver;
@@ -99,6 +105,7 @@ struct protocol {
enum proto_type proto_type; /* protocol type at the 
socket layer (PROTO_TYPE_*) */
int sock_type;  /* socket 

Re: Fwd: [PATCH] MEDIUM: shctx naming shared memory context

2024-04-24 Thread Willy Tarreau
On Wed, Apr 24, 2024 at 09:53:04AM +0100, David CARLIER wrote:
> -- Forwarded message -
> From: David CARLIER 
> Date: Wed, 24 Apr 2024 at 07:56
> Subject: Re: [PATCH] MEDIUM: shctx naming shared memory context
> To: Willy Tarreau 
> 
> 
> Here a new version.

Works fine and applied, thank you. I'm seeing opportunities to use that
in other places like rings. It could require some tiny API changes however
to consistently pass the name. But that would be nice for ring buffers used
by traces, logs, startup-logs etc. Maybe we'll then decide to re-adjust the
displayed name to prepend a type. E.g "haproxy cache: my_cache_name" etc.
We're large with 80 chars since our identifiers are apparently limited to 32.

Cheers,
Willy



Fwd: [PATCH] MEDIUM: shctx naming shared memory context

2024-04-24 Thread David CARLIER
-- Forwarded message -
From: David CARLIER 
Date: Wed, 24 Apr 2024 at 07:51
Subject: Re: [PATCH] MEDIUM: shctx naming shared memory context
To: Willy Tarreau 


Hi

On Wed, 24 Apr 2024 at 07:45, Willy Tarreau  wrote:

> Hi David,
>
> On Sat, Apr 20, 2024 at 07:33:16AM +0100, David CARLIER wrote:
> > From d49d9d5966caead320f33f789578cb69f2aa3787 Mon Sep 17 00:00:00 2001
> > From: David Carlier 
> > Date: Sat, 20 Apr 2024 07:18:48 +0100
> > Subject: [PATCH] MEDIUM: shctx: Naming shared memory context
> >
> > From Linux 5.17, anonymous regions can be name via prctl/PR_SET_VMA
> > so caches can be identified when looking at HAProxy process memory
> > mapping.
>
> That's pretty cool, I wasn't aware of this feature, that's really
> interesting!
>
> However I disagree with this point:
>
> > +#if defined(USE_PRCTL) && defined(PR_SET_VMA)
> > + {
> > + /**
> > +  * From Linux 5.17 (and if the `CONFIG_ANON_VMA_NAME`
> kernel config is set)`,
> > +  * anonymous regions can be named.
> > +  *
> > +  * The naming can take up to 79 characters, accepting
> valid ASCII values
> > +  * except [, ], \, $ and '.
> > +  * As a result, when looking for /proc//maps, we can
> see the anonymous range
> > +  * as follow :
> > +  * `7364c4fff000-73650800 rw-s  00:01 3540
> [anon_shmem:HAProxy globalCache]`
> > +  */
> > + int rn;
> > + char fullname[80];
> > +
> > + rn = snprintf(fullname, sizeof(fullname), "HAProxy %s",
> name);
> > + if (rn < 0 || prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
> (uintptr_t)shctx,
> > +totalsize, (uintptr_t)fullname) != 0) {
> > + munmap(shctx, totalsize);
> > + shctx = NULL;
> > + ret = SHCTX_E_ALLOC_CACHE;
> > + goto err;
> > + }
> > + }
> > +#endif
>
> You're making the mapping fail on any kernel that does not support the
> feature (hence apparently anything < 5.17 according to your description).
> IMHO we should just silently fall back to the default behavior,


Usually when using this feature, it is what is usually done indeed. However,
 I thought with HAProxy you would prefer this behavior I guess I was wrong
:)
Ok changing it.





> i.e. we
> couldn't assign the name, fine, we continue to run without a name, thus
> something like this:
>
> rn = snprintf(fullname, sizeof(fullname), "HAProxy %s", name);
> if (rn >= 0)
> prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (uintptr_t)shctx,
> totalsize, (uintptr_t)fullname);
>
> Could you please adjust it and verify that it's still OK for you ?
> Likewise I can check it on a 5.15 here.
>
> Thank you!
> Willy
>


Fwd: [PATCH] MEDIUM: shctx naming shared memory context

2024-04-24 Thread David CARLIER
-- Forwarded message -
From: David CARLIER 
Date: Wed, 24 Apr 2024 at 07:56
Subject: Re: [PATCH] MEDIUM: shctx naming shared memory context
To: Willy Tarreau 


Here a new version.

Cheers.

On Wed, 24 Apr 2024 at 07:45, Willy Tarreau  wrote:

> Hi David,
>
> On Sat, Apr 20, 2024 at 07:33:16AM +0100, David CARLIER wrote:
> > From d49d9d5966caead320f33f789578cb69f2aa3787 Mon Sep 17 00:00:00 2001
> > From: David Carlier 
> > Date: Sat, 20 Apr 2024 07:18:48 +0100
> > Subject: [PATCH] MEDIUM: shctx: Naming shared memory context
> >
> > From Linux 5.17, anonymous regions can be name via prctl/PR_SET_VMA
> > so caches can be identified when looking at HAProxy process memory
> > mapping.
>
> That's pretty cool, I wasn't aware of this feature, that's really
> interesting!
>
> However I disagree with this point:
>
> > +#if defined(USE_PRCTL) && defined(PR_SET_VMA)
> > + {
> > + /**
> > +  * From Linux 5.17 (and if the `CONFIG_ANON_VMA_NAME`
> kernel config is set)`,
> > +  * anonymous regions can be named.
> > +  *
> > +  * The naming can take up to 79 characters, accepting
> valid ASCII values
> > +  * except [, ], \, $ and '.
> > +  * As a result, when looking for /proc//maps, we can
> see the anonymous range
> > +  * as follow :
> > +  * `7364c4fff000-73650800 rw-s  00:01 3540
> [anon_shmem:HAProxy globalCache]`
> > +  */
> > + int rn;
> > + char fullname[80];
> > +
> > + rn = snprintf(fullname, sizeof(fullname), "HAProxy %s",
> name);
> > + if (rn < 0 || prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
> (uintptr_t)shctx,
> > +totalsize, (uintptr_t)fullname) != 0) {
> > + munmap(shctx, totalsize);
> > + shctx = NULL;
> > + ret = SHCTX_E_ALLOC_CACHE;
> > + goto err;
> > + }
> > + }
> > +#endif
>
> You're making the mapping fail on any kernel that does not support the
> feature (hence apparently anything < 5.17 according to your description).
> IMHO we should just silently fall back to the default behavior, i.e. we
> couldn't assign the name, fine, we continue to run without a name, thus
> something like this:
>
> rn = snprintf(fullname, sizeof(fullname), "HAProxy %s", name);
> if (rn >= 0)
> prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (uintptr_t)shctx,
> totalsize, (uintptr_t)fullname);
>
> Could you please adjust it and verify that it's still OK for you ?
> Likewise I can check it on a 5.15 here.
>
> Thank you!
> Willy
>
From 72542db6b65994f1755d161fad04828f435f7134 Mon Sep 17 00:00:00 2001
From: David Carlier 
Date: Sat, 20 Apr 2024 07:18:48 +0100
Subject: [PATCH] MEDIUM: shctx: Naming shared memory context

From Linux 5.17, anonymous regions can be name via prctl/PR_SET_VMA
so caches can be identified when looking at HAProxy process memory
mapping.
The most possible error is lack of kernel support, as a result
we ignore it, if the naming fails the mapping of memory context
ought to still occur.
---
 include/haproxy/shctx.h |  2 +-
 src/cache.c |  2 +-
 src/shctx.c | 34 +++---
 src/ssl_sock.c  |  2 +-
 4 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/include/haproxy/shctx.h b/include/haproxy/shctx.h
index a57cf1567..01bb09d32 100644
--- a/include/haproxy/shctx.h
+++ b/include/haproxy/shctx.h
@@ -21,7 +21,7 @@
 
 int shctx_init(struct shared_context **orig_shctx,
int maxblocks, int blocksize, unsigned int maxobjsz,
-   int extra);
+   int extra, __maybe_unused const char *name);
 struct shared_block *shctx_row_reserve_hot(struct shared_context *shctx,
struct shared_block *last, int data_len);
 void shctx_row_detach(struct shared_context *shctx, struct shared_block *first);
diff --git a/src/cache.c b/src/cache.c
index 4a9992a19..25f20a782 100644
--- a/src/cache.c
+++ b/src/cache.c
@@ -2471,7 +2471,7 @@ int post_check_cache()
 	list_for_each_entry_safe(cache_config, back, _config, list) {
 
 		ret_shctx = shctx_init(, cache_config->maxblocks, CACHE_BLOCKSIZE,
-		   cache_config->maxobjsz, sizeof(struct cache));
+		   cache_config->maxobjsz, sizeof(struct cache), cache_config->id);
 
 		if (ret_shctx <= 0) {
 			if (ret_shctx == SHCTX_E_INIT_LOCK)
diff --git a/src/shctx.c b/src/shctx.c
index be5905392..6c7ad172d 100644
--- a/src/shctx.c
+++ b/src/shctx.c
@@ -16,6 +16,9 @@
 #include 
 #include 
 #include 
+#if defined(USE_PRCTL)
+#include 
+#endif
 
 /*
  * Reserve a new row if  is null, put it in the hotlist, set the refcount to 1
@@ -269,13 +272,14 @@ int shctx_row_data_get(struct shared_context *shctx, struct shared_block *first,
  * and 0 if cache is already allocated.
  */
 int 

Re: [PATCH] MEDIUM: shctx naming shared memory context

2024-04-24 Thread Willy Tarreau
Hi David,

On Sat, Apr 20, 2024 at 07:33:16AM +0100, David CARLIER wrote:
> From d49d9d5966caead320f33f789578cb69f2aa3787 Mon Sep 17 00:00:00 2001
> From: David Carlier 
> Date: Sat, 20 Apr 2024 07:18:48 +0100
> Subject: [PATCH] MEDIUM: shctx: Naming shared memory context
> 
> From Linux 5.17, anonymous regions can be name via prctl/PR_SET_VMA
> so caches can be identified when looking at HAProxy process memory
> mapping.

That's pretty cool, I wasn't aware of this feature, that's really
interesting!

However I disagree with this point:

> +#if defined(USE_PRCTL) && defined(PR_SET_VMA)
> + {
> + /**
> +  * From Linux 5.17 (and if the `CONFIG_ANON_VMA_NAME` kernel 
> config is set)`,
> +  * anonymous regions can be named.
> +  *
> +  * The naming can take up to 79 characters, accepting valid 
> ASCII values
> +  * except [, ], \, $ and '.
> +  * As a result, when looking for /proc//maps, we can see 
> the anonymous range
> +  * as follow :
> +  * `7364c4fff000-73650800 rw-s  00:01 3540  
> [anon_shmem:HAProxy globalCache]`
> +  */
> + int rn;
> + char fullname[80];
> +
> + rn = snprintf(fullname, sizeof(fullname), "HAProxy %s", name);
> + if (rn < 0 || prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, 
> (uintptr_t)shctx,
> +totalsize, (uintptr_t)fullname) != 0) {
> + munmap(shctx, totalsize);
> + shctx = NULL;
> + ret = SHCTX_E_ALLOC_CACHE;
> + goto err;
> + }
> + }
> +#endif

You're making the mapping fail on any kernel that does not support the
feature (hence apparently anything < 5.17 according to your description).
IMHO we should just silently fall back to the default behavior, i.e. we
couldn't assign the name, fine, we continue to run without a name, thus
something like this:

rn = snprintf(fullname, sizeof(fullname), "HAProxy %s", name);
if (rn >= 0)
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (uintptr_t)shctx, 
totalsize, (uintptr_t)fullname);

Could you please adjust it and verify that it's still OK for you ?
Likewise I can check it on a 5.15 here.

Thank you!
Willy



Re: [PATCH 2/3] MINOR: Add `ha_generate_uuid_v7`

2024-04-24 Thread Willy Tarreau
Hi Tim!

On Fri, Apr 19, 2024 at 09:01:26PM +0200, Tim Duesterhus wrote:
> +/* Generates a draft-ietf-uuidrev-rfc4122bis-14 version 7 UUID into chunk
> + *  which must be at least 37 bytes large.
> + */
> +void ha_generate_uuid_v7(struct buffer *output)
> +{
> + uint32_t rnd[3];
> + uint64_t last;
> + uint64_t time;
> +
> + time = (date.tv_sec * 1000) + (date.tv_usec / 1000);
> + last = ha_random64();
> + rnd[0] = last;
> + rnd[1] = last >> 32;
> +
> + last = ha_random64();
> + rnd[2] = last;
> +
> + chunk_printf(output, "%8.8x-%4.4x-%4.4x-%4.4x-%12.12llx",
> +  (uint)(time >> 16u),
> +  (uint)(time & 0x),
> +  ((rnd[0] >> 16u) & 0xFFF) | 0x7000,  // highest 4 bits 
> indicate the uuid version
> +  (rnd[1] & 0x3FFF) | 0x8000,  // the highest 2 bits 
> indicate the UUID variant (10),
> +  (long long)((rnd[1] >> 14u) | ((uint64_t) rnd[2] << 18u)) 
> & 0xull);
> +}

Just thinking about all the shifts above, I think you could have
gone through less efforts by acting on 64-bit randoms (less shifts).
But the difference is probably not that much anyway.

In any case, that looks good and I've merged it now. Many thanks!
Willy