Re: [regression] TCP_MD5SIG on established sockets

2020-07-01 Thread Eric Dumazet
On Tue, Jun 30, 2020 at 3:07 PM Eric Dumazet  wrote:

> Oh well, tcp_syn_options() is supposed to have the same logic.
>
> Maybe we have an issue with SYNCOOKIES (with MD5 + TS + SACK)
>
> Nice can of worms.


Yes, MD5 does not like SYNCOOKIES in some cases.

In this trace, S is a linux host, the SYNACK is a syncookie.

C host is attempting a SYN with MD5+TS+SACK, which a standard linux
host would not attempt.
But we could expect another stack to attempt this combination.

TCP stack believes it can encode selected TCP options (in the TSVAL value),
but since MD5 option is set, TS option is not sent in the SYNACK.
However we still send other options that MUST not be sent if TS option
could not fit the TCP option space.

10:12:15.464591 IP C > S: Flags [S], seq 4202415601, win 65535,
options [nop,nop,md5 valid,mss 65495,sackOK,TS val 456965269 ecr
0,nop,wscale 8], length 0
10:12:15.464602 IP S > C: Flags [S.], seq 253516766, ack 4202415602,
win 65535, options [nop,nop,md5 valid,mss
65495,nop,nop,sackOK,nop,wscale 8], length 0



10:12:15.464611 IP C > S: Flags [.], ack 1, win 256, options
[nop,nop,md5 valid], length 0
10:12:15.464678 IP C > S: Flags [P.], seq 1:13, ack 1, win 256,
options [nop,nop,md5 valid], length 12
10:12:15.464685 IP S > C: Flags [.], ack 13, win 65535, options
[nop,nop,md5 valid], length 0

We can see for instance the windows are wrong by a 256 factor (wscale 8)


Re: [regression] TCP_MD5SIG on established sockets

2020-07-01 Thread Eric Dumazet
On Wed, Jul 1, 2020 at 5:19 AM Mathieu Desnoyers
 wrote:

> The approach below looks good to me, but you'll also need to annotate
> both tcp_md5_hash_key and tcp_md5_do_add with __no_kcsan or use
> data_race(expr) to let the concurrency sanitizer know that there is
> a known data race which is there on purpose (triggered by memcpy in 
> tcp_md5_do_add
> and somewhere within crypto_ahash_update). See 
> Documentation/dev-tools/kcsan.rst
> for details.

Sure, I can add a data_race() and let stable teams handle the
backports without it ;)

>
> Thanks,
>
> Mathieu
>
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index
> > f111660453241692a17c881dd6dc2910a1236263..c3af8180c7049d5c4987bf5c67e4aff2ed6967c9
> > 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -4033,11 +4033,9 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data);
> >
> > int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct
> > tcp_md5sig_key *key)
> > {
> > -   u8 keylen = key->keylen;
> > +   u8 keylen = READ_ONCE(key->keylen); /* paired with
> > WRITE_ONCE() in tcp_md5_do_add */
> >struct scatterlist sg;
> >
> > -   smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */
> > -
> >sg_init_one(, key->key, keylen);
> >ahash_request_set_crypt(hp->md5_req, , NULL, keylen);
> >return crypto_ahash_update(hp->md5_req);
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index
> > 99916fcc15ca0be12c2c133ff40516f79e6fdf7f..0d08e0134335a21d23702e6a5c24a0f2b3c61c6f
> > 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1114,9 +1114,13 @@ int tcp_md5_do_add(struct sock *sk, const union
> > tcp_md5_addr *addr,
> >/* Pre-existing entry - just update that one. */
> >memcpy(key->key, newkey, newkeylen);
> >
> > -   smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */
> > +   /* Pairs with READ_ONCE() in tcp_md5_hash_key().
> > +* Also note that a reader could catch new key->keylen value
> > +* but old key->key[], this is the reason we use __GFP_ZERO
> > +* at sock_kmalloc() time below these lines.
> > +*/
> > +   WRITE_ONCE(key->keylen, newkeylen);
> >
> > -   key->keylen = newkeylen;
> >return 0;
> >}
> >
> > @@ -1132,7 +1136,7 @@ int tcp_md5_do_add(struct sock *sk, const union
> > tcp_md5_addr *addr,
> >rcu_assign_pointer(tp->md5sig_info, md5sig);
> >}
> >
> > -   key = sock_kmalloc(sk, sizeof(*key), gfp);
> > +   key = sock_kmalloc(sk, sizeof(*key), gfp | __GFP_ZERO);
> >if (!key)
> >return -ENOMEM;
> > if (!tcp_alloc_md5sig_pool()) {
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com


Re: [regression] TCP_MD5SIG on established sockets

2020-07-01 Thread Mathieu Desnoyers
- On Jun 30, 2020, at 11:36 PM, Eric Dumazet eduma...@google.com wrote:

> On Tue, Jun 30, 2020 at 7:59 PM Herbert Xu  
> wrote:
>>
>> On Tue, Jun 30, 2020 at 07:30:43PM -0700, Eric Dumazet wrote:
>> >
>> > I made this clear in the changelog, do we want comments all over the 
>> > places ?
>> > Do not get me wrong, we had this bug for years and suddenly this is a
>> > big deal...
>>
>> I thought you were adding a new pair of smp_rmb/smp_wmb.  If they
>> already exist in the code then I agree it's not a big deal.  But
>> adding a new pair of bogus smp_Xmb's is bad for maintenance.
>>
> 
> If I knew so many people were excited about TCP / MD5, I would have
> posted all my patches on lkml ;)
> 
> Without the smp_wmb() we would still need something to prevent KMSAN
> from detecting that we read uninitialized bytes,
> if key->keylen is increased.  (initial content of key->key[] is garbage)
> 
> Something like this :

The approach below looks good to me, but you'll also need to annotate
both tcp_md5_hash_key and tcp_md5_do_add with __no_kcsan or use
data_race(expr) to let the concurrency sanitizer know that there is
a known data race which is there on purpose (triggered by memcpy in 
tcp_md5_do_add
and somewhere within crypto_ahash_update). See Documentation/dev-tools/kcsan.rst
for details.

Thanks,

Mathieu

> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index
> f111660453241692a17c881dd6dc2910a1236263..c3af8180c7049d5c4987bf5c67e4aff2ed6967c9
> 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4033,11 +4033,9 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data);
> 
> int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct
> tcp_md5sig_key *key)
> {
> -   u8 keylen = key->keylen;
> +   u8 keylen = READ_ONCE(key->keylen); /* paired with
> WRITE_ONCE() in tcp_md5_do_add */
>struct scatterlist sg;
> 
> -   smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */
> -
>sg_init_one(, key->key, keylen);
>ahash_request_set_crypt(hp->md5_req, , NULL, keylen);
>return crypto_ahash_update(hp->md5_req);
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index
> 99916fcc15ca0be12c2c133ff40516f79e6fdf7f..0d08e0134335a21d23702e6a5c24a0f2b3c61c6f
> 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1114,9 +1114,13 @@ int tcp_md5_do_add(struct sock *sk, const union
> tcp_md5_addr *addr,
>/* Pre-existing entry - just update that one. */
>memcpy(key->key, newkey, newkeylen);
> 
> -   smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */
> +   /* Pairs with READ_ONCE() in tcp_md5_hash_key().
> +* Also note that a reader could catch new key->keylen value
> +* but old key->key[], this is the reason we use __GFP_ZERO
> +* at sock_kmalloc() time below these lines.
> +*/
> +   WRITE_ONCE(key->keylen, newkeylen);
> 
> -   key->keylen = newkeylen;
>return 0;
>}
> 
> @@ -1132,7 +1136,7 @@ int tcp_md5_do_add(struct sock *sk, const union
> tcp_md5_addr *addr,
>rcu_assign_pointer(tp->md5sig_info, md5sig);
>}
> 
> -   key = sock_kmalloc(sk, sizeof(*key), gfp);
> +   key = sock_kmalloc(sk, sizeof(*key), gfp | __GFP_ZERO);
>if (!key)
>return -ENOMEM;
> if (!tcp_alloc_md5sig_pool()) {

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Herbert Xu
On Tue, Jun 30, 2020 at 08:36:51PM -0700, Eric Dumazet wrote:
>
> If I knew so many people were excited about TCP / MD5, I would have
> posted all my patches on lkml ;)
> 
> Without the smp_wmb() we would still need something to prevent KMSAN
> from detecting that we read uninitialized bytes,
> if key->keylen is increased.  (initial content of key->key[] is garbage)
> 
> Something like this :

LGTM.  Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Eric Dumazet
On Tue, Jun 30, 2020 at 7:59 PM Herbert Xu  wrote:
>
> On Tue, Jun 30, 2020 at 07:30:43PM -0700, Eric Dumazet wrote:
> >
> > I made this clear in the changelog, do we want comments all over the places 
> > ?
> > Do not get me wrong, we had this bug for years and suddenly this is a
> > big deal...
>
> I thought you were adding a new pair of smp_rmb/smp_wmb.  If they
> already exist in the code then I agree it's not a big deal.  But
> adding a new pair of bogus smp_Xmb's is bad for maintenance.
>

If I knew so many people were excited about TCP / MD5, I would have
posted all my patches on lkml ;)

Without the smp_wmb() we would still need something to prevent KMSAN
from detecting that we read uninitialized bytes,
if key->keylen is increased.  (initial content of key->key[] is garbage)

Something like this :

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 
f111660453241692a17c881dd6dc2910a1236263..c3af8180c7049d5c4987bf5c67e4aff2ed6967c9
100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4033,11 +4033,9 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data);

 int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct
tcp_md5sig_key *key)
 {
-   u8 keylen = key->keylen;
+   u8 keylen = READ_ONCE(key->keylen); /* paired with
WRITE_ONCE() in tcp_md5_do_add */
struct scatterlist sg;

-   smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */
-
sg_init_one(, key->key, keylen);
ahash_request_set_crypt(hp->md5_req, , NULL, keylen);
return crypto_ahash_update(hp->md5_req);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 
99916fcc15ca0be12c2c133ff40516f79e6fdf7f..0d08e0134335a21d23702e6a5c24a0f2b3c61c6f
100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1114,9 +1114,13 @@ int tcp_md5_do_add(struct sock *sk, const union
tcp_md5_addr *addr,
/* Pre-existing entry - just update that one. */
memcpy(key->key, newkey, newkeylen);

-   smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */
+   /* Pairs with READ_ONCE() in tcp_md5_hash_key().
+* Also note that a reader could catch new key->keylen value
+* but old key->key[], this is the reason we use __GFP_ZERO
+* at sock_kmalloc() time below these lines.
+*/
+   WRITE_ONCE(key->keylen, newkeylen);

-   key->keylen = newkeylen;
return 0;
}

@@ -1132,7 +1136,7 @@ int tcp_md5_do_add(struct sock *sk, const union
tcp_md5_addr *addr,
rcu_assign_pointer(tp->md5sig_info, md5sig);
}

-   key = sock_kmalloc(sk, sizeof(*key), gfp);
+   key = sock_kmalloc(sk, sizeof(*key), gfp | __GFP_ZERO);
if (!key)
return -ENOMEM;
if (!tcp_alloc_md5sig_pool()) {


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Herbert Xu
On Tue, Jun 30, 2020 at 07:30:43PM -0700, Eric Dumazet wrote:
>
> I made this clear in the changelog, do we want comments all over the places ?
> Do not get me wrong, we had this bug for years and suddenly this is a
> big deal...

I thought you were adding a new pair of smp_rmb/smp_wmb.  If they
already exist in the code then I agree it's not a big deal.  But
adding a new pair of bogus smp_Xmb's is bad for maintenance.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Joe Perches
On Tue, 2020-06-30 at 19:30 -0700, Eric Dumazet wrote:
> On Tue, Jun 30, 2020 at 7:23 PM Herbert Xu  
> wrote:
> > On Tue, Jun 30, 2020 at 07:17:46PM -0700, Eric Dumazet wrote:
> > > The main issue of the prior code was the double read of key->keylen in
> > > tcp_md5_hash_key(), not that few bytes could change under us.
> > > 
> > > I used smp_rmb() to ease backports, since old kernels had no
> > > READ_ONCE()/WRITE_ONCE(), but ACCESS_ONCE() instead.
> > 
> > If it's the double-read that you're protecting against, you should
> > just use barrier() and the comment should say so too.
> 
> I made this clear in the changelog, do we want comments all over the places ?

Having to run git for every line of code isn't great.

Comments in code is better than comments in changelogs.




Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Eric Dumazet
On Tue, Jun 30, 2020 at 7:23 PM Herbert Xu  wrote:
>
> On Tue, Jun 30, 2020 at 07:17:46PM -0700, Eric Dumazet wrote:
> >
> > The main issue of the prior code was the double read of key->keylen in
> > tcp_md5_hash_key(), not that few bytes could change under us.
> >
> > I used smp_rmb() to ease backports, since old kernels had no
> > READ_ONCE()/WRITE_ONCE(), but ACCESS_ONCE() instead.
>
> If it's the double-read that you're protecting against, you should
> just use barrier() and the comment should say so too.

I made this clear in the changelog, do we want comments all over the places ?
Do not get me wrong, we had this bug for years and suddenly this is a
big deal...

MD5 keys are read with RCU protection, and tcp_md5_do_add()
might update in-place a prior key.

Normally, typical RCU updates would allocate a new piece
of memory. In this case only key->key and key->keylen might
be updated, and we do not care if an incoming packet could
see the old key, the new one, or some intermediate value,
since changing the key on a live flow is known to be problematic
anyway.

We only want to make sure that in the case key->keylen
is changed, cpus in tcp_md5_hash_key() wont try to use
uninitialized data, or crash because key->keylen was
read twice to feed sg_init_one() and ahash_request_set_crypt()


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Herbert Xu
On Tue, Jun 30, 2020 at 07:17:46PM -0700, Eric Dumazet wrote:
>
> The main issue of the prior code was the double read of key->keylen in
> tcp_md5_hash_key(), not that few bytes could change under us.
>
> I used smp_rmb() to ease backports, since old kernels had no
> READ_ONCE()/WRITE_ONCE(), but ACCESS_ONCE() instead.

If it's the double-read that you're protecting against, you should
just use barrier() and the comment should say so too.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Eric Dumazet
On Tue, Jun 30, 2020 at 7:02 PM Herbert Xu  wrote:
>
> Eric Dumazet  wrote:
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 
> > 810cc164f795f8e1e8ca747ed5df51bb20fec8a2..ecc0e3fabce8b03bef823cbfc5c1b0a9e24df124
> > 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -4034,9 +4034,12 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data);
> > int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct
> > tcp_md5sig_key *key)
> > {
> >struct scatterlist sg;
> > +   u8 keylen = key->keylen;
> >
> > -   sg_init_one(, key->key, key->keylen);
> > -   ahash_request_set_crypt(hp->md5_req, , NULL, key->keylen);
> > +   smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */
> > +
> > +   sg_init_one(, key->key, keylen);
> > +   ahash_request_set_crypt(hp->md5_req, , NULL, keylen);
> >return crypto_ahash_update(hp->md5_req);
> > }
> > EXPORT_SYMBOL(tcp_md5_hash_key);
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index 
> > ad6435ba6d72ffd8caf783bb25cad7ec151d6909..99916fcc15ca0be12c2c133ff40516f79e6fdf7f
> > 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1113,6 +1113,9 @@ int tcp_md5_do_add(struct sock *sk, const union
> > tcp_md5_addr *addr,
> >if (key) {
> >/* Pre-existing entry - just update that one. */
> >memcpy(key->key, newkey, newkeylen);
> > +
> > +   smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */
> > +
> >key->keylen = newkeylen;
> >return 0;
> >}
>
> This doesn't make sense.  Your smp_rmb only guarantees that you
> see a version of key->key that's newer than keylen.  What if the
> key got changed twice? You could still read more than what's in
> the key (if that's what you're trying to protect against).

The worst that can happen is a dropped packet.

Which is anyway going to happen anyway eventually if an application
changes a TCP MD5 key on a live flow.

The main issue of the prior code was the double read of key->keylen in
tcp_md5_hash_key(), not that few bytes could change under us.

I used smp_rmb() to ease backports, since old kernels had no
READ_ONCE()/WRITE_ONCE(), but ACCESS_ONCE() instead.


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Herbert Xu
Eric Dumazet  wrote:
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 
> 810cc164f795f8e1e8ca747ed5df51bb20fec8a2..ecc0e3fabce8b03bef823cbfc5c1b0a9e24df124
> 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4034,9 +4034,12 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data);
> int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct
> tcp_md5sig_key *key)
> {
>struct scatterlist sg;
> +   u8 keylen = key->keylen;
> 
> -   sg_init_one(, key->key, key->keylen);
> -   ahash_request_set_crypt(hp->md5_req, , NULL, key->keylen);
> +   smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */
> +
> +   sg_init_one(, key->key, keylen);
> +   ahash_request_set_crypt(hp->md5_req, , NULL, keylen);
>return crypto_ahash_update(hp->md5_req);
> }
> EXPORT_SYMBOL(tcp_md5_hash_key);
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 
> ad6435ba6d72ffd8caf783bb25cad7ec151d6909..99916fcc15ca0be12c2c133ff40516f79e6fdf7f
> 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1113,6 +1113,9 @@ int tcp_md5_do_add(struct sock *sk, const union
> tcp_md5_addr *addr,
>if (key) {
>/* Pre-existing entry - just update that one. */
>memcpy(key->key, newkey, newkeylen);
> +
> +   smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */
> +
>key->keylen = newkeylen;
>return 0;
>}

This doesn't make sense.  Your smp_rmb only guarantees that you
see a version of key->key that's newer than keylen.  What if the
key got changed twice? You could still read more than what's in
the key (if that's what you're trying to protect against).

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Eric Dumazet
On Tue, Jun 30, 2020 at 4:44 PM Mathieu Desnoyers
 wrote:
>
> - On Jun 30, 2020, at 6:38 PM, Eric Dumazet eduma...@google.com wrote:
> [...]
> >
> > For updates of keys, it seems existing code lacks some RCU care.
> >
> > MD5 keys use RCU for lookups/hashes, but the replacement of a key does
> > not allocate a new piece of memory.
>
> How is that RCU-safe ?
>
> Based on what I see here:
>
> tcp_md5_do_add() has a comment stating:
>
> "/* This can be called on a newly created socket, from other files */"
>
> which appears to be untrue if this can indeed be called on a live socket.

"This can be called" is not the same than "this is always called for
newly created socket"

>
> The path for pre-existing keys does:
>
> key = tcp_md5_do_lookup_exact(sk, addr, family, prefixlen, l3index);
> if (key) {
> /* Pre-existing entry - just update that one. */
> memcpy(key->key, newkey, newkeylen);
> key->keylen = newkeylen;
> return 0;
> }
>
> AFAIU, this works only if you assume there are no concurrent readers
> accessing key->key, else they can see a corrupted key.

This is fine.

>
> The change you are proposing adds smp_wmb/smp_rmb to pair stores
> to key before key_len with loads of key_len before key. I'm not sure
> what this is trying to achieve, and how it prevents the readers from
> observing a corrupted state if the key is updated on a live socket ?



By definition if you change the MD5 key on a socket while packets are
flying, the incoming packet could either

1) See old key (packet is dropped)
2) See new key.

So any other decision (catching intermediate state) is really not an
issue, since you already accepted the fact that a packet could be
dropped,
and TCP will retransmit.

TCP MD5 implementation does not support multiple keys  for one flow,
you can not have both old and new keys being checked.


>
> Based on my understanding, this path which deals with pre-existing keys
> in-place should only ever be used when there are no concurrent readers,
> else a new memory allocation would be needed to guarantee that readers
> always observe a valid copy.

This is not _needed,_ and since memory allocations can fail, we would
potentially break applications
assuming that changing MD5 key would never fail.

Patch has been sent for review on netdev@ (
https://patchwork.ozlabs.org/project/netdev/patch/20200630234101.3259179-1-eduma...@google.com/
)


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Mathieu Desnoyers
- On Jun 30, 2020, at 6:38 PM, Eric Dumazet eduma...@google.com wrote:
[...]
> 
> For updates of keys, it seems existing code lacks some RCU care.
> 
> MD5 keys use RCU for lookups/hashes, but the replacement of a key does
> not allocate a new piece of memory.

How is that RCU-safe ?

Based on what I see here:

tcp_md5_do_add() has a comment stating:

"/* This can be called on a newly created socket, from other files */"

which appears to be untrue if this can indeed be called on a live socket.

The path for pre-existing keys does:

key = tcp_md5_do_lookup_exact(sk, addr, family, prefixlen, l3index);
if (key) {
/* Pre-existing entry - just update that one. */
memcpy(key->key, newkey, newkeylen);
key->keylen = newkeylen;
return 0;
}

AFAIU, this works only if you assume there are no concurrent readers
accessing key->key, else they can see a corrupted key.

The change you are proposing adds smp_wmb/smp_rmb to pair stores
to key before key_len with loads of key_len before key. I'm not sure
what this is trying to achieve, and how it prevents the readers from
observing a corrupted state if the key is updated on a live socket ?

Based on my understanding, this path which deals with pre-existing keys
in-place should only ever be used when there are no concurrent readers,
else a new memory allocation would be needed to guarantee that readers
always observe a valid copy.

Thanks,

Mathieu

> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index
> 810cc164f795f8e1e8ca747ed5df51bb20fec8a2..ecc0e3fabce8b03bef823cbfc5c1b0a9e24df124
> 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4034,9 +4034,12 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data);
> int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct
> tcp_md5sig_key *key)
> {
>struct scatterlist sg;
> +   u8 keylen = key->keylen;
> 
> -   sg_init_one(, key->key, key->keylen);
> -   ahash_request_set_crypt(hp->md5_req, , NULL, key->keylen);
> +   smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */
> +
> +   sg_init_one(, key->key, keylen);
> +   ahash_request_set_crypt(hp->md5_req, , NULL, keylen);
>return crypto_ahash_update(hp->md5_req);
> }
> EXPORT_SYMBOL(tcp_md5_hash_key);
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index
> ad6435ba6d72ffd8caf783bb25cad7ec151d6909..99916fcc15ca0be12c2c133ff40516f79e6fdf7f
> 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1113,6 +1113,9 @@ int tcp_md5_do_add(struct sock *sk, const union
> tcp_md5_addr *addr,
>if (key) {
>/* Pre-existing entry - just update that one. */
>memcpy(key->key, newkey, newkeylen);
> +
> +   smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */
> +
>key->keylen = newkeylen;
>return 0;
> }

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Eric Dumazet
On Tue, Jun 30, 2020 at 3:07 PM Eric Dumazet  wrote:
>
> On Tue, Jun 30, 2020 at 2:54 PM Eric Dumazet  wrote:
> >
> > On Tue, Jun 30, 2020 at 2:23 PM Eric Dumazet  wrote:
> > >
> > > On Tue, Jun 30, 2020 at 2:17 PM Mathieu Desnoyers
> > >  wrote:
> > > >
> > > > - On Jun 30, 2020, at 4:56 PM, Eric Dumazet eduma...@google.com 
> > > > wrote:
> > > >
> > > > > On Tue, Jun 30, 2020 at 1:44 PM David Miller  
> > > > > wrote:
> > > > >>
> > > > >> From: Eric Dumazet 
> > > > >> Date: Tue, 30 Jun 2020 13:39:27 -0700
> > > > >>
> > > > >> > The (C) & (B) case are certainly doable.
> > > > >> >
> > > > >> > A) case is more complex, I have no idea of breakages of various TCP
> > > > >> > stacks if a flow got SACK
> > > > >> > at some point (in 3WHS) but suddenly becomes Reno.
> > > > >>
> > > > >> I agree that C and B are the easiest to implement without having to
> > > > >> add complicated code to handle various negotiated TCP option
> > > > >> scenerios.
> > > > >>
> > > > >> It does seem to be that some entities do A, or did I misread your
> > > > >> behavioral analysis of various implementations Mathieu?
> > > > >>
> > > > >> Thanks.
> > > > >
> > > > > Yes, another question about Mathieu cases is do determine the behavior
> > > > > of all these stacks vs :
> > > > > SACK option
> > > > > TCP TS option.
> > > >
> > > > I will ask my customer's networking team to investigate these behaviors,
> > > > which will allow me to prepare a thorough reply to the questions raised
> > > > by Eric and David. I expect to have an answer within 2-3 weeks at most.
> > > >
> > > > Thank you!
> > >
> > >
> > > Great, I am working on adding back support for (B) & (C) by the end of
> > > this week.
> >
> > Note that the security issue (of sending uninit bytes to the wire) has
> > been independently fixed with [1]
> >
> > This means syzbot was able to have MD5+TS+SACK  ~6 months ago.
> >
> > It seems we (linux) do not enable this combination for PASSIVE flows,
> > (according to tcp_synack_options()),
> > but  for ACTIVE flows we do nothing special.
> >
> > So maybe code in tcp_synack_options() should be mirrored to
> > tcp_syn_options() for consistency.
> > (disabling TS if  both MD5 and SACK are enabled)
>
> Oh well, tcp_syn_options() is supposed to have the same logic.
>
> Maybe we have an issue with SYNCOOKIES (with MD5 + TS + SACK)
>
> Nice can of worms.
>

For updates of keys, it seems existing code lacks some RCU care.

MD5 keys use RCU for lookups/hashes, but the replacement of a key does
not allocate a new piece of memory.

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 
810cc164f795f8e1e8ca747ed5df51bb20fec8a2..ecc0e3fabce8b03bef823cbfc5c1b0a9e24df124
100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4034,9 +4034,12 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data);
 int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct
tcp_md5sig_key *key)
 {
struct scatterlist sg;
+   u8 keylen = key->keylen;

-   sg_init_one(, key->key, key->keylen);
-   ahash_request_set_crypt(hp->md5_req, , NULL, key->keylen);
+   smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */
+
+   sg_init_one(, key->key, keylen);
+   ahash_request_set_crypt(hp->md5_req, , NULL, keylen);
return crypto_ahash_update(hp->md5_req);
 }
 EXPORT_SYMBOL(tcp_md5_hash_key);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 
ad6435ba6d72ffd8caf783bb25cad7ec151d6909..99916fcc15ca0be12c2c133ff40516f79e6fdf7f
100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1113,6 +1113,9 @@ int tcp_md5_do_add(struct sock *sk, const union
tcp_md5_addr *addr,
if (key) {
/* Pre-existing entry - just update that one. */
memcpy(key->key, newkey, newkeylen);
+
+   smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */
+
key->keylen = newkeylen;
return 0;
}


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Eric Dumazet
On Tue, Jun 30, 2020 at 2:54 PM Eric Dumazet  wrote:
>
> On Tue, Jun 30, 2020 at 2:23 PM Eric Dumazet  wrote:
> >
> > On Tue, Jun 30, 2020 at 2:17 PM Mathieu Desnoyers
> >  wrote:
> > >
> > > - On Jun 30, 2020, at 4:56 PM, Eric Dumazet eduma...@google.com wrote:
> > >
> > > > On Tue, Jun 30, 2020 at 1:44 PM David Miller  
> > > > wrote:
> > > >>
> > > >> From: Eric Dumazet 
> > > >> Date: Tue, 30 Jun 2020 13:39:27 -0700
> > > >>
> > > >> > The (C) & (B) case are certainly doable.
> > > >> >
> > > >> > A) case is more complex, I have no idea of breakages of various TCP
> > > >> > stacks if a flow got SACK
> > > >> > at some point (in 3WHS) but suddenly becomes Reno.
> > > >>
> > > >> I agree that C and B are the easiest to implement without having to
> > > >> add complicated code to handle various negotiated TCP option
> > > >> scenerios.
> > > >>
> > > >> It does seem to be that some entities do A, or did I misread your
> > > >> behavioral analysis of various implementations Mathieu?
> > > >>
> > > >> Thanks.
> > > >
> > > > Yes, another question about Mathieu cases is do determine the behavior
> > > > of all these stacks vs :
> > > > SACK option
> > > > TCP TS option.
> > >
> > > I will ask my customer's networking team to investigate these behaviors,
> > > which will allow me to prepare a thorough reply to the questions raised
> > > by Eric and David. I expect to have an answer within 2-3 weeks at most.
> > >
> > > Thank you!
> >
> >
> > Great, I am working on adding back support for (B) & (C) by the end of
> > this week.
>
> Note that the security issue (of sending uninit bytes to the wire) has
> been independently fixed with [1]
>
> This means syzbot was able to have MD5+TS+SACK  ~6 months ago.
>
> It seems we (linux) do not enable this combination for PASSIVE flows,
> (according to tcp_synack_options()),
> but  for ACTIVE flows we do nothing special.
>
> So maybe code in tcp_synack_options() should be mirrored to
> tcp_syn_options() for consistency.
> (disabling TS if  both MD5 and SACK are enabled)

Oh well, tcp_syn_options() is supposed to have the same logic.

Maybe we have an issue with SYNCOOKIES (with MD5 + TS + SACK)

Nice can of worms.

>
> [1]
>
> commit 9424e2e7ad93a88f882c9bc5023570904b55
> Author: Eric Dumazet 
> Date:   Thu Dec 5 10:10:15 2019 -0800
>
> tcp: md5: fix potential overestimation of TCP option space
>
> Back in 2008, Adam Langley fixed the corner case of packets for flows
> having all of the following options : MD5 TS SACK
>
> Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block
> can be cooked from the remaining 8 bytes.
>
> tcp_established_options() correctly sets opts->num_sack_blocks
> to zero, but returns 36 instead of 32.
>
> This means TCP cooks packets with 4 extra bytes at the end
> of options, containing unitialized bytes.
>
> Fixes: 33ad798c924b ("tcp: options clean up")
> Signed-off-by: Eric Dumazet 
> Reported-by: syzbot 
> Acked-by: Neal Cardwell 
> Acked-by: Soheil Hassas Yeganeh 
> Signed-off-by: David S. Miller 
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 
> be6d22b8190fa375074062032105879270af4be5..b184f03d743715ef4b2d166ceae651529be77953
> 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -755,8 +755,9 @@ static unsigned int tcp_established_options(struct
> sock *sk, struct sk_buff *skb
> min_t(unsigned int, eff_sacks,
>   (remaining - TCPOLEN_SACK_BASE_ALIGNED) /
>   TCPOLEN_SACK_PERBLOCK);
> -   size += TCPOLEN_SACK_BASE_ALIGNED +
> -   opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK;
> +   if (likely(opts->num_sack_blocks))
> +   size += TCPOLEN_SACK_BASE_ALIGNED +
> +   opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK;
> }
>
> return size;


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Eric Dumazet
On Tue, Jun 30, 2020 at 2:23 PM Eric Dumazet  wrote:
>
> On Tue, Jun 30, 2020 at 2:17 PM Mathieu Desnoyers
>  wrote:
> >
> > - On Jun 30, 2020, at 4:56 PM, Eric Dumazet eduma...@google.com wrote:
> >
> > > On Tue, Jun 30, 2020 at 1:44 PM David Miller  wrote:
> > >>
> > >> From: Eric Dumazet 
> > >> Date: Tue, 30 Jun 2020 13:39:27 -0700
> > >>
> > >> > The (C) & (B) case are certainly doable.
> > >> >
> > >> > A) case is more complex, I have no idea of breakages of various TCP
> > >> > stacks if a flow got SACK
> > >> > at some point (in 3WHS) but suddenly becomes Reno.
> > >>
> > >> I agree that C and B are the easiest to implement without having to
> > >> add complicated code to handle various negotiated TCP option
> > >> scenerios.
> > >>
> > >> It does seem to be that some entities do A, or did I misread your
> > >> behavioral analysis of various implementations Mathieu?
> > >>
> > >> Thanks.
> > >
> > > Yes, another question about Mathieu cases is do determine the behavior
> > > of all these stacks vs :
> > > SACK option
> > > TCP TS option.
> >
> > I will ask my customer's networking team to investigate these behaviors,
> > which will allow me to prepare a thorough reply to the questions raised
> > by Eric and David. I expect to have an answer within 2-3 weeks at most.
> >
> > Thank you!
>
>
> Great, I am working on adding back support for (B) & (C) by the end of
> this week.

Note that the security issue (of sending uninit bytes to the wire) has
been independently fixed with [1]

This means syzbot was able to have MD5+TS+SACK  ~6 months ago.

It seems we (linux) do not enable this combination for PASSIVE flows,
(according to tcp_synack_options()),
but  for ACTIVE flows we do nothing special.

So maybe code in tcp_synack_options() should be mirrored to
tcp_syn_options() for consistency.
(disabling TS if  both MD5 and SACK are enabled)

[1]

commit 9424e2e7ad93a88f882c9bc5023570904b55
Author: Eric Dumazet 
Date:   Thu Dec 5 10:10:15 2019 -0800

tcp: md5: fix potential overestimation of TCP option space

Back in 2008, Adam Langley fixed the corner case of packets for flows
having all of the following options : MD5 TS SACK

Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block
can be cooked from the remaining 8 bytes.

tcp_established_options() correctly sets opts->num_sack_blocks
to zero, but returns 36 instead of 32.

This means TCP cooks packets with 4 extra bytes at the end
of options, containing unitialized bytes.

Fixes: 33ad798c924b ("tcp: options clean up")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Acked-by: Neal Cardwell 
Acked-by: Soheil Hassas Yeganeh 
Signed-off-by: David S. Miller 

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 
be6d22b8190fa375074062032105879270af4be5..b184f03d743715ef4b2d166ceae651529be77953
100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -755,8 +755,9 @@ static unsigned int tcp_established_options(struct
sock *sk, struct sk_buff *skb
min_t(unsigned int, eff_sacks,
  (remaining - TCPOLEN_SACK_BASE_ALIGNED) /
  TCPOLEN_SACK_PERBLOCK);
-   size += TCPOLEN_SACK_BASE_ALIGNED +
-   opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK;
+   if (likely(opts->num_sack_blocks))
+   size += TCPOLEN_SACK_BASE_ALIGNED +
+   opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK;
}

return size;


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Eric Dumazet
On Tue, Jun 30, 2020 at 2:17 PM Mathieu Desnoyers
 wrote:
>
> - On Jun 30, 2020, at 4:56 PM, Eric Dumazet eduma...@google.com wrote:
>
> > On Tue, Jun 30, 2020 at 1:44 PM David Miller  wrote:
> >>
> >> From: Eric Dumazet 
> >> Date: Tue, 30 Jun 2020 13:39:27 -0700
> >>
> >> > The (C) & (B) case are certainly doable.
> >> >
> >> > A) case is more complex, I have no idea of breakages of various TCP
> >> > stacks if a flow got SACK
> >> > at some point (in 3WHS) but suddenly becomes Reno.
> >>
> >> I agree that C and B are the easiest to implement without having to
> >> add complicated code to handle various negotiated TCP option
> >> scenerios.
> >>
> >> It does seem to be that some entities do A, or did I misread your
> >> behavioral analysis of various implementations Mathieu?
> >>
> >> Thanks.
> >
> > Yes, another question about Mathieu cases is do determine the behavior
> > of all these stacks vs :
> > SACK option
> > TCP TS option.
>
> I will ask my customer's networking team to investigate these behaviors,
> which will allow me to prepare a thorough reply to the questions raised
> by Eric and David. I expect to have an answer within 2-3 weeks at most.
>
> Thank you!


Great, I am working on adding back support for (B) & (C) by the end of
this week.


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Mathieu Desnoyers
- On Jun 30, 2020, at 4:56 PM, Eric Dumazet eduma...@google.com wrote:

> On Tue, Jun 30, 2020 at 1:44 PM David Miller  wrote:
>>
>> From: Eric Dumazet 
>> Date: Tue, 30 Jun 2020 13:39:27 -0700
>>
>> > The (C) & (B) case are certainly doable.
>> >
>> > A) case is more complex, I have no idea of breakages of various TCP
>> > stacks if a flow got SACK
>> > at some point (in 3WHS) but suddenly becomes Reno.
>>
>> I agree that C and B are the easiest to implement without having to
>> add complicated code to handle various negotiated TCP option
>> scenerios.
>>
>> It does seem to be that some entities do A, or did I misread your
>> behavioral analysis of various implementations Mathieu?
>>
>> Thanks.
> 
> Yes, another question about Mathieu cases is do determine the behavior
> of all these stacks vs :
> SACK option
> TCP TS option.

I will ask my customer's networking team to investigate these behaviors,
which will allow me to prepare a thorough reply to the questions raised
by Eric and David. I expect to have an answer within 2-3 weeks at most.

Thank you!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Eric Dumazet
On Tue, Jun 30, 2020 at 1:44 PM David Miller  wrote:
>
> From: Eric Dumazet 
> Date: Tue, 30 Jun 2020 13:39:27 -0700
>
> > The (C) & (B) case are certainly doable.
> >
> > A) case is more complex, I have no idea of breakages of various TCP
> > stacks if a flow got SACK
> > at some point (in 3WHS) but suddenly becomes Reno.
>
> I agree that C and B are the easiest to implement without having to
> add complicated code to handle various negotiated TCP option
> scenerios.
>
> It does seem to be that some entities do A, or did I misread your
> behavioral analysis of various implementations Mathieu?
>
> Thanks.

Yes, another question about Mathieu cases is do determine the behavior
of all these stacks vs :
SACK option
TCP TS option.


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread David Miller
From: Eric Dumazet 
Date: Tue, 30 Jun 2020 13:39:27 -0700

> The (C) & (B) case are certainly doable.
> 
> A) case is more complex, I have no idea of breakages of various TCP
> stacks if a flow got SACK
> at some point (in 3WHS) but suddenly becomes Reno.

I agree that C and B are the easiest to implement without having to
add complicated code to handle various negotiated TCP option
scenerios.

It does seem to be that some entities do A, or did I misread your
behavioral analysis of various implementations Mathieu?

Thanks.


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Eric Dumazet
On Tue, Jun 30, 2020 at 1:34 PM Mathieu Desnoyers
 wrote:
>
> - On Jun 30, 2020, at 3:52 PM, Linus Torvalds 
> torva...@linux-foundation.org wrote:
>
> > On Tue, Jun 30, 2020 at 12:43 PM Linus Torvalds
> >  wrote:
> >>
> [...]
> > So I think it's still wrong (clearly others do change passwords
> > outside of listening state), but considering that it apparently took
> > people two years to notice, at least some of the onus on figuring out
> > a better morel is on people who didn't even bother to test things in a
> > timely manner.
>
> I'm fully willing to work with Eric on finding a way forward with a
> fix which addresses the original issue Eric's patch was trying to
> fix while preserving ABI compatibility.
>
> The main thing we need to agree on at this stage is what is our goal. We
> can either choose to restore the original ABI behavior entirely, or only
> focus on what appears to be the most important use-cases.
>
> AFAIU, restoring full ABI compatibility would require to re-enable all
> the following scenarios:
>
> A) Transition of live socket from no key -> MD5 key.
> B) Transition of live socket from MD5 key -> no key.
> C) Transition of live socket from MD5 key to a different MD5 key.
>
> Scenario (C) appears to be the most important use-case, and probably the
> easiest to restore to its original behavior.
>
> AFAIU restoring scenarios A and B would require us to validate how
> much header space is needed by each SACK, TS and MD5 option enabled
> on the socket, and reject enabling any option that adds header space
> requirement exceeding the available space.
>
> I welcome advice on what should be the end goal here.
>

The (C) & (B) case are certainly doable.

A) case is more complex, I have no idea of breakages of various TCP
stacks if a flow got SACK
at some point (in 3WHS) but suddenly becomes Reno.


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Mathieu Desnoyers
- On Jun 30, 2020, at 4:30 PM, Eric Dumazet eduma...@google.com wrote:

> On Tue, Jun 30, 2020 at 1:21 PM David Miller  wrote:
>>
>> From: Linus Torvalds 
>> Date: Tue, 30 Jun 2020 12:43:21 -0700
>>
>> > If you're not willing to do the work to fix it, I will revert that
>> > commit.
>>
>> Please let me handle this situation instead of making threats, this
>> just got reported.
>>
>> Thank you.
>>
> 
> Also keep in mind the commit fixed a security issue, since we were
> sending on the wire
> garbage bytes from the kernel.
> 
> We can not simply revert it and hope for the best.
> 
> I find quite alarming vendors still use TCP MD5 "for security
> reasons", but none of them have contributed to it in linux kernel
> since 2018
> (Time of the 'buggy patch')

I'm helping a customer increase their contributions and feedback to upstream.
As we can see, they have accumulated some backlog over time.

Clearly reverting a security fix is not acceptable here. Coming up with a
proper ABI-compatible fix should not be out of our reach though.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Mathieu Desnoyers
- On Jun 30, 2020, at 3:52 PM, Linus Torvalds torva...@linux-foundation.org 
wrote:

> On Tue, Jun 30, 2020 at 12:43 PM Linus Torvalds
>  wrote:
>>
[...]
> So I think it's still wrong (clearly others do change passwords
> outside of listening state), but considering that it apparently took
> people two years to notice, at least some of the onus on figuring out
> a better morel is on people who didn't even bother to test things in a
> timely manner.

I'm fully willing to work with Eric on finding a way forward with a
fix which addresses the original issue Eric's patch was trying to
fix while preserving ABI compatibility.

The main thing we need to agree on at this stage is what is our goal. We
can either choose to restore the original ABI behavior entirely, or only
focus on what appears to be the most important use-cases.

AFAIU, restoring full ABI compatibility would require to re-enable all
the following scenarios:

A) Transition of live socket from no key -> MD5 key.
B) Transition of live socket from MD5 key -> no key.
C) Transition of live socket from MD5 key to a different MD5 key.

Scenario (C) appears to be the most important use-case, and probably the
easiest to restore to its original behavior.

AFAIU restoring scenarios A and B would require us to validate how
much header space is needed by each SACK, TS and MD5 option enabled
on the socket, and reject enabling any option that adds header space
requirement exceeding the available space.

I welcome advice on what should be the end goal here.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Eric Dumazet
On Tue, Jun 30, 2020 at 1:21 PM David Miller  wrote:
>
> From: Linus Torvalds 
> Date: Tue, 30 Jun 2020 12:43:21 -0700
>
> > If you're not willing to do the work to fix it, I will revert that
> > commit.
>
> Please let me handle this situation instead of making threats, this
> just got reported.
>
> Thank you.
>

Also keep in mind the commit fixed a security issue, since we were
sending on the wire
garbage bytes from the kernel.

We can not simply revert it and hope for the best.

I find quite alarming vendors still use TCP MD5 "for security
reasons", but none of them have contributed to it in linux kernel
since 2018
(Time of the 'buggy patch')


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread David Miller
From: Linus Torvalds 
Date: Tue, 30 Jun 2020 12:43:21 -0700

> If you're not willing to do the work to fix it, I will revert that
> commit.

Please let me handle this situation instead of making threats, this
just got reported.

Thank you.



Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Linus Torvalds
On Tue, Jun 30, 2020 at 12:43 PM Linus Torvalds
 wrote:
>
> If you're not willing to do the work to fix it, I will revert that
> commit.

Hmm. I only now noticed that that commit is over two years old.

So I think it's still wrong (clearly others do change passwords
outside of listening state), but considering that it apparently took
people two years to notice, at least some of the onus on figuring out
a better morel is on people who didn't even bother to test things in a
timely manner.

At some point "entreprise vendor kernels" or whatever who stay with
legacy kernels for a long time only have themselves to blame.

 Linus


Re: [regression] TCP_MD5SIG on established sockets

2020-06-30 Thread Linus Torvalds
On Mon, Jun 29, 2020 at 1:47 PM Eric Dumazet  wrote:
>
> If you want to be able to _change_ md5 key, this is fine by me, please
> send a patch.

Eric, if this change breaks existing users, then it gets reverted.
That's just fundamental.

No RFC's are in the lreast relevant when compared to "this broke
existing users".

If you're not willing to do the work to fix it, I will revert that
commit. Because that's how it works - you can't ask other people to
fix the breakage you introduced.

It really is that simple. We do not allow developers to break things
and then step away and say "not my problem".

 Linus


Re: [regression] TCP_MD5SIG on established sockets

2020-06-29 Thread Eric Dumazet
On Mon, Jun 29, 2020 at 12:43 PM Mathieu Desnoyers
 wrote:
>
> - On May 13, 2020, at 3:56 PM, Eric Dumazet eduma...@google.com wrote:
>
> > On Wed, May 13, 2020 at 12:49 PM Eric Dumazet  wrote:
> >>
> >>
> >> On Wed, May 13, 2020 at 12:38 PM Mathieu Desnoyers
> >>  wrote:
> >> >
> >> > Hi,
> >> >
> >> > I am reporting a regression with respect to use of 
> >> > TCP_MD5SIG/TCP_MD5SIG_EXT
> >> > on established sockets. It is observed by a customer.
> >> >
> >> > This issue is introduced by this commit:
> >> >
> >> > commit 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on
> >> > established sockets"
> >> >
> >> > The intent of this commit appears to be to fix a use of uninitialized 
> >> > value in
> >> > tcp_parse_options(). The change introduced by this commit is to disallow 
> >> > setting
> >> > the TCP_MD5SIG{,_EXT} socket options on an established socket.
> >> >
> >> > The justification for this change appears in the commit message:
> >> >
> >> >"I believe this was caused by a TCP_MD5SIG being set on live
> >> > flow.
> >> >
> >> > This is highly unexpected, since TCP option space is limited.
> >> >
> >> > For instance, presence of TCP MD5 option automatically disables
> >> > TCP TimeStamp option at SYN/SYNACK time, which we can not do
> >> > once flow has been established.
> >> >
> >> > Really, adding/deleting an MD5 key only makes sense on sockets
> >> > in CLOSE or LISTEN state."
> >> >
> >> > However, reading through RFC2385 [1], this justification does not appear
> >> > correct. Quoting to the RFC:
> >> >
> >> >"This password never appears in the connection stream, and the actual
> >> > form of the password is up to the application. It could even change
> >> > during the lifetime of a particular connection so long as this change
> >> > was synchronized on both ends"
> >> >
> >> > The paragraph above clearly underlines that changing the MD5 signature of
> >> > a live TCP socket is allowed.
> >> >
> >> > I also do not understand why it would be invalid to transition an 
> >> > established
> >> > TCP socket from no-MD5 to MD5, or transition from MD5 to no-MD5. Quoting 
> >> > the
> >> > RFC:
> >> >
> >> >   "The total header size is also an issue.  The TCP header specifies
> >> >where segment data starts with a 4-bit field which gives the total
> >> >size of the header (including options) in 32-byte words.  This means
> >> >that the total size of the header plus option must be less than or
> >> >equal to 60 bytes -- this leaves 40 bytes for options."
> >> >
> >> > The paragraph above seems to be the only indication that some TCP options
> >> > cannot be combined on a given TCP socket: if the resulting header size 
> >> > does
> >> > not fit. However, I do not see anything in the specification preventing 
> >> > any
> >> > of the following use-cases on an established TCP socket:
> >> >
> >> > - Transition from no-MD5 to MD5,
> >> > - Transition from MD5 to no-MD5,
> >> > - Changing the MD5 key associated with a socket.
> >> >
> >> > As long as the resulting combination of options does not exceed the 
> >> > available
> >> > header space.
> >> >
> >> > Can we please fix this KASAN report in a way that does not break 
> >> > user-space
> >> > applications expectations about Linux' implementation of RFC2385 ?
> [...]
> >> > [1] RFC2385: https://tools.ietf.org/html/rfc2385
> >>
> >>
> >> I do not think we want to transition sockets in the middle. since
> >> packets can be re-ordered in the network.
> >>
> >> MD5 is about security (and a loose form of it), so better make sure
> >> all packets have it from the beginning of the flow.
> >>
> >> A flow with TCP TS on can not suddenly be sending packets without TCP TS.
> >>
> >> Clearly, trying to support this operation is a can of worms, I do not
> >> want to maintain such atrocity.
> >>
> >> RFC can state whatever it wants, sometimes reality forces us to have
> >> sane operations.
> >>
> >> Thanks.
> >>
> > Also the RFC states :
> >
> > "This password never appears in the connection stream, and the actual
> >form of the password is up to the application. It could even change
> >during the lifetime of a particular connection so long as this change
> >was synchronized on both ends"
> >
> > It means the key can be changed, but this does not imply the option
> > can be turned on/off dynamically.
> >
>
> The change discussed previously (introduced by commit 721230326891 "tcp:
> md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets") breaks
> user-space ABI. As an example, the following BGP application uses
> setsockopt TCP_MD5SIG on a live TCP socket:
>
> https://github.com/IPInfusion/SDN-IP
>
> In addition to break user-space, it also breaks network protocol
> expectations for network equipment vendors implementing RFC2385.
> Considering that the goal of these protocols is interaction between
> different network equipment, breaking compatibility on that side
> 

Re: [regression] TCP_MD5SIG on established sockets

2020-06-29 Thread Mathieu Desnoyers
- On May 13, 2020, at 3:56 PM, Eric Dumazet eduma...@google.com wrote:

> On Wed, May 13, 2020 at 12:49 PM Eric Dumazet  wrote:
>>
>>
>> On Wed, May 13, 2020 at 12:38 PM Mathieu Desnoyers
>>  wrote:
>> >
>> > Hi,
>> >
>> > I am reporting a regression with respect to use of 
>> > TCP_MD5SIG/TCP_MD5SIG_EXT
>> > on established sockets. It is observed by a customer.
>> >
>> > This issue is introduced by this commit:
>> >
>> > commit 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on
>> > established sockets"
>> >
>> > The intent of this commit appears to be to fix a use of uninitialized 
>> > value in
>> > tcp_parse_options(). The change introduced by this commit is to disallow 
>> > setting
>> > the TCP_MD5SIG{,_EXT} socket options on an established socket.
>> >
>> > The justification for this change appears in the commit message:
>> >
>> >"I believe this was caused by a TCP_MD5SIG being set on live
>> > flow.
>> >
>> > This is highly unexpected, since TCP option space is limited.
>> >
>> > For instance, presence of TCP MD5 option automatically disables
>> > TCP TimeStamp option at SYN/SYNACK time, which we can not do
>> > once flow has been established.
>> >
>> > Really, adding/deleting an MD5 key only makes sense on sockets
>> > in CLOSE or LISTEN state."
>> >
>> > However, reading through RFC2385 [1], this justification does not appear
>> > correct. Quoting to the RFC:
>> >
>> >"This password never appears in the connection stream, and the actual
>> > form of the password is up to the application. It could even change
>> > during the lifetime of a particular connection so long as this change
>> > was synchronized on both ends"
>> >
>> > The paragraph above clearly underlines that changing the MD5 signature of
>> > a live TCP socket is allowed.
>> >
>> > I also do not understand why it would be invalid to transition an 
>> > established
>> > TCP socket from no-MD5 to MD5, or transition from MD5 to no-MD5. Quoting 
>> > the
>> > RFC:
>> >
>> >   "The total header size is also an issue.  The TCP header specifies
>> >where segment data starts with a 4-bit field which gives the total
>> >size of the header (including options) in 32-byte words.  This means
>> >that the total size of the header plus option must be less than or
>> >equal to 60 bytes -- this leaves 40 bytes for options."
>> >
>> > The paragraph above seems to be the only indication that some TCP options
>> > cannot be combined on a given TCP socket: if the resulting header size does
>> > not fit. However, I do not see anything in the specification preventing any
>> > of the following use-cases on an established TCP socket:
>> >
>> > - Transition from no-MD5 to MD5,
>> > - Transition from MD5 to no-MD5,
>> > - Changing the MD5 key associated with a socket.
>> >
>> > As long as the resulting combination of options does not exceed the 
>> > available
>> > header space.
>> >
>> > Can we please fix this KASAN report in a way that does not break user-space
>> > applications expectations about Linux' implementation of RFC2385 ?
[...]
>> > [1] RFC2385: https://tools.ietf.org/html/rfc2385
>>
>>
>> I do not think we want to transition sockets in the middle. since
>> packets can be re-ordered in the network.
>>
>> MD5 is about security (and a loose form of it), so better make sure
>> all packets have it from the beginning of the flow.
>>
>> A flow with TCP TS on can not suddenly be sending packets without TCP TS.
>>
>> Clearly, trying to support this operation is a can of worms, I do not
>> want to maintain such atrocity.
>>
>> RFC can state whatever it wants, sometimes reality forces us to have
>> sane operations.
>>
>> Thanks.
>>
> Also the RFC states :
> 
> "This password never appears in the connection stream, and the actual
>form of the password is up to the application. It could even change
>during the lifetime of a particular connection so long as this change
>was synchronized on both ends"
> 
> It means the key can be changed, but this does not imply the option
> can be turned on/off dynamically.
> 

The change discussed previously (introduced by commit 721230326891 "tcp:
md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets") breaks
user-space ABI. As an example, the following BGP application uses
setsockopt TCP_MD5SIG on a live TCP socket:

https://github.com/IPInfusion/SDN-IP

In addition to break user-space, it also breaks network protocol
expectations for network equipment vendors implementing RFC2385.
Considering that the goal of these protocols is interaction between
different network equipment, breaking compatibility on that side
is unexpected as well. Requiring to bring down/up the connection
just to change the TCP MD5 password is a no-go in networks with
high availability requirements. Changing the BGP authentication
password must be allowed without tearing down and re-establishing
the TCP sockets. Otherwise it doesn't scale for large