Re: [TLS] Identify and mitigate tracking and fingerprinting vectors - How to progress

2023-01-15 Thread Martin Thomson
On Fri, Jan 13, 2023, at 22:56, John Mattsson wrote:
> There are a lot of additional tracking and fingerprinting vectors in 
> the Client Hello and Server Hello.
> 
> - Tracking is also an issue for servers. IoT devices are often servers 
> and by tracking the device you can often track the person owning the 
> device.

This isn't an item in your list, but an expansion of the scope.

> - Ticket reuse is just one example of psk identifier reuse, all psk 
> identifier reuse has the same client and server tracking considerations.
> - Client reuse of key shares can be used to track the client. Server 
> reuse of key shares can be used to track the server or to reveal the 
> server name.


Just to be clearer about your implicit threat model, I assume that you are 
assuming that this is a scenario where the same client and server are 
establishing a connection, but that addressing and timing don't reveal any 
information about their identity.  In that context, repetition of identifiers 
from a previous connection might allow an observer to correlate the two 
sessions.

Tickets and key shares are easy: don't do that.  We already say that though, so 
nothing to do, except perhaps to point out where implementations ignore that 
advice.
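
A minimal sketch of the correlation described above, assuming a passive observer who records one opaque identifier (a ticket, another PSK identity, or a key share) per connection. The function name `correlate_by_identifier` and the sample data are hypothetical and only illustrate why reuse links sessions.

```python
from collections import defaultdict


def correlate_by_identifier(observations):
    """Group connection labels by the identifier bytes seen on the wire.

    `observations` is an iterable of (connection_label, identifier_bytes)
    pairs. Both are illustrative stand-ins, not a real capture format.
    """
    groups = defaultdict(list)
    for conn_label, identifier in observations:
        groups[identifier].append(conn_label)
    # Only identifiers that appear more than once link connections together.
    return {ident: conns for ident, conns in groups.items() if len(conns) > 1}


# Hypothetical observations: conn-1 and conn-3 reuse the same key share.
observations = [
    ("conn-1", b"\x01" * 32),
    ("conn-2", b"\x02" * 32),
    ("conn-3", b"\x01" * 32),
]
linked = correlate_by_identifier(observations)
```

With fresh identifiers on every connection, `linked` would be empty, which is the outcome the existing advice aims for.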

> - SNI can be used to track a server and most SNI (except very common 
> ones) can be used to track a client.

This is what ECH is intended to address.

> - Non-common values of max_fragment_length, supported_groups, 
> signature_algorithms, application_layer_protocol_negotiation, etc. can 
> be used to track a client with high probability.
> - The set of extensions in CH or SH might be used to track client or 
> server with high probability. The fingerprinting vector does not need 
> to be globally unique. An attacker often looks in a specific location, 
> in a specific network, and at a specific time. Can also be correlated 
> with fingerprints at other layers.

This is a harder question.  Some of these are protected by ECH, some are not.  

(Threat model question: Are we now protecting against tracking by the other 
endpoint?)

FWIW, there seems to be another implicit assumption in this list that needs to 
be expanded.  The values that a client or server offers here are governed by 
some combination of the code it executes, the configuration of that endpoint, 
and the inputs it receives.

Differences in code are something I have personally given up on.  There are 
reasons that you might prefer to eliminate apparent differences between 
implementations, but this is likely somewhere between hard and impossible.  If 
you consider the fingerprinting risk profile as a product of the entire 
networking stack, you need to eliminate differences across the entire stack, 
from the hardware up to the application.

There are a finite number of implementations, so the entropy provided by 
implementation variations should be small.  Furthermore, synchronizing 
fingerprints with another implementation works only for as long as both 
implementations remain the same.  It only takes a new feature addition in 
one implementation to undo this.
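
To put a rough number on "small", here is a sketch that computes the Shannon entropy, in bits, of an observed distribution of fingerprints. It assumes each connection has already been reduced to some fingerprint label; the labels and the function name `fingerprint_entropy_bits` are hypothetical.

```python
import math
from collections import Counter


def fingerprint_entropy_bits(fingerprints):
    """Shannon entropy (in bits) of the empirical fingerprint distribution."""
    counts = Counter(fingerprints)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Four equally common implementations give 2.0 bits.
uniform = fingerprint_entropy_bits(["impl-a", "impl-b", "impl-c", "impl-d"])

# A population dominated by one implementation gives less than that.
skewed = fingerprint_entropy_bits(["impl-a"] * 9 + ["impl-b"])
```

The entropy is bounded by log2 of the number of distinct fingerprints, so a finite set of implementations caps what this vector can reveal on its own.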

That doesn't mean that eliminating gratuitous differences won't 
eventually help by reducing the number of discrete fingerprints.  That's 
specification work though.  For instance, we could all agree that padding is no 
longer needed, so that we can all remove that extension from ClientHello.  But 
consider the fingerprinting risk inherent in your choice of congestion 
controller.  Do you really expect people to agree on all of those details such 
that you might eliminate any differences - and potential competitive advantage 
- between products?

Configuration and other live inputs tend to produce variation that is 
observable.  The best choice for configuration is to not allow it.  That's a 
general trend now, but see above regarding eliminating choice and control.

Live input is harder.  If the input is provided by a peer, then the attitude 
we've taken in browsers is to only apply any corresponding change when 
communicating with that peer; otherwise, you are exposed to tracking by that 
entity.  But that leads to leaking this change in peer identity toward the 
network.

The best reaction I have here is to say that for ClientHello, any change in 
configuration/behaviour should be in the part protected by ECH; for 
ServerHello, don't put it there; move it to EncryptedExtensions instead. If this is not 
possible, reconsider the feature.  The only things that can't be protected by 
ECH are those things that relate to key configuration, which are effectively a 
lower layer of the protocol. Those lower-layer functions require more care.

For instance, if we were to move key share configuration to DNS, so that 
clients can guess the right key share to provide, then that might work, but 
we'd need to be careful to ensure that the configuration is consistent on the 
same level as the ECH configuration.  Otherwise, it partitions the anonymity

[TLS] Identify and mitigate tracking and fingerprinting vectors - How to progress

2023-01-13 Thread John Mattsson
Hi,

The charter from March 2022 states that one of the most important goals for the 
group is to identify and mitigate tracking and fingerprinting vectors.

I think this is an excellent goal. However, I cannot see any discussion or work 
except for ECH. RFC8446bis does not say much about tracking, fingerprinting, 
and privacy. As far as I can see, the only thing that is discussed is client 
tracking based on ticket reuse.

There are a lot of additional tracking and fingerprinting vectors in the Client 
Hello and Server Hello.

- Tracking is also an issue for servers. IoT devices are often servers and by 
tracking the device you can often track the person owning the device.
- Ticket reuse is just one example of psk identifier reuse, all psk identifier 
reuse has the same client and server tracking considerations.
- Client reuse of key shares can be used to track the client. Server reuse of 
key shares can be used to track the server or to reveal the server name.
- SNI can be used to track a server and most SNI (except very common ones) can 
be used to track a client.
- Non-common values of max_fragment_length, supported_groups, 
signature_algorithms, application_layer_protocol_negotiation, etc. can be used 
to track a client with high probability.
- The set of extensions in CH or SH might be used to track client or server 
with high probability. The fingerprinting vector does not need to be globally 
unique. An attacker often looks in a specific location, in a specific network, 
and at a specific time. Can also be correlated with fingerprints at other 
layers.
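
The last two items can be sketched as follows: an observer reduces the visible ClientHello parameters to a short fingerprint and compares it across connections. This is a hypothetical illustration, not a description of any particular tool; the function name, the field selection, and the choice of a truncated SHA-256 are all assumptions. The numeric values are the registered code points for the named extensions, groups, and signature schemes.

```python
import hashlib


def client_hello_fingerprint(extensions, groups, sig_algs, alpn):
    """Hash the ordered ClientHello parameters into a short hex label."""
    parts = [
        "-".join(str(e) for e in extensions),
        "-".join(str(g) for g in groups),
        "-".join(str(s) for s in sig_algs),
        "-".join(alpn),
    ]
    digest = hashlib.sha256(",".join(parts).encode("ascii")).hexdigest()
    return digest[:16]


# server_name(0), supported_groups(10), signature_algorithms(13),
# ALPN(16), supported_versions(43), key_share(51)
common = client_hello_fingerprint(
    [0, 10, 13, 16, 43, 51], [29, 23], [0x0403, 0x0804], ["h2", "http/1.1"]
)

# Same client, but also offering max_fragment_length(1).
unusual = client_hello_fingerprint(
    [0, 1, 10, 13, 16, 43, 51], [29, 23], [0x0403, 0x0804], ["h2", "http/1.1"]
)
```

A single uncommon extension changes the label, and within one network at one point in time that may be enough to single out a device even if the label is not globally unique.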

ECH helps a bit by encrypting CH on the whole or part of the path but does not 
encrypt SH. Is there any possibility to also encrypt SH with ECH?

How do we progress with this important goal for the group? I think RFC8446bis 
needs to be updated, but maybe an additional document would also be good? I 
would be willing to help with that.

Cheers,
John
___
TLS mailing list
TLS@ietf.org
https://www.ietf.org/mailman/listinfo/tls