Firstly, thanks for taking a look at this.  This is obviously a considerable 
amount of work and it is good to see people thinking about the way that the 
different pieces fit together.

This situation does not bother me very much.  As Dmitri observes, this might be 
down to different treatment of the draining period in implementations, but it's 
a little more complicated.  There are also differences you are likely to see in 
routing infrastructure as well.  I'll get to whether I think this needs 
documenting at the end.

The use of the client's destination connection ID is going to be a little weird 
in servers.  Neqo as a server does not use that value for very long; it's only 
used in combination with the remote address for as long as a handshake is live, 
and only as a backup for matching packets to connections where a 
server-provided connection ID isn't included.  This is approximately the same 
as in picoquic and quicly, and it would surprise me if others differed.
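
A toy sketch of that lookup order (names and structure are my own illustration, not from Neqo or any other implementation): server-issued connection IDs are the primary key, and the client's original destination CID is only a fallback, keyed together with the remote address, for as long as the handshake is live.

```python
# Hypothetical sketch of the lookup order described above.  All names
# here are invented for illustration; no real implementation is quoted.

class ConnectionTable:
    def __init__(self):
        self.by_server_cid = {}     # server-issued CID -> connection
        self.by_addr_and_dcid = {}  # (remote addr, client DCID) -> connection

    def start_handshake(self, remote_addr, client_dcid, server_cid, conn):
        self.by_server_cid[server_cid] = conn
        # Fallback entry, only useful while the handshake is live.
        self.by_addr_and_dcid[(remote_addr, client_dcid)] = conn

    def finish_handshake(self, remote_addr, client_dcid):
        # Once the handshake completes, the original DCID stops being a
        # lookup key, so a later connection reusing it is unaffected.
        self.by_addr_and_dcid.pop((remote_addr, client_dcid), None)

    def lookup(self, remote_addr, dcid):
        conn = self.by_server_cid.get(dcid)
        if conn is None:
            conn = self.by_addr_and_dcid.get((remote_addr, dcid))
        return conn
```

The point being that once `finish_handshake` runs, nothing in the table remembers the client's original DCID at all.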

That isn't the whole story though.  If the *routing* infrastructure does the 
same thing, then you are able to mount the claimed attack by varying the 
source address.  If the routing infrastructure only looks at the connection 
ID, then you won't reveal any information.  But either approach allows 
targeting of server instances, which is probably unwise, so I would be 
surprised to see that in more advanced infrastructure.  Address tuple-based 
routing, which might still be common, does offer some opportunities here, but 
that was already true: those systems can be exploited by manipulating the 
source address.

All in all, the question of how a load balancer directs new connections to 
server instances is highly relevant here.
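
To make the distinction concrete, here is a deliberately simplified model of the two routing styles (not any real load balancer): with 4-tuple hashing the attacker influences the routing key via the source address, while with CID-based routing the target instance follows from the connection ID alone.

```python
# Toy model of the two routing styles discussed above; the instance
# names and hashing scheme are purely illustrative.
import hashlib

INSTANCES = ["instance-a", "instance-b", "instance-c"]  # hypothetical pool

def _pick(key: bytes) -> str:
    digest = hashlib.sha256(key).digest()
    return INSTANCES[int.from_bytes(digest[:4], "big") % len(INSTANCES)]

def route_by_tuple(src_addr: str, src_port: int,
                   dst_addr: str, dst_port: int) -> str:
    # The attacker controls part of this key via the source address/port.
    return _pick(f"{src_addr}:{src_port}:{dst_addr}:{dst_port}".encode())

def route_by_cid(dcid: bytes) -> str:
    # The key is entirely determined by the destination connection ID.
    return _pick(dcid)
```

Either way the mapping is deterministic, which is exactly why the ability to choose the key lets an attacker steer load toward a chosen instance.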

It probably pays to see what an attacker gains.  Revealing what connection IDs 
are in use is something, but given the size of the space, that might not be 
especially valuable.  And exploiting this requires a routing infrastructure 
that is vulnerable to more interesting attacks, like resource exhaustion by 
targeting.
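
For a sense of scale, a rough birthday-bound estimate (my own back-of-the-envelope, not a figure from the draft):

```python
# Back-of-the-envelope estimate of how unlikely a CID collision is;
# the parameters below are illustrative, not from the specification.
import math

def collision_probability(n_in_use: int, cid_len_bytes: int) -> float:
    # Birthday-bound approximation: p ~= 1 - exp(-n^2 / (2 * space)).
    space = 2 ** (8 * cid_len_bytes)
    return 1.0 - math.exp(-(n_in_use ** 2) / (2 * space))

# Even a million concurrent connections with short 8-byte random CIDs
# leave the collision probability vanishingly small.
p = collision_probability(1_000_000, 8)
```

With the full 20-byte maximum the space is larger still, so learning that some CID is in use tells an attacker very little.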

The covert channel is something we've already decided is not interesting.  The 
number of other covert channels available here is practically unbounded, and 
the bit rate of the one you describe is far lower than what those offer.

To the question of what might be said about this sort of thing, I don't see 
anything here for the main specifications.  We have already acknowledged that 
there is a need for operational guidance, particularly around the deployment of 
load balancers and routing infrastructure.  At this stage, we're still 
learning.  We have a couple of working group items that cover aspects of this, 
from the manageability document, which looks at how deployments could be 
managed (from different perspectives), to the load balancers draft, which 
looks specifically at the details of routing mechanisms.

Given the state of what we know, I think that it would be premature to put any 
text in the main document, but this should certainly help inform our work on 
improving the state of those other documents.

Cheers,
Martin

On Fri, Nov 13, 2020, at 03:34, Kashyap Thimmaraju wrote:
> Dear QUIC WG,
> 
> I’m Kashyap Thimmaraju, a postdoctoral researcher at Humboldt 
> University Berlin. Recently I discovered a couple of attacks with the 
> QUIC protocol. Lars Eggert recommended that I share my findings with 
> the WG to discuss the impact and relevance of these findings.
> 
> 
> Broadly speaking, the attacks deal with the Connection ID (CID), which 
> I’ve elaborated on below.
> 
> 
> ANALYSIS
> 
> The draft specifies and recommends several important points on secure 
> usage of the CID, e.g., i) the same CID MUST NOT be used multiple times 
> over the same connection to prevent an eavesdropper from correlating 
> the end-points; ii) end-points should associate a sequence number with 
> each CID to detect reuse within the same connection; iii) CIDs should 
> be (pseudo) randomly generated as they are also used for packet 
> security and; iv) CIDs can be very long (max. 20 bytes).
> 
> 
> The draft, however, does not specify how servers should handle the case 
> when successive incoming connections to a server use the same 
> destination CID.  Indeed, given that CIDs are assumed to be 
> (pseudo)randomly generated with a maximum length of 20 bytes, the 
> probability of such a collision is very low.
> 
> 
> Hence, I posit the following: If the server (implementation) does not 
> permit the use of the same destination CID across successive 
> connections (for a specific timeout), then an attacker can exploit this 
> behaviour in at least 2 ways:
> 
>  1. She can enumerate the number of server instances behind the domain 
> name/public IP address: if the handshake does not complete on the 
> second/successive attempt, then she infers that she has reached the 
> same instance; if the handshake succeeds then it is a new instance.
> 
>  2. Create a covert channel: we can assume that a sender and receiver 
> agree upon a set of CIDs to use at specific times. For each time 
> interval, the sender and receiver use the same destination CID. The 
> sender sends a 1 by connecting to the server and a 0 by not connecting. 
> The receiver always attempts to connect to the server. If the receiver 
> could not complete the handshake, it’s a 1, if the handshake completes 
> it’s a 0. Using this scheme a binary string can be covertly 
> communicated in one direction.
> 
> 
> EVALUATION
> 
> The primary goal of my evaluation is to identify implementations, if 
> any, that prevent two subsequent QUIC connections from using the same 
> destination CID. To conduct this experiment I adopted the following 
> methodology. First, I created a client that uses deterministic CIDs for 
> the source and destination CIDs (I modified LiteSpeed Technologies 
> open-source QUIC client). Second, I used 15 implementations from those 
> listed on the QUIC Working Group’s GitHub page, using either manually 
> built Docker containers or the public test servers. The servers tested 
> are shown across the two rows in Table 1. To conduct the experiment, I 
> had the client open and hold a QUIC connection with source CID 1 and 
> destination CID 2 for a maximum of 10s with the server, and then close 
> it. I then idled for 10s and repeated the steps a second time. I saved 
> the debug logs of the client, packet traces and session keys. I then 
> repeated the process for each of the 15 implementations. Finally, to 
> obtain the results of the test, I searched the debug log of the client 
> or the packet trace to confirm whether the second QUIC connection 
> obtained a successful handshake or not. The absence of a successful 
> handshake indicates a vulnerable implementation.
> 
> 
> RESULTS
> 
> The results from my evaluation are shown in the table below. In 
> particular, I found 4 out of 15 implementations to be vulnerable to the 
> enumeration and covert channel attacks, namely, Apache Traffic Server 
> (ATS), Chromium, lsquic and ngtcp2. In an evaluation of the covert 
> channel’s throughput using an lsquic client and server on my desktop, 
> I measured ~30 bps.
> 
> 
> VULNERABLE : IMPLEMENTATION
> 
> NO : Akamai, F5, Quinn, Aioquic, Quiche, Proxygen, Neqo, Nginx, 
> Picoquic, Quant, Quicly 
> 
> YES : ATS, Chromium, lsquic, ngtcp2
> 
> Sincerely,
> 
> -- 
> Dr.-Ing Kashyap Thimmaraju
> 
> Lehrstuhl für Technische Informatik
> Institut für Informatik
> Humboldt-Universität zu Berlin
> 
> Besucheranschrift:
> Rudower Chaussee 25, 12489 Berlin
> Haus 4, 3. OG
> 
> [email protected]
> 
> http://www.ti.informatik.hu-berlin.de
