Hi QUIC-WG,

Thanks for recognizing my work and effort.

I was not on the mailing list (but I am now :)) when you replied,
so I had to manually copy your responses. Below I've included each
person's name and replied to their comments/remarks.

> Dmitri Tikhonov
> I ran some experiments myself and I realized that my original guess was
> incorrect.  lsquic server retires the Initial DCID a short time after
> handshake succeeds.  When a CID is retired, all incoming packets bearing
> it are dropped.
>
> lsquic keeps CIDs retired for at least 30 seconds and at most forever
> (old entries are purged opportunistically when new entries are added.)

Good to know. We did some experiments and it seemed to persist for over a day.
However, we did not try to connect with other CIDs so that could explain why
they remained for so long.

> I looked for guidance in the Transport Draft for how long a CID is to
> stay retired (both Initial DCID and those retired via the
> RETIRE_CONNECTION_ID frame), but found none.

So for lsquic, the problem is about a CID being stuck in the retired state?
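
To make sure I understand the described lsquic behavior, here is a small
sketch of it as I read it: retired CIDs are kept for at least 30 seconds and
old entries are purged opportunistically when new ones are added. All names
here are mine, not lsquic's API.

```python
import time

# Minimum retention window per Dmitri's description; the exact policy
# ("at most forever") depends on when new retirements trigger a purge.
RETIRE_MIN_SECONDS = 30

class RetiredCIDSet:
    """Sketch of the described behavior: CIDs stay retired for at least
    30 seconds and are purged opportunistically when new entries arrive."""

    def __init__(self):
        self._retired = {}  # cid -> retirement timestamp

    def retire(self, cid: bytes, now: float = None) -> None:
        now = time.monotonic() if now is None else now
        # Opportunistic purge: drop entries older than the minimum
        # retention window whenever a new CID is retired.
        cutoff = now - RETIRE_MIN_SECONDS
        self._retired = {c: t for c, t in self._retired.items() if t > cutoff}
        self._retired[cid] = now

    def is_retired(self, cid: bytes) -> bool:
        # Incoming packets bearing a retired CID are dropped by the server.
        return cid in self._retired
```

This would also explain what we observed: with no new retirements to trigger a
purge, an entry can linger for over a day.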

> Christian Huitema
> The description of the experiment does not say whether the successive
> connection attempts used the same IP address. For Picoquic at least
> that's important, because Picoquic retrieves the handshake context using
> the combination of Initial DCID and client IP + port. Multiple
> connection attempts using the same Initial DCID and different IP
> addresses will be treated as independent connections. This was one of
> the suggestion in
> https://tools.ietf.org/html/draft-kazuho-quic-authenticated-handshake-01.

The same IP address was used for the successive connection attempts; however,
the source port was different.

> Kazuho Oku
> Quicly adopts the same approach. Applications of quicly are expected to
> supply their own CID generation scheme, and therefore quicly does not know
> if there's enough entropy in the CID to avoid collision between
> server-supplied CIDs and the original DCID being generated by the client.
>
> Therefore, during the handshake, quicly uses `4-tuple && (client-generated
> DCID || server-generated CID)` as the packet routing scheme.
>
> That's how we avoid the problem raised by Kashyap.

If the client-generated DCID is used, why did I not see the problem with quicly?
Ah, because the source port in the 2nd attempt is different from the first.
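
For my own clarity, here is a sketch of the routing scheme Kazuho describes,
i.e., `4-tuple && (client-generated DCID || server-generated CID)` during the
handshake. The type and function names are mine for illustration, not quicly's
API.

```python
from typing import NamedTuple

class FourTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

class HandshakeRoute(NamedTuple):
    four_tuple: FourTuple
    client_dcid: bytes  # Initial DCID chosen by the client
    server_cid: bytes   # CID issued by the server

def matches(route: HandshakeRoute, pkt_tuple: FourTuple,
            pkt_dcid: bytes) -> bool:
    """A handshake packet is routed to an existing connection only if the
    4-tuple matches AND the DCID is either the client-generated Initial DCID
    or the server-generated CID."""
    return (route.four_tuple == pkt_tuple
            and pkt_dcid in (route.client_dcid, route.server_cid))
```

Under this scheme, a second attempt from a different source port fails the
4-tuple check and is treated as an independent connection, even with the same
Initial DCID, which matches what I saw.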

> Martin Thomson
> Firstly, thanks for taking a look at this.  This is obviously a considerable
> amount of work and it is good to see people thinking about the way that the
> different pieces fit together.

You're welcome.

> That isn't the whole story though.  If the *routing* infrastructure does the
> same thing, then you are able mount the claimed attack by varying source
> address.  If the routing infrastructure only looks at the connection ID, then
> you won't reveal any information.  But either allows targeting of server
> instances, which is probably unwise, so I would be surprised to see that in
> more advanced infrastructure.  Address tuple-based routing, which might still
> be common, does offer some opportunities here, but that was already true as
> those systems can be exploited by manipulating source address.

At first glance, the granularity of the routing infrastructure does influence
the feasibility of the attack. However, if CID routing is used, I don't think
the attack is mitigated. AFAIU, CID routing will be based on the
server-generated CID. Hence, if the behavior I described exists, and assuming
the load balancer uses some kind of hashing to spray requests across the
instances, the attacker could eventually enumerate the instances.

> All in all, the question of how a load balancer directs new connections to
> server instances is highly relevant here.

Exactly.

> It probably pays to see what an attacker gains.  Revealing what connection IDs
> are in use is something, but given the size of the space, that might not be
> especially valuable.  And exploiting this requires a routing infrastructure
> that is vulnerable to more interesting attacks, like resource exhaustion by
> targeting.

I think the point is not whether the CID is revealed. It is that an attacker
can build on this to eventually count the number of reachable instances. I did
not include that part of my research here as we are still working on it.
However, I do have a sketch of how it could be done below.

Enumeration Algorithm. We repeatedly issue connection requests with a
sequentially increasing source port; the CIDs, however, remain the same for
each request. We then count the number C of successful connections established.
If we do not receive a response from the server, we have reached an instance
that was already counted. If round-robin is being used, other clients'
requests may have interleaved with ours, resulting in our request reaching a
previously seen instance again instead of an uncounted one. If hashing is
being used, our 4-tuple (source IP and port, and destination IP and port) may
have hashed to the same value, and hence the same instance. Therefore, we
continue to issue requests until we do not establish any new connection with
the server after a threshold of maximum attempts, after which C will be the
number of server instances.
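
The loop above can be sketched as follows. `try_connect` is a hypothetical
helper standing in for a QUIC handshake attempt from the given source port
with the same fixed CIDs every time; everything else is illustration, not a
finished tool.

```python
def enumerate_instances(try_connect, base_port: int, max_attempts: int) -> int:
    """Sketch of the enumeration algorithm described above.

    `try_connect(src_port)` is a hypothetical helper that attempts a QUIC
    handshake from `src_port` using the SAME fixed CIDs every time, and
    returns True iff the server responds, i.e., we reached an instance
    that has not seen these CIDs before.
    """
    count = 0              # C: successful connections == distinct instances
    failures_in_a_row = 0  # stop once we only hit already-counted instances
    port = base_port
    while failures_in_a_row < max_attempts:
        if try_connect(port):
            count += 1
            failures_in_a_row = 0
        else:
            # No response: the request hashed (or was round-robined) to an
            # instance we already counted; keep varying the source port.
            failures_in_a_row += 1
        port += 1  # sequentially increasing source port varies the 4-tuple
    return count
```

The `max_attempts` threshold trades off run time against the chance of
declaring the count final while an uncounted instance still exists.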

> The covert channel is something we've already decided is not interesting.  The
> number of other covert channels is practically unbounded here, and the bit rate
> of the one you describe is far lower.

Okay. I agree the throughput is not the highest, and I have not tested it over
the WAN. Nevertheless, I see it being used in a positive way, e.g., to
circumvent censorship. However, it could also be used as some kind of rendezvous
protocol, e.g., for bots to signal to a C&C server or prepare to mount an
attack. The root cause is the behavior I described in my first email.

> Jana Iyengar
> Thank you, Kashyap, for doing this work and for bringing it to the
> working group.

You're welcome.

> I agree with Martin's assessment, specifically that the interesting exploit
> on enumerating the number of servers, is very dependent on how load
> balancing is done. A relevant point here is that the attacker can only
> control the DCID in the first flight of Initial packets, and if the server
> treats Initial packets differently than it does subsequent packets, that is
> yet another way in which the surface of this exploit gets limited.

Thank you for appreciating my work and discussing the matter so openly. I am,
however, not fully convinced by Martin's argument. True, there are mentions of
operational and management guidance as well as the quic-lb draft. However, I
believe the key observation I made has been overlooked: the spec leaves the
handling of the same DCID across successive connections unspecified. As my
tests have shown, implementations differ in how they handle this scenario,
which makes the enumeration attack feasible on at least 4 different
implementations. The point raised by Dmitri may also need some attention,
i.e., w.r.t. the CID retirement timeout.

Sincerely,

--
Dr.-Ing Kashyap Thimmaraju

Lehrstuhl für Technische Informatik
Institut für Informatik
Humboldt-Universität zu Berlin

Besucheranschrift:
Rudower Chaussee 25, 12489 Berlin
Haus 4, 3. OG

[email protected]

http://www.ti.informatik.hu-berlin.de
