Dan Bernstein wrote:

>We're talking about ML-KEM for TLS 1.3. Recall from RFC 8446 that TLS
>1.3 mandates ECC. So, no, TLS implementors _can't_ remove ECC and just
>have "standalone ML-KEM".

This is incorrect. RFC 8446 only mandates ECC in the absence of an application 
profile standard specifying otherwise.  Such application profiles are common, 
and TLS implementations often remove algorithms, especially in private network 
deployments. Constrained devices, for example, often support only one of 
secp256r1 or X25519. Likewise, CNSA 1.0 endpoints support only secp384r1, while 
CNSA 2.0 endpoints support only ML-KEM-1024. For TLS 1.2 (RFC 5246), the 3GPP 
TLS profile in TS 33.210 forbids support for SHA-1, non-PFS key exchange, and 
non-AEAD cipher suites. As a result, the "mandatory-to-implement" 
TLS_RSA_WITH_AES_128_CBC_SHA is prohibited on all three counts. 
Contrary to what EKR wrote, I would argue that removing ECC through an 
application profile is fully compliant with RFC 8446.

>The code-size difference simply doesn't exist: ECC is there anyway.

This is incorrect. For constrained devices, code-size differences do exist, and 
based on past experience the differences in both code size and cycle count 
between X25519MLKEM768 and ML-KEM-512 are likely to be significant in some 
systems. If (D)TLS is intended to remain viable on constrained devices, I 
believe the TLS WG should standardize standalone ML-KEM-512.

>The cycle-count difference is swamped by the PQ communication costs.

As Benjamin Kaduk has already noted, a quad-core 3 GHz Skylake with a 300 Mbps 
Internet connection is the opposite of constrained. While it is true that 
energy consumption would be dominated by communication, this does not make code 
size, CPU cycles, wattage, or memory usage irrelevant. Believing that CAPEX, 
OPEX, and real-time constraints for all constrained IoT systems can be reduced 
to a single number is somewhat naive. In truly constrained systems, every 
resource is a trade-off: additional code size, latency, or memory requirements 
are likely to prevent an algorithm from being used at all or cause it to be 
used less frequently.
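
For reference, the throughput figures in the quoted message below can be 
reproduced with a short Python sketch. The constants (3 GHz quad-core CPU, 
300 Mbps link, ML-KEM-768 and X25519 sizes, X25519 cycle counts) are taken from 
that message; the function names are illustrative only, and framing overhead 
is ignored, as it is there.

```python
# Constants as stated in the quoted message (not independently measured).
CPU_HZ = 3_000_000_000        # 3 GHz Skylake core
CORES = 4                     # quad-core
LINK_BPS = 300_000_000        # 300 Mbps, upload and download combined

MLKEM768_KEY_BYTES = 1184
MLKEM768_CT_BYTES = 1088
X25519_SHARE_BYTES = 32
X25519_KEYGEN_CYCLES = 27780
X25519_DH_CYCLES = 83503


def bits_on_wire(*sizes_in_bytes):
    """Total bits sent plus received for one key exchange."""
    return 8 * sum(sizes_in_bytes)


def network_bound_ops_per_sec(bits_per_op, link_bps=LINK_BPS):
    """Key exchanges per second if only the link were the limit."""
    return link_bps // bits_per_op


def cpu_bound_ops_per_sec(cycles_per_op, hz=CPU_HZ, cores=1):
    """Key exchanges per second if only CPU cycles were the limit."""
    return (hz * cores) // cycles_per_op


if __name__ == "__main__":
    mlkem_bits = bits_on_wire(MLKEM768_KEY_BYTES, MLKEM768_CT_BYTES)
    x25519_bits = bits_on_wire(X25519_SHARE_BYTES, X25519_SHARE_BYTES)
    x25519_cycles = X25519_KEYGEN_CYCLES + X25519_DH_CYCLES

    print("ML-KEM-768 bits per exchange:", mlkem_bits)
    print("ML-KEM-768 network-bound ops/s:",
          network_bound_ops_per_sec(mlkem_bits))
    print("X25519 bits per exchange:", x25519_bits)
    print("X25519 CPU-bound ops/s per core:",
          cpu_bound_ops_per_sec(x25519_cycles))
    print("X25519 CPU-bound ops/s on all cores:",
          cpu_bound_ops_per_sec(x25519_cycles, cores=CORES))
```

This covers only the arithmetic of that one desktop scenario. It does not 
model code size, memory, latency, or energy, which are the resources at issue 
on constrained devices.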

>Furthermore, job #1 for the post-quantum rollout is to _try_ to deal
>with the current security disaster of user data being exposed to future
>quantum attacks.

I disagree. I would even go as far as saying that focusing solely on 
encryption is dangerous. I agree with CNSA 2.0 and the EU Roadmap that 
protecting long-lived devices is equally important. Compromising a device is 
typically a far higher-value target than decrypting a single decades-old 
connection: it gives an attacker access to more data and fresher data, enables 
active attacks, and eliminates the need to harvest and store traffic for 
decades. Migrating long-lived credentials in deployed devices affects a large 
part of the IETF, including LAMPS, SSHM, TLS, IPSEC, LAKE, and others.

https://media.defense.gov/2025/May/30/2003728741/-1/-1/0/CSA_CNSA_2.0_ALGORITHMS.PDF
https://ec.europa.eu/newsroom/dae/redirection/document/117507

Cheers,
John Preuß Mattsson


On 2025-11-26, 19:59, "D. J. Bernstein" <[email protected]> wrote:

John Mattsson writes:
> In these environments, standalone ML-KEM certainly reduces code size

We're talking about ML-KEM for TLS 1.3. Recall from RFC 8446 that TLS
1.3 mandates ECC. So, no, TLS implementors _can't_ remove ECC and just
have "standalone ML-KEM". Obviously the draft at hand doesn't change
this; i.e., you're crediting the draft with a savings that the draft
does not in fact achieve.

Furthermore, job #1 for the post-quantum rollout is to _try_ to deal
with the current security disaster of user data being exposed to future
quantum attacks. Given that X25519MLKEM768 has by far the biggest head
start on deployment, we want all TLS implementations supporting
X25519MLKEM768, to maximize the chance of successfully establishing
post-quantum connections---but of course that's contrary to any proposal
to allow ECC to be removed from TLS.

> You argued that "ECC+PQ has roughly the same performance properties as
> non-hybrid PQ," and I pointed out that this is incorrect when you
> consider cycle counts and code size.

The code-size difference simply doesn't exist: ECC is there anyway. The
cycle-count difference is swamped by the PQ communication costs.

You aren't arguing against the statement I made. You're arguing against
a strawman modification that replaces an analysis of system cost with
microbenchmarks that range from deceptive to irrelevant. This is
Benchmarking Crime B1; see https://arxiv.org/abs/1801.02381.

> ML-KEM-768 is more than twice as fast as X25519.

No, it's far more expensive than X25519. It's bottlenecked by the
communication costs for the 1184-byte key and the 1088-byte ciphertext.

To try to tilt this comparison towards ML-KEM-768, let's take a CPU
that's far out of date compared to the network connection. Concretely,
imagine a user at home with a poor little quad-core 3GHz Skylake from
2015 attached to a much newer 300Mbps Internet connection.

Even if ML-KEM-768 were taking _zero_ CPU time, it would still be
sending and receiving 8*(1184+1088) = 18176 bits overall, which can be
done only 16505 times per second before swamping the 300Mbps. (This is
imagining the 300Mbps being split about evenly between the upload and
download; in the more common situation of, say, 300Mbps download and
20Mbps upload, the upload is saturated at ~2000 operations per second.
Of course there are also framing costs etc.)

Meanwhile X25519 is sending and receiving 8*(32+32) = 512 bits, i.e.,
35x less than ML-KEM-768. This time the bottleneck is instead the CPU:
the user is paying 27780 cycles for keygen plus 83503 cycles for DH (see
https://bench.cr.yp.to/results-dh/amd64-samba.html), which can be done
26958 times per second per core, or 107833 times per second overall.

Does 1/107833 of a decade-old home CPU sound like a bigger cost than
1/16505 of a much newer network connection? (Also, isn't this cost
_obviously_ so close to 0 that we can just focus on security?)

The way https://cr.yp.to/papers.html#pppqefs puts communication and
computation on the same scale is by using dollar costs, for example
looking at the purchase price of a specific new 32-core machine (assumed
to die after 5 years) and of the electricity to run that machine. The
same numbers are used in https://blog.cr.yp.to/20240102-hybrid.html to
conclude that X25519 costs roughly 7% as much as ML-KEM-512. The main
reason the numbers are rough is variations in network costs: the paper
cites purchase costs ranging from 4x cheaper to 64x more expensive.

---D. J. Bernstein


===== NOTICES =====

This document may not be modified, and derivative works of it may not be
created, and it may not be published except as an Internet-Draft. (That
sentence is the official language from IETF's "Legend Instructions" for
the situation that "the Contributor does not wish to allow modifications
nor to allow publication as an RFC". I'm fine with redistribution of
copies of this document; the issue is with modification. Legend language
also appears in, e.g., RFC 5831. For further background on the relevant
IETF rules, see https://cr.yp.to/2025/20251024-rules.pdf.)


_______________________________________________
TLS mailing list -- [email protected]
To unsubscribe send an email to [email protected]
