Thx Luke, Bas.

Resurrecting this old thread regarding web connection data sizes to share some 
more data I presented at a conference last week. You two know about this, but I 
thought it could benefit future group discussions.

Slides 14-19 in 
https://pkic.org/events/2025/pqc-conference-austin-us/THU_BREAKOUT_1130_Panos-Kampanakis_How-much-will-ML-DSA-affect-Webpage-Metrics.pdf#page=14
investigate connection data sizes for some popular web pages. The investigation 
showed that the pages I focused on pull down large amounts of data, but they 
also include a bunch of slim connections delivering other content like 
tracking, ads, HTTP 304s (browser caching), or small elements. I believe this 
generally matches what you shared in your blog. One caveat is that this 
investigation covered a small set of popular pages, so we can't extrapolate 
that they represent the whole web. But if they do, then the performance of the 
conns transferring the "web content" won't suffer as much; the small conns 
doing the other things will. Will these small conns affect web metrics? 
Intuitively, probably not much, but without testing no one can be sure.

The earlier slides of the preso include some results from popular pages and 
estimate the impact of ML-DSA on web user metrics like TTFB, FCP, LCP, and 
Document Complete times. They show that the web metrics suffer much less than 
the handshake, mainly because web pages usually spend much more time 
downloading and rendering large amounts of data (HTML, CSS, JavaScript, 
images, JSON, etc.) than on TLS handshakes.
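
To make that intuition concrete, here is a minimal back-of-the-envelope sketch 
in Python. The numbers are made-up placeholders, not figures from the slides: 
if the handshake is a small slice of total page load time, even a large 
handshake regression barely moves the page-level metric.

```python
# Illustrative model only: a page-level metric as handshake time plus
# everything else (DNS, downloads, rendering). All numbers are made up.
handshake_ms = 100.0   # hypothetical TLS handshake time
other_ms = 1900.0      # hypothetical downloads, rendering, etc.
page_metric_ms = handshake_ms + other_ms

slowdown = 0.15        # a 15% handshake regression
new_metric_ms = handshake_ms * (1 + slowdown) + other_ms

# A 15% handshake slowdown moves the page metric by only ~0.75% here.
print(f"page metric: {page_metric_ms:.0f} -> {new_metric_ms:.0f} ms "
      f"({(new_metric_ms / page_metric_ms - 1) * 100:.2f}% slower)")
```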



From: Luke Valenta <[email protected]>
Sent: Tuesday, November 19, 2024 3:19 PM
To: Kampanakis, Panos <[email protected]>
Cc: Bas Westerbaan <[email protected]>; [email protected]; 
[email protected]
Subject: [EXTERNAL] [Pqc] Re: [TLS] Re: Bytes server -> client


Hi Panos,

Here are some more details on what we see in connections to Cloudflare.

> To validate this theory, what would your data show if you queried for the % 
> of conns that transfer <0.5KB or <1KB? If that is a lot, then there are many 
> small conns that skew the median downwards. Or what if you ran the query to 
> exclude the very heavy conns and the very light ones (HTTP 301, 302, etc.)? 
> For example, if you ran a report on the conns transferring 
> 1KB < data < 80th-percentile KB, what would be the median for that? That 
> would tell us if the too-small and too-big conns skew the median.

For non-resumed QUIC connections with at least one request where we transfer 
(including TLS data) between 4kB and 80kB (the 10th and 80th percentiles of the 
distribution, respectively), the median bytes transferred is 6.5kB and the 
average is 13.8kB. In other words, less than 10% of non-resumed QUIC connections with 
at least one request transfer less than 4kB, so it does not appear to be the 
case that a large number of small requests are skewing the median downwards. 
Ignoring the top 20% of connections in terms of bytes transferred shifts the 
average down significantly, which supports the idea that a relatively small 
number of large requests are skewing the average upwards.

Let me know if I can clarify further! This is just what we see today, but it'll 
be great to see more measurements of the real impact on end-users.

Best,
Luke

On Thu, Nov 7, 2024 at 10:54 AM Kampanakis, Panos 
<[email protected]> wrote:
Hi Bas,

That is interesting and surprising, thank you.

I am mostly interested in the ~63% of non-resumed sessions that would be 
affected by 10-15KB of auth data. It looks like your data showed that each QUIC 
conn transfers about 4.7KB, which is very surprising to me; it seems very low.

In experiments I am running here against top web servers, I see lots of conns 
that transfer hundreds of KB, even over QUIC in cached browser sessions. This 
aligns with the average from your blog (551KB * 0.6 ≈ 330KB), but not with the 
median of 4.7KB. Hundreds of KB also aligns with the p50 page weight / conns 
per page in 
https://httparchive.org/reports/page-weight?lens=top1k&start=2024_05_01&end=latest&view=list
 . Of course browsers cache a lot of things like JavaScript, images, etc., so 
they don't transfer all resources, which could explain the median. But still, 
based on anecdotal experience looking at top visited servers, I am noticing 
many small transfers and just a few that transfer larger HTML, CSS, etc. on 
every page, even in cached browser sessions.

I am curious about the 4.7KB and the 15.8% of conns transferring >100KB in your 
blog. Like you say in your blog, if the 95th percentile includes very large 
transfers, that would skew the diff between the median and the average. But I 
am wondering if there is another explanation. In my experiments I see a lot of 
302 and 301 redirects which transfer minimal data. Some pages have a lot of 
those. If you have many of them, then your median will get skewed as it fills 
up with very small data transfers that basically don't do anything. In essence, 
we could have 10 pages where each page has one conn that transfers 100KB for 
one of its resources and another 9 conns that are HTTP redirects or transfer 
0.1KB. That would make us think that 90% of the conns will be blazing fast, but 
the 100KB resource on each page will still take a good amount of time on a slow 
network.
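
A quick numeric sketch of that made-up example, just to show the skew: the 
median comes out tiny, while the average (and the user-visible load time) is 
dominated by the one big transfer.

```python
# Made-up distribution from the example above: per page, one conn moves
# 100KB of real content and nine conns are redirects moving ~0.1KB each.
from statistics import mean, median

conns_kb = [100.0] + [0.1] * 9
print(f"median:  {median(conns_kb):.1f}KB")  # 0.1KB, looks blazing fast
print(f"average: {mean(conns_kb):.1f}KB")    # ~10.1KB, pulled up by the big conn
```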

To validate this theory, what would your data show if you queried for the % of 
conns that transfer <0.5KB or <1KB? If that is a lot, then there are many small 
conns that skew the median downwards. Or what if you ran the query to exclude 
the very heavy conns and the very light ones (HTTP 301, 302, etc.)? For 
example, if you ran a report on the conns transferring 
1KB < data < 80th-percentile KB, what would be the median for that? That would 
tell us if the too-small and too-big conns skew the median.
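
For concreteness, here is a sketch of the kind of query I have in mind, written 
in pandas over a hypothetical per-conn log. The column name and sample numbers 
are invented purely for illustration.

```python
import pandas as pd

# Hypothetical per-connection log; bytes_transferred is in KB and the
# values are made up just to keep the sketch runnable.
conns = pd.DataFrame(
    {"bytes_transferred": [0.1, 0.3, 0.5, 2.0, 4.7, 7.8, 50.0, 120.0, 400.0, 900.0]}
)

# 1) Share of tiny conns. If this is large, small conns drag the median down.
tiny_share = (conns["bytes_transferred"] < 1.0).mean()

# 2) Median after excluding the very light and the very heavy conns.
p80 = conns["bytes_transferred"].quantile(0.80)
mid = conns[conns["bytes_transferred"].between(1.0, p80, inclusive="neither")]
trimmed_median = mid["bytes_transferred"].median()

print(f"% of conns < 1KB: {tiny_share:.1%}")
print(f"median of 1KB < data < p80 ({p80:.1f}KB): {trimmed_median:.1f}KB")
```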

Btw, I am curious also about
> Chrome is more cautious and set 10% as their target for maximum TLS handshake 
> time regression.
Is this public somewhere? There is no immediate link between the TLS handshake 
and any of the Core Web Vitals metrics or the CrUX metrics other than TTFB. 
Even for TTFB, a 10% slowdown in the handshake does not mean a 10% slowdown in 
TTFB; the TTFB is affected much less. I am wondering if we should start 
expecting the TLS handshake to slowly become a tracked web performance metric.


From: Bas Westerbaan <[email protected]>
Sent: Thursday, November 7, 2024 9:07 AM
To: [email protected]; [email protected]
Subject: [EXTERNAL] [TLS] Bytes server -> client




Hi all,

Just wanted to highlight a blog post we just published. 
https://blog.cloudflare.com/another-look-at-pq-signatures/  At the end we share 
some statistics that may be of interest:

On average, around 15 million TLS connections are established with Cloudflare 
per second. Upgrading each to ML-DSA would take 1.8Tbps, which is 0.6% of our 
current total network capacity. No problem so far. The question is how these 
extra bytes affect performance.
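
(A rough sanity check of the 1.8Tbps figure, as a sketch: the post does not 
state the per-connection overhead it assumed, but roughly 15kB of extra 
authentication data per connection reproduces it.)

```python
# Sketch only: assumes ~15kB of extra ML-DSA authentication data per
# connection; the post does not state the exact overhead it used.
conns_per_sec = 15e6         # TLS connections/second, from the post
extra_bytes_per_conn = 15e3  # assumed extra bytes per connection
extra_bps = conns_per_sec * extra_bytes_per_conn * 8
print(f"~{extra_bps / 1e12:.1f} Tbps")  # ~1.8 Tbps
```
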
Back in 2021, we ran a large-scale experiment to measure the impact of big 
post-quantum certificate chains on connections to Cloudflare’s network over the 
open Internet. There were two important results. First, we saw a steep increase 
in the rate of client and middlebox failures when we added more than 10kB to 
existing certificate chains. Secondly, when adding less than 9kB, the slowdown 
in TLS handshake time would be approximately 15%. We felt the latter was 
workable, but far from ideal: such a slowdown is noticeable, and people might 
hold off deploying post-quantum certificates until it's too late.

Chrome is more cautious and set 10% as their target for maximum TLS handshake 
time regression. They report that deploying post-quantum key agreement has 
already incurred a 4% slowdown in TLS handshake time, for the extra 1.1kB from 
server-to-client and 1.2kB from client-to-server. That slowdown is 
proportionally larger than the 15% we found for 9kB, but that could be 
explained by slower upload speeds than download speeds.
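
(A rough check of "proportionally larger", as a sketch that assumes the 
slowdown scales linearly with the added bytes; neither measurement establishes 
that linearity.)

```python
# Crude %-slowdown-per-added-kB comparison. Linearity is an assumption.
chrome_both_dirs = 4.0 / (1.1 + 1.2)  # ~1.7%/kB counting both directions
chrome_s2c_only = 4.0 / 1.1           # ~3.6%/kB, server-to-client only
cloudflare_2021 = 15.0 / 9.0          # ~1.7%/kB for the 9kB experiment
print(f"{chrome_both_dirs:.2f} / {chrome_s2c_only:.2f} / {cloudflare_2021:.2f} %/kB")
```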

There has been pushback against the focus on TLS handshake times. One argument 
is that session resumption alleviates the need for sending the certificates 
again. A second argument is that the data required to visit a typical website 
dwarfs the additional bytes for post-quantum certificates. One example is this 
2024 publication, where Amazon researchers have simulated the impact of large 
post-quantum certificates on data-heavy TLS connections. They argue that 
typical connections transfer multiple requests and hundreds of kilobytes, and 
for those the TLS handshake slowdown disappears in the margin.

Are session resumption and hundreds of kilobytes over a connection typical 
though? We’d like to share what we see. We focus on QUIC connections, which are 
likely initiated by browsers or browser-like clients. Of all QUIC connections 
with Cloudflare that carry at least one HTTP request, 37% are resumptions, 
meaning that key material from a previous TLS connection is reused, avoiding 
the need to transmit certificates. The median number of bytes transferred from 
server-to-client over a resumed QUIC connection is 4.4kB, while the average is 
395kB. For non-resumptions the median is 7.8kB and average is 551kB. This vast 
difference between median and average indicates that a small fraction of 
data-heavy connections skew the average. In fact, only 15.8% of all QUIC 
connections transfer more than 100kB.

The median certificate chain today (with compression) is 3.2kB. That means that 
almost 40% of all data transferred from server to client on more than half of 
the non-resumed QUIC connections is just for the certificates, and this only 
gets worse with post-quantum algorithms. For the majority of QUIC connections, 
using ML-DSA as a drop-in replacement for classical signatures would more than 
double the number of transmitted bytes over the lifetime of the connection.
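
(A sketch of the arithmetic behind those two claims. The ML-DSA-44 sizes are 
from FIPS 204; the assumed chain shape of two certificates, two SCTs, plus the 
handshake signature is typical but varies in practice, and is not a figure from 
the post.)

```python
# Arithmetic sketch for the two claims above. ML-DSA-44 sizes are from
# FIPS 204; the assumed chain shape (2 certs + 2 SCTs + CertificateVerify,
# i.e. 5 signatures and 2 public keys on the wire) varies in practice.
chain_kb = 3.2               # median compressed chain today
median_nonresumed_kb = 7.8   # median server->client bytes, non-resumed QUIC
print(f"chain share of the median conn: {chain_kb / median_nonresumed_kb:.0%}")

sig_bytes, pk_bytes = 2420, 1312  # ML-DSA-44 signature / public key sizes
mldsa_auth_kb = (5 * sig_bytes + 2 * pk_bytes) / 1000  # ~14.7kB
# The added auth data alone exceeds today's 7.8kB median, so bytes over
# the connection's lifetime would more than double.
print(f"ML-DSA auth data: ~{mldsa_auth_kb:.1f}kB")
```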

It sounds quite bad if the vast majority of data transferred for a typical 
connection is just for the post-quantum certificates. It’s still only a proxy 
for what is actually important: the effect on metrics relevant to the end-user, 
such as the browsing experience (e.g. largest contentful paint) and the amount 
of data those certificates take from a user’s monthly data cap. We will 
continue to investigate and get a better understanding of the impact.

Best,

 Bas
_______________________________________________
TLS mailing list -- [email protected]
To unsubscribe send an email to [email protected]


--
Luke Valenta
Systems Engineer - Research
_______________________________________________
TLS mailing list -- [email protected]
To unsubscribe send an email to [email protected]
