Thx Luke, Bas. Resurrecting this old thread regarding web connection data sizes to share some more data I presented at a conference last week. You two know about this, but I thought it could benefit future group discussions.
Slides 14-19 in https://pkic.org/events/2025/pqc-conference-austin-us/THU_BREAKOUT_1130_Panos-Kampanakis_How-much-will-ML-DSA-affect-Webpage-Metrics.pdf#page=14 investigate connection data sizes for some popular web pages. The investigation showed that the pages I focused on pull down large amounts of data, but they include a bunch of slim connections delivering other content like tracking, ads, HTTP 304s (browser caching), or small elements. I believe this generally matches what you shared in your blog. One caveat is that this investigation covered a small set of popular pages, so we can't extrapolate that they represent the whole web. But if they do, then the performance of the conns transferring the "web content" won't suffer as much; the small conns doing the other things will. Will these small conns affect web metrics? Intuitively, probably not much, but OK, without testing no one should be sure.

The earlier slides of the preso include some results from popular pages and estimate the impact of ML-DSA on web user metrics like TTFB, FCP, LCP, and Document Complete times. They show that the web metrics suffer much less than the handshake, mainly because web pages usually spend more time on other things, like downloading and rendering large amounts of HTML, CSS, JavaScript, images, JSON, etc., than on TLS handshakes.

From: Luke Valenta <[email protected]>
Sent: Tuesday, November 19, 2024 3:19 PM
To: Kampanakis, Panos <[email protected]>
Cc: Bas Westerbaan <[email protected]>; <[email protected]> <[email protected]>; [email protected]
Subject: [EXTERNAL] [Pqc] Re: [TLS] Re: Bytes server -> client

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

Hi Panos,

Here are some more details on what we see in connections to Cloudflare.

> To validate this theory, what would your data show if you queried for the % of conns that transfer <.5 or <1KB?
> If that is a lot, then there are many small conns that skew the median downwards. Or what if you run the query to exclude the very heavy conns and the very light ones (HTTP 301, 302, etc.)? For example, if you ran a report on the conns transferring 1KB < data < 80th-percentile KB, what would be the median for that? That would tell us if the too-small and too-big conns skew the median.

For non-resumed QUIC connections with at least one request where we transfer (including TLS data) between 4kB and 80kB (the 10th and 80th percentiles of the distribution, respectively), the median bytes transferred is 6.5kB and the average is 13.8kB. In other words, fewer than 10% of non-resumed QUIC connections with at least one request transfer less than 4kB, so it does not appear to be the case that a large number of small requests is skewing the median downwards. Ignoring the top 20% of connections in terms of bytes transferred shifts the average down significantly, which supports the idea that a relatively small number of large requests is skewing the average upwards.

Let me know if I can clarify further! This is just what we see today, but it'll be great to see more measurements of the real impact on end-users.

Best,
Luke

On Thu, Nov 7, 2024 at 10:54 AM Kampanakis, Panos <[email protected]> wrote:

Hi Bas,

That is interesting and surprising, thank you. I am mostly interested in the ~63% of non-resumed sessions that would be affected by 10-15KB of auth data. It looks like your data showed that each QUIC conn transfers about 4.7KB, which is very surprising to me. It seems very low. In experiments I am running here against top web servers, I see lots of conns which transfer hundreds of KB, even over QUIC in cached browser sessions. This aligns with the average from your blog, 551*0.6 = ~330KB, but not with the 4.7KB median.
Hundreds of KB also aligns with the p50 page weight / conns per page in https://httparchive.org/reports/page-weight?lens=top1k&start=2024_05_01&end=latest&view=list . Of course browsers cache a lot of things like JavaScript, images, etc., so they don't transfer all resources, which could explain the median. But still, based on anecdotal experience looking at top visited servers, I am noticing many small transfers and just a few that transfer the larger HTML, CSS, etc. on every page, even in cached browser sessions.

I am curious about the 4.7KB and the 15.8% of conns transferring >100KB in your blog. Like you say in your blog, if the 95th percentile includes very large transfers, that would skew the difference between the median and the average. But I am wondering if there is another explanation. In my experiments I see a lot of 301 and 302 redirects which transfer minimal data; some pages have a lot of those. If you have many of them, then your median will get skewed as it fills up with very small data transfers that basically don't do anything. In essence, we could have 10 pages, each of which transfers 100KB on one connection for one of its resources and has another 9 connections that are HTTP redirects or transfer 0.1KB. That would make us think that 90% of the connections will be blazing fast, but the 100KB resource on each page will still take a good amount of time on a slow network.

To validate this theory, what would your data show if you queried for the % of conns that transfer <.5 or <1KB? If that is a lot, then there are many small conns that skew the median downwards. Or what if you run the query to exclude the very heavy conns and the very light ones (HTTP 301, 302, etc.)? For example, if you ran a report on the conns transferring 1KB < data < 80th-percentile KB, what would be the median for that? That would tell us if the too-small and too-big conns skew the median.

Btw, I am also curious about

> Chrome is more cautious and set 10% as their target for maximum TLS handshake time regression.

Is this public somewhere?
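The skew in the 10-pages example above can be sketched with a few lines of Python (all numbers invented for illustration):

```python
import statistics

# Illustrative numbers, following the example above: 10 pages, each opening
# one connection that transfers 100 KB and nine connections that are
# redirects or transfer ~0.1 KB.
conns_kb = [100.0] * 10 + [0.1] * 90

median_all = statistics.median(conns_kb)   # 0.1 KB: "most conns are tiny"
mean_all = statistics.mean(conns_kb)       # ~10.1 KB: pulled up by the heavy conns

# Excluding the very light conns (the proposed <1 KB cutoff) flips the picture:
heavy = [c for c in conns_kb if c >= 1.0]
median_heavy = statistics.median(heavy)    # 100 KB: the conns doing the real work
```

With this (deliberately extreme) mix, the median describes the redirect traffic while the filtered median describes the content-carrying connections, which is exactly what the proposed query would distinguish.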
There is no immediate link between the TLS handshake and any of the Core Web Vitals metrics or the CrUX metrics other than TTFB. And even for TTFB, a 10% slowdown in the handshake does not mean a 10% slowdown in TTFB; TTFB is affected much less. I am wondering if we should start expecting the TLS handshake to slowly become a tracked web performance metric.

From: Bas Westerbaan <[email protected]>
Sent: Thursday, November 7, 2024 9:07 AM
To: <[email protected]> <[email protected]>; [email protected]
Subject: [EXTERNAL] [TLS] Bytes server -> client

Hi all,

Just wanted to highlight a blog post we just published: https://blog.cloudflare.com/another-look-at-pq-signatures/

At the end we share some statistics that may be of interest:

On average, around 15 million TLS connections are established with Cloudflare per second. Upgrading each to ML-DSA would take 1.8Tbps, which is 0.6% of our current total network capacity. No problem so far. The question is how these extra bytes affect performance. Back in 2021, we ran a large-scale experiment to measure the impact of big post-quantum certificate chains on connections to Cloudflare's network over the open Internet. There were two important results. First, we saw a steep increase in the rate of client and middlebox failures when we added more than 10kB to existing certificate chains. Secondly, when adding less than 9kB, the slowdown in TLS handshake time would be approximately 15%. We felt the latter is workable, but far from ideal: such a slowdown is noticeable and people might hold off deploying post-quantum certificates before it's too late. Chrome is more cautious and set 10% as their target for maximum TLS handshake time regression.
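Why a handshake regression target need not translate one-to-one into user-visible metrics can be seen with rough arithmetic (all timings invented for illustration):

```python
# Illustrative timings. TTFB roughly decomposes into DNS, transport setup,
# TLS handshake, and server processing; only the TLS term regresses.
dns_ms, transport_ms, tls_ms, server_ms = 20.0, 30.0, 50.0, 100.0
ttfb = dns_ms + transport_ms + tls_ms + server_ms              # 200 ms

ttfb_slow = dns_ms + transport_ms + tls_ms * 1.10 + server_ms  # 205 ms

regression = ttfb_slow / ttfb - 1   # 0.025: a 10% handshake hit -> ~2.5% TTFB hit
```

The dilution only grows for later metrics like FCP and LCP, which add download and rendering time on top of TTFB.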
They report that deploying post-quantum key agreement has already incurred a 4% slowdown in TLS handshake time, for the extra 1.1kB from server-to-client and 1.2kB from client-to-server. That slowdown is proportionally larger than the 15% we found for 9kB, but that could be explained by slower upload speeds than download speeds.

There has been pushback against the focus on TLS handshake times. One argument is that session resumption alleviates the need for sending the certificates again. A second argument is that the data required to visit a typical website dwarfs the additional bytes for post-quantum certificates. One example is this 2024 publication, where Amazon researchers have simulated the impact of large post-quantum certificates on data-heavy TLS connections. They argue that typical connections transfer multiple requests and hundreds of kilobytes, and for those the TLS handshake slowdown disappears in the margin.

Are session resumption and hundreds of kilobytes over a connection typical, though? We'd like to share what we see. We focus on QUIC connections, which are likely initiated by browsers or browser-like clients. Of all QUIC connections with Cloudflare that carry at least one HTTP request, 37% are resumptions, meaning that key material from a previous TLS connection is reused, avoiding the need to transmit certificates. The median number of bytes transferred from server-to-client over a resumed QUIC connection is 4.4kB, while the average is 395kB. For non-resumptions the median is 7.8kB and the average is 551kB. This vast difference between median and average indicates that a small fraction of data-heavy connections skews the average. In fact, only 15.8% of all QUIC connections transfer more than 100kB. The median certificate chain today (with compression) is 3.2kB.
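A back-of-the-envelope check of these figures, using ML-DSA-44 sizes from FIPS 204 (the chain composition of three signatures plus two public keys is my assumption, and the few hundred classical bytes removed are ignored):

```python
# Numbers quoted in the blog post above; ML-DSA-44 sizes are from FIPS 204.
median_nonresumed_kb = 7.8   # median server->client bytes, non-resumed QUIC conns
chain_kb = 3.2               # median compressed certificate chain today

cert_fraction = chain_kb / median_nonresumed_kb   # ~0.41, i.e. "almost 40%"

# Rough drop-in estimate (an assumption, not from the blog): two chain
# signatures plus the CertificateVerify handshake signature, and two
# public keys, all switched to ML-DSA-44.
sig_b, pk_b = 2420, 1312                          # bytes, per FIPS 204
added_kb = (3 * sig_b + 2 * pk_b) / 1000          # ~9.9 kB of new auth data

new_kb = median_nonresumed_kb + added_kb          # ~17.7 kB for the median conn
```

Under these assumptions, new_kb exceeds twice the current median, which is where the "more than double" figure below comes from.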
That means that, on more than half of the non-resumed QUIC connections, almost 40% of all data transferred from server to client is just for the certificates, and this only gets worse with post-quantum algorithms. For the majority of QUIC connections, using ML-DSA as a drop-in replacement for classical signatures would more than double the number of transmitted bytes over the lifetime of the connection.

It sounds quite bad if the vast majority of data transferred for a typical connection is just for the post-quantum certificates. It's still only a proxy for what is actually important: the effect on metrics relevant to the end-user, such as the browsing experience (e.g. largest contentful paint) and the amount of data those certificates take from a user's monthly data cap. We will continue to investigate and get a better understanding of the impact.

Best,
Bas

_______________________________________________
TLS mailing list -- [email protected]
To unsubscribe send an email to [email protected]

--
Luke Valenta
Systems Engineer - Research
