Hi,

On 2023-11-21 14:48:59 -0400, David Steele wrote:
> > I'd not call 7.06->4.77 or 6.76->4.77 "virtually free".
>
> OK, but how does that look with compression
With compression it's obviously somewhat different - but that part is done
in parallel, potentially on a different machine with client-side
compression, whereas right now the checksumming is single-threaded, on the
server side. With parallel server-side compression, the default
checksumming is still 20% slower than no checksumming; with client-side
compression it's 15% slower.

> -- to a remote location?

I think this one unfortunately makes checksums a bigger issue, not a
smaller one. The network interaction piece is single-threaded, so adding
another significant use of CPU onto the same thread means you are hit
harder by spending a substantial amount of CPU on checksumming in that
thread. Once you go beyond the small instances, you have plenty of network
bandwidth in cloud environments. We top out well before the network on
bigger instances.

> Uncompressed backup to local storage doesn't seem very realistic. With gzip
> compression we measure SHA1 checksums at about 5% of total CPU.

IMO using gzip is basically infeasible for non-toy-sized databases today. I
think we're doing our users a disservice by defaulting to it in a bunch of
places. Even if another default exposes them to difficulty due to
potentially using a differently compiled binary with fewer supported
compression methods - that's gonna be very rare in practice.

> I can't understate how valuable checksums are in finding corruption,
> especially in long-lived backups.

I agree! But I think we need faster checksum algorithms, or a faster
implementation of the existing ones, and we should probably default to
something faster once we have it.

Greetings,

Andres Freund
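[Editor's note: the thread argues that SHA-1 checksumming is CPU-expensive relative to cheaper checksums. A hypothetical micro-benchmark (not from this thread; names, sizes, and the CRC32 comparison point are illustrative assumptions) sketching how one might measure that gap on a single thread:

```python
# Hypothetical micro-benchmark: single-threaded throughput of a
# cryptographic checksum (SHA-1, as used for base backup manifests)
# vs. a cheap non-cryptographic checksum (CRC32). Absolute numbers
# depend entirely on the machine; only the relative gap is the point.
import hashlib
import time
import zlib

# 16 MiB of dummy data standing in for a backup stream chunk.
payload = b"\x5a" * (16 * 1024 * 1024)

def mib_per_sec(fn, reps=5):
    """Run fn(payload) reps times; return MiB/s of the fastest run."""
    best = float("inf")
    for _ in range(reps):
        start = time.perf_counter()
        fn(payload)
        best = min(best, time.perf_counter() - start)
    return (len(payload) / (1024 * 1024)) / best

sha1_rate = mib_per_sec(lambda d: hashlib.sha1(d).digest())
crc_rate = mib_per_sec(lambda d: zlib.crc32(d))

print(f"SHA-1: {sha1_rate:8.1f} MiB/s")
print(f"CRC32: {crc_rate:8.1f} MiB/s")
```

On most hardware CRC32 (and, outside the stdlib, CRC32C or xxHash with SIMD support) runs many times faster than SHA-1, which is the kind of gap behind "we need faster checksum algorithms or a faster implementation of the existing ones".]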