On Sun, Mar 31, 2024 at 5:33 PM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:
> I'm on 2.2.2 (on Linux). But there's something wrong, because the
> pg_combinebackup that took ~150s on xfs/btrfs takes ~900s on ZFS.
>
> I'm not sure it's a ZFS config issue, though, because it's not CPU or
> I/O bound, and I see this on both machines. And some simple dd tests
> show the zpool can do 10x the throughput. Could this be due to the
> file header / pool alignment?
Could ZFS recordsize > 8kB be making it worse, by repeatedly dealing
with the same 128kB record as you copy_file_range 16 x 8kB blocks?
(Guessing you might be using the default recordsize?)

> I admit I'm not very familiar with the format, but you're probably
> right there's a header, and header_length does not seem to consider
> alignment. make_incremental_rfile simply does this:
>
>     /* Remember length of header. */
>     rf->header_length = sizeof(magic) + sizeof(rf->num_blocks) +
>         sizeof(rf->truncation_block_length) +
>         sizeof(BlockNumber) * rf->num_blocks;
>
> and sendFile() does the same thing when creating an incremental
> basebackup. I guess it wouldn't be too difficult to make sure this is
> aligned to BLCKSZ or something like that. I wonder if the file format
> is documented somewhere ... It'd certainly be nicer to tweak before
> v18, if necessary.
>
> Anyway, is that really a problem? I mean, in my tests the CoW stuff
> seemed to work quite fine - at least on XFS/BTRFS. Although, maybe
> that's why it took longer on XFS ...

Yeah, I'm not sure; I assume it did more allocating and copying because
of that. It wouldn't matter if a first version weren't as good as
possible, and it would be fine to tune the format later once we know
more, i.e. to leave some improvements on the table for now. I just
wanted to share the observation. I wouldn't be surprised if the
block-at-a-time coding makes it slower and maybe makes the on-disk data
structures worse, but I dunno, I'm just guessing. It's also
interesting, but not required, to figure out how to tune ZFS well for
this purpose right now ...
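To make that block-at-a-time point a bit more concrete, here's the
sort of thing I was imagining, as a rough and entirely untested
sketch. The function and parameter names here are made up for
illustration, and the real reconstruction code is structured quite
differently; the idea is just that runs of blocks that are contiguous
in both the source and the destination file get coalesced into one
copy_file_range() call, so a filesystem with a large record size (ZFS
defaults to 128kB) isn't asked to rework the same record 16 times for
16 adjacent 8kB blocks:

    #define _GNU_SOURCE
    #include <unistd.h>

    /*
     * Hypothetical helper: copy nblocks blocks of size blcksz, given
     * per-block source/destination file offsets, coalescing runs of
     * adjacent blocks into single copy_file_range() calls.  Returns 0
     * on success, -1 on failure (a caller could fall back to plain
     * read()/write() in that case).
     */
    static int
    copy_blocks(int src_fd, int dst_fd, const off_t *src_off,
                const off_t *dst_off, int nblocks, size_t blcksz)
    {
        int     i = 0;

        while (i < nblocks)
        {
            int     j = i + 1;
            off_t   in,
                    out;
            size_t  remaining;

            /* Grow the run while blocks are adjacent on both sides. */
            while (j < nblocks &&
                   src_off[j] == src_off[j - 1] + (off_t) blcksz &&
                   dst_off[j] == dst_off[j - 1] + (off_t) blcksz)
                j++;

            in = src_off[i];
            out = dst_off[i];
            remaining = (size_t) (j - i) * blcksz;

            /* One kernel call for the whole run, not one per block. */
            while (remaining > 0)
            {
                ssize_t n = copy_file_range(src_fd, &in, dst_fd, &out,
                                            remaining, 0);

                if (n <= 0)
                    return -1;
                remaining -= (size_t) n;
            }
            i = j;
        }
        return 0;
    }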
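And on the alignment question upthread: right, it looks like it could
just be a matter of rounding the computed header size up. A minimal
sketch, assuming the write side also zero-fills the resulting gap and
readers only trust header_length; align_up here is a stand-in for the
existing TYPEALIGN() macro in c.h, and the bit trick is valid because
BLCKSZ is a power of two:

    #include <stdint.h>

    /*
     * Round len up to the next multiple of align (align must be a
     * power of two).
     */
    static inline uint64_t
    align_up(uint64_t len, uint64_t align)
    {
        return (len + align - 1) & ~(align - 1);
    }

    /* Then, in make_incremental_rfile() and on the sendFile() side: */
    rf->header_length = align_up(sizeof(magic) + sizeof(rf->num_blocks) +
                                 sizeof(rf->truncation_block_length) +
                                 sizeof(BlockNumber) * rf->num_blocks,
                                 BLCKSZ);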