On Wed, Feb 12, 2025 at 1:03 AM Andres Freund <and...@anarazel.de> wrote:
>
> Hi,
>
> On 2025-02-11 13:12:17 +1300, Thomas Munro wrote:
> > Tomas queried[1] the limit of 256kB (or really 32 blocks) for
> > io_combine_limit. Yeah, I think we should increase it and allow
> > experimentation with larger numbers. Note that real hardware and
> > protocols have segment and size limits that can force the kernel to
> > split your I/Os, so it's not at all a given that it'll help much or at
> > all to use very large sizes, but YMMV.

+0.02 to the initiative; I've always wondered why the I/Os were capped so low :)
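On the kernel-splitting point: it may be worth checking the per-device
block-layer limits before picking sizes. A rough sketch of what I'd look at
(the /dev/nvme0n1 path is just an example, adjust for your device; and
io_combine_limit is the per-session knob to play with once the cap is lifted):

    # block-layer limits that force the kernel to split big I/Os
    cat /sys/block/nvme0n1/queue/max_sectors_kb     # current max single-I/O size, kB
    cat /sys/block/nvme0n1/queue/max_hw_sectors_kb  # hardware ceiling, kB
    cat /sys/block/nvme0n1/queue/max_segments       # max scatter/gather segments

    # and on the PG side, experiment per session:
    psql -c "SET io_combine_limit = '256kB'; SHOW io_combine_limit;"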
> FWIW, I see substantial performance *regressions* with *big* IO sizes using
> fio. Just looking at cached buffered IO.
>
> for s in 4 8 16 32 64 128 256 512 1024 2048 4096 8192;do echo -ne "$s\t\t";
> numactl --physcpubind 3 fio --directory /srv/dev/fio/ --size=32GiB
> --overwrite 1 --time_based=0 --runtime=10 --name test --rw read --buffered 0
> --ioengine psync --buffered 1 --invalidate 0 --output-format json
> --bs=$((1024*${s})) |jq '.jobs[] | .read.bw_mean';done
>
> io size kB      throughput in MB/s
[..]
> 256             16864
> 512             19114
> 1024            12874
[..]

> It's worth noting that if I boot with mitigations=off clearcpuid=smap I get
> *vastly* better performance:
>
> io size kB      throughput in MB/s
[..]
> 128             23133
> 256             23317
> 512             25829
> 1024            15912
[..]

> Most of the gain isn't due to mitigations=off but clearcpuid=smap. Apparently
> SMAP, which requires explicit code to allow kernel space to access userspace
> memory, to make exploitation harder, reacts badly to copying lots of memory.
>
> This seems absolutely bonkers to me.

There are two bizarre things there: a +35% perf boost just like that, from
turning off a security mitigation, and io_size=512kB being special enough to
give a 10-13% boost in your case. Any ideas why?

Here's what I got on an Lsv2 individual MS NVMe under Hyper-V, on ext4, which
seems much closer to the real-world, average-Joe situation. It is much slower
overall, and it shows no advantage for block sizes beyond, let's say, 128kB:

io size kB      throughput in MB/s
4               1070
8               1117
16              1231
32              1264
64              1249
128             1313
256             1323
512             1257
1024            1216
2048            1271
4096            1304
8192            1214

The top profile hitters were, of course, things like clear_page_rep [k] and
rep_movs_alternative [k] (that was with mitigations=on).

-J.
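p.s. The "top hitters" above came from a quick perf look; roughly something
like this (a sketch, not the exact invocation, options may need adjusting):

    # live view of the hottest kernel/user symbols, with call graphs
    perf top -g

    # or capture ~10s system-wide and inspect offline
    perf record -a -g -- sleep 10
    perf report --stdio | head -50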