01.11.2019 14:12, Max Reitz wrote:
> On 01.11.19 11:28, Vladimir Sementsov-Ogievskiy wrote:
>> 01.11.2019 13:20, Max Reitz wrote:
>>> On 01.11.19 11:00, Max Reitz wrote:
>>>> Hi,
>>>>
>>>> This series builds on the previous RFC.  The workaround is now applied
>>>> regardless of AIO mode and filesystem, because we don’t know those
>>>> things for remote filesystems.  Furthermore, bdrv_co_get_self_request()
>>>> has been moved to block/io.c.
>>>>
>>>> Applying the workaround unconditionally is fine from a performance
>>>> standpoint, because it should actually be dead code, thanks to patch 1
>>>> (the elephant in the room).  As far as I know, qcow2’s
>>>> handle_alloc_space() is the only place where a block driver submits
>>>> zero writes as part of normal I/O, so that they can occur concurrently
>>>> to other write requests.  It still makes sense to keep the workaround
>>>> in file-posix, because we can’t really prevent other block drivers
>>>> from submitting zero writes as part of normal I/O in the future.
>>>>
>>>> Anyway, let’s get to the elephant.
>>>>
>>>> From input by XFS developers
>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1765547#c7) it seems clear
>>>> that c8bb23cbdbe causes fundamental performance problems on XFS with
>>>> aio=native that cannot be fixed.  In other cases, c8bb23cbdbe improves
>>>> performance, or we wouldn’t have it.
>>>>
>>>> In general, avoiding performance regressions is more important than
>>>> improving performance, unless the regressions are just a minor corner
>>>> case or insignificant when compared to the improvement.  The XFS
>>>> regression is no minor corner case, and it isn’t insignificant.
>>>> Laurent Vivier has found performance to decrease by as much as 88 %
>>>> (on ppc64le, fio in a guest with 4k blocks, iodepth=8: 1662 kB/s,
>>>> down from 13.9 MB/s).
>>>
>>> Ah, crap.
>>>
>>> I wanted to send this series as early today as possible to get as much
>>> feedback as possible, so I’ve only started doing benchmarks now.
>>>
>>> The obvious
>>>
>>> $ qemu-img bench -t none -n -w -S 65536 test.qcow2
>>>
>>> on XFS takes like 6 seconds on master, and like 50 to 80 seconds with
>>> c8bb23cbdbe reverted.  So now on to guest tests...
>>
>> Aha, that's very interesting)  What about aio=native, which should be
>> slowed down?  Could it be tested like this?
>
> That is aio=native (-n).
>
> But so far I don’t see any significant difference in guest tests (i.e.,
> fio --rw=write --bs=4k --iodepth=8 --runtime=1m --direct=1
> --ioengine=libaio --thread --numjobs=16 --size=2G --time_based), neither
> with 64 kB nor with 2 MB clusters.  (But only on XFS, I’ll have to see
> about ext4 still.)
Hmm, this possibly mostly tests writes to already-allocated clusters.
Does fio have an option to behave like qemu-img bench with -S 65536,
i.e. to write once into each cluster?

> (Reverting c8bb23cbdbe makes it like 1 to 2 % faster.)
>
> Max

-- 
Best regards,
Vladimir
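
[Editorial note: fio’s sequential offset modifier may be able to mimic the
qemu-img bench -S 65536 pattern of one write per 64 kB cluster.  According
to fio’s documentation, rw=write:<skip> advances the offset by an extra
<skip> bytes after each write, turning sequential I/O into sequential I/O
with holes.  The job file below is an untested sketch of that idea; the
job name, filename, and sizes are illustrative assumptions, not anything
used in this thread:

    [one-write-per-cluster]
    ; 4k write, then skip 60k, so exactly one write lands in
    ; each 64k cluster (mirrors qemu-img bench -S 65536)
    rw=write:60k
    bs=4k
    direct=1
    ioengine=libaio
    iodepth=8
    size=2G
    filename=/path/to/test/disk   ; assumption: the guest test disk

With 2 MB clusters the skip would be 2044k instead of 60k.]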
