On 01.11.19 11:28, Vladimir Sementsov-Ogievskiy wrote:
> 01.11.2019 13:20, Max Reitz wrote:
>> On 01.11.19 11:00, Max Reitz wrote:
>>> Hi,
>>>
>>> This series builds on the previous RFC.  The workaround is now applied
>>> unconditionally, regardless of AIO mode and filesystem, because we don’t
>>> know those things for remote filesystems.  Furthermore,
>>> bdrv_co_get_self_request() has been moved to block/io.c.
>>>
>>> Applying the workaround unconditionally is fine from a performance
>>> standpoint, because it should actually be dead code, thanks to patch 1
>>> (the elephant in the room).  As far as I know, qcow2 (in
>>> handle_alloc_space()) is the only block driver that submits zero writes
>>> as part of normal I/O, such that they can occur concurrently with other
>>> write requests.  It still makes sense to keep the workaround in
>>> file-posix, because we can’t really prevent other block drivers from
>>> submitting zero writes as part of normal I/O in the future.
>>>
>>> Anyway, let’s get to the elephant.
>>>
>>> From input by XFS developers
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1765547#c7) it seems clear
>>> that c8bb23cbdbe causes fundamental performance problems on XFS with
>>> aio=native that cannot be fixed.  In other cases, c8bb23cbdbe improves
>>> performance or we wouldn’t have it.
>>>
>>> In general, avoiding performance regressions is more important than
>>> improving performance, unless the regressions are just a minor corner
>>> case or insignificant when compared to the improvement.  The XFS
>>> regression is no minor corner case, and it isn’t insignificant.  Laurent
>>> Vivier found that performance decreases by as much as 88 % (on ppc64le,
>>> fio in a guest with 4k blocks, iodepth=8: down to 1662 kB/s from
>>> 13.9 MB/s).
>>
>> Ah, crap.
>>
>> I wanted to send this series as early today as possible to get as much
>> feedback as possible, so I’ve only started doing benchmarks now.
>>
>> The obvious
>>
>> $ qemu-img bench -t none -n -w -S 65536 test.qcow2
>>
>> on XFS takes about 6 seconds on master, and about 50 to 80 seconds with
>> c8bb23cbdbe reverted.  So now on to guest tests...
> 
> Aha, that's very interesting :)  What about aio=native, which is the case
> that should be slowed down?  Could it be tested like this?

That is aio=native (-n).
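
For reference, here is how I read the flags in that qemu-img bench
invocation (worth double-checking against the qemu-img documentation):

  $ qemu-img bench -t none -n -w -S 65536 test.qcow2
    # -t none   cache mode “none”, i.e. O_DIRECT
    # -n        native AIO (aio=native)
    # -w        run a write test instead of the default read test
    # -S 65536  advance the offset by 64 kB between successive requests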

But so far I don’t see any significant difference in guest tests (i.e.,
fio --rw=write --bs=4k --iodepth=8 --runtime=1m --direct=1
--ioengine=libaio --thread --numjobs=16 --size=2G --time_based; job-file
form below), with either 64 kB or 2 MB clusters.  (But only on XFS so
far; I still have to test ext4.)
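
In job-file form, that fio configuration would look roughly like this
(the job name is made up):

  [guest-write-test]
  rw=write
  bs=4k
  iodepth=8
  runtime=1m
  direct=1
  ioengine=libaio
  thread
  numjobs=16
  size=2G
  time_based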

(Reverting c8bb23cbdbe makes it about 1 to 2 % faster.)

Max
