On 01.11.19 13:34, Max Reitz wrote:
> On 01.11.19 12:20, Max Reitz wrote:
>> On 01.11.19 12:16, Vladimir Sementsov-Ogievskiy wrote:
>>> 01.11.2019 14:12, Max Reitz wrote:
>>>> On 01.11.19 11:28, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 01.11.2019 13:20, Max Reitz wrote:
>>>>>> On 01.11.19 11:00, Max Reitz wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> This series builds on the previous RFC.  The workaround is now applied
>>>>>>> regardless of AIO mode and filesystem, because we don’t know those
>>>>>>> things for remote filesystems.  Furthermore, bdrv_co_get_self_request()
>>>>>>> has been moved to block/io.c.
>>>>>>>
>>>>>>> Applying the workaround unconditionally is fine from a performance
>>>>>>> standpoint, because it should actually be dead code, thanks to patch 1
>>>>>>> (the elephant in the room).  As far as I know, qcow2’s
>>>>>>> handle_alloc_space() is the only place where a block driver submits zero
>>>>>>> writes as part of normal I/O, so that they can occur concurrently with
>>>>>>> other write requests.  It still makes sense to keep the workaround in
>>>>>>> file-posix, because we can’t really prevent other block drivers from
>>>>>>> submitting zero writes as part of normal I/O in the future.
>>>>>>>
>>>>>>> Anyway, let’s get to the elephant.
>>>>>>>
>>>>>>>   From input by XFS developers
>>>>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1765547#c7) it seems clear
>>>>>>> that c8bb23cbdbe causes fundamental performance problems on XFS with
>>>>>>> aio=native that cannot be fixed.  In other cases, c8bb23cbdbe improves
>>>>>>> performance or we wouldn’t have it.
>>>>>>>
>>>>>>> In general, avoiding performance regressions is more important than
>>>>>>> improving performance, unless the regressions are just a minor corner
>>>>>>> case or insignificant when compared to the improvement.  The XFS
>>>>>>> regression is no minor corner case, and it isn’t insignificant.  Laurent
>>>>>>> Vivier has found performance to decrease by as much as 88 % (on ppc64le,
>>>>>>> fio in a guest with 4k blocks, iodepth=8: down from 13.9 MB/s to 1662 kB/s).
>>>>>>
>>>>>> Ah, crap.
>>>>>>
>>>>>> I wanted to send this series as early today as possible to get as much
>>>>>> feedback as possible, so I’ve only started doing benchmarks now.
>>>>>>
>>>>>> The obvious
>>>>>>
>>>>>> $ qemu-img bench -t none -n -w -S 65536 test.qcow2
>>>>>>
>>>>>> on XFS takes like 6 seconds on master, and like 50 to 80 seconds with
>>>>>> c8bb23cbdbe reverted.  So now on to guest tests...
>>>>>
>>>>> Aha, that's very interesting!  What about aio=native, which is the case
>>>>> that should be slowed down?  Could it be tested like this?
>>>>
>>>> That is aio=native (-n).
>>>>
>>>> But so far I don’t see any significant difference in guest tests (i.e.,
>>>> fio --rw=write --bs=4k --iodepth=8 --runtime=1m --direct=1
>>>> --ioengine=libaio --thread --numjobs=16 --size=2G --time_based), neither
>>>> with 64 kB nor with 2 MB clusters.  (But only on XFS so far; I still have
>>>> to check ext4.)
>>>
>>> hmm, this possibly mostly tests writes to already allocated clusters.  Does
>>> fio have an option to behave like qemu-img bench with -S 65536, i.e. write
>>> once into each cluster?
>>
>> Maybe, but is that a realistic measure of whether this change is worth it?
>> That is why I’m doing the guest test: to see whether it actually has much
>> impact on the guest.
> 
> I’ve changed the above fio invocation to use --rw=randwrite and added
> --fallocate=none.  The performance went down, but it went down both with
> and without c8bb23cbdbe.
> 
> So on my XFS system (XFS on luks on SSD), I see:
> - with c8bb23cbdbe: 26.0 - 27.9 MB/s
> - without c8bb23cbdbe: 25.6 - 27 MB/s
> 
> On my ext4 system (native on SSD), I see:
> - with: 39.4 - 41.5 MB/s
> - without: 39.4 - 42.0 MB/s
> 
> So basically no difference for XFS, and really no difference for ext4.
> (I ran these tests with 2 MB clusters.)

For 64 kB clusters, I have on XFS:
- with: 30.3 - 31.3 MB/s
- without: 27.4 - 27.9 MB/s

On ext4:
- with: 34.9 - 36.4 MB/s
- without: 33.8 - 38 MB/s

For good measure, 2 MB clusters with aio=threads (but still O_DIRECT) on
XFS:
- with: 25.7 MB/s - 27.0 MB/s
- without: 24.6 MB/s - 26.7 MB/s

On ext4:
- with: 16.7 MB/s - 19.3 MB/s
- without: 16.3 MB/s - 18.4 MB/s

(Note that the difference between ext4 and XFS is probably just because
these are two different systems with different SSDs.)

So there is little difference, if any significant one at all (each of these
results is from just three test runs).  There does seem to be a bit of a
drift for aio=threads, but then again aio=threads is also just much slower
than aio=native on ext4.

(Also note that I ran this on an unfixed kernel.  Maybe the XFS fix will
slow things down, but I haven’t tested that yet.)

So unless there are realistic guest benchmarks for ext4 that say we
should keep the patch, I’m going to queue the revert for now (“now” =
4.1.1 and 4.2.0).

Max
