On 11/1/19 4:09 PM, Vladimir Sementsov-Ogievskiy wrote:
> 01.11.2019 15:34, Max Reitz wrote:
>> On 01.11.19 12:20, Max Reitz wrote:
>>> On 01.11.19 12:16, Vladimir Sementsov-Ogievskiy wrote:
>>>> 01.11.2019 14:12, Max Reitz wrote:
>>>>> On 01.11.19 11:28, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 01.11.2019 13:20, Max Reitz wrote:
>>>>>>> On 01.11.19 11:00, Max Reitz wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This series builds on the previous RFC. The workaround is now applied regardless of AIO mode and filesystem, because we don’t know those things for remote filesystems. Furthermore, bdrv_co_get_self_request() has been moved to block/io.c.
>>>>>>>>
>>>>>>>> Applying the workaround unconditionally is fine from a performance standpoint, because it should actually be dead code, thanks to patch 1 (the elephant in the room). As far as I know, there is no block driver but qcow2 (in handle_alloc_space()) that would submit zero writes as part of normal I/O, so that they could occur concurrently to other write requests. It still makes sense to keep the workaround in file-posix, because we can’t really prevent some other block driver from submitting zero writes as part of normal I/O in the future.
>>>>>>>>
>>>>>>>> Anyway, let’s get to the elephant.
>>>>>>>>
>>>>>>>> From input by XFS developers (https://bugzilla.redhat.com/show_bug.cgi?id=1765547#c7) it seems clear that c8bb23cbdbe causes fundamental performance problems on XFS with aio=native that cannot be fixed. In other cases, c8bb23cbdbe improves performance, or we wouldn’t have it.
>>>>>>>>
>>>>>>>> In general, avoiding performance regressions is more important than improving performance, unless the regressions are just a minor corner case or insignificant when compared to the improvement. The XFS regression is no minor corner case, and it isn’t insignificant. Laurent Vivier has found performance to decrease by as much as 88 % (on ppc64le, fio in a guest with 4k blocks, iodepth=8: 1662 kB/s, down from 13.9 MB/s).
>>>>>>>
>>>>>>> Ah, crap.
>>>>>>>
>>>>>>> I wanted to send this series as early today as possible to get as much feedback as possible, so I’ve only started doing benchmarks now.
>>>>>>>
>>>>>>> The obvious
>>>>>>>
>>>>>>> $ qemu-img bench -t none -n -w -S 65536 test.qcow2
>>>>>>>
>>>>>>> on XFS takes something like 6 seconds on master, and 50 to 80 seconds with c8bb23cbdbe reverted. So now on to guest tests...
>>>>>>
>>>>>> Aha, that's very interesting) What about aio=native, which should be slowed down? Could it be tested like this?
>>>>>
>>>>> That is aio=native (-n).
>>>>>
>>>>> But so far I don’t see any significant difference in guest tests (i.e., fio --rw=write --bs=4k --iodepth=8 --runtime=1m --direct=1 --ioengine=libaio --thread --numjobs=16 --size=2G --time_based), neither with 64 kB nor with 2 MB clusters. (But only on XFS; I’ll still have to look at ext4.)
>>>>
>>>> Hmm, this possibly mostly tests writes to already allocated clusters. Does fio have an option to behave like qemu-img bench with -S 65536, i.e. write once into each cluster?
>>>
>>> Maybe, but is that a realistic depiction of whether this change is worth it? That is why I’m doing the guest test: to see whether it actually has much impact on the guest.
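
As an aside on the "write once into each cluster" question: the following is only a sketch, not a command from the thread, and the filename is a placeholder. If I read the fio documentation correctly, a sequential pattern with holes (rw=write:<skip>) approximates qemu-img bench with -S 65536, i.e. one small write per 64k cluster:

  # each 4k write is followed by a 60k skip, so consecutive writes land
  # 64k apart and every write has to allocate a fresh 64k cluster
  $ fio --name=cluster-touch --filename=/path/to/guest-test-file \
        --rw=write:60k --bs=4k --iodepth=8 --direct=1 \
        --ioengine=libaio --size=2G

Whether such a pattern is a realistic guest workload is exactly the question raised above.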
>> I’ve changed the above fio invocation to use --rw=randwrite and added --fallocate=none. The performance went down, but it went down both with and without c8bb23cbdbe.
>>
>> So on my XFS system (XFS on LUKS on an SSD), I see:
>> - with c8bb23cbdbe: 26.0 - 27.9 MB/s
>> - without c8bb23cbdbe: 25.6 - 27 MB/s
>>
>> On my ext4 system (native on an SSD), I see:
>> - with: 39.4 - 41.5 MB/s
>> - without: 39.4 - 42.0 MB/s
>>
>> So basically no difference for XFS, and really no difference for ext4. (I ran these tests with 2 MB clusters.)
>
> Hmm. I don't know. To me it seems obvious that zeroing a 2M cluster is slow, and simple tests with qemu-img bench show that fallocate is faster than zeroing most of the cluster.
>
> So, if some guest test doesn't show the difference, that means "small write into a new cluster" is effectively a rare case in this test. It doesn't prove that it's always rare and insignificant.
>
> I'm not sure that we have a real-world example that proves the necessity of this optimization, or whether there was some original bug about low performance that was fixed by it. Den, Anton, do we have something about it?
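
For illustration, a rough sketch of the kind of qemu-img bench comparison Vladimir refers to; the image name, size and exact options are made up, not taken from the thread:

  # create an image with 2M clusters and touch each cluster with a
  # single 4k write; -S sets the step between writes, -s the buffer size
  $ qemu-img create -f qcow2 -o cluster_size=2M test.qcow2 16G
  $ qemu-img bench -t none -n -w -s 4096 -S 2097152 test.qcow2

As far as I understand the commit, with c8bb23cbdbe the untouched remainder of each freshly allocated cluster is submitted as an efficient zero write, while with it reverted qcow2 writes the (zeroed) COW areas out as ordinary data, which is what makes the allocation path slow.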
Sorry, I have missed the beginning of the thread.

Which driver is used for the virtual disk, i.e. is cached or non-cached I/O used in QEMU? We use non-cached by default, and that could make a significant difference. Max, can you please share the domain.xml of your guest config and the fio file used in the guest? I will recheck to be 120% sure.

Thank you in advance,
Den
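
For reference, and purely as an illustration rather than Max's actual setup: in libvirt the distinction Den asks about is the cache='none'/io='native' attributes on the disk's <driver> element, which end up on the QEMU command line roughly like this hypothetical -drive fragment:

  # cache=none implies O_DIRECT on the image file (the non-cached case),
  # cache=writeback would be the cached case; aio=native selects Linux AIO
  -drive file=/path/to/test.qcow2,format=qcow2,if=virtio,cache=none,aio=native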
