On 26.07.23 16:32, Thomas Huth wrote:
On 26/07/2023 15.00, Peter Maydell wrote:
On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quint...@redhat.com> wrote:
To make things easier, this is the part that shows how it breaks (this is
the gcov test):
357/423 qemu:block / io-qcow2-copy-before-write
ERROR 6.38s exit status 1
PYTHON=/builds/juan.quintela/qemu/build/pyvenv/bin/python3 MALLOC_PERTURB_=44
/builds/juan.quintela/qemu/build/pyvenv/bin/python3
/builds/juan.quintela/qemu/build/../tests/qemu-iotests/check -tap -qcow2
copy-before-write --source-dir /builds/juan.quintela/qemu/tests/qemu-iotests
--build-dir /builds/juan.quintela/qemu/build/tests/qemu-iotests
――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
stderr:
--- /builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write.out
+++
/builds/juan.quintela/qemu/build/scratch/qcow2-file-copy-before-write/copy-before-write.out.bad
@@ -1,5 +1,21 @@
-....
+...F
+======================================================================
+FAIL: test_timeout_break_snapshot (__main__.TestCbwError)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+ File
"/builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write", line
210, in test_timeout_break_snapshot
+ self.assertEqual(log, """\
+AssertionError: 'wrot[195 chars]read 1048576/1048576 bytes at offset 0\n1
MiB,[46 chars]c)\n' != 'wrot[195 chars]read failed: Permission denied\n'
+ wrote 524288/524288 bytes at offset 0
+ 512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+ wrote 524288/524288 bytes at offset 524288
+ 512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
++ read failed: Permission denied
+- read 1048576/1048576 bytes at offset 0
+- 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
This iotest failing is an intermittent that I've seen running
pullreqs on master. I tend to see it on the s390 host. I
suspect a race condition somewhere where it fails if the host
is heavily loaded.
It's obviously a failure in an iotest, so let's CC: the corresponding people
(done now).
Sorry for the long delay.
Does it still fail?
In the test we expect the copy-before-write operation to fail (because of
throttling and the timeout), so the snapshot becomes broken and the next read
from the snapshot should fail.
But most probably the copy-before-write operation succeeded in this case for
some reason. I don't think that throttling and timeouts in the block layer can
guarantee determinism, but usually it works.
We use throttling with bps-write = 300 * 1024, i.e. 300 KiB per second, and
cbw-timeout is set to 1 second.
Then we write 512K,
then the comment says:
# We need second write to trigger throttling
and we write another 512K.
The first 512K are written, and we should wait 512/300 = 1.7 seconds since the
_start_ of that write before issuing the second one. But if the write was slow,
we may have to wait less than a second from the finish of the first write to the
start of the second one. Then the timeout will not fire.
====
I see two possible ways to fix that:
1. decrease bps-write a bit, for example to 200 * 1024 (200 KiB per second).
2. rework the test to use null-co instead of real images. This way we will not
suffer from unstable IO duration.
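
As a rough estimate of how much option 1 would help, the sketch below computes
how long the first write may take before the timeout stops firing. It uses the
same simplified model as the reasoning above (not the real throttling code), it
assumes that option 1 means something like 200 * 1024 bytes per second, and the
function name is made up for illustration.

```python
# Simplified model, not QEMU's actual throttling algorithm.
# Assumption: option 1 means bps-write = 200 * 1024 bytes per second.

WRITE_SIZE_KIB = 512   # size of each write in the test
CBW_TIMEOUT_S = 1.0    # cbw-timeout


def max_tolerated_duration(bps_kib):
    """Longest first write (in seconds) for which the timeout still fires."""
    return WRITE_SIZE_KIB / bps_kib - CBW_TIMEOUT_S


margins = {bps: max_tolerated_duration(bps) for bps in (300, 200)}
for bps, margin in margins.items():
    print(f"bps-write = {bps} * 1024: "
          f"first write may take up to {margin:.2f} s")
```

Under these assumptions the margin grows from roughly 0.7 seconds to roughly
1.5 seconds, so the race gets less likely but does not go away; only option 2
removes the dependency on real IO duration.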
So, does the problem still occur sometimes?
--
Best regards,
Vladimir