On 18/10/2019 10.42, Max Reitz wrote: > On 18.10.19 08:20, Thomas Huth wrote: >> On 17/10/2019 18.41, Peter Maydell wrote: >>> On Fri, 27 Sep 2019 at 17:44, Max Reitz <mre...@redhat.com> wrote: >>>> >>>> On 27.09.19 18:39, Peter Maydell wrote: >>>>> Hi; I just saw this iotest failure (on an s390x box, as it happens): >>>>> >>>>> TEST iotest-qcow2: 130 [fail] >>>>> QEMU -- >>>>> "/home/linux1/qemu/build/all/tests/qemu-iotests/../../s390x-softmmu/qemu-system-s390x" >>>>> -nodefaults -display none -machine accel=qtest >>>>> QEMU_IMG -- >>>>> "/home/linux1/qemu/build/all/tests/qemu-iotests/../../qemu-img" >>>>> QEMU_IO -- >>>>> "/home/linux1/qemu/build/all/tests/qemu-iotests/../../qemu-io" >>>>> --cache writeback -f qcow2 >>>>> QEMU_NBD -- >>>>> "/home/linux1/qemu/build/all/tests/qemu-iotests/../../qemu-nbd" >>>>> IMGFMT -- qcow2 (compat=1.1) >>>>> IMGPROTO -- file >>>>> PLATFORM -- Linux/s390x lxub05 4.15.0-58-generic >>>>> TEST_DIR -- /home/linux1/qemu/build/all/tests/qemu-iotests/scratch >>>>> SOCKET_SCM_HELPER -- >>>>> /home/linux1/qemu/build/all/tests/qemu-iotests/socket_scm_helper >>>>> >>>>> --- /home/linux1/qemu/tests/qemu-iotests/130.out 2019-05-10 >>>>> 12:27:16.948075733 -0400 >>>>> +++ /home/linux1/qemu/build/all/tests/qemu-iotests/130.out.bad >>>>> 2019-09-27 12:01:23.649722655 -0400 >>>>> @@ -18,20 +18,22 @@ >>>>> QEMU X.Y.Z monitor - type 'help' for more information >>>>> (qemu) commit testdisk >>>>> (qemu) >>>>> -image: TEST_DIR/t.IMGFMT >>>>> -file format: IMGFMT >>>>> -virtual size: 64 MiB (67108864 bytes) >>>>> -backing file: TEST_DIR/t.IMGFMT.orig >>>>> -backing file format: raw >>>>> +qemu-img: Could not open 'TEST_DIR/t.IMGFMT': Failed to get shared >>>>> "write" lock >>>>> +Is another process using the image [TEST_DIR/t.IMGFMT]? >>>>> >>>>> === Marking image dirty (lazy refcounts) === >>>>> >>>>> +qemu-img: TEST_DIR/t.IMGFMT: Failed to get "write" lock >>>>> +Is another process using the image [TEST_DIR/t.IMGFMT]? >>>>> Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 >>>>> -wrote 4096/4096 bytes at offset 0 >>>>> -4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) >>>>> +qemu-io: can't open device >>>>> /home/linux1/qemu/build/all/tests/qemu-iotests/scratch/t.qcow2: Failed >>>>> to get "write" lock >>>>> +Is another process using the image >>>>> [/home/linux1/qemu/build/all/tests/qemu-iotests/scratch/t.qcow2]? >>>>> +no file open, try 'help open' >>>>> image: TEST_DIR/t.IMGFMT >>>>> file format: IMGFMT >>>>> virtual size: 64 MiB (67108864 bytes) >>>>> +backing file: TEST_DIR/t.IMGFMT.orig >>>>> +backing file format: raw >>>>> Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 >>>>> backing_file=TEST_DIR/t.IMGFMT.orig backing_fmt=raw >>>>> wrote 4096/4096 bytes at offset 0 >>>>> 4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) >>>>> >>>>> >>>>> >>>>> This looks suspiciously like the test isn't using a unique >>>>> filename for its disk image: "qemu-iotests/scratch/t.qcow2" >>>>> in the build directory, and so perhaps it has collided with >>>>> another iotest ? >>>>> >>>>> If we run 'make check' with a -j<something> option do the >>>>> iotests all get run serially anyway, or do they run in >>>>> parallel against each other ? >>>> >>>> As far as I know, all iotests are executed serially. Anything else >>>> would not work with the same scratch directory. >>>> >>>> The only thing I suspect is that some tool has been accidentally left >>>> running by some previous test that still accesses its own image. But I >>>> don’t know. >>> >>> Just saw this one again with the same iotest 130 on the same >>> s390 box; only difference is that the log this time around >>> has the first part where qemu-img fails, but not the second part >>> where qemu-io fails: >>> >>> --- /home/linux1/qemu/tests/qemu-iotests/130.out 2019-05-10 >>> 12:27:16.948075733 -0400 >>> +++ /home/linux1/qemu/build/all/tests/qemu-iotests/130.out.bad >>> 2019-10-17 11:56:43.450750873 -0400 >>> @@ -18,11 +18,8 @@ >>> QEMU X.Y.Z monitor - type 'help' for more information >>> (qemu) commit testdisk >>> (qemu) >>> -image: TEST_DIR/t.IMGFMT >>> -file format: IMGFMT >>> -virtual size: 64 MiB (67108864 bytes) >>> -backing file: TEST_DIR/t.IMGFMT.orig >>> -backing file format: raw >>> +qemu-img: Could not open 'TEST_DIR/t.IMGFMT': Failed to get shared "write" >>> lock >>> +Is another process using the image [TEST_DIR/t.IMGFMT]? >>> >>> === Marking image dirty (lazy refcounts) === >>> >>> On the host machine there don't seem to be any stray >>> processes which might have held the file open, and >>> indeed the file doesn't exist at all, so it got removed >>> by some cleanup or other. >> >> Ok, so unless someone has a clue what might be going on here (is there a >> race in the test?), I'd suggest that we simply remove 130 from the auto >> group again. Shall I send a patch? > > I don’t have much of an idea. It looks like maybe the qemu process > (which dos the commit) is lingering, but that shouldn’t be because > _cleanup_qemu always waits for it. (Also, I can’t reproduce the problem > on my system.) > > The only hunch that I have is that 130 seems to be the only test that > uses _cleanup_qemu to kill a qemu process (i.e. without wait=1) while it > has taken locks on an image and then still uses the image afterwards. > Maybe making it quit qemu through HMP would fix the problem. But > knowing is difficult because I can’t reproduce it.
I also can't reproduce the problem. I've now run 130 several hundred times in a loop on a s390x box and never saw the problem. I think we should disable it for the time being. Thomas
signature.asc
Description: OpenPGP digital signature