Re: [PATCH v3 0/8] blockdev: avoid acquiring AioContext lock twice at do_drive_backup and do_blockdev_backup
Kevin Wolf writes:

> Am 19.11.2019 um 11:54 hat Sergio Lopez geschrieben:
>> I don't think it makes sense to have two different behaviors for the
>> same action. So we either accept the additional pause+resume iteration
>> for qmp_drive_backup, or we remove the drained section from the
>> transaction based one.
>>
>> What do you think?
>
> Draining all involved nodes is necessary for transactions, because you
> want a consistent backup across all involved disks. That is, you want
> it to be a snapshot at the same point in time for all of them - no
> requests may happen between starting backup on the first and the
> second disk.
>
> For a single device operation, this requirement doesn't exist, because
> there is nothing else that must happen at the same point in time.

This poses a problem with the unification strategy you suggested for
qmp commands and transactions. I guess that, if we really want to
preserve the original behavior, we can extend DriveBackup to add a flag
to indicate whether the transaction should create a drained section or
not.

Does this sound reasonable to you?

Thanks,
Sergio.
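To make the proposed flag concrete, here is a minimal sketch of a flag-controlled drained section around job creation. It is an illustration only, not code from this series: the 'drain' parameter and do_backup_job_create() are hypothetical names, while bdrv_drained_begin()/bdrv_drained_end() are QEMU's existing drain primitives.

```c
#include "qemu/osdep.h"
#include "block/block.h"
#include "qapi/error.h"

/*
 * Hypothetical sketch: a transaction action that only wraps job creation
 * in a drained section when asked to.  'drain' would come from a new,
 * optional DriveBackup member; do_backup_job_create() stands in for the
 * common job-creation path and is not a real QEMU function.
 */
static void drive_backup_action_sketch(BlockDriverState *bs, bool drain,
                                       Error **errp)
{
    if (drain) {
        bdrv_drained_begin(bs);      /* quiesce in-flight requests first */
    }

    do_backup_job_create(bs, errp);  /* hypothetical common helper */

    if (drain) {
        bdrv_drained_end(bs);        /* lets the job make progress again */
    }
}
```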
Re: [PATCH v3 0/8] blockdev: avoid acquiring AioContext lock twice at do_drive_backup and do_blockdev_backup
Am 19.11.2019 um 11:54 hat Sergio Lopez geschrieben:
> Max Reitz writes:
>
>> Well, one drawback is clearly visible. The job goes to paused for no
>> reason.
>
> This is something that already happens when requesting the drive-backup
> through a transaction:
>
> {"execute":"transaction","arguments":{"actions":[{"type":"drive-backup","data":{"device":"drv0","target":"o.qcow2","sync":"full","format":"qcow2"}}]}}
>
> I don't think it makes sense to have two different behaviors for the
> same action. So we either accept the additional pause+resume iteration
> for qmp_drive_backup, or we remove the drained section from the
> transaction based one.
>
> What do you think?

Draining all involved nodes is necessary for transactions, because you
want a consistent backup across all involved disks. That is, you want it
to be a snapshot at the same point in time for all of them - no requests
may happen between starting backup on the first and the second disk.

For a single device operation, this requirement doesn't exist, because
there is nothing else that must happen at the same point in time.

Kevin
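As an illustration of the consistency argument (not code from this series): all nodes are drained before the first job is created, so no guest request can land between starting the backup on one disk and the next. start_backup_job_for() is a placeholder name; the drain calls are QEMU's existing primitives.

```c
#include "qemu/osdep.h"
#include "block/block.h"
#include "qapi/error.h"

/*
 * Illustration only: a transaction wants a single point in time across
 * all disks, so every node is quiesced before the first job exists and
 * released only after the last one has been created.
 * start_backup_job_for() is a stand-in, not a real QEMU function.
 */
static void transaction_backup_sketch(BlockDriverState **nodes, int n,
                                      Error **errp)
{
    int i;

    for (i = 0; i < n; i++) {
        bdrv_drained_begin(nodes[i]);          /* no new requests from here */
    }

    for (i = 0; i < n; i++) {
        start_backup_job_for(nodes[i], errp);  /* hypothetical helper */
    }

    for (i = 0; i < n; i++) {
        bdrv_drained_end(nodes[i]);            /* resume guest I/O */
    }
}
```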
Re: [PATCH v3 0/8] blockdev: avoid acquiring AioContext lock twice at do_drive_backup and do_blockdev_backup
Max Reitz writes:

> On 13.11.19 14:24, Sergio Lopez wrote:
>> Do you think there may be any potential drawbacks as a result of always
>> doing it now inside a drained section?
>
> Well, one drawback is clearly visible. The job goes to paused for no
> reason.

This is something that already happens when requesting the drive-backup
through a transaction:

{"execute":"transaction","arguments":{"actions":[{"type":"drive-backup","data":{"device":"drv0","target":"o.qcow2","sync":"full","format":"qcow2"}}]}}

I don't think it makes sense to have two different behaviors for the
same action. So we either accept the additional pause+resume iteration
for qmp_drive_backup, or we remove the drained section from the
transaction based one.

What do you think?

Cheers,
Sergio.
Re: [PATCH v3 0/8] blockdev: avoid acquiring AioContext lock twice at do_drive_backup and do_blockdev_backup
On 13.11.19 14:24, Sergio Lopez wrote:
> Those extra lines, the "paused" and "running", are a result of the job
> being done in a transaction, within a drained section.
>
> We can update 141.out, but now I'm wondering, was it safe creating the
> job at do_drive_backup() outside of a drained section, as
> qmp_drive_backup was doing?

I think it is. Someone needs to drain the source node before attaching
the job filter (which intercepts writes), and bdrv_backup_top_append()
does precisely this.

If the source node is in an I/O thread, you could argue that the drain
starts later than when the user has invoked the backup command, and so
some writes might slip through. That’s correct. But at the same time,
it’s impossible to drain it the instant the command is received. So
some writes might always slip through (and the drain will not stop them
either, it will just let them happen).

Therefore, I think it’s fine the way it is.

> Do you think there may be any potential drawbacks as a result of always
> doing it now inside a drained section?

Well, one drawback is clearly visible. The job goes to paused for no
reason.

Max
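For readers who haven't looked at the backup filter: a rough sketch of the ordering described above. insert_backup_filter() is a placeholder; the real work is done by bdrv_backup_top_append(), whose actual signature is not reproduced here.

```c
#include "qemu/osdep.h"
#include "block/block.h"
#include "qapi/error.h"

/*
 * Rough sketch of the ordering described above: the source node must be
 * quiesced before the filter that intercepts writes is inserted, so no
 * write can race with the filter insertion.  insert_backup_filter() is a
 * placeholder, not the actual QEMU API.
 */
static void attach_backup_filter_sketch(BlockDriverState *source,
                                        BlockDriverState *target,
                                        Error **errp)
{
    bdrv_drained_begin(source);                  /* wait for in-flight I/O */
    insert_backup_filter(source, target, errp);  /* placeholder helper */
    bdrv_drained_end(source);                    /* new writes now go through the filter */
}
```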
Re: [PATCH v3 0/8] blockdev: avoid acquiring AioContext lock twice at do_drive_backup and do_blockdev_backup
Sergio Lopez writes:

> no-re...@patchew.org writes:
>
>> TEST    iotest-qcow2: 268
>> Failures: 141
>
> Hm... 141 didn't fail in my test machine. I'm going to have a look.

So here's the output:

--- /root/qemu/tests/qemu-iotests/141.out       2019-11-12 04:43:27.651557587 -0500
+++ /root/qemu/build/tests/qemu-iotests/141.out.bad     2019-11-13 08:12:06.575967337 -0500
@@ -10,6 +10,8 @@
 Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
 {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
 {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "job0"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
 {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: node is used as backing hd of 'NODE_NAME'"}}
 {"return": {}}
 {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}

Those extra lines, the "paused" and "running", are a result of the job
being done in a transaction, within a drained section.

We can update 141.out, but now I'm wondering, was it safe creating the
job at do_drive_backup() outside of a drained section, as
qmp_drive_backup was doing?

Do you think there may be any potential drawbacks as a result of always
doing it now inside a drained section?

Thanks,
Sergio.
Re: [PATCH v3 0/8] blockdev: avoid acquiring AioContext lock twice at do_drive_backup and do_blockdev_backup
no-re...@patchew.org writes:

> Patchew URL: https://patchew.org/QEMU/20191112113012.71136-1-...@redhat.com/
>
> Hi,
>
> This series failed the docker-quick@centos7 build test. Please find the
> testing commands and their output below. If you have Docker installed,
> you can probably reproduce it locally.
>
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> make docker-image-centos7 V=1 NETWORK=1
> time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
> === TEST SCRIPT END ===
>
> TEST    iotest-qcow2: 268
> Failures: 141

Hm... 141 didn't fail in my test machine. I'm going to have a look.

Sergio.
Re: [PATCH v3 0/8] blockdev: avoid acquiring AioContext lock twice at do_drive_backup and do_blockdev_backup
Patchew URL: https://patchew.org/QEMU/20191112113012.71136-1-...@redhat.com/

Hi,

This series failed the docker-quick@centos7 build test. Please find the
testing commands and their output below. If you have Docker installed,
you can probably reproduce it locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

TEST    iotest-qcow2: 268
Failures: 141
Failed 1 of 108 iotests
make: *** [check-tests/check-block.sh] Error 1
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=5e0a4e7f97154a93b182d709969b9417', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-6a9_8q0n/src/docker-src.2019-11-12-17.38.46.26027:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=5e0a4e7f97154a93b182d709969b9417
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-6a9_8q0n/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    10m57.839s
user    0m8.062s

The full log is available at
http://patchew.org/logs/20191112113012.71136-1-...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com