Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
PING PING! On 28.08.2018 13:23, Denis Plotnikov wrote: On 27.08.2018 19:05, John Snow wrote: On 08/27/2018 03:05 AM, Denis Plotnikov wrote: PING! PING! Sorry, Kevin and Stefan are both on PTO right now, I think. I can't promise I have the time to look soon, but you at least deserve an answer for the radio silence the last week. --js Thanks for the response! I'll be waiting for some comments! Denis On 14.08.2018 10:08, Denis Plotnikov wrote: On 13.08.2018 19:30, Kevin Wolf wrote: Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben: Ping ping! On 16.07.2018 21:59, John Snow wrote: On 07/16/2018 11:01 AM, Denis Plotnikov wrote: Ping! I never saw a reply to Stefan's question on July 2nd, did you reply off-list? --js Yes, I did. I talked to Stefan why the patch set appeared. The rest of us still don't know the answer. I had the same question. Kevin Yes, that's my fault. I should have post it earlier. I reviewed the problem once again and come up with the following explanation. Indeed, if the global lock has been taken by the main thread the vCPU threads won't be able to execute mmio ide. But, if the main thread will release the lock then nothing will prevent vCPU threads form execution what they want, e.g writing to the block device. In case of running the mirroring it is possible. Let's take a look at the following snippet of mirror_run. This is a part the mirroring completion part. bdrv_drained_begin(bs); cnt = bdrv_get_dirty_count(s->dirty_bitmap); >> if (cnt > 0 || mirror_flush(s) < 0) { bdrv_drained_end(bs); continue; } (X) assert(QLIST_EMPTY(>tracked_requests)); mirror_flush here can yield the current coroutine so nothing more can be executed. We could end up with the situation when the main loop have to revolve to poll for another timer/bh to process. While revolving it releases the global lock. If the global lock is waited for by a vCPU (any other) thread, the waiting thread will get the lock and make what it intends. This is something that I can observe: mirror_flush yields coroutine, the main thread revolves and locks because a vCPU was waiting for the lock. Now the vCPU thread owns the lock and the main thread waits for the lock releasing. The vCPU thread does cmd_write_dma and releases the lock. Then, the main thread gets the lock and continues to run eventually proceeding with the coroutine yeiled. If the vCPU requests aren't completed by the moment we will assert at (X). If the vCPU requests are completed we won't even notice that we had some writes while in the drained section. Denis On 29.06.2018 15:40, Denis Plotnikov wrote: There are cases when a request to a block driver state shouldn't have appeared producing dangerous race conditions. This misbehaviour is usually happens with storage devices emulated without eventfd for guest to host notifications like IDE. The issue arises when the context is in the "drained" section and doesn't expect the request to come, but request comes from the device not using iothread and which context is processed by the main loop. The main loop apart of the iothread event loop isn't blocked by the "drained" section. The request coming and processing while in "drained" section can spoil the block driver state consistency. This behavior can be observed in the following KVM-based case: 1. Setup a VM with an IDE disk. 2. Inside a VM start a disk writing load for the IDE device e.g: dd if= of= bs=X count=Y oflag=direct 3. On the host create a mirroring block job for the IDE device e.g: drive_mirror 4. On the host finish the block job e.g: block_job_complete Having done the 4th action, you could get an assert: assert(QLIST_EMPTY(>tracked_requests)) from mirror_run. On my setup, the assert is 1/3 reproducible. The patch series introduces the mechanism to postpone the requests until the BDS leaves "drained" section for the devices not using iothreads. Also, it modifies the asynchronous block backend infrastructure to use that mechanism to release the assert bug for IDE devices. Denis Plotnikov (2): async: add infrastructure for postponed actions block: postpone the coroutine executing if the BDS's is drained block/block-backend.c | 58 ++- include/block/aio.h | 63 +++ util/async.c | 33 +++ 3 files changed, 142 insertions(+), 12 deletions(-) -- Best, Denis -- Best, Denis
Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
On 27.08.2018 19:05, John Snow wrote: On 08/27/2018 03:05 AM, Denis Plotnikov wrote: PING! PING! Sorry, Kevin and Stefan are both on PTO right now, I think. I can't promise I have the time to look soon, but you at least deserve an answer for the radio silence the last week. --js Thanks for the response! I'll be waiting for some comments! Denis On 14.08.2018 10:08, Denis Plotnikov wrote: On 13.08.2018 19:30, Kevin Wolf wrote: Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben: Ping ping! On 16.07.2018 21:59, John Snow wrote: On 07/16/2018 11:01 AM, Denis Plotnikov wrote: Ping! I never saw a reply to Stefan's question on July 2nd, did you reply off-list? --js Yes, I did. I talked to Stefan why the patch set appeared. The rest of us still don't know the answer. I had the same question. Kevin Yes, that's my fault. I should have post it earlier. I reviewed the problem once again and come up with the following explanation. Indeed, if the global lock has been taken by the main thread the vCPU threads won't be able to execute mmio ide. But, if the main thread will release the lock then nothing will prevent vCPU threads form execution what they want, e.g writing to the block device. In case of running the mirroring it is possible. Let's take a look at the following snippet of mirror_run. This is a part the mirroring completion part. bdrv_drained_begin(bs); cnt = bdrv_get_dirty_count(s->dirty_bitmap); >> if (cnt > 0 || mirror_flush(s) < 0) { bdrv_drained_end(bs); continue; } (X) assert(QLIST_EMPTY(>tracked_requests)); mirror_flush here can yield the current coroutine so nothing more can be executed. We could end up with the situation when the main loop have to revolve to poll for another timer/bh to process. While revolving it releases the global lock. If the global lock is waited for by a vCPU (any other) thread, the waiting thread will get the lock and make what it intends. This is something that I can observe: mirror_flush yields coroutine, the main thread revolves and locks because a vCPU was waiting for the lock. Now the vCPU thread owns the lock and the main thread waits for the lock releasing. The vCPU thread does cmd_write_dma and releases the lock. Then, the main thread gets the lock and continues to run eventually proceeding with the coroutine yeiled. If the vCPU requests aren't completed by the moment we will assert at (X). If the vCPU requests are completed we won't even notice that we had some writes while in the drained section. Denis On 29.06.2018 15:40, Denis Plotnikov wrote: There are cases when a request to a block driver state shouldn't have appeared producing dangerous race conditions. This misbehaviour is usually happens with storage devices emulated without eventfd for guest to host notifications like IDE. The issue arises when the context is in the "drained" section and doesn't expect the request to come, but request comes from the device not using iothread and which context is processed by the main loop. The main loop apart of the iothread event loop isn't blocked by the "drained" section. The request coming and processing while in "drained" section can spoil the block driver state consistency. This behavior can be observed in the following KVM-based case: 1. Setup a VM with an IDE disk. 2. Inside a VM start a disk writing load for the IDE device e.g: dd if= of= bs=X count=Y oflag=direct 3. On the host create a mirroring block job for the IDE device e.g: drive_mirror 4. On the host finish the block job e.g: block_job_complete Having done the 4th action, you could get an assert: assert(QLIST_EMPTY(>tracked_requests)) from mirror_run. On my setup, the assert is 1/3 reproducible. The patch series introduces the mechanism to postpone the requests until the BDS leaves "drained" section for the devices not using iothreads. Also, it modifies the asynchronous block backend infrastructure to use that mechanism to release the assert bug for IDE devices. Denis Plotnikov (2): async: add infrastructure for postponed actions block: postpone the coroutine executing if the BDS's is drained block/block-backend.c | 58 ++- include/block/aio.h | 63 +++ util/async.c | 33 +++ 3 files changed, 142 insertions(+), 12 deletions(-) -- Best, Denis -- Best, Denis
Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
On 08/27/2018 03:05 AM, Denis Plotnikov wrote: > PING! PING! > Sorry, Kevin and Stefan are both on PTO right now, I think. I can't promise I have the time to look soon, but you at least deserve an answer for the radio silence the last week. --js > On 14.08.2018 10:08, Denis Plotnikov wrote: >> >> >> On 13.08.2018 19:30, Kevin Wolf wrote: >>> Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben: Ping ping! On 16.07.2018 21:59, John Snow wrote: > > > On 07/16/2018 11:01 AM, Denis Plotnikov wrote: >> Ping! >> > > I never saw a reply to Stefan's question on July 2nd, did you reply > off-list? > > --js Yes, I did. I talked to Stefan why the patch set appeared. >>> >>> The rest of us still don't know the answer. I had the same question. >>> >>> Kevin >> Yes, that's my fault. I should have post it earlier. >> >> I reviewed the problem once again and come up with the following >> explanation. >> Indeed, if the global lock has been taken by the main thread the vCPU >> threads won't be able to execute mmio ide. >> But, if the main thread will release the lock then nothing will prevent >> vCPU threads form execution what they want, e.g writing to the block >> device. >> >> In case of running the mirroring it is possible. Let's take a look >> at the following snippet of mirror_run. This is a part the mirroring >> completion part. >> >> bdrv_drained_begin(bs); >> cnt = bdrv_get_dirty_count(s->dirty_bitmap); >> >> if (cnt > 0 || mirror_flush(s) < 0) { >> bdrv_drained_end(bs); >> continue; >> } >> >> (X) assert(QLIST_EMPTY(>tracked_requests)); >> >> mirror_flush here can yield the current coroutine so nothing more can >> be executed. >> We could end up with the situation when the main loop have to revolve >> to poll for another timer/bh to process. While revolving it releases >> the global lock. If the global lock is waited for by a vCPU (any >> other) thread, the waiting thread will get the lock and make what it >> intends. >> >> This is something that I can observe: >> >> mirror_flush yields coroutine, the main thread revolves and locks >> because a vCPU was waiting for the lock. Now the vCPU thread owns the >> lock and the main thread waits for the lock releasing. >> The vCPU thread does cmd_write_dma and releases the lock. Then, the main >> thread gets the lock and continues to run eventually proceeding with >> the coroutine yeiled. >> If the vCPU requests aren't completed by the moment we will assert at >> (X). If the vCPU requests are completed we won't even notice that we >> had some writes while in the drained section. >> >> Denis >>> >> On 29.06.2018 15:40, Denis Plotnikov wrote: >>> There are cases when a request to a block driver state shouldn't >>> have >>> appeared producing dangerous race conditions. >>> This misbehaviour is usually happens with storage devices emulated >>> without eventfd for guest to host notifications like IDE. >>> >>> The issue arises when the context is in the "drained" section >>> and doesn't expect the request to come, but request comes from the >>> device not using iothread and which context is processed by the main >>> loop. >>> >>> The main loop apart of the iothread event loop isn't blocked by the >>> "drained" section. >>> The request coming and processing while in "drained" section can >>> spoil >>> the >>> block driver state consistency. >>> >>> This behavior can be observed in the following KVM-based case: >>> >>> 1. Setup a VM with an IDE disk. >>> 2. Inside a VM start a disk writing load for the IDE device >>> e.g: dd if= of= bs=X count=Y oflag=direct >>> 3. On the host create a mirroring block job for the IDE device >>> e.g: drive_mirror >>> 4. On the host finish the block job >>> e.g: block_job_complete >>> Having done the 4th action, you could get an assert: >>> assert(QLIST_EMPTY(>tracked_requests)) from mirror_run. >>> On my setup, the assert is 1/3 reproducible. >>> >>> The patch series introduces the mechanism to postpone the requests >>> until the BDS leaves "drained" section for the devices not using >>> iothreads. >>> Also, it modifies the asynchronous block backend infrastructure >>> to use >>> that mechanism to release the assert bug for IDE devices. >>> >>> Denis Plotnikov (2): >>> async: add infrastructure for postponed actions >>> block: postpone the coroutine executing if the BDS's is drained >>> >>> block/block-backend.c | 58 >>> ++- >>> include/block/aio.h | 63 >>> +++ >>> util/async.c | 33 +++ >>> 3 files changed, 142 insertions(+), 12 deletions(-)
Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
PING! PING! On 14.08.2018 10:08, Denis Plotnikov wrote: On 13.08.2018 19:30, Kevin Wolf wrote: Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben: Ping ping! On 16.07.2018 21:59, John Snow wrote: On 07/16/2018 11:01 AM, Denis Plotnikov wrote: Ping! I never saw a reply to Stefan's question on July 2nd, did you reply off-list? --js Yes, I did. I talked to Stefan why the patch set appeared. The rest of us still don't know the answer. I had the same question. Kevin Yes, that's my fault. I should have post it earlier. I reviewed the problem once again and come up with the following explanation. Indeed, if the global lock has been taken by the main thread the vCPU threads won't be able to execute mmio ide. But, if the main thread will release the lock then nothing will prevent vCPU threads form execution what they want, e.g writing to the block device. In case of running the mirroring it is possible. Let's take a look at the following snippet of mirror_run. This is a part the mirroring completion part. bdrv_drained_begin(bs); cnt = bdrv_get_dirty_count(s->dirty_bitmap); >> if (cnt > 0 || mirror_flush(s) < 0) { bdrv_drained_end(bs); continue; } (X) assert(QLIST_EMPTY(>tracked_requests)); mirror_flush here can yield the current coroutine so nothing more can be executed. We could end up with the situation when the main loop have to revolve to poll for another timer/bh to process. While revolving it releases the global lock. If the global lock is waited for by a vCPU (any other) thread, the waiting thread will get the lock and make what it intends. This is something that I can observe: mirror_flush yields coroutine, the main thread revolves and locks because a vCPU was waiting for the lock. Now the vCPU thread owns the lock and the main thread waits for the lock releasing. The vCPU thread does cmd_write_dma and releases the lock. Then, the main thread gets the lock and continues to run eventually proceeding with the coroutine yeiled. If the vCPU requests aren't completed by the moment we will assert at (X). If the vCPU requests are completed we won't even notice that we had some writes while in the drained section. Denis On 29.06.2018 15:40, Denis Plotnikov wrote: There are cases when a request to a block driver state shouldn't have appeared producing dangerous race conditions. This misbehaviour is usually happens with storage devices emulated without eventfd for guest to host notifications like IDE. The issue arises when the context is in the "drained" section and doesn't expect the request to come, but request comes from the device not using iothread and which context is processed by the main loop. The main loop apart of the iothread event loop isn't blocked by the "drained" section. The request coming and processing while in "drained" section can spoil the block driver state consistency. This behavior can be observed in the following KVM-based case: 1. Setup a VM with an IDE disk. 2. Inside a VM start a disk writing load for the IDE device e.g: dd if= of= bs=X count=Y oflag=direct 3. On the host create a mirroring block job for the IDE device e.g: drive_mirror 4. On the host finish the block job e.g: block_job_complete Having done the 4th action, you could get an assert: assert(QLIST_EMPTY(>tracked_requests)) from mirror_run. On my setup, the assert is 1/3 reproducible. The patch series introduces the mechanism to postpone the requests until the BDS leaves "drained" section for the devices not using iothreads. Also, it modifies the asynchronous block backend infrastructure to use that mechanism to release the assert bug for IDE devices. Denis Plotnikov (2): async: add infrastructure for postponed actions block: postpone the coroutine executing if the BDS's is drained block/block-backend.c | 58 ++- include/block/aio.h | 63 +++ util/async.c | 33 +++ 3 files changed, 142 insertions(+), 12 deletions(-) -- Best, Denis -- Best, Denis
Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
ping ping! On 14.08.2018 10:08, Denis Plotnikov wrote: On 13.08.2018 19:30, Kevin Wolf wrote: Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben: Ping ping! On 16.07.2018 21:59, John Snow wrote: On 07/16/2018 11:01 AM, Denis Plotnikov wrote: Ping! I never saw a reply to Stefan's question on July 2nd, did you reply off-list? --js Yes, I did. I talked to Stefan why the patch set appeared. The rest of us still don't know the answer. I had the same question. Kevin Yes, that's my fault. I should have post it earlier. I reviewed the problem once again and come up with the following explanation. Indeed, if the global lock has been taken by the main thread the vCPU threads won't be able to execute mmio ide. But, if the main thread will release the lock then nothing will prevent vCPU threads form execution what they want, e.g writing to the block device. In case of running the mirroring it is possible. Let's take a look at the following snippet of mirror_run. This is a part the mirroring completion part. bdrv_drained_begin(bs); cnt = bdrv_get_dirty_count(s->dirty_bitmap); >> if (cnt > 0 || mirror_flush(s) < 0) { bdrv_drained_end(bs); continue; } (X) assert(QLIST_EMPTY(>tracked_requests)); mirror_flush here can yield the current coroutine so nothing more can be executed. We could end up with the situation when the main loop have to revolve to poll for another timer/bh to process. While revolving it releases the global lock. If the global lock is waited for by a vCPU (any other) thread, the waiting thread will get the lock and make what it intends. This is something that I can observe: mirror_flush yields coroutine, the main thread revolves and locks because a vCPU was waiting for the lock. Now the vCPU thread owns the lock and the main thread waits for the lock releasing. The vCPU thread does cmd_write_dma and releases the lock. Then, the main thread gets the lock and continues to run eventually proceeding with the coroutine yeiled. If the vCPU requests aren't completed by the moment we will assert at (X). If the vCPU requests are completed we won't even notice that we had some writes while in the drained section. Denis On 29.06.2018 15:40, Denis Plotnikov wrote: There are cases when a request to a block driver state shouldn't have appeared producing dangerous race conditions. This misbehaviour is usually happens with storage devices emulated without eventfd for guest to host notifications like IDE. The issue arises when the context is in the "drained" section and doesn't expect the request to come, but request comes from the device not using iothread and which context is processed by the main loop. The main loop apart of the iothread event loop isn't blocked by the "drained" section. The request coming and processing while in "drained" section can spoil the block driver state consistency. This behavior can be observed in the following KVM-based case: 1. Setup a VM with an IDE disk. 2. Inside a VM start a disk writing load for the IDE device e.g: dd if= of= bs=X count=Y oflag=direct 3. On the host create a mirroring block job for the IDE device e.g: drive_mirror 4. On the host finish the block job e.g: block_job_complete Having done the 4th action, you could get an assert: assert(QLIST_EMPTY(>tracked_requests)) from mirror_run. On my setup, the assert is 1/3 reproducible. The patch series introduces the mechanism to postpone the requests until the BDS leaves "drained" section for the devices not using iothreads. Also, it modifies the asynchronous block backend infrastructure to use that mechanism to release the assert bug for IDE devices. Denis Plotnikov (2): async: add infrastructure for postponed actions block: postpone the coroutine executing if the BDS's is drained block/block-backend.c | 58 ++- include/block/aio.h | 63 +++ util/async.c | 33 +++ 3 files changed, 142 insertions(+), 12 deletions(-) -- Best, Denis -- Best, Denis
Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
ping ping! On 14.08.2018 10:08, Denis Plotnikov wrote: On 13.08.2018 19:30, Kevin Wolf wrote: Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben: Ping ping! On 16.07.2018 21:59, John Snow wrote: On 07/16/2018 11:01 AM, Denis Plotnikov wrote: Ping! I never saw a reply to Stefan's question on July 2nd, did you reply off-list? --js Yes, I did. I talked to Stefan why the patch set appeared. The rest of us still don't know the answer. I had the same question. Kevin Yes, that's my fault. I should have post it earlier. I reviewed the problem once again and come up with the following explanation. Indeed, if the global lock has been taken by the main thread the vCPU threads won't be able to execute mmio ide. But, if the main thread will release the lock then nothing will prevent vCPU threads form execution what they want, e.g writing to the block device. In case of running the mirroring it is possible. Let's take a look at the following snippet of mirror_run. This is a part the mirroring completion part. bdrv_drained_begin(bs); cnt = bdrv_get_dirty_count(s->dirty_bitmap); >> if (cnt > 0 || mirror_flush(s) < 0) { bdrv_drained_end(bs); continue; } (X) assert(QLIST_EMPTY(>tracked_requests)); mirror_flush here can yield the current coroutine so nothing more can be executed. We could end up with the situation when the main loop have to revolve to poll for another timer/bh to process. While revolving it releases the global lock. If the global lock is waited for by a vCPU (any other) thread, the waiting thread will get the lock and make what it intends. This is something that I can observe: mirror_flush yields coroutine, the main thread revolves and locks because a vCPU was waiting for the lock. Now the vCPU thread owns the lock and the main thread waits for the lock releasing. The vCPU thread does cmd_write_dma and releases the lock. Then, the main thread gets the lock and continues to run eventually proceeding with the coroutine yeiled. If the vCPU requests aren't completed by the moment we will assert at (X). If the vCPU requests are completed we won't even notice that we had some writes while in the drained section. Denis On 29.06.2018 15:40, Denis Plotnikov wrote: There are cases when a request to a block driver state shouldn't have appeared producing dangerous race conditions. This misbehaviour is usually happens with storage devices emulated without eventfd for guest to host notifications like IDE. The issue arises when the context is in the "drained" section and doesn't expect the request to come, but request comes from the device not using iothread and which context is processed by the main loop. The main loop apart of the iothread event loop isn't blocked by the "drained" section. The request coming and processing while in "drained" section can spoil the block driver state consistency. This behavior can be observed in the following KVM-based case: 1. Setup a VM with an IDE disk. 2. Inside a VM start a disk writing load for the IDE device e.g: dd if= of= bs=X count=Y oflag=direct 3. On the host create a mirroring block job for the IDE device e.g: drive_mirror 4. On the host finish the block job e.g: block_job_complete Having done the 4th action, you could get an assert: assert(QLIST_EMPTY(>tracked_requests)) from mirror_run. On my setup, the assert is 1/3 reproducible. The patch series introduces the mechanism to postpone the requests until the BDS leaves "drained" section for the devices not using iothreads. Also, it modifies the asynchronous block backend infrastructure to use that mechanism to release the assert bug for IDE devices. Denis Plotnikov (2): async: add infrastructure for postponed actions block: postpone the coroutine executing if the BDS's is drained block/block-backend.c | 58 ++- include/block/aio.h | 63 +++ util/async.c | 33 +++ 3 files changed, 142 insertions(+), 12 deletions(-) -- Best, Denis -- Best, Denis
Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
On 13.08.2018 19:30, Kevin Wolf wrote: Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben: Ping ping! On 16.07.2018 21:59, John Snow wrote: On 07/16/2018 11:01 AM, Denis Plotnikov wrote: Ping! I never saw a reply to Stefan's question on July 2nd, did you reply off-list? --js Yes, I did. I talked to Stefan why the patch set appeared. The rest of us still don't know the answer. I had the same question. Kevin Yes, that's my fault. I should have post it earlier. I reviewed the problem once again and come up with the following explanation. Indeed, if the global lock has been taken by the main thread the vCPU threads won't be able to execute mmio ide. But, if the main thread will release the lock then nothing will prevent vCPU threads form execution what they want, e.g writing to the block device. In case of running the mirroring it is possible. Let's take a look at the following snippet of mirror_run. This is a part the mirroring completion part. bdrv_drained_begin(bs); cnt = bdrv_get_dirty_count(s->dirty_bitmap); >> if (cnt > 0 || mirror_flush(s) < 0) { bdrv_drained_end(bs); continue; } (X) assert(QLIST_EMPTY(>tracked_requests)); mirror_flush here can yield the current coroutine so nothing more can be executed. We could end up with the situation when the main loop have to revolve to poll for another timer/bh to process. While revolving it releases the global lock. If the global lock is waited for by a vCPU (any other) thread, the waiting thread will get the lock and make what it intends. This is something that I can observe: mirror_flush yields coroutine, the main thread revolves and locks because a vCPU was waiting for the lock. Now the vCPU thread owns the lock and the main thread waits for the lock releasing. The vCPU thread does cmd_write_dma and releases the lock. Then, the main thread gets the lock and continues to run eventually proceeding with the coroutine yeiled. If the vCPU requests aren't completed by the moment we will assert at (X). If the vCPU requests are completed we won't even notice that we had some writes while in the drained section. Denis On 29.06.2018 15:40, Denis Plotnikov wrote: There are cases when a request to a block driver state shouldn't have appeared producing dangerous race conditions. This misbehaviour is usually happens with storage devices emulated without eventfd for guest to host notifications like IDE. The issue arises when the context is in the "drained" section and doesn't expect the request to come, but request comes from the device not using iothread and which context is processed by the main loop. The main loop apart of the iothread event loop isn't blocked by the "drained" section. The request coming and processing while in "drained" section can spoil the block driver state consistency. This behavior can be observed in the following KVM-based case: 1. Setup a VM with an IDE disk. 2. Inside a VM start a disk writing load for the IDE device e.g: dd if= of= bs=X count=Y oflag=direct 3. On the host create a mirroring block job for the IDE device e.g: drive_mirror 4. On the host finish the block job e.g: block_job_complete Having done the 4th action, you could get an assert: assert(QLIST_EMPTY(>tracked_requests)) from mirror_run. On my setup, the assert is 1/3 reproducible. The patch series introduces the mechanism to postpone the requests until the BDS leaves "drained" section for the devices not using iothreads. Also, it modifies the asynchronous block backend infrastructure to use that mechanism to release the assert bug for IDE devices. Denis Plotnikov (2): async: add infrastructure for postponed actions block: postpone the coroutine executing if the BDS's is drained block/block-backend.c | 58 ++- include/block/aio.h | 63 +++ util/async.c | 33 +++ 3 files changed, 142 insertions(+), 12 deletions(-) -- Best, Denis -- Best, Denis
Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben: > Ping ping! > > On 16.07.2018 21:59, John Snow wrote: > > > > > > On 07/16/2018 11:01 AM, Denis Plotnikov wrote: > > > Ping! > > > > > > > I never saw a reply to Stefan's question on July 2nd, did you reply > > off-list? > > > > --js > Yes, I did. I talked to Stefan why the patch set appeared. The rest of us still don't know the answer. I had the same question. Kevin > > > On 29.06.2018 15:40, Denis Plotnikov wrote: > > > > There are cases when a request to a block driver state shouldn't have > > > > appeared producing dangerous race conditions. > > > > This misbehaviour is usually happens with storage devices emulated > > > > without eventfd for guest to host notifications like IDE. > > > > > > > > The issue arises when the context is in the "drained" section > > > > and doesn't expect the request to come, but request comes from the > > > > device not using iothread and which context is processed by the main > > > > loop. > > > > > > > > The main loop apart of the iothread event loop isn't blocked by the > > > > "drained" section. > > > > The request coming and processing while in "drained" section can spoil > > > > the > > > > block driver state consistency. > > > > > > > > This behavior can be observed in the following KVM-based case: > > > > > > > > 1. Setup a VM with an IDE disk. > > > > 2. Inside a VM start a disk writing load for the IDE device > > > > e.g: dd if= of= bs=X count=Y oflag=direct > > > > 3. On the host create a mirroring block job for the IDE device > > > > e.g: drive_mirror > > > > 4. On the host finish the block job > > > > e.g: block_job_complete > > > > Having done the 4th action, you could get an assert: > > > > assert(QLIST_EMPTY(>tracked_requests)) from mirror_run. > > > > On my setup, the assert is 1/3 reproducible. > > > > > > > > The patch series introduces the mechanism to postpone the requests > > > > until the BDS leaves "drained" section for the devices not using > > > > iothreads. > > > > Also, it modifies the asynchronous block backend infrastructure to use > > > > that mechanism to release the assert bug for IDE devices. > > > > > > > > Denis Plotnikov (2): > > > > async: add infrastructure for postponed actions > > > > block: postpone the coroutine executing if the BDS's is drained > > > > > > > > block/block-backend.c | 58 ++- > > > > include/block/aio.h | 63 > > > > +++ > > > > util/async.c | 33 +++ > > > > 3 files changed, 142 insertions(+), 12 deletions(-) > > > > > > > > > -- > Best, > Denis
Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
Ping ping! On 16.07.2018 21:59, John Snow wrote: On 07/16/2018 11:01 AM, Denis Plotnikov wrote: Ping! I never saw a reply to Stefan's question on July 2nd, did you reply off-list? --js Yes, I did. I talked to Stefan why the patch set appeared. On 29.06.2018 15:40, Denis Plotnikov wrote: There are cases when a request to a block driver state shouldn't have appeared producing dangerous race conditions. This misbehaviour is usually happens with storage devices emulated without eventfd for guest to host notifications like IDE. The issue arises when the context is in the "drained" section and doesn't expect the request to come, but request comes from the device not using iothread and which context is processed by the main loop. The main loop apart of the iothread event loop isn't blocked by the "drained" section. The request coming and processing while in "drained" section can spoil the block driver state consistency. This behavior can be observed in the following KVM-based case: 1. Setup a VM with an IDE disk. 2. Inside a VM start a disk writing load for the IDE device e.g: dd if= of= bs=X count=Y oflag=direct 3. On the host create a mirroring block job for the IDE device e.g: drive_mirror 4. On the host finish the block job e.g: block_job_complete Having done the 4th action, you could get an assert: assert(QLIST_EMPTY(>tracked_requests)) from mirror_run. On my setup, the assert is 1/3 reproducible. The patch series introduces the mechanism to postpone the requests until the BDS leaves "drained" section for the devices not using iothreads. Also, it modifies the asynchronous block backend infrastructure to use that mechanism to release the assert bug for IDE devices. Denis Plotnikov (2): async: add infrastructure for postponed actions block: postpone the coroutine executing if the BDS's is drained block/block-backend.c | 58 ++- include/block/aio.h | 63 +++ util/async.c | 33 +++ 3 files changed, 142 insertions(+), 12 deletions(-) -- Best, Denis
Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
On Mon, Jul 02, 2018 at 04:18:43PM +0100, Stefan Hajnoczi wrote: > On Fri, Jun 29, 2018 at 03:40:50PM +0300, Denis Plotnikov wrote: > > There are cases when a request to a block driver state shouldn't have > > appeared producing dangerous race conditions. > > This misbehaviour is usually happens with storage devices emulated > > without eventfd for guest to host notifications like IDE. > > > > The issue arises when the context is in the "drained" section > > and doesn't expect the request to come, but request comes from the > > device not using iothread and which context is processed by the main loop. > > > > The main loop apart of the iothread event loop isn't blocked by the > > "drained" section. > > The request coming and processing while in "drained" section can spoil the > > block driver state consistency. > > > > This behavior can be observed in the following KVM-based case: > > > > 1. Setup a VM with an IDE disk. > > 2. Inside a VM start a disk writing load for the IDE device > > e.g: dd if= of= bs=X count=Y oflag=direct > > 3. On the host create a mirroring block job for the IDE device > > e.g: drive_mirror > > 4. On the host finish the block job > > e.g: block_job_complete > > > > Having done the 4th action, you could get an assert: > > assert(QLIST_EMPTY(>tracked_requests)) from mirror_run. > > On my setup, the assert is 1/3 reproducible. > > > > The patch series introduces the mechanism to postpone the requests > > until the BDS leaves "drained" section for the devices not using iothreads. > > Also, it modifies the asynchronous block backend infrastructure to use > > that mechanism to release the assert bug for IDE devices. > > I don't understand the scenario. IDE emulation runs in the vcpu and > main loop threads. These threads hold the global mutex when executing > QEMU code. If thread A is in a drained region with the global mutex, > then thread B cannot run QEMU code since it would need to global mutex. > > So I guess the problem is not that thread B will submit new requests, > but maybe that the IDE DMA code will run a completion in thread A and > submit another request in the drained region? Ping! :) Stefan signature.asc Description: PGP signature
Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
On 16.07.2018 21:59, John Snow wrote: On 07/16/2018 11:01 AM, Denis Plotnikov wrote: Ping! I never saw a reply to Stefan's question on July 2nd, did you reply off-list? For some reason, there are no Stefan's replies on my server. Found it in the web. Will respond to it shortly. Thanks! Denis --js On 29.06.2018 15:40, Denis Plotnikov wrote: There are cases when a request to a block driver state shouldn't have appeared producing dangerous race conditions. This misbehaviour is usually happens with storage devices emulated without eventfd for guest to host notifications like IDE. The issue arises when the context is in the "drained" section and doesn't expect the request to come, but request comes from the device not using iothread and which context is processed by the main loop. The main loop apart of the iothread event loop isn't blocked by the "drained" section. The request coming and processing while in "drained" section can spoil the block driver state consistency. This behavior can be observed in the following KVM-based case: 1. Setup a VM with an IDE disk. 2. Inside a VM start a disk writing load for the IDE device e.g: dd if= of= bs=X count=Y oflag=direct 3. On the host create a mirroring block job for the IDE device e.g: drive_mirror 4. On the host finish the block job e.g: block_job_complete Having done the 4th action, you could get an assert: assert(QLIST_EMPTY(>tracked_requests)) from mirror_run. On my setup, the assert is 1/3 reproducible. The patch series introduces the mechanism to postpone the requests until the BDS leaves "drained" section for the devices not using iothreads. Also, it modifies the asynchronous block backend infrastructure to use that mechanism to release the assert bug for IDE devices. Denis Plotnikov (2): async: add infrastructure for postponed actions block: postpone the coroutine executing if the BDS's is drained block/block-backend.c | 58 ++- include/block/aio.h | 63 +++ util/async.c | 33 +++ 3 files changed, 142 insertions(+), 12 deletions(-) -- Best, Denis
Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
On 07/16/2018 11:01 AM, Denis Plotnikov wrote: > Ping! > I never saw a reply to Stefan's question on July 2nd, did you reply off-list? --js > On 29.06.2018 15:40, Denis Plotnikov wrote: >> There are cases when a request to a block driver state shouldn't have >> appeared producing dangerous race conditions. >> This misbehaviour is usually happens with storage devices emulated >> without eventfd for guest to host notifications like IDE. >> >> The issue arises when the context is in the "drained" section >> and doesn't expect the request to come, but request comes from the >> device not using iothread and which context is processed by the main >> loop. >> >> The main loop apart of the iothread event loop isn't blocked by the >> "drained" section. >> The request coming and processing while in "drained" section can spoil >> the >> block driver state consistency. >> >> This behavior can be observed in the following KVM-based case: >> >> 1. Setup a VM with an IDE disk. >> 2. Inside a VM start a disk writing load for the IDE device >> e.g: dd if= of= bs=X count=Y oflag=direct >> 3. On the host create a mirroring block job for the IDE device >> e.g: drive_mirror >> 4. On the host finish the block job >> e.g: block_job_complete >> Having done the 4th action, you could get an assert: >> assert(QLIST_EMPTY(>tracked_requests)) from mirror_run. >> On my setup, the assert is 1/3 reproducible. >> >> The patch series introduces the mechanism to postpone the requests >> until the BDS leaves "drained" section for the devices not using >> iothreads. >> Also, it modifies the asynchronous block backend infrastructure to use >> that mechanism to release the assert bug for IDE devices. >> >> Denis Plotnikov (2): >> async: add infrastructure for postponed actions >> block: postpone the coroutine executing if the BDS's is drained >> >> block/block-backend.c | 58 ++- >> include/block/aio.h | 63 +++ >> util/async.c | 33 +++ >> 3 files changed, 142 insertions(+), 12 deletions(-) >> >
Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
On Fri, Jun 29, 2018 at 03:40:50PM +0300, Denis Plotnikov wrote: > There are cases when a request to a block driver state shouldn't have > appeared producing dangerous race conditions. > This misbehaviour is usually happens with storage devices emulated > without eventfd for guest to host notifications like IDE. > > The issue arises when the context is in the "drained" section > and doesn't expect the request to come, but request comes from the > device not using iothread and which context is processed by the main loop. > > The main loop apart of the iothread event loop isn't blocked by the > "drained" section. > The request coming and processing while in "drained" section can spoil the > block driver state consistency. > > This behavior can be observed in the following KVM-based case: > > 1. Setup a VM with an IDE disk. > 2. Inside a VM start a disk writing load for the IDE device > e.g: dd if= of= bs=X count=Y oflag=direct > 3. On the host create a mirroring block job for the IDE device > e.g: drive_mirror > 4. On the host finish the block job > e.g: block_job_complete > > Having done the 4th action, you could get an assert: > assert(QLIST_EMPTY(>tracked_requests)) from mirror_run. > On my setup, the assert is 1/3 reproducible. > > The patch series introduces the mechanism to postpone the requests > until the BDS leaves "drained" section for the devices not using iothreads. > Also, it modifies the asynchronous block backend infrastructure to use > that mechanism to release the assert bug for IDE devices. I don't understand the scenario. IDE emulation runs in the vcpu and main loop threads. These threads hold the global mutex when executing QEMU code. If thread A is in a drained region with the global mutex, then thread B cannot run QEMU code since it would need to global mutex. So I guess the problem is not that thread B will submit new requests, but maybe that the IDE DMA code will run a completion in thread A and submit another request in the drained region? Stefan signature.asc Description: PGP signature