Re: [Qemu-devel] libiscsi task cancellation
> On 8 Feb 2018, at 15:08, Stefan Hajnoczi wrote: > > On Thu, Feb 8, 2018 at 2:14 PM, Paolo Bonzini wrote: >> On 08/02/2018 15:08, Stefan Hajnoczi wrote: >>> Now on to libiscsi: >>> >>> The iscsi_task_mgmt_async() API documentation says: >>> >>> * abort_task will also cancel the scsi task. The callback for the >>> scsi task will be invoked with >>> *SCSI_STATUS_CANCELLED >>> >>> I see that the ABORT TASK TMF response invokes the user's >>> iscsi_task_mgmt_async() callback but not the command callback. I'm >>> not sure how the command callback is invoked with >>> SCSI_STATUS_CANCELLED unless libiscsi is relying on the target to send >>> that response. >>> >>> Is libiscsi honoring its iscsi_task_mgmt_async() contract? >> >> No, and QEMU is assuming the "wrong" behavior: >> >> static void >> iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void >> *command_data, >>void *private_data) >> { >>IscsiAIOCB *acb = private_data; >> >>acb->status = -ECANCELED; >>iscsi_schedule_bh(acb); >> } > > FWIW My recent use-after-free fix series is still pending review: > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.gnu.org_archive_html_qemu-2Ddevel_2018-2D02_msg00570.html&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=CCrJKVC5zGot8PrnI-iYe00MdX4pgdQfMRigp14Ptmk&m=E9W0Q_a83TeT4c8PEn--G-Dk9dZ5UGO2CZRZzal0Sak&s=BC2keaBG9mG3yYI_IYPwAoA3_I69KmPZordXnDguF0I&e= > > It changes the code and fixes a QEMU bug, but it doesn't address the > iscsi_task_mgmt_async() issue we're discussing now. > > I think users *can* work around the iscsi_task_mgmt_async() issue as follows: > > If the command callback has not been invoked yet when the ABORT TASK > TMF callback is invoked, call iscsi_scsi_cancel_task() so that > libiscsi invokes the command callback with SCSI_STATUS_CANCELLED and > frees the PDU. > > Here is how that looks in QEMU: > > diff --git a/block/iscsi.c b/block/iscsi.c > index 8140baac15..7b4b43dc55 100644 > --- a/block/iscsi.c > +++ b/block/iscsi.c > @@ -290,8 +290,12 @@ iscsi_abort_task_cb(struct iscsi_context *iscsi, > int status, void *command_data, > { > IscsiAIOCB *acb = private_data; > > -acb->status = -ECANCELED; > -iscsi_schedule_bh(acb); > +/* If the command callback hasn't been called yet, drop the task */ > +if (!acb->bh) { > +/* Call iscsi_aio_ioctl_cb() with SCSI_STATUS_CANCELLED */ > +iscsi_scsi_cancel_task(iscsi, acb->task); > +} > + > qemu_aio_unref(acb); /* acquired in iscsi_aio_cancel() */ > } Reviewed-by: Felipe Franciosi Tested-by: Sreejith Mohanan We've been over this and Sreejith confirmed this fixes the issue we are seeing. Nevertheless, I think we should continue this conversation to clarify what are the expectations with regards to requests that may still be inflight. F.
Re: [Qemu-devel] libiscsi task cancellation
> On 8 Feb 2018, at 14:58, Hannes Reinecke wrote: > > On 02/08/2018 03:14 PM, Paolo Bonzini wrote: >> On 08/02/2018 15:08, Stefan Hajnoczi wrote: >>> Now on to libiscsi: >>> >>> The iscsi_task_mgmt_async() API documentation says: >>> >>> * abort_task will also cancel the scsi task. The callback for the >>> scsi task will be invoked with >>> *SCSI_STATUS_CANCELLED >>> >>> I see that the ABORT TASK TMF response invokes the user's >>> iscsi_task_mgmt_async() callback but not the command callback. I'm >>> not sure how the command callback is invoked with >>> SCSI_STATUS_CANCELLED unless libiscsi is relying on the target to send >>> that response. >>> >>> Is libiscsi honoring its iscsi_task_mgmt_async() contract? >> >> No, and QEMU is assuming the "wrong" behavior: >> >> static void >> iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void >> *command_data, >>void *private_data) >> { >>IscsiAIOCB *acb = private_data; >> >>acb->status = -ECANCELED; >>iscsi_schedule_bh(acb); >> } >> > The definition of ABORT TASK TMF in SAM is pretty much useless. > To quote: > > A response of FUNCTION COMPLETE shall indicate that the task was aborted > or was not in the task set. > > IE we have no idea if we ever managed to abort the task; if the task had > been in-flight by the time we've send the TMF we'll be getting a > FUNCTION COMPLETE, too. Why is that a problem? I am under the impression that drivers can cope with that. You can complete the TMF after the original request completed (successfully or not). > So most FC HBA firmware implement the abort task with just a blacklist; > the TMF will be returned immediately and the command response will be > dropped if and when is arrives. That sounds very dangerous, but maybe it's safe for FC (I don't know details about it). What happens if you complete the TMF immediately and the OS believes the IO (eg. a write) has been aborted, then issue another write for the same LBA and that completes successfully before the original request (which is still outstanding)? In our implementation, we sit on the TMF and wait for the original request to either complete or abort. Only then we respond the TMF to the issuing OS. Cheers, Felipe
Re: [Qemu-devel] libiscsi task cancellation
On Thu, Feb 8, 2018 at 3:08 PM, Stefan Hajnoczi wrote: > On Thu, Feb 8, 2018 at 2:14 PM, Paolo Bonzini wrote: >> On 08/02/2018 15:08, Stefan Hajnoczi wrote: > I think users *can* work around the iscsi_task_mgmt_async() issue as follows: > > If the command callback has not been invoked yet when the ABORT TASK > TMF callback is invoked, call iscsi_scsi_cancel_task() so that > libiscsi invokes the command callback with SCSI_STATUS_CANCELLED and > frees the PDU. > > Here is how that looks in QEMU: > > diff --git a/block/iscsi.c b/block/iscsi.c > index 8140baac15..7b4b43dc55 100644 > --- a/block/iscsi.c > +++ b/block/iscsi.c > @@ -290,8 +290,12 @@ iscsi_abort_task_cb(struct iscsi_context *iscsi, > int status, void *command_data, > { > IscsiAIOCB *acb = private_data; > > -acb->status = -ECANCELED; > -iscsi_schedule_bh(acb); > +/* If the command callback hasn't been called yet, drop the task */ > +if (!acb->bh) { > +/* Call iscsi_aio_ioctl_cb() with SCSI_STATUS_CANCELLED */ > +iscsi_scsi_cancel_task(iscsi, acb->task); > +} > + > qemu_aio_unref(acb); /* acquired in iscsi_aio_cancel() */ > } By the way, this is on top of https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg00570.html. Stefan
Re: [Qemu-devel] libiscsi task cancellation
On Thu, Feb 8, 2018 at 2:58 PM, Hannes Reinecke wrote: > On 02/08/2018 03:14 PM, Paolo Bonzini wrote: >> On 08/02/2018 15:08, Stefan Hajnoczi wrote: >>> Now on to libiscsi: >>> >>> The iscsi_task_mgmt_async() API documentation says: >>> >>> * abort_task will also cancel the scsi task. The callback for the >>> scsi task will be invoked with >>> *SCSI_STATUS_CANCELLED >>> >>> I see that the ABORT TASK TMF response invokes the user's >>> iscsi_task_mgmt_async() callback but not the command callback. I'm >>> not sure how the command callback is invoked with >>> SCSI_STATUS_CANCELLED unless libiscsi is relying on the target to send >>> that response. >>> >>> Is libiscsi honoring its iscsi_task_mgmt_async() contract? >> >> No, and QEMU is assuming the "wrong" behavior: >> >> static void >> iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void >> *command_data, >> void *private_data) >> { >> IscsiAIOCB *acb = private_data; >> >> acb->status = -ECANCELED; >> iscsi_schedule_bh(acb); >> } >> > The definition of ABORT TASK TMF in SAM is pretty much useless. > To quote: > > A response of FUNCTION COMPLETE shall indicate that the task was aborted > or was not in the task set. > > IE we have no idea if we ever managed to abort the task; if the task had > been in-flight by the time we've send the TMF we'll be getting a > FUNCTION COMPLETE, too. > > So most FC HBA firmware implement the abort task with just a blacklist; > the TMF will be returned immediately and the command response will be > dropped if and when is arrives. Great, thanks for confirming. Stefan
Re: [Qemu-devel] libiscsi task cancellation
On Thu, Feb 8, 2018 at 2:14 PM, Paolo Bonzini wrote: > On 08/02/2018 15:08, Stefan Hajnoczi wrote: >> Now on to libiscsi: >> >> The iscsi_task_mgmt_async() API documentation says: >> >> * abort_task will also cancel the scsi task. The callback for the >> scsi task will be invoked with >> *SCSI_STATUS_CANCELLED >> >> I see that the ABORT TASK TMF response invokes the user's >> iscsi_task_mgmt_async() callback but not the command callback. I'm >> not sure how the command callback is invoked with >> SCSI_STATUS_CANCELLED unless libiscsi is relying on the target to send >> that response. >> >> Is libiscsi honoring its iscsi_task_mgmt_async() contract? > > No, and QEMU is assuming the "wrong" behavior: > > static void > iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void > *command_data, > void *private_data) > { > IscsiAIOCB *acb = private_data; > > acb->status = -ECANCELED; > iscsi_schedule_bh(acb); > } FWIW My recent use-after-free fix series is still pending review: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg00570.html It changes the code and fixes a QEMU bug, but it doesn't address the iscsi_task_mgmt_async() issue we're discussing now. I think users *can* work around the iscsi_task_mgmt_async() issue as follows: If the command callback has not been invoked yet when the ABORT TASK TMF callback is invoked, call iscsi_scsi_cancel_task() so that libiscsi invokes the command callback with SCSI_STATUS_CANCELLED and frees the PDU. Here is how that looks in QEMU: diff --git a/block/iscsi.c b/block/iscsi.c index 8140baac15..7b4b43dc55 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -290,8 +290,12 @@ iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void *command_data, { IscsiAIOCB *acb = private_data; -acb->status = -ECANCELED; -iscsi_schedule_bh(acb); +/* If the command callback hasn't been called yet, drop the task */ +if (!acb->bh) { +/* Call iscsi_aio_ioctl_cb() with SCSI_STATUS_CANCELLED */ +iscsi_scsi_cancel_task(iscsi, acb->task); +} + qemu_aio_unref(acb); /* acquired in iscsi_aio_cancel() */ }
Re: [Qemu-devel] libiscsi task cancellation
On 02/08/2018 03:14 PM, Paolo Bonzini wrote: > On 08/02/2018 15:08, Stefan Hajnoczi wrote: >> Now on to libiscsi: >> >> The iscsi_task_mgmt_async() API documentation says: >> >> * abort_task will also cancel the scsi task. The callback for the >> scsi task will be invoked with >> *SCSI_STATUS_CANCELLED >> >> I see that the ABORT TASK TMF response invokes the user's >> iscsi_task_mgmt_async() callback but not the command callback. I'm >> not sure how the command callback is invoked with >> SCSI_STATUS_CANCELLED unless libiscsi is relying on the target to send >> that response. >> >> Is libiscsi honoring its iscsi_task_mgmt_async() contract? > > No, and QEMU is assuming the "wrong" behavior: > > static void > iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void > *command_data, > void *private_data) > { > IscsiAIOCB *acb = private_data; > > acb->status = -ECANCELED; > iscsi_schedule_bh(acb); > } > The definition of ABORT TASK TMF in SAM is pretty much useless. To quote: A response of FUNCTION COMPLETE shall indicate that the task was aborted or was not in the task set. IE we have no idea if we ever managed to abort the task; if the task had been in-flight by the time we've send the TMF we'll be getting a FUNCTION COMPLETE, too. So most FC HBA firmware implement the abort task with just a blacklist; the TMF will be returned immediately and the command response will be dropped if and when is arrives. Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [Qemu-devel] libiscsi task cancellation
On 08/02/2018 15:08, Stefan Hajnoczi wrote: > Now on to libiscsi: > > The iscsi_task_mgmt_async() API documentation says: > > * abort_task will also cancel the scsi task. The callback for the > scsi task will be invoked with > *SCSI_STATUS_CANCELLED > > I see that the ABORT TASK TMF response invokes the user's > iscsi_task_mgmt_async() callback but not the command callback. I'm > not sure how the command callback is invoked with > SCSI_STATUS_CANCELLED unless libiscsi is relying on the target to send > that response. > > Is libiscsi honoring its iscsi_task_mgmt_async() contract? No, and QEMU is assuming the "wrong" behavior: static void iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void *command_data, void *private_data) { IscsiAIOCB *acb = private_data; acb->status = -ECANCELED; iscsi_schedule_bh(acb); } Paolo