Re: [Qemu-devel] libiscsi task cancellation

2018-02-14 Thread Felipe Franciosi

> On 8 Feb 2018, at 15:08, Stefan Hajnoczi  wrote:
> 
> On Thu, Feb 8, 2018 at 2:14 PM, Paolo Bonzini  wrote:
>> On 08/02/2018 15:08, Stefan Hajnoczi wrote:
>>> Now on to libiscsi:
>>> 
>>> The iscsi_task_mgmt_async() API documentation says:
>>> 
>>>  * abort_task will also cancel the scsi task. The callback for the
>>> scsi task will be invoked with
>>>  *SCSI_STATUS_CANCELLED
>>> 
>>> I see that the ABORT TASK TMF response invokes the user's
>>> iscsi_task_mgmt_async() callback but not the command callback.  I'm
>>> not sure how the command callback is invoked with
>>> SCSI_STATUS_CANCELLED unless libiscsi is relying on the target to send
>>> that response.
>>> 
>>> Is libiscsi honoring its iscsi_task_mgmt_async() contract?
>> 
>> No, and QEMU is assuming the "wrong" behavior:
>> 
>> static void
>> iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void 
>> *command_data,
>>void *private_data)
>> {
>>IscsiAIOCB *acb = private_data;
>> 
>>acb->status = -ECANCELED;
>>iscsi_schedule_bh(acb);
>> }
> 
> FWIW My recent use-after-free fix series is still pending review:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.gnu.org_archive_html_qemu-2Ddevel_2018-2D02_msg00570.html=DwIBaQ=s883GpUCOChKOHiocYtGcg=CCrJKVC5zGot8PrnI-iYe00MdX4pgdQfMRigp14Ptmk=E9W0Q_a83TeT4c8PEn--G-Dk9dZ5UGO2CZRZzal0Sak=BC2keaBG9mG3yYI_IYPwAoA3_I69KmPZordXnDguF0I=
> 
> It changes the code and fixes a QEMU bug, but it doesn't address the
> iscsi_task_mgmt_async() issue we're discussing now.
> 
> I think users *can* work around the iscsi_task_mgmt_async() issue as follows:
> 
> If the command callback has not been invoked yet when the ABORT TASK
> TMF callback is invoked, call iscsi_scsi_cancel_task() so that
> libiscsi invokes the command callback with SCSI_STATUS_CANCELLED and
> frees the PDU.
> 
> Here is how that looks in QEMU:
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 8140baac15..7b4b43dc55 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -290,8 +290,12 @@ iscsi_abort_task_cb(struct iscsi_context *iscsi,
> int status, void *command_data,
> {
> IscsiAIOCB *acb = private_data;
> 
> -acb->status = -ECANCELED;
> -iscsi_schedule_bh(acb);
> +/* If the command callback hasn't been called yet, drop the task */
> +if (!acb->bh) {
> +/* Call iscsi_aio_ioctl_cb() with SCSI_STATUS_CANCELLED */
> +iscsi_scsi_cancel_task(iscsi, acb->task);
> +}
> +
> qemu_aio_unref(acb); /* acquired in iscsi_aio_cancel() */
> }

Reviewed-by: Felipe Franciosi 
Tested-by: Sreejith Mohanan 

We've been over this and Sreejith confirmed this fixes the issue we are seeing. 
Nevertheless, I think we should continue this conversation to clarify what are 
the expectations with regards to requests that may still be inflight.

F.


Re: [Qemu-devel] libiscsi task cancellation

2018-02-08 Thread Felipe Franciosi

> On 8 Feb 2018, at 14:58, Hannes Reinecke  wrote:
> 
> On 02/08/2018 03:14 PM, Paolo Bonzini wrote:
>> On 08/02/2018 15:08, Stefan Hajnoczi wrote:
>>> Now on to libiscsi:
>>> 
>>> The iscsi_task_mgmt_async() API documentation says:
>>> 
>>>  * abort_task will also cancel the scsi task. The callback for the
>>> scsi task will be invoked with
>>>  *SCSI_STATUS_CANCELLED
>>> 
>>> I see that the ABORT TASK TMF response invokes the user's
>>> iscsi_task_mgmt_async() callback but not the command callback.  I'm
>>> not sure how the command callback is invoked with
>>> SCSI_STATUS_CANCELLED unless libiscsi is relying on the target to send
>>> that response.
>>> 
>>> Is libiscsi honoring its iscsi_task_mgmt_async() contract?
>> 
>> No, and QEMU is assuming the "wrong" behavior:
>> 
>> static void
>> iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void 
>> *command_data,
>>void *private_data)
>> {
>>IscsiAIOCB *acb = private_data;
>> 
>>acb->status = -ECANCELED;
>>iscsi_schedule_bh(acb);
>> }
>> 
> The definition of ABORT TASK TMF in SAM is pretty much useless.
> To quote:
> 
> A response of FUNCTION COMPLETE shall indicate that the task was aborted
> or was not in the task set.
> 
> IE we have no idea if we ever managed to abort the task; if the task had
> been in-flight by the time we've send the TMF we'll be getting a
> FUNCTION COMPLETE, too.

Why is that a problem? I am under the impression that drivers can cope with 
that. You can complete the TMF after the original request completed 
(successfully or not).

> So most FC HBA firmware implement the abort task with just a blacklist;
> the TMF will be returned immediately and the command response will be
> dropped if and when is arrives.

That sounds very dangerous, but maybe it's safe for FC (I don't know details 
about it). What happens if you complete the TMF immediately and the OS believes 
the IO (eg. a write) has been aborted, then issue another write for the same 
LBA and that completes successfully before the original request (which is still 
outstanding)?

In our implementation, we sit on the TMF and wait for the original request to 
either complete or abort. Only then we respond the TMF to the issuing OS.

Cheers,
Felipe


Re: [Qemu-devel] libiscsi task cancellation

2018-02-08 Thread Stefan Hajnoczi
On Thu, Feb 8, 2018 at 3:08 PM, Stefan Hajnoczi  wrote:
> On Thu, Feb 8, 2018 at 2:14 PM, Paolo Bonzini  wrote:
>> On 08/02/2018 15:08, Stefan Hajnoczi wrote:
> I think users *can* work around the iscsi_task_mgmt_async() issue as follows:
>
> If the command callback has not been invoked yet when the ABORT TASK
> TMF callback is invoked, call iscsi_scsi_cancel_task() so that
> libiscsi invokes the command callback with SCSI_STATUS_CANCELLED and
> frees the PDU.
>
> Here is how that looks in QEMU:
>
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 8140baac15..7b4b43dc55 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -290,8 +290,12 @@ iscsi_abort_task_cb(struct iscsi_context *iscsi,
> int status, void *command_data,
>  {
>  IscsiAIOCB *acb = private_data;
>
> -acb->status = -ECANCELED;
> -iscsi_schedule_bh(acb);
> +/* If the command callback hasn't been called yet, drop the task */
> +if (!acb->bh) {
> +/* Call iscsi_aio_ioctl_cb() with SCSI_STATUS_CANCELLED */
> +iscsi_scsi_cancel_task(iscsi, acb->task);
> +}
> +
>  qemu_aio_unref(acb); /* acquired in iscsi_aio_cancel() */
>  }

By the way, this is on top of
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg00570.html.

Stefan



Re: [Qemu-devel] libiscsi task cancellation

2018-02-08 Thread Stefan Hajnoczi
On Thu, Feb 8, 2018 at 2:58 PM, Hannes Reinecke  wrote:
> On 02/08/2018 03:14 PM, Paolo Bonzini wrote:
>> On 08/02/2018 15:08, Stefan Hajnoczi wrote:
>>> Now on to libiscsi:
>>>
>>> The iscsi_task_mgmt_async() API documentation says:
>>>
>>>   * abort_task will also cancel the scsi task. The callback for the
>>> scsi task will be invoked with
>>>   *SCSI_STATUS_CANCELLED
>>>
>>> I see that the ABORT TASK TMF response invokes the user's
>>> iscsi_task_mgmt_async() callback but not the command callback.  I'm
>>> not sure how the command callback is invoked with
>>> SCSI_STATUS_CANCELLED unless libiscsi is relying on the target to send
>>> that response.
>>>
>>> Is libiscsi honoring its iscsi_task_mgmt_async() contract?
>>
>> No, and QEMU is assuming the "wrong" behavior:
>>
>> static void
>> iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void 
>> *command_data,
>> void *private_data)
>> {
>> IscsiAIOCB *acb = private_data;
>>
>> acb->status = -ECANCELED;
>> iscsi_schedule_bh(acb);
>> }
>>
> The definition of ABORT TASK TMF in SAM is pretty much useless.
> To quote:
>
> A response of FUNCTION COMPLETE shall indicate that the task was aborted
> or was not in the task set.
>
> IE we have no idea if we ever managed to abort the task; if the task had
> been in-flight by the time we've send the TMF we'll be getting a
> FUNCTION COMPLETE, too.
>
> So most FC HBA firmware implement the abort task with just a blacklist;
> the TMF will be returned immediately and the command response will be
> dropped if and when is arrives.

Great, thanks for confirming.

Stefan



Re: [Qemu-devel] libiscsi task cancellation

2018-02-08 Thread Stefan Hajnoczi
On Thu, Feb 8, 2018 at 2:14 PM, Paolo Bonzini  wrote:
> On 08/02/2018 15:08, Stefan Hajnoczi wrote:
>> Now on to libiscsi:
>>
>> The iscsi_task_mgmt_async() API documentation says:
>>
>>   * abort_task will also cancel the scsi task. The callback for the
>> scsi task will be invoked with
>>   *SCSI_STATUS_CANCELLED
>>
>> I see that the ABORT TASK TMF response invokes the user's
>> iscsi_task_mgmt_async() callback but not the command callback.  I'm
>> not sure how the command callback is invoked with
>> SCSI_STATUS_CANCELLED unless libiscsi is relying on the target to send
>> that response.
>>
>> Is libiscsi honoring its iscsi_task_mgmt_async() contract?
>
> No, and QEMU is assuming the "wrong" behavior:
>
> static void
> iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void 
> *command_data,
> void *private_data)
> {
> IscsiAIOCB *acb = private_data;
>
> acb->status = -ECANCELED;
> iscsi_schedule_bh(acb);
> }

FWIW My recent use-after-free fix series is still pending review:
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg00570.html

It changes the code and fixes a QEMU bug, but it doesn't address the
iscsi_task_mgmt_async() issue we're discussing now.

I think users *can* work around the iscsi_task_mgmt_async() issue as follows:

If the command callback has not been invoked yet when the ABORT TASK
TMF callback is invoked, call iscsi_scsi_cancel_task() so that
libiscsi invokes the command callback with SCSI_STATUS_CANCELLED and
frees the PDU.

Here is how that looks in QEMU:

diff --git a/block/iscsi.c b/block/iscsi.c
index 8140baac15..7b4b43dc55 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -290,8 +290,12 @@ iscsi_abort_task_cb(struct iscsi_context *iscsi,
int status, void *command_data,
 {
 IscsiAIOCB *acb = private_data;

-acb->status = -ECANCELED;
-iscsi_schedule_bh(acb);
+/* If the command callback hasn't been called yet, drop the task */
+if (!acb->bh) {
+/* Call iscsi_aio_ioctl_cb() with SCSI_STATUS_CANCELLED */
+iscsi_scsi_cancel_task(iscsi, acb->task);
+}
+
 qemu_aio_unref(acb); /* acquired in iscsi_aio_cancel() */
 }



Re: [Qemu-devel] libiscsi task cancellation

2018-02-08 Thread Hannes Reinecke
On 02/08/2018 03:14 PM, Paolo Bonzini wrote:
> On 08/02/2018 15:08, Stefan Hajnoczi wrote:
>> Now on to libiscsi:
>>
>> The iscsi_task_mgmt_async() API documentation says:
>>
>>   * abort_task will also cancel the scsi task. The callback for the
>> scsi task will be invoked with
>>   *SCSI_STATUS_CANCELLED
>>
>> I see that the ABORT TASK TMF response invokes the user's
>> iscsi_task_mgmt_async() callback but not the command callback.  I'm
>> not sure how the command callback is invoked with
>> SCSI_STATUS_CANCELLED unless libiscsi is relying on the target to send
>> that response.
>>
>> Is libiscsi honoring its iscsi_task_mgmt_async() contract?
> 
> No, and QEMU is assuming the "wrong" behavior:
> 
> static void
> iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void 
> *command_data,
> void *private_data)
> {
> IscsiAIOCB *acb = private_data;
> 
> acb->status = -ECANCELED;
> iscsi_schedule_bh(acb);
> }
> 
The definition of ABORT TASK TMF in SAM is pretty much useless.
To quote:

A response of FUNCTION COMPLETE shall indicate that the task was aborted
or was not in the task set.

IE we have no idea if we ever managed to abort the task; if the task had
been in-flight by the time we've send the TMF we'll be getting a
FUNCTION COMPLETE, too.

So most FC HBA firmware implement the abort task with just a blacklist;
the TMF will be returned immediately and the command response will be
dropped if and when is arrives.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)



[Qemu-devel] libiscsi task cancellation

2018-02-08 Thread Stefan Hajnoczi
SAM-5, 5.6 Aborting commands says:

  A command is aborted when a SCSI device condition (see 6.3), command,
  or task management function causes termination of the command prior to
  its completion by the device server. After a command is aborted and
  TASK ABORTED status, if any, is returned, the SCSI target device shall
  send no further requests or responses for that command.

Then "Table 48 — Task management functions that abort commands (part 1
of 2)" goes on to say that no TASK ABORTED status is returned for
aborted commands "if the ABORT TASK scope is I_T_L nexus".

My interpretation is that the target doesn't have to send a response
for the aborted command.  It just needs to complete the ABORT TASK
TMF.  Afterwards the initiator knows the task is gone and will receive
no more responses.

Is my understanding correct?

Now on to libiscsi:

The iscsi_task_mgmt_async() API documentation says:

  * abort_task will also cancel the scsi task. The callback for the
scsi task will be invoked with
  *SCSI_STATUS_CANCELLED

I see that the ABORT TASK TMF response invokes the user's
iscsi_task_mgmt_async() callback but not the command callback.  I'm
not sure how the command callback is invoked with
SCSI_STATUS_CANCELLED unless libiscsi is relying on the target to send
that response.

Is libiscsi honoring its iscsi_task_mgmt_async() contract?

Thanks,
Stefan



Re: [Qemu-devel] libiscsi task cancellation

2018-02-08 Thread Paolo Bonzini
On 08/02/2018 15:08, Stefan Hajnoczi wrote:
> Now on to libiscsi:
> 
> The iscsi_task_mgmt_async() API documentation says:
> 
>   * abort_task will also cancel the scsi task. The callback for the
> scsi task will be invoked with
>   *SCSI_STATUS_CANCELLED
> 
> I see that the ABORT TASK TMF response invokes the user's
> iscsi_task_mgmt_async() callback but not the command callback.  I'm
> not sure how the command callback is invoked with
> SCSI_STATUS_CANCELLED unless libiscsi is relying on the target to send
> that response.
> 
> Is libiscsi honoring its iscsi_task_mgmt_async() contract?

No, and QEMU is assuming the "wrong" behavior:

static void
iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void *command_data,
void *private_data)
{
IscsiAIOCB *acb = private_data;

acb->status = -ECANCELED;
iscsi_schedule_bh(acb);
}

Paolo