Re: [PATCH 2/2] libiscsi: check for pending TMF during task xmit

Hannes Reinecke Fri, 31 Jul 2009 07:54:11 -0700

Hannes Reinecke wrote:
> Hannes Reinecke wrote:
>> Mike Christie wrote:
>>> Mike Christie wrote:
>>>> Hannes Reinecke wrote:
>>>>> Sigh. Why do you have to make is so complicated ...
>>>>> My patch was easy and simple originally. And now this :-)
>>>>>
>>>> This gets really ugly if we do it in libiscsi_tcp. I moved the check to 
>>>> libiscsi and I changed the abort task test to check for the rtt since 
>>>> that works for data outs. I think the attached patch will do what you 
>>>> wanted. It is only compile tested.
>>>>
>>> Bah. The lun and itt is not set for scsi cmd pdus. This should fix it.
>>>
>>> For the lu reset and requeue (r2t data-out handling) or scsi cmd case, 
>>> the task sc lun is always going to be set.
>>>
>>> For the abort and requeue or cmd case, we only need to check the itt/rtt 
>>> for data outs when doing a abort task (the requeue case), because the 
>>> cmd has already been sent (iscsi_eh_abort checks for it on the cmd queue 
>>> before sending) so there is no point to check at that point (also the 
>>> itt is not set for scsi cmd pdus yet).
>>>
>>> It might be nicer to move the restrictions check after the prep scsi cmd 
>>> pdu call but you need the cmdsn scsi_prep_scsi_cmd_pdu patch I sent the 
>>> other day.
>>>
>> Better. Nearly there. I'm running with fast_abort disabled, and occasionally
>> I'm getting this (this is now the HP MSA2012i, so we can't really blame
>> NetApp here):
>>
> [ .. ]
> 
> Bah. I _hate_ SMP.
> This 'curious' behaviour is actually a race condition.
> Any changes to 'tmf_state' are not reflected to other threads/cpus/whatever.
> Two thing I did to remedy this:
> - Break out from the iscsi_data_xmit() loop once tmf_state is something other
>   than TMF_INITIAL, ie effectively single-stepping PDUs during TMF
> - Make tmf_state atomic.
> 
> After these changes the race window is much smaller:
> 
> Jul 31 14:02:20 tyne kernel:  session1: iscsi_eh_device_reset LU Reset [sc 
> ffff8800744cda80 lun 2]
> Jul 31 14:02:20 tyne kernel:  session1: iscsi_exec_task_mgmt_fn tmf set 
> timeout
> Jul 31 14:02:20 tyne kernel:  session1: mgmtpdu [op 0x2 hdr->itt 0x3 datalen 
> 0]
> Jul 31 14:02:20 tyne kernel:  connection1:0: mgmtpdu [itt 0x3 task 
> ffff88007b91cc00] xmit
> Jul 31 14:02:20 tyne kernel:  connection1:0: xmit pdu [op 42 itt 0x3 lun 2 
> count 0]
> Jul 31 14:02:20 tyne kernel:  connection1:0: tmf rsp [itt 0x3] response 0 
> state 1
> Jul 31 14:02:20 tyne kernel:  connection1:0: task [op 1 itt 0x4e lun 2] reset.
> Jul 31 14:02:20 tyne kernel:  connection1:0: task [op 1 itt 0x4e lun 2] reset.
> Jul 31 14:02:20 tyne kernel:  connection1:0: task [op 1 itt 0x4e lun 2] reset.
> Jul 31 14:02:20 tyne kernel:  connection1:0: task [op 1 itt 0x4e lun 2] reset.
> Jul 31 14:02:20 tyne kernel:  session1: iscsi_suspend_tx suspend Tx
> 
> note the 'xmit pdu' line.
> Which means that we again managed to send a PDU _after_ the tmf command.
> And the 'count 0' means it's the first PDU transmitted from iscsi_data_xmit(),
> ie scheduled from another work_queue item.
> 
> Continuing with investigation.
> 
Hmm. I must admit I'm slightly at a loss here.
I do have this function:


static int iscsi_check_tmf_restrictions(struct iscsi_task *task, int opcode)
{
        struct iscsi_conn *conn = task->conn;
        struct iscsi_tm *tmf = &conn->tmhdr;
        unsigned int hdr_lun, task_lun;

        if (iscsi_conn_tmf_state(conn) == TMF_INITIAL)
                return 0;

        if ((tmf->opcode & ISCSI_OPCODE_MASK) != ISCSI_OP_SCSI_TMFUNC) {
                iscsi_conn_printk(KERN_INFO, conn,
                                  "ignore tmf op %x\n", tmf->opcode);
                return 0;
        }

        hdr_lun = scsilun_to_int((struct scsi_lun *)tmf->lun);
        task_lun = scsilun_to_int((struct scsi_lun *)task->lun);
        switch (ISCSI_TM_FUNC_VALUE(tmf)) {
        case ISCSI_TM_FUNC_LOGICAL_UNIT_RESET:
                /* Skip unaffected LUNs */
                if (hdr_lun != task_lun)
                        return 0;
                /*
                 * all scsi cmd pdus and, if fast_abort is set, data-outs
                 * in response to a r2t will fail to be sent here
                 *
                 * If we are sending a initial r2t with a scsi cmd pdu
                 * we do not hit this test, but the tmf will be sent
                 * after the scsi cmd pdu and data-out for the initial r2t.
                 */
                if (conn->session->fast_abort) {
                        iscsi_conn_printk(KERN_INFO, conn,
                                          "task [op %x itt 0x%x lun %u] "
                                          "fast abort.\n",
                                          task->hdr->opcode, task->hdr->itt,
                                          hdr_lun);
                        return -EACCES;
                }
                /* Continue with R2T response */
                if ((task->hdr->opcode & ISCSI_OPCODE_MASK) !=
                    ISCSI_OP_SCSI_DATA_OUT) {
                        iscsi_conn_printk(KERN_INFO, conn,
                                          "task [op %x itt 0x%x lun %u] "
                                          "reset.\n",
                                          task->hdr->opcode, task->hdr->itt,
                                          hdr_lun);
                        return -EACCES;
                }
                break;
        case ISCSI_TM_FUNC_ABORT_TASK:
                /*
                 * the caller has already checked if the task
                 * they want to abort was in the pending queue so if
                 * we are here the cmd pdu has gone out already, and
                 * we will only hit this for data-outs
                 */
                if (opcode == ISCSI_OP_SCSI_DATA_OUT &&
                    task->hdr_itt == tmf->rtt) {
                        iscsi_conn_printk(KERN_INFO, conn,
                                          "task [sc %p itt 0x%x lun %u] "
                                          "aborted.\n",
                                          task->sc, task->hdr->itt, hdr_lun);
                        return -EACCES;
                }
                break;
        }

        return 0;
}

called (supposedly) from here:

        /* process pending command queue */
        while (!list_empty(&conn->cmdqueue)) {
                if (unlikely(atomic_read(&conn->suspend_tx)))
                        break;
                task = list_entry(conn->cmdqueue.next, struct iscsi_task,
                                  running);
                if (iscsi_check_tmf_restrictions(task, ISCSI_OP_SCSI_CMD) < 0)
                        break;

                conn->task = task;
                list_del_init(&conn->task->running);

                if (conn->session->state == ISCSI_STATE_LOGGING_OUT) {
                        fail_scsi_task(conn->task, DID_IMM_RETRY);
                        continue;
                }
                rc = iscsi_prep_scsi_cmd_pdu(conn->task);
                if (rc) {
                        if (rc == -ENOMEM) {
                                list_add_tail(&conn->task->running,
                                              &conn->cmdqueue);
                                conn->task = NULL;
                                goto again;
                        } else
                                fail_scsi_task(conn->task, DID_ABORT);
                        continue;
                }
                if (iscsi_conn_tmf_state(conn) != TMF_INITIAL)
                        iscsi_conn_printk(KERN_INFO, conn,
                                  "xmit cmdpdu [itt 0x%x task %p] xmit\n",
                                  conn->task->itt, conn->task);
                rc = iscsi_xmit_task(conn);
                if (rc) {
                        iscsi_conn_printk(KERN_INFO, conn,
                                          "retry xmit itt 0x%x rc %d\n",
                                          conn->task->itt, rc);
                        goto again;
                }

and then I get these logs:

Jul 31 16:36:36 tyne kernel:  session2: iscsi_eh_device_reset LU Reset [sc 
ffff8800717ad180 lun 7]
Jul 31 16:36:36 tyne kernel:  session2: iscsi_exec_task_mgmt_fn tmf set timeout
Jul 31 16:36:36 tyne kernel:  session2: mgmtpdu [op 0x2 hdr->itt 0x3a datalen 0]
Jul 31 16:36:36 tyne kernel:  connection2:0: mgmtpdu [itt 0x3a task 
ffff88007ad60800 count 0] xmit
Jul 31 16:36:36 tyne kernel:  connection2:0: single-stepping for TMF
Jul 31 16:36:36 tyne kernel:  connection2:0: xmit cmdpdu [itt 0x22 task 
ffff88007ad9a800] xmit
Jul 31 16:36:36 tyne kernel:  connection2:0: task [op 1 itt 0x6a lun 7] reset.
Jul 31 16:36:36 tyne kernel:  connection2:0: tmf rsp [itt 0x3a] response 0 
state 1
Jul 31 16:36:36 tyne kernel:  session2: iscsi_suspend_tx suspend Tx

Huh? <me> being completely stupid?
We have already sent the TMF pdu, and (apparently) the tmf_state is already
set to something other than TMF_INITIAL.
And yet iscsi_check_tmf_restrictions doesn't trigger?
Or rather, triggers for the _second_ and not the first?

What the f*** is going on here?

(Yes, I've already set the respective variables to atomic)
The 'single-stepping' line means that the tmf_state has already been updated 
after
the mgmt pdu has been sent.
I would understand it to a limited extend when the whole beast is running on 
several
CPUs simultaneously, so that the tmf_state update came in after eg 
iscsi_prep_XXX

But we're creating a singlethread workqueue, where this should not happen IIRC.

Problem here is that we're transferring the SCSI command, but not the
data out command as by that time the command is already cleared.
This causes the MSA target to drop the connection eventually (silently,
so as to not make things to easy for us). And consequently we're having
to do a full session recovery.

Alright, I'll give it a skip for the weekend.
Maybe someone more enlightened than me finds it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---

Re: [PATCH 2/2] libiscsi: check for pending TMF during task xmit

Reply via email to