On 02/02/2010 06:46 AM, Or Gerlitz wrote:
optimize iser scsi command response processing flow to avoid taking an extra
reference on the iscsi task and to use the libiscsi lockless completion path.
This way there's no contention on the session lock between the scsi command
submission and the scsi command completion flows.

Signed-off-by: Or Gerlitz <ogerl...@voltaire.com>

-----

Mike, I'd like to get feedback on this approach/patch. Basically I'd
like to avoid taking the session lock on the command response flow:

- the kfifo_put call is safe since there's one consumer (xmit flow under
   lock for passthrough, and xmit worker for non-passthrough) and one
   producer (the response flow has a single tasklet instance).

I do not know what you mean by there being one consumer. There is a kfifo_get in the passthrough path (__iscsi_conn_send_pdu) and in the non-passthrough path (iscsi_queuecommand). We could have 4 threads/contexts calling kfifo_get at the same time:

1. The iscsi_queuecommand call from scsi-ml. This would be from the kblockd thread or the block softirq.
2. A __iscsi_conn_send_pdu call from userspace. This would run in userspace process context.
3. A __iscsi_conn_send_pdu call from a tasklet (for iser it would be your completion tasklet, and for other drivers it would be their isr or the net softirq) when responding to a nop_in from the target.
4. A __iscsi_conn_send_pdu call from the conn->transport_timer timer context when sending a nop_out to ping the target.


There is normally one __kfifo_put in the completion path. However, there are __kfifo_puts all over the place once you take the error paths into account. We can do a __kfifo_put in iscsi_queuecommand, in the xmit thread in iscsi_data_xmit, and in __iscsi_conn_send_pdu.
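
So I think the pool accesses need a lock once there are multiple get/put contexts. A minimal sketch of what I mean (this is basically what the attached patch does with pool_lock; not a final API):

	/* serialize pool accesses: __kfifo_get/__kfifo_put are only safe
	 * with a single producer and a single consumer */
	spin_lock_bh(&session->pool_lock);
	if (!__kfifo_get(session->cmdpool.queue, (void *)&task,
			 sizeof(void *))) {
		spin_unlock_bh(&session->pool_lock);
		return NULL;
	}
	spin_unlock_bh(&session->pool_lock);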


- to my understanding, the _get/put task calls in iser_rcv_completion add
   nothing, so I removed them


If you did not have a get() on the task and iser_task_rdma_finalize is running while some other code is doing the final put (which could be due to some sort of cleanup, like from an abort or lun/target reset), then the task could be reallocated while iser_task_rdma_finalize is still accessing it. This should not happen, though, because the only case where it would happen is when we are sending tmfs and get success responses, and then the target continues to send data/responses for the task.

One question on this. Maybe there is one other place it could happen. Can iser_rcv_completion still be running, or get called, after ep_disconnect has completed? If so, then libiscsi could be cleaning up tasks (scsi_fail_tasks) for erl0 cleanup, and that could race with iser_rcv_completion. I think we (maybe Erez and I) discussed this on the list and said it should not be possible, so maybe there is just the tmf case, which should not occur, and we can think of something that does not affect perf.
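
For reference, what the get/put is protecting against looks roughly like this (a sketch using the refcount helpers from the attached patch, not the exact iser code):

	/* hold a reference across the rx handler so a concurrent final
	 * put cannot free/reallocate the task under us */
	iscsi_get_task(task);
	iser_task_rdma_finalize(iser_task);
	iscsi_put_task(task);	/* the last put runs iscsi_complete_command */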

+               err = iscsi_complete_pdu(conn->iscsi_conn, hdr,
+                                               rx_data, rx_data_len);


libiscsi will still check the session->state bits and access the task list under the session lock in iscsi_queuecommand; iscsi_complete_pdu grabs the session->lock to access the task list struct; and iscsi_data_xmit grabs the session->lock to access the list and some other bits.

I think we can change the session state to an atomic easily. For the other parts though...
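
For the state part, a minimal sketch of what I mean (assuming the state transitions do not need to be atomic with other session fields):

	/* session->state becomes an atomic_t so the completion path can
	 * test it without taking session->lock */
	static inline int iscsi_session_state(struct iscsi_session *session)
	{
		return atomic_read(&session->state);
	}

	static inline void iscsi_set_session_state(struct iscsi_session *session,
						   int state)
	{
		atomic_set(&session->state, state);
	}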

I attached a patch, rm-list4.patch, that I was working on a long, long time ago. It is really, really raw. The patch replaces the session lock with:

- sn_lock: sequence number stuff is accessed under this. It was going to be replaced with atomics or something, but I did not get around to finishing that.

Currently that lock could be taken by the queuecommand path, the passthrough path, and the completion path.

- task->lock: task bits are accessed under this lock. It is taken in the queuecommand, passthrough, data_xmit, and completion paths. It differs from the session lock in that it is per task, so one task can complete and another be sent while other tasks are being processed.

- pool_lock: this could be taken in the queuecommand, passthrough, and completion paths.

This lock was going to be removed, and so were the kfifos. Instead of using a kfifo, I was going to use scsi_cmnd->request->tag and the session->cmds array: the tag would be used to index session->cmds to get a task, and we would not need any lock for that.
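
The lookup then becomes a plain array index. A sketch (the tag is reserved atomically by the block layer, blk_map_start_tag in the attached patch, so the lookup itself needs no lock):

	static struct iscsi_task *iscsi_task_from_sc(struct iscsi_session *session,
						     struct scsi_cmnd *sc)
	{
		return session->cmds[sc->request->tag];
	}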

This requires doing a blk_queue_tag per session. It was going to be built on the attached patch, make-tagging-more-generic.patch, plus some other changes, so that we create a blk_queue_tag per session in libiscsi and set that bqt on the queue in a scsi_slave_configure callout.
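
Roughly like this (a sketch on top of make-tagging-more-generic.patch; session->bqt and iscsi_session_from_sdev() are made-up names):

	static int iscsi_slave_configure(struct scsi_device *sdev)
	{
		struct iscsi_session *session = iscsi_session_from_sdev(sdev);

		/* every device on the session shares the session's tag map,
		 * so request->tag can index session->cmds directly */
		return blk_queue_init_tags(sdev->request_queue,
					   session->cmds_max, session->bqt);
	}

with session->bqt allocated once at session setup with blk_init_tags(session->cmds_max).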


So in the end the only lock we would have in the io path is the per-task one.
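
For example, the data-in rx path in the patch ends up as:

	task = iscsi_itt_to_ctask(conn, hdr->itt);
	if (!task || iscsi_get_task_lock(task))
		rc = ISCSI_ERR_BAD_ITT;
	else {
		rc = iscsi_tcp_data_in(conn, task);
		iscsi_put_task_lock(task);
	}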

Again, the patch is really, really raw. It is not meant to be run, and it has not really been reviewed. It is just meant to give an idea of what I was trying to do. The queue_map part does not work at all because it reorders tasks, so I still need to figure out how to handle that. The other parts are easier.


diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index b3e5e08..faa6620 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -445,12 +445,16 @@ static int iscsi_sw_tcp_pdu_init(struct iscsi_task *task,
 				 unsigned int offset, unsigned int count)
 {
 	struct iscsi_conn *conn = task->conn;
+	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
+	struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;
 	int err = 0;
 
+//	spin_lock_bh(&tcp_sw_conn);
+
 	iscsi_sw_tcp_send_hdr_prep(conn, task->hdr, task->hdr_len);
 
 	if (!count)
-		return 0;
+		goto unlock;
 
 	if (!task->sc)
 		iscsi_sw_tcp_send_linear_data_prep(conn, task->data, count);
@@ -464,9 +468,12 @@ static int iscsi_sw_tcp_pdu_init(struct iscsi_task *task,
 
 	if (err) {
 		iscsi_conn_failure(conn, err);
-		return -EIO;
+		err = -EIO;
 	}
-	return 0;
+
+unlock:
+//	spin_unlock_bh(&tcp_sw_conn);
+	return err;
 }
 
 static int iscsi_sw_tcp_pdu_alloc(struct iscsi_task *task, uint8_t opcode)
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index bde05e3..e2f5724 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -85,6 +85,12 @@ inline void iscsi_conn_queue_work(struct iscsi_conn *conn)
 }
 EXPORT_SYMBOL_GPL(iscsi_conn_queue_work);
 
+/*
+ * If the window closed with commands running then the scsi/block plug
+ * code or the command completion will start new IO. If this closed
+ * with no scsi commands running then the scsi/block plug code will
+ * get us running again.
+ */
 void
 iscsi_update_cmdsn(struct iscsi_session *session, struct iscsi_nopin *hdr)
 {
@@ -98,23 +104,15 @@ iscsi_update_cmdsn(struct iscsi_session *session, struct iscsi_nopin *hdr)
 	if (iscsi_sna_lt(max_cmdsn, exp_cmdsn - 1))
 		return;
 
+	spin_lock(&session->sn_lock);
 	if (exp_cmdsn != session->exp_cmdsn &&
 	    !iscsi_sna_lt(exp_cmdsn, session->exp_cmdsn))
 		session->exp_cmdsn = exp_cmdsn;
 
 	if (max_cmdsn != session->max_cmdsn &&
-	    !iscsi_sna_lt(max_cmdsn, session->max_cmdsn)) {
+	    !iscsi_sna_lt(max_cmdsn, session->max_cmdsn))
 		session->max_cmdsn = max_cmdsn;
-		/*
-		 * if the window closed with IO queued, then kick the
-		 * xmit thread
-		 */
-		if (!list_empty(&session->leadconn->xmitqueue) ||
-		    !list_empty(&session->leadconn->mgmtqueue)) {
-			if (!(session->tt->caps & CAP_DATA_PATH_OFFLOAD))
-				iscsi_conn_queue_work(session->leadconn);
-		}
-	}
+	spin_unlock(&session->sn_lock);
 }
 EXPORT_SYMBOL_GPL(iscsi_update_cmdsn);
 
@@ -240,6 +238,23 @@ static int iscsi_prep_bidi_ahs(struct iscsi_task *task)
 	return 0;
 }
 
+static int iscsi_check_cmdsn_window_closed(struct iscsi_conn *conn)
+{
+	struct iscsi_session *session = conn->session;
+
+	/*
+	 * Check for iSCSI window and take care of CmdSN wrap-around
+	 */
+	if (!iscsi_sna_lte(session->cmdsn, session->max_cmdsn)) {
+		ISCSI_DBG_SESSION(session, "iSCSI CmdSN closed. ExpCmdSn "
+				  "%u MaxCmdSN %u CmdSN %u\n",
+				  session->exp_cmdsn, session->max_cmdsn,
+				  session->cmdsn);
+		return -ENOSPC;
+	}
+	return 0;
+}
+
 /**
  * iscsi_prep_scsi_cmd_pdu - prep iscsi scsi cmd pdu
  * @task: iscsi task
@@ -254,9 +269,20 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task)
 	struct scsi_cmnd *sc = task->sc;
 	struct iscsi_cmd *hdr;
 	unsigned hdrlength, cmd_len;
+	__be32 cmdsn;
 	itt_t itt;
 	int rc;
 
+	spin_lock(&session->sn_lock);
+	rc = iscsi_check_cmdsn_window_closed(conn);
+	if (rc) {
+		spin_unlock(&session->sn_lock);
+		return rc;
+	}
+	cmdsn = session->cmdsn;
+	session->cmdsn++;
+	spin_unlock(&session->sn_lock);
+
 	rc = conn->session->tt->alloc_pdu(task, ISCSI_OP_SCSI_CMD);
 	if (rc)
 		return rc;
@@ -277,8 +303,10 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task)
 	hdr->flags = ISCSI_ATTR_SIMPLE;
 	int_to_scsilun(sc->device->lun, (struct scsi_lun *)hdr->lun);
 	memcpy(task->lun, hdr->lun, sizeof(task->lun));
-	hdr->cmdsn = task->cmdsn = cpu_to_be32(session->cmdsn);
-	session->cmdsn++;
+
+	hdr->cmdsn = task->cmdsn = cpu_to_be32(cmdsn);
+
+// mnc need to access under lock or make atomic?
 	hdr->exp_statsn = cpu_to_be32(conn->exp_statsn);
 	cmd_len = sc->cmd_len;
 	if (cmd_len < ISCSI_CDB_SIZE)
@@ -360,12 +388,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task)
 	WARN_ON(hdrlength >= 256);
 	hdr->hlength = hdrlength & 0xFF;
 
-	if (session->tt->init_task && session->tt->init_task(task))
-		return -EIO;
-
-	task->state = ISCSI_TASK_RUNNING;
-	list_move_tail(&task->running, &conn->run_list);
-
+	task->state = ISCSI_TASK_PDU_SETUP;
 	conn->scsicmd_pdus_cnt++;
 	ISCSI_DBG_SESSION(session, "iscsi prep [%s cid %d sc %p cdb 0x%x "
 			  "itt 0x%x len %d bidi_len %d cmdsn %d win %d]\n",
@@ -391,10 +414,15 @@ static void iscsi_complete_command(struct iscsi_task *task)
 {
 	struct iscsi_conn *conn = task->conn;
 	struct iscsi_session *session = conn->session;
-	struct scsi_cmnd *sc = task->sc;
+	struct scsi_cmnd *sc;
+	unsigned long flags;
 
+	spin_lock_irqsave(&task->lock, flags);
+	/* bad target could send a response while r2t is queued */
+	clear_bit(task->itt, conn->queue_map);
+
+	sc = task->sc;
 	session->tt->cleanup_task(task);
-	list_del_init(&task->running);
 	task->state = ISCSI_TASK_COMPLETED;
 	task->sc = NULL;
 
@@ -403,48 +431,65 @@ static void iscsi_complete_command(struct iscsi_task *task)
 	/*
 	 * login task is preallocated so do not free
 	 */
-	if (conn->login_task == task)
+	if (conn->login_task == task) {
+		spin_unlock_irqrestore(&task->lock, flags);
 		return;
+	}
 
 	__kfifo_put(session->cmdpool.queue, (void*)&task, sizeof(void*));
 
 	if (conn->ping_task == task)
 		conn->ping_task = NULL;
 
-	if (sc) {
-		task->sc = NULL;
-		/* SCSI eh reuses commands to verify us */
-		sc->SCp.ptr = NULL;
-		/*
-		 * queue command may call this to free the task, but
-		 * not have setup the sc callback
-		 */
-		if (sc->scsi_done)
-			sc->scsi_done(sc);
+	if (!sc) {
+		spin_unlock_irqrestore(&task->lock, flags);
+		return;
 	}
+
+	task->sc = NULL;
+	/* SCSI eh reuses commands to verify us */
+	sc->SCp.ptr = NULL;
+	spin_unlock_irqrestore(&task->lock, flags);
+	/*
+	 * queue command may call this to free the task, but
+	 * not have setup the sc callback
+	 */
+	if (sc->scsi_done)
+		sc->scsi_done(sc);
 }
 
-void __iscsi_get_task(struct iscsi_task *task)
+void iscsi_get_task(struct iscsi_task *task)
 {
 	atomic_inc(&task->refcount);
 }
-EXPORT_SYMBOL_GPL(__iscsi_get_task);
+EXPORT_SYMBOL_GPL(iscsi_get_task);
 
-static void __iscsi_put_task(struct iscsi_task *task)
+void iscsi_put_task(struct iscsi_task *task)
 {
 	if (atomic_dec_and_test(&task->refcount))
 		iscsi_complete_command(task);
 }
+EXPORT_SYMBOL_GPL(iscsi_put_task);
 
-void iscsi_put_task(struct iscsi_task *task)
+int iscsi_get_task_lock(struct iscsi_task *task)
 {
-	struct iscsi_session *session = task->conn->session;
+	spin_lock_bh(&task->lock);
+	if (task->state == ISCSI_TASK_COMPLETED) {
+		spin_unlock_bh(&task->lock);
+		return -EINVAL;
+	}
 
-	spin_lock_bh(&session->lock);
-	__iscsi_put_task(task);
-	spin_unlock_bh(&session->lock);
+	iscsi_get_task(task);
+	return 0;
 }
-EXPORT_SYMBOL_GPL(iscsi_put_task);
+EXPORT_SYMBOL_GPL(iscsi_get_task_lock);
+
+void iscsi_put_task_lock(struct iscsi_task *task)
+{
+	spin_unlock_bh(&task->lock);
+	iscsi_put_task(task);
+}
+EXPORT_SYMBOL_GPL(iscsi_put_task_lock);
 
 /*
  * session lock must be held
@@ -454,29 +499,21 @@ static void fail_command(struct iscsi_conn *conn, struct iscsi_task *task,
 {
 	struct scsi_cmnd *sc;
 
-	sc = task->sc;
-	if (!sc)
-		return;
+	clear_bit(task->itt, conn->queue_map);
 
-	if (task->state == ISCSI_TASK_PENDING)
-		/*
-		 * cmd never made it to the xmit thread, so we should not count
-		 * the cmd in the sequencing
-		 */
-		conn->session->queued_cmdsn--;
-
-	sc->result = err;
-	if (!scsi_bidi_cmnd(sc))
-		scsi_set_resid(sc, scsi_bufflen(sc));
-	else {
-		scsi_out(sc)->resid = scsi_out(sc)->length;
-		scsi_in(sc)->resid = scsi_in(sc)->length;
+	sc = task->sc;
+	if (sc) {
+		sc->result = err;
+		if (!scsi_bidi_cmnd(sc))
+			scsi_set_resid(sc, scsi_bufflen(sc));
+		else {
+			scsi_out(sc)->resid = scsi_out(sc)->length;
+			scsi_in(sc)->resid = scsi_in(sc)->length;
+		}
 	}
 
-	if (conn->task == task)
-		conn->task = NULL;
-	/* release ref from queuecommand */
-	__iscsi_put_task(task);
+	/* release ref from queuecommand/send_pdu */
+	iscsi_put_task(task);
 }
 
 static int iscsi_prep_mgmt_task(struct iscsi_conn *conn,
@@ -495,6 +532,7 @@ static int iscsi_prep_mgmt_task(struct iscsi_conn *conn,
 	/*
 	 * pre-format CmdSN for outgoing PDU.
 	 */
+	spin_lock_bh(&session->sn_lock);
 	nop->cmdsn = cpu_to_be32(session->cmdsn);
 	if (hdr->itt != RESERVED_ITT) {
 		/*
@@ -503,20 +541,15 @@ static int iscsi_prep_mgmt_task(struct iscsi_conn *conn,
 		 * we should start checking the cmdsn numbers for mgmt tasks.
 		 */
 		if (conn->c_stage == ISCSI_CONN_STARTED &&
-		    !(hdr->opcode & ISCSI_OP_IMMEDIATE)) {
-			session->queued_cmdsn++;
+		    !(hdr->opcode & ISCSI_OP_IMMEDIATE))
 			session->cmdsn++;
-		}
 	}
-
-	if (session->tt->init_task && session->tt->init_task(task))
-		return -EIO;
+	spin_unlock_bh(&session->sn_lock);
 
 	if ((hdr->opcode & ISCSI_OPCODE_MASK) == ISCSI_OP_LOGOUT)
 		session->state = ISCSI_STATE_LOGGING_OUT;
 
-	task->state = ISCSI_TASK_RUNNING;
-	list_move_tail(&task->running, &conn->mgmt_run_list);
+	task->state = ISCSI_TASK_PDU_SETUP;
 	ISCSI_DBG_SESSION(session, "mgmtpdu [op 0x%x hdr->itt 0x%x "
 			  "datalen %d]\n", hdr->opcode & ISCSI_OPCODE_MASK,
 			  hdr->itt, task->data_count);
@@ -542,15 +575,24 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 		 * Same task can be used. Same ITT must be used.
 		 * Note that login_task is preallocated at conn_create().
 		 */
+	{
+		printk(KERN_ERR "login\n");
 		task = conn->login_task;
+	}
 	else {
 		BUG_ON(conn->c_stage == ISCSI_CONN_INITIAL_STAGE);
 		BUG_ON(conn->c_stage == ISCSI_CONN_STOPPED);
 
+		spin_lock_bh(&session->pool_lock);
 		if (!__kfifo_get(session->cmdpool.queue,
-				 (void*)&task, sizeof(void*)))
+				 (void*)&task, sizeof(void*))) {
+			spin_unlock_bh(&session->pool_lock);
 			return NULL;
+		}
+		spin_unlock_bh(&session->pool_lock);
 	}
+
+	spin_lock_bh(&task->lock);
 	/*
 	 * released in complete pdu for task we expect a response for, and
 	 * released by the lld when it has transmitted the task for
@@ -569,6 +611,7 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 	if (conn->session->tt->alloc_pdu(task, hdr->opcode)) {
 		iscsi_conn_printk(KERN_ERR, conn, "Could not allocate "
 				 "pdu for mgmt task.\n");
+		spin_unlock_bh(&task->lock);
 		goto requeue_task;
 	}
 	itt = task->hdr->itt;
@@ -583,29 +626,44 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 						   task->conn->session->age);
 	}
 
-	INIT_LIST_HEAD(&task->running);
-	list_add_tail(&task->running, &conn->mgmtqueue);
+	if (iscsi_prep_mgmt_task(conn, task)) {
+printk(KERN_ERR "iscsi_prep_mgmt_task fail\n");
+		spin_unlock_bh(&task->lock);
+		goto free_task;
+	}
 
 	if (session->tt->caps & CAP_DATA_PATH_OFFLOAD) {
-		if (iscsi_prep_mgmt_task(conn, task))
+		task->state = ISCSI_TASK_RUNNING;
+		spin_unlock_bh(&task->lock);
+
+		if (session->tt->init_task && session->tt->init_task(task))
 			goto free_task;
 
 		if (session->tt->xmit_task(task))
 			goto free_task;
 
-	} else
-		iscsi_conn_queue_work(conn);
+	} else {
+		/* get a ref for the list - xmitwork will release */
+printk(KERN_ERR "queue mgmt %d\n", task->itt);
+		iscsi_get_task(task);
+		task->state = ISCSI_TASK_QUEUED;
+		iscsi_queue_task(task);
+		spin_unlock_bh(&task->lock);
+	}
 
 	return task;
 
 free_task:
-	__iscsi_put_task(task);
+	iscsi_put_task(task);
 	return NULL;
 
 requeue_task:
-	if (task != conn->login_task)
+	if (task != conn->login_task) {
+		spin_lock_bh(&session->pool_lock);
 		__kfifo_put(session->cmdpool.queue, (void*)&task,
 			    sizeof(void*));
+		spin_unlock_bh(&session->pool_lock);
+	}
 	return NULL;
 }
 
@@ -705,7 +763,7 @@ out:
 			  sc, sc->result, task->itt);
 	conn->scsirsp_pdus_cnt++;
 
-	__iscsi_put_task(task);
+	iscsi_put_task(task);
 }
 
 /**
@@ -739,7 +797,7 @@ iscsi_data_in_rsp(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 	}
 
 	conn->scsirsp_pdus_cnt++;
-	__iscsi_put_task(task);
+	iscsi_put_task(task);
 }
 
 static void iscsi_tmf_rsp(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
@@ -842,7 +900,7 @@ static struct iscsi_task *iscsi_itt_to_task(struct iscsi_conn *conn, itt_t itt)
 }
 
 /**
- * __iscsi_complete_pdu - complete pdu
+ * iscsi_complete_pdu - complete pdu
  * @conn: iscsi conn
  * @hdr: iscsi header
  * @data: data buffer
@@ -852,8 +910,8 @@ static struct iscsi_task *iscsi_itt_to_task(struct iscsi_conn *conn, itt_t itt)
  * queuecommand or send generic. session lock must be held and verify
  * itt must have been called.
  */
-int __iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
-			 char *data, int datalen)
+int iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
+		       char *data, int datalen)
 {
 	struct iscsi_session *session = conn->session;
 	int opcode = hdr->opcode & ISCSI_OPCODE_MASK, rc = 0;
@@ -959,7 +1017,7 @@ int __iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 		}
 
 		iscsi_tmf_rsp(conn, hdr);
-		__iscsi_put_task(task);
+		iscsi_put_task(task);
 		break;
 	case ISCSI_OP_NOOP_IN:
 		iscsi_update_cmdsn(session, (struct iscsi_nopin*)hdr);
@@ -977,7 +1035,7 @@ int __iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 			goto recv_pdu;
 
 		mod_timer(&conn->transport_timer, jiffies + conn->recv_timeout);
-		__iscsi_put_task(task);
+		iscsi_put_task(task);
 		break;
 	default:
 		rc = ISCSI_ERR_BAD_OPCODE;
@@ -989,19 +1047,7 @@ out:
 recv_pdu:
 	if (iscsi_recv_pdu(conn->cls_conn, hdr, data, datalen))
 		rc = ISCSI_ERR_CONN_FAILED;
-	__iscsi_put_task(task);
-	return rc;
-}
-EXPORT_SYMBOL_GPL(__iscsi_complete_pdu);
-
-int iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
-		       char *data, int datalen)
-{
-	int rc;
-
-	spin_lock(&conn->session->lock);
-	rc = __iscsi_complete_pdu(conn, hdr, data, datalen);
-	spin_unlock(&conn->session->lock);
+	iscsi_put_task(task);
 	return rc;
 }
 EXPORT_SYMBOL_GPL(iscsi_complete_pdu);
@@ -1044,8 +1090,6 @@ EXPORT_SYMBOL_GPL(iscsi_verify_itt);
  * @itt: itt
  *
  * This should be used for cmd tasks.
- *
- * The session lock must be held.
  */
 struct iscsi_task *iscsi_itt_to_ctask(struct iscsi_conn *conn, itt_t itt)
 {
@@ -1121,33 +1165,17 @@ void iscsi_conn_failure(struct iscsi_conn *conn, enum iscsi_err err)
 }
 EXPORT_SYMBOL_GPL(iscsi_conn_failure);
 
-static int iscsi_check_cmdsn_window_closed(struct iscsi_conn *conn)
-{
-	struct iscsi_session *session = conn->session;
-
-	/*
-	 * Check for iSCSI window and take care of CmdSN wrap-around
-	 */
-	if (!iscsi_sna_lte(session->queued_cmdsn, session->max_cmdsn)) {
-		ISCSI_DBG_SESSION(session, "iSCSI CmdSN closed. ExpCmdSn "
-				  "%u MaxCmdSN %u CmdSN %u/%u\n",
-				  session->exp_cmdsn, session->max_cmdsn,
-				  session->cmdsn, session->queued_cmdsn);
-		return -ENOSPC;
-	}
-	return 0;
-}
-
 static int iscsi_xmit_task(struct iscsi_conn *conn)
 {
 	struct iscsi_task *task = conn->task;
 	int rc;
 
-	__iscsi_get_task(task);
-	spin_unlock_bh(&conn->session->lock);
+	if (!task)
+		BUG();
+
+	iscsi_get_task(task);
 	rc = conn->session->tt->xmit_task(task);
-	spin_lock_bh(&conn->session->lock);
-	__iscsi_put_task(task);
+	iscsi_put_task(task);
 	if (!rc)
 		/* done with this task */
 		conn->task = NULL;
@@ -1155,21 +1183,23 @@ static int iscsi_xmit_task(struct iscsi_conn *conn)
 }
 
 /**
- * iscsi_requeue_task - requeue task to run from session workqueue
+ * iscsi_queue_task - queue task to run from session workqueue
  * @task: task to requeue
  *
  * LLDs that need to run a task from the session workqueue should call
- * this. The session lock must be held. This should only be called
- * by software drivers.
+ * this. This should only be called by software drivers.
+ *
+ * The caller must also have got a refcount on the task which will
+ * be released in the xmit work.
  */
-void iscsi_requeue_task(struct iscsi_task *task)
+void iscsi_queue_task(struct iscsi_task *task)
 {
 	struct iscsi_conn *conn = task->conn;
 
-	list_move_tail(&task->running, &conn->requeue);
+	set_bit(task->itt, conn->queue_map);
 	iscsi_conn_queue_work(conn);
 }
-EXPORT_SYMBOL_GPL(iscsi_requeue_task);
+EXPORT_SYMBOL_GPL(iscsi_queue_task);
 
 /**
  * iscsi_data_xmit - xmit any command into the scheduled connection
@@ -1182,100 +1212,140 @@ EXPORT_SYMBOL_GPL(iscsi_requeue_task);
  **/
 static int iscsi_data_xmit(struct iscsi_conn *conn)
 {
+	struct iscsi_session *session = conn->session;
+	struct iscsi_task *task;
+	unsigned long tag = 0;
 	int rc = 0;
 
-	spin_lock_bh(&conn->session->lock);
-	if (unlikely(conn->suspend_tx)) {
-		ISCSI_DBG_SESSION(conn->session, "Tx suspended!\n");
-		spin_unlock_bh(&conn->session->lock);
+	if (unlikely(test_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx))) {
+		ISCSI_DBG_SESSION(session, "Tx suspended!\n");
 		return -ENODATA;
 	}
 
 	if (conn->task) {
 		rc = iscsi_xmit_task(conn);
 	        if (rc)
-		        goto again;
+			return rc;
 	}
 
-	/*
-	 * process mgmt pdus like nops before commands since we should
-	 * only have one nop-out as a ping from us and targets should not
-	 * overflow us with nop-ins
-	 */
-check_mgmt:
-	while (!list_empty(&conn->mgmtqueue)) {
-		conn->task = list_entry(conn->mgmtqueue.next,
-					 struct iscsi_task, running);
-		if (iscsi_prep_mgmt_task(conn, conn->task)) {
-			__iscsi_put_task(conn->task);
-			conn->task = NULL;
-			continue;
-		}
-		rc = iscsi_xmit_task(conn);
-		if (rc)
-			goto again;
-	}
 
-	/* process pending command queue */
-	while (!list_empty(&conn->xmitqueue)) {
-		if (conn->tmf_state == TMF_QUEUED)
-			break;
+/*
+printk(KERN_ERR "iscsi_data_xmit %d\n",
+	find_first_bit(conn->queue_map, session->cmds_max));
 
-		conn->task = list_entry(conn->xmitqueue.next,
-					 struct iscsi_task, running);
-		if (conn->session->state == ISCSI_STATE_LOGGING_OUT) {
-			fail_command(conn, conn->task, DID_IMM_RETRY << 16);
-			continue;
-		}
-		rc = iscsi_prep_scsi_cmd_pdu(conn->task);
-		if (rc) {
-			if (rc == -ENOMEM) {
-				conn->task = NULL;
-				goto again;
-			} else
-				fail_command(conn, conn->task, DID_ABORT << 16);
+
+printk(KERN_ERR "iscsi_data_xmit %d\n",
+	find_next_bit(conn->queue_map, session->cmds_max, 0));
+
+
+printk(KERN_ERR "iscsi_data_xmit %d\n",
+	find_next_bit(conn->queue_map, session->cmds_max, 1));
+*/
+
+
+
+	/* just loop through once */
+	while ((tag = find_next_bit(conn->queue_map, session->cmds_max, tag)) <
+		session->cmds_max) {
+		task = session->cmds[tag];
+
+//printk(KERN_ERR "iscsi_data_xmit task tag itt 0x%x\n", tag);
+		tag++;
+
+		spin_lock_bh(&task->lock);
+		if (task->state == ISCSI_TASK_COMPLETED) {
+			printk(KERN_ERR "completed task???\n");
+			clear_bit(task->itt, conn->queue_map);
+			spin_unlock_bh(&task->lock);
+			iscsi_put_task(task);
 			continue;
 		}
-		rc = iscsi_xmit_task(conn);
-		if (rc)
-			goto again;
-		/*
-		 * we could continuously get new task requests so
-		 * we need to check the mgmt queue for nops that need to
-		 * be sent to aviod starvation
-		 */
-		if (!list_empty(&conn->mgmtqueue))
-			goto check_mgmt;
-	}
-
-	while (!list_empty(&conn->requeue)) {
-		if (conn->session->fast_abort && conn->tmf_state != TMF_INITIAL)
-			break;
 
-		/*
-		 * we always do fastlogout - conn stop code will clean up.
-		 */
-		if (conn->session->state == ISCSI_STATE_LOGGING_OUT)
-			break;
+		if (!task->sc) {
+			if (session->tt->init_task &&
+			    session->tt->init_task(task)) {
+				/* let iscsi eh handle */
+	
+//printk(KERN_ERR "init task fail for mgmt task\n");
+				clear_bit(task->itt, conn->queue_map);
+				spin_unlock_bh(&task->lock);
+				iscsi_put_task(task);
+				continue;
+			}
+			task->state = ISCSI_TASK_SETUP;
+		} else {
+			if (task->state == ISCSI_TASK_QUEUED) {
+				if (conn->tmf_state == TMF_QUEUED) {
+					/*
+					 * skip for now in case it is
+					 * related to a tmf being sent
+					 */
+					spin_unlock_bh(&task->lock);
+					continue;
+				}
+
+				if (session->state == ISCSI_STATE_LOGGING_OUT) {
+					/*
+					 * don't let it sit in queue during
+					 * logout because it could take
+					 * a while.
+					 */
+					spin_unlock_bh(&task->lock);
+					fail_command(conn, task,
+						     DID_IMM_RETRY << 16);
+					iscsi_put_task(task);
+					continue;
+				}
+				if (session->tt->init_task &&
+				    session->tt->init_task(task)) {
+					spin_unlock_bh(&task->lock);
+					fail_command(conn, task,
+						     DID_ABORT << 16);
+					iscsi_put_task(task);
+					continue;
+
+				}
+				task->state = ISCSI_TASK_SETUP;
+			} else if (task->state == ISCSI_TASK_RUNNING) {
+				/* requeue or r2t */
+
+				if (session->fast_abort &&
+				    conn->tmf_state != TMF_INITIAL) {
+					spin_unlock_bh(&task->lock);
+					continue;
+				}
+
+				if (session->state == ISCSI_STATE_LOGGING_OUT) {
+					/*
+					 * we always do fastlogout - conn
+					 * stop code will clean up.
+					 */
+					spin_unlock_bh(&task->lock);
+					continue;
+				}
+			}
+		}
+//printk(KERN_ERR "iscsi_data_xmit task xmitting\n");
+		if (atomic_read(&task->refcount) == 1) {
+			printk(KERN_ERR "state %d, sc %s",
+				task->state, task->sc ? "sc" : "mgmt");
+		}
+		conn->task = task;
+		task->state = ISCSI_TASK_RUNNING;
+		clear_bit(task->itt, conn->queue_map);
+		spin_unlock_bh(&task->lock);
 
-		conn->task = list_entry(conn->requeue.next,
-					 struct iscsi_task, running);
-		conn->task->state = ISCSI_TASK_RUNNING;
-		list_move_tail(conn->requeue.next, &conn->run_list);
 		rc = iscsi_xmit_task(conn);
+		/* release from queueing work */
+		iscsi_put_task(task);
 		if (rc)
-			goto again;
-		if (!list_empty(&conn->mgmtqueue))
-			goto check_mgmt;
+			break;
 	}
-	spin_unlock_bh(&conn->session->lock);
-	return -ENODATA;
 
-again:
-	if (unlikely(conn->suspend_tx))
-		rc = -ENODATA;
-	spin_unlock_bh(&conn->session->lock);
-	return rc;
+	if (rc)
+		return rc;
+	else
+		return -ENODATA;
 }
 
 static void iscsi_xmitworker(struct work_struct *work)
@@ -1291,23 +1361,20 @@ static void iscsi_xmitworker(struct work_struct *work)
 	} while (rc >= 0 || rc == -EAGAIN);
 }
 
-static inline struct iscsi_task *iscsi_alloc_task(struct iscsi_conn *conn,
-						  struct scsi_cmnd *sc)
+static inline struct iscsi_task *iscsi_alloc_task(struct iscsi_conn *conn)
 {
 	struct iscsi_task *task;
 
 	if (!__kfifo_get(conn->session->cmdpool.queue,
-			 (void *) &task, sizeof(void *)))
+			 (void *) &task, sizeof(void *))) {
+		spin_unlock(&conn->session->pool_lock);
 		return NULL;
-
-	sc->SCp.phase = conn->session->age;
-	sc->SCp.ptr = (char *) task;
+	}
 
 	atomic_set(&task->refcount, 1);
-	task->state = ISCSI_TASK_PENDING;
+	task->sc = NULL;
+	task->state = ISCSI_TASK_ALLOCATED;
 	task->conn = conn;
-	task->sc = sc;
-	INIT_LIST_HEAD(&task->running);
 	return task;
 }
 
@@ -1392,37 +1459,54 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
 		goto fault;
 	}
 
-	if (iscsi_check_cmdsn_window_closed(conn)) {
-		reason = FAILURE_WINDOW_CLOSED;
-		goto reject;
-	}
-
-	task = iscsi_alloc_task(conn, sc);
+	task = iscsi_alloc_task(conn);
 	if (!task) {
 		reason = FAILURE_OOM;
 		goto reject;
 	}
-	list_add_tail(&task->running, &conn->xmitqueue);
+
+	spin_lock(&task->lock);
+	sc->SCp.phase = conn->session->age;
+	sc->SCp.ptr = (char *) task;
+	task->sc = sc;
+
+	reason = iscsi_prep_scsi_cmd_pdu(task);
+	if (reason) {
+		spin_unlock(&task->lock);
+		switch (reason) {
+		case -ENOMEM:
+			reason = FAILURE_OOM;
+			goto prepd_reject;
+		case -ENOSPC:
+			reason = FAILURE_WINDOW_CLOSED;
+			goto prepd_reject;
+		default:
+			sc->result = DID_ABORT << 16;
+			goto prepd_fault;
+		}
+	}
 
 	if (session->tt->caps & CAP_DATA_PATH_OFFLOAD) {
-		reason = iscsi_prep_scsi_cmd_pdu(task);
-		if (reason) {
-			if (reason == -ENOMEM) {
-				reason = FAILURE_OOM;
-				goto prepd_reject;
-			} else {
-				sc->result = DID_ABORT << 16;
-				goto prepd_fault;
-			}
+		task->state = ISCSI_TASK_RUNNING;
+
+		if (session->tt->init_task && session->tt->init_task(task)) {	
+			sc->result = DID_ABORT << 16;
+			spin_unlock(&task->lock);
+			goto prepd_fault;
 		}
+		spin_unlock(&task->lock);
+
 		if (session->tt->xmit_task(task)) {
 			reason = FAILURE_SESSION_NOT_READY;
 			goto prepd_reject;
 		}
-	} else
-		iscsi_conn_queue_work(conn);
+	} else {
+		iscsi_get_task(task);
+		task->state = ISCSI_TASK_QUEUED;
+		iscsi_queue_task(task);
+		spin_unlock(&task->lock);
+	}
 
-	session->queued_cmdsn++;
 	spin_unlock(&session->lock);
 	spin_lock(host->host_lock);
 	return 0;
@@ -1601,36 +1685,45 @@ static int iscsi_exec_task_mgmt_fn(struct iscsi_conn *conn,
 static void fail_all_commands(struct iscsi_conn *conn, unsigned lun,
 			      int error)
 {
-	struct iscsi_task *task, *tmp;
+	struct iscsi_session *session = conn->session;
+	struct iscsi_task *task;
+	int i;
 
-	/* flush pending */
-	list_for_each_entry_safe(task, tmp, &conn->xmitqueue, running) {
-		if (lun == task->sc->device->lun || lun == -1) {
-			ISCSI_DBG_SESSION(conn->session,
-					  "failing pending sc %p itt 0x%x\n",
-					  task->sc, task->itt);
-			fail_command(conn, task, error << 16);
-		}
-	}
+	for (i = 0; i < session->cmds_max; i++) {
+		task = session->cmds[i];
+
+		// todo grab lock!!!!!
+		if (task->state == ISCSI_TASK_COMPLETED || !task->sc)
+			continue;
 
-	list_for_each_entry_safe(task, tmp, &conn->requeue, running) {
-		if (lun == task->sc->device->lun || lun == -1) {
+		if (lun == -1 || lun == task->sc->device->lun) {
 			ISCSI_DBG_SESSION(conn->session,
-					  "failing requeued sc %p itt 0x%x\n",
+					  "failing task sc %p itt 0x%x\n",
 					  task->sc, task->itt);
 			fail_command(conn, task, error << 16);
 		}
 	}
+}
 
-	/* fail all other running */
-	list_for_each_entry_safe(task, tmp, &conn->run_list, running) {
-		if (lun == task->sc->device->lun || lun == -1) {
-			ISCSI_DBG_SESSION(conn->session,
-					 "failing in progress sc %p itt 0x%x\n",
-					 task->sc, task->itt);
-			fail_command(conn, task, error << 16);
-		}
+static void flush_control_queues(struct iscsi_conn *conn)
+{
+	struct iscsi_session *session = conn->session;
+	struct iscsi_task *task;
+	int i;
+
+	for (i = 0; i < session->cmds_max; i++) {
+		task = session->cmds[i];
+
+		// todo grab lock!!!!!
+		if (task->state == ISCSI_TASK_COMPLETED || task->sc)
+			continue;
+
+		ISCSI_DBG_SESSION(session, "failing mgmt task itt 0x%x\n",
+				  task->itt);
+		/* release ref from prep task */
+		iscsi_put_task(task);
 	}
+	conn->task = NULL;
 }
 
 void iscsi_suspend_tx(struct iscsi_conn *conn)
@@ -1812,7 +1905,11 @@ int iscsi_eh_abort(struct scsi_cmnd *sc)
 		goto success;
 	}
 
-	if (task->state == ISCSI_TASK_PENDING) {
+
+// mnc fix me need task lock
+	if (task->state == ISCSI_TASK_ALLOCATED ||
+	    task->state == ISCSI_TASK_PDU_SETUP ||
+	    task->state == ISCSI_TASK_QUEUED) {
 		fail_command(conn, task, DID_ABORT << 16);
 		goto success;
 	}
@@ -1843,10 +1940,10 @@ int iscsi_eh_abort(struct scsi_cmnd *sc)
 		 * good and have never sent us a successful tmf response
 		 * then sent more data for the cmd.
 		 */
-		spin_lock(&session->lock);
+		spin_lock_bh(&session->lock);
 		fail_command(conn, task, DID_ABORT << 16);
 		conn->tmf_state = TMF_INITIAL;
-		spin_unlock(&session->lock);
+		spin_unlock_bh(&session->lock);
 		iscsi_start_tx(conn);
 		goto success_unlocked;
 	case TMF_TIMEDOUT:
@@ -2235,13 +2332,15 @@ iscsi_session_setup(struct iscsi_transport *iscsit, struct Scsi_Host *shost,
 	session->abort_timeout = 10;
 	session->scsi_cmds_max = scsi_cmds;
 	session->cmds_max = total_cmds;
-	session->queued_cmdsn = session->cmdsn = initial_cmdsn;
+	session->cmdsn = initial_cmdsn;
 	session->exp_cmdsn = initial_cmdsn + 1;
 	session->max_cmdsn = initial_cmdsn + 1;
 	session->max_r2t = 1;
 	session->tt = iscsit;
 	mutex_init(&session->eh_mutex);
+	spin_lock_init(&session->sn_lock);
 	spin_lock_init(&session->lock);
+	spin_lock_init(&session->pool_lock);
 
 	/* initialize SCSI PDU commands pool */
 	if (iscsi_pool_init(&session->cmdpool, session->cmds_max,
@@ -2256,7 +2355,8 @@ iscsi_session_setup(struct iscsi_transport *iscsit, struct Scsi_Host *shost,
 		if (cmd_task_size)
 			task->dd_data = &task[1];
 		task->itt = cmd_i;
-		INIT_LIST_HEAD(&task->running);
+		task->state = ISCSI_TASK_COMPLETED;
+		spin_lock_init(&task->lock);
 	}
 
 	if (!try_module_get(iscsit->owner))
@@ -2308,6 +2408,19 @@ void iscsi_session_teardown(struct iscsi_cls_session *cls_session)
 }
 EXPORT_SYMBOL_GPL(iscsi_session_teardown);
 
+static int iscsi_conn_setup_queue(struct iscsi_conn *conn)
+{
+	int nr_ulongs = ALIGN(conn->session->cmds_max,
+			      BITS_PER_LONG) / BITS_PER_LONG;
+
+	conn->queue_map = kzalloc(nr_ulongs * sizeof(unsigned long),
+				  GFP_KERNEL);
+	if (!conn->queue_map)
+		return -ENOMEM;
+	INIT_WORK(&conn->xmitwork, iscsi_xmitworker);
+	return 0;
+}
+
 /**
  * iscsi_conn_setup - create iscsi_cls_conn and iscsi_conn
  * @cls_session: iscsi_cls_session
@@ -2342,22 +2455,18 @@ iscsi_conn_setup(struct iscsi_cls_session *cls_session, int dd_size,
 	conn->transport_timer.data = (unsigned long)conn;
 	conn->transport_timer.function = iscsi_check_transport_timeouts;
 
-	INIT_LIST_HEAD(&conn->run_list);
-	INIT_LIST_HEAD(&conn->mgmt_run_list);
-	INIT_LIST_HEAD(&conn->mgmtqueue);
-	INIT_LIST_HEAD(&conn->xmitqueue);
-	INIT_LIST_HEAD(&conn->requeue);
-	INIT_WORK(&conn->xmitwork, iscsi_xmitworker);
+	if (iscsi_conn_setup_queue(conn))
+		goto queue_setup_fail;
 
 	/* allocate login_task used for the login/text sequences */
-	spin_lock_bh(&session->lock);
+	spin_lock_bh(&session->pool_lock);
 	if (!__kfifo_get(session->cmdpool.queue,
                          (void*)&conn->login_task,
 			 sizeof(void*))) {
-		spin_unlock_bh(&session->lock);
+		spin_unlock_bh(&session->pool_lock);
 		goto login_task_alloc_fail;
 	}
-	spin_unlock_bh(&session->lock);
+	spin_unlock_bh(&session->pool_lock);
 
 	data = (char *) __get_free_pages(GFP_KERNEL,
 					 get_order(ISCSI_DEF_MAX_RECV_SEG_LEN));
@@ -2374,6 +2483,8 @@ login_task_data_alloc_fail:
 	__kfifo_put(session->cmdpool.queue, (void*)&conn->login_task,
 		    sizeof(void*));
 login_task_alloc_fail:
+	kfree(conn->queue_map);
+queue_setup_fail:
 	iscsi_destroy_conn(cls_conn);
 	return NULL;
 }
@@ -2430,16 +2541,22 @@ void iscsi_conn_teardown(struct iscsi_cls_conn *cls_conn)
 	/* flush queued up work because we free the connection below */
 	iscsi_suspend_tx(conn);
 
-	spin_lock_bh(&session->lock);
 	free_pages((unsigned long) conn->data,
 		   get_order(ISCSI_DEF_MAX_RECV_SEG_LEN));
 	kfree(conn->persistent_address);
+
+	spin_lock_bh(&session->pool_lock);
 	__kfifo_put(session->cmdpool.queue, (void*)&conn->login_task,
 		    sizeof(void*));
+	spin_unlock_bh(&session->pool_lock);
+
+	spin_lock_bh(&session->lock);
 	if (session->leadconn == conn)
 		session->leadconn = NULL;
 	spin_unlock_bh(&session->lock);
 
+	kfree(conn->queue_map);
+
 	iscsi_destroy_conn(cls_conn);
 }
 EXPORT_SYMBOL_GPL(iscsi_conn_teardown);
@@ -2449,6 +2566,10 @@ int iscsi_conn_start(struct iscsi_cls_conn *cls_conn)
 	struct iscsi_conn *conn = cls_conn->dd_data;
 	struct iscsi_session *session = conn->session;
 
+	if (find_first_bit(conn->queue_map, session->cmds_max) !=
+		session->cmds_max)
+		BUG();
+
 	if (!session) {
 		iscsi_conn_printk(KERN_ERR, conn,
 				  "can't start unbound connection\n");
@@ -2478,7 +2599,6 @@ int iscsi_conn_start(struct iscsi_cls_conn *cls_conn)
 	spin_lock_bh(&session->lock);
 	conn->c_stage = ISCSI_CONN_STARTED;
 	session->state = ISCSI_STATE_LOGGED_IN;
-	session->queued_cmdsn = session->cmdsn;
 
 	conn->last_recv = jiffies;
 	conn->last_ping = jiffies;
@@ -2512,30 +2632,6 @@ int iscsi_conn_start(struct iscsi_cls_conn *cls_conn)
 }
 EXPORT_SYMBOL_GPL(iscsi_conn_start);
 
-static void
-flush_control_queues(struct iscsi_session *session, struct iscsi_conn *conn)
-{
-	struct iscsi_task *task, *tmp;
-
-	/* handle pending */
-	list_for_each_entry_safe(task, tmp, &conn->mgmtqueue, running) {
-		ISCSI_DBG_SESSION(session, "flushing pending mgmt task "
-				  "itt 0x%x\n", task->itt);
-		/* release ref from prep task */
-		__iscsi_put_task(task);
-	}
-
-	/* handle running */
-	list_for_each_entry_safe(task, tmp, &conn->mgmt_run_list, running) {
-		ISCSI_DBG_SESSION(session, "flushing running mgmt task "
-				  "itt 0x%x\n", task->itt);
-		/* release ref from prep task */
-		__iscsi_put_task(task);
-	}
-
-	conn->task = NULL;
-}
-
 static void iscsi_start_session_recovery(struct iscsi_session *session,
 					 struct iscsi_conn *conn, int flag)
 {
@@ -2591,7 +2687,7 @@ static void iscsi_start_session_recovery(struct iscsi_session *session,
 		fail_all_commands(conn, -1, DID_TRANSPORT_DISRUPTED);
 	else
 		fail_all_commands(conn, -1, DID_ERROR);
-	flush_control_queues(session, conn);
+	flush_control_queues(conn);
 	spin_unlock_bh(&session->lock);
 	mutex_unlock(&session->eh_mutex);
 }
diff --git a/drivers/scsi/libiscsi_tcp.c b/drivers/scsi/libiscsi_tcp.c
index 91f8ce4..952511f 100644
--- a/drivers/scsi/libiscsi_tcp.c
+++ b/drivers/scsi/libiscsi_tcp.c
@@ -441,7 +441,7 @@ void iscsi_tcp_cleanup_task(struct iscsi_task *task)
 	struct iscsi_r2t_info *r2t;
 
 	/* nothing to do for mgmt or pending tasks */
-	if (!task->sc || task->state == ISCSI_TASK_PENDING)
+	if (!task->sc || task->state != ISCSI_TASK_RUNNING)
 		return;
 
 	/* flush task's r2t queues */
@@ -573,11 +573,12 @@ static int iscsi_tcp_r2t_rsp(struct iscsi_conn *conn, struct iscsi_task *task)
 	r2t->datasn = 0;
 	r2t->sent = 0;
 
+	/* get ref for requeue list - xmit queue will release */
+	iscsi_get_task(task);
 	tcp_task->exp_datasn = r2tsn + 1;
 	__kfifo_put(tcp_task->r2tqueue, (void*)&r2t, sizeof(void*));
 	conn->r2t_pdus_cnt++;
-
-	iscsi_requeue_task(task);
+	iscsi_queue_task(task);
 	return 0;
 }
 
@@ -648,16 +649,15 @@ iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
 
 	switch(opcode) {
 	case ISCSI_OP_SCSI_DATA_IN:
-		spin_lock(&conn->session->lock);
 		task = iscsi_itt_to_ctask(conn, hdr->itt);
-		if (!task)
+		if (!task || iscsi_get_task_lock(task))
 			rc = ISCSI_ERR_BAD_ITT;
-		else
+		else {
 			rc = iscsi_tcp_data_in(conn, task);
-		if (rc) {
-			spin_unlock(&conn->session->lock);
-			break;
+			iscsi_put_task_lock(task);
 		}
+		if (rc)
+			break;
 
 		if (tcp_conn->in.datalen) {
 			struct iscsi_tcp_task *tcp_task = task->dd_data;
@@ -687,11 +687,9 @@ iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
 						   tcp_conn->in.datalen,
 						   iscsi_tcp_process_data_in,
 						   rx_hash);
-			spin_unlock(&conn->session->lock);
 			return rc;
 		}
-		rc = __iscsi_complete_pdu(conn, hdr, NULL, 0);
-		spin_unlock(&conn->session->lock);
+		rc = iscsi_complete_pdu(conn, hdr, NULL, 0);
 		break;
 	case ISCSI_OP_SCSI_CMD_RSP:
 		if (tcp_conn->in.datalen) {
@@ -701,17 +699,19 @@ iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
 		rc = iscsi_complete_pdu(conn, hdr, NULL, 0);
 		break;
 	case ISCSI_OP_R2T:
-		spin_lock(&conn->session->lock);
 		task = iscsi_itt_to_ctask(conn, hdr->itt);
-		if (!task)
+		if (!task || iscsi_get_task_lock(task)) {
 			rc = ISCSI_ERR_BAD_ITT;
-		else if (ahslen)
+			break;
+		}
+
+		if (ahslen)
 			rc = ISCSI_ERR_AHSLEN;
 		else if (task->sc->sc_data_direction == DMA_TO_DEVICE)
 			rc = iscsi_tcp_r2t_rsp(conn, task);
 		else
 			rc = ISCSI_ERR_PROTO;
-		spin_unlock(&conn->session->lock);
+		iscsi_put_task_lock(task);
 		break;
 	case ISCSI_OP_LOGIN_RSP:
 	case ISCSI_OP_TEXT_RSP:
@@ -954,14 +954,12 @@ EXPORT_SYMBOL_GPL(iscsi_tcp_task_init);
 
 static struct iscsi_r2t_info *iscsi_tcp_get_curr_r2t(struct iscsi_task *task)
 {
-	struct iscsi_session *session = task->conn->session;
 	struct iscsi_tcp_task *tcp_task = task->dd_data;
 	struct iscsi_r2t_info *r2t = NULL;
 
 	if (iscsi_task_has_unsol_data(task))
 		r2t = &task->unsol_r2t;
 	else {
-		spin_lock_bh(&session->lock);
 		if (tcp_task->r2t) {
 			r2t = tcp_task->r2t;
 			/* Continue with this R2T? */
@@ -980,7 +978,6 @@ static struct iscsi_r2t_info *iscsi_tcp_get_curr_r2t(struct iscsi_task *task)
 				    (void *)&tcp_task->r2t, sizeof(void *));
 			r2t = tcp_task->r2t;
 		}
-		spin_unlock_bh(&session->lock);
 	}
 
 	return r2t;
@@ -1018,16 +1015,25 @@ flush:
 	if (task->sc->sc_data_direction != DMA_TO_DEVICE)
 		return 0;
 
+	if (iscsi_get_task_lock(task)) {
+		ISCSI_DBG_TCP(conn, "Dropping failed task 0x%x\n",
+			      task->itt);
+		return 0;
+	}
+		
 	r2t = iscsi_tcp_get_curr_r2t(task);
 	if (r2t == NULL) {
+		iscsi_put_task_lock(task);
 		/* Waiting for more R2Ts to arrive. */
 		ISCSI_DBG_TCP(conn, "no R2Ts yet\n");
 		return 0;
 	}
 
 	rc = conn->session->tt->alloc_pdu(task, ISCSI_OP_SCSI_DATA_OUT);
-	if (rc)
+	if (rc) {
+		iscsi_put_task_lock(task);
 		return rc;
+	}
 	iscsi_prep_data_out_pdu(task, r2t, (struct iscsi_data *) task->hdr);
 
 	ISCSI_DBG_TCP(conn, "sol dout %p [dsn %d itt 0x%x doff %d dlen %d]\n",
@@ -1036,9 +1042,12 @@ flush:
 
 	rc = conn->session->tt->init_pdu(task, r2t->data_offset + r2t->sent,
 					 r2t->data_count);
-	if (rc)
+	if (rc) {
+		iscsi_put_task_lock(task);
 		return rc;
+	}
 	r2t->sent += r2t->data_count;
+	iscsi_put_task_lock(task);
 	goto flush;
 }
 EXPORT_SYMBOL_GPL(iscsi_tcp_task_xmit);
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index 7ffaed2..80ab781 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -82,7 +82,10 @@ enum {
 
 enum {
 	ISCSI_TASK_COMPLETED,
-	ISCSI_TASK_PENDING,
+	ISCSI_TASK_ALLOCATED,
+	ISCSI_TASK_PDU_SETUP,
+	ISCSI_TASK_QUEUED,
+	ISCSI_TASK_SETUP,
 	ISCSI_TASK_RUNNING,
 };
 
@@ -121,10 +124,10 @@ struct iscsi_task {
 	struct scsi_cmnd	*sc;		/* associated SCSI cmd*/
 	struct iscsi_conn	*conn;		/* used connection    */
 
-	/* state set/tested under session->lock */
+	/* state set/tested under task->lock */
+	spinlock_t		lock;		/* protects task state, */
 	int			state;
 	atomic_t		refcount;
-	struct list_head	running;	/* running cmd list */
 	void			*dd_data;	/* driver/transport data */
 };
 
@@ -179,12 +182,8 @@ struct iscsi_conn {
 	struct iscsi_task	*task;		/* xmit task in progress */
 
 	/* xmit */
-	struct list_head	mgmtqueue;	/* mgmt (control) xmit queue */
-	struct list_head	mgmt_run_list;	/* list of control tasks */
-	struct list_head	xmitqueue;	/* data-path cmd queue */
-	struct list_head	run_list;	/* list of cmds in progress */
-	struct list_head	requeue;	/* tasks needing another run */
-	struct work_struct	xmitwork;	/* per-conn. xmit workqueue */
+	unsigned long		*queue_map;
+	struct work_struct	xmitwork;
 	unsigned long		suspend_tx;	/* suspend Tx */
 	unsigned long		suspend_rx;	/* suspend Rx */
 
@@ -255,9 +254,6 @@ struct iscsi_session {
 	uint32_t		exp_cmdsn;
 	uint32_t		max_cmdsn;
 
-	/* This tracks the reqs queued into the initiator */
-	uint32_t		queued_cmdsn;
-
 	/* configuration */
 	int			abort_timeout;
 	int			lu_reset_timeout;
@@ -284,12 +280,11 @@ struct iscsi_session {
 	struct iscsi_transport	*tt;
 	struct Scsi_Host	*host;
 	struct iscsi_conn	*leadconn;	/* leading connection */
+	spinlock_t		sn_lock;	/* protects sequence nrs */
+	/* this will be removed in next patch */
+	spinlock_t		pool_lock;	/* protects cmd pool, */
 	spinlock_t		lock;		/* protects session state, *
-						 * sequence numbers,       *
-						 * session resources:      *
-						 * - cmdpool,		   *
-						 * - mgmtpool,		   *
-						 * - r2tpool		   */
+						 * session resources:      */
 	int			state;		/* session state           */
 	int			age;		/* counts session re-opens */
 
@@ -400,13 +395,13 @@ extern int iscsi_conn_send_pdu(struct iscsi_cls_conn *, struct iscsi_hdr *,
 				char *, uint32_t);
 extern int iscsi_complete_pdu(struct iscsi_conn *, struct iscsi_hdr *,
 			      char *, int);
-extern int __iscsi_complete_pdu(struct iscsi_conn *, struct iscsi_hdr *,
-				char *, int);
 extern int iscsi_verify_itt(struct iscsi_conn *, itt_t);
 extern struct iscsi_task *iscsi_itt_to_ctask(struct iscsi_conn *, itt_t);
-extern void iscsi_requeue_task(struct iscsi_task *task);
+extern void iscsi_queue_task(struct iscsi_task *task);
 extern void iscsi_put_task(struct iscsi_task *task);
-extern void __iscsi_get_task(struct iscsi_task *task);
+extern void iscsi_get_task(struct iscsi_task *task);
+extern int iscsi_get_task_lock(struct iscsi_task *task);
+extern void iscsi_put_task_lock(struct iscsi_task *task);
 
 /*
  * generic helpers
diff --git a/include/scsi/libiscsi_tcp.h b/include/scsi/libiscsi_tcp.h
index 9e3182e..efa9dc2 100644
--- a/include/scsi/libiscsi_tcp.h
+++ b/include/scsi/libiscsi_tcp.h
@@ -76,6 +76,7 @@ struct iscsi_tcp_conn {
 };
 
 struct iscsi_tcp_task {
+	spinlock_t		lock;
 	uint32_t		exp_datasn;	/* expected target's R2TSN/DataSN */
 	int			data_offset;
 	struct iscsi_r2t_info	*r2t;		/* in progress solict R2T */
diff --git a/block/blk-tag.c b/block/blk-tag.c
index 3c518e3..0614faf 100644
--- a/block/blk-tag.c
+++ b/block/blk-tag.c
@@ -106,7 +106,7 @@ EXPORT_SYMBOL(blk_queue_free_tags);
 static int
 init_tag_map(struct request_queue *q, struct blk_queue_tag *tags, int depth)
 {
-	struct request **tag_index;
+	void **tag_index;
 	unsigned long *tag_map;
 	int nr_ulongs;
 
@@ -116,7 +116,7 @@ init_tag_map(struct request_queue *q, struct blk_queue_tag *tags, int depth)
 		       __func__, depth);
 	}
 
-	tag_index = kzalloc(depth * sizeof(struct request *), GFP_ATOMIC);
+	tag_index = kzalloc(depth * sizeof(void *), GFP_ATOMIC);
 	if (!tag_index)
 		goto fail;
 
@@ -219,7 +219,7 @@ EXPORT_SYMBOL(blk_queue_init_tags);
 int blk_queue_resize_tags(struct request_queue *q, int new_depth)
 {
 	struct blk_queue_tag *bqt = q->queue_tags;
-	struct request **tag_index;
+	void **tag_index;
 	unsigned long *tag_map;
 	int max_depth, nr_ulongs;
 
@@ -254,7 +254,7 @@ int blk_queue_resize_tags(struct request_queue *q, int new_depth)
 	if (init_tag_map(q, bqt, new_depth))
 		return -ENOMEM;
 
-	memcpy(bqt->tag_index, tag_index, max_depth * sizeof(struct request *));
+	memcpy(bqt->tag_index, tag_index, max_depth * sizeof(void *));
 	nr_ulongs = ALIGN(max_depth, BITS_PER_LONG) / BITS_PER_LONG;
 	memcpy(bqt->tag_map, tag_map, nr_ulongs * sizeof(unsigned long));
 
@@ -265,24 +265,12 @@ int blk_queue_resize_tags(struct request_queue *q, int new_depth)
 EXPORT_SYMBOL(blk_queue_resize_tags);
 
 /**
- * blk_queue_end_tag - end tag operations for a request
- * @q:  the request queue for the device
- * @rq: the request that has completed
- *
- *  Description:
- *    Typically called when end_that_request_first() returns %0, meaning
- *    all transfers have been done for a request. It's important to call
- *    this function before end_that_request_last(), as that will put the
- *    request back on the free list thus corrupting the internal tag list.
- *
- *  Notes:
- *   queue lock must be held.
+ * blk_map_end_tag - end tag operation
+ * @bqt: block queue tag
+ * @tag: tag to clear
  **/
-void blk_queue_end_tag(struct request_queue *q, struct request *rq)
+void blk_map_end_tag(struct blk_queue_tag *bqt, int tag)
 {
-	struct blk_queue_tag *bqt = q->queue_tags;
-	int tag = rq->tag;
-
 	BUG_ON(tag == -1);
 
 	if (unlikely(tag >= bqt->real_max_depth))
@@ -292,10 +280,6 @@ void blk_queue_end_tag(struct request_queue *q, struct request *rq)
 		 */
 		return;
 
-	list_del_init(&rq->queuelist);
-	rq->cmd_flags &= ~REQ_QUEUED;
-	rq->tag = -1;
-
 	if (unlikely(bqt->tag_index[tag] == NULL))
 		printk(KERN_ERR "%s: tag %d is missing\n",
 		       __func__, tag);
@@ -313,9 +297,65 @@ void blk_queue_end_tag(struct request_queue *q, struct request *rq)
 	 */
 	clear_bit_unlock(tag, bqt->tag_map);
 }
+EXPORT_SYMBOL(blk_map_end_tag);
+
+/**
+ * blk_queue_end_tag - end tag operations for a request
+ * @q:  the request queue for the device
+ * @rq: the request that has completed
+ *
+ *  Description:
+ *    Typically called when end_that_request_first() returns %0, meaning
+ *    all transfers have been done for a request. It's important to call
+ *    this function before end_that_request_last(), as that will put the
+ *    request back on the free list thus corrupting the internal tag list.
+ *
+ *  Notes:
+ *   queue lock must be held.
+ **/
+void blk_queue_end_tag(struct request_queue *q, struct request *rq)
+{
+	blk_map_end_tag(q->queue_tags, rq->tag);
+
+	list_del_init(&rq->queuelist);
+	rq->cmd_flags &= ~REQ_QUEUED;
+	rq->tag = -1;
+}
 EXPORT_SYMBOL(blk_queue_end_tag);
 
 /**
+ * blk_map_start_tag - find a free tag
+ * @bqt: block queue tag
+ * @object: object to store in bqt tag_index at index returned by tag
+ * @offset: offset into bqt tag map
+ **/
+int blk_map_start_tag(struct blk_queue_tag *bqt, void *object, unsigned offset)
+{
+	unsigned max_depth;
+	int tag;
+
+	/*
+	 * Protect against shared tag maps, as we may not have exclusive
+	 * access to the tag map.
+	 */
+	max_depth = bqt->max_depth;
+	do {
+		tag = find_next_zero_bit(bqt->tag_map, max_depth, offset);
+		if (tag >= max_depth)
+			return -1;
+
+	} while (test_and_set_bit_lock(tag, bqt->tag_map));
+	/*
+	 * We need lock ordering semantics given by test_and_set_bit_lock.
+	 * See blk_map_end_tag for details.
+	 */
+
+	bqt->tag_index[tag] = object;
+	return tag;
+}
+EXPORT_SYMBOL(blk_map_start_tag);
+
+/**
  * blk_queue_start_tag - find a free tag and assign it
  * @q:  the request queue for the device
  * @rq:  the block request that needs tagging
@@ -347,10 +387,8 @@ int blk_queue_start_tag(struct request_queue *q, struct request *rq)
 		BUG();
 	}
 
+
 	/*
-	 * Protect against shared tag maps, as we may not have exclusive
-	 * access to the tag map.
-	 *
 	 * We reserve a few tags just for sync IO, since we don't want
 	 * to starve sync IO on behalf of flooding async IO.
 	 */
@@ -360,20 +398,12 @@ int blk_queue_start_tag(struct request_queue *q, struct request *rq)
 	else
 		offset = max_depth >> 2;
 
-	do {
-		tag = find_next_zero_bit(bqt->tag_map, max_depth, offset);
-		if (tag >= max_depth)
-			return 1;
-
-	} while (test_and_set_bit_lock(tag, bqt->tag_map));
-	/*
-	 * We need lock ordering semantics given by test_and_set_bit_lock.
-	 * See blk_queue_end_tag for details.
-	 */
+	tag = blk_map_start_tag(bqt, rq, offset);
+	if (tag < 0)
+		return 1;
 
 	rq->cmd_flags |= REQ_QUEUED;
 	rq->tag = tag;
-	bqt->tag_index[tag] = rq;
 	blkdev_dequeue_request(rq);
 	list_add(&rq->queuelist, &q->tag_busy_list);
 	return 0;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 465d6ba..d748261 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -290,7 +290,7 @@ enum blk_queue_state {
 };
 
 struct blk_queue_tag {
-	struct request **tag_index;	/* map of busy tags */
+	void **tag_index;		/* map of busy tags */
 	unsigned long *tag_map;		/* bit map of free/busy tags */
 	int busy;			/* current depth */
 	int max_depth;			/* what we will send to device */
@@ -904,6 +904,8 @@ extern int blk_queue_resize_tags(struct request_queue *, int);
 extern void blk_queue_invalidate_tags(struct request_queue *);
 extern struct blk_queue_tag *blk_init_tags(int);
 extern void blk_free_tags(struct blk_queue_tag *);
+extern int blk_map_start_tag(struct blk_queue_tag *, void *, unsigned);
+extern void blk_map_end_tag(struct blk_queue_tag *, int);
 
 static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt,
 						int tag)
