Re: [PATCH 3/4] libiscsi: Set the 'FINAL' bit to terminate data transfer on TMF

2009-08-13 Thread Hannes Reinecke

Mike Christie wrote:
> Hannes Reinecke wrote:
>> Whenever we send a Data-Out response to an affected LUN during
>> LU Reset, we should be setting the 'FINAL' bit. This will
>> indicate to the target that we consider this transfer finished.
>>
>> Signed-off-by: Hannes Reinecke 
>> ---
>>  drivers/scsi/libiscsi.c |4 
>>  1 files changed, 4 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
>> index a0d1217..16d35f0 100644
>> --- a/drivers/scsi/libiscsi.c
>> +++ b/drivers/scsi/libiscsi.c
>> @@ -321,6 +321,10 @@ static int iscsi_check_tmf_restrictions(struct
>> iscsi_task *task, int opcode)
>>task->itt, task->hdr_itt, hdr_lun);
>>  return -EACCES;
>>  }
>> +/*
>> + * Set FINAL flag to terminate data transfer.
>> + */
>> +task->hdr->flags |= ISCSI_FLAG_CMD_FINAL;
> 
> 
> Wasn't this in the first patch before?
> 
Yes. But I split it off as it's a different issue.
And this one apparently requires more discussion :-)

> The data-out hdr is not prepd yet.  The entire thing gets memset'd, so
> ORing it does not help.
> 
Yes, and no.
We're hitting this section for Data-Out PDUs only, ie for tasks on the
'requeue' list. All other tasks (for which iscsi_prep_scsi_cmd() would
be called) are filtered out here:

if (opcode != ISCSI_OP_SCSI_DATA_OUT) {
iscsi_conn_printk(KERN_INFO, conn,

as 'opcode' is the parameter passed during invocation, not the actual
opcode of the task.
So the 'FINAL' bit will only be set when we're here:

list_for_each_entry_safe(task, tmptask, &conn->requeue, running) {
/*
 * we always do fastlogout - conn stop code will clean up.
 */
if (conn->session->state == ISCSI_STATE_LOGGING_OUT)
break;

if (iscsi_check_tmf_restrictions(task, ISCSI_OP_SCSI_DATA_OUT))
continue;

conn->task = task;
list_del_init(&conn->task->running);
if (conn->task->state == ISCSI_TASK_PENDING)
conn->task->state = ISCSI_TASK_RUNNING;
if (conn->conn_debug)
iscsi_conn_printk(KERN_INFO, conn,
  "requeue cmdpdu [itt 0x%x cmdsn %u "
  "lun %u] xmit\n", conn->task->itt,
  be32_to_cpu(conn->task->cmdsn),
  conn->task->sc ? 
conn->task->sc->device->lun : -1);
rc = iscsi_xmit_task(conn);

and there we don't do any header prep()ing.

> Plus you only set the F bit for the final PDU. This would send multiple
> PDUs with the final bit set. I think the point of sending a pdu with the
> F bit set is that you do not have to send all the data-outs for the
> sequence (if a tmf was executing then it does not have to wait for MB of
> data). Just send one with the F bit set.
> 
Yes, that was the plan. But I'm not _that_ firm with the R2T code, so
the following argumentation might be wrong.
So, I was under the impression that we're only ever send 1 Data-Out
PDU in response to a R2T. However, there might be more than one
R2T/Data-Out pairs exchanged until the entire data is transferred.
The idea here was that we're setting the 'FINAL' flag for a Data-Out
PDU whenever the data transfer should be stopped.
This _should_ induce the Target to stop sending us R2Ts.
If it does, the transfer is terminated and everyone's happy.
If it doesn't (ie it'll ignore the FINAL flag) we would still
have to send a Data-Out Response.

> You want to modify iscsi_prep_data_out_pdu to check if a tmf would
> affect that task, and if so then hit the final bit on it. We probably
> also want to indicate to the caller that we ended it early so it it can
> stop sending more data-outs for that r2t.
> 
Does it make a difference?
When changing iscsi_prep_data_out_pdu() we would be modifying the
header when it's being prep()ed, ie after the SCSI cmd PDU is sent.
But then the task is being rescheduled on the 'requeue' list after
a R2T has been received, and my patch sends the flag just before
it's being sent.
Or am I wrong?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: [PATCH 1/4] libiscsi: Check TMF state before sending PDU

2009-08-13 Thread Hannes Reinecke

Mike Christie wrote:
> Hannes Reinecke wrote:
>> Before we're trying to send a PDU we have to check whether a TMF
>> is active. If so and if the PDU will be affected by the TMF
>> we should allow only Data-out PDUs to be sent, or, if
>> fast_abort is set, no PDUs at all.
> 
> You sort of changed the behavior of the code and it does not match the
> description.
> 
> The fast_abort setting was a hack and not the iscsi RFC one as we
> discussed before. It used to apply to aborts and lu resets. Now, it only
> applies to lu resets.
> 
I know it's not the RFC one.
Only I wasn't quite sure what to do with it for aborts;
the current implemented 'fast_abort' logic I deciphered as
'terminate transfer as fast as possible'.
So for LU Resets we wouldn't transfer any data to the affected
LUN, and for aborts with would just have to block the task
with the corresponding itt.

> Was this intentional?
> 
Not really. But was should be done for aborts when fast_abort
is set?
We only have to block the task with the corresponding itt, no?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: [RFC] Dynamically adjust shost->can_queue

2009-08-13 Thread Mike Christie

Mike Christie wrote:
> 
> We were discussing this on the fcoe list, and I mentioned it here on the 
> scsi cmd timer patch. What I thought we might want to do is instead of 
> doing it based on cmdsn window rejects which can sometimes be static, we 
> can take into account how often commands are timing while they are still 
> pending. For example if a cmd is still pending and the scsi command 
> timer fires, then can_queue is probably too high at that time.
> 
> But we probably also need to check the cmdsn window rejects, because I 
> think that can also change on some targets. So it can be separate 
> patches, and I am not against what you are trying to do.

Actually, I am not sure what it is buying us except extra complexity 
just so we can avoid a debug printk. What is the reason for the 
threshhold code? In your previous patch you just set the can_queue to 
the cmdsn window size, which seemed fine except the window size could be 
larger than the number of commands that user had set for 
node.session.cmds_max.


> 
> We could also then add some code to try and dynamically increase it like 
> Vasu is doing for the device queue_depth. This can_queue rampup would be 
> for the target port resources/limitations instead of Vasu's device one.
> 

Oh yeah, just to be clear when I say can_queue I sometimes meant both 
and sometimes just mean one or the other. The host port can be connected 
to multiple target ports. Then the target port could be connected to 
multiple initiator ports. Some of the ports can be on the same target or 
initiator, but they can also be on completely different boxes, so you 
end up with both the host and target can_queue limits fluctuating for 
different reasons. In the end I think you would want to handle both.

For the target can_queue you could do something where we never send more 
than cmdsn window commands, then you could time commands and track if it 
was taking longer or shorter with different number of commands and you 
cold track if cmds were timing out, and depending on the results you 
could increase/decrease the target can_queue similar to what is done 
with the fc drives and the device queue_depth.

We would want to do something similar for the host can queue too, and 
then also we will want to hook into Vasu's device queue_depth 
rampup/down code as well. Also did you try the QUEUE_FULL patch I sent 
with the MSA target?

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: [PATCH 4/4] libiscsi: Flush workqueue before suspend

2009-08-13 Thread Mike Christie

Mike Christie wrote:
> some additions:
> 

Ugh, sorry, one more change. If iscsi recovery kicks in due to a logout, 
iscsi/conn error or someone just tryiing to shutdown the session then we 
could be calling the ep_disconnect code or conn stop.

> 
> Mike Christie wrote:
>> wait_on_commands()
>> {
>   while (check_restrictions(conn->task) &&
>   session->state == LOGGED_IN &&
!test_bit conn->suspend_tx
mutex_unlock(eh mutex)
>   wait for a while
mutex_lock (eh_mutex);
> 
>>  loop for each cmd on requeue list
>   if (session->state != LOGGED_IN ||
   !test_bit conn->suspend_tx)
>   break;
> 
>>  if (check_restrictions(task))
mutex_unlock(eh mutex)
>>  wait for a while then restart loop and check
mutex_lock (eh_mutex);
>> }
>>
> 
> > 


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: [PATCH 4/4] libiscsi: Flush workqueue before suspend

2009-08-13 Thread Mike Christie

some additions:


Mike Christie wrote:
> wait_on_commands()
> {
while (check_restrictions(conn->task) &&
session->state == LOGGED_IN)
wait for a while

>   loop for each cmd on requeue list
if (session->state != LOGGED_IN)
break;

>   if (check_restrictions(task))
>   wait for a while then restart loop and check
> }
> 

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: [PATCH 3/4] libiscsi: Set the 'FINAL' bit to terminate data transfer on TMF

2009-08-13 Thread Mike Christie

Mike Christie wrote:
> Plus you only set the F bit for the final PDU. This would send multiple 

I mean, I think you only want to set the F bit for the final pdu, and in 
this case we would just send the one pdu with the F bit set.


> PDUs with the final bit set. I think the point of sending a pdu with the 
> F bit set is that you do not have to send all the data-outs for the 
> sequence (if a tmf was executing then it does not have to wait for MB of 
> data). Just send one with the F bit set.
> 
> You want to modify iscsi_prep_data_out_pdu to check if a tmf would 
> affect that task, and if so then hit the final bit on it. We probably 
> also want to indicate to the caller that we ended it early so it it can 
> stop sending more data-outs for that r2t.
> 
> 
>>  break;
>>  case ISCSI_TM_FUNC_ABORT_TASK:
>>  /*
> 
> 
> > 


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: [PATCH 4/4] libiscsi: Flush workqueue before suspend

2009-08-13 Thread Mike Christie

Hannes Reinecke wrote:
> 
> If 'fast_abort' is not set we should be sending all outstanding
> Data-out PDUs to the target, even after we have received the
> tmf response. So we should make sure that the workqueue is really
> flushed and all PDUs have been processed before setting the
> 'SUSPEND' bit.

I tried to say this in the previous mails. Let me know if you are also 
disagreeing with this or maybe just missed it.

iscsi_tcp, iser and cxgb3i can return from xmit_task when some buffer or 
window closes. It can do this at any time, and it could leave a partial 
pdu sent or if we are still trying to send all the data-outs for a 
sequence then it could get stuck in the middle of that.

Instead of suspend_tx, I think we have to add some wait_on_cmds type of 
function where we wait for commends affected by the tmf to be sent.

So something like

wait_on_commands()
{
loop for each cmd on requeue list
if (check_restrictions(task))
wait for a while then restart loop and check
}

// Instead of the wait we could also add some per task callback that 
gets called when the data/pdus are finally sent, but that is probably 
overkill.



eh_lu_reset()
{

if (tmf successful)
wait_on_commands()
fail_scsi_tasks()
}


> 
> Signed-off-by: Hannes Reinecke 
> ---
>  drivers/scsi/libiscsi.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 16d35f0..5606c4d 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1880,9 +1880,9 @@ void iscsi_suspend_tx(struct iscsi_conn *conn)
>   struct Scsi_Host *shost = conn->session->host;
>   struct iscsi_host *ihost = shost_priv(shost);
>  
> - set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
>   if (ihost->workq)
>   flush_workqueue(ihost->workq);
> + set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
>  }
>  EXPORT_SYMBOL_GPL(iscsi_suspend_tx);
>  


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: [PATCH 3/4] libiscsi: Set the 'FINAL' bit to terminate data transfer on TMF

2009-08-13 Thread Mike Christie

Hannes Reinecke wrote:
> Whenever we send a Data-Out response to an affected LUN during
> LU Reset, we should be setting the 'FINAL' bit. This will
> indicate to the target that we consider this transfer finished.
> 
> Signed-off-by: Hannes Reinecke 
> ---
>  drivers/scsi/libiscsi.c |4 
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index a0d1217..16d35f0 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -321,6 +321,10 @@ static int iscsi_check_tmf_restrictions(struct 
> iscsi_task *task, int opcode)
> task->itt, task->hdr_itt, hdr_lun);
>   return -EACCES;
>   }
> + /*
> +  * Set FINAL flag to terminate data transfer.
> +  */
> + task->hdr->flags |= ISCSI_FLAG_CMD_FINAL;


Wasn't this in the first patch before?

The data-out hdr is not prepd yet.  The entire thing gets memset'd, so 
ORing it does not help.

Plus you only set the F bit for the final PDU. This would send multiple 
PDUs with the final bit set. I think the point of sending a pdu with the 
F bit set is that you do not have to send all the data-outs for the 
sequence (if a tmf was executing then it does not have to wait for MB of 
data). Just send one with the F bit set.

You want to modify iscsi_prep_data_out_pdu to check if a tmf would 
affect that task, and if so then hit the final bit on it. We probably 
also want to indicate to the caller that we ended it early so it it can 
stop sending more data-outs for that r2t.


>   break;
>   case ISCSI_TM_FUNC_ABORT_TASK:
>   /*


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: [RFC] Dynamically adjust shost->can_queue

2009-08-13 Thread Mike Christie

Hannes Reinecke wrote:
> Hi Mike,
> 
> I got fed up with the large number of rejects I've seen from queuecommand
> due to the CmdSN window closing on us.

Does this affect performance or is the major problem debug printks?

> So I put together a patch for dynamically adjusting shost->can_queue
> depending on the number of rejects.
> 
> Purely as a proof-on-concept, but works out quite nicely.
> I did on purpose _not_ calling the change_queue_depth callback
> as this would induce quite some overhead here with very
> little gain, as shost->can_queue seems to be more appropriate
> here as we're deducing the number of commands for the host/target,
> not the LUN.
> 
> Interesting though, both NetApp and MSA level out at around
> 64 commands. Maybe we should consider changing the default ...
> 

I think a netapp we have here can go to 256 or 128.




diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 0356e3d..a4d6eda 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1592,7 +1743,7 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void 
(*done)(struct scsi_cmnd *))
struct iscsi_cls_session *cls_session;
struct Scsi_Host *host;
struct iscsi_host *ihost;
-   int reason = 0;
+   int reason = 0, adjust_qdepth = 0;
struct iscsi_session *session;
struct iscsi_conn *conn;
struct iscsi_task *task = NULL;
@@ -1684,27 +1835,53 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, 
void (*done)(struct scsi_cmnd *))
}

session->queued_cmdsn++;
+   if (session->reject_count < 20) {
+   adjust_qdepth = 1;
+   session->reject_count = 100;
+   } else {
+   session->reject_count--;
+   }
spin_unlock(&session->lock);
spin_lock(host->host_lock);
+   if (adjust_qdepth == 1) {
+   host->can_queue++;


I think this should be starget->can_queue, because bnx2i and cxgb3i can 
have multiple sessions per host.


You also want to add some code so you do not go over 
starget->can_queue, becuase we run out of preallocated cmds, and you 
will hit the same reject printks. Also we do not want to go over this 
value in some cases, because users might have set this value lower than 
the target cmdns window because the network could be slow and the target 
just tells us about what it can handle in a normal setup or wrt its 
resources.

We were discussing this on the fcoe list, and I mentioned it here on the 
scsi cmd timer patch. What I thought we might want to do is instead of 
doing it based on cmdsn window rejects which can sometimes be static, we 
can take into account how often commands are timing while they are still 
pending. For example if a cmd is still pending and the scsi command 
timer fires, then can_queue is probably too high at that time.

But we probably also need to check the cmdsn window rejects, because I 
think that can also change on some targets. So it can be separate 
patches, and I am not against what you are trying to do.

We could also then add some code to try and dynamically increase it like 
Vasu is doing for the device queue_depth. This can_queue rampup would be 
for the target port resources/limitations instead of Vasu's device one.



+   iscsi_session_printk(KERN_INFO, session,
+"adjusted queue depth to %u\n",
+host->can_queue);


For a real patch you probably want to make this a debug printk. We could 
end up with a lot of printks.

+   }
return 0;

  prepd_reject:
sc->scsi_done = NULL;
iscsi_complete_task(task, ISCSI_TASK_COMPLETED);
  reject:
-   spin_unlock(&session->lock);
ISCSI_DBG_SESSION(session, "cmd 0x%x rejected (%d)\n",
  sc->cmnd[0], reason);
+
+   if (session->reject_count > 200) {
+   adjust_qdepth = -1;
+   session->reject_count = 100;
+   } else {
+   session->reject_count++;
+   }
+   spin_unlock(&session->lock);
spin_lock(host->host_lock);
+   if (adjust_qdepth == -1) {
+   host->can_queue--;
+   iscsi_session_printk(KERN_INFO, session,
+"adjusted queue depth to %u\n",
+host->can_queue);
+   }
return SCSI_MLQUEUE_TARGET_BUSY;

  prepd_fault:
sc->scsi_done = NULL;
iscsi_complete_task(task, ISCSI_TASK_COMPLETED);
  fault:
-   spin_unlock(&session->lock);
+   session->fault_count++;
ISCSI_DBG_SESSION(session, "iscsi: cmd 0x%x is not queued (%d)\n",
  sc->cmnd[0], reason);
+   spin_unlock(&session->lock);
if (!scsi_bidi_cmnd(sc))
scsi_set_resid(sc, scsi_bufflen(sc));
else {
@@ -2793,6 +2982,7 @@ int iscsi_conn_start(struct iscsi_cls_conn *cls_conn)

conn->last_recv = jiffies;
conn->last_ping = jiffie

Re: [PATCH 1/4] libiscsi: Check TMF state before sending PDU

2009-08-13 Thread Mike Christie

Hannes Reinecke wrote:
> Before we're trying to send a PDU we have to check whether a TMF
> is active. If so and if the PDU will be affected by the TMF
> we should allow only Data-out PDUs to be sent, or, if
> fast_abort is set, no PDUs at all.

You sort of changed the behavior of the code and it does not match the 
description.

The fast_abort setting was a hack and not the iscsi RFC one as we 
discussed before. It used to apply to aborts and lu resets. Now, it only 
applies to lu resets.

Was this intentional?


> 
> Signed-off-by: Mike Christie 
> Signed-off-by: Hannes Reinecke 
> ---
>  drivers/scsi/libiscsi.c|  112 
> +++-
>  include/scsi/iscsi_proto.h |2 +
>  2 files changed, 102 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index ef08f10..fd34347 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -263,6 +263,88 @@ static int iscsi_prep_bidi_ahs(struct iscsi_task *task)
>  }
>  
>  /**
> + * iscsi_check_tmf_restrictions - check if a task is affected by TMF
> + * @task: iscsi task
> + * @opcode: opcode to check for
> + *
> + * During TMF a task has to be checked if it's affected.
> + * All unrelated I/O can be passed through, but I/O to the
> + * affected LUN should be restricted.
> + * If 'fast_abort' is set we won't be sending any I/O to the
> + * affected LUN.
> + * Otherwise the target is waiting for all TTTs to be completed,
> + * so we have to send all outstanding Data-Out PDUs to the target.
> + */
> +static int iscsi_check_tmf_restrictions(struct iscsi_task *task, int opcode)
> +{
> + struct iscsi_conn *conn = task->conn;
> + struct iscsi_tm *tmf = &conn->tmhdr;
> + unsigned int hdr_lun;
> +
> + if (conn->tmf_state == TMF_INITIAL)
> + return 0;
> +
> + if ((tmf->opcode & ISCSI_OPCODE_MASK) != ISCSI_OP_SCSI_TMFUNC)
> + return 0;
> +
> + switch (ISCSI_TM_FUNC_VALUE(tmf)) {
> + case ISCSI_TM_FUNC_LOGICAL_UNIT_RESET:
> + /*
> +  * Allow PDUs for unrelated LUNs
> +  */
> + hdr_lun = scsilun_to_int((struct scsi_lun *)tmf->lun);
> + if (hdr_lun != task->sc->device->lun)
> + return 0;
> +
> + /*
> +  * Fail all SCSI cmd PDUs
> +  */
> + if (opcode != ISCSI_OP_SCSI_DATA_OUT) {
> + iscsi_conn_printk(KERN_INFO, conn,
> +   "task [op %x/%x itt "
> +   "0x%x/0x%x lun %u] "
> +   "rejected.\n",
> +   task->hdr->opcode, opcode,
> +   task->itt, task->hdr_itt, hdr_lun);
> + return -EACCES;
> + }
> + /*
> +  * And also all data-out PDUs in response to R2T
> +  * if fast_abort is set.
> +  */
> + if (conn->session->fast_abort) {
> + iscsi_conn_printk(KERN_INFO, conn,
> +   "task [op %x/%x itt "
> +   "0x%x/0x%x lun %u] "
> +   "fast abort.\n",
> +   task->hdr->opcode, opcode,
> +   task->itt, task->hdr_itt, hdr_lun);
> + return -EACCES;
> + }
> + break;
> + case ISCSI_TM_FUNC_ABORT_TASK:
> + /*
> +  * the caller has already checked if the task
> +  * they want to abort was in the pending queue so if
> +  * we are here the cmd pdu has gone out already, and
> +  * we will only hit this for data-outs
> +  */
> + if (opcode == ISCSI_OP_SCSI_DATA_OUT &&
> + task->hdr_itt == tmf->rtt) {
> + ISCSI_DBG_SESSION(conn->session,
> +   "Preventing task %x/%x from sending "
> +   "data-out due to abort task in "
> +   "progress\n", task->itt,
> +   task->hdr_itt);
> + return -EACCES;
> + }
> + break;
> + }
> +
> + return 0;
> +}
> +
> +/**
>   * iscsi_prep_scsi_cmd_pdu - prep iscsi scsi cmd pdu
>   * @task: iscsi task
>   *
> @@ -279,6 +361,10 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task 
> *task)
>   itt_t itt;
>   int rc;
>  
> + rc = iscsi_check_tmf_restrictions(task, ISCSI_OP_SCSI_CMD);
> + if (rc)
> + return rc;
> +
>   if (conn->session->tt->alloc_pdu) {
>   rc = conn->session->tt->alloc_pdu(task, ISCSI_OP_SCSI_CMD);
>   if (rc)
> @@ -1329,6 +1415,7 @@ EXPORT_SYMBOL_GPL(iscsi_requeue_task);
>   **/
>  static in

[RFC] Dynamically adjust shost->can_queue

2009-08-13 Thread Hannes Reinecke
Hi Mike,

I got fed up with the large number of rejects I've seen from queuecommand
due to the CmdSN window closing on us.
So I put together a patch for dynamically adjusting shost->can_queue
depending on the number of rejects.

Purely as a proof-on-concept, but works out quite nicely.
I did on purpose _not_ calling the change_queue_depth callback
as this would induce quite some overhead here with very
little gain, as shost->can_queue seems to be more appropriate
here as we're deducing the number of commands for the host/target,
not the LUN.

Interesting though, both NetApp and MSA level out at around
64 commands. Maybe we should consider changing the default ...

Feedback welcome.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 0356e3d..a4d6eda 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1592,7 +1743,7 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void 
(*done)(struct scsi_cmnd *))
struct iscsi_cls_session *cls_session;
struct Scsi_Host *host;
struct iscsi_host *ihost;
-   int reason = 0;
+   int reason = 0, adjust_qdepth = 0;
struct iscsi_session *session;
struct iscsi_conn *conn;
struct iscsi_task *task = NULL;
@@ -1684,27 +1835,53 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void 
(*done)(struct scsi_cmnd *))
}
 
session->queued_cmdsn++;
+   if (session->reject_count < 20) {
+   adjust_qdepth = 1;
+   session->reject_count = 100;
+   } else {
+   session->reject_count--;
+   }
spin_unlock(&session->lock);
spin_lock(host->host_lock);
+   if (adjust_qdepth == 1) {
+   host->can_queue++;
+   iscsi_session_printk(KERN_INFO, session,
+"adjusted queue depth to %u\n",
+host->can_queue);
+   }
return 0;
 
 prepd_reject:
sc->scsi_done = NULL;
iscsi_complete_task(task, ISCSI_TASK_COMPLETED);
 reject:
-   spin_unlock(&session->lock);
ISCSI_DBG_SESSION(session, "cmd 0x%x rejected (%d)\n",
  sc->cmnd[0], reason);
+
+   if (session->reject_count > 200) {
+   adjust_qdepth = -1;
+   session->reject_count = 100;
+   } else {
+   session->reject_count++;
+   }
+   spin_unlock(&session->lock);
spin_lock(host->host_lock);
+   if (adjust_qdepth == -1) {
+   host->can_queue--;
+   iscsi_session_printk(KERN_INFO, session,
+"adjusted queue depth to %u\n",
+host->can_queue);
+   }
return SCSI_MLQUEUE_TARGET_BUSY;
 
 prepd_fault:
sc->scsi_done = NULL;
iscsi_complete_task(task, ISCSI_TASK_COMPLETED);
 fault:
-   spin_unlock(&session->lock);
+   session->fault_count++;
ISCSI_DBG_SESSION(session, "iscsi: cmd 0x%x is not queued (%d)\n",
  sc->cmnd[0], reason);
+   spin_unlock(&session->lock);
if (!scsi_bidi_cmnd(sc))
scsi_set_resid(sc, scsi_bufflen(sc));
else {
@@ -2793,6 +2982,7 @@ int iscsi_conn_start(struct iscsi_cls_conn *cls_conn)
 
conn->last_recv = jiffies;
conn->last_ping = jiffies;
+   session->reject_count = 100;
if (conn->recv_timeout && conn->ping_timeout)
mod_timer(&conn->transport_timer,
  jiffies + (conn->recv_timeout * HZ));
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index 61afeb5..b006a32 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -298,6 +303,7 @@ struct iscsi_session {
 * - r2tpool   */
int state;  /* session state   */
int age;/* counts session re-opens */
+   unsigned long   reject_count;   /* counts cmd rejects */
 
int scsi_cmds_max;  /* max scsi commands */
int cmds_max;   /* size of cmds array */


[PATCH 4/4] libiscsi: Flush workqueue before suspend

2009-08-13 Thread Hannes Reinecke


If 'fast_abort' is not set we should be sending all outstanding
Data-out PDUs to the target, even after we have received the
tmf response. So we should make sure that the workqueue is really
flushed and all PDUs have been processed before setting the
'SUSPEND' bit.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/libiscsi.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 16d35f0..5606c4d 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1880,9 +1880,9 @@ void iscsi_suspend_tx(struct iscsi_conn *conn)
struct Scsi_Host *shost = conn->session->host;
struct iscsi_host *ihost = shost_priv(shost);
 
-   set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
if (ihost->workq)
flush_workqueue(ihost->workq);
+   set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
 }
 EXPORT_SYMBOL_GPL(iscsi_suspend_tx);
 
-- 
1.6.0.2


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



[PATCH 3/4] libiscsi: Set the 'FINAL' bit to terminate data transfer on TMF

2009-08-13 Thread Hannes Reinecke


Whenever we send a Data-Out response to an affected LUN during
LU Reset, we should be setting the 'FINAL' bit. This will
indicate to the target that we consider this transfer finished.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/libiscsi.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index a0d1217..16d35f0 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -321,6 +321,10 @@ static int iscsi_check_tmf_restrictions(struct iscsi_task 
*task, int opcode)
  task->itt, task->hdr_itt, hdr_lun);
return -EACCES;
}
+   /*
+* Set FINAL flag to terminate data transfer.
+*/
+   task->hdr->flags |= ISCSI_FLAG_CMD_FINAL;
break;
case ISCSI_TM_FUNC_ABORT_TASK:
/*
-- 
1.6.0.2


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



[PATCH 2/4] libiscsi: Traverse all Data-Out PDUs

2009-08-13 Thread Hannes Reinecke


When fast_abort is not set we have to send out all outstanding
Data-out PDUs. However, the loop in iscsi_data_xmit() will be
terminated at the first PDU for which check_tmf_restrictions()
returns non-zero, causing all other PDUs in the queue to be
not checked. So we should traverse all queue entries here
to catch all outstanding Data-out PDUs.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/libiscsi.c |   13 +++--
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index fd34347..a0d1217 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1415,7 +1415,7 @@ EXPORT_SYMBOL_GPL(iscsi_requeue_task);
  **/
 static int iscsi_data_xmit(struct iscsi_conn *conn)
 {
-   struct iscsi_task *task;
+   struct iscsi_task *task, *tmptask;
int rc = 0;
 
spin_lock_bh(&conn->session->lock);
@@ -1466,7 +1466,10 @@ check_mgmt:
list_add_tail(&conn->task->running,
  &conn->cmdqueue);
conn->task = NULL;
-   goto done;
+   if (rc == -ENOMEM)
+   goto done;
+   else
+   break;
} else
fail_scsi_task(conn->task, DID_ABORT);
continue;
@@ -1483,17 +1486,15 @@ check_mgmt:
goto check_mgmt;
}
 
-   while (!list_empty(&conn->requeue)) {
+   list_for_each_entry_safe(task, tmptask, &conn->requeue, running) {
/*
 * we always do fastlogout - conn stop code will clean up.
 */
if (conn->session->state == ISCSI_STATE_LOGGING_OUT)
break;
 
-   task = list_entry(conn->requeue.next, struct iscsi_task,
- running);
if (iscsi_check_tmf_restrictions(task, ISCSI_OP_SCSI_DATA_OUT))
-   break;
+   continue;
 
conn->task = task;
list_del_init(&conn->task->running);
-- 
1.6.0.2


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



[PATCH 1/4] libiscsi: Check TMF state before sending PDU

2009-08-13 Thread Hannes Reinecke


Before we're trying to send a PDU we have to check whether a TMF
is active. If so and if the PDU will be affected by the TMF
we should allow only Data-out PDUs to be sent, or, if
fast_abort is set, no PDUs at all.

Signed-off-by: Mike Christie 
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/libiscsi.c|  112 +++-
 include/scsi/iscsi_proto.h |2 +
 2 files changed, 102 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index ef08f10..fd34347 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -263,6 +263,88 @@ static int iscsi_prep_bidi_ahs(struct iscsi_task *task)
 }
 
 /**
+ * iscsi_check_tmf_restrictions - check if a task is affected by TMF
+ * @task: iscsi task
+ * @opcode: opcode to check for
+ *
+ * During TMF a task has to be checked if it's affected.
+ * All unrelated I/O can be passed through, but I/O to the
+ * affected LUN should be restricted.
+ * If 'fast_abort' is set we won't be sending any I/O to the
+ * affected LUN.
+ * Otherwise the target is waiting for all TTTs to be completed,
+ * so we have to send all outstanding Data-Out PDUs to the target.
+ */
+static int iscsi_check_tmf_restrictions(struct iscsi_task *task, int opcode)
+{
+   struct iscsi_conn *conn = task->conn;
+   struct iscsi_tm *tmf = &conn->tmhdr;
+   unsigned int hdr_lun;
+
+   if (conn->tmf_state == TMF_INITIAL)
+   return 0;
+
+   if ((tmf->opcode & ISCSI_OPCODE_MASK) != ISCSI_OP_SCSI_TMFUNC)
+   return 0;
+
+   switch (ISCSI_TM_FUNC_VALUE(tmf)) {
+   case ISCSI_TM_FUNC_LOGICAL_UNIT_RESET:
+   /*
+* Allow PDUs for unrelated LUNs
+*/
+   hdr_lun = scsilun_to_int((struct scsi_lun *)tmf->lun);
+   if (hdr_lun != task->sc->device->lun)
+   return 0;
+
+   /*
+* Fail all SCSI cmd PDUs
+*/
+   if (opcode != ISCSI_OP_SCSI_DATA_OUT) {
+   iscsi_conn_printk(KERN_INFO, conn,
+ "task [op %x/%x itt "
+ "0x%x/0x%x lun %u] "
+ "rejected.\n",
+ task->hdr->opcode, opcode,
+ task->itt, task->hdr_itt, hdr_lun);
+   return -EACCES;
+   }
+   /*
+* And also all data-out PDUs in response to R2T
+* if fast_abort is set.
+*/
+   if (conn->session->fast_abort) {
+   iscsi_conn_printk(KERN_INFO, conn,
+ "task [op %x/%x itt "
+ "0x%x/0x%x lun %u] "
+ "fast abort.\n",
+ task->hdr->opcode, opcode,
+ task->itt, task->hdr_itt, hdr_lun);
+   return -EACCES;
+   }
+   break;
+   case ISCSI_TM_FUNC_ABORT_TASK:
+   /*
+* the caller has already checked if the task
+* they want to abort was in the pending queue so if
+* we are here the cmd pdu has gone out already, and
+* we will only hit this for data-outs
+*/
+   if (opcode == ISCSI_OP_SCSI_DATA_OUT &&
+   task->hdr_itt == tmf->rtt) {
+   ISCSI_DBG_SESSION(conn->session,
+ "Preventing task %x/%x from sending "
+ "data-out due to abort task in "
+ "progress\n", task->itt,
+ task->hdr_itt);
+   return -EACCES;
+   }
+   break;
+   }
+
+   return 0;
+}
+
+/**
  * iscsi_prep_scsi_cmd_pdu - prep iscsi scsi cmd pdu
  * @task: iscsi task
  *
@@ -279,6 +361,10 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task)
itt_t itt;
int rc;
 
+   rc = iscsi_check_tmf_restrictions(task, ISCSI_OP_SCSI_CMD);
+   if (rc)
+   return rc;
+
if (conn->session->tt->alloc_pdu) {
rc = conn->session->tt->alloc_pdu(task, ISCSI_OP_SCSI_CMD);
if (rc)
@@ -1329,6 +1415,7 @@ EXPORT_SYMBOL_GPL(iscsi_requeue_task);
  **/
 static int iscsi_data_xmit(struct iscsi_conn *conn)
 {
+   struct iscsi_task *task;
int rc = 0;
 
spin_lock_bh(&conn->session->lock);
@@ -1366,11 +1453,8 @@ check_mgmt:
 
/* process pending command queue */
while (!list_empty(&conn->cmdqueue)) {
-   if (conn->tmf_state == TMF_QUEUED)
-   break;
-
-   conn->task = list_entry(conn->cm

[PATCH 0/4] Update LU Reset handling (take #2)

2009-08-13 Thread Hannes Reinecke

Hi Mike,

these is my second stab at the LU Reset handling update.
The first patch implements the 'iscsi_check_tmf_restrictions'
function, which allows us to filter out individual PDUs
which should not be send.

The other patches are merely for improving LU Reset handling
when 'fast_abort' is not set.

In this case we should be sending all outstanding data-out
PDUs to the target.
However, with the current list traversal of while(!list_empty())
for the 'requeue' list we will break out on the first PDU
for which tmf restriction check triggers. This will cause all
other PDUs in the list not to be checked.
So the first patch changes this to a 'normal' list traversal
with 'list_for_each', which will process all PDUs.
The second patch just sets the 'FINAL' bit for Data-out
PDUs to terminate the transfer.
And the third patch is flushing the workqueue _before_
the SUSPEND bit is set; otherwise iscsi_data_xmit()
will be short-circuited by the SUSPEND-bit set, causing
any data still queued not to be send.

One thing I'm not quite sure yet:
What are the contents of the 'requeue' list?
Will that only be 'DATA-Out' PDUs?
If so, then the last patch should be irrelevant as
the 'requeue' list will be flushed anyway.
If not, then the first and the third patch should
be modified. The first should filter out Data-Out
PDUs, but in realiter it just checks which queue
is being used. So if the 'requeue' list contains
non-Data-Out PDUs this check is clearly bogus.
Similar goes for the third patch; the FINAL bit
really makes sense only for Data-Out PDUs to
be set; for normal SCSI cmd PDUs the FINAL
bit has a different meaning and will induce
a protocol error.

As for the 'incomplete PDU' handling; we might
be able to catch this by updating the second patch.
Instead of breaking out of the loop we can just
continue when an error occurred; this way we
make sure that we're processing all outstanding
PDUs.
Those which failed will get processed by the
next call to data_xmit(); assuming it was a
transient error we should be able to process
them. Of course, no guarantees there.
The proper way would be to check the possible
return codes and only break out on the
non-retryable ones only, but continue for
others.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---