Re: [PATCH] cnic: Fix ISCSI_KEVENT_IF_DOWN message handling.

2009-07-29 Thread Mike Christie

Michael Chan wrote:
 When a net device goes down or when the bnx2i driver is unloaded,
 the code was not generating the ISCSI_KEVENT_IF_DOWN message
 properly and this could cause the userspace driver to crash.
 
 This is fixed by sending the message properly in the shutdown path.
 cnic_uio_stop() is also added to send the message when bnx2i is
 unregistering.
 
 Signed-off-by: Michael Chan mc...@broadcom.com
 ---
  drivers/net/cnic.c |   23 +--
  1 files changed, 21 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/net/cnic.c b/drivers/net/cnic.c
 index 4d1515f..4869d77 100644
 --- a/drivers/net/cnic.c
 +++ b/drivers/net/cnic.c
 @@ -227,7 +227,7 @@ static int cnic_send_nlmsg(struct cnic_local *cp, u32 type,
   }
  
   rcu_read_lock();
 - ulp_ops = rcu_dereference(cp->ulp_ops[CNIC_ULP_ISCSI]);
 + ulp_ops = rcu_dereference(cnic_ulp_tbl[CNIC_ULP_ISCSI]);
  if (ulp_ops)
  ulp_ops->iscsi_nl_send_msg(cp->dev, msg_type, buf, len);
   rcu_read_unlock();
 @@ -319,6 +319,20 @@ static int cnic_abort_prep(struct cnic_sock *csk)
   return 0;
  }
  
 +static void cnic_uio_stop(void)
 +{
 + struct cnic_dev *dev;
 +
 + read_lock(&cnic_dev_lock);
 + list_for_each_entry(dev, &cnic_dev_list, list) {
 + struct cnic_local *cp = dev->cnic_priv;
 +
 + if (cp->cnic_uinfo)
 + cnic_send_nlmsg(cp, ISCSI_KEVENT_IF_DOWN, NULL);

I don't think you can call this with the cnic_dev_lock held. They have 
the same sleeping restrictions as a spin_lock, right? If so, the problem 
is that iscsi_nl_send_msg calls iscsi_offload_mesg, which uses GFP_NOIO 
and can sleep.
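To illustrate the concern: a common way to avoid calling a sleeping function under a read lock is to record which devices need the message while the lock is held, and send only after dropping it. The sketch below is a userspace model, not cnic code; struct fake_dev, model_read_lock(), model_might_sleep() and uio_stop_all() are hypothetical stand-ins for the cnic structures, read_lock() and the kernel's might_sleep() debug check. Real kernel code would also need to pin each device (take a reference) before unlocking, which the sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVS 16

/* Hypothetical stand-in for struct cnic_dev + struct cnic_local. */
struct fake_dev {
	struct fake_dev *next;
	int has_uinfo;     /* models cp->cnic_uinfo != NULL */
	int if_down_sent;  /* set once the IF_DOWN message is sent */
};

/* Models "atomic context": non-zero while the lock is held. */
static int atomic_depth;

static void model_read_lock(void)   { atomic_depth++; }
static void model_read_unlock(void) { atomic_depth--; }

/* Models might_sleep(): trips if a sleeping call is made under the lock. */
static void model_might_sleep(void) { assert(atomic_depth == 0); }

/* Models cnic_send_nlmsg(): assumed to allocate with a sleeping GFP mask. */
static void send_if_down(struct fake_dev *dev)
{
	model_might_sleep();
	dev->if_down_sent = 1;
}

/* Collect targets under the lock, send after dropping it.
 * Returns the number of devices notified (at most MAX_DEVS). */
static int uio_stop_all(struct fake_dev *head)
{
	struct fake_dev *todo[MAX_DEVS];
	int n = 0;

	model_read_lock();
	for (struct fake_dev *d = head; d != NULL && n < MAX_DEVS; d = d->next)
		if (d->has_uinfo)
			todo[n++] = d;
	model_read_unlock();

	for (int i = 0; i < n; i++)
		send_if_down(todo[i]);
	return n;
}
```

Calling send_if_down() inside the locked loop instead would trip the assertion, which is the model's version of the "sleeping function called from invalid context" warning.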

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: [PATCH] Don't kill iscsid if logout from all nodes fail

2009-07-29 Thread Mike Christie

Erez Zilber wrote:
 If 'iscsiadm -m node --logoutall=all' fails when stopping
 the open-iscsi service, we shouldn't kill iscsid.
 
 This solves the following race:
 1. A logout from a node is initiated by the user.
 2. Before the logout completes, the user runs /etc/init.d/iscsi stop.
The 'stop' method logs out from all nodes. When it tries to logout
from the node that is already logging out (step #1), it fails
because it is already logging out. Then, the 'stop' method kills
iscsid.
 3. The logout command from step #1 returns and notifies the (dead) daemon.
 
 Now, running 'iscsiadm -m session' shows a session (which, actually, doesn't
 exist anymore) and the iscsi service is down.
 
 Signed-off-by: Erez Zilber erezzi.l...@gmail.com
 

Thanks Erez. Merged in a62d1b60856dc3118ab1d07990d43695b336fd69. It 
should be on kernel.org in a little bit.




Re: [PATCH] cnic: Fix ISCSI_KEVENT_IF_DOWN message handling.

2009-07-29 Thread Michael Chan

Mike Christie wrote:

 I don't think you can call this with the cnic_dev_lock held. They have
 the same sleeping restrictions as a spin_lock, right? If so, the problem
 is that iscsi_nl_send_msg calls iscsi_offload_mesg, which uses GFP_NOIO
 and can sleep.


In that case, can I send in a patch to change iscsi_offload_mesg() to
use GFP_ATOMIC?





Re: [PATCH] cnic: Fix ISCSI_KEVENT_IF_DOWN message handling.

2009-07-29 Thread Mike Christie

Michael Chan wrote:
 Mike Christie wrote:
 
 I don't think you can call this with the cnic_dev_lock held. They have
 the same sleeping restrictions as a spin_lock, right? If so, the problem
 is that iscsi_nl_send_msg calls iscsi_offload_mesg, which uses GFP_NOIO
 and can sleep.


 In that case, can I send in a patch to change iscsi_offload_mesg() to
 use GFP_ATOMIC?
 

Yes, I guess so.




Re: LUN Reset TMF and R2T

2009-07-29 Thread Hannes Reinecke

Steven Hayter wrote:
 On 28/07/2009 06:14 pm, Mike Christie wrote:
 On 07/28/2009 06:53 AM, Hannes Reinecke wrote:
 Hi all,

 while running my device-reset testcase I've come across this:

 Jul 28 12:46:08 tyne kernel:  session1: iscsi_eh_device_reset LU Reset [sc 8800731e9480 lun 6]
 Jul 28 12:46:08 tyne kernel:  session1: iscsi_exec_task_mgmt_fn tmf set timeout
 Jul 28 12:46:08 tyne kernel:  session1: mgmtpdu [op 0x2 hdr->itt 0x69 datalen 0]
 Jul 28 12:46:08 tyne kernel:  connection1:0: mgmtpdu [itt 0x69 task 88007b022800] xmit
 Jul 28 12:46:08 tyne kernel:  connection1:0: tmf rsp [itt 0x69] response 0 state 1
 Jul 28 12:46:08 tyne kernel:  session1: iscsi_suspend_tx suspend Tx
 Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_tasks failing sc 88006fd20380 itt 0x54 state 3
 Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_task fail task [sc 88006fd20380 lun 6 itt x54] state 3
 Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_tasks failing sc 88007119b880 itt 0x5d state 3
 Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_task fail task [sc 88007119b880 lun 6 itt x5d] state 3
 Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_tasks failing sc 88007116ec80 itt 0x60 state 3
 Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_task fail task [sc 88007116ec80 lun 6 itt x60] state 3
 Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_tasks failing sc 880079dd8180 itt 0x61 state 3
 Jul 28 12:46:08 tyne kernel:  session1: fail_scsi_task fail task [sc 880079dd8180 lun 6 itt x61] state 3
 Jul 28 12:46:08 tyne kernel:  connection1:0: invalid itt 0x5d in R2T hdr
 Jul 28 12:46:08 tyne kernel:  session1: iscsi_start_tx resume Tx
 Jul 28 12:46:08 tyne kernel:  session1: iscsi_eh_device_reset dev reset result = SUCCESS
 Jul 28 12:46:08 tyne kernel:  connection1:0: invalid itt 0x60 in R2T hdr
 Jul 28 12:46:08 tyne kernel:  connection1:0: invalid itt 0x61 in R2T hdr

 As you can see, we're receiving R2Ts for tasks we've just aborted :-(

 Looking closely, I don't _actually_ think we've received them out-of-order
 (which would be a violation of the RFC). The problem seems to be our skb
 handling (again):

 We're reading an skb, and call the handler function once the PDU is ready.
 However, we're _not_ checking if there is more data to be read from the
 socket. So it looks to me as if we're first reading the TMF response,
 aborting all tasks, and then continuing to read PDUs for tasks which we
 just aborted.

 We will definitely do this. You mean the target sends a tmf response
 that indicates it cleaned up some tasks, then it sends pdus for the
 tasks that should have been affected by the TMF, right? If so I do not
 think targets are allowed to do this. In 3.5.1.4 we have:

  After the Task Management response indicates Task Management function
  completion, the initiator will not receive any additional responses
  from the affected tasks.

 additional responses means scsi response pdus and data-in with status,
 right? Does it also mean R2Ts? I thought it did, so we will just drop
 the session when getting all those pdus we thought the target should not
 be sending.

 If additional responses does not mean R2Ts, then what are we supposed
 to do? Handle them? Silently drop them? I could not find anything in the
 RFC.

 The nasty problem with the code and this scenario is that we preallocate
 the tasks and itts. Once iscsi_eh_device_reset returns SUCCESS and
 cleans up the tasks, the scsi layer can start sending us commands. We
 could then allocate a task/itt that was used before and should have been
 cleaned up. The target could then send us pdus for the cleaned up
 task/itt while we are using the task/itt for a new command. Then Kablewly.
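The reuse hazard described here is that a preallocated ITT can be handed to a new command while the target still has PDUs in flight for its previous owner. One generic defence, sketched below as a hypothetical scheme (not what libiscsi does), is to fold a per-slot generation counter into the ITT, so a PDU carrying an old generation fails the lookup instead of matching the new command. The pool size, bit split and function names are arbitrary choices for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_BITS 7                          /* 128 preallocated tasks */
#define POOL_SIZE (1u << SLOT_BITS)
#define SLOT_MASK (POOL_SIZE - 1)
#define GEN_MASK  ((1u << (32 - SLOT_BITS)) - 1)

struct task_slot {
	uint32_t gen;   /* bumped each time the slot is handed out */
	int in_use;
};

static struct task_slot pool[POOL_SIZE];

/* Hand out slot idx and return the ITT to put on the wire.
 * (Real code must also never produce the reserved ITT 0xffffffff.) */
static uint32_t task_alloc(uint32_t idx)
{
	pool[idx].gen = (pool[idx].gen + 1) & GEN_MASK;
	pool[idx].in_use = 1;
	return (pool[idx].gen << SLOT_BITS) | idx;
}

static void task_free(uint32_t idx)
{
	pool[idx].in_use = 0;
}

/* Map the ITT of an incoming PDU to a slot; -1 means stale or unknown. */
static int task_lookup(uint32_t itt)
{
	uint32_t idx = itt & SLOT_MASK;
	uint32_t gen = itt >> SLOT_BITS;

	if (!pool[idx].in_use || pool[idx].gen != gen)
		return -1;
	return (int)idx;
}
```

With this, an R2T for the old ITT of a reused slot is recognised as stale and can be dropped or treated as a protocol error, rather than driving Data-Out for the wrong command.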
 
 It does look confusing, I think RFC 5048, Section 4.1.2. Clarified 
 Multi-Task Abort Semantics, gives guidelines as to what should happen.
 
 Every way I read it, the target shouldn't be sending R2Ts for tasks which 
 are part of the affected task set.  (those equal or exceeding the CmdSN 
 of the reset TMF).  But I've been wrong in the past.
 
Never mind, found the reason.
Totally different story, but we've been the culprit nevertheless.

iscsi_xmit_task() runs in a loop, disregarding any TMF state.
So we will happily continue sending R2T transfers even though
the LU Reset has already finished.

Patch to follow.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)


[PATCH 0/2] Update TMF handling

2009-07-29 Thread Hannes Reinecke

Hi all,

these two patches update the TMF handling to make it more
efficient and less error prone.

The first patch is just a minor tweak to allow new TMF
tasks as soon as we've received a response for the pending
one. The reasoning here is that e.g. a LUN Reset might take
quite a while to abort all outstanding tasks, during which
time we cannot send any other LUN Reset, even to another
LUN. So obviously, allowing another LUN Reset here
is the right thing to do. And even if we were sending
a LUN Reset to this LUN we wouldn't do any harm, as the
SCSI command abort is protected by a lock, so nothing
will happen here for consecutive LUN Resets.
And of course we're observing the error recovery
hierarchy, so an ABORT TASK will be rejected if LUN
Reset is in progress.
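A minimal model of the gating rule described above, under our reading of the cover letter: only one TMF may be on the wire at a time, a new one may be queued as soon as the pending TMF's response has arrived (even while its tasks are still being aborted), and an ABORT TASK is refused while an LU Reset is still cleaning up. The phase names and may_queue_tmf() are hypothetical, not libiscsi identifiers.

```c
#include <assert.h>

enum tmf_phase { PHASE_IDLE, PHASE_AWAIT_RSP, PHASE_CLEANUP };
enum tmf_func  { FN_ABORT_TASK, FN_LU_RESET };

struct tmf_slot {
	enum tmf_phase phase;
	enum tmf_func func;   /* meaningful only when phase != PHASE_IDLE */
};

/* Returns 1 if a new TMF of type fn may be queued now. */
static int may_queue_tmf(const struct tmf_slot *s, enum tmf_func fn)
{
	/* Only one TMF on the wire: wait for the pending one's response. */
	if (s->phase == PHASE_AWAIT_RSP)
		return 0;
	/* Error-recovery hierarchy: no ABORT TASK while an LU Reset is
	 * still aborting its tasks. */
	if (s->phase == PHASE_CLEANUP && s->func == FN_LU_RESET &&
	    fn == FN_ABORT_TASK)
		return 0;
	/* Otherwise (idle, or cleanup after a response) a new TMF,
	 * such as an LU Reset for another LUN, may go out. */
	return 1;
}
```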

The second patch is the more important one, as it
fixes an error during LUN Reset handling in the
initiator. When sending a LUN Reset during an
ongoing R2T transfer, we're suspending Tx and
aborting all _SCSI_ tasks. However, once we're
done there we're resuming Tx and the R2T transfer
will happily continue. So we should rather be
checking for ongoing TMF tasks in iscsi_task_xmit
and terminate the I/O if the task affects us.
Note we're not actually interested in the outcome
of the TMF task as the I/O will be stopped
anyway even if the TMF task fails.

Cheers,

Hannes




[PATCH 2/2] libiscsi: check for pending TMF during task xmit

2009-07-29 Thread Hannes Reinecke


iscsi_tcp_task_xmit() doesn't check for pending TMF
tasks, so we might happily continue sending R2T data
even though we've already aborted the command.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/libiscsi_tcp.c |   29 +
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/libiscsi_tcp.c b/drivers/scsi/libiscsi_tcp.c
index 2e0746d..83ddb44 100644
--- a/drivers/scsi/libiscsi_tcp.c
+++ b/drivers/scsi/libiscsi_tcp.c
@@ -1000,6 +1000,30 @@ static struct iscsi_r2t_info *iscsi_tcp_get_curr_r2t(struct iscsi_task *task)
return r2t;
 }
 
+static int iscsi_tcp_check_tmf_task(struct iscsi_task *task)
+{
+   struct iscsi_conn *conn = task->conn;
+   struct iscsi_tm *hdr = &conn->tmhdr;
+   unsigned int hdr_lun, task_lun;
+
+   if (hdr->opcode != (ISCSI_OP_SCSI_TMFUNC | ISCSI_OP_IMMEDIATE))
+       return FAILED;
+
+   /* Check for matching LUN */
+   hdr_lun = scsilun_to_int((struct scsi_lun *)hdr->lun);
+   task_lun = scsilun_to_int((struct scsi_lun *)task->lun);
+   if (hdr_lun != task_lun)
+       return FAILED;
+
+   /* Check for matching task */
+   if (ISCSI_TM_FUNC_VALUE(hdr) == ISCSI_TM_FUNC_ABORT_TASK) {
+       if (task->cmdsn != hdr->refcmdsn)
+           return FAILED;
+   }
+
+   return SUCCESS;
+}
+
 /**
  * iscsi_tcp_task_xmit - xmit normal PDU task
  * @task: iscsi command task
@@ -1032,6 +1056,11 @@ flush:
if (task->sc->sc_data_direction != DMA_TO_DEVICE)
return 0;
 
+   /* Check for pending TMF */
+   if (conn->tmf_state != TMF_INITIAL &&
+       iscsi_tcp_check_tmf_task(task) == SUCCESS)
+       return 0;
+
r2t = iscsi_tcp_get_curr_r2t(task);
if (r2t == NULL) {
/* Waiting for more R2Ts to arrive. */
-- 
1.6.0.2
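The matching rule in this patch, restated as a small userspace model: the pending TMF covers a task when the LUNs match and, for ABORT TASK, the task's CmdSN equals the TMF's RefCmdSN; while such a TMF is outstanding, no Data-Out is sent for that task. The structs and function names are simplified, hypothetical stand-ins (LUNs as plain integers instead of scsilun_to_int(), the opcode check reduced to a flag); the function codes 1 and 5 are ABORT TASK and LOGICAL UNIT RESET from RFC 3720.

```c
#include <assert.h>
#include <stdint.h>

enum tm_func { TM_ABORT_TASK = 1, TM_LU_RESET = 5 };

/* Hypothetical stand-in for struct iscsi_tm. */
struct tm_hdr {
	int is_tmf;          /* models opcode == TMFUNC | IMMEDIATE */
	enum tm_func func;
	uint64_t lun;
	uint32_t refcmdsn;   /* only meaningful for ABORT TASK */
};

/* Hypothetical stand-in for struct iscsi_task. */
struct xmit_task {
	uint64_t lun;
	uint32_t cmdsn;
};

/* Same checks as iscsi_tcp_check_tmf_task(): does the TMF cover t? */
static int tmf_covers_task(const struct tm_hdr *hdr, const struct xmit_task *t)
{
	if (!hdr->is_tmf)
		return 0;
	if (hdr->lun != t->lun)
		return 0;
	if (hdr->func == TM_ABORT_TASK && hdr->refcmdsn != t->cmdsn)
		return 0;
	return 1;
}

/* 1 if Data-Out for a pending R2T may be sent for t right now. */
static int may_send_data_out(int tmf_pending, const struct tm_hdr *hdr,
			     const struct xmit_task *t)
{
	return !(tmf_pending && tmf_covers_task(hdr, t));
}
```

An LU Reset stops Data-Out for every task on that LUN, while an ABORT TASK only stops the one task it references.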





Re: [PATCH 0/2] Update TMF handling

2009-07-29 Thread Mike Christie


 
 The second patch is the more important one, as it
 fixes an error during LUN Reset handling in the
 initiator. When sending a LUN Reset during an
 ongoing R2T transfer, we're suspending Tx and
 aborting all _SCSI_ tasks. However, once we're
 done there we're resuming Tx and the R2T transfer
 will happily continue. So we should rather be

This should not be happening. When iscsi_suspend_tx returns the tx 
thread has stopped so we know there are no users accessing the task 
(well, there could be if a target is sending a tmf response then a r2t, 
but if the target is following the rfc there should not be).

So when fail_scsi_tasks calls

fail_scsi_task -> iscsi_complete_task (this will clean up conn->task if 
this is the same task) -> __iscsi_put_task

this should be the last put on the task, and that should release it by 
calling iscsi_free_task, which should call cleanup_task to kill any 
pending r2t handling and remove it from the requeue list.

If we are sending a data-out for a task that has had fail_scsi_task 
-> iscsi_complete_task -> __iscsi_put_task called for it, then we are in 
bigger trouble, because the last put should have been called on it and we 
are accessing a bad task.
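The task lifetime described above, as a toy reference-count model: whoever drops the last reference frees the task, and freeing runs the cleanup that discards pending R2T work and unlinks the task from the requeue list. The names loosely follow the call chain in the message, but model_task, get_task(), put_task() and the fields are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of a libiscsi-style task. */
struct model_task {
	int refcount;
	int pending_r2ts;   /* R2Ts queued but not yet answered */
	int on_requeue;     /* models membership of conn->requeue */
	int freed;
};

/* Models cleanup_task(): drop outstanding R2T work, unlink from requeue. */
static void cleanup_task(struct model_task *t)
{
	t->pending_r2ts = 0;
	t->on_requeue = 0;
}

/* Models iscsi_free_task(): cleanup, then release. */
static void free_task(struct model_task *t)
{
	cleanup_task(t);
	t->freed = 1;
}

static void get_task(struct model_task *t) { t->refcount++; }

/* Models __iscsi_put_task(): the last put frees the task. */
static void put_task(struct model_task *t)
{
	assert(t->refcount > 0);
	if (--t->refcount == 0)
		free_task(t);
}
```

In this model, anything still holding a pointer after the final put_task() is using a freed task, which is the "bigger trouble" case above.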




Re: [PATCH 0/2] Update TMF handling

2009-07-29 Thread Hannes Reinecke

Mike Christie wrote:
 
 The second patch is the more important one, as it
 fixes an error during LUN Reset handling in the
 initiator. When sending a LUN Reset during an
 ongoing R2T transfer, we're suspending Tx and
 aborting all _SCSI_ tasks. However, once we're
 done there we're resuming Tx and the R2T transfer
 will happily continue. So we should rather be
 
 This should not be happening. When iscsi_suspend_tx returns the tx 
 thread has stopped so we know there are no users accessing the task 
 (well, there could be if a target is sending a tmf response then a r2t, 
 but if the target is following the rfc there should not be).
 
 So when fail_scsi_tasks calls
 
 fail_scsi_task -> iscsi_complete_task (this will cleanup conn->task if 
 this is the same task) -> __iscsi_put_task
 
 this should be the last put on the task and that should release it 
 calling iscsi_free_task which should call cleanup_task to kill any 
 pending r2t handling and it would remove it from the requeue list.
 
 If we are sending a data-out for a task that has had fail_scsi_task 
 -> iscsi_complete_task -> __iscsi_put_task called for it then we are in 
 bigger trouble because the last put should have been called on it and we 
 are accessing a bad task.
 
I fully agree, this is something which shouldn't happen.

However, using this patch stops me from receiving invalid R2T PDUs.
So I can't be that far off the mark.

Cheers,

Hannes




Re: [PATCH] libiscsi: Update queuecommand status return codes

2009-07-29 Thread Mike Christie

Hannes Reinecke wrote:
 For multipathing we should make sure to return a DID_TRANSPORT_XX
 result code whenever applicable; this will ensure a fast failover
 to other paths if this one is temporarily out of order.
 
 Signed-off-by: Hannes Reinecke h...@suse.de
 ---
  drivers/scsi/libiscsi.c |   11 ---
  1 files changed, 4 insertions(+), 7 deletions(-)
 
 diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
 index 716cc34..fc10544 100644
 --- a/drivers/scsi/libiscsi.c
 +++ b/drivers/scsi/libiscsi.c
 @@ -1429,12 +1429,6 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
  session = cls_session->dd_data;
  spin_lock(&session->lock);
  
 - reason = iscsi_session_chkready(cls_session);
 - if (reason) {
 - sc->result = reason;
 - goto fault;
 - }
 -
  if (session->state != ISCSI_STATE_LOGGED_IN) {
  /*
* to handle the race between when we set the recovery state
 @@ -1444,6 +1438,9 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
*/
  switch (session->state) {
  case ISCSI_STATE_FAILED:
 + reason = FAILURE_SESSION_FAILED;
 + sc->result = DID_TRANSPORT_DISRUPTED << 16;
 + break;

This probably speeds up the failover time by accident, because the 
retries/allowed counter check runs out before the 
replacement_timeout/recovery_timeout (fast io fail in fc class terms) 
timer has fired.

The session can be in the failed state for a couple of seconds while we 
transition the sdevs to blocked. During that time we do not want the 
retries to be decremented.
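To make the retry accounting concrete: the host byte sits in bits 16..23 of sc->result, hence the "DID_xxx << 16" in the patch. In our simplified reading of the SCSI midlayer's scsi_decide_disposition(), DID_IMM_RETRY is retried without touching the command's retry count, while DID_TRANSPORT_DISRUPTED consumes one retry per requeue. So if the session sits in ISCSI_STATE_FAILED for a few seconds and every queuecommand returns DID_TRANSPORT_DISRUPTED, the allowed count can run out before the recovery timeout fires. The model below covers that accounting only; model_cmd and model_retry() are hypothetical, and the DID_* values should be checked against include/scsi/scsi.h in your tree.

```c
#include <assert.h>

/* Host-byte codes as found in include/scsi/scsi.h of that era. */
#define DID_IMM_RETRY           0x0c
#define DID_TRANSPORT_DISRUPTED 0x0e
#define DID_TRANSPORT_FAILFAST  0x0f

/* The host byte occupies bits 16..23 of the SCSI result word. */
#define host_byte(result) (((result) >> 16) & 0xff)

struct model_cmd {
	int result;
	int retries;
	int allowed;
};

/* Simplified disposition: 1 = retried, 0 = completed with error. */
static int model_retry(struct model_cmd *cmd)
{
	switch (host_byte(cmd->result)) {
	case DID_IMM_RETRY:
		return 1;                              /* no retry consumed */
	case DID_TRANSPORT_DISRUPTED:
		return ++cmd->retries <= cmd->allowed; /* each requeue burns one */
	default:
		return 0;                              /* e.g. DID_TRANSPORT_FAILFAST */
	}
}
```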



   case ISCSI_STATE_IN_RECOVERY:
   reason = FAILURE_SESSION_IN_RECOVERY;
  sc->result = DID_IMM_RETRY << 16;
 @@ -1462,7 +1459,7 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
  break;
  default:
  reason = FAILURE_SESSION_FREED;
 - sc->result = DID_NO_CONNECT << 16;
 + sc->result = DID_TRANSPORT_FAILFAST << 16;

I am not sure why you are changing this one. When are you hitting it? 
What is the session->state?


   }
   goto fault;
   }





Re: [PATCH 0/2] Update TMF handling

2009-07-29 Thread Mike Christie

Hannes Reinecke wrote:
 This is the log I'm getting:
 
 
 Jul 29 10:34:48 tyne kernel:  session1: iscsi_eh_device_reset LU Reset [sc 88007b94d080 lun 6]
 Jul 29 10:34:48 tyne kernel:  session1: iscsi_exec_task_mgmt_fn tmf set timeout
 Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x3a lun 6 abort transfer
 Jul 29 10:34:48 tyne kernel:  session1: mgmtpdu [op 0x2 hdr->itt 0x5d datalen 0]
 Jul 29 10:34:48 tyne kernel:  connection1:0: mgmtpdu [itt 0x5d task 88007a01fc00] xmit
 Jul 29 10:34:48 tyne kernel:  connection1:0: tmf rsp [itt 0x5d] response 0 state 1
 Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x72 lun 6 abort transfer
 Jul 29 10:34:48 tyne kernel:  session1: iscsi_suspend_tx suspend Tx
 Jul 29 10:34:48 tyne kernel:  session1: iscsi_complete_task task itt 0x72 sc 88007b5bc580 still active
 Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x57 lun 6 abort transfer
 Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x59 lun 6 abort transfer
 Jul 29 10:34:48 tyne kernel:  session1: Tx suspended!
 
 So we would indeed have continued the R2T tasks (itt 0x57 and itt 0x59)
 even though we had already received a valid TMF response.
 So I'm afraid it's us ...

Ah, I misunderstood you. I do not think it has to do with the cleanup 
still leaving r2ts. I am not sure where you are putting printks, but I 
think it is this:

 while (!list_empty(&conn->requeue)) {
 if (conn->session->fast_abort && conn->tmf_state != TMF_INITIAL)
 break;

Once the tmf completes, we will start sending data again.

This sort of lines up with where I think you put your printks. Is 
"iscsi_suspend_tx suspend Tx" getting printed out before or after the 
flush_workqueue?


 
 I really do think part of the problem is that we are setting the SUSPEND bit
 without holding the session lock. We _check_ it under the lock in
 iscsi_xmit(), but setting it is done without the lock.
 Which of course causes all sorts of race conditions.

Yeah, we use atomics to set it, then just do an if (conn->suspend_tx) 
under the lock to test it.




Re: [PATCH 0/2] Update TMF handling

2009-07-29 Thread Mike Christie

Mike Christie wrote:
 Ah, I misunderstood you. I do not think it has to do with the cleanup 
 still leaving r2ts. I am not sure where you are putting printks, but I 
 think it is this:
 
  while (!list_empty(&conn->requeue)) {
  if (conn->session->fast_abort && conn->tmf_state != TMF_INITIAL)
  break;
 
 Once the tmf completes, we will start sending data again.
 

Ooops. I am too sleepy. Ignore that. I am wrong there.




Re: [PATCH 0/2] Update TMF handling

2009-07-29 Thread Mike Christie

Mike Christie wrote:
 Mike Christie wrote:
 Mike Christie wrote:
 Mike Christie wrote:
 Hannes Reinecke wrote:
 Mike Christie wrote:
 The second patch is the more important one, as it
 fixes an error during LUN Reset handling in the
 initiator. When sending a LUN Reset during an
 ongoing R2T transfer, we're suspending Tx and
 aborting all _SCSI_ tasks. However, once we're
 done there we're resuming Tx and the R2T transfer
 will happily continue. So we should rather be
 This should not be happening. When iscsi_suspend_tx returns the tx 
 thread has stopped so we know there are no users accessing the task 
 (well, there could be if a target is sending a tmf response then a r2t, 
 but if the target is following the rfc there should not be).

 So when fail_scsi_tasks calls

 fail_scsi_task -iscsi_complete_task (this will cleanup conn-task if 
 this is the same task) - __iscsi_put_task

 this should be the last put on the task and that should release it 
 calling iscsi_free_task which should call cleanup_task to kill any 
 pending r2t handling and it would remove it from the requeue list.

 If we are sending a data-out for a task that has had fail_scsi_task 
 -iscsi_complete_task - __iscsi_put_task called for it then we are in 
 bigger trouble because the last put should have been called on it and we 
   are accessing a bad task.

 This is the log I'm getting:


 Jul 29 10:34:48 tyne kernel:  session1: iscsi_eh_device_reset LU Reset [sc 88007b94d080 lun 6]
 Jul 29 10:34:48 tyne kernel:  session1: iscsi_exec_task_mgmt_fn tmf set timeout
 Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x3a lun 6 abort transfer
 Jul 29 10:34:48 tyne kernel:  session1: mgmtpdu [op 0x2 hdr->itt 0x5d datalen 0]
 Jul 29 10:34:48 tyne kernel:  connection1:0: mgmtpdu [itt 0x5d task 88007a01fc00] xmit
 Jul 29 10:34:48 tyne kernel:  connection1:0: tmf rsp [itt 0x5d] response 0 state 1
 Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x72 lun 6 abort transfer
 Jul 29 10:34:48 tyne kernel:  session1: iscsi_suspend_tx suspend Tx
 Jul 29 10:34:48 tyne kernel:  session1: iscsi_complete_task task itt 0x72 sc 88007b5bc580 still active
 Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x57 lun 6 abort transfer
 Jul 29 10:34:48 tyne kernel:  connection1:0: task itt 0x59 lun 6 abort transfer
 Jul 29 10:34:48 tyne kernel:  session1: Tx suspended!

 So we would indeed have continued the R2T tasks (itt 0x57 and itt 0x59)
 even though we had already received a valid TMF response.
 So I'm afraid it's us...
 Ah, I misunderstood you. I do not think it has to do with the cleanup 
 still leaving r2ts. I am not sure where you are putting printks, but I 
 think it is this:

 	while (!list_empty(&conn->requeue)) {
 		if (conn->session->fast_abort && conn->tmf_state != TMF_INITIAL)
 			break;

 Once the tmf completes, we will start sending data again.

 Oops. I am too sleepy. Ignore that. I am wrong there.

 I guess if fast_abort is 0 though, we will hit this problem. And we will 
 send data-outs when getting tmf responses as well as when we are sending 
 the tmf.
 
 
 
 I think the problem is wording like this in RFC 3720, section 10.5.1:
 
 For ABORT TASK SET and CLEAR TASK SET, the issuing initiator MUST
 continue to respond to all valid target transfer tags (received via
 R2T, Text Response, NOP-In, or SCSI Data-In PDUs) related to the
 affected task set, even after issuing the task management request.
 
 I think in some other doc (probably the one Mathew and Ulrich mentioned) 
 there is similar wording for ABORT TASK and LU resets.
 
 The thing is that I think half of the targets want us to respond to R2Ts 
 and half do not. This is where fast_abort comes from. If set then we 
 reply to R2Ts and if not set we do not. I think once we get a successful 

Fudge. I am really going to bed now. I mean: if it is set, we do not reply 
to R2Ts. If it is not set, then we reply.
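With that correction, fast_abort set means the initiator stops answering R2Ts while a TMF is outstanding, and fast_abort clear means it keeps answering (as RFC 3720 10.5.1 requires for ABORT TASK SET and CLEAR TASK SET). A small sketch of that decision, mirroring the requeue-loop check quoted earlier; the demo_* names and the enum ordering are assumptions made for this illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed copy of the libiscsi TMF states, for illustration only. */
enum demo_tmf_state {
	DEMO_TMF_INITIAL,
	DEMO_TMF_QUEUED,
	DEMO_TMF_SUCCESS,
	DEMO_TMF_FAILED,
	DEMO_TMF_TIMEDOUT,
	DEMO_TMF_NOT_FOUND,
};

/*
 * Should a Data-Out be sent in answer to an R2T?
 * No TMF in progress: always.  TMF in progress: only when fast_abort
 * is clear, i.e. the target expects us to keep answering R2Ts.
 */
static bool demo_send_data_out(bool fast_abort, enum demo_tmf_state tmf_state)
{
	assert(tmf_state <= DEMO_TMF_NOT_FOUND);
	if (tmf_state == DEMO_TMF_INITIAL)
		return true;
	return !fast_abort;
}
```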




Re: [PATCH 2/2] libiscsi: check for pending TMF during task xmit

2009-07-29 Thread Mike Christie

Hannes Reinecke wrote:
 iscsi_tcp_task_xmit() doesn't check for pending TMF
 tasks, so we might happily continue sending R2T data
 even though we've already aborted the command.
 
 Signed-off-by: Hannes Reinecke <h...@suse.de>


The patch is better than how we stop all R2T processing right now, so 
even if the problem is just us not checking the suspend bit correctly, I 
think this patch makes a nice improvement.

Some comments.


 ---
  drivers/scsi/libiscsi_tcp.c |   29 +
  1 files changed, 29 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/scsi/libiscsi_tcp.c b/drivers/scsi/libiscsi_tcp.c
 index 2e0746d..83ddb44 100644
 --- a/drivers/scsi/libiscsi_tcp.c
 +++ b/drivers/scsi/libiscsi_tcp.c
 @@ -1000,6 +1000,30 @@ static struct iscsi_r2t_info *iscsi_tcp_get_curr_r2t(struct iscsi_task *task)
   return r2t;
  }
  
 +static int iscsi_tcp_check_tmf_task(struct iscsi_task *task)
 +{
 + struct iscsi_conn *conn = task->conn;
 + struct iscsi_tm *hdr = &conn->tmhdr;
 + unsigned int hdr_lun, task_lun;
 +
 + if (hdr->opcode != (ISCSI_OP_SCSI_TMFUNC | ISCSI_OP_IMMEDIATE))

Could you just mask the opcode off and not assume other bits are set or 
not set?

if ((hdr->opcode & ISCSI_OPCODE_MASK) == ISCSI_OP_SCSI_TMFUNC)
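To illustrate why the masked compare is more robust, here is a sketch using the values that, as far as I know, include/scsi/iscsi_proto.h uses (0x02 for a TMF request, 0x40 for the immediate bit, 0x3F for the opcode mask); the DEMO_ prefixes mark them as local copies:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ISCSI_OP_IMMEDIATE   0x40
#define DEMO_ISCSI_OPCODE_MASK    0x3F
#define DEMO_ISCSI_OP_SCSI_TMFUNC 0x02

/* Exact compare, as in the patch: fails if the immediate bit is clear. */
static bool demo_is_tmf_exact(uint8_t opcode)
{
	return opcode == (DEMO_ISCSI_OP_SCSI_TMFUNC | DEMO_ISCSI_OP_IMMEDIATE);
}

/* Masked compare, as suggested: ignores the immediate and reserved bits. */
static bool demo_is_tmf_masked(uint8_t opcode)
{
	return (opcode & DEMO_ISCSI_OPCODE_MASK) == DEMO_ISCSI_OP_SCSI_TMFUNC;
}
```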



 + return FAILED;

Could you not reuse the SCSI EH return values here? Just use 0 and an 
EXXX value.


 +
 + /* Check for matching LUN */
 + hdr_lun = scsilun_to_int((struct scsi_lun *)hdr->lun);
 + task_lun = scsilun_to_int((struct scsi_lun *)task->lun);
 + if (hdr_lun != task_lun)
 + return FAILED;
 +
 + /* Check for matching task */
 + if (ISCSI_TM_FUNC_VALUE(hdr) == ISCSI_TM_FUNC_ABORT_TASK) {
 + if (task->cmdsn != hdr->refcmdsn)
 + return FAILED;
 + }
 +
 + return SUCCESS;
 +}
 +
  /**
   * iscsi_tcp_task_xmit - xmit normal PDU task
   * @task: iscsi command task
 @@ -1032,6 +1056,11 @@ flush:
   if (task->sc->sc_data_direction != DMA_TO_DEVICE)
   return 0;
  
 + /* Check for pending TMF */


Could you add a check so that this is only done if fast_abort is set? If it 
is not set, it means targets want us to respond to R2Ts while the TMF is in flight.
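Putting the review comments together, a hedged sketch of how the helper and the xmit-side check could look: opcode masked, 0 or a negative errno instead of SUCCESS/FAILED, and the skip applied only when fast_abort is set. The structs are hypothetical stand-ins, LUNs are compared with memcmp() on the raw 8-byte field instead of scsilun_to_int(), and -EINVAL/-ENOENT are arbitrary choices for the "EXXX" value:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ISCSI_OPCODE_MASK        0x3F
#define DEMO_ISCSI_OP_SCSI_TMFUNC     0x02
#define DEMO_ISCSI_FLAG_TM_FUNC_MASK  0x7F
#define DEMO_ISCSI_TM_FUNC_ABORT_TASK 1

/* Hypothetical, simplified stand-ins for struct iscsi_tm / iscsi_task. */
struct demo_tm {
	uint8_t opcode;
	uint8_t flags;     /* low 7 bits carry the TMF function code */
	uint8_t lun[8];
	uint32_t refcmdsn; /* host order in this sketch */
};

struct demo_task {
	uint8_t lun[8];
	uint32_t cmdsn;
};

/* Returns 0 if the pending TMF covers @task, a negative errno otherwise. */
static int demo_check_tmf_task(const struct demo_tm *hdr,
			       const struct demo_task *task)
{
	if ((hdr->opcode & DEMO_ISCSI_OPCODE_MASK) != DEMO_ISCSI_OP_SCSI_TMFUNC)
		return -EINVAL;
	if (memcmp(hdr->lun, task->lun, sizeof(hdr->lun)) != 0)
		return -ENOENT;
	/* ABORT TASK only covers the task it references. */
	if ((hdr->flags & DEMO_ISCSI_FLAG_TM_FUNC_MASK) ==
	    DEMO_ISCSI_TM_FUNC_ABORT_TASK && hdr->refcmdsn != task->cmdsn)
		return -ENOENT;
	return 0;
}

/* Skip the Data-Out only with fast_abort set and a TMF covering the task. */
static bool demo_skip_data_out(bool fast_abort, bool tmf_pending,
			       const struct demo_tm *hdr,
			       const struct demo_task *task)
{
	return fast_abort && tmf_pending && demo_check_tmf_task(hdr, task) == 0;
}
```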


 + if (conn->tmf_state != TMF_INITIAL &&
 + iscsi_tcp_check_tmf_task(task) == SUCCESS)
 + return 0;
 +
   r2t = iscsi_tcp_get_curr_r2t(task);
   if (r2t == NULL) {
   /* Waiting for more R2Ts to arrive. */





Re: [PATCH 0/2] Update TMF handling

2009-07-29 Thread Hannes Reinecke

Mike Christie wrote:
 Fudge. I am really going to bed now. I mean: if it is set, we do not reply 
 to R2Ts. If it is not set, then we reply.
 
Actually, I think it's a race condition:

drivers/scsi/libiscsi.c:iscsi_eh_device_reset()
	rc = SUCCESS;
	spin_unlock_bh(&session->lock);

	iscsi_suspend_tx(conn);

So the workqueue thread could wedge in after we've
unlocked the session lock and start sending data
even though we're meant to suspend transmitting here.

I will try it.
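The ordering problem can be sketched in userspace with a mutex standing in for the session lock. This is an illustration under assumptions, not libiscsi code: demo_conn and its flags are hypothetical, and unlike iscsi_suspend_tx() the sketch does not wait for an xmit that is already running.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection: one lock guarding the TMF and Tx flags. */
struct demo_conn {
	pthread_mutex_t lock;
	bool tmf_done;
	bool tx_suspended;
};

static void demo_conn_init(struct demo_conn *conn)
{
	pthread_mutex_init(&conn->lock, NULL);
	conn->tmf_done = false;
	conn->tx_suspended = false;
}

/* What the xmit worker checks before sending anything. */
static bool demo_xmit_allowed(struct demo_conn *conn)
{
	bool ok;

	pthread_mutex_lock(&conn->lock);
	ok = !conn->tx_suspended;
	pthread_mutex_unlock(&conn->lock);
	return ok;
}

/* Quoted order, step 1: finish TMF bookkeeping and drop the lock. */
static void demo_finish_tmf_then_unlock(struct demo_conn *conn)
{
	pthread_mutex_lock(&conn->lock);
	conn->tmf_done = true;
	pthread_mutex_unlock(&conn->lock);
	/* Window: the worker can run here and still see tx_suspended == false. */
}

/* Quoted order, step 2: suspend Tx afterwards. */
static void demo_suspend_tx(struct demo_conn *conn)
{
	pthread_mutex_lock(&conn->lock);
	conn->tx_suspended = true;
	pthread_mutex_unlock(&conn->lock);
}

/* Window closed: mark Tx suspended before the lock is released. */
static void demo_finish_tmf_suspended(struct demo_conn *conn)
{
	pthread_mutex_lock(&conn->lock);
	conn->tmf_done = true;
	conn->tx_suspended = true;
	pthread_mutex_unlock(&conn->lock);
}
```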

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: [PATCH 0/2] Update TMF handling

2009-07-29 Thread Mike Christie
Hannes Reinecke wrote:

 Actually, I think it's a race condition:
 
 drivers/scsi/libiscsi.c:iscsi_eh_device_reset()
 	rc = SUCCESS;
 	spin_unlock_bh(&session->lock);
 
 	iscsi_suspend_tx(conn);
 
 So the workqueue thread could wedge in after we've
 unlocked the session lock and start sending data
 even though we're meant to suspend transmitting here.
 
 Will be trying it.
 

U, you are right. And we are probably hitting this:

	/* process pending command queue */
	while (!list_empty(&conn->cmdqueue)) {
		if (conn->tmf_state == TMF_QUEUED)
			break;

Once the tmf completes we start sending new commands, because the 
tmf_state changes.
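A sketch of the gate quoted above, plus a serial-number comparison for 32-bit CmdSN values (RFC 1982 style, which iSCSI uses for its sequence numbers) of the kind a cleanup would need in order to tell commands issued after the TMF apart from older ones. The names are hypothetical and the enum only lists the states needed here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum demo_tmf_state { DEMO_TMF_INITIAL, DEMO_TMF_QUEUED, DEMO_TMF_SUCCESS };

/* Quoted loop: new commands are held back only while the TMF is queued. */
static bool demo_may_send_new_cmd(enum demo_tmf_state st)
{
	return st != DEMO_TMF_QUEUED;
}

/* Serial-number "a comes before b" for 32-bit CmdSN, wraparound-safe. */
static bool demo_sn_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}
```

With a TMF at CmdSN 10, demo_sn_before(9, 10) is true while demo_sn_before(11, 10) is false, and the comparison keeps working across the 2^32 wrap.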

But then if the TMF had CmdSN 10 and that completes, we start sending 
new commands in the loop above (tmf_state == TMF_SUCCESS), and 
iscsi_eh_device_reset will clean up cmds with SN less than 10 and will 
also clean up cmds with SN higher than 10. We clean everything up. But cmds 
with CmdSN 11 are OK and 

Kernel panic -not syncing :Fatal Exception ---Process istoid1

2009-07-29 Thread Raj

I am trying to configure the Linux VTL (open iscsi) using the VMware
iSCSI initiator from an ESXi box. It does get configured, but the Linux
VTL panics and freezes when the ESXi box starts scanning
for devices. Any suggestions, please?


The kernel panic occurs in process istoid1:

<0>Kernel panic - not syncing: Fatal exception

Regards
Raj









Re: Kernel panic -not syncing :Fatal Exception ---Process istoid1

2009-07-29 Thread Arne Redlich

On Wednesday, 29.07.2009, at 19:14 +0530, Raj wrote:
 Screenshot of the error message attached:
 
 

This problem does not seem to be related to the open-iscsi initiator. What
is this Linux VTL? Do you have any links?

Thanks,
Arne







Re: [PATCH] iscsi: Use GFP_ATOMIC in iscsi_offload_mesg().

2009-07-29 Thread Mike Christie

On 07/29/2009 01:49 PM, Michael Chan wrote:
 Changing to GFP_ATOMIC because the only caller in cnic/bnx2i may
 be calling this function while holding spin_lock.

 This problem was discovered by Mike Christie.

 Signed-off-by: Michael Chan <mc...@broadcom.com>
 ---
   drivers/scsi/scsi_transport_iscsi.c |4 ++--
   1 files changed, 2 insertions(+), 2 deletions(-)

 diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
 index 783e33c..b47240c 100644
 --- a/drivers/scsi/scsi_transport_iscsi.c
 +++ b/drivers/scsi/scsi_transport_iscsi.c
 @@ -990,7 +990,7 @@ int iscsi_offload_mesg(struct Scsi_Host *shost,
   struct iscsi_uevent *ev;
   int len = NLMSG_SPACE(sizeof(*ev) + data_size);

 - skb = alloc_skb(len, GFP_NOIO);
 + skb = alloc_skb(len, GFP_ATOMIC);
   if (!skb) {
   printk(KERN_ERR "can not deliver iscsi offload message:OOM\n");
   return -ENOMEM;
 @@ -1012,7 +1012,7 @@ int iscsi_offload_mesg(struct Scsi_Host *shost,

   memcpy((char *)ev + sizeof(*ev), data, data_size);

 - return iscsi_multicast_skb(skb, ISCSI_NL_GRP_UIP, GFP_NOIO);
 + return iscsi_multicast_skb(skb, ISCSI_NL_GRP_UIP, GFP_ATOMIC);
   }
   EXPORT_SYMBOL_GPL(iscsi_offload_mesg);


Normally my preference would be to use GFP_NOIO and change the locking, but 
if the locking changes are going to be a problem, then this is OK with 
me, since we can fail allocations in other parts of the code.

Acked-by: Mike Christie <micha...@cs.wisc.edu>

Dave Miller probably wants to take this into his tree, since it fixes a bug 
in this patch:
http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=6d7760a88c25057c2c2243e5dfe2d731064bd31d




Re: [PATCH] Add logging to scsi_transport_iscsi.c

2009-07-29 Thread Mike Christie
On 07/26/2009 08:48 AM, Erez Zilber wrote:
 I've attached a new version. I hope it's better. Whenever possible,
 there's a dbg statement before and after. For example, if we free the
 conn object, I can't put a dbg call after it (because conn is already
 NULL). If you still see specific things that need to be fixed, let me
 know.


Thanks for the work on this.

How about the attached?
- I added a ':' between the function name and the debug output.
- Removed some extra newlines.
- Tried to add dbg statements at the top and end of functions that can 
take a long time or fail in odd ways because they call into the SCSI 
layer, like scanning, blocking, target removal, etc. For functions 
like allocation, adding, destroying and freeing, I tried to just add a 
dbg statement at the top or end of the function.
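The ISCSI_DBG_TRANS_* macros in the patch follow a common pattern: a runtime flag (the module parameter), a do { } while (0) wrapper, and a "%s: " prefix filled with __func__. A userspace sketch of the same pattern, with hypothetical names and a buffer in place of iscsi_cls_session_printk(); ##__VA_ARGS__ is the GNU extension equivalent to the kernel's arg.../##arg. The conventional form ends in `} while (0)` without a trailing semicolon, which keeps the macro usable in an unbraced if/else:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static int dbg_session;          /* stands in for the debug_session param */
static char demo_last_msg[256];  /* last message, kept for inspection */

/* Hypothetical stand-in for iscsi_cls_session_printk(). */
static void demo_session_printk(const char *name, const char *fmt, ...)
{
	va_list ap;
	int n = snprintf(demo_last_msg, sizeof(demo_last_msg), "%s: ", name);

	if (n < 0 || (size_t)n >= sizeof(demo_last_msg))
		return;
	va_start(ap, fmt);
	vsnprintf(demo_last_msg + n, sizeof(demo_last_msg) - (size_t)n, fmt, ap);
	va_end(ap);
	fputs(demo_last_msg, stderr);
}

/* No trailing semicolon after while (0): the caller supplies it. */
#define DEMO_DBG_SESSION(name, fmt, ...)				\
	do {								\
		if (dbg_session)					\
			demo_session_printk(name, "%s: " fmt,		\
					    __func__, ##__VA_ARGS__);	\
	} while (0)

static void demo_scan_session(const char *name)
{
	DEMO_DBG_SESSION(name, "Scanning session\n");
}

/* Compiles only because the macro does not end in "while (0);". */
static void demo_release(const char *name, int ok)
{
	if (ok)
		DEMO_DBG_SESSION(name, "released with %d refs\n", 0);
	else
		DEMO_DBG_SESSION(name, "release failed\n");
}
```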

The patch was made over the linux-2.6-iscsi tree iscsi branch.


diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index b47240c..5d765f5 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -36,6 +36,38 @@
 
 #define ISCSI_TRANSPORT_VERSION "2.0-870"
 
+static int dbg_session;
+module_param_named(debug_session, dbg_session, int,
+		   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(debug_session,
+		 "Turn on debugging for sessions in scsi_transport_iscsi "
+		 "module. Set to 1 to turn on, and zero to turn off. Default "
+		 "is off.");
+
+static int dbg_conn;
+module_param_named(debug_conn, dbg_conn, int,
+		   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(debug_conn,
+		 "Turn on debugging for connections in scsi_transport_iscsi "
+		 "module. Set to 1 to turn on, and zero to turn off. Default "
+		 "is off.");
+
+#define ISCSI_DBG_TRANS_SESSION(_session, dbg_fmt, arg...)		\
+	do {								\
+		if (dbg_session)					\
+			iscsi_cls_session_printk(KERN_INFO, _session,	\
+						 "%s: " dbg_fmt,	\
+						 __func__, ##arg);	\
+	} while (0);
+
+#define ISCSI_DBG_TRANS_CONN(_conn, dbg_fmt, arg...)			\
+	do {								\
+		if (dbg_conn)						\
+			iscsi_cls_conn_printk(KERN_INFO, _conn,		\
+					      "%s: " dbg_fmt,		\
+					      __func__, ##arg);		\
+	} while (0);
+
 struct iscsi_internal {
 	struct scsi_transport_template t;
 	struct iscsi_transport *iscsi_transport;
@@ -377,6 +409,7 @@ static void iscsi_session_release(struct device *dev)
 
 	shost = iscsi_session_to_shost(session);
 	scsi_host_put(shost);
+	ISCSI_DBG_TRANS_SESSION(session, "Completing session release\n");
 	kfree(session);
 }
 
@@ -441,6 +474,9 @@ static int iscsi_user_scan_session(struct device *dev, void *data)
 		return 0;
 
 	session = iscsi_dev_to_session(dev);
+
+	ISCSI_DBG_TRANS_SESSION(session, "Scanning session\n");
+
 	shost = iscsi_session_to_shost(session);
 	ihost = shost->shost_data;
 
@@ -448,8 +484,7 @@ static int iscsi_user_scan_session(struct device *dev, void *data)
 	spin_lock_irqsave(&session->lock, flags);
 	if (session->state != ISCSI_SESSION_LOGGED_IN) {
 		spin_unlock_irqrestore(&session->lock, flags);
-		mutex_unlock(&ihost->mutex);
-		return 0;
+		goto user_scan_exit;
 	}
 	id = session->target_id;
 	spin_unlock_irqrestore(&session->lock, flags);
@@ -462,7 +497,10 @@ static int iscsi_user_scan_session(struct device *dev, void *data)
 		scsi_scan_target(&session->dev, 0, id,
 				 scan_data->lun, 1);
 	}
+	
+user_scan_exit:
 	mutex_unlock(&ihost->mutex);
+	ISCSI_DBG_TRANS_SESSION(session, "Completed session scan\n");
 	return 0;
 }
 
@@ -522,7 +560,9 @@ static void session_recovery_timedout(struct work_struct *work)
 	if (session->transport->session_recovery_timedout)
 		session->transport->session_recovery_timedout(session);
 
+	ISCSI_DBG_TRANS_SESSION(session, "Unblocking SCSI target\n");
 	scsi_target_unblock(&session->dev);
+	

Re: iscsiadm -m iface + routing

2009-07-29 Thread julian thomas
Hello,

Could you please send the MIB and snmpwalk output of the EqualLogic? If it
supports SMI-S, could you post the MOF files for it? Or is there any other
way (CLI interface) to monitor EqualLogic?

On Tue, Jul 28, 2009 at 11:42 PM, Mike Christie <micha...@cs.wisc.edu> wrote:


 Ulrich Windl wrote:
  On 28 Jul 2009 at 0:22, Moi meme wrote:
 
  Hello,
 
  I am using a DELL EqualLogic at work with SLES10 SP2 (it was
  SP1 before last weekend). Are there known problems with SLES SP2?
  I haven't noticed any problem since the upgrade!
 
  Same here: only when the network has a problem do I see _many_ messages.
  The only problem (not iSCSI) is that the links in /dev/disk/by-id are not
  reliably populated after boot. This may be a multipath/udev feature. As we
  boot very rarely, I did not put much effort into examining this...
 

 What EQL firmware are you using? If you run the show command on the
 EQL box, the version is in its output.

 I was having a similar problem; I updated the firmware to 4.1.4 and it
 has been working for me since. For some reason the udev scsi_id callout
 would send some commands to the target, and the target would never respond.

 





Re: undefined reference to `strlcpy' when building iscsid

2009-07-29 Thread hostile

Hi,

you need to compile sysdeps.c in utils/sysdeps first. Then it will
work.
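The linker error means strlcpy() is not provided by the C library (glibc did not ship it at the time), which is why utils/sysdeps/sysdeps.c has to be built and linked in. For reference, a minimal sketch of the BSD semantics such a fallback is expected to provide; it is not the open-iscsi source, and it is named demo_strlcpy to avoid clashing with a libc that does have strlcpy():

```c
#include <assert.h>
#include <string.h>

/*
 * BSD-style strlcpy(): copy at most size - 1 bytes, always NUL-terminate
 * when size > 0, and return strlen(src) so callers can detect truncation.
 */
static size_t demo_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size) {
		size_t n = len >= size ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

A return value greater than or equal to the buffer size signals that the copy was truncated.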

Best regards




[PATCH 2/2] bnx2i : Fix cid #n not valid issue

2009-07-29 Thread Anil Veerabhadrappa


* When bnx2i_adapter_ready() fails, connection handle (cid) 0 is
  wrongly freed because a 'cid' has not yet been allocated for the endpoint
* Fix is to initialize bnx2i_ep->ep_iscsi_cid to '-1' in bnx2i_alloc_ep()
  and not in bnx2i_ep_connect(), to avoid releasing an invalid 'cid'
* There is already a check in bnx2i_free_iscsi_cid() not to free an
  invalid iscsi connection handle (-1)
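A sketch of the sentinel pattern the fix relies on, with hypothetical demo_* names and a tiny in-use table standing in for the adapter's CID pool: once the endpoint carries the invalid CID from allocation time, the early error path can call the free routine unconditionally without touching CID 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_INVALID_CID ((uint16_t)-1)
#define DEMO_MAX_CIDS 4

static bool demo_cid_in_use[DEMO_MAX_CIDS];

struct demo_ep {
	uint16_t ep_iscsi_cid;
};

/* Fixed ordering: the endpoint gets the sentinel CID when it is allocated. */
static void demo_alloc_ep(struct demo_ep *ep)
{
	ep->ep_iscsi_cid = DEMO_INVALID_CID;
}

/* Mirrors the existing guard: never release the sentinel. */
static void demo_free_iscsi_cid(uint16_t cid)
{
	if (cid == DEMO_INVALID_CID)
		return;
	assert(cid < DEMO_MAX_CIDS);
	demo_cid_in_use[cid] = false;
}

/* Error path taken when the adapter is not ready, before any CID exists. */
static void demo_connect_adapter_not_ready(struct demo_ep *ep)
{
	demo_free_iscsi_cid(ep->ep_iscsi_cid);
}
```

The zero-initialized endpoint in the check below reproduces the old behaviour, where the same error path released CID 0 belonging to another connection.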

Signed-off-by: Anil Veerabhadrappa <ani...@broadcom.com>
---
 drivers/scsi/bnx2i/bnx2i_iscsi.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/bnx2i/bnx2i_iscsi.c b/drivers/scsi/bnx2i/bnx2i_iscsi.c
index 9535bb6..08d0bfc 100644
--- a/drivers/scsi/bnx2i/bnx2i_iscsi.c
+++ b/drivers/scsi/bnx2i/bnx2i_iscsi.c
@@ -387,6 +387,7 @@ static struct iscsi_endpoint *bnx2i_alloc_ep(struct bnx2i_hba *hba)
 	bnx2i_ep = ep->dd_data;
 	INIT_LIST_HEAD(&bnx2i_ep->link);
 	bnx2i_ep->state = EP_STATE_IDLE;
+	bnx2i_ep->ep_iscsi_cid = (u16) -1;
 	bnx2i_ep->hba = hba;
 	bnx2i_ep->hba_age = hba->age;
 	hba->ofld_conns_active++;
@@ -1678,8 +1679,6 @@ static struct iscsi_endpoint *bnx2i_ep_connect(struct Scsi_Host *shost,
 		goto net_if_down;
 	}
 
-	bnx2i_ep->state = EP_STATE_IDLE;
-	bnx2i_ep->ep_iscsi_cid = (u16) -1;
 	bnx2i_ep->num_active_cmds = 0;
 	iscsi_cid = bnx2i_alloc_iscsi_cid(hba);
 	if (iscsi_cid == -1) {
-- 
1.5.4.3




