Re: Failover time of iSCSI multipath devices.

2010-03-16 Thread Alex Zeffertt

Mike Christie wrote:

On 03/15/2010 05:56 AM, Alex Zeffertt wrote:

The bugzilla ticket requests a merge of two git commits, but neither of
those contain the libiscsi.c change that addresses bug #2. Was this a
mistake, or did you deliberately omit that part of your
speed-up-conn-fail-take3.patch when you raised the ticket?



Hey,

It was laziness. I did not update the bugzilla. When I made it, I 
thought we were only hitting #1 (this was the first patch I sent in this 
thread). But when I was testing those 2 patches with RHEL 5, I finally 
hit the problem that bet was hitting. When I figured out that we were 
hitting #2, I made the second patch in this thread. I then just did not 
update the bugzilla with the new patch. For RHEL I ended up sending the 
second patch though.




Thanks for the clarification.  Is the fix for #2 being upstreamed?  If so, is 
there a git commit I can reference?  (This will make it easier for us to drop 
the patch when we pull a kernel which has the fix in it.)


Thanks in advance,

Alex

--
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.



Re: Failover time of iSCSI multipath devices.

2010-03-16 Thread Mike Christie

On 03/16/2010 04:50 AM, Alex Zeffertt wrote:

Mike Christie wrote:

On 03/15/2010 05:56 AM, Alex Zeffertt wrote:

The bugzilla ticket requests a merge of two git commits, but neither of
those contain the libiscsi.c change that addresses bug #2. Was this a
mistake, or did you deliberately omit that part of your
speed-up-conn-fail-take3.patch when you raised the ticket?



Hey,

It was laziness. I did not update the bugzilla. When I made it, I
thought we were only hitting #1 (this was the first patch I sent in
this thread). But when I was testing those 2 patches with RHEL 5, I
finally hit the problem that bet was hitting. When I figured out that
we were hitting #2, I made the second patch in this thread. I then
just did not update the bugzilla with the new patch. For RHEL I ended
up sending the second patch though.



Thanks for the clarification. Is the fix for #2 being upstreamed? If so,


I sent it to linux-scsi/James a couple days after I sent the patch in 
this thread. It is not merged yet.




is there a git commit I can reference? (This will make it easier for us
to drop the patch when we pull a kernel which has the fix in it.)



Do you want me to cc you on all future iscsi patches that go upstream?
When James merges it and sends it to Linus, I get an automated
message from him. If I cc you, you can get one too.





Thanks in advance,

Alex





Re: Failover time of iSCSI multipath devices.

2010-03-16 Thread bennyturns
I am trying to work out a formula for the total failover time of my
multipathed iSCSI device. So far I have:

failover time = nop timeout + nop interval + replacement_timeout
seconds + scsi block device timeout (/sys/block/sdX/device/timeout)

Is there anything else that I am missing?

-b



On Mar 15, 4:53 pm, Mike Christie micha...@cs.wisc.edu wrote:
 On 03/15/2010 05:56 AM, Alex Zeffertt wrote:



  The bugzilla ticket requests a merge of two git commits, but neither of
  those contain the libiscsi.c change that addresses bug #2. Was this a
  mistake, or did you deliberately omit that part of your
  speed-up-conn-fail-take3.patch when you raised the ticket?

 Hey,

 It was laziness. I did not update the bugzilla. When I made it, I
 thought we were only hitting #1 (this was the first patch I sent in this
 thread). But when I was testing those 2 patches with RHEL 5, I finally
 hit the problem that bet was hitting. When I figured out that we were
 hitting #2, I made the second patch in this thread. I then just did not
 update the bugzilla with the new patch. For RHEL I ended up sending the
 second patch though.




Re: Failover time of iSCSI multipath devices.

2010-03-16 Thread Mike Christie

On 03/16/2010 04:02 PM, bennyturns wrote:

I am trying to work out a formula for the total failover time of my
multipathed iSCSI device. So far I have:

failover time = nop timeout + nop interval + replacement_timeout
seconds + scsi block device timeout (/sys/block/sdX/device/timeout)



/sys/block/sdX/device/timeout is the scsi cmd timeout. It only comes
into play if you have nops off or have their timers set higher than the
scsi cmd timeout (you do not want to do this). When using nops, if they
time out and the scsi cmd timer then fires, the iscsi code basically
tells the scsi layer that it is already handling the problem, so the
scsi error handler is not run.




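So it is either (nops on, timers no higher than the scsi cmd timeout):

failover time = nop timeout + nop interval + replacement_timeout

or (nops off, or nop timers set higher than the scsi cmd timeout):

failover time = /sys/block/sdX/device/timeout + replacement_timeout
                + min(abort timeout, lun reset timeout, target reset timeout)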
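A minimal C sketch of the two formulas, for plugging in your own settings. The struct and function names are hypothetical, and the rule for picking a formula (nops are assumed to catch the failure first only when nop interval + nop timeout does not exceed the scsi cmd timeout) is our reading of the explanation above, not something stated exactly in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the timeouts discussed in this thread (seconds). */
struct failover_timeouts {
    bool nops_enabled;            /* are noop-out pings configured? */
    unsigned nop_interval;
    unsigned nop_timeout;
    unsigned replacement_timeout;
    unsigned scsi_cmd_timeout;    /* /sys/block/sdX/device/timeout */
    unsigned abort_timeout;
    unsigned lun_reset_timeout;
    unsigned target_reset_timeout;
};

static unsigned min3(unsigned a, unsigned b, unsigned c)
{
    unsigned m = a < b ? a : b;

    return m < c ? m : c;
}

/* Rough failover estimate following the two formulas above. */
unsigned estimate_failover_secs(const struct failover_timeouts *t)
{
    bool nops_detect_first;

    assert(t != NULL);
    /* Assumption: comparing the summed nop timers with the scsi cmd
     * timeout decides which mechanism notices the dead path first. */
    nops_detect_first = t->nops_enabled &&
        t->nop_interval + t->nop_timeout <= t->scsi_cmd_timeout;

    if (nops_detect_first)
        return t->nop_interval + t->nop_timeout + t->replacement_timeout;

    return t->scsi_cmd_timeout + t->replacement_timeout +
           min3(t->abort_timeout, t->lun_reset_timeout,
                t->target_reset_timeout);
}
```

For example, nops on with a 5 s interval, a 5 s nop timeout, a 15 s replacement_timeout and a 30 s scsi cmd timeout gives 25 s; the same values with nops off and a 15 s abort timeout give 30 + 15 + 15 = 60 s. These numbers are illustrative, not recommendations.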





Is there anything else that I am missing?

-b






Re: Failover time of iSCSI multipath devices.

2010-03-16 Thread bennyturns
Thanks, Mike, that explains it :)

On Mar 16, 5:27 pm, Mike Christie micha...@cs.wisc.edu wrote:
 On 03/16/2010 04:02 PM, bennyturns wrote:

  I am trying to work out a formula for the total failover time of my
  multipathed iSCSI device. So far I have:

  failover time = nop timeout + nop interval + replacement_timeout
  seconds + scsi block device timeout (/sys/block/sdX/device/timeout)

 /sys/block/sdX/device/timeout is the scsi cmd timeout. It only comes
 into play if you have nops off or have their timers set higher than the
 scsi cmd timeout (you do not want to do this). When using nops, if they
 time out and the scsi cmd timer then fires, the iscsi code basically
 tells the scsi layer that it is already handling the problem, so the
 scsi error handler is not run.

 So it is:

 failover time = nop timeout + nop interval + replacement_timeout
 or
 /sys/block/sdX/device/timeout + replacement_timeout + min(abort timeout,
 lun reset timeout, target reset timeout).

  Is there anything else that I am missing?

  -b




Re: Failover time of iSCSI multipath devices.

2010-03-15 Thread Alex Zeffertt

Mike Christie wrote:

On 03/07/2010 07:46 AM, Pasi Kärkkäinen wrote:

On Fri, Mar 05, 2010 at 05:07:53AM -0600, Mike Christie wrote:

On 03/01/2010 08:53 PM, Mike Christie wrote:

On 03/01/2010 12:06 PM, bet wrote:

1. Based on my timeouts I would think that my session would time out

Yes. It should time out about 15 secs after you see
Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of
5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304,
now 4894304

You might be hitting a bug where the network layer gets stuck trying to
send data. I attached a patch that should fix the problem.


It looks like we have two bugs.

1. We can get stuck in the network code.
2. There is a race where the session->state can get reset due to the
xmit thread throwing an error after we have set the session->state but
before we have set the stop_stage.

The attached patch for RHEL 5.5 should fix them all.


Hello,

Will this patch be in the next RHEL 5.5 beta kernel? It's easier to test if there's
no need to build a custom kernel :)



I am not sure if it will be in the next 5.5 beta. It should be in 5.5 
though. Do you have a bugzilla account? I made this bugzilla

https://bugzilla.redhat.com/show_bug.cgi?id=570681
You can add yourself to it and when the patch is merged you will get a 
notification and a link to a test kernel.


If you do not have a bugzilla account, just let me know and I will ping 
you when it is available in a test kernel.




Hi Mike,

The bugzilla ticket requests a merge of two git commits, but neither of those 
contain the libiscsi.c change that addresses bug #2.  Was this a mistake, or did 
you deliberately omit that part of your speed-up-conn-fail-take3.patch when you 
raised the ticket?


TIA,

Alex




Re: Failover time of iSCSI multipath devices.

2010-03-15 Thread Mike Christie

On 03/15/2010 05:56 AM, Alex Zeffertt wrote:


The bugzilla ticket requests a merge of two git commits, but neither of
those contain the libiscsi.c change that addresses bug #2. Was this a
mistake, or did you deliberately omit that part of your
speed-up-conn-fail-take3.patch when you raised the ticket?



Hey,

It was laziness. I did not update the bugzilla. When I made it, I 
thought we were only hitting #1 (this was the first patch I sent in this 
thread). But when I was testing those 2 patches with RHEL 5, I finally 
hit the problem that bet was hitting. When I figured out that we were 
hitting #2, I made the second patch in this thread. I then just did not 
update the bugzilla with the new patch. For RHEL I ended up sending the 
second patch though.





Re: Failover time of iSCSI multipath devices.

2010-03-08 Thread Mike Christie

On 03/07/2010 07:46 AM, Pasi Kärkkäinen wrote:

On Fri, Mar 05, 2010 at 05:07:53AM -0600, Mike Christie wrote:

On 03/01/2010 08:53 PM, Mike Christie wrote:

On 03/01/2010 12:06 PM, bet wrote:

1. Based on my timeouts I would think that my session would time out


Yes. It should time out about 15 secs after you see
Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of
5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304,
now 4894304

You might be hitting a bug where the network layer gets stuck trying to
send data. I attached a patch that should fix the problem.



It looks like we have two bugs.

1. We can get stuck in the network code.
2. There is a race where the session->state can get reset due to the
xmit thread throwing an error after we have set the session->state but
before we have set the stop_stage.

The attached patch for RHEL 5.5 should fix them all.



Hello,

Will this patch be in the next RHEL 5.5 beta kernel? It's easier to test if there's
no need to build a custom kernel :)



I am not sure if it will be in the next 5.5 beta. It should be in 5.5 
though. Do you have a bugzilla account? I made this bugzilla

https://bugzilla.redhat.com/show_bug.cgi?id=570681
You can add yourself to it and when the patch is merged you will get a 
notification and a link to a test kernel.


If you do not have a bugzilla account, just let me know and I will ping 
you when it is available in a test kernel.





Re: Failover time of iSCSI multipath devices.

2010-03-08 Thread Pasi Kärkkäinen
On Mon, Mar 08, 2010 at 02:07:14PM -0600, Mike Christie wrote:
 On 03/07/2010 07:46 AM, Pasi Kärkkäinen wrote:
 On Fri, Mar 05, 2010 at 05:07:53AM -0600, Mike Christie wrote:
 On 03/01/2010 08:53 PM, Mike Christie wrote:
 On 03/01/2010 12:06 PM, bet wrote:
 1. Based on my timeouts I would think that my session would time out

 Yes. It should time out about 15 secs after you see
 Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of
 5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304,
 now 4894304

 You might be hitting a bug where the network layer gets stuck trying to
 send data. I attached a patch that should fix the problem.


 It looks like we have two bugs.

 1. We can get stuck in the network code.
 2. There is a race where the session->state can get reset due to the
 xmit thread throwing an error after we have set the session->state but
 before we have set the stop_stage.

 The attached patch for RHEL 5.5 should fix them all.


 Hello,

 Will this patch be in the next RHEL 5.5 beta kernel? It's easier to test if
 there's no need to build a custom kernel :)


 I am not sure if it will be in the next 5.5 beta. It should be in 5.5  
 though. Do you have a bugzilla account? I made this bugzilla
 https://bugzilla.redhat.com/show_bug.cgi?id=570681
 You can add yourself to it and when the patch is merged you will get a  
 notification and a link to a test kernel.

 If you do not have a bugzilla account, just let me know and I will ping  
 you when it is available in a test kernel.


I just added myself to the bug. Thanks!

-- Pasi




Re: Failover time of iSCSI multipath devices.

2010-03-07 Thread Pasi Kärkkäinen
On Fri, Mar 05, 2010 at 05:07:53AM -0600, Mike Christie wrote:
 On 03/01/2010 08:53 PM, Mike Christie wrote:
 On 03/01/2010 12:06 PM, bet wrote:
 1. Based on my timeouts I would think that my session would time out

 Yes. It should time out about 15 secs after you see
   Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of
   5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304,
   now 4894304

 You might be hitting a bug where the network layer gets stuck trying to
 send data. I attached a patch that should fix the problem.


 It looks like we have two bugs.

 1. We can get stuck in the network code.
 2. There is a race where the session->state can get reset due to the
 xmit thread throwing an error after we have set the session->state but
 before we have set the stop_stage.

 The attached patch for RHEL 5.5 should fix them all.


Hello,

Will this patch be in the next RHEL 5.5 beta kernel? It's easier to test if there's
no need to build a custom kernel :)

-- Pasi




Re: Failover time of iSCSI multipath devices.

2010-03-05 Thread Mike Christie

On 03/01/2010 08:53 PM, Mike Christie wrote:

On 03/01/2010 12:06 PM, bet wrote:

1. Based on my timeouts I would think that my session would time out


Yes. It should time out about 15 secs after you see
  Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of
  5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304,
  now 4894304

You might be hitting a bug where the network layer gets stuck trying to
send data. I attached a patch that should fix the problem.



It looks like we have two bugs.

1. We can get stuck in the network code.
2. There is a race where the session->state can get reset due to the
xmit thread throwing an error after we have set the session->state but
before we have set the stop_stage.


The attached patch for RHEL 5.5 should fix them all.
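To make bug #2 concrete, here is a deterministic C model of the window described above. It is a simplified sketch, not libiscsi code: the state names are hypothetical, and the error-path rule (an xmit failure marks the session failed unless stop_stage is already set) is our assumption about how the reset happens. The model replays recovery with the xmit error landing while the session lock is dropped, once with the original ordering and once with stop_stage set before the unlock, which is what the libiscsi2.c hunk of the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the session and connection state. */
enum model_session_state { MODEL_LOGGED_IN, MODEL_FAILED, MODEL_IN_RECOVERY };
enum model_stop_stage { MODEL_STOP_NONE = 0, MODEL_STOP_CONN_RECOVER };

struct model_conn {
    enum model_session_state state;
    enum model_stop_stage stop_stage;
};

/* Assumed xmit error path: mark the session failed unless the
 * connection is already being stopped. */
static void model_xmit_error(struct model_conn *c)
{
    if (c->stop_stage == MODEL_STOP_NONE)
        c->state = MODEL_FAILED;
}

/* Replays recovery with an xmit error in the unlocked window and
 * returns the session state that survives. */
enum model_session_state replay_recovery(bool set_stop_stage_before_unlock)
{
    struct model_conn c = { MODEL_LOGGED_IN, MODEL_STOP_NONE };

    /* session lock held */
    c.state = MODEL_IN_RECOVERY;
    if (set_stop_stage_before_unlock)
        c.stop_stage = MODEL_STOP_CONN_RECOVER;   /* patched ordering */

    /* lock dropped: the xmit thread hits an error here */
    model_xmit_error(&c);

    /* lock re-taken */
    if (!set_stop_stage_before_unlock)
        c.stop_stage = MODEL_STOP_CONN_RECOVER;   /* original ordering */

    assert(c.stop_stage == MODEL_STOP_CONN_RECOVER);
    return c.state;
}
```

With the original ordering the model ends in MODEL_FAILED even though recovery had already started, which is the "session->state can get reset" symptom; with the patched ordering it stays MODEL_IN_RECOVERY.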


diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 5c39369..2c908ce 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -254,8 +254,6 @@ static int iscsi_sw_tcp_xmit_segment(struct iscsi_tcp_conn *tcp_conn,
 
 		if (r < 0) {
 			iscsi_tcp_segment_unmap(segment);
-			if (copied || r == -EAGAIN)
-				break;
 			return r;
 		}
 		copied += r;
@@ -276,11 +274,17 @@ static int iscsi_sw_tcp_xmit(struct iscsi_conn *conn)
 
 	while (1) {
 		rc = iscsi_sw_tcp_xmit_segment(tcp_conn, segment);
-		if (rc < 0) {
+		/*
+		 * We may not have been able to send data because the conn
+		 * is getting stopped. libiscsi will know so propagate err
+		 * for it to do the right thing.
+		 */
+		if (rc == -EAGAIN)
+			return rc;
+		else if (rc < 0) {
 			rc = ISCSI_ERR_XMIT_FAILED;
 			goto error;
-		}
-		if (rc == 0)
+		} else if (rc == 0)
 			break;
 
 		consumed += rc;
@@ -561,9 +565,10 @@ static void iscsi_sw_tcp_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)
 	struct iscsi_conn *conn = cls_conn->dd_data;
 	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
 	struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;
+	struct socket *sock = tcp_sw_conn->sock;
 
 	/* userspace may have goofed up and not bound us */
-	if (!tcp_sw_conn->sock)
+	if (!sock)
 		return;
 	/*
 	 * Make sure our recv side is stopped.
@@ -574,6 +579,11 @@ static void iscsi_sw_tcp_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)
 	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_rx);
 	write_unlock_bh(&tcp_sw_conn->sock->sk->sk_callback_lock);
 
+	if (sock->sk->sk_sleep && waitqueue_active(sock->sk->sk_sleep)) {
+		sock->sk->sk_err = EIO;
+		wake_up_interruptible(sock->sk->sk_sleep);
+	}
+
 	iscsi2_conn_stop(cls_conn, flag);
 	iscsi_sw_tcp_release_conn(conn);
 }
diff --git a/drivers/scsi/libiscsi2.c b/drivers/scsi/libiscsi2.c
index 61abdf9..262617e 100644
--- a/drivers/scsi/libiscsi2.c
+++ b/drivers/scsi/libiscsi2.c
@@ -2657,14 +2657,15 @@ static void iscsi_start_session_recovery(struct iscsi_session *session,
 		session->state = ISCSI_STATE_TERMINATE;
 	else if (conn->stop_stage != STOP_CONN_RECOVER)
 		session->state = ISCSI_STATE_IN_RECOVERY;
+
+	old_stop_stage = conn->stop_stage;
+	conn->stop_stage = flag;
 	spin_unlock_bh(&session->lock);
 
 	del_timer_sync(&conn->transport_timer);
 	iscsi2_suspend_tx(conn);
 
 	spin_lock_bh(&session->lock);
-	old_stop_stage = conn->stop_stage;
-	conn->stop_stage = flag;
 	conn->c_stage = ISCSI_CONN_STOPPED;
 	spin_unlock_bh(&session->lock);
 


Re: Failover time of iSCSI multipath devices.

2010-03-02 Thread bennyturns
I was able to get my failover time down to about 25-30 seconds:

Mar  1 12:32:37 bentCluster-1 kernel: tg3: eth0: Link is down.

Mar  1 12:33:03 bentCluster-1 multipathd: checker failed path 8:224 in map mpath0
Mar  1 12:33:03 bentCluster-1 kernel: end_request: I/O error, dev sdo, sector 1249431
Mar  1 12:33:03 bentCluster-1 multipathd: mpath0: remaining active paths: 1
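As a quick check of the numbers, the link-down message at 12:32:37 and the path failure at 12:33:03 are 26 seconds apart, matching the 26 seconds Ben reports for his best test. A small C helper for that arithmetic; it is a hypothetical sketch that assumes both HH:MM:SS stamps fall on the same day.

```c
#include <assert.h>
#include <stdio.h>

/* Seconds since midnight for an "HH:MM:SS" syslog time, or -1 on a parse error. */
long hms_to_secs(const char *hms)
{
    int h, m, s;

    if (sscanf(hms, "%d:%d:%d", &h, &m, &s) != 3)
        return -1;
    return h * 3600L + m * 60L + s;
}

/* Elapsed seconds between two syslog times from the same day. */
long elapsed_secs(const char *start, const char *end)
{
    long a = hms_to_secs(start);
    long b = hms_to_secs(end);

    assert(a >= 0 && b >= 0);
    return b - a;
}
```

elapsed_secs("12:32:37", "12:33:03") returns 26.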

I ended up setting:

[r...@bentcluster-1 ~]# echo noop > /sys/block/sdn/queue/scheduler
[r...@bentcluster-1 ~]# echo noop > /sys/block/sdo/queue/scheduler

[r...@bentcluster-1 ~]# echo 64 > /sys/block/sdn/queue/max_sectors_kb
[r...@bentcluster-1 ~]# echo 64 > /sys/block/sdo/queue/max_sectors_kb

[r...@bentcluster-1 ~]# echo 5 > /sys/block/sdn/device/timeout
[r...@bentcluster-1 ~]# echo 5 > /sys/block/sdo/device/timeout
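The same settings can be applied from a program. A minimal C sketch, assuming root privileges and the sdn/sdo device names from this post; sysfs_write() and tune_device() are hypothetical helpers equivalent to the shell redirections above. Because stdio buffers the write, a value the kernel rejects typically shows up as an error from fclose(), so the sketch checks both.

```c
#include <assert.h>
#include <stdio.h>

/* Write one value to a sysfs attribute; returns 0 on success, -1 on error. */
int sysfs_write(const char *path, const char *value)
{
    FILE *f;
    int rc = 0;

    assert(path != NULL && value != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fputs(value, f) == EOF)
        rc = -1;
    /* The buffered data reaches sysfs here, so errors usually appear here. */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

/* Applies the three settings from the post to one device, e.g. "sdn". */
int tune_device(const char *dev)
{
    static const char *const attrs[][2] = {
        { "queue/scheduler", "noop" },     /* RHEL 5 era scheduler name */
        { "queue/max_sectors_kb", "64" },
        { "device/timeout", "5" },
    };
    char path[256];
    int rc = 0;

    for (size_t i = 0; i < sizeof attrs / sizeof attrs[0]; i++) {
        snprintf(path, sizeof path, "/sys/block/%s/%s", dev, attrs[i][0]);
        if (sysfs_write(path, attrs[i][1]) != 0)
            rc = -1;
    }
    return rc;
}
```

Calling tune_device("sdn") and tune_device("sdo") as root mirrors the six echo commands; the return value is -1 if any of the writes failed.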

I couldn't get it under 90 seconds without setting
/sys/block/sdn/device/timeout, and in my best test I hit 26 seconds. I
have a couple of questions:

1.  Do I need the scsi timeout to be turned down or could I be hitting
the bug Mike mentioned?

2.  The patch that Mike attached to this thread: is there a Red Hat BZ
associated with it so I can track its progress? If not, should I open
a BZ?

3.  In a best-case scenario, what kind of failover time can I expect
with multipath and iSCSI? I see about 25-30 seconds; is this
accurate? I saw a 3 second failover time using bonded NICs instead of
dm-multipath. Is there any specific reason to use multipathd instead
of channel bonding?

Thanks for all the help everyone!

-Ben




Re: Failover time of iSCSI multipath devices.

2010-03-01 Thread Mike Christie

On 03/01/2010 12:06 PM, bet wrote:

1.  Based on my timeouts I would think that my session would time out


Yes. It should time out about 15 secs after you see
 Mar  1 07:14:27 bentCluster-1 kernel:  connection4:0: ping timeout of
 5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304,
 now 4894304
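The jiffies values in that log line are consistent with a 5 s noop interval and a 5 s noop timeout. A worked check in C, assuming the kernel ran with HZ=1000 (one jiffy per millisecond), which the log itself does not state; the struct and helper names are hypothetical.

```c
#include <assert.h>

#define ASSUMED_HZ 1000UL  /* assumption: HZ is not stated in the thread */

/* Jiffies values copied from the "ping timeout" log line. */
struct ping_log {
    unsigned long last_rx;
    unsigned long last_ping;
    unsigned long now;
};

/* Seconds with nothing received before the ping went out (noop interval). */
unsigned long secs_idle_before_ping(const struct ping_log *p)
{
    assert(p->last_ping >= p->last_rx);
    return (p->last_ping - p->last_rx) / ASSUMED_HZ;
}

/* Seconds the ping went unanswered before the timeout fired (noop timeout). */
unsigned long secs_ping_unanswered(const struct ping_log *p)
{
    assert(p->now >= p->last_ping);
    return (p->now - p->last_ping) / ASSUMED_HZ;
}
```

Both come out to 5, matching "ping timeout of 5 secs" and "recv timeout 5". If the 15 secs in the answer is the replacement_timeout, the session would be expected to fail roughly 5 + 5 + 15 = 25 s after the last received PDU.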

You might be hitting a bug where the network layer gets stuck trying to 
send data. I attached a patch that should fix the problem.
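The attached patch adds a block to iscsi_sw_tcp_conn_stop() that sets sk_err to EIO and calls wake_up_interruptible() on the socket's wait queue, so a sender stuck waiting in the network code returns instead of hanging. The C sketch below is a userspace analogy using POSIX threads, not kernel code: the condition variable stands in for sk->sk_sleep, the err field for sk->sk_err, and it assumes (as we understand the kernel's stream-send wait loop does) that the sleeper rechecks the error after every wakeup. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a socket whose send buffer is full. */
struct fake_sock {
    pthread_mutex_t lock;
    pthread_cond_t wait;  /* role of sk->sk_sleep */
    bool has_space;       /* never becomes true in this demo */
    int err;              /* role of sk->sk_err */
};

/* Sender: sleeps until there is buffer space or an error is posted. */
static void *sender(void *arg)
{
    struct fake_sock *s = arg;
    intptr_t rc;

    pthread_mutex_lock(&s->lock);
    while (!s->has_space && s->err == 0)
        pthread_cond_wait(&s->wait, &s->lock);
    rc = s->err ? -s->err : 0;
    pthread_mutex_unlock(&s->lock);
    return (void *)rc;
}

/* Analogue of the patch: post EIO and wake every sleeper. */
static void fake_conn_stop(struct fake_sock *s)
{
    pthread_mutex_lock(&s->lock);
    s->err = EIO;
    pthread_cond_broadcast(&s->wait);
    pthread_mutex_unlock(&s->lock);
}

/* Starts a blocked sender, stops the "connection", returns the sender's result. */
int run_stop_demo(void)
{
    struct fake_sock s = { .has_space = false, .err = 0 };
    pthread_t t;
    void *ret = NULL;
    int rc;

    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.wait, NULL);
    rc = pthread_create(&t, NULL, sender, &s);
    assert(rc == 0);
    (void)rc;
    fake_conn_stop(&s);  /* the sender sees err whether or not it is asleep yet */
    pthread_join(t, &ret);
    pthread_cond_destroy(&s.wait);
    pthread_mutex_destroy(&s.lock);
    return (int)(intptr_t)ret;
}
```

run_stop_demo() returns -EIO. Without the broadcast (the analogue of wake_up_interruptible), the sender in this sketch could sleep forever, because has_space never becomes true. Build with -pthread.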


If you do not know how to build a RHEL kernel, let me know the arch you
are using and I can build a kernel here (it takes about a day).





after 15 seconds. Does anyone have an idea why it is taking 67 seconds?
Am I missing any other timeout values?


No. The ones you have set are it.



2.  In a perfect world what is the best case scenario for the failure
of my iSCSI session?



It should work like in that doc.


diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 5c39369..e840806 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -276,11 +276,12 @@ static int iscsi_sw_tcp_xmit(struct iscsi_conn *conn)
 
 	while (1) {
 		rc = iscsi_sw_tcp_xmit_segment(tcp_conn, segment);
-		if (rc < 0) {
+		if (rc == -EAGAIN)
+			return rc;
+		else if (rc < 0) {
 			rc = ISCSI_ERR_XMIT_FAILED;
 			goto error;
-		}
-		if (rc == 0)
+		} else if (rc == 0)
 			break;
 
 		consumed += rc;
@@ -561,9 +562,10 @@ static void iscsi_sw_tcp_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)
 	struct iscsi_conn *conn = cls_conn->dd_data;
 	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
 	struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;
+	struct socket *sock = tcp_sw_conn->sock;
 
 	/* userspace may have goofed up and not bound us */
-	if (!tcp_sw_conn->sock)
+	if (!sock)
 		return;
 	/*
 	 * Make sure our recv side is stopped.
@@ -574,6 +576,11 @@ static void iscsi_sw_tcp_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)
 	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_rx);
 	write_unlock_bh(&tcp_sw_conn->sock->sk->sk_callback_lock);
 
+	if (sock->sk->sk_sleep && waitqueue_active(sock->sk->sk_sleep)) {
+		sock->sk->sk_err = EIO;
+		wake_up_interruptible(sock->sk->sk_sleep);
+	}
+
 	iscsi2_conn_stop(cls_conn, flag);
 	iscsi_sw_tcp_release_conn(conn);
 }
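The first hunk changes how iscsi_sw_tcp_xmit() treats the value returned for each segment: -EAGAIN is now passed straight back to the caller (libiscsi) instead of being turned into ISCSI_ERR_XMIT_FAILED. Below is a small C model of the patched loop's decision logic, with scripted return values in place of real sends. model_xmit() and MODEL_ERR_XMIT_FAILED are hypothetical, and where the real code jumps to its error label the model simply returns a distinct code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for ISCSI_ERR_XMIT_FAILED. */
#define MODEL_ERR_XMIT_FAILED 1000

/*
 * results[i] is what the i-th segment send "returned": bytes sent (> 0),
 * 0 when nothing more is left to send, or a negative errno.
 * Returns the bytes consumed, -EAGAIN, or -MODEL_ERR_XMIT_FAILED.
 */
int model_xmit(const int *results, size_t n)
{
    int consumed = 0;

    assert(results != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int rc = results[i];

        if (rc == -EAGAIN)
            return rc;                      /* let libiscsi decide */
        else if (rc < 0)
            return -MODEL_ERR_XMIT_FAILED;  /* hard send error */
        else if (rc == 0)
            break;                          /* nothing left to send */
        consumed += rc;
    }
    return consumed;
}
```

Before the patch, the -EAGAIN case fell into the rc < 0 branch and was reported as a transmit failure.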


Re: Failover time of iSCSI multipath devices.

2010-03-01 Thread guy keren

Mike Christie wrote:

On 03/01/2010 12:06 PM, bet wrote:

1.  Based on my timeouts I would think that my session would time out


Yes. It should time out about 15 secs after you see
  Mar  1 07:14:27 bentCluster-1 kernel:  connection4:0: ping timeout of
  5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304,
  now 4894304

You might be hitting a bug where the network layer gets stuck trying to 
send data. I attached a patch that should fix the problem.


If you do not know how to build a RHEL kernel, let me know the arch you
are using and I can build a kernel here (it takes about a day).





after 15 seconds. Does anyone have an idea why it is taking 67 seconds?
Am I missing any other timeout values?


No. The ones you have set are it.



2.  In a perfect world what is the best case scenario for the failure
of my iSCSI session?



It should work like in that doc.



Wouldn't the abort timeout also have an effect here? Or will iSCSI
immediately fail the abort that follows (the one the mid-layer sends
when it gets an error sending a SCSI command)?


--guy
