Re: Failover time of iSCSI multipath devices.
Mike Christie wrote:
> On 03/15/2010 05:56 AM, Alex Zeffertt wrote:
>> The bugzilla ticket requests a merge of two git commits, but neither of those contain the libiscsi.c change that addresses bug #2. Was this a mistake, or did you deliberately omit that part of your speed-up-conn-fail-take3.patch when you raised the ticket?
>
> Hey,
>
> It was laziness. I did not update the bugzilla. When I made it, I thought we were only hitting #1 (this was the first patch I sent in this thread). But when I was testing those 2 patches with RHEL 5, I finally hit the problem that bet was hitting. When I figured out that we were hitting #2, I made the second patch in this thread. I then just did not update the bugzilla with the new patch. For RHEL I ended up sending the second patch though.

Thanks for the clarification. Is the fix for #2 being upstreamed? If so, is there a git commit I can reference? (This will make it easier for us to drop the patch when we pull a kernel which has the fix in it.)

Thanks in advance,

Alex

--
You received this message because you are subscribed to the Google Groups open-iscsi group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.
Re: Failover time of iSCSI multipath devices.
On 03/16/2010 04:50 AM, Alex Zeffertt wrote:
> Mike Christie wrote:
>> On 03/15/2010 05:56 AM, Alex Zeffertt wrote:
>>> The bugzilla ticket requests a merge of two git commits, but neither of those contain the libiscsi.c change that addresses bug #2. Was this a mistake, or did you deliberately omit that part of your speed-up-conn-fail-take3.patch when you raised the ticket?
>>
>> Hey,
>>
>> It was laziness. I did not update the bugzilla. When I made it, I thought we were only hitting #1 (this was the first patch I sent in this thread). But when I was testing those 2 patches with RHEL 5, I finally hit the problem that bet was hitting. When I figured out that we were hitting #2, I made the second patch in this thread. I then just did not update the bugzilla with the new patch. For RHEL I ended up sending the second patch though.
>
> Thanks for the clarification. Is the fix for #2 being upstreamed? If so,

I sent it to linux-scsi/James a couple of days after I sent the patch in this thread. It is not merged yet.

> is there a git commit I can reference? (This will make it easier for us to drop the patch when we pull a kernel which has the fix in it.)

Do you want me to cc you on all future iscsi patches that go upstream? When James merges it and sends it to Linus, I get an automated message from him. If I cc you, you can get one too.

> Thanks in advance,
>
> Alex
Re: Failover time of iSCSI multipath devices.
I am trying to work out a formula for the total failover time of my multipathed iSCSI device. So far I have:

failover time = nop timeout + nop interval + replacement_timeout seconds + scsi block device timeout (/sys/block/sdX/device/timeout)

Is there anything else that I am missing?

-b

On Mar 15, 4:53 pm, Mike Christie micha...@cs.wisc.edu wrote:
> On 03/15/2010 05:56 AM, Alex Zeffertt wrote:
>> The bugzilla ticket requests a merge of two git commits, but neither of those contain the libiscsi.c change that addresses bug #2. Was this a mistake, or did you deliberately omit that part of your speed-up-conn-fail-take3.patch when you raised the ticket?
>
> Hey,
>
> It was laziness. I did not update the bugzilla. When I made it, I thought we were only hitting #1 (this was the first patch I sent in this thread). But when I was testing those 2 patches with RHEL 5, I finally hit the problem that bet was hitting. When I figured out that we were hitting #2, I made the second patch in this thread. I then just did not update the bugzilla with the new patch. For RHEL I ended up sending the second patch though.
Re: Failover time of iSCSI multipath devices.
On 03/16/2010 04:02 PM, bennyturns wrote:
> I am trying to work out a formula for the total failover time of my multipathed iSCSI device. So far I have:
>
> failover time = nop timeout + nop interval + replacement_timeout seconds + scsi block device timeout (/sys/block/sdX/device/timeout)

/sys/block/sdX/device/timeout is the scsi cmd timeout. It only comes into play if you have nops off or have their timers set higher than the scsi cmd timeout (you do not want to do this).

When using nops, if they time out and the scsi cmd timer then fires, the iscsi code basically tells the scsi layer that it is handling the problem, so the scsi error handler is not run.

So it is:

failover time = nop timeout + nop interval + replacement_timeout

or

failover time = /sys/block/sdX/device/timeout + replacement_timeout + min(abort timeout, lun reset timeout, target reset timeout)

> Is there anything else that I am missing?
>
> -b
>
> On Mar 15, 4:53 pm, Mike Christie micha...@cs.wisc.edu wrote:
>> On 03/15/2010 05:56 AM, Alex Zeffertt wrote:
>>> The bugzilla ticket requests a merge of two git commits, but neither of those contain the libiscsi.c change that addresses bug #2. Was this a mistake, or did you deliberately omit that part of your speed-up-conn-fail-take3.patch when you raised the ticket?
>>
>> Hey,
>>
>> It was laziness. I did not update the bugzilla. When I made it, I thought we were only hitting #1 (this was the first patch I sent in this thread). But when I was testing those 2 patches with RHEL 5, I finally hit the problem that bet was hitting. When I figured out that we were hitting #2, I made the second patch in this thread. I then just did not update the bugzilla with the new patch. For RHEL I ended up sending the second patch though.
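The two formulas Mike gives can be written out as a small calculation. This is only a sketch: the function and parameter names are made up for illustration, and the sample values (5 second nop timers, a 15 second replacement timeout, a 5 second scsi cmd timeout, and 15/30/30 second abort and reset timeouts) are assumptions chosen to resemble the numbers mentioned in this thread, not values read from anyone's configuration.

```python
def failover_with_nops(nop_timeout, nop_interval, replacement_timeout):
    """Estimate when nops are enabled and shorter than the scsi cmd timeout."""
    return nop_timeout + nop_interval + replacement_timeout


def failover_without_nops(scsi_cmd_timeout, replacement_timeout,
                          abort_timeout, lun_reset_timeout,
                          target_reset_timeout):
    """Estimate when nops are off or slower than the scsi cmd timeout."""
    eh_timeout = min(abort_timeout, lun_reset_timeout, target_reset_timeout)
    return scsi_cmd_timeout + replacement_timeout + eh_timeout


# Illustrative values only (seconds); substitute your own settings.
print(failover_with_nops(5, 5, 15))
print(failover_without_nops(5, 15, 15, 30, 30))
```

With these sample values the first estimate is 25 seconds and the second is 35 seconds, which shows why keeping the nop timers below the scsi cmd timeout gives the shorter path.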
Re: Failover time of iSCSI multipath devices.
Thanks Mike, that explains it :)

On Mar 16, 5:27 pm, Mike Christie micha...@cs.wisc.edu wrote:
> On 03/16/2010 04:02 PM, bennyturns wrote:
>> I am trying to work out a formula for the total failover time of my multipathed iSCSI device. So far I have:
>>
>> failover time = nop timeout + nop interval + replacement_timeout seconds + scsi block device timeout (/sys/block/sdX/device/timeout)
>
> /sys/block/sdX/device/timeout is the scsi cmd timeout. It only comes into play if you have nops off or have their timers set higher than the scsi cmd timeout (you do not want to do this).
>
> When using nops, if they time out and the scsi cmd timer then fires, the iscsi code basically tells the scsi layer that it is handling the problem, so the scsi error handler is not run.
>
> So it is:
>
> failover time = nop timeout + nop interval + replacement_timeout
>
> or
>
> failover time = /sys/block/sdX/device/timeout + replacement_timeout + min(abort timeout, lun reset timeout, target reset timeout)
>
>> Is there anything else that I am missing?
>>
>> -b
>>
>> On Mar 15, 4:53 pm, Mike Christie micha...@cs.wisc.edu wrote:
>>> On 03/15/2010 05:56 AM, Alex Zeffertt wrote:
>>>> The bugzilla ticket requests a merge of two git commits, but neither of those contain the libiscsi.c change that addresses bug #2. Was this a mistake, or did you deliberately omit that part of your speed-up-conn-fail-take3.patch when you raised the ticket?
>>>
>>> Hey,
>>>
>>> It was laziness. I did not update the bugzilla. When I made it, I thought we were only hitting #1 (this was the first patch I sent in this thread). But when I was testing those 2 patches with RHEL 5, I finally hit the problem that bet was hitting. When I figured out that we were hitting #2, I made the second patch in this thread. I then just did not update the bugzilla with the new patch. For RHEL I ended up sending the second patch though.
Re: Failover time of iSCSI multipath devices.
Mike Christie wrote:
> On 03/07/2010 07:46 AM, Pasi Kärkkäinen wrote:
>> On Fri, Mar 05, 2010 at 05:07:53AM -0600, Mike Christie wrote:
>>> On 03/01/2010 08:53 PM, Mike Christie wrote:
>>>> On 03/01/2010 12:06 PM, bet wrote:
>>>>> 1. Based on my timeouts I would think that my session would time out
>>>>
>>>> Yes. It should timeout about 15 secs after you see
>>>>
>>>> Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304, now 4894304
>>>>
>>>> You might be hitting a bug where the network layer gets stuck trying to send data. I attached a patch that should fix the problem.
>>>
>>> It looks like we have two bugs.
>>>
>>> 1. We can get stuck in the network code.
>>> 2. There is a race where the session->state can get reset due to the xmit thread throwing an error after we have set the session->state but before we have set the stop_stage.
>>>
>>> The attached patch for RHEL 5.5 should fix them all.
>>
>> Hello,
>>
>> Will this patch be in the next RHEL 5.5 beta kernel? Easier to test if there's no need to build custom kernel :)
>
> I am not sure if it will be in the next 5.5 beta. It should be in 5.5 though. Do you have a bugzilla account? I made this bugzilla
>
> https://bugzilla.redhat.com/show_bug.cgi?id=570681
>
> You can add yourself to it and when the patch is merged you will get a notification and a link to a test kernel. If you do not have a bugzilla account, just let me know and I will ping you when it is available in a test kernel.

Hi Mike,

The bugzilla ticket requests a merge of two git commits, but neither of those contain the libiscsi.c change that addresses bug #2. Was this a mistake, or did you deliberately omit that part of your speed-up-conn-fail-take3.patch when you raised the ticket?

TIA,

Alex
Re: Failover time of iSCSI multipath devices.
On 03/15/2010 05:56 AM, Alex Zeffertt wrote:
> The bugzilla ticket requests a merge of two git commits, but neither of those contain the libiscsi.c change that addresses bug #2. Was this a mistake, or did you deliberately omit that part of your speed-up-conn-fail-take3.patch when you raised the ticket?

Hey,

It was laziness. I did not update the bugzilla. When I made it, I thought we were only hitting #1 (this was the first patch I sent in this thread). But when I was testing those 2 patches with RHEL 5, I finally hit the problem that bet was hitting. When I figured out that we were hitting #2, I made the second patch in this thread. I then just did not update the bugzilla with the new patch. For RHEL I ended up sending the second patch though.
Re: Failover time of iSCSI multipath devices.
On 03/07/2010 07:46 AM, Pasi Kärkkäinen wrote:
> On Fri, Mar 05, 2010 at 05:07:53AM -0600, Mike Christie wrote:
>> On 03/01/2010 08:53 PM, Mike Christie wrote:
>>> On 03/01/2010 12:06 PM, bet wrote:
>>>> 1. Based on my timeouts I would think that my session would time out
>>>
>>> Yes. It should timeout about 15 secs after you see
>>>
>>> Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304, now 4894304
>>>
>>> You might be hitting a bug where the network layer gets stuck trying to send data. I attached a patch that should fix the problem.
>>
>> It looks like we have two bugs.
>>
>> 1. We can get stuck in the network code.
>> 2. There is a race where the session->state can get reset due to the xmit thread throwing an error after we have set the session->state but before we have set the stop_stage.
>>
>> The attached patch for RHEL 5.5 should fix them all.
>
> Hello,
>
> Will this patch be in the next RHEL 5.5 beta kernel? Easier to test if there's no need to build custom kernel :)

I am not sure if it will be in the next 5.5 beta. It should be in 5.5 though. Do you have a bugzilla account? I made this bugzilla

https://bugzilla.redhat.com/show_bug.cgi?id=570681

You can add yourself to it and when the patch is merged you will get a notification and a link to a test kernel. If you do not have a bugzilla account, just let me know and I will ping you when it is available in a test kernel.
Re: Failover time of iSCSI multipath devices.
On Mon, Mar 08, 2010 at 02:07:14PM -0600, Mike Christie wrote:
> On 03/07/2010 07:46 AM, Pasi Kärkkäinen wrote:
>> On Fri, Mar 05, 2010 at 05:07:53AM -0600, Mike Christie wrote:
>>> On 03/01/2010 08:53 PM, Mike Christie wrote:
>>>> On 03/01/2010 12:06 PM, bet wrote:
>>>>> 1. Based on my timeouts I would think that my session would time out
>>>>
>>>> Yes. It should timeout about 15 secs after you see
>>>>
>>>> Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304, now 4894304
>>>>
>>>> You might be hitting a bug where the network layer gets stuck trying to send data. I attached a patch that should fix the problem.
>>>
>>> It looks like we have two bugs.
>>>
>>> 1. We can get stuck in the network code.
>>> 2. There is a race where the session->state can get reset due to the xmit thread throwing an error after we have set the session->state but before we have set the stop_stage.
>>>
>>> The attached patch for RHEL 5.5 should fix them all.
>>
>> Hello,
>>
>> Will this patch be in the next RHEL 5.5 beta kernel? Easier to test if there's no need to build custom kernel :)
>
> I am not sure if it will be in the next 5.5 beta. It should be in 5.5 though. Do you have a bugzilla account? I made this bugzilla
>
> https://bugzilla.redhat.com/show_bug.cgi?id=570681
>
> You can add yourself to it and when the patch is merged you will get a notification and a link to a test kernel. If you do not have a bugzilla account, just let me know and I will ping you when it is available in a test kernel.

I just added myself to the bug. Thanks!

-- Pasi
Re: Failover time of iSCSI multipath devices.
On Fri, Mar 05, 2010 at 05:07:53AM -0600, Mike Christie wrote:
> On 03/01/2010 08:53 PM, Mike Christie wrote:
>> On 03/01/2010 12:06 PM, bet wrote:
>>> 1. Based on my timeouts I would think that my session would time out
>>
>> Yes. It should timeout about 15 secs after you see
>>
>> Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304, now 4894304
>>
>> You might be hitting a bug where the network layer gets stuck trying to send data. I attached a patch that should fix the problem.
>
> It looks like we have two bugs.
>
> 1. We can get stuck in the network code.
> 2. There is a race where the session->state can get reset due to the xmit thread throwing an error after we have set the session->state but before we have set the stop_stage.
>
> The attached patch for RHEL 5.5 should fix them all.

Hello,

Will this patch be in the next RHEL 5.5 beta kernel? Easier to test if there's no need to build custom kernel :)

-- Pasi
Re: Failover time of iSCSI multipath devices.
On 03/01/2010 08:53 PM, Mike Christie wrote:
> On 03/01/2010 12:06 PM, bet wrote:
>> 1. Based on my timeouts I would think that my session would time out
>
> Yes. It should timeout about 15 secs after you see
>
> Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304, now 4894304
>
> You might be hitting a bug where the network layer gets stuck trying to send data. I attached a patch that should fix the problem.

It looks like we have two bugs.

1. We can get stuck in the network code.
2. There is a race where the session->state can get reset due to the xmit thread throwing an error after we have set the session->state but before we have set the stop_stage.

The attached patch for RHEL 5.5 should fix them all.

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 5c39369..2c908ce 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -254,8 +254,6 @@ static int iscsi_sw_tcp_xmit_segment(struct iscsi_tcp_conn *tcp_conn,
 
 		if (r < 0) {
 			iscsi_tcp_segment_unmap(segment);
-			if (copied || r == -EAGAIN)
-				break;
 			return r;
 		}
 		copied += r;
@@ -276,11 +274,17 @@ static int iscsi_sw_tcp_xmit(struct iscsi_conn *conn)
 
 	while (1) {
 		rc = iscsi_sw_tcp_xmit_segment(tcp_conn, segment);
-		if (rc < 0) {
+		/*
+		 * We may not have been able to send data because the conn
+		 * is getting stopped. libiscsi will know so propogate err
+		 * for it to do the right thing.
+		 */
+		if (rc == -EAGAIN)
+			return rc;
+		else if (rc < 0) {
 			rc = ISCSI_ERR_XMIT_FAILED;
 			goto error;
-		}
-		if (rc == 0)
+		} else if (rc == 0)
 			break;
 
 		consumed += rc;
@@ -561,9 +565,10 @@ static void iscsi_sw_tcp_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)
 	struct iscsi_conn *conn = cls_conn->dd_data;
 	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
 	struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;
+	struct socket *sock = tcp_sw_conn->sock;
 
 	/* userspace may have goofed up and not bound us */
-	if (!tcp_sw_conn->sock)
+	if (!sock)
 		return;
 	/*
 	 * Make sure our recv side is stopped.
@@ -574,6 +579,11 @@ static void iscsi_sw_tcp_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)
 	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_rx);
 	write_unlock_bh(&tcp_sw_conn->sock->sk->sk_callback_lock);
 
+	if (sock->sk->sk_sleep && waitqueue_active(sock->sk->sk_sleep)) {
+		sock->sk->sk_err = EIO;
+		wake_up_interruptible(sock->sk->sk_sleep);
+	}
+
 	iscsi2_conn_stop(cls_conn, flag);
 	iscsi_sw_tcp_release_conn(conn);
 }
diff --git a/drivers/scsi/libiscsi2.c b/drivers/scsi/libiscsi2.c
index 61abdf9..262617e 100644
--- a/drivers/scsi/libiscsi2.c
+++ b/drivers/scsi/libiscsi2.c
@@ -2657,14 +2657,15 @@ static void iscsi_start_session_recovery(struct iscsi_session *session,
 		session->state = ISCSI_STATE_TERMINATE;
 	else if (conn->stop_stage != STOP_CONN_RECOVER)
 		session->state = ISCSI_STATE_IN_RECOVERY;
+
+	old_stop_stage = conn->stop_stage;
+	conn->stop_stage = flag;
 	spin_unlock_bh(&session->lock);
 
 	del_timer_sync(&conn->transport_timer);
 	iscsi2_suspend_tx(conn);
 
 	spin_lock_bh(&session->lock);
-	old_stop_stage = conn->stop_stage;
-	conn->stop_stage = flag;
 	conn->c_stage = ISCSI_CONN_STOPPED;
 	spin_unlock_bh(&session->lock);
 
Re: Failover time of iSCSI multipath devices.
I was able to get my failover time down to about 25-30 seconds:

Mar 1 12:32:37 bentCluster-1 kernel: tg3: eth0: Link is down.
Mar 1 12:33:03 bentCluster-1 multipathd: checker failed path 8:224 in map mpath0
Mar 1 12:33:03 bentCluster-1 kernel: end_request: I/O error, dev sdo, sector 1249431
Mar 1 12:33:03 bentCluster-1 multipathd: mpath0: remaining active paths: 1

I ended up setting:

[r...@bentcluster-1 ~]# echo noop > /sys/block/sdn/queue/scheduler
[r...@bentcluster-1 ~]# echo noop > /sys/block/sdo/queue/scheduler
[r...@bentcluster-1 ~]# echo 64 > /sys/block/sdn/queue/max_sectors_kb
[r...@bentcluster-1 ~]# echo 64 > /sys/block/sdo/queue/max_sectors_kb
[r...@bentcluster-1 ~]# echo 5 > /sys/block/sdn/device/timeout
[r...@bentcluster-1 ~]# echo 5 > /sys/block/sdo/device/timeout

I couldn't get it under 90 seconds without /sys/block/sdn/device/timeout being set, and in my best test I hit 26 seconds. I have a couple of questions:

1. Do I need the scsi timeout to be turned down, or could I be hitting the bug Mike mentioned?

2. The patch that Mike attached to this thread, is there a Red Hat BZ associated with it so I can track its progress? If not, should I open a BZ?

3. In a best case scenario, what kind of failover time can I expect with multipath and iSCSI? I see about 25-30 seconds, is this accurate? I saw a 3 second failover time using bonded NICs instead of dm-multipath. Is there any specific reason to use multipathd instead of channel bonding?

Thanks for all the help everyone!

-Ben
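The 26 second figure can be checked directly from the syslog timestamps quoted in this message. The snippet below is a minimal sketch: the helper names are made up, and the year is an assumption (syslog lines omit it, 2010 is taken from the dates in this thread).

```python
from datetime import datetime


def syslog_time(line, year=2010):
    """Parse the leading 'Mon D HH:MM:SS' fields of a syslog line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}",
                             "%Y %b %d %H:%M:%S")


def elapsed_seconds(start_line, end_line):
    """Seconds between the timestamps of two syslog lines."""
    delta = syslog_time(end_line) - syslog_time(start_line)
    return int(delta.total_seconds())


link_down = "Mar 1 12:32:37 bentCluster-1 kernel: tg3: eth0: Link is down."
path_failed = ("Mar 1 12:33:03 bentCluster-1 multipathd: "
               "checker failed path 8:224 in map mpath0")

print(elapsed_seconds(link_down, path_failed))  # 26
```

Measuring from the link-down message to the multipathd path failure gives the end-to-end failover time as the application sees it, which is the number the timeout formulas discussed in this thread try to predict.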
Re: Failover time of iSCSI multipath devices.
On 03/01/2010 12:06 PM, bet wrote:
> 1. Based on my timeouts I would think that my session would time out

Yes. It should timeout about 15 secs after you see

Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304, now 4894304

You might be hitting a bug where the network layer gets stuck trying to send data. I attached a patch that should fix the problem. If you do not know how to build a RHEL kernel let me know the arch you are using and I can build a kernel here (it takes about a day).

> after 15 seconds. Anyone have an idea why is it taking 67 seconds? Am I missing any other timeout values?

No. The ones you have set are it.

> 2. In a perfect world what is the best case scenario for the failure of my iSCSI session?

It should work like in that doc.
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 5c39369..e840806 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -276,11 +276,12 @@ static int iscsi_sw_tcp_xmit(struct iscsi_conn *conn)
 
 	while (1) {
 		rc = iscsi_sw_tcp_xmit_segment(tcp_conn, segment);
-		if (rc < 0) {
+		if (rc == -EAGAIN)
+			return rc;
+		else if (rc < 0) {
 			rc = ISCSI_ERR_XMIT_FAILED;
 			goto error;
-		}
-		if (rc == 0)
+		} else if (rc == 0)
 			break;
 
 		consumed += rc;
@@ -561,9 +562,10 @@ static void iscsi_sw_tcp_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)
 	struct iscsi_conn *conn = cls_conn->dd_data;
 	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
 	struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;
+	struct socket *sock = tcp_sw_conn->sock;
 
 	/* userspace may have goofed up and not bound us */
-	if (!tcp_sw_conn->sock)
+	if (!sock)
 		return;
 	/*
 	 * Make sure our recv side is stopped.
@@ -574,6 +576,11 @@ static void iscsi_sw_tcp_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)
 	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_rx);
 	write_unlock_bh(&tcp_sw_conn->sock->sk->sk_callback_lock);
 
+	if (sock->sk->sk_sleep && waitqueue_active(sock->sk->sk_sleep)) {
+		sock->sk->sk_err = EIO;
+		wake_up_interruptible(sock->sk->sk_sleep);
+	}
+
 	iscsi2_conn_stop(cls_conn, flag);
 	iscsi_sw_tcp_release_conn(conn);
 }
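For anyone reading the "ping timeout" kernel message quoted in this thread, the three counters at its end can be turned into seconds. The sketch below rests on two assumptions that are not stated in the thread: that "last rx", "last ping" and "now" are jiffies values, and that the kernel runs at HZ=1000. The regular expression and function name are illustrative only.

```python
import re

PING_RE = re.compile(
    r"ping timeout of (\d+) secs expired, recv timeout (\d+), "
    r"last rx (\d+), last ping (\d+), now (\d+)"
)


def ping_gaps(line, hz=1000):
    """Return the configured timeouts and the observed gaps in seconds.

    hz is an assumed tick rate; the counters are assumed to be jiffies.
    """
    match = PING_RE.search(line)
    if match is None:
        raise ValueError("not a ping timeout message")
    ping_tmo, recv_tmo, last_rx, last_ping, now = map(int, match.groups())
    return {
        "ping_timeout": ping_tmo,
        "recv_timeout": recv_tmo,
        "rx_to_ping": (last_ping - last_rx) / hz,
        "ping_to_now": (now - last_ping) / hz,
    }


line = ("Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout "
        "of 5 secs expired, recv timeout 5, last rx 4884304, "
        "last ping 4889304, now 4894304")
print(ping_gaps(line))
```

Under those assumptions the message shows 5 seconds of silence before the ping was sent and another 5 seconds waiting for its reply, which matches the two 5 second settings printed in the same line. The session failure Mike describes would then be expected about 15 seconds after this message.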
Re: Failover time of iSCSI multipath devices.
Mike Christie wrote:
> On 03/01/2010 12:06 PM, bet wrote:
>> 1. Based on my timeouts I would think that my session would time out
>
> Yes. It should timeout about 15 secs after you see
>
> Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304, now 4894304
>
> You might be hitting a bug where the network layer gets stuck trying to send data. I attached a patch that should fix the problem. If you do not know how to build a RHEL kernel let me know the arch you are using and I can build a kernel here (it takes about a day).
>
>> after 15 seconds. Anyone have an idea why is it taking 67 seconds? Am I missing any other timeout values?
>
> No. The ones you have set are it.
>
>> 2. In a perfect world what is the best case scenario for the failure of my iSCSI session?
>
> It should work like in that doc.

Wouldn't the abort timeout also have an effect here? Or will iSCSI fail the coming abort (that the mid-layer sends when it gets an error sending a SCSI command) immediately?

--guy