Could you run with the attached patch? It just prints out a little more
info. When we get the conn error, it will print a message if the error is
due to the target dropping the connection, and it will print a stack
trace so we can see exactly which piece of code is throwing the error.

On 07/13/2010 09:33 PM, Sean S wrote:
> Nothing else in the log from iscsid. No mention of a failed reconnect,
> although the only log I'm really able to access post failure is dmesg.
> Since I'm running a root iscsi, I couldn't get to /var/log/messages
> which maybe was a little more verbose? What sort of network problems

Yeah, by default the iscsid messages go there. iscsid should be spitting
out a "cannot connect" message with $some_error_value_or_string that
would help tell us why we cannot reach the target anymore.

> might cause this? The "network" in this situation is a simple gigE
> switch with about 3 or 4 systems on it. The target and initiator are
> on the same subnet, nothing fancy. Is there some additional debug
> you'd recommend turning on? Any tips or tricks when running with a
> root iscsi drive?

Not that I can think of at the iscsi layer.

> Curiously, if I physically disconnect the ethernet from the initiator
> while running, all I/O access is correctly paused without returning
> I/O errors. If I then reconnect before the 400s is up things go back to
> normal. I don't however see the "detected conn error (1011)" message
> in this situation however. Not sure if that really means anything.

You should see the conn error 1011 message in any of these cases:
1. You have nops on and they time out, which causes us to log that error.
2. The network layer figures out there is a problem and notifies us. It
is possible to pull a cable and plug it back in before the network
layer throws an error.
3. There is an iscsi driver or protocol error. In this case we should
relogin quickly.
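
For reference, the 1011 in that message is just the numeric error code
that gets passed to iscsi_conn_failure(). Below is a minimal userspace
sketch of how to read it, assuming ISCSI_ERR_BASE is 1000 and
ISCSI_ERR_CONN_FAILED is ISCSI_ERR_BASE + 11, which is how I recall them
from include/scsi/iscsi_if.h. Please check the header in your own tree.
The SKETCH_ names and both helper functions are made up for illustration
and are not part of open-iscsi.

```c
#include <stdio.h>

/*
 * Assumed values, recalled from include/scsi/iscsi_if.h.
 * Verify them against the headers you are actually building with.
 */
#define SKETCH_ISCSI_ERR_BASE        1000
#define SKETCH_ISCSI_ERR_CONN_FAILED (SKETCH_ISCSI_ERR_BASE + 11)

/* Hypothetical helper: offset of a logged code from the error base. */
int conn_error_offset(int code)
{
	return code - SKETCH_ISCSI_ERR_BASE;
}

/* Hypothetical helper: name for the one code discussed in this thread. */
const char *conn_error_name(int code)
{
	if (code == SKETCH_ISCSI_ERR_CONN_FAILED)
		return "ISCSI_ERR_CONN_FAILED";
	return "unknown (look it up in enum iscsi_err)";
}

/* Print a one-line summary for a code copied out of dmesg. */
void print_conn_error(int code)
{
	printf("conn error %d = base + %d = %s\n",
	       code, conn_error_offset(code), conn_error_name(code));
}
```

Calling print_conn_error(1011) from a small main() should name
ISCSI_ERR_CONN_FAILED, which is the same value the first hunk of the
attached patch passes to iscsi_conn_failure().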
diff -aurp open-iscsi-2.0.871.3/kernel/iscsi_tcp.c tmp/kernel/iscsi_tcp.c
--- open-iscsi-2.0.871.3/kernel/iscsi_tcp.c 2010-03-05 16:32:44.000000000 -0600
+++ tmp/kernel/iscsi_tcp.c 2010-07-13 22:28:45.000000000 -0500
@@ -141,7 +141,7 @@ static void iscsi_sw_tcp_state_change(st
 	if ((sk->sk_state == TCP_CLOSE_WAIT ||
 	     sk->sk_state == TCP_CLOSE) &&
 	    !atomic_read(&sk->sk_rmem_alloc)) {
-		ISCSI_SW_TCP_DBG(conn, "iscsi_tcp_state_change: "
+		iscsi_conn_printk(KERN_ERR, conn, "iscsi_tcp_state_change: "
 				       "TCP_CLOSE|TCP_CLOSE_WAIT\n");
 		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
 	}
diff -aurp open-iscsi-2.0.871.3/kernel/libiscsi.c tmp/kernel/libiscsi.c
--- open-iscsi-2.0.871.3/kernel/libiscsi.c 2010-03-05 16:32:44.000000000 -0600
+++ tmp/kernel/libiscsi.c 2010-07-13 22:32:12.000000000 -0500
@@ -1174,6 +1174,8 @@ void iscsi_conn_failure(struct iscsi_con
 	struct iscsi_session *session = conn->session;
 	unsigned long flags;
 
+	dump_stack();
+
 	spin_lock_irqsave(&session->lock, flags);
 	if (session->state == ISCSI_STATE_FAILED) {
 		spin_unlock_irqrestore(&session->lock, flags);