On Nov 11, 5:14 pm, Mike Christie <micha...@cs.wisc.edu> wrote:
> Niels Callesøe wrote:
> > Hello group
>
> > I am running a number of HP blade servers in a C7200 enclosure.
> > Several of them have access to individual LUNs on an MSA 2012i using
> > open-iscsi. Recently, however, I have experienced unexplained hangs of
> > the servers in question, and the only apparent thing they have in
> > common (besides being blade servers) is that they have access to the
> > IP-SAN.
>
> > When the servers fail, they still respond to, for example, ping
> > requests, but they refuse higher-level access such as spawning a
> > shell for login. This means that when the error occurs, I cannot even
> > log into the machines to troubleshoot the problem (whether remotely or
> > locally), even though the console greeting is printed readily.
>
> > My question is primarily whether this sounds like something the iSCSI
> > driver could cause and, equally importantly, how one would go about
> > troubleshooting the issue. What makes it particularly elusive is that
> > I cannot seem to provoke the error state, and it does not occur very
> > often (at least not while the platform is not yet in full production).
>
> > Possibly relevant information follows:
>
> > OS: centos-release-5-3.el5.centos.1
> > iscsi version: iscsid version 2.0-868
> > MSA: Current Storage Controller Code Version J210P12
>
> > I can upgrade all three to more recent versions, and have started
> > doing so. However, those were the versions running when the problem
> > last occurred, and since I cannot provoke it, I have no real way of
> > knowing whether the upgrades will solve the issue (unless someone in
> > this group can confirm that they will, of course).
>
> It could be iscsi. Are you using multipath and do you know if there are
> path failures when the system hangs? Is there anything in the log?

I am using multipath, I believe, as I can access either of the MSA
controllers via either of two Gbit interfaces on the blades. I'll
paste what I believe to be the relevant lines from messages below.

Other than the startup messages, as best I can tell there is nothing
else relevant in the logs. I do have logs of what happens during a
network failure, which I'll also paste below, though that failure at
least did not cause the machine to hang. I suspect that whatever
causes the hang also prevents writing to the log...

I can, of course, induce almost any kind of failure on one or both of
the links if you think that will help troubleshooting.

> If there is nothing in the log at the time of the hang, could you hook
> up a serial line? I am hoping an oops will get spit out at the time of
> the hang.

Would a remote console do the trick? Usually I only open one to
attempt a login after something has gone wrong and ssh no longer
works, but I should be able to open one and simply leave it running to
watch for any console output. Or am I misunderstanding you?

On to the log dumps:

>>> iscsi starting (previous) <<<
Nov  9 13:42:39 promethium kernel: Loading iSCSI transport class v2.0-724.
Nov  9 13:42:39 promethium kernel: iscsi: registered transport (tcp)
Nov  9 13:42:39 promethium kernel: iscsi: registered transport (iser)
Nov  9 13:42:39 promethium kernel: bnx2: eth0: using MSI
Nov  9 13:42:39 promethium kernel: bnx2: eth0 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov  9 13:42:39 promethium kernel: bnx2: eth1: using MSI
Nov  9 13:42:39 promethium kernel: bnx2: eth1 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov  9 13:42:39 promethium kernel: bnx2: eth2: using MSI
Nov  9 13:42:39 promethium kernel: bnx2: eth2 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov  9 13:42:39 promethium kernel: bnx2: eth3: using MSI
Nov  9 13:42:39 promethium kernel: bnx2: eth3 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov  9 13:42:39 promethium kernel: scsi0 : iSCSI Initiator over TCP/IP
Nov  9 13:42:39 promethium kernel: scsi1 : iSCSI Initiator over TCP/IP
Nov  9 13:42:39 promethium kernel: scsi2 : iSCSI Initiator over TCP/IP
Nov  9 13:42:39 promethium kernel: scsi3 : iSCSI Initiator over TCP/IP
Nov  9 13:42:39 promethium kernel:   Vendor: HP        Model: MSA2012i          Rev: J210
Nov  9 13:42:39 promethium kernel:   Type:   Enclosure                          ANSI SCSI revision: 05
Nov  9 13:42:39 promethium kernel:   Vendor: HP        Model: MSA2012i          Rev: J210
Nov  9 13:42:39 promethium kernel:   Type:   Enclosure                          ANSI SCSI revision: 05
Nov  9 13:42:39 promethium kernel:   Vendor: HP        Model: MSA2012i          Rev: J210
Nov  9 13:42:39 promethium kernel:   Type:   Enclosure                          ANSI SCSI revision: 05
Nov  9 13:42:39 promethium kernel:   Vendor: HP        Model: MSA2012i          Rev: J210
Nov  9 13:42:39 promethium kernel:   Type:   Enclosure                          ANSI SCSI revision: 05
Nov  9 13:42:39 promethium kernel:   Vendor: HP        Model: MSA2012i          Rev: J210
Nov  9 13:42:39 promethium kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Nov  9 13:42:39 promethium kernel:   Vendor: HP        Model: MSA2012i          Rev: J210
Nov  9 13:42:39 promethium kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Nov  9 13:42:39 promethium kernel: SCSI device sda: 516087808 512-byte hdwr sectors (264237 MB)
Nov  9 13:42:39 promethium kernel: sda: Write Protect is off
Nov  9 13:42:39 promethium kernel: SCSI device sdb: 516087808 512-byte hdwr sectors (264237 MB)
Nov  9 13:42:39 promethium kernel: SCSI device sda: drive cache: write back
Nov  9 13:42:39 promethium kernel: sdb: Write Protect is off
Nov  9 13:42:39 promethium kernel: SCSI device sdb: drive cache: write back
Nov  9 13:42:39 promethium kernel: SCSI device sda: 516087808 512-byte hdwr sectors (264237 MB)
Nov  9 13:42:39 promethium kernel: sda: Write Protect is off
Nov  9 13:42:39 promethium kernel: SCSI device sdb: 516087808 512-byte hdwr sectors (264237 MB)
Nov  9 13:42:39 promethium kernel: sdb: Write Protect is off
Nov  9 13:42:39 promethium kernel: SCSI device sda: drive cache: write back
Nov  9 13:42:39 promethium kernel:  sda:<5>SCSI device sdb: drive cache: write back
Nov  9 13:42:39 promethium kernel:  sdb: sdb1
Nov  9 13:42:39 promethium kernel: sd 3:0:0:5: Attached scsi disk sdb
Nov  9 13:42:39 promethium kernel:  sda1
Nov  9 13:42:39 promethium kernel: sd 2:0:0:5: Attached scsi disk sda
Nov  9 13:42:39 promethium kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 13
Nov  9 13:42:39 promethium kernel: scsi 1:0:0:0: Attached scsi generic sg1 type 13
Nov  9 13:42:39 promethium kernel: scsi 2:0:0:0: Attached scsi generic sg2 type 13
Nov  9 13:42:39 promethium kernel: scsi 3:0:0:0: Attached scsi generic sg3 type 13
Nov  9 13:42:39 promethium kernel: sd 3:0:0:5: Attached scsi generic sg4 type 0
Nov  9 13:42:39 promethium kernel: sd 2:0:0:5: Attached scsi generic sg5 type 0

>>> iscsi starting (current) <<<
Nov 11 15:23:41 promethium kernel: Loading iSCSI transport class v2.0-871.
Nov 11 15:23:41 promethium kernel: iscsi: registered transport (tcp)
Nov 11 15:23:41 promethium kernel: iscsi: registered transport (iser)
Nov 11 15:23:41 promethium kernel: bnx2: eth0: using MSI
Nov 11 15:23:41 promethium kernel: bnx2: eth0 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 11 15:23:41 promethium kernel: bnx2: eth1: using MSI
Nov 11 15:23:41 promethium kernel: bnx2: eth1 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 11 15:23:41 promethium kernel: bnx2: eth2: using MSI
Nov 11 15:23:41 promethium kernel: bnx2: eth2 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 11 15:23:41 promethium kernel: bnx2: eth3: using MSI
Nov 11 15:23:41 promethium kernel: bnx2: eth3 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 11 15:23:41 promethium kernel: scsi0 : iSCSI Initiator over TCP/IP
Nov 11 15:23:41 promethium kernel: scsi1 : iSCSI Initiator over TCP/IP
Nov 11 15:23:41 promethium kernel: scsi2 : iSCSI Initiator over TCP/IP
Nov 11 15:23:41 promethium kernel: scsi3 : iSCSI Initiator over TCP/IP
Nov 11 15:23:41 promethium kernel:   Vendor: HP        Model: MSA2012i          Rev: J210
Nov 11 15:23:41 promethium kernel:   Type:   Enclosure                          ANSI SCSI revision: 05
Nov 11 15:23:41 promethium kernel:   Vendor: HP        Model: MSA2012i          Rev: J210
Nov 11 15:23:41 promethium kernel:   Type:   Enclosure                          ANSI SCSI revision: 05
Nov 11 15:23:41 promethium kernel:   Vendor: HP        Model: MSA2012i          Rev: J210
Nov 11 15:23:41 promethium kernel:   Type:   Enclosure                          ANSI SCSI revision: 05
Nov 11 15:23:41 promethium kernel:   Vendor: HP        Model: MSA2012i          Rev: J210
Nov 11 15:23:41 promethium kernel:   Type:   Enclosure                          ANSI SCSI revision: 05
Nov 11 15:23:41 promethium kernel:   Vendor: HP        Model: MSA2012i          Rev: J210
Nov 11 15:23:41 promethium kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Nov 11 15:23:41 promethium kernel:   Vendor: HP        Model: MSA2012i          Rev: J210
Nov 11 15:23:41 promethium kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Nov 11 15:23:41 promethium kernel: SCSI device sda: 516087808 512-byte hdwr sectors (264237 MB)
Nov 11 15:23:41 promethium kernel: sda: Write Protect is off
Nov 11 15:23:41 promethium kernel: SCSI device sdb: 516087808 512-byte hdwr sectors (264237 MB)
Nov 11 15:23:41 promethium kernel: sdb: Write Protect is off
Nov 11 15:23:41 promethium kernel: SCSI device sda: drive cache: write back
Nov 11 15:23:41 promethium kernel: SCSI device sda: 516087808 512-byte hdwr sectors (264237 MB)
Nov 11 15:23:41 promethium kernel: SCSI device sdb: drive cache: write back
Nov 11 15:23:41 promethium kernel: sda: Write Protect is off
Nov 11 15:23:41 promethium kernel: SCSI device sdb: 516087808 512-byte hdwr sectors (264237 MB)
Nov 11 15:23:41 promethium kernel: sdb: Write Protect is off
Nov 11 15:23:41 promethium kernel: SCSI device sda: drive cache: write back
Nov 11 15:23:41 promethium kernel:  sda:<5>SCSI device sdb: drive cache: write back
Nov 11 15:23:41 promethium kernel:  sdb: sda1
Nov 11 15:23:41 promethium kernel: sd 2:0:0:5: Attached scsi disk sda
Nov 11 15:23:41 promethium kernel:  sdb1
Nov 11 15:23:41 promethium kernel: sd 3:0:0:5: Attached scsi disk sdb
Nov 11 15:23:41 promethium kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 13
Nov 11 15:23:41 promethium kernel: scsi 1:0:0:0: Attached scsi generic sg1 type 13
Nov 11 15:23:41 promethium kernel: scsi 2:0:0:0: Attached scsi generic sg2 type 13
Nov 11 15:23:41 promethium kernel: scsi 3:0:0:0: Attached scsi generic sg3 type 13
Nov 11 15:23:41 promethium kernel: sd 2:0:0:5: Attached scsi generic sg4 type 0
Nov 11 15:23:41 promethium kernel: sd 3:0:0:5: Attached scsi generic sg5 type 0

>>> network outage result <<<
Nov 10 14:42:02 promethium kernel: ping timeout of 5 secs expired, last rx 4384679620, last ping 4384684620, now 4384689620
Nov 10 14:42:02 promethium kernel:  connection4:0: iscsi: detected conn error (1011)
Nov 10 14:42:02 promethium kernel: ping timeout of 5 secs expired, last rx 4384679621, last ping 4384684621, now 4384689621
Nov 10 14:42:02 promethium kernel:  connection1:0: iscsi: detected conn error (1011)
Nov 10 14:42:03 promethium kernel: ping timeout of 5 secs expired, last rx 4384680089, last ping 4384685089, now 4384690089
Nov 10 14:42:03 promethium kernel:  connection3:0: iscsi: detected conn error (1011)
Nov 10 14:42:03 promethium kernel: ping timeout of 5 secs expired, last rx 4384680173, last ping 4384685173, now 4384690173
Nov 10 14:42:03 promethium kernel:  connection2:0: iscsi: detected conn error (1011)
Nov 10 14:42:03 promethium iscsid: Kernel reported iSCSI connection 4:0 error (1011) state (3)
Nov 10 14:42:03 promethium iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Nov 10 14:42:03 promethium iscsid: Kernel reported iSCSI connection 3:0 error (1011) state (3)
Nov 10 14:42:03 promethium iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Nov 10 14:42:30 promethium iscsid: received iferror -38
Nov 10 14:42:30 promethium iscsid:last message repeated 2 times
Nov 10 14:42:30 promethium iscsid: connection4:0 is operational after recovery (2 attempts)
Nov 10 14:42:30 promethium iscsid: received iferror -38
Nov 10 14:42:30 promethium iscsid:last message repeated 2 times
Nov 10 14:42:30 promethium iscsid: connection3:0 is operational after recovery (2 attempts)
Nov 10 14:42:32 promethium iscsid: received iferror -38
Nov 10 14:42:32 promethium iscsid:last message repeated 2 times
Nov 10 14:42:32 promethium iscsid: connection2:0 is operational after recovery (2 attempts)
Nov 10 14:42:36 promethium iscsid: received iferror -38
Nov 10 14:42:36 promethium iscsid:last message repeated 2 times
Nov 10 14:42:36 promethium iscsid: connection1:0 is operational after recovery (3 attempts)
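
To put rough numbers on the outage log above, here is a small Python sketch. The regexes and the `analyse` and `ts` helpers are hypothetical, written just for this purpose, and are not part of open-iscsi. It reports two things: the gaps between the "last rx", "last ping" and "now" counters, which I am assuming are kernel jiffies, and how long each connection stayed down before iscsid reported it operational again. Every ping-to-now gap is 5000 ticks next to "ping timeout of 5 secs", which would imply HZ=1000, i.e. roughly 10 seconds between the last data received and the connection error. Syslog timestamps carry no year and only one-second resolution, so the durations are approximate, and the year fed to the parser is an arbitrary placeholder.

```python
import re
from datetime import datetime

# The Nov 10 outage lines from above, each rejoined onto a single line.
LOG = """\
Nov 10 14:42:02 promethium kernel: ping timeout of 5 secs expired, last rx 4384679620, last ping 4384684620, now 4384689620
Nov 10 14:42:02 promethium kernel:  connection4:0: iscsi: detected conn error (1011)
Nov 10 14:42:02 promethium kernel: ping timeout of 5 secs expired, last rx 4384679621, last ping 4384684621, now 4384689621
Nov 10 14:42:02 promethium kernel:  connection1:0: iscsi: detected conn error (1011)
Nov 10 14:42:03 promethium kernel: ping timeout of 5 secs expired, last rx 4384680089, last ping 4384685089, now 4384690089
Nov 10 14:42:03 promethium kernel:  connection3:0: iscsi: detected conn error (1011)
Nov 10 14:42:03 promethium kernel: ping timeout of 5 secs expired, last rx 4384680173, last ping 4384685173, now 4384690173
Nov 10 14:42:03 promethium kernel:  connection2:0: iscsi: detected conn error (1011)
Nov 10 14:42:30 promethium iscsid: connection4:0 is operational after recovery (2 attempts)
Nov 10 14:42:30 promethium iscsid: connection3:0 is operational after recovery (2 attempts)
Nov 10 14:42:32 promethium iscsid: connection2:0 is operational after recovery (2 attempts)
Nov 10 14:42:36 promethium iscsid: connection1:0 is operational after recovery (3 attempts)
"""

STAMP = r"^(\w{3} +\d+ \d\d:\d\d:\d\d) "
PING_RE = re.compile(r"last rx (\d+), last ping (\d+), now (\d+)")
FAIL_RE = re.compile(STAMP + r".*connection(\d+):0: iscsi: detected conn error")
OK_RE = re.compile(
    STAMP + r".*connection(\d+):0 is operational after recovery \((\d+) attempts\)"
)


def ts(stamp):
    # Syslog omits the year; 2000 is an arbitrary placeholder so strptime
    # gets a full date. Only differences within the same day are used.
    return datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S")


def analyse(log, timeout_secs=5):
    """Return (tick gaps, implied HZ values, {conn: (seconds down, attempts)})."""
    gaps, failed, outages = [], {}, {}
    for line in log.splitlines():
        m = PING_RE.search(line)
        if m:
            # Assumption: rx/ping/now are kernel jiffies counters.
            rx, ping, now = map(int, m.groups())
            gaps.append((ping - rx, now - ping))
        m = FAIL_RE.match(line)
        if m:
            failed[int(m.group(2))] = ts(m.group(1))
        m = OK_RE.match(line)
        if m:
            conn = int(m.group(2))
            down = ts(m.group(1)) - failed[conn]
            outages[conn] = (int(down.total_seconds()), int(m.group(3)))
    # The ping->now gap is the "5 secs" ping timeout, so ticks / 5 = HZ.
    hz = {now_gap // timeout_secs for _, now_gap in gaps}
    return gaps, hz, outages


if __name__ == "__main__":
    gaps, hz, outages = analyse(LOG)
    print("tick gaps (rx->ping, ping->now):", gaps)
    print("implied HZ:", sorted(hz))
    for conn, (secs, attempts) in sorted(outages.items()):
        print(f"connection{conn}:0 down ~{secs}s, back after {attempts} attempts")
```

Run against these lines, it reports the same 5000-tick gaps for all four connections and outages of roughly 27 to 34 seconds, with connection1:0 being the only one that needed three attempts.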

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---
