On Nov 11, 5:14 pm, Mike Christie <micha...@cs.wisc.edu> wrote:
> Niels Callesøe wrote:
> > Hello group
> >
> > I am running a number of HP blade servers in a C7200 enclosure.
> > Several of them have access to individual LUNs on an MSA 2012i using
> > open-iscsi. Recently, however, I have experienced unexplained hangs of
> > the servers in question and the only apparent thing they have in
> > common (besides being blade servers) is that they have access to the
> > IP-SAN.
> >
> > When the servers fail, they do so in a fashion where they will still
> > respond to, for example, ping requests. But they refuse to respond to
> > higher level access, such as spawning a shell for login. This means
> > that when the error occurs, I cannot even log into the machines to
> > troubleshoot the problem (regardless of remote or local login), even
> > though the console greeting is printed readily.
> >
> > My question is primarily whether this sounds like something the
> > iscsi-driver could cause and, equally importantly, how one would go
> > about troubleshooting the issue. One thing that makes it particularly
> > elusive is that I cannot seem to provoke the error state and it does
> > not occur very often (at least not while the platform is not yet in
> > full production).
> >
> > Possibly relevant information follows:
> >
> > OS: centos-release-5-3.el5.centos.1
> > iscsi version: iscsid version 2.0-868
> > MSA: Current Storage Controller Code Version J210P12
> >
> > I can, and have started, upgrades to more recent versions of all
> > three. However, those were the versions running when the problem
> > last occurred -- and since I cannot provoke it, I have no real way of
> > knowing if version upgrades will solve the issue (unless someone in
> > this group can confirm that it will, of course).
>
> It could be iscsi. Are you using multipath and do you know if there are
> path failures when the system hangs? Is there anything in the log?

I am using multipath, I believe, as I can access either of the MSA
controllers via either of two Gbit interfaces on the blades.

I'll paste what I believe to be the relevant lines from the messages log
below. Other than the startup messages, as best I can tell there is
nothing else relevant in the logs. I do have logs of what happens during
a network failure, which I'll also paste below, but that failure at least
did not cause the machine to hang. I suspect that whatever causes the
hang also prevents writing to the log...

I can, of course, induce almost any kind of failure on one or both of the
links if you think that will help troubleshooting.

> If there is nothing in the log at the time of the hang, could you hook
> up a serial line? I am hoping an oops will get spit out at the time of
> the hang.

I can attach a remote console, if that will do the trick? Usually I only
open one to attempt a login after something goes wrong and prevents ssh,
but I should be able to open one and just keep it there to watch for any
console dumps. Or perhaps I am misunderstanding you?

On to the log dumps:

>>> iscsi starting (previous) <<<
Nov 9 13:42:39 promethium kernel: Loading iSCSI transport class v2.0-724.
Nov 9 13:42:39 promethium kernel: iscsi: registered transport (tcp)
Nov 9 13:42:39 promethium kernel: iscsi: registered transport (iser)
Nov 9 13:42:39 promethium kernel: bnx2: eth0: using MSI
Nov 9 13:42:39 promethium kernel: bnx2: eth0 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 9 13:42:39 promethium kernel: bnx2: eth1: using MSI
Nov 9 13:42:39 promethium kernel: bnx2: eth1 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 9 13:42:39 promethium kernel: bnx2: eth2: using MSI
Nov 9 13:42:39 promethium kernel: bnx2: eth2 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 9 13:42:39 promethium kernel: bnx2: eth3: using MSI
Nov 9 13:42:39 promethium kernel: bnx2: eth3 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 9 13:42:39 promethium kernel: scsi0 : iSCSI Initiator over TCP/IP
Nov 9 13:42:39 promethium kernel: scsi1 : iSCSI Initiator over TCP/IP
Nov 9 13:42:39 promethium kernel: scsi2 : iSCSI Initiator over TCP/IP
Nov 9 13:42:39 promethium kernel: scsi3 : iSCSI Initiator over TCP/IP
Nov 9 13:42:39 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 9 13:42:39 promethium kernel: Type: Enclosure ANSI SCSI revision: 05
Nov 9 13:42:39 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 9 13:42:39 promethium kernel: Type: Enclosure ANSI SCSI revision: 05
Nov 9 13:42:39 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 9 13:42:39 promethium kernel: Type: Enclosure ANSI SCSI revision: 05
Nov 9 13:42:39 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 9 13:42:39 promethium kernel: Type: Enclosure ANSI SCSI revision: 05
Nov 9 13:42:39 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 9 13:42:39 promethium kernel: Type: Direct-Access ANSI SCSI revision: 05
Nov 9 13:42:39 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 9 13:42:39 promethium kernel: Type: Direct-Access ANSI SCSI revision: 05
Nov 9 13:42:39 promethium kernel: SCSI device sda: 516087808 512-byte hdwr sectors (264237 MB)
Nov 9 13:42:39 promethium kernel: sda: Write Protect is off
Nov 9 13:42:39 promethium kernel: SCSI device sdb: 516087808 512-byte hdwr sectors (264237 MB)
Nov 9 13:42:39 promethium kernel: SCSI device sda: drive cache: write back
Nov 9 13:42:39 promethium kernel: sdb: Write Protect is off
Nov 9 13:42:39 promethium kernel: SCSI device sdb: drive cache: write back
Nov 9 13:42:39 promethium kernel: SCSI device sda: 516087808 512-byte hdwr sectors (264237 MB)
Nov 9 13:42:39 promethium kernel: sda: Write Protect is off
Nov 9 13:42:39 promethium kernel: SCSI device sdb: 516087808 512-byte hdwr sectors (264237 MB)
Nov 9 13:42:39 promethium kernel: sdb: Write Protect is off
Nov 9 13:42:39 promethium kernel: SCSI device sda: drive cache: write back
Nov 9 13:42:39 promethium kernel: sda:<5>SCSI device sdb: drive cache: write back
Nov 9 13:42:39 promethium kernel: sdb: sdb1
Nov 9 13:42:39 promethium kernel: sd 3:0:0:5: Attached scsi disk sdb
Nov 9 13:42:39 promethium kernel: sda1
Nov 9 13:42:39 promethium kernel: sd 2:0:0:5: Attached scsi disk sda
Nov 9 13:42:39 promethium kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 13
Nov 9 13:42:39 promethium kernel: scsi 1:0:0:0: Attached scsi generic sg1 type 13
Nov 9 13:42:39 promethium kernel: scsi 2:0:0:0: Attached scsi generic sg2 type 13
Nov 9 13:42:39 promethium kernel: scsi 3:0:0:0: Attached scsi generic sg3 type 13
Nov 9 13:42:39 promethium kernel: sd 3:0:0:5: Attached scsi generic sg4 type 0
Nov 9 13:42:39 promethium kernel: sd 2:0:0:5: Attached scsi generic sg5 type 0

>>> iscsi starting (current) <<<
Nov 11 15:23:41 promethium kernel: Loading iSCSI transport class v2.0-871.
Nov 11 15:23:41 promethium kernel: iscsi: registered transport (tcp)
Nov 11 15:23:41 promethium kernel: iscsi: registered transport (iser)
Nov 11 15:23:41 promethium kernel: bnx2: eth0: using MSI
Nov 11 15:23:41 promethium kernel: bnx2: eth0 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 11 15:23:41 promethium kernel: bnx2: eth1: using MSI
Nov 11 15:23:41 promethium kernel: bnx2: eth1 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 11 15:23:41 promethium kernel: bnx2: eth2: using MSI
Nov 11 15:23:41 promethium kernel: bnx2: eth2 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 11 15:23:41 promethium kernel: bnx2: eth3: using MSI
Nov 11 15:23:41 promethium kernel: bnx2: eth3 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 11 15:23:41 promethium kernel: scsi0 : iSCSI Initiator over TCP/IP
Nov 11 15:23:41 promethium kernel: scsi1 : iSCSI Initiator over TCP/IP
Nov 11 15:23:41 promethium kernel: scsi2 : iSCSI Initiator over TCP/IP
Nov 11 15:23:41 promethium kernel: scsi3 : iSCSI Initiator over TCP/IP
Nov 11 15:23:41 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 11 15:23:41 promethium kernel: Type: Enclosure ANSI SCSI revision: 05
Nov 11 15:23:41 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 11 15:23:41 promethium kernel: Type: Enclosure ANSI SCSI revision: 05
Nov 11 15:23:41 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 11 15:23:41 promethium kernel: Type: Enclosure ANSI SCSI revision: 05
Nov 11 15:23:41 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 11 15:23:41 promethium kernel: Type: Enclosure ANSI SCSI revision: 05
Nov 11 15:23:41 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 11 15:23:41 promethium kernel: Type: Direct-Access ANSI SCSI revision: 05
Nov 11 15:23:41 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 11 15:23:41 promethium kernel: Type: Direct-Access ANSI SCSI revision: 05
Nov 11 15:23:41 promethium kernel: SCSI device sda: 516087808 512-byte hdwr sectors (264237 MB)
Nov 11 15:23:41 promethium kernel: sda: Write Protect is off
Nov 11 15:23:41 promethium kernel: SCSI device sdb: 516087808 512-byte hdwr sectors (264237 MB)
Nov 11 15:23:41 promethium kernel: sdb: Write Protect is off
Nov 11 15:23:41 promethium kernel: SCSI device sda: drive cache: write back
Nov 11 15:23:41 promethium kernel: SCSI device sda: 516087808 512-byte hdwr sectors (264237 MB)
Nov 11 15:23:41 promethium kernel: SCSI device sdb: drive cache: write back
Nov 11 15:23:41 promethium kernel: sda: Write Protect is off
Nov 11 15:23:41 promethium kernel: SCSI device sdb: 516087808 512-byte hdwr sectors (264237 MB)
Nov 11 15:23:41 promethium kernel: sdb: Write Protect is off
Nov 11 15:23:41 promethium kernel: SCSI device sda: drive cache: write back
Nov 11 15:23:41 promethium kernel: sda:<5>SCSI device sdb: drive cache: write back
Nov 11 15:23:41 promethium kernel: sdb: sda1
Nov 11 15:23:41 promethium kernel: sd 2:0:0:5: Attached scsi disk sda
Nov 11 15:23:41 promethium kernel: sdb1
Nov 11 15:23:41 promethium kernel: sd 3:0:0:5: Attached scsi disk sdb
Nov 11 15:23:41 promethium kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 13
Nov 11 15:23:41 promethium kernel: scsi 1:0:0:0: Attached scsi generic sg1 type 13
Nov 11 15:23:41 promethium kernel: scsi 2:0:0:0: Attached scsi generic sg2 type 13
Nov 11 15:23:41 promethium kernel: scsi 3:0:0:0: Attached scsi generic sg3 type 13
Nov 11 15:23:41 promethium kernel: sd 2:0:0:5: Attached scsi generic sg4 type 0
Nov 11 15:23:41 promethium kernel: sd 3:0:0:5: Attached scsi generic sg5 type 0

>>> network outage result <<<
Nov 10 14:42:02 promethium kernel: ping timeout of 5 secs expired, last rx 4384679620, last ping 4384684620, now 4384689620
Nov 10 14:42:02 promethium kernel: connection4:0: iscsi: detected conn error (1011)
Nov 10 14:42:02 promethium kernel: ping timeout of 5 secs expired, last rx 4384679621, last ping 4384684621, now 4384689621
Nov 10 14:42:02 promethium kernel: connection1:0: iscsi: detected conn error (1011)
Nov 10 14:42:03 promethium kernel: ping timeout of 5 secs expired, last rx 4384680089, last ping 4384685089, now 4384690089
Nov 10 14:42:03 promethium kernel: connection3:0: iscsi: detected conn error (1011)
Nov 10 14:42:03 promethium kernel: ping timeout of 5 secs expired, last rx 4384680173, last ping 4384685173, now 4384690173
Nov 10 14:42:03 promethium kernel: connection2:0: iscsi: detected conn error (1011)
Nov 10 14:42:03 promethium iscsid: Kernel reported iSCSI connection 4:0 error (1011) state (3)
Nov 10 14:42:03 promethium iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Nov 10 14:42:03 promethium iscsid: Kernel reported iSCSI connection 3:0 error (1011) state (3)
Nov 10 14:42:03 promethium iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Nov 10 14:42:30 promethium iscsid: received iferror -38
Nov 10 14:42:30 promethium iscsid:last message repeated 2 times
Nov 10 14:42:30 promethium iscsid: connection4:0 is operational after recovery (2 attempts)
Nov 10 14:42:30 promethium iscsid: received iferror -38
Nov 10 14:42:30 promethium iscsid:last message repeated 2 times
Nov 10 14:42:30 promethium iscsid: connection3:0 is operational after recovery (2 attempts)
Nov 10 14:42:32 promethium iscsid: received iferror -38
Nov 10 14:42:32 promethium iscsid:last message repeated 2 times
Nov 10 14:42:32 promethium iscsid: connection2:0 is operational after recovery (2 attempts)
Nov 10 14:42:36 promethium iscsid: received iferror -38
Nov 10 14:42:36 promethium iscsid:last message repeated 2 times
Nov 10 14:42:36 promethium iscsid: connection1:0 is operational after recovery (3 attempts)
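
For anyone who wants to put numbers on the network outage log above, here is a minimal Python sketch. It is not part of open-iscsi: the function names (`outage_seconds`, `ping_gaps`) and the regular expressions are assumptions based only on the message formats pasted in this post. It pairs each "detected conn error" line with the matching "is operational after recovery" line and reports the seconds in between, and it prints the gaps between the three counters in a "ping timeout" line. Syslog timestamps carry no year, so an arbitrary one is added purely to make the timestamps parseable.

```python
import re
from datetime import datetime

# A few lines copied from the "network outage result" log in this post.
LOG = """\
Nov 10 14:42:02 promethium kernel: ping timeout of 5 secs expired, last rx 4384679620, last ping 4384684620, now 4384689620
Nov 10 14:42:02 promethium kernel: connection4:0: iscsi: detected conn error (1011)
Nov 10 14:42:02 promethium kernel: connection1:0: iscsi: detected conn error (1011)
Nov 10 14:42:30 promethium iscsid: connection4:0 is operational after recovery (2 attempts)
Nov 10 14:42:36 promethium iscsid: connection1:0 is operational after recovery (3 attempts)
"""

# Hypothetical patterns, written to match the lines above and nothing more.
STAMP = r"(\w{3}\s+\d+ \d\d:\d\d:\d\d)"
ERROR = re.compile(
    STAMP + r" \S+ kernel: (connection\d+:\d+): iscsi: detected conn error"
)
RECOVERED = re.compile(
    STAMP + r" \S+ iscsid: (connection\d+:\d+) is operational after recovery"
)
PING = re.compile(r"last rx (\d+), last ping (\d+), now (\d+)")


def parse_stamp(text):
    """Parse a syslog timestamp; the year 2000 is an arbitrary placeholder."""
    normalized = " ".join(text.split())  # collapse padded day numbers
    return datetime.strptime("2000 " + normalized, "%Y %b %d %H:%M:%S")


def outage_seconds(log_text):
    """Map each connection to the seconds between its first error and recovery."""
    started = {}
    outages = {}
    for line in log_text.splitlines():
        match = ERROR.search(line)
        if match:
            # Keep only the first error seen for a connection.
            started.setdefault(match.group(2), parse_stamp(match.group(1)))
            continue
        match = RECOVERED.search(line)
        if match and match.group(2) in started:
            begin = started.pop(match.group(2))
            delta = parse_stamp(match.group(1)) - begin
            outages[match.group(2)] = int(delta.total_seconds())
    return outages


def ping_gaps(line):
    """Return (last ping - last rx, now - last ping) in raw counter ticks."""
    rx, ping, now = map(int, PING.search(line).groups())
    return ping - rx, now - ping


if __name__ == "__main__":
    for conn, seconds in sorted(outage_seconds(LOG).items()):
        print(conn, "was down for about", seconds, "seconds")
    print("ping gaps in ticks:", ping_gaps(LOG.splitlines()[0]))
```

On the sample lines this reports roughly 28 seconds for connection4:0 and 34 seconds for connection1:0, and gaps of 5000 ticks on each side of the last ping. That would match the 5-second timeout if the kernel tick rate is 1000 Hz, which is an inference from the numbers and not something the log states.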