On Nov 11, 5:14 pm, Mike Christie wrote:
> Niels Callesøe wrote:
> > Hello group
>
> > I am running a number of HP blade servers in a C7200 enclosure.
> > Several of them have access to individual LUNs on an MSA 2012i using
> > open-iscsi. Recently, however, I have experienced unexplained hangs of
> > the servers in question and the only apparent thing they have in
> > common (besides being blade servers) is that they have access to the IP-
> > SAN.
>
> > When the servers fail, they do so in a fashion where they will still
> > respond to, for example, ping requests. But they refuse to respond to
> > higher level access, such as spawning a shell for login. This means
> > that when the error occurs, I cannot even log into the machines to
> > troubleshoot the problem (regardless of remote or local login), even
> > though the console greeting is printed readily.
>
> > My question is primarily whether this sounds like something the iscsi-
> > driver could cause and, equally importantly, how one would go about
> > troubleshooting the issue. One thing that makes it particularly
> > elusive is that I cannot seem to provoke the error state and it does
> > not occur very often (at least not while the platform is not yet in
> > full production).
>
> > Possibly relevant information follows:
>
> > OS: centos-release-5-3.el5.centos.1
> > iscsi version: iscsid version 2.0-868
> > MSA: Current Storage Controller Code Version J210P12
>
> > I can, and have started, upgrades to more recent versions of all
> > three. However, those were the versions running when the problem
> > last occurred -- and since I cannot provoke it, I have no real way of
> > knowing if version upgrades will solve the issue (unless someone in
> > this group can confirm that it will, of course).
>
> It could be iscsi. Are you using multipath and do you know if there are
> path failures when the system hangs? Is there anything in the log?
I am using multipath, I believe, as I can access either of the MSA
controllers via either of two Gbit interfaces on the blades. I'll
paste what I believe to be the relevant lines from messages below.
Other than the startup messages, as best I can tell there is nothing
else relevant in the logs. I do have logs of what happens during a
network failure, which I'll also paste below, but that failure at
least did not cause the machine to hang. I suspect that whatever
causes the hang also prevents writing to the log...

I can, of course, induce almost any kind of failure on one or both of
the links if you think that will help with troubleshooting.
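
For sifting through messages around the time of a hang, something along these lines might help. It is only a minimal sketch: it assumes the classic syslog line format seen in the dumps in this post ("Mon D HH:MM:SS host kernel: message"), and the names SYSLOG_RE, KEYWORDS and interesting_lines, as well as the keyword list itself, are made up for illustration and certainly not exhaustive.

```python
import re

# Classic syslog kernel line: "Nov 9 13:42:39 host kernel: message".
# The format is assumed from the log excerpts in this post.
SYSLOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+kernel:\s+(?P<msg>.*)$"
)

# Substrings worth flagging (an assumption, adjust to taste).
KEYWORDS = ("iscsi", "bnx2", "multipath", "link is")


def interesting_lines(lines):
    """Yield (timestamp, message) for kernel lines mentioning a keyword."""
    for line in lines:
        match = SYSLOG_RE.match(line.strip())
        if not match:
            continue  # not a kernel message, or not in the assumed format
        msg = match.group("msg")
        if any(keyword in msg.lower() for keyword in KEYWORDS):
            yield match.group("stamp"), msg


if __name__ == "__main__":
    sample = [
        "Nov 9 13:42:39 promethium kernel: iscsi: registered transport (tcp)",
        "Nov 9 13:42:39 promethium kernel: bnx2: eth0: using MSI",
    ]
    for stamp, msg in interesting_lines(sample):
        print(stamp, msg)
```

To run it against a real log, pass an open file instead of the sample list, for example interesting_lines(open("/var/log/messages")). It only narrows down what is already in the log; if the hang prevents logging altogether, it will not show anything new.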
> If there is nothing in the log at the time of the hang, could you hook
> up a serial line? I am hoping an oops will get spit out at the time of
> the hang.
I can attach a remote console, if that will do the trick? Usually I
only open one to attempt login after something goes wrong and prevents
ssh, but I should be able to open one and just keep it there to watch
for any console dumpage. Or perhaps I am misunderstanding you?
On to the log-dumps:
>>> iscsi starting (previous) <<<
Nov 9 13:42:39 promethium kernel: Loading iSCSI transport class v2.0-724.
Nov 9 13:42:39 promethium kernel: iscsi: registered transport (tcp)
Nov 9 13:42:39 promethium kernel: iscsi: registered transport (iser)
Nov 9 13:42:39 promethium kernel: bnx2: eth0: using MSI
Nov 9 13:42:39 promethium kernel: bnx2: eth0 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 9 13:42:39 promethium kernel: bnx2: eth1: using MSI
Nov 9 13:42:39 promethium kernel: bnx2: eth1 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 9 13:42:39 promethium kernel: bnx2: eth2: using MSI
Nov 9 13:42:39 promethium kernel: bnx2: eth2 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 9 13:42:39 promethium kernel: bnx2: eth3: using MSI
Nov 9 13:42:39 promethium kernel: bnx2: eth3 NIC SerDes Link is Up, 1000 Mbps full duplex
Nov 9 13:42:39 promethium kernel: scsi0 : iSCSI Initiator over TCP/IP
Nov 9 13:42:39 promethium kernel: scsi1 : iSCSI Initiator over TCP/IP
Nov 9 13:42:39 promethium kernel: scsi2 : iSCSI Initiator over TCP/IP
Nov 9 13:42:39 promethium kernel: scsi3 : iSCSI Initiator over TCP/IP
Nov 9 13:42:39 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 9 13:42:39 promethium kernel: Type: Enclosure ANSI SCSI revision: 05
Nov 9 13:42:39 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 9 13:42:39 promethium kernel: Type: Enclosure ANSI SCSI revision: 05
Nov 9 13:42:39 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 9 13:42:39 promethium kernel: Type: Enclosure ANSI SCSI revision: 05
Nov 9 13:42:39 promethium kernel: Vendor: HP Model: MSA2012i Rev: J210
Nov 9 13:42:39 promethium kernel: Type: Enclosure ANSI SCSI revision: 05
Nov 9 13:42:39 promethium kernel: Vendor: HP Model:
MSA2012i