So are you suggesting the reason was bad hardware?
Or, is it too early to call?

Ulf Zimmermann wrote:
I have a serial console set up with logging via conserver, but so far
there has been no further crash. We also swapped some hardware around:
another 4-node cluster with DL360g5 had been running without a crash for
several weeks, so we swapped those 4 nodes in for the first 4 in the
6-node cluster.

-----Original Message-----
From: Sunil Mushran [mailto:[EMAIL PROTECTED]
Sent: Monday, July 30, 2007 10:21
To: Ulf Zimmermann
Cc: [email protected]
Subject: Re: [Ocfs2-users] 6 node cluster with unexplained reboots

Do you have a netconsole setup? If not, set it up. That will capture the
real reason for the reset. Well, it typically does.
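
For the receiving side of such a setup, all that is needed is something
listening for UDP datagrams on a machine that stays up. Below is a minimal
sketch of such a receiver, assuming the crashing node is configured to send
to UDP port 6666 (netconsole's default target port). The function names
open_listener and log_datagrams are made up for illustration; a syslog
daemon or netcat on the log host would serve equally well.

```python
import socket
import time

# Assumption: the sender uses netconsole's default target port.
NETCONSOLE_PORT = 6666


def open_listener(host="0.0.0.0", port=NETCONSOLE_PORT):
    """Bind a UDP socket on which netconsole datagrams can be received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def log_datagrams(sock, out, max_messages=None):
    """Write each received datagram to `out`, prefixed with time and sender IP.

    Runs until `max_messages` datagrams have been handled (forever if None)
    and returns the number of datagrams handled.
    """
    handled = 0
    while max_messages is None or handled < max_messages:
        data, (sender_ip, _sender_port) = sock.recvfrom(65535)
        text = data.decode("utf-8", errors="replace").rstrip("\n")
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        out.write(f"{stamp} {sender_ip} {text}\n")
        # Flush after every message so the last lines sent before a reset
        # are not left sitting in a buffer.
        out.flush()
        handled += 1
    return handled
```

On the log host this could be run as
log_datagrams(open_listener(), open("netconsole.log", "a")), with one log
file shared by all six nodes; the sender IP on each line tells them apart.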

Ulf Zimmermann wrote:
We just installed a new cluster with 6 HP DL380g5, dual single-port
Qlogic 24xx HBAs connected via two HP 4/16 Storageworks switches to a
3Par S400. We are using the 3Par recommended config for the Qlogic
driver and device-mapper-multipath, giving us 4 paths to the SAN. We do
see some SCSI errors where DM-MP fails a path after getting a 0x2000
error from the SAN controller, but the path gets put back in service in
less than 10 seconds.
This needs to be fixed, but I don't think it is what is causing our
reboots. 2 of the nodes rebooted once while idle (ocfs2 and clusterware
were running, no db), and one node rebooted once while idle (another
node was using fscat to copy our 9i db from ocfs1 to the ocfs2 data
volume) and once while some load was put on it via the upgraded 10g
database. In all cases it is as if someone pressed a hardware reset
button. There is no kernel panic (at least not one leading to a stop
with a visible message), and we can get a dirty write cache on the
internal cciss controller.
The only messages we get on the nodes appear when the crashed node is
already in reset and has missed its ocfs2 heartbeat (set to the default
of 7), followed later by crs moving the vip.
Any hints on troubleshooting this would be appreciated.

Regards, Ulf.





------------------------------------------------------------------------
_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users

