Hello Jon,

Thanks very much for your reply.

I didn't check the iDRAC at the time, but I did check the KVM console on these nodes: nothing showed up, they just hang. I did try to check the hardware logs, but the system log did not reveal much.

On the other hand, I was able to use rpower to reset the nodes and bring them back up, so I suspect the iDRAC is working fine. Another suspect is the Netgear S3300-62X switch, but I cannot pin it down, as some other Supermicro nodes are working fine while these two groups of C6400 are not.

It does sound like something in the BIOS/firmware. Thanks for this clue; I shall look into the firmware upgrade to see if it helps.

Peter

From: Jon Diprose <j...@well.ox.ac.uk>
Sent: 17 February 2022 09:52
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc: Chiu, Peter (STFC,RAL,RALSP) <peter.c...@stfc.ac.uk>
Subject: RE: [xcat-user] DELL PowerEdge C6400 diskless compute nodes - intermittent system hang

Hi Peter,

I have a bunch of C6420 nodes that behave pretty well. I have had a few lock-ups in the past, but they went away with BIOS and firmware updates.

Do the iDRACs also lock up? If not, is there any output to the virtual console? Is there anything in the hardware event logs? Please also check the power utilisation history graph, as a symptom on my nodes was that the recorded power utilisation dropped to zero several hours before the lock-up.

I do firmware updates using the dell-system-update tools from a local mirror of their Linux repo (https://linux.dell.com/repo/hardware/dsu/), but beware that dsu doesn't do CPLD updates - they have to be downloaded from Dell's website and applied by other means (via manual upload to the iDRAC or via an OS-level installer package). Something about CPLD updates having to be applied in isolation, and Dell apparently not trusting dsu to get that right.

Now that I think about it, I'm also not sure that dsu installs the chassis management firmware updates. I should go and look at that. If not, those will be applied via the iDRAC of one of the nodes in each chassis.

Jon

--
Dr. Jonathan Diprose <j...@well.ox.ac.uk>
Tel: 01865 287873
Research Computing Manager
Henry Wellcome Building for Genomic Medicine
Roosevelt Drive, Headington, Oxford OX3 7BN

________________________________
From: Peter Chiu - STFC UKRI via xCAT-user [xcat-user@lists.sourceforge.net]
Sent: 17 February 2022 08:48
To: xCAT Users Mailing list
Cc: Peter Chiu - STFC UKRI
Subject: [xcat-user] DELL PowerEdge C6400 diskless compute nodes - intermittent system hang

Dear all,

I am not sure if this is the right forum for this query; please redirect me if not.

We have recently replaced some of our diskless compute nodes with DELL PowerEdge C6400 servers. Strictly speaking, these systems do have internal disks, but they are used as a working area, with a common operating system (CentOS 7) image served from a master node.

While all appear to be working well, we have found these C6400 nodes freezing up intermittently, while at the same time the other Supermicro-based nodes are working fine. When these systems freeze, they do not respond to IP ping, or even to console logins; they just hang. The only way out is to power reset the systems, after which they return to normal service.

Has anyone encountered similar problems, or have any idea what I have missed?

Many thanks in advance.

Peter Chiu
STFC Rutherford Appleton Lab
RAL Space
UK
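
The symptom Jon describes in this thread, recorded power utilisation dropping to zero some hours before a lock-up, could in principle be watched for automatically. The following is only a rough sketch: it assumes the power history has already been exported by some means into (timestamp, watts) pairs, and the function name, the one-hour threshold, and the sample readings are all made up for illustration, not taken from any Dell or xCAT tool.

```python
from datetime import datetime, timedelta


def find_zero_power_runs(samples, min_duration=timedelta(hours=1)):
    """Return (start, end) pairs where reported power stayed at 0 W
    for at least min_duration.

    samples: iterable of (datetime, watts) tuples, sorted by time.
    """
    runs = []
    run_start = None   # first timestamp of the current zero-power run
    last_zero = None   # most recent zero-power timestamp in that run
    for ts, watts in samples:
        if watts == 0:
            if run_start is None:
                run_start = ts
            last_zero = ts
        else:
            # A non-zero reading ends the current run, if there is one.
            if run_start is not None and last_zero - run_start >= min_duration:
                runs.append((run_start, last_zero))
            run_start = None
    # Handle a run that is still open at the end of the data.
    if run_start is not None and last_zero - run_start >= min_duration:
        runs.append((run_start, last_zero))
    return runs


if __name__ == "__main__":
    # Made-up readings: one sample every 10 minutes, dropping to 0 W after an hour.
    t0 = datetime(2022, 2, 17, 0, 0)
    readings = [(t0 + timedelta(minutes=10 * i), 250 if i < 6 else 0)
                for i in range(24)]
    for start, end in find_zero_power_runs(readings):
        print(f"power reported as 0 W from {start} to {end}")
```

A node that is still answering pings while its recorded power sits at zero would then be a candidate for a closer look before it hangs.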