Hello Jon,

Thanks very much for your reply.
I didn't check the iDRAC at the time, but I did check the KVM console on these 
nodes: nothing showed up, they had simply hung. I also tried checking the 
hardware logs, but the system log revealed very little.

On the other hand, I was able to use rpower to reset the nodes and bring them 
back up, so I suspect the iDRAC is working fine.
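
For anyone wanting to automate that check, here is a minimal Python sketch 
that combines a ping with "rpower <node> stat" to tell a hung OS apart from a 
dead BMC. It assumes rpower prints one "node: state" line per node with the 
states "on" and "off"; the function names and the triage labels ("os-hung" 
and so on) are made up for illustration and are not xCAT terminology.

```python
import subprocess


def parse_rpower_stat(output):
    """Map node name -> reported state from 'node: state' lines (assumed format)."""
    states = {}
    for line in output.splitlines():
        node, sep, state = line.partition(":")
        if sep and node.strip():
            states[node.strip()] = state.strip()
    return states


def classify(ping_ok, power_state):
    """Return a hypothetical triage label for one node."""
    if ping_ok:
        return "up"
    if power_state == "on":
        # BMC answers and reports power on, but the OS does not respond.
        return "os-hung"
    if power_state == "off":
        return "powered-off"
    # No usable answer from the BMC (error text, timeout, missing line).
    return "bmc-unknown"


def check_node(node):
    """Ping the node once, then ask its BMC for the power state via rpower."""
    ping_ok = subprocess.run(
        ["ping", "-c", "1", "-W", "2", node],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ).returncode == 0
    result = subprocess.run(
        ["rpower", node, "stat"], capture_output=True, text=True
    )
    state = parse_rpower_stat(result.stdout).get(node)
    return classify(ping_ok, state)
```

Run from the management node, check_node("somenode") returning "os-hung" 
would match what I saw: no ping, but the BMC still answering rpower.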

My other suspect is the Netgear S3300-62X switch, but I cannot pin it down, as 
some other Supermicro nodes are working fine while these two groups of C6400 
nodes are not. It does sound like the cause is somewhere in the BIOS/firmware.

Thanks for this clue, I shall look into the firmware upgrade to see if it helps.

Peter
From: Jon Diprose <j...@well.ox.ac.uk>
Sent: 17 February 2022 09:52
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc: Chiu, Peter (STFC,RAL,RALSP) <peter.c...@stfc.ac.uk>
Subject: RE: [xcat-user] DELL PowerEdge C6400 diskless compute nodes - 
intermittent system hang

Hi Peter,
I have a bunch of C6420 nodes that behave pretty well. I have had a few 
lock-ups in the past but they went away with BIOS and firmware updates.
Do the iDRACs also lock up? If not, is there any output to the virtual console? 
Is there anything in the hardware event logs? Please also check the power 
utilisation history graph as a symptom on my nodes was that the recorded power 
utilisation dropped to zero several hours before the lock-up.
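
That power symptom could be watched for automatically. The sketch below is 
only an illustration: it assumes the power history has already been exported 
as a list of (timestamp, watts) pairs, which is not something described in 
this thread, and the function name and parameters are hypothetical. It 
reports the first point at which the reading drops to zero and stays there.

```python
def first_sustained_zero(samples, min_run=3, threshold=0.0):
    """Return the timestamp at which readings first fall to <= threshold
    and stay there for at least min_run consecutive samples, else None.

    samples: iterable of (timestamp, watts) pairs in time order.
    """
    run_start = None
    run_len = 0
    for timestamp, watts in samples:
        if watts <= threshold:
            if run_len == 0:
                run_start = timestamp
            run_len += 1
            if run_len >= min_run:
                return run_start
        else:
            # A non-zero reading ends the current run of zeros.
            run_len = 0
    return None


# Example: a single zero reading is ignored, a run of three is reported.
history = [(0, 210), (1, 205), (2, 0), (3, 198), (4, 0), (5, 0), (6, 0)]
print(first_sustained_zero(history))  # prints 4
```

Requiring several consecutive zero readings avoids flagging a single bad 
sample; the right value for min_run depends on the sampling interval.
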
I do firmware updates using the dell-system-update tools from a local mirror of 
their Linux repo ( https://linux.dell.com/repo/hardware/dsu/ ) but beware that 
dsu doesn't do CPLD updates - they have to be downloaded from Dell's website 
and applied by other means (via manual upload to the iDRAC or via OS-level 
installer package). Something about CPLD updates having to be applied in 
isolation and not apparently trusting dsu to get that right.
Now that I think about it, I'm also not sure that dsu installs the chassis 
management firmware updates. I should go look at that. If not, those will have 
to be applied via the iDRAC of one of the nodes in each chassis.
Jon

--
Dr. Jonathan Diprose <j...@well.ox.ac.uk>
Tel: 01865 287873
Research Computing Manager
Henry Wellcome Building for Genomic Medicine
Roosevelt Drive, Headington, Oxford OX3 7BN
________________________________
From: Peter Chiu - STFC UKRI via xCAT-user [xcat-user@lists.sourceforge.net]
Sent: 17 February 2022 08:48
To: xCAT Users Mailing list
Cc: Peter Chiu - STFC UKRI
Subject: [xcat-user] DELL PowerEdge C6400 diskless compute nodes - intermittent 
system hang
Dear all,
I am not sure if this is the right forum for this query; please redirect me if 
not.
We have recently replaced some of our diskless compute nodes with DELL 
PowerEdge C6400 servers.  Strictly speaking, these systems do have internal 
disks, but they are used as a working area, with a common operating system 
(CentOS 7) image served from a master node.
While everything else appears to be working well, we have found these C6400 
nodes freezing intermittently, while at the same time the other 
Supermicro-based nodes are working fine.
When these systems freeze, they do not respond to IP ping or even to console 
logins; they just hang.  The only way out is to power-reset the systems, after 
which they return to normal service.
Has anyone encountered similar problems, or any idea what I have missed?
Many thanks in advance.
Peter Chiu
STFC Rutherford Appleton Lab
RAL Space
UK

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
