Hello,

We have some issues deploying compute nodes on a CentOS 8.1 cluster with xCAT 
2.16.

First, OS installation takes a long time. In the xcat.log file there are many
"updateflag.awk : retrying flag update" messages after the confignetwork
postscript runs. They continue for about 17 minutes.

There are also issues when attempting to deploy an osimage to the compute
nodes. Some attempts never start: the node simply boots from the hard drive
for no apparent reason. When installing multiple nodes at once, many of them
fail to boot the installer environment and drop into emergency mode instead.
Our only workaround is to reboot the faulty nodes until the installation
succeeds.

Does anyone know what could be causing these issues? It looks like a network
problem, but I'm not sure where to look...

By the way, the compute nodes are configured with bonded interfaces on the
main internal network, which is also used for OS deployment. Are there any
special considerations or configuration required in that case? The bonding
is successfully configured and functional at boot.


Best regards,

Antoine Huette
HPC engineer 
Bechtle 
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user