I'm seeing an intermittent problem where xCAT randomly fails to boot a stateless 
node: the node receives its DHCP info, but then hangs and never downloads 
initrd-stateless.gz.  I'm using xCAT 2.10, but I've seen this issue 
with previous versions (2.9.x).  Our nodes (HP BL460c Gen9) use UEFI, so 
I'm using xnba as the boot method rather than legacy PXE.  As a result, httpd 
serves almost all of the initial files after the first one (xnba.efi).

I've tried raising the number of concurrent connections httpd will handle 
(setting MaxClients 1000 and ServerLimit 1000), but nothing has helped so far. 
It also happens with small numbers of concurrent boots (as few as 12 
simultaneous boots).  When a node gets into this failed state I can simply 
power cycle it and it comes up fine, so I don't think it's a configuration 
issue with xCAT.  The failures occur randomly and don't follow any pattern 
that I can tell based on nodes or switches.
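
For reference, the httpd change described above would look roughly like the 
fragment below.  This is only a sketch: the two values are the ones quoted 
above, but the <IfModule prefork.c> wrapper assumes the prefork MPM on an 
Apache 2.2-style configuration, which is an assumption and not something 
confirmed here.

```
# Sketch only; assumes the prefork MPM (the wrapper is an assumption).
<IfModule prefork.c>
    # ServerLimit is listed first so that the higher MaxClients is accepted.
    ServerLimit 1000
    MaxClients  1000
</IfModule>
```

On Apache 2.4 the equivalent directive is named MaxRequestWorkers, although 
MaxClients is still accepted there.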

I also don't think it's an issue with throughput, since the xCAT master is using 
two 10Gb NICs in a bonded configuration to serve out images.

You can see where it hangs in the attached image.  I'd appreciate any help on 
tracking down what's going on here.

Thanks,

John Lopilato

