Hello all,
After a node has been reinstalled, it won't boot its OS.
The installation itself completes fine, but when the node reboots, the remote
console (BMC) shows it stuck on this message:
Trying to load: pxelinux.cfg/AC1E0002 OK
Booting from local disk...
PXE-M0F: Exiting Intel Boot Agent.
I can see in the nodelist table that the status of this node is "failed":
tabdump nodelist
#node,groups,status,statustime,appstatus,appstatustime,primarysn,hidden,updatestatus,updatestatustime,comments,disable
"b002","all,worker,ipmi","failed","05-15-2014 09:31:22",,,,,"synced","05-14-2014 16:27:49",,
"b002bmc","bmc",,,,,,,,,,
The /var/log/httpd/error_log on the MN shows these errors:
[Wed May 14 16:33:57 2014] [error] [client 172.30.0.2] File does not exist: /tftpboot/mypostscripts/mypostscript.b002
[Wed May 14 16:33:57 2014] [error] [client 172.30.0.2] File does not exist: /install/rhels6.5/x86_64/images/updates.img
The first error has been cleared by adding the following option:
chdef -t site -o clustersite precreatemypostscripts=1
This gave me a kickstart-like file in /tftpboot/mypostscripts.
(I don't know whether that matters, but at least the error message is gone.)
Regarding the second error, I checked our other clusters running the same xCAT
version and OS: this file doesn't exist there either, yet they don't log this
error message.
The funny thing is that if I force the next boot from the hard disk
(rsetboot b002 hd), the node boots the OS normally.
So my only clues are this error message and the state of the node written in
the nodelist table.
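To check the status of many nodes at once, the tabdump output can be parsed as CSV. Here is a minimal Python sketch, assuming the output format shown above (a header line starting with # followed by quoted CSV rows). parse_tabdump is a hypothetical helper written for illustration, not an xCAT command.

```python
import csv

def parse_tabdump(text):
    """Parse tabdump-style CSV output into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    # The header line starts with '#', which is not part of the first column name.
    header = next(csv.reader([lines[0].lstrip("#")]))
    return [dict(zip(header, row)) for row in csv.reader(lines[1:])]

# Sample copied from the nodelist output above.
sample = '''#node,groups,status,statustime,appstatus,appstatustime,primarysn,hidden,updatestatus,updatestatustime,comments,disable
"b002","all,worker,ipmi","failed","05-15-2014 09:31:22",,,,,"synced","05-14-2014 16:27:49",,
"b002bmc","bmc",,,,,,,,,,
'''

rows = parse_tabdump(sample)
failed = [r["node"] for r in rows if r["status"] == "failed"]
print(failed)
```

Here only b002 is reported as failed, while its updatestatus column still reads "synced".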
I cannot find any log files.
To me, /var/log/messages on the MN looks normal:
May 15 09:34:06 b01 dhcpd: DHCPDISCOVER from 00:23:ae:ee:b7:2d via em1
May 15 09:34:06 b01 dhcpd: DHCPOFFER on 172.30.0.2 to 00:23:ae:ee:b7:2d via em1
May 15 09:34:08 b01 dhcpd: DHCPREQUEST for 172.30.0.2 (172.30.0.1) from 00:23:ae:ee:b7:2d via em1
May 15 09:34:08 b01 dhcpd: DHCPACK on 172.30.0.2 to 00:23:ae:ee:b7:2d via em1
May 15 09:34:08 b01 in.tftpd[113221]: RRQ from 172.30.0.2 filename pxelinux.0
May 15 09:34:08 b01 in.tftpd[113221]: tftp: client does not accept options
May 15 09:34:08 b01 in.tftpd[113222]: RRQ from 172.30.0.2 filename pxelinux.0
May 15 09:34:08 b01 in.tftpd[113223]: RRQ from 172.30.0.2 filename pxelinux.cfg/44454c4c-3500-104c-805a-c4c04f585831
May 15 09:34:08 b01 in.tftpd[113224]: RRQ from 172.30.0.2 filename pxelinux.cfg/01-00-23-ae-ee-b7-2d
May 15 09:34:08 b01 in.tftpd[113225]: RRQ from 172.30.0.2 filename pxelinux.cfg/AC1E0002
cat AC1E0002
#boot
DEFAULT xCAT
LABEL xCAT
LOCALBOOT 0
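For anyone following the TFTP log: the three pxelinux.cfg requests match the usual PXELINUX lookup order (system UUID, then 01- plus the MAC address, then the IP address in uppercase hex, shortened one digit at a time, then "default"). AC1E0002 is simply 172.30.0.2 written in hex. Here is a minimal Python sketch of that order. The function name pxelinux_config_candidates is made up for illustration and is not part of xCAT or PXELINUX.

```python
import ipaddress

def pxelinux_config_candidates(uuid, mac, ip):
    """Return the config file names PXELINUX is expected to try, in order."""
    names = []
    if uuid:
        names.append(uuid.lower())
    # ARP hardware type 01 (Ethernet), then the MAC in lowercase with dashes.
    names.append("01-" + mac.lower().replace(":", "-"))
    # IPv4 address as 8 uppercase hex digits, e.g. 172.30.0.2 -> AC1E0002.
    hex_ip = "%08X" % int(ipaddress.IPv4Address(ip))
    # Full hex address first, then one trailing hex digit dropped per attempt.
    for length in range(8, 0, -1):
        names.append(hex_ip[:length])
    names.append("default")
    return ["pxelinux.cfg/" + name for name in names]

# Values taken from the log above.
candidates = pxelinux_config_candidates(
    "44454c4c-3500-104c-805a-c4c04f585831",
    "00:23:ae:ee:b7:2d",
    "172.30.0.2",
)
for name in candidates:
    print(name)
```

With the values from the log, the third candidate is pxelinux.cfg/AC1E0002, which is the file the node found and which tells it to boot from the local disk.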
Here are some details about my configuration:
The node (b002) and the management node are both Dell C6220 servers.
CPU: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
OS: RHEL 6.5
xCAT: Version 2.8.3 (built Tue Nov 12 23:16:15 EST 2013)
This is a Management Node
dbengine=SQLite
xCAT tables:
Noderes:
tabdump noderes
"worker",,"pxe","172.30.0.1",,"172.30.0.1",,,"em1","em1",,,"172.30.0.1",
nodetype
"b002","rhels6.5","x86_64","cluster","bellatest_node",
Any ideas?
Thanks.
Jean-Claude