Try modifying the PXE file as described in
https://www.ibm.com/mysupport/s/article/ka150000000H5ohAAC/Problem-booting-from-local-disk-after-node-provisioning
That is, in /tftpboot/pxelinux.cfg/adqan001, change the xCAT label entry to:
label xCAT
kernel chain.c32
append hd0
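A minimal sketch of that edit as a filter, assuming the file currently contains a LOCALBOOT line as in the listing quoted further down. The function name chain_to_hd0 is made up for illustration, and chain.c32 (a Syslinux module) is assumed to be available to the PXE loader:

```shell
# Sketch only: read a pxelinux config on stdin and replace any LOCALBOOT
# directive with a chainload of the first hard disk (chain.c32 + hd0).
# Every other line is passed through unchanged.
# Hypothetical usage, working on copies rather than the live file:
#   chain_to_hd0 < adqan001.orig > adqan001.new
chain_to_hd0() {
    awk '
        toupper($1) == "LOCALBOOT" {
            print "kernel chain.c32"
            print "append hd0"
            next
        }
        { print }
    '
}
```

Since nodeset writes this file, a later nodeset run may well overwrite a manual edit, so it is worth re-checking the file after each provisioning cycle.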
On 09/26/2017 04:24 PM, Saurabh Barve wrote:
>>>> What’s the updatestatus of the compute node after that first successful PXE and Anaconda run? Check it with “lsdef adqan001”. Also, what is the value of currchain after Anaconda runs and the machine reboots?
Here's the update status of the node while the OS is installing on it (I've kept only the entries that I thought were relevant to the question):
[root@admwqamgr ~]# lsdef adqan001
Object name: adqan001
    addkcmdline=edd=off ipv6.disable=1
    arch=x86_64
    currchain=boot
    currstate=install rhels7.3-x86_64-compute
    installnic=eth4
    interface=eth4
    ip=192.168.40.7
    mac=90:E2:BA:74:90:A4
    netboot=xnba
    nfsdir=/install
    nfsserver=admwqamgr
    os=rhels7.3
    postscripts=syslog,remoteshell
    primarynic=eth4
    profile="">
    provmethod=cat-compute-rhels7.3-x86_64
    status=installing
    statustime=09-26-2017 08:09:13
    tftpdir=/tftpboot
    tftpserver=admwqamgr
    updatestatus=synced
    updatestatustime=09-05-2017 11:15:23
Here's the status of the node after OS installation completes and the machine reboots (this is where it gets stuck):
[root@admwqamgr ~]# lsdef adqan001
Object name: adqan001
    addkcmdline=edd=off ipv6.disable=1
    arch=x86_64
    currchain=boot
    currstate=boot
    installnic=eth4
    interface=eth4
    ip=192.168.40.7
    mac=90:E2:BA:74:90:A4
    netboot=xnba
    nfsdir=/install
    nfsserver=admwqamgr
    os=rhels7.3
    postscripts=syslog,remoteshell
    primarynic=eth4
    profile="">
    provmethod=cat-compute-rhels7.3-x86_64
    status=booting
    statustime=09-26-2017 08:17:25
    tftpdir=/tftpboot
    tftpserver=admwqamgr
    updatestatus=synced
    updatestatustime=09-05-2017 11:15:23
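Between the two listings above only currstate, status and statustime change. A small sketch for pulling out just the attributes of interest, assuming lsdef's usual attr=value output; the helper name lsdef_attrs is hypothetical and not part of xCAT:

```shell
# Sketch only: print selected attr=value pairs from `lsdef` output on stdin.
# $1 is a comma-separated list of attribute names (exact matches only, so
# "status" does not also select "statustime").
# Hypothetical usage on the management node:
#   lsdef adqan001 | lsdef_attrs currchain,currstate,status,updatestatus
lsdef_attrs() {
    awk -F= -v want="$1" '
        BEGIN {
            n = split(want, names, ",")
            for (i = 1; i <= n; i++) keep[names[i]] = 1
        }
        {
            key = $1
            gsub(/^[ \t]+/, "", key)      # drop the leading indentation
        }
        key in keep {
            sub(/^[^=]*=/, "")            # keep everything after the first "="
            print key "=" $0
        }
    '
}
```

lsdef also has an -i option that restricts the attributes it prints, which may be simpler when running it directly.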
>>>> Are your DNS settings correct? Can the compute node resolve the master node in the Anaconda shell? Forward and reverse DNS must work.
Yes. If I manually boot the node to disk after the install, it boots into the OS. After logging in to the compute node, I can verify that forward and reverse DNS are working fine. I also tried specifying the IP address of the xCAT management node instead of its name in the 'noderes' table but got the same result.
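One way to script the forward and reverse check described above is sketched here. It assumes getent is available on the node; dns_roundtrip and the RESOLVER override are made-up names, the latter only there so the function can be tried without a live resolver:

```shell
# Sketch only: resolve a name to an address, then the address back to a name.
# RESOLVER is a hypothetical override hook; by default `getent hosts` is used,
# which consults /etc/hosts and DNS according to nsswitch.conf.
# Hypothetical usage on the compute node:
#   dns_roundtrip admwqamgr
dns_roundtrip() {
    name=$1
    ip=$(${RESOLVER:-getent hosts} "$name" | awk 'NR == 1 { print $1 }')
    if [ -z "$ip" ]; then
        echo "forward lookup failed for $name"
        return 1
    fi
    back=$(${RESOLVER:-getent hosts} "$ip" | awk 'NR == 1 { print $2 }')
    if [ -z "$back" ]; then
        echo "reverse lookup failed for $ip"
        return 1
    fi
    echo "$name -> $ip -> $back"
}
```

If the name printed at the end differs from the one passed in, forward and reverse records disagree, which is worth ruling out before looking elsewhere.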
>>>> What’s in the PXE file on the master node after the Anaconda run? /tftpboot/pxelinux.cfg/adqan001
Here you go:
[root@admwqamgr ~]# cat /tftpboot/pxelinux.cfg/adqan001
#boot
DEFAULT xCAT
LABEL xCAT
LOCALBOOT 0
>>>> At the end of the postscripts run, the ‘updatestatus.awk’ script needs to work – that’s what calls back to the master node and updates the status of the node
Does this need to be run manually? My postscripts table is pretty simple:
[root@admwqamgr ~]# tabdump postscripts
#node,postscripts,postbootscripts,comments,disable
"compute","syslog,remoteshell",,,
Regards,
Saurabh
From: <russa...@comcast.net>
To: "'xCAT Users Mailing list'" <xcat-user@lists.sourceforge.net>
Date: 26-09-2017 18:11
Subject: Re: [xcat-user] RHEL-7.3 provisioning error
What’s the updatestatus of the compute node after that first successful PXE and Anaconda run? Check it with “lsdef adqan001”. Also, what is the value of currchain after Anaconda runs and the machine reboots?
Are your DNS settings correct? Can the compute node resolve the master node in the Anaconda shell? Forward and reverse DNS must work.
What’s in the PXE file on the master node after the Anaconda run? /tftpboot/pxelinux.cfg/adqan001
That file is what instructs the machine to boot from disk.
At the end of the postscripts run, the ‘updatestatus.awk’ script needs to work – that’s what calls back to the master node and updates the status of the node.
From: Saurabh Barve [mailto:barve_saur...@cat.com]
Sent: Tuesday, September 26, 2017 5:15 AM
To: xcat-user@lists.sourceforge.net
Subject: [xcat-user] RHEL-7.3 provisioning error
Hi,
I'm trying to deploy RHEL-7.3 on my cluster compute nodes but am running into problems with PXE after the node is successfully installed.
These are the details of my xCAT management node:
[root@admwqamgr ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)
[root@admwqamgr ~]# uname -r
3.10.0-514.26.2.el7.x86_64
[root@admwqamgr ~]# lsxcatd -a
Version 2.13.4 (git commit 6ee3741498768994e4bb10d2a77c9699bcabde90, built Tue May 16 10:03:13 EDT 2017)
This is a Management Node
cfgloc=Pg:dbname=xcatdb;host=192.168.40.4|xcatadm
dbengine=Pg
dbname=xcatdb
dbhost=192.168.40.4
dbadmin=xcatadm
My compute nodes have six interfaces:
--> 4 1GbE interfaces --- eno1 (eth0), eno2 (eth1), eno3 (eth2), eno4 (eth3)
--> 2 10GbE interfaces --- enp130s0f0 (eth4), enp130s0f0 (eth5)
Two things about the network:
(i) I'm deploying the compute nodes over the "eth4" interface
(ii) There is network connectivity on both eth0 and eth4 - this is beyond my control
The boot order for the node as specified in the BIOS is:
enp130s0f0 (eth4)
enp130s0f0 (eth5)
eno1 (eth0)
eno2 (eth1)
eno3 (eth2)
eno4 (eth3)
HDD 1
HDD 2
HDD 3
HDD 4
The xCAT management node provides both the DHCP and DNS services for the cluster. I have NetworkManager running on the xCAT management node. IPv6 is disabled on the xCAT management node.
I also want to use NetworkManager on the compute nodes.
I deploy the node using the commands:
nodeset adqan001 osimage=compute-rhels7.3-x86_64
rsetboot adqan001 net
rpower adqan001 on
The node deploys over eth4 without any problem. However, when it reboots after the installation, the node doesn't boot from disk. I see the following messages in the logs on the xCAT management server:
+++++++++++++++++++
Sep 25 13:34:28 admwqamgr dhcpd: DHCPDISCOVER from 90:e2:ba:74:90:a4 via enp130s0f0
Sep 25 13:34:28 admwqamgr dhcpd: DHCPOFFER on 192.168.40.7 to 90:e2:ba:74:90:a4 via enp130s0f0
Sep 25 13:34:28 admwqamgr dhcpd: DHCPREQUEST for 192.168.40.7 (192.168.40.4) from 90:e2:ba:74:90:a4 via enp130s0f0
Sep 25 13:34:28 admwqamgr dhcpd: DHCPACK on 192.168.40.7 to 90:e2:ba:74:90:a4 via enp130s0f0
Sep 25 13:34:50 admwqamgr dhcpd: DHCPDISCOVER from 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:34:51 admwqamgr dhcpd: DHCPOFFER on 192.168.40.82 to 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:34:54 admwqamgr dhcpd: DHCPREQUEST for 192.168.40.82 (192.168.40.4) from 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:34:54 admwqamgr dhcpd: DHCPACK on 192.168.40.82 to 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:34:54 admwqamgr in.tftpd[7342]: RRQ from 192.168.40.82 filename xcat/xnba.kpxe
Sep 25 13:34:54 admwqamgr in.tftpd[7342]: tftp: client does not accept options
Sep 25 13:34:54 admwqamgr in.tftpd[7343]: RRQ from 192.168.40.82 filename xcat/xnba.kpxe
Sep 25 13:34:54 admwqamgr in.tftpd[7343]: Client 192.168.40.82 finished xcat/xnba.kpxe
Sep 25 13:34:54 admwqamgr dhcpd: DHCPDISCOVER from 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:34:55 admwqamgr dhcpd: DHCPOFFER on 192.168.40.80 to 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:34:55 admwqamgr dhcpd: DHCPREQUEST for 192.168.40.80 (192.168.40.4) from 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:34:55 admwqamgr dhcpd: DHCPACK on 192.168.40.80 to 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:35:31 admwqamgr dhcpd: DHCPDISCOVER from 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:35:31 admwqamgr dhcpd: DHCPOFFER on 192.168.40.82 to 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:35:31 admwqamgr dhcpd: DHCPREQUEST for 192.168.40.82 (192.168.40.4) from 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:35:31 admwqamgr dhcpd: DHCPACK on 192.168.40.82 to 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 03:35:32 192.168.40.82 (none) dhclient[1054]: XMT: Solicit on eth0, interval 4170ms.
Sep 25 13:35:33 192.168.40.82 (none) ntpd[1633]: 0.0.0.0 c61c 0c clock_step +36000.081070 s
Sep 25 13:35:33 192.168.40.82 (none) ntpd[1633]: 0.0.0.0 c614 04 freq_mode
Sep 25 13:35:34 192.168.40.82 (none) ntpd[1633]: 0.0.0.0 c618 08 no_sys_peer
Sep 25 13:35:36 192.168.40.82 (none) dhclient[1922]: DHCPDISCOVER on eth4 to 255.255.255.255 port 67 interval 7 (xid=0x3fd26f33)
Sep 25 13:35:36 192.168.40.82 (none) dhclient[1054]: XMT: Solicit on eth0, interval 8110ms.
Sep 25 13:35:39 192.168.40.82 (none) dhclient[2017]: Bound to *:546
Sep 25 13:35:39 192.168.40.82 (none) dhclient[2017]: XMT: Solicit on eth4, interval 1010ms.
Sep 25 13:35:40 192.168.40.82 (none) dhclient[2017]: XMT: Solicit on eth4, interval 1970ms.
Sep 25 13:35:41 192.168.40.82 (none) ntpd[1633]: Listen normally on 6 eth4 fe80::92e2:baff:fe74:90a4 UDP 123
+++++++++++++++++++
192.168.40.7 is the address that the DHCP server is supposed to hand out to the eth4 interface of the node, which has the MAC address 90:e2:ba:74:90:a4. But even after it gets the IP, the node doesn't boot to disk. The eth0 interface, with the MAC address 00:1e:67:8e:df:39, then tries and succeeds in getting an IP address from the DHCP server. The node then ends up booting into the genesis shell.
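To see at a glance which MAC address was acknowledged for which IP, the dhcpd lines in a log like the one above can be reduced to MAC/IP pairs. This is a rough sketch that assumes one log entry per line in the dhcpd format shown; the function name dhcp_acks and the log path in the usage comment are assumptions:

```shell
# Sketch only: list the distinct "MAC IP" pairs from dhcpd DHCPACK lines
# read on stdin. The IP is the word after "on", the MAC the word after "to".
# Hypothetical usage (the log location varies by setup):
#   dhcp_acks < /var/log/messages
dhcp_acks() {
    awk '
        /dhcpd: DHCPACK on/ {
            ip = ""; mac = ""
            for (i = 1; i < NF; i++) {
                if ($i == "on") ip = $(i + 1)
                if ($i == "to") mac = $(i + 1)
            }
            if (ip != "" && mac != "") print mac, ip
        }
    ' | LC_ALL=C sort -u
}
```

A second MAC from the same chassis showing up here, as eth0 does above, suggests the node went on to PXE-boot from another NIC in the boot order.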
- Additional xCAT configuration information
[root@admwqamgr ~]# tabdump noderes
#node,servicenode,netboot,tftpserver,tftpdir,nfsserver,monserver,nfsdir,installnic,primarynic,discoverynics,cmdinterface,xcatmaster,current_osimage,next_osimage,nimserver,routenames,nameservers,proxydhcp,syslog,comments,disable
"compute",,"xnba","admwqamgr","/tftpboot","admwqamgr",,"/install","eth4","eth4","eth4",,,,,,,,,,,
[root@admwqamgr ~]# tabdump mac
#node,interface,mac,comments,disable
"adqan001","eth4","90:E2:BA:74:90:A4",,
[root@admwqamgr ~]# tabdump bootparams
#node,kernel,initrd,kcmdline,addkcmdline,dhcpstatements,adddhcpstatements,comments,disable
"compute",,,,"edd=off ipv6.disable=1",,,,
I'm enabling and disabling the following services in my compute node template:
# System services
services --enabled="chronyd,NetworkManager,postfix,nfs,nfs-server" --disabled="firewalld"
(1) As you can see in the 'bootparams' table above, I've disabled IPv6 for the newly deployed node
(2) I've also edited /opt/xcat/lib/perl/xCAT/Template.pm and changed line 1066 to add "--noipv6" to the default Kickstart deployment parameter:
$line .= "dhcp --device=$suffix --noipv6";
(3) I've tried specifying both eth4 and enp130s0f0 in both the 'mac' and 'noderes' tables, with the same result
(4) I've used both 'xnba' and 'pxe' as provisioning methods, with the same result
(5) From the compute node installation template provided by xCAT, I've removed the following line, as the script called here enables all network interfaces and disables NetworkManager:
echo "Running Kickstart Post-Installation script..."
#INCLUDE:#ENV:XCATROOT#/share/xcat/install/scripts/post.xcat#
#INCLUDE:#ENV:XCATROOT#/share/xcat/install/scripts/post.rhels7#  <<<<------- Deleted this line
I can get the node to boot into PXE the first time and install without problems, but on the subsequent boot it doesn't boot to disk like it should. This configuration had worked very well for me on RHEL-6.8, minus the NetworkManager.
Regards,
Saurabh
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user