>>>> What's the updatestatus of the compute node after that first 
successful PXE and Anaconda run? Check it with 'lsdef adqan001'. Also what 
is the value of currchain after Anaconda runs and the machine reboots?


Here's the update status of the node while the OS is installing on it 
(I've kept only the entries that I thought were relevant to the question):

[root@admwqamgr ~]# lsdef adqan001
Object name: adqan001
    addkcmdline=edd=off ipv6.disable=1
    arch=x86_64
    currchain=boot
    currstate=install rhels7.3-x86_64-compute
    installnic=eth4
    interface=eth4
    ip=192.168.40.7
    mac=90:E2:BA:74:90:A4
    netboot=xnba
    nfsdir=/install
    nfsserver=admwqamgr
    os=rhels7.3
    postscripts=syslog,remoteshell
    primarynic=eth4
    profile=compute
    provmethod=cat-compute-rhels7.3-x86_64
    status=installing
    statustime=09-26-2017 08:09:13
    tftpdir=/tftpboot
    tftpserver=admwqamgr
    updatestatus=synced
    updatestatustime=09-05-2017 11:15:23


Here's the status of the node after OS installation completes and the 
machine reboots (this is where it gets stuck):

[root@admwqamgr ~]# lsdef adqan001
Object name: adqan001
    addkcmdline=edd=off ipv6.disable=1
    arch=x86_64
    currchain=boot
    currstate=boot
    installnic=eth4
    interface=eth4
    ip=192.168.40.7
    mac=90:E2:BA:74:90:A4
    netboot=xnba
    nfsdir=/install
    nfsserver=admwqamgr
    os=rhels7.3
    postscripts=syslog,remoteshell
    primarynic=eth4
    profile=compute
    provmethod=cat-compute-rhels7.3-x86_64
    status=booting
    statustime=09-26-2017 08:17:25
    tftpdir=/tftpboot
    tftpserver=admwqamgr
    updatestatus=synced
    updatestatustime=09-05-2017 11:15:23
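
To make the difference between the two snapshots easier to see, here is a minimal sketch that compares two saved 'lsdef' outputs attribute by attribute. The function name and the temporary file names are hypothetical (they are not part of xCAT), and the sample data is trimmed from the two listings above.

```shell
#!/bin/sh
# Hypothetical helper (not part of xCAT): print the attributes whose values
# differ between two saved `lsdef` outputs.
lsdef_changed() {
    # $1 = snapshot taken during the install, $2 = snapshot taken after reboot
    awk '
        # Consider only indented "key=value" lines; this skips the shell
        # prompt and the "Object name:" header.
        /^[ \t]+[A-Za-z_]+=/ {
            line = $0
            sub(/^[ \t]+/, "", line)
            eq  = index(line, "=")        # split on the first "=" only
            key = substr(line, 1, eq - 1)
            val = substr(line, eq + 1)
            if (NR == FNR) { before[key] = val; next }   # first file
            if ((key in before) && before[key] != val)
                printf "%s: %s -> %s\n", key, before[key], val
        }
    ' "$1" "$2"
}

# Demo with data trimmed from the two listings in this message.
tmp=$(mktemp -d)
cat > "$tmp/installing.txt" <<'EOF'
Object name: adqan001
    currchain=boot
    currstate=install rhels7.3-x86_64-compute
    status=installing
    updatestatus=synced
EOF
cat > "$tmp/rebooted.txt" <<'EOF'
Object name: adqan001
    currchain=boot
    currstate=boot
    status=booting
    updatestatus=synced
EOF
lsdef_changed "$tmp/installing.txt" "$tmp/rebooted.txt"
rm -rf "$tmp"
```

With the trimmed sample, only currstate and status are reported as changed; currchain and updatestatus are identical in both snapshots.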


>>>> Are your DNS settings correct? Can the compute node resolve the 
master node in the Anaconda shell? Forward and reverse DNS must work.

Yes. If I manually boot the node to disk after the install, it boots into 
the OS. After logging in to the compute node, I can verify that forward 
and reverse DNS are working fine. I also tried specifying the IP address 
of the xCAT management node instead of its name in the 'noderes' table but 
got the same result.


>>>> What's in the PXE file on the master node after the Anaconda run? 
/tftpboot/pxelinux.cfg/adqan001

Here you go:

[root@admwqamgr ~]# cat /tftpboot/pxelinux.cfg/adqan001
#boot
DEFAULT xCAT
LABEL xCAT
LOCALBOOT 0
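
For checking this on several nodes, a small sketch that classifies a pxelinux-style file by its first LOCALBOOT or KERNEL directive. The function name is hypothetical, and it assumes the plain pxelinux syntax shown above; it does not try to understand any other boot loader's file format.

```shell
#!/bin/sh
# Hypothetical helper: report whether a pxelinux-style config tells the
# node to boot from its local disk or to load a kernel over the network.
pxe_boot_action() {
    # $1 = path to the per-node config file
    awk '
        toupper($1) == "LOCALBOOT" { print "localboot"; found = 1; exit }
        toupper($1) == "KERNEL"    { print "netboot";   found = 1; exit }
        END { if (!found) print "unknown" }
    ' "$1"
}

# Demo using the file contents shown above.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#boot
DEFAULT xCAT
LABEL xCAT
LOCALBOOT 0
EOF
pxe_boot_action "$tmp"
rm -f "$tmp"
```

For the contents above it prints "localboot", so this file is already telling the node to boot from disk.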

>>>> At the end of the postscripts run, the 'updatestatus.awk' script needs 
to work - that's what calls back to the master node and updates the status 
of the node.

Does this need to be run manually? My postscripts table is pretty simple:

[root@admwqamgr ~]# tabdump postscripts
#node,postscripts,postbootscripts,comments,disable
"compute","syslog,remoteshell",,,


Regards,
Saurabh



From:   <russa...@comcast.net>
To:     "'xCAT Users Mailing list'" <xcat-user@lists.sourceforge.net>
Date:   26-09-2017 18:11
Subject:        Re: [xcat-user] RHEL-7.3 provisioning error



What's the updatestatus of the compute node after that first successful 
PXE and Anaconda run? Check it with 'lsdef adqan001'. Also what is the 
value of currchain after Anaconda runs and the machine reboots?

Are your DNS settings correct? Can the compute node resolve the master 
node in the Anaconda shell? Forward and reverse DNS must work.

What's in the PXE file on the master node after the Anaconda run? 
/tftpboot/pxelinux.cfg/adqan001
That file is what instructs the machine to boot from disk.

At the end of the postscripts run, the 'updatestatus.awk' script needs to 
work - that's what calls back to the master node and updates the status of 
the node.

From: Saurabh Barve [mailto:barve_saur...@cat.com] 
Sent: Tuesday, September 26, 2017 5:15 AM
To: xcat-user@lists.sourceforge.net
Subject: [xcat-user] RHEL-7.3 provisioning error
 
Hi, 

I'm trying to deploy RHEL-7.3 on my cluster compute nodes but am running 
into problems with PXE after the node is successfully installed. 

Overview

These are the details of my xCAT management node: 

[root@admwqamgr ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.3 (Maipo) 
[root@admwqamgr ~]# uname -r 
3.10.0-514.26.2.el7.x86_64 

[root@admwqamgr ~]# lsxcatd -a 
Version 2.13.4 (git commit 6ee3741498768994e4bb10d2a77c9699bcabde90, built Tue May 16 10:03:13 EDT 2017) 
This is a Management Node 
cfgloc=Pg:dbname=xcatdb;host=192.168.40.4|xcatadm 
dbengine=Pg 
dbname=xcatdb 
dbhost=192.168.40.4 
dbadmin=xcatadm 


My compute nodes have six interfaces: 
--> 4 1GbE interfaces --- eno1 (eth0), eno2 (eth1), eno3 (eth2), eno4 (eth3) 
--> 2 10GbE interfaces --- enp130s0f0 (eth4), enp130s0f0 (eth5) 


Two things about the network: 
(i) I'm deploying the compute nodes over the "eth4" interface 
(ii) There is network connectivity on both eth0 and eth4 - this is beyond 
my control 

The boot order for the node as specified in the BIOS is:
enp130s0f0 (eth4) 
enp130s0f0 (eth5) 
eno1 (eth0) 
eno2 (eth1) 
eno3 (eth2) 
eno4 (eth3) 
HDD 1 
HDD 2 
HDD 3 
HDD 4 

The xCAT management node provides both the DHCP and DNS services for the 
cluster. I have NetworkManager running on the xCAT management node. IPV6 
is disabled on the xCAT management node. 

I also want to use NetworkManager on the compute nodes. 

Problem

I deploy the node using the commands: 

nodeset adqan001 osimage=compute-rhels7.3-x86_64 
rsetboot adqan001 net 
rpower adqan001 on 

The node deploys over eth4 without any problem. However, when it reboots 
after the installation, the node doesn't boot from disk. I see the 
following error messages in the logs on the xCAT management server:


+++++++++++++++++++ 
Sep 25 13:34:28 admwqamgr dhcpd: DHCPDISCOVER from 90:e2:ba:74:90:a4 via enp130s0f0
Sep 25 13:34:28 admwqamgr dhcpd: DHCPOFFER on 192.168.40.7 to 90:e2:ba:74:90:a4 via enp130s0f0
Sep 25 13:34:28 admwqamgr dhcpd: DHCPREQUEST for 192.168.40.7 (192.168.40.4) from 90:e2:ba:74:90:a4 via enp130s0f0
Sep 25 13:34:28 admwqamgr dhcpd: DHCPACK on 192.168.40.7 to 90:e2:ba:74:90:a4 via enp130s0f0
Sep 25 13:34:50 admwqamgr dhcpd: DHCPDISCOVER from 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:34:51 admwqamgr dhcpd: DHCPOFFER on 192.168.40.82 to 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:34:54 admwqamgr dhcpd: DHCPREQUEST for 192.168.40.82 (192.168.40.4) from 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:34:54 admwqamgr dhcpd: DHCPACK on 192.168.40.82 to 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:34:54 admwqamgr in.tftpd[7342]: RRQ from 192.168.40.82 filename xcat/xnba.kpxe
Sep 25 13:34:54 admwqamgr in.tftpd[7342]: tftp: client does not accept options
Sep 25 13:34:54 admwqamgr in.tftpd[7343]: RRQ from 192.168.40.82 filename xcat/xnba.kpxe
Sep 25 13:34:54 admwqamgr in.tftpd[7343]: Client 192.168.40.82 finished xcat/xnba.kpxe
Sep 25 13:34:54 admwqamgr dhcpd: DHCPDISCOVER from 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:34:55 admwqamgr dhcpd: DHCPOFFER on 192.168.40.80 to 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:34:55 admwqamgr dhcpd: DHCPREQUEST for 192.168.40.80 (192.168.40.4) from 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:34:55 admwqamgr dhcpd: DHCPACK on 192.168.40.80 to 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:35:31 admwqamgr dhcpd: DHCPDISCOVER from 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:35:31 admwqamgr dhcpd: DHCPOFFER on 192.168.40.82 to 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:35:31 admwqamgr dhcpd: DHCPREQUEST for 192.168.40.82 (192.168.40.4) from 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:35:31 admwqamgr dhcpd: DHCPACK on 192.168.40.82 to 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 03:35:32 192.168.40.82 (none) dhclient[1054]: XMT: Solicit on eth0, interval 4170ms.
Sep 25 13:35:33 192.168.40.82 (none) ntpd[1633]: 0.0.0.0 c61c 0c clock_step +36000.081070 s
Sep 25 13:35:33 192.168.40.82 (none) ntpd[1633]: 0.0.0.0 c614 04 freq_mode
Sep 25 13:35:34 192.168.40.82 (none) ntpd[1633]: 0.0.0.0 c618 08 no_sys_peer
Sep 25 13:35:36 192.168.40.82 (none) dhclient[1922]: DHCPDISCOVER on eth4 to 255.255.255.255 port 67 interval 7 (xid=0x3fd26f33)
Sep 25 13:35:36 192.168.40.82 (none) dhclient[1054]: XMT: Solicit on eth0, interval 8110ms.
Sep 25 13:35:39 192.168.40.82 (none) dhclient[2017]: Bound to *:546
Sep 25 13:35:39 192.168.40.82 (none) dhclient[2017]: XMT: Solicit on eth4, interval 1010ms.
Sep 25 13:35:40 192.168.40.82 (none) dhclient[2017]: XMT: Solicit on eth4, interval 1970ms.
Sep 25 13:35:41 192.168.40.82 (none) ntpd[1633]: Listen normally on 6 eth4 fe80::92e2:baff:fe74:90a4 UDP 123
+++++++++++++++++++ 

192.168.40.7 is the address that the DHCP server is supposed to hand out 
to the eth4 interface of the node, which has the MAC address 
90:e2:ba:74:90:a4. But even after it gets that IP, the node doesn't boot 
to disk. The eth0 interface, with the MAC address 00:1e:67:8e:df:39, then 
tries and succeeds in getting an IP address from the DHCP server. The node 
then ends up booting into the genesis shell. 
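
To show which MAC addresses the DHCP server actually acknowledged, here is a rough sketch that counts the DHCPACK lines per MAC and IP and marks each MAC as registered or not, based on 'tabdump mac' output. The function name is hypothetical. It assumes one log entry per line (unwrapped) and one MAC per row in the 'mac' table, so rows that pack several addresses into one field are not handled. The sample lines are the four DHCPACK entries from the log above.

```shell
#!/bin/sh
# Hypothetical helper: summarise dhcpd DHCPACK lines per (MAC, IP) pair and
# flag MACs that do not appear in the xCAT `mac` table.
dhcp_ack_summary() {
    # $1 = file holding `tabdump mac` output (must not be empty);
    # dhcpd syslog lines are read from stdin.
    awk '
        # First input: the mac table. Remember each MAC, lowercased.
        NR == FNR {
            n = split($0, col, ",")
            if (n >= 3 && $0 !~ /^#/) {
                m = tolower(col[3]); gsub(/"/, "", m)
                known[m] = 1
            }
            next
        }
        # Second input: count DHCPACKs per (MAC, IP) pair.
        / dhcpd: DHCPACK on / {
            for (i = 1; i < NF; i++) {
                if ($i == "on") ip  = $(i + 1)
                if ($i == "to") mac = tolower($(i + 1))
            }
            acks[mac " " ip]++
        }
        END {
            for (k in acks) {
                split(k, part, " ")
                print k, acks[k], ((part[1] in known) ? "registered" : "unregistered")
            }
        }
    ' "$1" - | LC_ALL=C sort
}

# Demo with the mac table and the DHCPACK lines from this message.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#node,interface,mac,comments,disable
"adqan001","eth4","90:E2:BA:74:90:A4",,
EOF
dhcp_ack_summary "$tmp" <<'EOF'
Sep 25 13:34:28 admwqamgr dhcpd: DHCPACK on 192.168.40.7 to 90:e2:ba:74:90:a4 via enp130s0f0
Sep 25 13:34:54 admwqamgr dhcpd: DHCPACK on 192.168.40.82 to 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:34:55 admwqamgr dhcpd: DHCPACK on 192.168.40.80 to 00:1e:67:8e:df:39 via enp130s0f0
Sep 25 13:35:31 admwqamgr dhcpd: DHCPACK on 192.168.40.82 to 00:1e:67:8e:df:39 via enp130s0f0
EOF
rm -f "$tmp"
```

Each output line has the form "MAC IP count status". For the sample, the eth4 MAC shows one lease on 192.168.40.7 and is registered, while the eth0 MAC shows leases on 192.168.40.80 and 192.168.40.82 and is unregistered.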

Additional xCAT configuration information


[root@admwqamgr ~]# tabdump noderes 
#node,servicenode,netboot,tftpserver,tftpdir,nfsserver,monserver,nfsdir,installnic,primarynic,discoverynics,cmdinterface,xcatmaster,current_osimage,next_osimage,nimserver,routenames,nameservers,proxydhcp,syslog,comments,disable
"compute",,"xnba","admwqamgr","/tftpboot","admwqamgr",,"/install","eth4","eth4","eth4",,,,,,,,,,,

[root@admwqamgr ~]# tabdump mac 
#node,interface,mac,comments,disable 
"adqan001","eth4","90:E2:BA:74:90:A4",, 


[root@admwqamgr ~]# tabdump bootparams 
#node,kernel,initrd,kcmdline,addkcmdline,dhcpstatements,adddhcpstatements,comments,disable
"compute",,,,"edd=off ipv6.disable=1",,,, 

I'm enabling and disabling the following services in my compute node 
template:

# System services 
services --enabled="chronyd,NetworkManager,postfix,nfs,nfs-server" --disabled="firewalld" 

What I've tried so far

(1) As you can see in the 'bootparams' table above, I've disabled IPv6 for 
the newly deployed node. 
(2) I've also edited /opt/xcat/lib/perl/xCAT/Template.pm and changed 
line 1066 to add "--noipv6" to the default Kickstart deployment 
parameter: 
$line .= "dhcp --device=$suffix --noipv6"; 
(3) I've tried specifying both eth4 and enp130s0f0 in both the 'mac' and 
'noderes' tables, with the same result. 
(4) I've used both 'xnba' and 'pxe' as provisioning methods, with the same 
result. 
(5) From the compute node installation template provided by xCAT, I've 
removed the line marked below, because the script it includes enables all 
network interfaces and disables NetworkManager: 
echo "Running Kickstart Post-Installation script..." 
#INCLUDE:#ENV:XCATROOT#/share/xcat/install/scripts/post.xcat# 
#INCLUDE:#ENV:XCATROOT#/share/xcat/install/scripts/post.rhels7#   <<<<------- Deleted this line 


I can get the node to boot into PXE the first time and install without 
problems, but on the subsequent boot it doesn't boot to disk like it 
should. This configuration had worked very well for me on RHEL-6.8, minus 
NetworkManager. 

Regards, 
Saurabh
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
 

