Hello,
I'm building a brand new HPC cluster provisioned with
xCAT-server-2.14.6 on CentOS 7.6 x86_64.
A few "infrastructure" nodes are stateful, compute will be stateless.
The stateless nodes will be physical nodes found by switch-based discovery.
I have done exactly this on a previous cluster (older CentOS and xCAT
versions), but with a simpler setup. Here it kind of works, but some log
entries confuse me:
I configured only one compute node. Since I was not in front of the console
and remotely power-cycled a 4-server chassis, some errors may be
normal: they may come from the unconfigured hosts that PXE boot as well.
My setup
- site :
#key,value,comments,disable
"blademaxp","64",,
"domain","maestro.pasteur.fr",,
"fsptimeout","0",,
"installdir","/install",,
"ipmimaxp","64",,
"ipmiretries","3",,
"ipmitimeout","2",,
"consoleondemand","no",,
"master",",maestro-xcat.maestro.pasteur.fr",,
"nameservers","192.168.149.101,192.168.149.102",,
"maxssh","8",,
"ppcmaxp","64",,
"ppcretry","3",,
"ppctimeout","0",,
"powerinterval","0",,
"syspowerinterval","0",,
"sharedtftp","1",,
"SNsyncfiledir","/var/xcat/syncfiles",,
"nodesyncfiledir","/var/xcat/node/syncfiles",,
"tftpdir","/tftpboot",,
"xcatdport","3001",,
"xcatiport","3002",,
"xcatconfdir","/etc/xcat",,
"timezone","Europe/Paris",,
"useNmapfromMN","no",,
"enableASMI","no",,
"db2installloc","/mntdb2",,
"databaseloc","/var/lib",,
"sshbetweennodes","ALLGROUPS",,
"dnshandler","ddns",,
"vsftp","n",,
"cleanupxcatpost","no",,
"dhcplease","43200",,
"auditnosyslog","0",,
"auditskipcmds","ALL",,
"dnsinterfaces","eth0",,
"dhcpinterfaces","eth0",,
"externaldns","1",,
- no service node
- DNS is on separate hosts (provisioned with stateful images using the
same xCAT)
makedns works for forward and reverse zone
- a node I want to be found by switch-based discovery:
Object name: maestro-300
addkcmdline=ipv6.disable=1 biosdevname=0 net.ifnames=0
rd.driver.blacklist=nouveau nouveau.modeset=0
bmc=10.7.97.48
bmcport=0
chain=osimage=netboot-cpu-centos7.6
groups=maestro_compute,maestro_ipmi,maestro,standard,a12
ip=192.168.153.48
mgt=ipmi
netboot=xnba
nfsserver=maestro-xcat
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
switch=a12c2.dc1.pasteur.fr
switchport=37
tftpserver=maestro-xcat
I removed bmcsetup from chain to keep the situation simpler.
- switches table
"sw","2c",,"<XXXX>",,,,,,,,,
Note: an snmpwalk works fine against the switch, although the MIB
returns a12c2.pasteur.fr instead of a12c2.DC1.pasteur.fr (but the same
is true for the older cluster, where it works just fine).
- switch is created as a node as seen in switch table
"a12c2.dc1.pasteur.fr","sw,all",,,,,,,,,,,
- noderes looks fine to me
"maestro",,"xnba","maestro-xcat",,"maestro-xcat",,,,,,,,,,,,,,,,
- chain also
"maestro_compute",,,"osimage=netboot-cpu-centos7.6",,,
- networks also
When booting, the node does get an IP from the dynamic range:
2019-07-12T19:31:29.349206+02:00 maestro-xcat dhcpd: DHCPDISCOVER from
ac:1f:6b:8b:65:87 via eth0
2019-07-12T19:31:30.151476+02:00 maestro-xcat dhcpd: DHCPDISCOVER from
ac:1f:6b:8b:65:8b via eth0
2019-07-12T19:31:30.349611+02:00 maestro-xcat dhcpd: DHCPOFFER on
192.168.144.6 to ac:1f:6b:8b:65:87 via eth0
2019-07-12T19:31:30.610112+02:00 maestro-xcat dhcpd: DHCPDISCOVER from
ac:1f:6b:8b:65:83 via eth0
2019-07-12T19:31:31.152223+02:00 maestro-xcat dhcpd: DHCPOFFER on
192.168.144.5 to ac:1f:6b:8b:65:8b via eth0
2019-07-12T19:31:31.391140+02:00 maestro-xcat dhcpd: DHCPREQUEST for
192.168.144.6 (192.168.148.10) from ac:1f:6b:8b:65:87 via eth0
2019-07-12T19:31:31.391172+02:00 maestro-xcat dhcpd: DHCPACK on
192.168.144.6 to ac:1f:6b:8b:65:87 via eth0
but afterwards some things in the logs seem wrong, and I could not
interpret them:
1) TFTP Aborted
2019-07-12T19:31:31.395078+02:00 maestro-xcat in.tftpd[31860]: RRQ from
192.168.144.6 filename xcat/xnba.kpxe
2019-07-12T19:31:31.395188+02:00 maestro-xcat in.tftpd[31860]: Error
code 0: TFTP Aborted
2019-07-12T19:31:31.396765+02:00 maestro-xcat in.tftpd[31861]: RRQ from
192.168.144.6 filename xcat/xnba.kpxe
2019-07-12T19:31:31.400618+02:00 maestro-xcat in.tftpd[31861]: Client
192.168.144.6 finish
2) getcredentials
Jul 12 19:33:07 maestro-xcat xcat[31945]: INFO xCAT: Allowing
getcredentials x509cert
Jul 12 19:33:07 maestro-xcat xcat[31946]: ERR Received getcredentials
from , which couldn't be correlated to a node (domain mismatch?)
3) switch-based discovery seems to work for my configured node:
Jul 12 19:34:34 maestro-xcat xcat[31472]: INFO
xcat.discovery.aaadiscovery: (ac:1f:6b:8b:65:87) Got a discovery
request, attempting to discover the node...
Jul 12 19:34:34 maestro-xcat xcat[31472]: INFO xcat.discovery.blade:
(ac:1f:6b:8b:65:87) Warning: Could not find any nodes using blade-based
discovery
Jul 12 19:34:34 maestro-xcat xcat[31472]: INFO xcat.discovery.switch:
(ac:1f:6b:8b:65:87) Found node: maestro-300
Jul 12 19:34:35 maestro-xcat xcat[31472]: INFO
xcat.discovery.nodediscover: remove gocons session for
Jul 12 19:34:35 maestro-xcat xcat[31472]: INFO
xcat.discovery.nodediscover: maestro-300 has been discovered
Jul 12 19:34:35 maestro-xcat xcat[31472]: INFO
xcat.discovery.zzzdiscovery: (ac:1f:6b:8b:65:87) Successfully discovered
the node using switch discovery method.
4) malformed getpostscript
I see a lot of these:
Jul 12 19:42:12 maestro-xcat xcat[33151]: INFO xCAT: Allowing getpostscript
Jul 12 19:42:12 maestro-xcat xcat[33152]: ERR Received malformed
getpostscript requesting, ignore it.
but I only configured postscripts for stateful nodes (maestro-300, my only
stateless node, is not in the postscripts table):
#node,postscripts,postbootscripts,comments,disable
"xcatdefaults","syslog,remoteshell,syncfiles","otherpkgs",,
"service","servicenode",,,
"maestro-sched","confignetwork -s",,,
"maestro-submit","confignetwork -s",,,
"maestro-bind0","confignetwork -s",,,
"maestro-bind1","confignetwork -s",,,
"maestro-monitor","confignetwork -s",,,
What do you think about those errors?
For some of them, it is not easy to tell whether they concern my configured
node or the other servers of the chassis, which PXE boot as well.
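
One way to tell the hosts apart is to correlate the log entries by MAC and IP. Below is a minimal Python sketch, assuming the dhcpd, in.tftpd and xcat lines look like the excerpts above. The name correlate and the SAMPLE text are illustrative only and not part of xCAT. It ignores the order of events, so if an IP is leased to several MACs, the last DHCPACK wins.

```python
import re

MAC = r"((?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
IP = r"(\d{1,3}(?:\.\d{1,3}){3})"

# \s+ also matches newlines, so entries wrapped over two lines still match.
DHCPACK = re.compile(r"DHCPACK\s+on\s+" + IP + r"\s+to\s+" + MAC, re.I)
RRQ = re.compile(r"RRQ\s+from\s+" + IP + r"\s+filename\s+(\S+)")
FOUND = re.compile(r"\(" + MAC + r"\)\s+Found\s+node:\s+(\S+)", re.I)


def correlate(log_text):
    """Return {mac: {"node": name or None, "ips": [...], "tftp": [...]}}."""
    hosts = {}
    ip_to_mac = {}

    def entry(mac):
        # One record per MAC address, created on first sight.
        return hosts.setdefault(
            mac.lower(), {"node": None, "ips": [], "tftp": []}
        )

    # Leases: remember which MAC was acknowledged for which IP.
    for ip, mac in DHCPACK.findall(log_text):
        ip_to_mac[ip] = mac.lower()
        if ip not in entry(mac)["ips"]:
            entry(mac)["ips"].append(ip)

    # Discovery: attach the node name xCAT matched to the MAC.
    for mac, node in FOUND.findall(log_text):
        entry(mac)["node"] = node

    # TFTP: attribute each read request to the MAC holding that IP.
    for ip, filename in RRQ.findall(log_text):
        mac = ip_to_mac.get(ip)
        if mac is not None:
            hosts[mac]["tftp"].append(filename)

    return hosts


# Lines taken from the excerpts in this message, wrapped as they appear here.
SAMPLE = """\
2019-07-12T19:31:31.391172+02:00 maestro-xcat dhcpd: DHCPACK on
192.168.144.6 to ac:1f:6b:8b:65:87 via eth0
2019-07-12T19:31:31.396765+02:00 maestro-xcat in.tftpd[31861]: RRQ from
192.168.144.6 filename xcat/xnba.kpxe
Jul 12 19:34:34 maestro-xcat xcat[31472]: INFO xcat.discovery.switch:
(ac:1f:6b:8b:65:87) Found node: maestro-300
"""

if __name__ == "__main__":
    for mac, info in correlate(SAMPLE).items():
        print(mac, info)
```

A MAC whose "node" stays None after a full boot cycle would then be one of the unconfigured servers of the chassis. The getcredentials and getpostscript errors carry no MAC or IP in the excerpts above, so this sketch cannot attribute them.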
One last thing: MTMS discovery seems to be performed even when switch-based
discovery is used. Am I right?
Thanks for your help.
--
Thomas HUMMEL