I was able to at least stop /etc/resolv.conf from being overwritten at every reboot with the following file:

# cat /etc/NetworkManager/conf.d/90-dns-none.conf
[main]
dns=none

I added this to the synclists and we are good regarding the /etc/resolv.conf issue. The conclusion is that NetworkManager was doing something wrong with /etc/resolv.conf.

Although that was fixed with a hack, there are consequences: the hostname of the machine is set as localhost.localdomain, and I don't know how to fix it. Is there any option in the node table to set the default hostname, so confignetwork can do its job?

# lsdef login
Object name: login
    arch=x86_64
    bmc=172.25.255.253
    bmcpassword=calvin
    bmcusername=root
    cons=ipmi
    consoleenabled=1
    currchain=boot
    currstate=install ol8.4.0-x86_64-compute
    groups=login,all
    ip=172.26.255.253
    mac=2c:ea:7f:92:aa:d9
    mgt=ipmi
    netboot=xnba
    nicdevices.bond0=ens1f0np0|ens1f1np1
    nicdevices.bond0.1010=bond0
    nichostnamesuffixes.bond0.1010=-ceph
    nicips.ib0=172.27.255.253
    nicips.eno1=XXX.XXX.XXX.XXX
    nicips.bond0=172.26.255.253
    nicips.bond0.1010=10.0.255.253
    nicnetworks.ib0=application
    nicnetworks.eno1=site
    nicnetworks.bond0=management
    nicnetworks.bond0.1010=ceph
    nictypes.ens1f1np1=ethernet
    nictypes.bond0=bond
    nictypes.eno1=ethernet
    nictypes.ib0=Infiniband
    nictypes.bond0.1010=vlan
    nictypes.ens1f0np0=ethernet
    os=ol8.4.0
    postbootscripts=otherpkgs,versatushpc/openpbs-login,versatushpc/fix-ohpc-login
    postscripts=syslog,remoteshell,syncfiles,confignetwork,versatushpc/postinstall-login
    profile=compute
    provmethod=ol8.4.0-x86_64-install-login
    serialport=0
    serialspeed=115200
    status=powering-on
    statustime=06-15-2021 16:29:52
    updatestatus=failed
    updatestatustime=06-15-2021 16:27:27

Thanks,
Vinícius.

On 14 Jun 2021, at 13:48, Vinícius Ferrão via xCAT-user <xcat-user@lists.sourceforge.net> wrote:

Hi Thomas,

There's a pattern that I've found. When the compute node is simple enough it works, probably because the data for resolv.conf is fetched directly from DHCP, which should be configured correctly.
The issue is around the nodes that have custom network schemes, like bonds and VLANs; something goes wrong during the confignetwork postscript. It's probably due to a configuration mistake that I've made, but I don't know which one.

Regarding your questions:

1) It does not exist

[root@ceph01-ib0 ~]# systemctl status systemd-networkd
Unit systemd-networkd.service could not be found.

2) It's running

[root@ceph01-ib0 ~]# systemctl status NetworkManager
● NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-06-14 13:37:20 -03; 8min ago
     Docs: man:NetworkManager(8)
 Main PID: 2028 (NetworkManager)
    Tasks: 3 (limit: 2464038)
   Memory: 11.4M
   CGroup: /system.slice/NetworkManager.service
           └─2028 /usr/sbin/NetworkManager --no-daemon

3) It does not exist:

[root@ceph01-ib0 ~]# ls -l /etc/resolv.conf
-rw-r--r-- 1 root root 65 Jun 14 13:37 /etc/resolv.conf
[root@ceph01-ib0 ~]# ls -l /run/systemd/resolv/resolv.conf
ls: cannot access '/run/systemd/resolv/resolv.conf': No such file or directory

Cannot find anything related to rc-manager, is this a systemd thing?

4) No it's not.

[root@ceph01-ib0 ~]# ls -l /etc/resolv.conf
-rw-r--r-- 1 root root 65 Jun 14 13:37 /etc/resolv.conf

5) Seems default to me

[root@ceph01-ib0 ~]# grep host /etc/nsswitch.conf
# Valid databases are: aliases, ethers, group, gshadow, hosts,
# myhostname Use systemd host names
hosts: files dns myhostname

That's it. It's probably something messy with the confignetwork script, but I'm not sure what.

Thanks,

On 14 Jun 2021, at 07:57, Thomas HUMMEL <thomas.hum...@pasteur.fr> wrote:

On 14/06/2021 07:41, Vinícius Ferrão via xCAT-user wrote:

Hello,

For unknown reasons nodes that I've installed with rinstall (using stateful method) didn't get the nameserver section in resolv.conf, basically leaving the node without any name resolution.
Hello,

assuming it is not an xCAT bug, I would look at

1) if systemd-networkd is enabled
2) if NetworkManager is enabled
3) if 2), whether it handles /etc/resolv.conf, by looking at its conf and
   a) is dns= stated ?
   b) is /etc/resolv.conf a symlink to /run/systemd/resolv/resolv.conf ?
   c) is rc-manager stated ?
4) is /etc/resolv.conf a symlink to ../run/resolvconf/resolv.conf ?
5) the host line of /etc/nsswitch.conf

to figure out who manages /etc/resolv.conf

Hope it helps.

--
Thomas HUMMEL

As specified in the documentation (https://xcat-docs.readthedocs.io/en/stable/advanced/domain_name_resolution/domain_name_resolution.html), it should be generated if nameservers and domain are provided in the site table:

The resolv.conf files for the compute nodes will be created automatically using the domain and nameservers values set in the xCAT network or site definition.

Both are defined, but it still wasn't generated correctly.

[root@headnode ~]# lsdef -t site clustersite | egrep "nameserver|forward|domain"
    domain=cluster.domain.tld
    forwarders=1.1.1.1
    nameservers=172.26.255.254

I even tried adding the nameservers to the network definition, but it was a no go:

[root@headnode ~]# lsdef -t network management
Object name: management
    gateway=<xcatmaster>
    mask=255.255.0.0
    mgtifname=bond0
    mtu=1500
    nameservers=172.26.255.254
    net=172.26.0.0
    tftpserver=<xcatmaster>

Is there anything that I can do to debug this?

Thanks,
Vinícius.

PS: Here's full data from a given node and the networks.
[root@headnode ~]# lsdef ceph01
Object name: ceph01
    arch=x86_64
    bmc=172.25.254.1
    bmcpassword=calvin
    bmcusername=root
    cons=ipmi
    consoleenabled=1
    currchain=boot
    currstate=install ol8.4.0-x86_64-compute
    groups=ceph,all
    ip=172.26.254.1
    mac=bc:97:e1:ea:08:b0
    mgt=ipmi
    netboot=xnba
    nicdevices.bond0.123=bond0
    nicdevices.bond0.1010=bond0
    nicdevices.bond0=ens1f0np0|ens1f1np1
    nichostnamesuffixes.bond0.1010=-ceph
    nichostnamesuffixes.bond0.123=-cephsync
    nicips.ib0=172.27.254.1
    nicips.bond0=172.26.254.1
    nicips.bond0.1010=10.0.10.21
    nicips.bond0.123=192.168.168.21
    nicnetworks.bond0.123=ceph-sync
    nicnetworks.ib0=application
    nicnetworks.bond0.1010=ceph
    nicnetworks.bond0=management
    nictypes.ib0=Infiniband
    nictypes.ens1f0np0=ethernet
    nictypes.bond0.1010=vlan
    nictypes.bond0=bond
    nictypes.ens1f1np1=ethernet
    nictypes.bond0.123=vlan
    os=ol8.4.0
    postbootscripts=otherpkgs,confignics
    postscripts=syslog,remoteshell,syncfiles,confignetwork,versatushpc/postinstall-ceph
    profile=compute
    provmethod=ol8.4.0-x86_64-install-ceph
    serialport=0
    serialspeed=115200
    status=booted
    statustime=06-14-2021 02:37:04
    updatestatus=synced
    updatestatustime=06-14-2021 02:01:55

[root@headnode ~]# lsdef -t network
application  (network)
ceph  (network)
ceph-sync  (network)
libvirt  (network)
management  (network)
service  (network)
site  (network)

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
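
The checks discussed in this thread (whether systemd-networkd and NetworkManager are enabled, whether dns= or rc-manager= is set, whether /etc/resolv.conf is a symlink, and the hosts line of /etc/nsswitch.conf) could be gathered in one read-only script to run on a node. What follows is only a sketch under assumptions: the function names are hypothetical, /etc/NetworkManager is assumed to hold the NetworkManager configuration (as with the 90-dns-none.conf file in this thread), and the symlink patterns simply match the two paths mentioned in the checklist. It reports state and changes nothing.

```shell
#!/bin/sh
# Read-only sketch: report who appears to manage /etc/resolv.conf.
# All function names are hypothetical; nothing here modifies the system.

# Classify a resolv.conf path: symlink (and its target), regular file, or missing.
classify_resolv_conf() {
    path="$1"
    if [ -L "$path" ]; then
        target=$(readlink "$path")
        case "$target" in
            *systemd/resolv*) echo "symlink to systemd-resolved: $target" ;;
            *resolvconf*)     echo "symlink to resolvconf: $target" ;;
            *)                echo "symlink to other target: $target" ;;
        esac
    elif [ -f "$path" ]; then
        echo "regular file"
    else
        echo "missing"
    fi
}

# Print any dns= or rc-manager= lines found under a NetworkManager config dir.
nm_dns_settings() {
    confdir="$1"
    found=$(grep -rhsE '^[[:space:]]*(dns|rc-manager)=' "$confdir" || true)
    if [ -n "$found" ]; then
        echo "$found"
    else
        echo "no dns= or rc-manager= setting found"
    fi
}

# Report whether a systemd unit is enabled; tolerate hosts without systemctl.
service_state() {
    unit="$1"
    if command -v systemctl >/dev/null 2>&1; then
        state=$(systemctl is-enabled "$unit" 2>/dev/null) || true
        echo "$unit: ${state:-not found}"
    else
        echo "$unit: systemctl not available"
    fi
}

service_state systemd-networkd
service_state NetworkManager
nm_dns_settings /etc/NetworkManager
echo "/etc/resolv.conf: $(classify_resolv_conf /etc/resolv.conf)"
grep -s '^hosts:' /etc/nsswitch.conf || true
```

On a node where the 90-dns-none.conf drop-in is in place, the third line of output would be expected to show dns=none, and a plain (non-symlink) /etc/resolv.conf would be reported as a regular file.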