I was able to at least stop /etc/resolv.conf from being overwritten at every
reboot with the following file:

# cat /etc/NetworkManager/conf.d/90-dns-none.conf
[main]
dns=none
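
A minimal sketch of how the drop-in above could be created and checked from a script. The function names and the directory argument are illustrative assumptions (they let the functions be tried against a scratch directory first), and the check is deliberately naive: it only looks at dns= lines in one directory and is not section-aware.

```shell
#!/bin/sh
# Sketch only: write the dns=none drop-in into a conf.d-style directory.
# write_dns_none and dns_setting are hypothetical helper names.
write_dns_none() {
    conf_dir="$1"
    mkdir -p "$conf_dir" || return 1
    printf '[main]\ndns=none\n' > "$conf_dir/90-dns-none.conf"
}

# Print the value of the last dns= line found in the *.conf files of the
# given directory. The shell expands the glob in lexical order, so a file
# named 90-* is read after lower-numbered ones.
dns_setting() {
    conf_dir="$1"
    cat "$conf_dir"/*.conf 2>/dev/null | sed -n 's/^dns=//p' | tail -n 1
}
```

On a real node this would presumably be called as `write_dns_none /etc/NetworkManager/conf.d`, followed by `systemctl restart NetworkManager` so the setting is picked up.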

I added this file to the synclists, and that takes care of the /etc/resolv.conf
issue. The conclusion is that NetworkManager was doing something wrong with
/etc/resolv.conf.

Although that was fixed with a hack, it has a consequence: the hostname of
the machine is now set to localhost.localdomain, and I don't know how to fix
it. Is there any option in the node table to set the default hostname, so that
confignetwork can do its job?

# lsdef login
Object name: login
    arch=x86_64
    bmc=172.25.255.253
    bmcpassword=calvin
    bmcusername=root
    cons=ipmi
    consoleenabled=1
    currchain=boot
    currstate=install ol8.4.0-x86_64-compute
    groups=login,all
    ip=172.26.255.253
    mac=2c:ea:7f:92:aa:d9
    mgt=ipmi
    netboot=xnba
    nicdevices.bond0=ens1f0np0|ens1f1np1
    nicdevices.bond0.1010=bond0
    nichostnamesuffixes.bond0.1010=-ceph
    nicips.ib0=172.27.255.253
    nicips.eno1=XXX.XXX.XXX.XXX
    nicips.bond0=172.26.255.253
    nicips.bond0.1010=10.0.255.253
    nicnetworks.ib0=application
    nicnetworks.eno1=site
    nicnetworks.bond0=management
    nicnetworks.bond0.1010=ceph
    nictypes.ens1f1np1=ethernet
    nictypes.bond0=bond
    nictypes.eno1=ethernet
    nictypes.ib0=Infiniband
    nictypes.bond0.1010=vlan
    nictypes.ens1f0np0=ethernet
    os=ol8.4.0
    postbootscripts=otherpkgs,versatushpc/openpbs-login,versatushpc/fix-ohpc-login
    postscripts=syslog,remoteshell,syncfiles,confignetwork,versatushpc/postinstall-login
    profile=compute
    provmethod=ol8.4.0-x86_64-install-login
    serialport=0
    serialspeed=115200
    status=powering-on
    statustime=06-15-2021 16:29:52
    updatestatus=failed
    updatestatustime=06-15-2021 16:27:27

Thanks,
Vinícius.

On 14 Jun 2021, at 13:48, Vinícius Ferrão via xCAT-user
<xcat-user@lists.sourceforge.net> wrote:

Hi Thomas,

There's a pattern that I've found. When the compute node is simple enough it
works, probably because the data for resolv.conf is fetched directly from DHCP,
which should be configured correctly.

The issue is with the nodes that have custom network schemes, like bonds and
VLANs; something goes wrong during the confignetwork postscript. It's probably
due to a configuration mistake that I've made, but I don't know which one.

Regarding your questions:

1) It does not exist

[root@ceph01-ib0 ~]# systemctl status systemd-networkd
Unit systemd-networkd.service could not be found.

2) It's running

[root@ceph01-ib0 ~]# systemctl status NetworkManager
● NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-06-14 13:37:20 -03; 8min ago
     Docs: man:NetworkManager(8)
 Main PID: 2028 (NetworkManager)
    Tasks: 3 (limit: 2464038)
   Memory: 11.4M
   CGroup: /system.slice/NetworkManager.service
           └─2028 /usr/sbin/NetworkManager --no-daemon

3) It does not exist:

[root@ceph01-ib0 ~]# ls -l /etc/resolv.conf
-rw-r--r-- 1 root root 65 Jun 14 13:37 /etc/resolv.conf
[root@ceph01-ib0 ~]# ls -l /run/systemd/resolv/resolv.conf
ls: cannot access '/run/systemd/resolv/resolv.conf': No such file or directory

Cannot find anything related to rc-manager, is this a systemd thing?

4) No it's not.

[root@ceph01-ib0 ~]# ls -l /etc/resolv.conf
-rw-r--r-- 1 root root 65 Jun 14 13:37 /etc/resolv.conf

5) Seems default to me

[root@ceph01-ib0 ~]# grep host /etc/nsswitch.conf
# Valid databases are: aliases, ethers, group, gshadow, hosts,
# myhostname Use systemd host names
hosts:      files dns myhostname

That's it.

It's probably something messy in the confignetwork script, but I'm not sure what.

Thanks,

On 14 Jun 2021, at 07:57, Thomas HUMMEL
<thomas.hum...@pasteur.fr> wrote:



On 14/06/2021 07:41, Vinícius Ferrão via xCAT-user wrote:
Hello,
For unknown reasons nodes that I've installed with rinstall (using stateful 
method) didn't get the nameserver section in resolv.conf, basically leaving the 
node without any name resolution.

Hello,

assuming it is not an xCAT bug, I would look at

1) if systemd-networkd is enabled
2) if NetworkManager is enabled
3) if 2), if it handles /etc/resolv.conf by looking at its conf and
   a) is dns= stated ?
   b) is /etc/resolv.conf a symlink to /run/systemd/resolv/resolv.conf ?
   c) is rc-manager stated ?
4) is /etc/resolv.conf a symlink to ../run/resolvconf/resolv.conf ?
5) the host line of /etc/nsswitch.conf

to figure out who manages /etc/resolv.conf
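
The file-level checks in the list above can be sketched as two small shell helpers. The function names are illustrative assumptions; the defaults point at the standard paths, and an explicit path can be passed instead.

```shell
#!/bin/sh
# Sketch only: report whether a resolv.conf path is a symlink (and to what),
# a regular file, or missing. resolv_conf_kind is a hypothetical helper name.
resolv_conf_kind() {
    path="${1:-/etc/resolv.conf}"
    if [ -L "$path" ]; then
        printf 'symlink -> %s\n' "$(readlink "$path")"
    elif [ -f "$path" ]; then
        printf 'regular file\n'
    else
        printf 'missing\n'
    fi
}

# Print the active hosts: line of an nsswitch.conf-style file
# (comment lines are skipped because they do not start with "hosts:").
hosts_line() {
    grep '^hosts:' "${1:-/etc/nsswitch.conf}"
}
```

The service checks in 1) and 2) can be done with `systemctl is-enabled systemd-networkd NetworkManager`, and a `grep -rE '^(dns|rc-manager)=' /etc/NetworkManager/` is one way to look for the settings in 3a) and 3c).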

Hope it helps.

--
Thomas HUMMEL


As specified in the documentation
https://xcat-docs.readthedocs.io/en/stable/advanced/domain_name_resolution/domain_name_resolution.html
it should be generated if nameservers and domain are provided in the site
table: "The resolv.conf files for the compute nodes will be created
automatically using the domain and nameservers values set in the xCAT network
or site definition."
Both are defined, but it still wasn't generated correctly.
[root@headnode ~]# lsdef -t site clustersite | egrep "nameserver|forward|domain"
    domain=cluster.domain.tld
    forwarders=1.1.1.1
    nameservers=172.26.255.254
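
For comparison, a sketch of the file one would expect on the node from these two values. This assumes the conventional resolv.conf layout (one search line, one nameserver line per address) and a comma-separated nameservers value; it is not taken from the xCAT code, and the function name is hypothetical. Keywords such as <xcatmaster> are not handled.

```shell
#!/bin/sh
# Sketch only: print a resolv.conf body from a domain and a comma-separated
# list of nameservers. render_resolv_conf is a hypothetical helper name.
render_resolv_conf() {
    domain="$1"
    nameservers="$2"
    printf 'search %s\n' "$domain"
    # Split the nameserver list on commas, then restore the field separator.
    old_ifs=$IFS
    IFS=,
    for ns in $nameservers; do
        printf 'nameserver %s\n' "$ns"
    done
    IFS=$old_ifs
}
```

Running `render_resolv_conf cluster.domain.tld 172.26.255.254` and diffing the result against the node's /etc/resolv.conf shows quickly whether the nameserver line is the only part missing.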
I even tried adding the nameservers to the network definition, but it was a no 
go:
[root@headnode ~]# lsdef -t network management
Object name: management
    gateway=<xcatmaster>
    mask=255.255.0.0
    mgtifname=bond0
    mtu=1500
    nameservers=172.26.255.254
    net=172.26.0.0
    tftpserver=<xcatmaster>
Is there anything that I can do to debug this?
Thanks,
Vinícius.
PS: Here's full data from a given node and the networks.
[root@headnode ~]# lsdef ceph01
Object name: ceph01
    arch=x86_64
    bmc=172.25.254.1
    bmcpassword=calvin
    bmcusername=root
    cons=ipmi
    consoleenabled=1
    currchain=boot
    currstate=install ol8.4.0-x86_64-compute
    groups=ceph,all
    ip=172.26.254.1
    mac=bc:97:e1:ea:08:b0
    mgt=ipmi
    netboot=xnba
    nicdevices.bond0.123=bond0
    nicdevices.bond0.1010=bond0
    nicdevices.bond0=ens1f0np0|ens1f1np1
    nichostnamesuffixes.bond0.1010=-ceph
    nichostnamesuffixes.bond0.123=-cephsync
    nicips.ib0=172.27.254.1
    nicips.bond0=172.26.254.1
    nicips.bond0.1010=10.0.10.21
    nicips.bond0.123=192.168.168.21
    nicnetworks.bond0.123=ceph-sync
    nicnetworks.ib0=application
    nicnetworks.bond0.1010=ceph
    nicnetworks.bond0=management
    nictypes.ib0=Infiniband
    nictypes.ens1f0np0=ethernet
    nictypes.bond0.1010=vlan
    nictypes.bond0=bond
    nictypes.ens1f1np1=ethernet
    nictypes.bond0.123=vlan
    os=ol8.4.0
    postbootscripts=otherpkgs,confignics
    postscripts=syslog,remoteshell,syncfiles,confignetwork,versatushpc/postinstall-ceph
    profile=compute
    provmethod=ol8.4.0-x86_64-install-ceph
    serialport=0
    serialspeed=115200
    status=booted
    statustime=06-14-2021 02:37:04
    updatestatus=synced
    updatestatustime=06-14-2021 02:01:55
[root@headnode ~]# lsdef -t network
application  (network)
ceph  (network)
ceph-sync  (network)
libvirt  (network)
management  (network)
service  (network)
site  (network)
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user

