Re: [xcat-user] /etc/resolv.conf missing nameserver on install nodes

2021-06-18 Thread Vinícius Ferrão via xCAT-user
It won't hurt to sync /etc/resolv.conf. Great idea. Will do it.

On 18 Jun 2021, at 05:26, Jon Diprose <j...@well.ox.ac.uk> wrote:

Sorry, I should have also said that I have /etc/resolv.conf on my syncfiles 
list, so all that setup was to try to get xCAT to give me ifcfg files that 
looked as I wanted, and particularly not to be left with an unexpected dhcp-ing 
interface that caused my resolv.conf to be eaten every time the lease renewed. 
I didn't want to meddle with xCAT-provided code more than necessary and I 
wanted to try to keep as much config as possible in the node definition. I 
started off down the track of maybe if I can get anaconda to use the bond as 
the install nic it will all be OK, and that's how I ended up where I did.
Someone else suggested you use the mkresolvconf postscript to provide the 
resolv.conf, which should also work.
Jon

--
Dr. Jonathan Diprose <j...@well.ox.ac.uk>
 Tel: 01865 287873
Research Computing Manager
Henry Wellcome Building for Genomic Medicine
Roosevelt Drive, Headington, Oxford OX3 7BN


Re: [xcat-user] /etc/resolv.conf missing nameserver on install nodes

2021-06-18 Thread Jon Diprose
Sorry, I should have also said that I have /etc/resolv.conf on my syncfiles 
list, so all that setup was to try to get xCAT to give me ifcfg files that 
looked as I wanted, and particularly not to be left with an unexpected dhcp-ing 
interface that caused my resolv.conf to be eaten every time the lease renewed. 
I didn't want to meddle with xCAT-provided code more than necessary and I 
wanted to try to keep as much config as possible in the node definition. I 
started off down the track of maybe if I can get anaconda to use the bond as 
the install nic it will all be OK, and that's how I ended up where I did.
Someone else suggested you use the mkresolvconf postscript to provide the 
resolv.conf, which should also work.
Jon

--
Dr. Jonathan Diprose <j...@well.ox.ac.uk>
 Tel: 01865 287873
Research Computing Manager
Henry Wellcome Building for Genomic Medicine
Roosevelt Drive, Headington, Oxford OX3 7BN
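
[Editor's sketch] The two approaches mentioned in this message (a syncfiles entry for /etc/resolv.conf, or the mkresolvconf postscript) could look roughly like the following. The file names, the search domain, and the nameserver address are hypothetical and not taken from the thread; the xCAT commands are shown only as comments because they need a real management node.

```shell
#!/bin/sh
# Sketch: build an xCAT synclist that pushes a known-good resolv.conf to nodes.
# All paths and addresses here are hypothetical placeholders.
make_synclist() {
    src="$1"   # resolv.conf kept on the management node
    out="$2"   # synclist file to write
    # Synclist syntax: <source on management node> -> <destination on node>
    printf '%s -> /etc/resolv.conf\n' "$src" > "$out"
}

workdir=$(mktemp -d)
printf 'search cluster.example\nnameserver 192.0.2.53\n' > "$workdir/resolv.conf"
make_synclist "$workdir/resolv.conf" "$workdir/login.synclist"
cat "$workdir/login.synclist"

# On a real management node (not run here):
#   chdef -t osimage -o <osimage> synclists=<path to the synclist>
#   updatenode <node> -F
# Alternative suggested in the thread, using the stock postscript:
#   chdef <node> -p postscripts=mkresolvconf
```

Syncing the file only helps if nothing on the node rewrites it afterwards, which is why the rest of the thread concentrates on stray DHCP interfaces and NetworkManager.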

From: Vinícius Ferrão [fer...@versatushpc.com.br]
Sent: 18 June 2021 06:35
To: Jon Diprose
Cc: xCAT Users Mailing list
Subject: Re: [xcat-user] /etc/resolv.conf missing nameserver on install nodes

Jon thank you man.

I'm looking at the code right now and trying to understand it.


Re: [xcat-user] /etc/resolv.conf missing nameserver on install nodes

2021-06-17 Thread Vinícius Ferrão via xCAT-user
Jon thank you man.

I'm looking at the code right now and trying to understand it.

On 17 Jun 2021, at 13:27, Jon Diprose <j...@well.ox.ac.uk> wrote:

Hi Vinícius,
I've dumped the various parts I think are necessary in the following gist:

https://gist.github.com/JDiprose/f377a97998a971bbe1e591e5afd4e1c6

See particularly the 'installnic' and 'addkcmdline' node settings, and the 
referenced 'confignetwork-jon' postbootscript (small but significant addition 
to the original 'confignetwork') and the locally-written 
rescomp_persistinstallbond postscript. Modifications to the kickstart template 
are also required - see compute.centos74.no-ignoredisk.net-link.tmpl, 
referenced by the osimage.
If I remember correctly, all the important bits for installing on a bond are 
above (except the switch config, for which forcing up the PXE link is 
required). The 'addkcmdline' setting configures the bond for anaconda to use, 
the 'network --device=link' in the template tells anaconda to use it, the 
'rescomp_persistinstallbond' replaces the junk anaconda produces with a good 
approximation of what xCAT ought to do and the 'installnic' setting and 
modification to 'confignetwork' stops the 'confignetwork' script from 
destroying the config when run as a postbootscript to configure the secondary 
nics (it otherwise doesn't accept that a nic called bondX can be the install 
nic).
I haven't tested that the modified 'confignetwork' is capable of configuring 
the install nic as a bond in its own right. I don't expect that to work and I 
don't think xCAT has anywhere to hold the relevant config without a significant 
rewrite of 'confignetwork'.
This is obviously for a stateful install. I couldn't tell you anything about 
stateless or statelite.
Finally, apologies to all those whose work I have reused without reference in 
the above - I wasn't expecting to be sharing it. I am happy to answer questions 
on it, and especially to be told better ways to do it!
Jon

--
Dr. Jonathan Diprose <j...@well.ox.ac.uk>
 Tel: 01865 287873
Research Computing Manager
Henry Wellcome Building for Genomic Medicine
Roosevelt Drive, Headington, Oxford OX3 7BN


Re: [xcat-user] /etc/resolv.conf missing nameserver on install nodes

2021-06-17 Thread Thomas HUMMEL

Hello, sorry I was off 2 days.

On 14/06/2021 18:48, Vinícius Ferrão via xCAT-user wrote:

The issue is around the nodes that have custom network schemes, like 
bonds and VLANs; it's something wrong during the confignetwork 
postscript.


OK. I don't know if it is still the case, but I think confignetwork, for 
instance, was able to adapt to whether or not NetworkManager is in use.




Regarding your questions:


I was just pointing out how many ways resolv.conf can end up being handled 
and what to look for to find out which one applies (you have to read the 
various man pages for the details; each piece is quite simple, but the 
number of combinations makes it complex).




Cannot find anything related to rc-manager, is this a systemd thing?


It's a NetworkManager configuration directive, like 'dns=', which you could 
also look at. For instance, on both my stateless and stateful nodes I had 
to set dns=none (although at the time I hadn't read about rc-manager, 
and there may be a more suitable setting).


By the way: this is not related, but I also had to set hostname-mode 
(to none as well).
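
[Editor's sketch] A drop-in combining the three NetworkManager [main] directives named in this message might look like the following. The file name and the choice of rc-manager=unmanaged are assumptions; the message only says dns=none and hostname-mode=none were used. The script writes into a scratch directory rather than /etc/NetworkManager/conf.d so it is safe to run anywhere.

```shell
#!/bin/sh
# Sketch: write a NetworkManager drop-in into a scratch directory.
# The file name 90-static-resolv.conf is hypothetical.
write_nm_dropin() {
    dir="$1"
    mkdir -p "$dir"
    cat > "$dir/90-static-resolv.conf" <<'EOF'
[main]
# Leave /etc/resolv.conf alone (the setting used in this thread).
dns=none
# Related directive mentioned above; this value is an assumption.
rc-manager=unmanaged
# Do not let NetworkManager manage the transient hostname.
hostname-mode=none
EOF
}

conf_dir=$(mktemp -d)
write_nm_dropin "$conf_dir"
cat "$conf_dir/90-static-resolv.conf"
```

On a node the file would go under /etc/NetworkManager/conf.d/ and NetworkManager would need a restart or reload to pick it up.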


Best

--
TH


___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user


Re: [xcat-user] /etc/resolv.conf missing nameserver on install nodes

2021-06-17 Thread Jon Diprose
Hi Vinícius,
I've dumped the various parts I think are necessary in the following gist:

https://gist.github.com/JDiprose/f377a97998a971bbe1e591e5afd4e1c6

See particularly the 'installnic' and 'addkcmdline' node settings, and the 
referenced 'confignetwork-jon' postbootscript (small but significant addition 
to the original 'confignetwork') and the locally-written 
rescomp_persistinstallbond postscript. Modifications to the kickstart template 
are also required - see compute.centos74.no-ignoredisk.net-link.tmpl, 
referenced by the osimage.
If I remember correctly, all the important bits for installing on a bond are 
above (except the switch config, for which forcing up the PXE link is 
required). The 'addkcmdline' setting configures the bond for anaconda to use, 
the 'network --device=link' in the template tells anaconda to use it, the 
'rescomp_persistinstallbond' replaces the junk anaconda produces with a good 
approximation of what xCAT ought to do and the 'installnic' setting and 
modification to 'confignetwork' stops the 'confignetwork' script from 
destroying the config when run as a postbootscript to configure the secondary 
nics (it otherwise doesn't accept that a nic called bondX can be the install 
nic).
I haven't tested that the modified 'confignetwork' is capable of configuring 
the install nic as a bond in its own right. I don't expect that to work and I 
don't think xCAT has anywhere to hold the relevant config without a significant 
rewrite of 'confignetwork'.
This is obviously for a stateful install. I couldn't tell you anything about 
stateless or statelite.
Finally, apologies to all those whose work I have reused without reference in 
the above - I wasn't expecting to be sharing it. I am happy to answer questions 
on it, and especially to be told better ways to do it!
Jon

--
Dr. Jonathan Diprose <j...@well.ox.ac.uk>
 Tel: 01865 287873
Research Computing Manager
Henry Wellcome Building for Genomic Medicine
Roosevelt Drive, Headington, Oxford OX3 7BN
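
[Editor's sketch] The actual 'addkcmdline' value lives in the gist and is not reproduced in this thread. As an illustration only, dracut-style kernel arguments for bringing up a bond during installation can be assembled as below. The interface names and bonding options are borrowed from the ifcfg files quoted elsewhere in the thread, and using DHCP on the bond is an assumption.

```shell
#!/bin/sh
# Sketch: assemble dracut-style kernel arguments for installing over a bond.
# Format per dracut.cmdline(7): bond=<name>:<slaves>:<options>, ip=<iface>:dhcp
bond_kcmdline() {
    bond="$1"     # bond device name, e.g. bond0
    slaves="$2"   # comma-separated slave interfaces
    opts="$3"     # comma-separated bonding options
    printf 'bond=%s:%s:%s ip=%s:dhcp\n' "$bond" "$slaves" "$opts" "$bond"
}

bond_kcmdline bond0 ens1f0np0,ens1f1np1 mode=802.3ad,miimon=100

# The result could then be stored in the node definition (not run here):
#   chdef <node> addkcmdline="<output of bond_kcmdline>"
```

With the inputs shown, the function prints `bond=bond0:ens1f0np0,ens1f1np1:mode=802.3ad,miimon=100 ip=bond0:dhcp`. The kickstart side ('network --device=link') and the postscripts described above are still needed; this covers only the kernel command line.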

From: Vinícius Ferrão [fer...@versatushpc.com.br]
Sent: 17 June 2021 07:13
To: Jon Diprose
Cc: xCAT Users Mailing list
Subject: Re: [xcat-user] /etc/resolv.conf missing nameserver on install nodes

Hi Jon, thanks for your time.

You're absolutely correct regarding the network configuration. The node boots 
up with PXE from a single network card and then it creates the LACP bond. On 
the switch side that was achieved with LACP fallback. So yes, ip and 
nicips.bond0 are the same. It's the single interface that is later on bonded.

Regarding NetworkManager, I thought xCAT was using it to configure its 
network. So there's a way to disable it?

Finally, the confignetwork script works if I add it to the postbootscripts 
section instead of postscripts. In the latter I end up with the server without 
any network. Console login is unavailable because the root password does not 
work either, so I can't check what happened. Don't know why.

Anyway, here are the files:

Bondies first:
[root@login ~]# cat /etc/sysconfig/network-scripts/ifcfg-xcat-bond-bond0
BONDING_OPTS="mode=802.3ad miimon=100"
TYPE=Bond
BONDING_MASTER=yes
HWADDR=
MTU=1500
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
IPADDR=172.26.255.253
PREFIX=16
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=xcat-bond-bond0
UUID=c7b9cbeb-2966-4e41-88a5-be212ca4b675
DEVICE=bond0
ONBOOT=yes
AUTOCONNECT_PRIORITY=9
AUTOCONNECT_RETRIES=0
AUTOCONNECT_SLAVES=yes

[root@login ~]# cat /etc/sysconfig/network-scripts/ifcfg-xcat-bond-slave-ens1f0np0
MTU=1500
TYPE=Ethernet
NAME=xcat-bond-slave-ens1f0np0
UUID=b9738cbf-908e-4c6f-8dad-29f422f75d11
DEVICE=ens1f0np0
ONBOOT=yes
AUTOCONNECT_PRIORITY=9
AUTOCONNECT_RETRIES=0
MASTER_UUID=c7b9cbeb-2966-4e41-88a5-be212ca4b675
MASTER=bond0
SLAVE=yes

[root@login ~]# cat /etc/sysconfig/network-scripts/ifcfg-xcat-bond-slave-ens1f1np1
MTU=1500
TYPE=Ethernet
NAME=xcat-bond-slave-ens1f1np1
UUID=667dc3c7-f2d9-44ea-9342-5e20fe54d8a8
DEVICE=ens1f1np1
ONBOOT=yes
AUTOCONNECT_PRIORITY=9
AUTOCONNECT_RETRIES=0
MASTER_UUID=c7b9cbeb-2966-4e41-88a5-be212ca4b675
MASTER=bond0
SLAVE=yes

This is not supposed to be here:
[root@login ~]# cat /etc/sysconfig/network-scripts/ifcfg-xcat-ens1f0np0
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
IPADDR=172.26.255.253
PREFIX=16
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=xcat-ens1f0np0
UUID=dd90bcec-1520-403e-a384-7edb2a53756b
DEVICE=ens1f0np0
ONBOOT=no
AUTOCONNECT_PRIORITY=9
MTU=1500
GATEWAY=172.26.255.254
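
[Editor's sketch] The stray file above carries the same IPADDR as the bond. A quick way to spot that kind of overlap is to list addresses that appear in more than one ifcfg file. The sketch below assumes unquoted IPADDR= values, as in the listings here, and runs against a scratch directory with made-up file contents.

```shell
#!/bin/sh
# Sketch: print IPv4 addresses configured in more than one ifcfg-* file.
dup_ipaddrs() {
    dir="$1"
    # Collect every IPADDR value, then keep only the repeated ones.
    grep -h '^IPADDR=' "$dir"/ifcfg-* 2>/dev/null \
        | cut -d= -f2 \
        | sort \
        | uniq -d
}

# Demo fixture (hypothetical, modelled on the files in this message).
demo=$(mktemp -d)
printf 'DEVICE=bond0\nIPADDR=172.26.255.253\n'     > "$demo/ifcfg-xcat-bond-bond0"
printf 'DEVICE=ens1f0np0\nIPADDR=172.26.255.253\n' > "$demo/ifcfg-xcat-ens1f0np0"
printf 'DEVICE=eno1\nIPADDR=192.0.2.10\n'          > "$demo/ifcfg-xcat-eno1"
dup_ipaddrs "$demo"
```

On a node, `dup_ipaddrs /etc/sysconfig/network-scripts` would perform the same check against the real files.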

That's the WAN interface:
[root@login ~]# cat /etc/sysconfig/network-scripts/ifcfg-xcat-eno1
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
IPADDR=XXX.XXX.XXX.XXX
PREFIX=26
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
I

Re: [xcat-user] /etc/resolv.conf missing nameserver on install nodes

2021-06-17 Thread Vinícius Ferrão via xCAT-user
E fallback interface).

[root@login ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens1f1np1
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
NAME=ens1f1np1
UUID=dac4a797-99fd-4d7f-92e4-8c0c257e9128
DEVICE=ens1f1np1
ONBOOT=no

[root@login ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens1f0np0
# Generated by parse-kickstart
TYPE=Ethernet
DEVICE=ens1f0np0
UUID=beba2419-7903-4f09-8e61-16b51b66d835
ONBOOT=yes
BOOTPROTO=dhcp
IPV6INIT=yes
PROXY_METHOD=none
BROWSER_ONLY=no
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
NAME="System ens1f0np0"

So Jon, what's your suggestion here? Would you mind sharing how you fixed this?

Thank you so much.
Vinícius

PS: routes are messed up too:

[root@login ~]# ip r
default via 172.26.255.254 dev eno1 proto static metric 103 <=== this should not be here, this is the WAN interface.
default via 172.26.255.254 dev bond0 proto static metric 300
default via 172.26.255.254 dev bond0.1010 proto static metric 400 <=== this should not be here, this is a VLAN...
10.0.255.0/24 dev bond0.1010 proto kernel scope link src 10.0.255.253 metric 400
XXX.XXX.XXX.192/26 dev eno1 proto kernel scope link src 143.106.42.244 metric 103 <=== this should be the default route.
172.26.0.0/16 dev bond0 proto kernel scope link src 172.26.255.253 metric 300
172.26.255.254 dev eno1 proto static scope link metric 103
172.26.255.254 dev bond0.1010 proto static scope link metric 400
172.27.0.0/16 dev ib0 proto kernel scope link src 172.27.255.253 metric 150
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
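
[Editor's sketch] To make the routing problem in the listing above easy to re-check after each change, the devices carrying a default route can be extracted from `ip route` output. The helper name is hypothetical; the sample input is copied from the listing with the annotations removed.

```shell
#!/bin/sh
# Sketch: print the device of every default route found on stdin.
default_route_devs() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") print $(i + 1)
    }'
}

sample='default via 172.26.255.254 dev eno1 proto static metric 103
default via 172.26.255.254 dev bond0 proto static metric 300
default via 172.26.255.254 dev bond0.1010 proto static metric 400
172.26.0.0/16 dev bond0 proto kernel scope link src 172.26.255.253 metric 300'

printf '%s\n' "$sample" | default_route_devs
```

For the sample this prints eno1, bond0 and bond0.1010, one per line. On a live node the input would come from `ip -4 route show default`; more than one device in the output usually points at a DEFROUTE or GATEWAY setting on an interface that should not have one.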

On 16 Jun 2021, at 05:11, Jon Diprose <j...@well.ox.ac.uk> wrote:

Hi Vinícius,
I don't have masses to contribute on this other than that we generally disable 
NetworkManager (why on earth would I want my server dynamically reconfiguring 
its network?) but I am curious about your bonding set-up. Am I right in 
thinking that you are setting up a bond for the primary nic (the one xCAT talks 
to)? It looks like the values for 'ip' and 'nicips.bond0' are the same. I never 
got that to work with the xCAT-supplied postscripts and had to write my own to 
do it, plus specifying some additional install-time kernel params and the 
relevant switch config. So does your way actually generate the correct ifcfg 
files?
To me, your symptoms are consistent with an interface that is still dhcp-ing 
but receiving an empty dns config, or an ifcfg file with an empty "DNS=" param. 
And even with NetworkManager disabled I routinely add "PEERDNS=no" (and 
"DEFROUTE=no") to the nicextraparams.* setting for all secondary nics, though 
it is the route that usually bites me there.
Could you share `ip -o a` and the ifcfg files?
Jon


--
Dr. Jonathan Diprose <j...@well.ox.ac.uk>
 Tel: 01865 287873
Research Computing Manager
Henry Wellcome Building for Genomic Medicine
Roosevelt Drive, Headington, Oxford OX3 7BN


Re: [xcat-user] /etc/resolv.conf missing nameserver on install nodes

2021-06-16 Thread Jon Diprose
Hi Vinícius,
I don't have masses to contribute on this other than that we generally disable 
NetworkManager (why on earth would I want my server dynamically reconfiguring 
its network?) but I am curious about your bonding set-up. Am I right in 
thinking that you are setting up a bond for the primary nic (the one xCAT talks 
to)? It looks like the values for 'ip' and 'nicips.bond0' are the same. I never 
got that to work with the xCAT-supplied postscripts and had to write my own to 
do it, plus specifying some additional install-time kernel params and the 
relevant switch config. So does your way actually generate the correct ifcfg 
files?
To me, your symptoms are consistent with an interface that is still dhcp-ing 
but receiving an empty dns config, or an ifcfg file with an empty "DNS=" param. 
And even with NetworkManager disabled I routinely add "PEERDNS=no" (and 
"DEFROUTE=no") to the nicextraparams.* setting for all secondary nics, though 
it is the route that usually bites me there.
Could you share `ip -o a` and the ifcfg files?
Jon


--
Dr. Jonathan Diprose <j...@well.ox.ac.uk>
 Tel: 01865 287873
Research Computing Manager
Henry Wellcome Building for Genomic Medicine
Roosevelt Drive, Headington, Oxford OX3 7BN
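
[Editor's sketch] The diagnosis in this message (an interface that is still DHCP-ing and clobbering resolv.conf) can be checked by listing ifcfg files that both request DHCP and start at boot. This is only a sketch: it assumes unquoted BOOTPROTO/ONBOOT values and runs against a scratch directory with made-up contents.

```shell
#!/bin/sh
# Sketch: list ifcfg-* files that use DHCP and are brought up at boot.
dhcp_onboot_ifcfgs() {
    dir="$1"
    for f in "$dir"/ifcfg-*; do
        [ -f "$f" ] || continue
        if grep -q '^BOOTPROTO=dhcp$' "$f" && grep -q '^ONBOOT=yes$' "$f"; then
            basename "$f"
        fi
    done
}

# Demo fixture (hypothetical).
demo=$(mktemp -d)
printf 'DEVICE=ens1f0np0\nONBOOT=yes\nBOOTPROTO=dhcp\n' > "$demo/ifcfg-ens1f0np0"
printf 'DEVICE=bond0\nONBOOT=yes\nBOOTPROTO=none\n'     > "$demo/ifcfg-xcat-bond-bond0"
dhcp_onboot_ifcfgs "$demo"
```

For interfaces that legitimately stay on DHCP, the per-NIC workaround described above would be set on the management node with something like `chdef <node> nicextraparams.<nic>="PEERDNS=no DEFROUTE=no"` (node and NIC names are placeholders).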

From: Vinícius Ferrão via xCAT-user [xcat-user@lists.sourceforge.net]
Sent: 16 June 2021 04:15
To: xCAT Users Mailing list
Cc: Vinícius Ferrão
Subject: Re: [xcat-user] /etc/resolv.conf missing nameserver on install nodes

I was able to at least stop /etc/resolv.conf from being overwritten at every 
reboot with the following file:

# cat /etc/NetworkManager/conf.d/90-dns-none.conf
[main]
dns=none

I added this to the synclists and the /etc/resolv.conf issue is resolved. 
The conclusion is that NetworkManager was doing something wrong with 
/etc/resolv.conf.

Although that was fixed with a hack, it has a consequence: the hostname of 
the machine is now set to localhost.localdomain, and I don't know how to fix 
it. Is there any option in the node table to set the default hostname, so that 
confignetwork can do its job?

# lsdef login
Object name: login
arch=x86_64
bmc=172.25.255.253
bmcpassword=calvin
bmcusername=root
cons=ipmi
consoleenabled=1
currchain=boot
currstate=install ol8.4.0-x86_64-compute
groups=login,all
ip=172.26.255.253
mac=2c:ea:7f:92:aa:d9
mgt=ipmi
netboot=xnba
nicdevices.bond0=ens1f0np0|ens1f1np1
nicdevices.bond0.1010=bond0
nichostnamesuffixes.bond0.1010=-ceph
nicips.ib0=172.27.255.253
nicips.eno1=XXX.XXX.XXX.XXX
nicips.bond0=172.26.255.253
nicips.bond0.1010=10.0.255.253
nicnetworks.ib0=application
nicnetworks.eno1=site
nicnetworks.bond0=management
nicnetworks.bond0.1010=ceph
nictypes.ens1f1np1=ethernet
nictypes.bond0=bond
nictypes.eno1=ethernet
nictypes.ib0=Infiniband
nictypes.bond0.1010=vlan
nictypes.ens1f0np0=ethernet
os=ol8.4.0

postbootscripts=otherpkgs,versatushpc/openpbs-login,versatushpc/fix-ohpc-login

postscripts=syslog,remoteshell,syncfiles,confignetwork,versatushpc/postinstall-login
profile=compute
provmethod=ol8.4.0-x86_64-install-login
serialport=0
serialspeed=115200
status=powering-on
statustime=06-15-2021 16:29:52
updatestatus=failed
updatestatustime=06-15-2021 16:27:27

Thanks,
Vinícius.

On 14 Jun 2021, at 13:48, Vinícius Ferrão via xCAT-user <xcat-user@lists.sourceforge.net> wrote:

Hi Thomas,

I've found a pattern. When the compute node is simple enough it works, 
probably because the data for resolv.conf is fetched directly from DHCP, which 
should be configured correctly.

The issue is around the nodes that have custom network schemes, like bonds and 
VLANs; something goes wrong during the confignetwork postscript, probably due 
to a configuration mistake that I've made, but I don't know which one.

Regarding your questions:

1) It does not exist

[root@ceph01-ib0 ~]# systemctl status systemd-networkd
Unit systemd-networkd.service could not be found.

2) It's running

[root@ceph01-ib0 ~]# systemctl status NetworkManager
● NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-06-14 13:37:20 -03; 8min ago
 Docs: man:NetworkManager(8)
 Main PID: 2028 (NetworkManager)
Tasks: 3 (limit: 2464038)
   Memory: 11.4M
   CGroup: /system.slice/NetworkManager.service
   └─2028 /usr/sbin/NetworkManager --no-daemon

3) It does not exist:

[root@ceph01-ib0 ~]# ls -l /etc/resolv.conf
-rw-r--r-- 1 root root 65 Jun 14 13:37 /etc/resolv.conf
[root@ceph01-ib0 ~]# ls -l /run/systemd/resolv/resolv.conf
ls: cannot access '/run/systemd/resolv/resolv.conf': No such file or directory

Cannot find anything related to rc-manager, is this a systemd thing?

4) No it's not.

[root@c

Re: [xcat-user] /etc/resolv.conf missing nameserver on install nodes

2021-06-15 Thread Vinícius Ferrão via xCAT-user
I was able to at least stop /etc/resolv.conf from being overwritten at every 
reboot with the following file:

# cat /etc/NetworkManager/conf.d/90-dns-none.conf
[main]
dns=none

I added this to the synclists and the /etc/resolv.conf issue is resolved. 
The conclusion is that NetworkManager was doing something wrong with 
/etc/resolv.conf.
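
[Editor's sketch] Since the symptom in this thread is a resolv.conf without any nameserver line, a small check like the one below can confirm whether the fix survives a reboot. The helper name is hypothetical, and the demo runs against a temporary file rather than the real /etc/resolv.conf.

```shell
#!/bin/sh
# Sketch: succeed only if the given file has at least one nameserver entry.
has_nameserver() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "$1"
}

# Demo: a resolv.conf with only a search line, as on the broken nodes.
tmp=$(mktemp)
printf 'search cluster.example\n' > "$tmp"
if has_nameserver "$tmp"; then
    echo "nameserver present"
else
    echo "no nameserver entry in $tmp"
fi
```

On a node this would be called as `has_nameserver /etc/resolv.conf`, for example from a postbootscript or a health check.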

Although that was fixed with a hack, it has a consequence: the hostname of 
the machine is now set to localhost.localdomain, and I don't know how to fix 
it. Is there any option in the node table to set the default hostname, so that 
confignetwork can do its job?

# lsdef login
Object name: login
arch=x86_64
bmc=172.25.255.253
bmcpassword=calvin
bmcusername=root
cons=ipmi
consoleenabled=1
currchain=boot
currstate=install ol8.4.0-x86_64-compute
groups=login,all
ip=172.26.255.253
mac=2c:ea:7f:92:aa:d9
mgt=ipmi
netboot=xnba
nicdevices.bond0=ens1f0np0|ens1f1np1
nicdevices.bond0.1010=bond0
nichostnamesuffixes.bond0.1010=-ceph
nicips.ib0=172.27.255.253
nicips.eno1=XXX.XXX.XXX.XXX
nicips.bond0=172.26.255.253
nicips.bond0.1010=10.0.255.253
nicnetworks.ib0=application
nicnetworks.eno1=site
nicnetworks.bond0=management
nicnetworks.bond0.1010=ceph
nictypes.ens1f1np1=ethernet
nictypes.bond0=bond
nictypes.eno1=ethernet
nictypes.ib0=Infiniband
nictypes.bond0.1010=vlan
nictypes.ens1f0np0=ethernet
os=ol8.4.0
postbootscripts=otherpkgs,versatushpc/openpbs-login,versatushpc/fix-ohpc-login
postscripts=syslog,remoteshell,syncfiles,confignetwork,versatushpc/postinstall-login
profile=compute
provmethod=ol8.4.0-x86_64-install-login
serialport=0
serialspeed=115200
status=powering-on
statustime=06-15-2021 16:29:52
updatestatus=failed
updatestatustime=06-15-2021 16:27:27

Thanks,
Vinícius.


Re: [xcat-user] /etc/resolv.conf missing nameserver on install nodes

2021-06-14 Thread Vinícius Ferrão via xCAT-user
Hi Thomas,

There's a pattern that I've found. When the compute node is simple enough, it 
works; the data for resolv.conf is probably fetched directly from DHCP, which 
should be configured correctly.

The issue affects the nodes that have custom network schemes, like bonds and 
VLANs; something goes wrong during the confignetwork postscript, probably due 
to a configuration mistake that I've made, but I don't know which one.

Regarding your questions:

1) It does not exist

[root@ceph01-ib0 ~]# systemctl status systemd-networkd
Unit systemd-networkd.service could not be found.

2) It's running

[root@ceph01-ib0 ~]# systemctl status NetworkManager
● NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-06-14 13:37:20 -03; 8min ago
     Docs: man:NetworkManager(8)
 Main PID: 2028 (NetworkManager)
    Tasks: 3 (limit: 2464038)
   Memory: 11.4M
   CGroup: /system.slice/NetworkManager.service
           └─2028 /usr/sbin/NetworkManager --no-daemon

3) It does not exist:

[root@ceph01-ib0 ~]# ls -l /etc/resolv.conf
-rw-r--r-- 1 root root 65 Jun 14 13:37 /etc/resolv.conf
[root@ceph01-ib0 ~]# ls -l /run/systemd/resolv/resolv.conf
ls: cannot access '/run/systemd/resolv/resolv.conf': No such file or directory

Cannot find anything related to rc-manager, is this a systemd thing?

4) No it's not.

[root@ceph01-ib0 ~]# ls -l /etc/resolv.conf
-rw-r--r-- 1 root root 65 Jun 14 13:37 /etc/resolv.conf

5) Seems default to me

[root@ceph01-ib0 ~]# grep host /etc/nsswitch.conf
# Valid databases are: aliases, ethers, group, gshadow, hosts,
# myhostname Use systemd host names
hosts:  files dns myhostname

That's it.
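
Note that grep host also matches the comment lines. A small sketch that prints 
only the lookup sources of the active hosts entry; the function name 
nss_hosts_sources is illustrative and not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print the sources listed on the active "hosts:" line
# of an nsswitch.conf file (default: /etc/nsswitch.conf).
nss_hosts_sources() {
    awk '$1 == "hosts:" {
        $1 = ""                 # drop the database name, keep the sources
        sub(/^[ \t]+/, "")      # trim the separator left behind
        print
        exit
    }' "${1:-/etc/nsswitch.conf}"
}
```

With the file shown above it prints the three sources in lookup order, files 
first, then dns, then myhostname.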

It's probably something messy in the confignetwork script, but I'm not sure what.

Thanks,


Re: [xcat-user] /etc/resolv.conf missing nameserver on install nodes

2021-06-14 Thread Thomas HUMMEL



On 14/06/2021 07:41, Vinícius Ferrão via xCAT-user wrote:

Hello,

For unknown reasons nodes that I've installed with rinstall (using 
stateful method) didn't get the nameserver section in resolv.conf, 
basically leaving the node without any name resolution.


Hello,

assuming it is not an xCAT bug, I would look at

1) if systemd-networkd is enabled
2) if NetworkManager is enabled
3) if 2), whether it handles /etc/resolv.conf, by looking at its conf:
   a) is dns= stated ?
   b) is /etc/resolv.conf a symlink to /run/systemd/resolv/resolv.conf ?
   c) is rc-manager stated ?
4) is /etc/resolv.conf a symlink to ../run/resolvconf/resolv.conf ?
5) the hosts line of /etc/nsswitch.conf

to figure out who manages /etc/resolv.conf
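
The symlink checks, plus a look at the file itself, could be scripted roughly 
as follows. This is a sketch only: describe_resolv_conf is a hypothetical 
helper name, and it relies on the "# Generated by NetworkManager" comment that 
NetworkManager puts at the top of the resolv.conf files it writes.

```shell
#!/bin/sh
# Hypothetical helper: report whether a resolv.conf (default /etc/resolv.conf)
# is a symlink, a regular file, or missing, and how many nameservers it has.
describe_resolv_conf() {
    file=${1:-/etc/resolv.conf}
    if [ -L "$file" ]; then
        # A symlink usually means another service owns the real file.
        echo "symlink -> $(readlink "$file")"
    elif [ -f "$file" ]; then
        # grep -c prints the count but exits non-zero when the count is 0.
        count=$(grep -c '^nameserver' "$file" || true)
        if grep -q '^# Generated by NetworkManager' "$file"; then
            echo "regular file, NetworkManager header, $count nameserver line(s)"
        else
            echo "regular file, no NetworkManager header, $count nameserver line(s)"
        fi
    else
        echo "missing"
    fi
}
```

A regular file carrying the NetworkManager header and zero nameserver lines 
would point at NetworkManager rewriting the file without any DNS data.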

Hope it helps.

--
Thomas HUMMEL




[xcat-user] /etc/resolv.conf missing nameserver on install nodes

2021-06-13 Thread Vinícius Ferrão via xCAT-user
Hello,

For unknown reasons, nodes that I've installed with rinstall (using the stateful 
method) didn't get the nameserver section in resolv.conf, basically leaving the 
nodes without any name resolution.

As specified in the documentation 
https://xcat-docs.readthedocs.io/en/stable/advanced/domain_name_resolution/domain_name_resolution.html 
it should be generated if nameservers and domain are provided in the site 
table: The resolv.conf files for the compute nodes will be created 
automatically using the domain and nameservers values set in the xCAT network 
or site definition.

Both are defined, but resolv.conf still wasn't generated correctly.

[root@headnode ~]# lsdef -t site clustersite | egrep "nameserver|forward|domain"
domain=cluster.domain.tld
forwarders=1.1.1.1
nameservers=172.26.255.254
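
As a point of comparison, here is a minimal sketch of the resolv.conf content 
one would expect from these two values. The exact layout that xCAT writes is an 
assumption here, the function name expected_resolv_conf is made up, and special 
values that xCAT may accept in the nameservers attribute are not handled; only 
a plain comma-separated list of addresses is.

```shell
#!/bin/sh
# Hypothetical helper: print the resolv.conf lines expected for a domain and
# a comma-separated list of nameserver addresses.
# Usage: expected_resolv_conf DOMAIN NAMESERVERS
expected_resolv_conf() {
    domain=$1
    nameservers=$2
    printf 'search %s\n' "$domain"
    # Split on commas, then restore the previous field separator.
    old_ifs=$IFS
    IFS=,
    for ns in $nameservers; do
        printf 'nameserver %s\n' "$ns"
    done
    IFS=$old_ifs
}
```

The output of expected_resolv_conf cluster.domain.tld 172.26.255.254 can then 
be compared with the /etc/resolv.conf found on an installed node.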

I even tried adding the nameservers to the network definition, but it was a no 
go:

[root@headnode ~]# lsdef -t network management
Object name: management
gateway=
mask=255.255.0.0
mgtifname=bond0
mtu=1500
nameservers=172.26.255.254
net=172.26.0.0
tftpserver=

Is there anything that I can do to debug this?

Thanks,
Vinícius.

PS: Here's full data from a given node and the networks.

[root@headnode ~]# lsdef ceph01
Object name: ceph01
arch=x86_64
bmc=172.25.254.1
bmcpassword=calvin
bmcusername=root
cons=ipmi
consoleenabled=1
currchain=boot
currstate=install ol8.4.0-x86_64-compute
groups=ceph,all
ip=172.26.254.1
mac=bc:97:e1:ea:08:b0
mgt=ipmi
netboot=xnba
nicdevices.bond0.123=bond0
nicdevices.bond0.1010=bond0
nicdevices.bond0=ens1f0np0|ens1f1np1
nichostnamesuffixes.bond0.1010=-ceph
nichostnamesuffixes.bond0.123=-cephsync
nicips.ib0=172.27.254.1
nicips.bond0=172.26.254.1
nicips.bond0.1010=10.0.10.21
nicips.bond0.123=192.168.168.21
nicnetworks.bond0.123=ceph-sync
nicnetworks.ib0=application
nicnetworks.bond0.1010=ceph
nicnetworks.bond0=management
nictypes.ib0=Infiniband
nictypes.ens1f0np0=ethernet
nictypes.bond0.1010=vlan
nictypes.bond0=bond
nictypes.ens1f1np1=ethernet
nictypes.bond0.123=vlan
os=ol8.4.0
postbootscripts=otherpkgs,confignics
postscripts=syslog,remoteshell,syncfiles,confignetwork,versatushpc/postinstall-ceph
profile=compute
provmethod=ol8.4.0-x86_64-install-ceph
serialport=0
serialspeed=115200
status=booted
statustime=06-14-2021 02:37:04
updatestatus=synced
updatestatustime=06-14-2021 02:01:55

[root@headnode ~]# lsdef -t network
application  (network)
ceph  (network)
ceph-sync  (network)
libvirt  (network)
management  (network)
service  (network)
site  (network)


___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user