Re: [xcat-user] [ext] /etc/resolv.conf missing nameserver on install nodes

2021-06-14 Thread Vinícius Ferrão via xCAT-user
Here's a follow-up:

Something is wrong and I don't know what. confignetwork exited with status 
code 1, so it failed. But xcat.log does say what failed:

[root@ceph01-ib0 ~]# cat /var/log/xcat/xcat.log
Running Kickstart Post-installation script...
Mon Jun 14 14:39:02 -03 2021 [info]: xcat.deployment: Executing post.xcat to 
prepare for firstbooting ...
Mon Jun 14 14:39:41 -03 2021 [info]: xcat.deployment: trying to download 
postscripts from 172.26.255.254...
Mon Jun 14 14:39:42 -03 2021 [info]: xcat.deployment: postscripts downloaded 
successfully
Mon Jun 14 14:39:42 -03 2021 [info]: xcat.deployment: trying to get 
mypostscript from 172.26.255.254...
Mon Jun 14 14:39:42 -03 2021 [info]: xcat.deployment.postscript: Running 
postscript: syslog
grep: /etc/rsyslog.d/remote.conf: No such file or directory
grep: /etc/rsyslog.d/remote.conf: No such file or directory
Mon Jun 14 14:39:42 -03 2021 [info]: xcat.deployment.postscript: postscript 
syslog return with 0
Mon Jun 14 14:39:42 -03 2021 [info]: xcat.deployment.postscript: Running 
postscript: remoteshell

Unable to load host key: /etc/ssh/ssh_host_ed25519_key
Mon Jun 14 14:39:44 -03 2021 [info]: xcat.deployment.postscript: postscript 
remoteshell return with 0
Mon Jun 14 14:39:44 -03 2021 [info]: xcat.deployment.postscript: Running 
postscript: syncfiles
Mon Jun 14 14:39:45 -03 2021 [info]: xcat.deployment.postscript: postscript 
syncfiles return with 0
Mon Jun 14 14:39:45 -03 2021 [info]: xcat.deployment.postscript: Running 
postscript: confignetwork
[I]: back up /etc/sysconfig/network-scripts to 
/etc/sysconfig/network-scripts.xcatbak
[I]: All valid nics and device list:
[I]: ib0
[I]: bond0 ens1f0np0@ens1f1np1
[I]: bond0.1010 bond0
[I]: bond0.123 bond0

configure nic and its device : ib0
[I]: Call configib for IB nics: ib0, ports:
[I]: NMCLI_USED=2 NIC_IBNICS=ib0 NIC_IBAPORTS= configib
[E]:Error: configib failed.

configure nic and its device : bond0 ens1f0np0@ens1f1np1
[I]: create_bond_interface ifname=bond0 slave_ports=ens1f0np0,ens1f1np1 
slave_type=ethernet
[I]: Pickup xcatnet, "management", from NICNETWORKS for interface "bond0".
[I]: ip link set bond0 down
[I]: [bond.down] >> 7: bond0:  mtu 1500 qdisc noop 
state DOWN mode DEFAULT group default qlen 1000
[I]: [bond.down] >> link/ether 96:c4:61:22:a2:d3 brd ff:ff:ff:ff:ff:ff
[I]: [bond.slavesAft] >>
[I]: ip link set ens1f0np0 down
[I]: [slave]: >> 3: ens1f0np0:  mtu 1500 qdisc mq state 
DOWN mode DEFAULT group default qlen 1000
[I]: [slave]: >> link/ether bc:97:e1:ea:08:b0 brd ff:ff:ff:ff:ff:ff
[I]: create_persistent_ifcfg ifname=ens1f0np0 
inattrs=ONBOOT=yes,USERCTL=no,TYPE=Ethernet,SLAVE=yes,MASTER=bond0,BOOTPROTO=none,MTU=1500
['ifcfg-ens1f0np0']
[I]: >> ONBOOT="yes"
[I]: >> USERCTL="no"
[I]: >> TYPE="Ethernet"
[I]: >> SLAVE="yes"
[I]: >> MASTER="bond0"
[I]: >> BOOTPROTO="static"
[I]: >> MTU="1500"
[I]: >> DEVICE="ens1f0np0"
[I]: >> NAME="ens1f0np0"
[I]: ip link set ens1f1np1 down
[I]: [slave]: >> 5: ens1f1np1:  mtu 1500 qdisc mq state 
DOWN mode DEFAULT group default qlen 1000
[I]: [slave]: >> link/ether bc:97:e1:ea:08:b1 brd ff:ff:ff:ff:ff:ff
[I]: create_persistent_ifcfg ifname=ens1f1np1 
inattrs=ONBOOT=yes,USERCTL=no,TYPE=Ethernet,SLAVE=yes,MASTER=bond0,BOOTPROTO=none,MTU=1500
['ifcfg-ens1f1np1']
[I]: >> ONBOOT="yes"
[I]: >> USERCTL="no"
[I]: >> TYPE="Ethernet"
[I]: >> SLAVE="yes"
[I]: >> MASTER="bond0"
[I]: >> BOOTPROTO="static"
[I]: >> MTU="1500"
[I]: >> DEVICE="ens1f1np1"
[I]: >> NAME="ens1f1np1"
[I]: [bond.slavesNew] >> ens1f0np0 ens1f1np1
[I]: ip link set bond0 up
[I]: [ip.link] >> 7: bond0: 
 mtu 1500 qdisc noqueue state UP mode 
DEFAULT group default qlen 1000
[I]: [ip.link] >> link/ether bc:97:e1:ea:08:b0 brd 
ff:ff:ff:ff:ff:ff
[I]: create_persistent_ifcfg ifname=bond0 xcatnet=management 
inattrs=ONBOOT=yes,USERCTL=no,TYPE=Bond,BONDING_MASTER=yes,BONDING_OPTS='mode=802.3ad
 miimon=100',BOOTPROTO=none,DHCLIENTARGS='-timeout 200',MTU=1500
['ifcfg-bond0']
[I]: >> ONBOOT="yes"
[I]: >> USERCTL="no"
[I]: >> TYPE="Bond"
[I]: >> BONDING_MASTER="yes"
[I]: >> BONDING_OPTS="mode=802.3ad miimon=100"
[I]: >> BOOTPROTO="static"
[I]: >> DHCLIENTARGS="-timeout 200"
[I]: >> MTU="1500"
[I]: >> DEVICE="bond0"
[I]: >> IPADDR="172.26.254.1"
[I]: >> NETMASK="255.255.0.0"
[I]: >> NAME="bond0"

configure nic and its device : bond0.1010 bond0
[I]: create_vlan_interface ifname=bond0 vlanid=1010
[I]: Pickup xcatnet, "ceph", from NICNETWORKS for interface "bond0".
[I]: ip link add link bond0 name bond0.1010 type vlan id 1010
RTNETLINK answers: Numerical result out of range
[I]: ip link set bond0.1010 up
[I]: create_persistent_ifcfg ifname=bond0.1010 xcatnet=ceph 
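
The failures in the log above are easy to miss among the [I] lines. A minimal sketch for pulling them out, assuming the log sits at /var/log/xcat/xcat.log as in the excerpt; the function name xcat_log_errors is only illustrative, and the patterns are taken from this excerpt rather than from any xCAT specification:

```shell
#!/bin/sh
# Sketch: list lines in an xCAT deployment log that signal a failure.
# The patterns come from the excerpt above (an assumption, not an xCAT spec):
#   "[E]:"                error lines printed by confignetwork
#   "RTNETLINK answers:"  netlink errors reported by ip(8)
#   "return with N"       a postscript exit status other than 0
xcat_log_errors() {
    log="${1:-/var/log/xcat/xcat.log}"
    # -n prefixes each match with its line number in the log.
    grep -nE '^\[E\]:|^RTNETLINK answers:|return with [1-9]' "$log"
}
```

Calling xcat_log_errors with no argument reads the default log path. In this excerpt it would surface the configib failure and the RTNETLINK error from the bond0.1010 VLAN creation.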

Re: [xcat-user] [ext] /etc/resolv.conf missing nameserver on install nodes

2021-06-14 Thread Vinícius Ferrão via xCAT-user
Hi Georgios,

Thanks for the tip. I've never heard of mkresolvconf in more than 5 years using 
xCAT. But yes, it does generate the /etc/resolv.conf correctly when issuing the 
updatenode command, but not while the machine is booting.

Something may be overwriting the /etc/resolv.conf file.
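
One way to narrow down what rewrote the file is to look at its header and count its nameserver lines. This is a minimal sketch under the assumption that NetworkManager is the writer; NetworkManager marks the files it generates with a "# Generated by NetworkManager" comment, but nothing in this thread confirms it is the culprit. The function name check_resolv_conf is illustrative only.

```shell
#!/bin/sh
# Sketch: report who appears to have written a resolv.conf and how many
# nameserver entries it contains. Only the NetworkManager header is
# recognised; any other writer is reported as "unknown".
check_resolv_conf() {
    f="${1:-/etc/resolv.conf}"
    if grep -q '^# Generated by NetworkManager' "$f"; then
        echo "writer: NetworkManager"
    else
        echo "writer: unknown"
    fi
    # grep -c prints 0 when nothing matches.
    n=$(grep -c '^nameserver[[:space:]]' "$f")
    echo "nameservers: $n"
}
```

Running it once right after boot and again after "updatenode -P mkresolvconf" would show whether the file changes hands between the two states.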

But there's still a tricky behaviour. The compute node is selecting its name 
from the InfiniBand interface, which is incorrect. It should use its name from 
the management interface. That was happening before the mkresolvconf trick 
anyway, and again, I don't have any idea why this is happening.
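
To see which interface the current hostname belongs to, the names assigned to each NIC address can be compared with the output of hostname. A minimal sketch, assuming the names come from a hosts file such as /etc/hosts; the function name names_for_ip is illustrative, and the two addresses in the usage note are the nicips.bond0 and nicips.ib0 values from the node definition:

```shell
#!/bin/sh
# Sketch: print every name a hosts file assigns to one IP address.
# Assumes the usual hosts format: "address name [alias...] [# comment]".
names_for_ip() {
    ip="$1"
    hosts="${2:-/etc/hosts}"
    awk -v ip="$ip" '
        $1 == ip {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break   # stop at a trailing comment
                print $i
            }
        }' "$hosts"
}
```

For example, comparing "hostname" with "names_for_ip 172.26.254.1" (bond0, management) and "names_for_ip 172.27.254.1" (ib0, application) shows which network's name the node picked up.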

Thanks,
Vinícius.


On 14 Jun 2021, at 07:10, Nikolis, Georgios <georgios.niko...@charite.de> wrote:

I think you need to add the mkresolvconf postscript into the node's 
postbootscripts.
Does "updatenode  -P mkresolvconf" repair nameserver configuration on an 
affected node?
Cheers,
Georgios


--

Georgios Nikolis
Charité – Universitätsmedizin Berlin
Geschäftsbereich IT | Scientific Computing

Campus Charité Mitte
Rahel-Hirsch-Weg 5 | Ebene 02 | Raum 073
Charitéplatz 1 | 10117 Berlin

georgios.niko...@charite.de
https://www.charite.de


On Mon, 2021-06-14 at 05:41 +, Vinícius Ferrão via xCAT-user wrote:
Hello,

For unknown reasons nodes that I've installed with rinstall (using stateful 
method) didn't get the nameserver section in resolv.conf, basically leaving the 
node without any name resolution.

As specified in the documentation 
https://xcat-docs.readthedocs.io/en/stable/advanced/domain_name_resolution/domain_name_resolution.html
it should be generated if nameservers and domain are provided in the site 
table: "The resolv.conf files for the compute nodes will be created 
automatically using the domain and nameservers values set in the xCAT network 
or site definition."

Both are defined but it still didn't generate it correctly.

[root@headnode ~]# lsdef -t site clustersite | egrep "nameserver|forward|domain"
domain=cluster.domain.tld
forwarders=1.1.1.1
nameservers=172.26.255.254
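
For reference, the file one would expect on the node from these values can be sketched as below. This is an assumption based on the standard resolv.conf format (a search line followed by nameserver lines) and on nameservers being a comma-separated list; it is not the actual mkresolvconf implementation, and render_resolv_conf is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: render the resolv.conf expected from a domain and a
# comma-separated nameserver list. Not xCAT code; illustration only.
render_resolv_conf() {
    domain="$1"
    nameservers="$2"
    echo "search $domain"
    # Split the comma-separated list into one nameserver line per entry.
    echo "$nameservers" | tr ',' '\n' | while read -r ns; do
        if [ -n "$ns" ]; then
            echo "nameserver $ns"
        fi
    done
}
```

With the values above, "render_resolv_conf cluster.domain.tld 172.26.255.254" prints a search line for cluster.domain.tld and a single nameserver line, which can be diffed against the node's /etc/resolv.conf.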

I even tried adding the nameservers to the network definition, but it was a no 
go:

[root@headnode ~]# lsdef -t network management
Object name: management
gateway=
mask=255.255.0.0
mgtifname=bond0
mtu=1500
nameservers=172.26.255.254
net=172.26.0.0
tftpserver=

Is there anything that I can do to debug this?

Thanks,
Vinícius.

PS: Here's full data from a given node and the networks.

[root@headnode ~]# lsdef ceph01
Object name: ceph01
arch=x86_64
bmc=172.25.254.1
bmcpassword=calvin
bmcusername=root
cons=ipmi
consoleenabled=1
currchain=boot
currstate=install ol8.4.0-x86_64-compute
groups=ceph,all
ip=172.26.254.1
mac=bc:97:e1:ea:08:b0
mgt=ipmi
netboot=xnba
nicdevices.bond0.123=bond0
nicdevices.bond0.1010=bond0
nicdevices.bond0=ens1f0np0|ens1f1np1
nichostnamesuffixes.bond0.1010=-ceph
nichostnamesuffixes.bond0.123=-cephsync
nicips.ib0=172.27.254.1
nicips.bond0=172.26.254.1
nicips.bond0.1010=10.0.10.21
nicips.bond0.123=192.168.168.21
nicnetworks.bond0.123=ceph-sync
nicnetworks.ib0=application
nicnetworks.bond0.1010=ceph
nicnetworks.bond0=management
nictypes.ib0=Infiniband
nictypes.ens1f0np0=ethernet
nictypes.bond0.1010=vlan
nictypes.bond0=bond
nictypes.ens1f1np1=ethernet
nictypes.bond0.123=vlan
os=ol8.4.0
postbootscripts=otherpkgs,confignics
postscripts=syslog,remoteshell,syncfiles,confignetwork,versatushpc/postinstall-ceph
profile=compute
provmethod=ol8.4.0-x86_64-install-ceph
serialport=0
serialspeed=115200
status=booted
statustime=06-14-2021 02:37:04
updatestatus=synced
updatestatustime=06-14-2021 02:01:55

[root@headnode ~]# lsdef -t network
application  (network)
ceph  (network)
ceph-sync  (network)
libvirt  (network)
management  (network)
service  (network)
site  (network)



___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user




Re: [xcat-user] [ext] /etc/resolv.conf missing nameserver on install nodes

2021-06-14 Thread Nikolis, Georgios
I think you need to add the mkresolvconf postscript into the node's
postbootscripts.
Does "updatenode  -P mkresolvconf" repair nameserver configuration on an
affected node?
Cheers,
Georgios
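
The suggestion above can be sketched as the following commands, run on the management node. This is a dry-run sketch, not a tested procedure for this cluster: the helper names run and enable_mkresolvconf are illustrative, the commands are only printed unless RUN=1 is set, and ceph01 in the usage note is the node name from the original post.

```shell
#!/bin/sh
# Sketch: print the xCAT commands by default; set RUN=1 to execute them.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

enable_mkresolvconf() {
    node="$1"
    # Append (-p) mkresolvconf to the node's postbootscripts list.
    run chdef -t node -o "$node" -p postbootscripts=mkresolvconf
    # Run the postscript right away on an already-installed node.
    run updatenode "$node" -P mkresolvconf
    # Display both script lists to confirm the change.
    run lsdef "$node" -i postscripts,postbootscripts
}
```

Calling "enable_mkresolvconf ceph01" prints the three commands for review. Whether the postbootscripts entry alone survives a reinstall is something to verify on the affected node.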

On Mon, 2021-06-14 at 05:41 +, Vinícius Ferrão via xCAT-user wrote:
> Hello,
>
> For unknown reasons nodes that I've installed with rinstall (using
> stateful method) didn't get the nameserver section in resolv.conf,
> basically leaving the node without any name resolution.
>
> As specified in the documentation
> https://xcat-docs.readthedocs.io/en/stable/advanced/domain_name_resolution/domain_name_resolution.html
> it should be generated if nameservers and domain are provided in the
> site table: "The resolv.conf files for the compute nodes will be
> created automatically using the domain and nameservers values set in
> the xCAT network or site definition."
>
> Both are defined but it still didn't generate it correctly.
>
> [root@headnode ~]# lsdef -t site clustersite | egrep "nameserver|forward|domain"
> domain=cluster.domain.tld
> forwarders=1.1.1.1
> nameservers=172.26.255.254
>
> I even tried adding the nameservers to the network definition, but it
> was a no go:
>
> [root@headnode ~]# lsdef -t network management
> Object name: management
> gateway=
> mask=255.255.0.0
> mgtifname=bond0
> mtu=1500
> nameservers=172.26.255.254
> net=172.26.0.0
> tftpserver=
>
> Is there anything that I can do to debug this?
>
> Thanks,
> Vinícius.
>
> PS: Here's full data from a given node and the networks.
>
> [root@headnode ~]# lsdef ceph01
> Object name: ceph01
> arch=x86_64
> bmc=172.25.254.1
> bmcpassword=calvin
> bmcusername=root
> cons=ipmi
> consoleenabled=1
> currchain=boot
> currstate=install ol8.4.0-x86_64-compute
> groups=ceph,all
> ip=172.26.254.1
> mac=bc:97:e1:ea:08:b0
> mgt=ipmi
> netboot=xnba
> nicdevices.bond0.123=bond0
> nicdevices.bond0.1010=bond0
> nicdevices.bond0=ens1f0np0|ens1f1np1
> nichostnamesuffixes.bond0.1010=-ceph
> nichostnamesuffixes.bond0.123=-cephsync
> nicips.ib0=172.27.254.1
> nicips.bond0=172.26.254.1
> nicips.bond0.1010=10.0.10.21
> nicips.bond0.123=192.168.168.21
> nicnetworks.bond0.123=ceph-sync
> nicnetworks.ib0=application
> nicnetworks.bond0.1010=ceph
> nicnetworks.bond0=management
> nictypes.ib0=Infiniband
> nictypes.ens1f0np0=ethernet
> nictypes.bond0.1010=vlan
> nictypes.bond0=bond
> nictypes.ens1f1np1=ethernet
> nictypes.bond0.123=vlan
> os=ol8.4.0
> postbootscripts=otherpkgs,confignics
> postscripts=syslog,remoteshell,syncfiles,confignetwork,versatushpc/postinstall-ceph
> profile=compute
> provmethod=ol8.4.0-x86_64-install-ceph
> serialport=0
> serialspeed=115200
> status=booted
> statustime=06-14-2021 02:37:04
> updatestatus=synced
> updatestatustime=06-14-2021 02:01:55
>
> [root@headnode ~]# lsdef -t network
> application  (network)
> ceph  (network)
> ceph-sync  (network)
> libvirt  (network)
> management  (network)
> service  (network)
> site  (network)
>
-- 
Georgios Nikolis
Charité – Universitätsmedizin Berlin
Geschäftsbereich IT | Scientific Computing

Campus Charité Mitte
Rahel-Hirsch-Weg 5 | Ebene 02 | Raum 073
Charitéplatz 1 | 10117 Berlin

georgios.niko...@charite.de
https://www.charite.de




