Hi, I would like some help with a problem that I can't figure out.
I am using the following xCAT release:

    xCAT-2.16-snap202006161607.x86_64
    xCAT-genesis-scripts-ppc64-2.16-snap202006161607.noarch
    xCAT-genesis-base-x86_64-2.14.5-snap201811190037.noarch
    perl-xCAT-2.16-snap202006161607.noarch
    ipmitool-xcat-1.8.18-3.x86_64
    xCAT-probe-2.16-snap202006161607.noarch
    xCAT-genesis-scripts-x86_64-2.16-snap202006161607.noarch
    xCAT-genesis-base-ppc64-2.14.5-snap201811160710.noarch
    xCAT-server-2.16-snap202006161607.noarch
    xCAT-client-2.16-snap202006161607.noarch
    elilo-xcat-3.14-4.noarch
    syslinux-xcat-3.86-2.noarch
    grub2-xcat-2.02-0.76.el7.1.snap201905160255.noarch
    xCAT-buildkit-2.16-snap202006161607.noarch

Here is my networks table:

    "compute_net_1","10.240.58.0","255.255.254.0","eno1","10.240.58.1",,"<xcatmaster>","10.240.58.4,8.8.8.8,128.200.192.202",,,"10.240.58.221-10.240.58.240","10.240.58.4-10.240.59.220",,,,,"local","1500",,
    "mgmt-net1","10.240.62.0","255.255.254.0","eno1.1","10.240.62.1",,,,,,"10.240.62.244-10.240.62.253","10.240.62.4-10.240.63.220",,,,,,,,
    "ib-net1","10.240.60.0","255.255.254.0","eno1.2","10.240.60.1",,,,,,"10.240.60.221-10.240.60.240","10.240.60.4-10.240.61.220",,,,,,,,
    "hpc2-net1","10.240.64.0","255.255.254.0","eno1.4","10.240.64.1",,,,,,,"10.240.64.2-10.240.64.254",,,,,,,,
    "crsp-net1","10.20.20.0","255.255.0.0","eno1.3","10.20.20.1",,,,,,"10.20.20.151-10.20.20.200","10.20.20.7-10.20.20.150",,,,,,,,

'compute_net_1' is the net I am using to build systems. So, I was building a node.
Here is the info on the node:

    [root@hpc3-xcat-1 ~]# lsdef -t node hpc3-21-12
    Object name: hpc3-21-12
        arch=x86_64
        currchain=boot
        currstate=boot
        groups=centos78
        ip=10.240.59.6
        mac=78:ac:44:37:56:20
        netboot=xnba
        nichostnamesuffixes.ib0=-ib0
        nichostnamesuffixes.ipmi=-ipmi
        nicips.ib0=10.240.61.6
        nicips.ipmi=10.240.63.6
        os=centos7.8
        postbootscripts=otherpkgs,hpc3-postscripts/hpc3postbootscript
        postscripts=syslog,remoteshell,syncfiles,setupntp,hpc3-postscripts/hpc3postscript.1,confignetwork -s
        profile=compute
        provmethod=centos7.8-x86_64-install-compute
        status=booted
        statustime=07-08-2021 00:00:46

After the initial build completed, the node lost network connectivity. I was scratching my head over why that happened, since we have built over 200 nodes in this cluster. When I logged on to the node via the console (as root, since I could not SSH in after the loss of network connectivity), I noticed that the OS shows the NICs as follows:

    [root@hpc3-21-12 log]# ip a
    ...
    2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether 78:ac:44:37:56:20 brd ff:ff:ff:ff:ff:ff
        inet 10.240.59.6/23 brd 10.240.59.255 scope global em1
           valid_lft forever preferred_lft forever
        inet6 fe80::7aac:44ff:fe37:5620/64 scope link
           valid_lft forever preferred_lft forever
    3: em3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 78:ac:44:37:56:40 brd ff:ff:ff:ff:ff:ff
    4: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 78:ac:44:37:56:22 brd ff:ff:ff:ff:ff:ff
    5: em4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 78:ac:44:37:56:41 brd ff:ff:ff:ff:ff:ff
    ...

In the /etc/sysconfig/network-scripts/ directory, I see the following:

    [root@hpc3-21-12 log]# ls -l /etc/sysconfig/network-scripts/ifcfg*
    -rw-r--r-- 1 root root 102 Jul 7 22:41 /etc/sysconfig/network-scripts/ifcfg-eno1
    -rw-r--r--. 1 root root 277 Jul 7 21:47 /etc/sysconfig/network-scripts/ifcfg-eno2
    -rw-r--r--. 1 root root 277 Jul 7 21:47 /etc/sysconfig/network-scripts/ifcfg-eno3
    -rw-r--r--. 1 root root 277 Jul 7 21:47 /etc/sysconfig/network-scripts/ifcfg-eno4
    -rwxr-xr-x. 1 root root 105 Jul 7 21:47 /etc/sysconfig/network-scripts/ifcfg-ib0
    -rw-r--r--. 1 root root 254 Aug 19 2019 /etc/sysconfig/network-scripts/ifcfg-lo

So, ifcfg-eno1 gets populated, but the OS is looking for ifcfg-em1 in the /etc/sysconfig/network-scripts directory. When it does not find it, there is no network.

In /var/log/xcat/xcat.log, I see this:

    [I]: network service is active
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    [I]: configure the install nic eno1.
    [I]: NMCLI_USED=0 configeth -s eno1
    [I]: configeth on hpc3-21-12: os type: redhat
    ls: cannot access /var/lib/dhclient/*eno1*: No such file or directory
    ['/etc/sysconfig/network-scripts/ifcfg-eno1']
    [I]: >> DEVICE=eno1
    [I]: >> IPADDR=10.240.59.6
    [I]: >> NETMASK=255.255.254.0
    [I]: >> BOOTPROTO=none
    [I]: >> ONBOOT=yes
    [I]: >> NAME=xcat-eno1
    [I]: >> HWADDR=78:ac:44:37:56:20
    [I]: >> MTU=1500
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    [I]: back up /etc/sysconfig/network-scripts to /etc/sysconfig/network-scripts.xcatbak
    [E]:Error: nicips,nictypes and nicnetworks should be configured in nics table for ib0.
    [E]:Error: nicips,nictypes and nicnetworks should be configured in nics table for ipmi.
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    configure nic and its device :
    [E]:Error: Check the NIC data in the 'nics' table.

So, as you might have noticed, xCAT configures the NICs as 'enoX' (eno1), while the OS (udev) labels them 'emX' (em1). I am not sure why this is happening. My DHCP range comes from 10.240.58.X and the static IP address is from 10.240.59.X, which is part of the 10.240.58.0/23 network. I am not sure if that is the cause of this or not.
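To rule out the addressing itself, here is a minimal sketch (Python standard library only; the helper name range_in_network is just something I made up for illustration) that checks whether the node's static IP and both ranges of 'compute_net_1' really fall inside 10.240.58.0/23. The values are copied from the networks table and the lsdef output above. It only confirms the subnet arithmetic and says nothing about the eno/em naming.

```python
import ipaddress

# Values copied from the networks table and the lsdef output in this post.
net = ipaddress.ip_network("10.240.58.0/255.255.254.0")
node_ip = ipaddress.ip_address("10.240.59.6")
dynamic_range = ("10.240.58.221", "10.240.58.240")
static_range = ("10.240.58.4", "10.240.59.220")


def range_in_network(bounds, network):
    """Return True if both ends of an inclusive IP range lie inside network.

    A network is one contiguous block of addresses, so checking the two
    end points is enough to cover everything in between.
    """
    return all(ipaddress.ip_address(addr) in network for addr in bounds)


print(net)                                   # 10.240.58.0/23
print(net.broadcast_address)                 # 10.240.59.255
print(node_ip in net)                        # True
print(range_in_network(dynamic_range, net))  # True
print(range_in_network(static_range, net))   # True
```

The broadcast address it prints matches the "brd 10.240.59.255" shown for em1 in the ip a output, so the static address and the DHCP range do sit in the same /23.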
I am at a loss as to why this is happening. Could someone please help shed some light on this for me? Thanks a lot!
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user