Maybe try some verification commands:

    xcatprobe xcatmn -i <provision interface>
    xcatprobe detect_dhcpd -i <provision interface> -m <CN's mac address>   <<<<------ this is important, make sure no other server is serving this MAC address
    nslookup <name of CN>

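The three checks above could be wrapped in a small script so they are run the same way for every node. This is only a sketch: PROV_IF, NODE_MAC and NODE_NAME are placeholder variables, and their defaults are assumptions taken from the logs quoted later in this thread (dhcpd answering "via bond0", and proc01's MAC address). By default it only prints the commands; set RUN=1 on the management node to execute them.

```shell
#!/bin/bash
# Sketch only: PROV_IF, NODE_MAC and NODE_NAME are placeholder variables.
# The defaults are assumptions based on the logs quoted in this thread
# (dhcpd answers "via bond0"; proc01 has MAC 00:25:90:5a:eb:8a).
PROV_IF="${PROV_IF:-bond0}"
NODE_MAC="${NODE_MAC:-00:25:90:5a:eb:8a}"
NODE_NAME="${NODE_NAME:-proc01}"

# Print the three suggested checks, one command per line.
list_checks() {
    echo "xcatprobe xcatmn -i ${PROV_IF}"
    echo "xcatprobe detect_dhcpd -i ${PROV_IF} -m ${NODE_MAC}"
    echo "nslookup ${NODE_NAME}"
}

# Show each command; run it only when RUN=1 is set (on the management node).
list_checks | while read -r cmd; do
    echo "+ ${cmd}"
    if [ "${RUN:-0}" = "1" ]; then
        ${cmd} </dev/null || echo "check failed: ${cmd}" >&2
    fi
done
```

Printing first and running only on request keeps the script harmless on a machine where xCAT is not installed.
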
You can also turn on debug mode; you may then be able to ssh to the CN while it is in anaconda mode to debug further:

    chdef -t site clustersite xcatdebugmode=2

Thanks,
Casandra Qiu
...................................................................
Casandra Hong Qiu
Phone: (845) 433-9291, t/l 293-9291
Office: Building 8, 3-B-04
cxh...@us.ibm.com

From: "Chiu, Peter (STFC,RAL,RALSP)" <peter.c...@stfc.ac.uk>
To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
Date: 08/05/2020 05:12 AM
Subject: [EXTERNAL] [xcat-user] xCAT 2.16 Centos 7 clients PXE Boot Aborted after 3.10.0-1127.18.2 upgrade

Hello all,

Having enjoyed a smooth running of xCAT 2.16 on a small cluster of 4 compute nodes for about 18 months, I have hit a problem with the last Centos 7 system update, from 3.10.0-1127.13.1 to 3.10.0-1127.18.2, which left all the clients unable to boot, with this error:

    CLIENT MAC ADDR: 00 25 90 5A EB BA  GUID: 00000000 0025905AEBBA
    CLIENT IP: 130.246.32.141  MASK: 255.255.252.0  DHCP IP: 130.246.32.140
    GATEWAY IP: 130.246.32.254
    PXE Boot aborted. Booting to next device...
    PXE-M0F: Exiting Intel Boot Agent

There was no such problem with the previous Centos 7 updates. The last successful update, to 3.10.0-1127.13.1 on 24 June, went through okay.

I have attempted a number of checks and recoveries but no joy:

a. confirm master can accept dhcp requests with DHCPNAK records in messages.
b. confirm master can accept tftp downloads from a different system.
c. confirm master can accept http downloads of the kernel and ramdisk files.
d. power-cycled master
e. power-cycled clients
f. manually ran lsdef -t osimage, chdef -t osimage, genimage, packimage, nodeset

Unfortunately the above fault persists on all four compute nodes, which are still down.

I think I have run out of ideas. Before giving up and making a fresh xCAT installation, I wonder if anyone can offer some clues for troubleshooting PXE aborted errors.

Many thanks.

Peter Chiu
STFC RAL Space, UK

==============================================================================

Here are some details on the systems:

    Master node:      main.bnsc.rl.ac.uk    130.246.32.140/22  gateway 130.246.32.254
    Compute node1:    proc01.bnsc.rl.ac.uk  130.246.32.141/22  00:25:90:5a:eb:8a
    Operating system: CentOS Linux release 7.8.2003 (Core)
    xCAT:
        # rpm -qf /opt/xcat/sbin/xcatd
        xCAT-server-2.16-snap202006161607.noarch

Checks:

a. DHCP records in master /var/log/messages, no error. The master server has picked up the dhcp requests, and offered the address. But no further communication afterwards.

    Aug 4 15:00:27 main dhcpd: DHCPDISCOVER from 00:25:90:5a:eb:8a via bond0
    Aug 4 15:00:27 main dhcpd: DHCPOFFER on 130.246.32.141 to 00:25:90:5a:eb:8a via bond0
    Aug 4 15:00:29 main dhcpd: Dynamic and static leases present for 130.246.32.141.
    Aug 4 15:00:29 main dhcpd: Remove host declaration proc01 or remove 130.246.32.141
    Aug 4 15:00:29 main dhcpd: from the dynamic address pool for bond0
    Aug 4 15:00:29 main dhcpd: DHCPREQUEST for 130.246.32.141 (130.246.32.140) from 00:25:90:5a:eb:8a via bond0
    Aug 4 15:00:29 main dhcpd: DHCPACK on 130.246.32.141 to 00:25:90:5a:eb:8a via bond0

b. /var/log/xcat/cluster.log No errors, just a record of a new image produced.

    Aug 4 14:24:54 main xcat[28101]: INFO xCAT: Allowing lsdef -t site -o clustersite -i installdir for root from localhost
    Aug 4 14:24:54 main xcat[28103]: INFO xCAT: Allowing genimage -i eth0 -n dca,ixgbe,igb,e1000e,e1000,tg3 -o centos7.6 -p compute --tempfile /tmp/xcat_genimage.28086 for root from localhost
    Aug 4 14:27:29 main xcat[25483]: INFO xCAT: Allowing packimage centos7.6-x86_64-netboot-compute for root from localhost
    Aug 4 14:27:30 main xcat[25499]: INFO xCAT: Allowing ilitefile centos7.6-x86_64-statelite-compute for root from localhost
    Aug 4 14:30:07 main xcat[26073]: INFO xCAT: Allowing nodeset to compute osimage=centos7.6-x86_64-netboot-compute for root from localhost
    Aug 4 14:34:33 main xcat[26958]: INFO xCAT: Allowing rpower to compute reset for root from localhost
    Aug 4 14:34:33 main xcat[26959]: INFO xcat.updatestatus - proc03: changing status=powering-on
    Aug 4 14:34:33 main xcat[26959]: INFO xcat.updatestatus - proc04: changing status=powering-on
    Aug 4 14:34:33 main xcat[26959]: INFO xcat.updatestatus - proc01: changing status=powering-on
    Aug 4 14:34:33 main xcat[26959]: INFO xcat.updatestatus - proc02: changing status=powering-on

c. Check the dhcp lease file for the files to be downloaded:

    less /var/lib/dhcpd/dhcpd.leases

    host proc01.bnsc.rl.ac.uk {
      deleted;
    }
    host proc04.bnsc.rl.ac.uk {
      deleted;
    }
    host proc01 {
      dynamic;
      hardware ethernet 00:25:90:5a:eb:8a;
      uid 00:25:90:5a:eb:8a;
      fixed-address 130.246.32.141;
      supersede server.ddns-hostname = "proc01";
      supersede host-name = "proc01";
      if option user-class-identifier = "xNBA" and option client-architecture = 00:00 {
        supersede server.always-broadcast = 01;
        supersede server.filename = "http://${next-server}:80/tftpboot/xcat/xnba/nodes/proc01";
      } elsif option user-class-identifier = "xNBA" and option client-architecture = 00:09 {
        supersede server.filename = "http://${next-server}:80/tftpboot/xcat/xnba/nodes/proc01.uefi";
      } elsif option client-architecture = 00:07 {
        supersede server.filename = "xcat/xnba.efi";
      } elsif option client-architecture = 00:00 {
        supersede server.filename = "xcat/xnba.kpxe";
      } else {
        supersede server.filename = "";
      }
    }

Follow through this list to download the files on a separate Centos server.

d. tftp 130.246.32.140

    [root@cds1 xcat]# tftp 130.246.32.140
    tftp> get xcat/xnba.kpxe
    tftp> get xcat/xnba.efi
    tftp> get yaboot
    tftp> get xcat/xnba/nets/130.246.32.0_22
    tftp> get xcat/xnba/nets/130.246.32.0_22.uefi
    tftp> quit
    [root@cds1 xcat]# ls
    130.246.32.0_22  130.246.32.0_22.uefi  elilo.efi  xnba.efi  xnba.kpxe  yaboot
    [root@cds1 xcat]# ls -ls
    total 536
      4 -rw-r--r-- 1 root root    252 Aug 4 09:46 130.246.32.0_22
      4 -rw-r--r-- 1 root root    116 Aug 4 09:46 130.246.32.0_22.uefi
      0 -rw-r--r-- 1 root root      0 Aug 4 09:45 elilo.efi
    140 -rw-r--r-- 1 root root 139169 Aug 4 09:45 xnba.efi
     80 -rw-r--r-- 1 root root  74786 Aug 4 09:45 xnba.kpxe
    308 -rw-r--r-- 1 root root 310187 Aug 4 09:46 yaboot

e. Use wget to download the node start-up file:

    [root@cds1 xcat]# wget http://130.246.32.140:80/tftpboot/xcat/xnba/nodes/proc01
    --2020-08-04 11:57:18--  http://130.246.32.140/tftpboot/xcat/xnba/nodes/proc01
    Connecting to 130.246.32.140:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 528
    Saving to: `proc01'
    100%[======================================>] 528  --.-K/s  in 0s
    2020-08-04 11:57:18 (85.2 MB/s) - `proc01' saved [528/528]

f. This file in turn contains the instructions to download the kernel and ramdisk:

    [root@cds1 xcat]# less proc01
    #!gpxe
    #netboot centos7.6-x86_64-compute
    imgfetch -n kernel http://${next-server}:80/tftpboot/xcat/osimage/centos7.6-x86_64-netboot-compute/kernel
    imgload kernel
    imgargs kernel imgurl=http://130.246.32.140:80//install/netboot/centos7.6/x86_64/compute/rootimg.cpio.gz XCAT=130.246.32.140:3001 NODE=proc01 FC=yes XCATHTTPPORT=80 netdev=eth0 selinux=0 biosdevname=0 net.ifnames=0 BOOTIF=01-${netX/machyp}
    imgfetch http://${next-server}:80/tftpboot/xcat/osimage/centos7.6-x86_64-netboot-compute/initrd-stateless.gz
    imgexec kernel

Both the kernel and ramdisk can also be downloaded using the wget command.

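The manual walk through the node boot script in check f. could be repeated for other nodes with a short helper. This is a sketch under assumptions: boot_urls is a hypothetical function name, not an xCAT command, and it assumes the script has the same layout as the proc01 file shown above (one URL at the end of each imgfetch line, with ${next-server} standing for the master's address).

```shell
#!/bin/bash
# Sketch only: boot_urls is a hypothetical helper, not an xCAT command.
# It reads an xNBA node script on stdin and prints the URL at the end of
# every imgfetch line, with ${next-server} replaced by the address in $1.
boot_urls() {
    local server="$1"
    grep -E '^imgfetch ' | awk '{print $NF}' |
        sed 's|[$]{next-server}|'"$server"'|'
}
```

For example, running "boot_urls 130.246.32.140 < proc01" on the downloaded file should print the kernel and initrd URLs, which can then be passed to wget one at a time.
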
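The dhcpd excerpt in check a. can be summarised per MAC address, which makes it easier to compare nodes and to spot the "Dynamic and static leases present" warning that appears in that log. The following is a sketch only: dhcp_summary and lease_conflicts are hypothetical helper names, and both assume the syslog line format shown in this thread. Whether that warning is related to the aborted PXE boot is not established here.

```shell
#!/bin/bash
# Sketch only: both functions are hypothetical helpers that read syslog
# text on stdin; they are not part of xCAT or ISC dhcpd.

# Count each DHCP message type logged for one MAC address.
dhcp_summary() {
    local mac="$1"
    grep 'dhcpd: DHCP' | grep -i -- "$mac" |
        sed -E 's/.*dhcpd: (DHCP[A-Z]+).*/\1/' |
        sort | uniq -c | awk '{print $2 "=" $1}'
}

# List addresses that dhcpd reports as both static and in the dynamic pool.
lease_conflicts() {
    grep -o 'Dynamic and static leases present for [0-9.]*' |
        awk '{print $NF}' | sed 's/[.]$//' | sort -u
}
```

On the master, something like "dhcp_summary 00:25:90:5a:eb:8a < /var/log/messages" would print one count per message type (DHCPDISCOVER, DHCPOFFER, DHCPREQUEST, DHCPACK) for proc01.
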
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user