Evidently, though, something in his xCAT setup is creating the files in /tftpboot/pxelinux.cfg/ with references to xnba, just like my installation. Where does xCAT grab the configuration for that? Maybe it is because I did an in-place upgrade rather than a completely clean install, but my cluster actually works perfectly with both xnba and genesis installed: it uses xnba first to bootstrap and then requests the Genesis image. xCAT must support that scenario, or else I haven't the slightest idea by what miracle my installation is running with such a configuration. :-)
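In case it helps anyone compare their own headnode, here is a rough sketch in Python (not part of xCAT) of how one could check what the files in /tftpboot/pxelinux.cfg/ point at. It relies on PXELINUX's convention of naming per-address config files after the client's IPv4 address in uppercase hex. The function names and the default /tftpboot root are my own assumptions for illustration.

```python
import re
from pathlib import Path
from typing import List


def decode_pxelinux_name(name: str) -> str:
    """Turn a hex-named pxelinux.cfg file (e.g. '0A6400') into the dotted
    IPv4 prefix it matches. PXELINUX retries with one hex digit removed at a
    time, so names can have an odd length; only whole octets are decoded."""
    if not re.fullmatch(r"[0-9A-F]{1,8}", name):
        raise ValueError(f"not a hex-address config name: {name!r}")
    whole = name[: len(name) // 2 * 2]  # drop a trailing half-octet, if any
    return ".".join(str(int(whole[i:i + 2], 16)) for i in range(0, len(whole), 2))


def referenced_files(cfg_text: str) -> List[str]:
    """Return the KERNEL path and any initrd= paths named in a pxelinux config."""
    files: List[str] = []
    for line in cfg_text.splitlines():
        words = line.split()
        if not words:
            continue
        keyword = words[0].upper()
        if keyword == "KERNEL" and len(words) > 1:
            files.append(words[1])
        elif keyword == "APPEND":
            for word in words[1:]:
                if word.startswith("initrd="):
                    files.extend(word[len("initrd="):].split(","))
    return files


def missing_files(cfg_text: str, tftp_root: str = "/tftpboot") -> List[str]:
    """List referenced files that do not exist under tftp_root (assumed path)."""
    root = Path(tftp_root)
    return [f for f in referenced_files(cfg_text) if not (root / f).exists()]


if __name__ == "__main__":
    # Config text as posted earlier in this thread for pxelinux.cfg/0A6400.
    cfg = (
        "DEFAULT xCAT\n"
        "LABEL xCAT\n"
        "KERNEL xcat/nbk.x86_64\n"
        "APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.100.0.1:3001\n"
    )
    print(decode_pxelinux_name("0A6400"))  # 10.100.0
    print(referenced_files(cfg))
    print(missing_files(cfg))
```

On a headnode that only has the genesis files under /tftpboot/xcat, missing_files() would be expected to report both the nbk and nbfs paths for a config like the one above.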
-Josh

On Tue, Jan 21, 2014 at 2:58 PM, Russell Jones <russell-l...@jonesmail.me> wrote:
> xNBA is a customized gpxe image that xCAT uses.
>
> NBFS is the older maintenance image that was used if you set your
> node to boot to shell, or booted a runimage script. NBFS is deprecated,
> and Genesis replaced NBFS as the maintenance image for these tasks.
>
> In a standard 2.8 install, there should no longer be any nbk/nbfs RPMs
> installed - Genesis replaced them.
>
> perl-xCAT-2.8.3-snap201311122316.noarch
> xCAT-2.8.3-snap201311122318.x86_64
> xCAT-client-2.8.3-snap201311122316.noarch
> xCAT-genesis-base-x86_64-2.8-snap201308090229.noarch
> elilo-xcat-3.14-4.noarch
> xCAT-server-2.8.3-snap201311122316.noarch
> xCAT-genesis-scripts-x86_64-2.8.3-snap201311122318.noarch
> ipmitool-xcat-1.8.11-3.x86_64
> conserver-xcat-8.1.16-10.x86_64
> xCAT-buildkit-2.8.3-snap201311122318.noarch
> syslinux-xcat-3.86-2.noarch
>
>
>
> On 1/21/2014 2:38 PM, Josh Nielsen wrote:
>> Hi Jonathan,
>>
>> Yes, I definitely think that would cause a problem. This is jogging my
>> memory because I think that when the new Genesis boot loader was
>> rolled out in the first version of xCAT that supported it, I faced
>> a similar problem. I had assumed that only Genesis was needed, but xNBA
>> is still used as an intermediate image even if it is no longer the
>> final image. I will check my yum repos as soon as I can - but by some
>> unfortunate coincidence I just discovered that YUM is not working
>> since our RHEL license expired three days ago (unbeknownst to me until
>> 10 minutes ago). Do you have xCAT-genesis-x86_64 and elilo-xCAT? You
>> may even have to pull xNBA images from an older install(?) and then
>> run mknb to build the images.
>>
>> I remember downloading the tarred files with the RPM manually and
>> creating a local repo for xCAT. Whenever I get YUM back I'll give you
>> more specifics if I can.
>>
>> -Josh
>>
>> On Tue, Jan 21, 2014 at 1:54 PM, Jonathan Mills <jonmi...@renci.org> wrote:
>>> Josh,
>>>
>>> I don't doubt that you're on to something. But if this is the case, it
>>> means my systems are missing some files, namely:
>>>
>>> /tftpboot/xcat/nbk.x86_64
>>> /tftpboot/xcat/nbfs.x86_64.gz
>>>
>>> Can you tell me what RPM installed those files on your system? They
>>> don't exist on mine, and even a 'yum provides' doesn't find them.
>>>
>>>
>>> On 01/21/2014 11:51 AM, Josh Nielsen wrote:
>>>> Hi Jonathan,
>>>>
>>>> It is my understanding, from extensive debugging and notes that I have
>>>> taken about the xCAT netbooting process in the past, that xCAT uses a
>>>> two-stage image deployment method. The node first comes up with a more
>>>> "generic" boot image (normally xnba or sometimes yaboot). When that
>>>> image contacts the xCAT headnode (or the node handling DHCP requests),
>>>> the headnode recognizes which image the client is currently running
>>>> and tells the client to load another image based on the subnet and
>>>> image type it is currently using. For example my headnode's
>>>> /etc/dhcpd.conf file has an entry that looks like this:
>>>>
>>>> shared-network eth0 {
>>>>   subnet 10.20.0.0 netmask 255.255.0.0 {
>>>>     max-lease-time 43200;
>>>>     min-lease-time 43200;
>>>>     default-lease-time 43200;
>>>>     next-server 10.20.0.1;
>>>>     option log-servers 10.20.0.1;
>>>>     option ntp-servers 10.20.0.1;
>>>>     option domain-name "xxxxxxxxx";
>>>>     option domain-name-servers 10.20.0.1;
>>>>     if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
>>>>       always-broadcast on;
>>>>       filename = "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16";
>>>>     } else if option user-class-identifier = "xNBA" and option client-architecture = 00:09 { #x86, xCAT Network Boot Agent
>>>>       filename = "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16.uefi";
>>>>     } else if option client-architecture = 00:00 { #x86
>>>>       filename "xcat/xnba.kpxe";
>>>>     } else if option vendor-class-identifier = "Etherboot-5.4" { #x86
>>>>       filename "xcat/xnba.kpxe";
>>>>     } else if option client-architecture = 00:07 { #x86_64 uefi
>>>>       filename "xcat/xnba.efi";
>>>>     } else if option client-architecture = 00:09 { #x86_64 uefi alternative id
>>>>       filename "xcat/xnba.efi";
>>>>     } else if option client-architecture = 00:02 { #ia64
>>>>       filename "elilo.efi";
>>>>     } else if substring(filename,0,1) = null { #otherwise, provide yaboot if the client isn't specific
>>>>       filename "/yaboot";
>>>>     }
>>>>     range dynamic-bootp 10.20.200.254 10.20.254.254;
>>>>   } # 10.20.0.0/255.255.0.0 subnet_end
>>>>
>>>> So if it boots with the xNBA image it is then directed to
>>>> http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16 which has the
>>>> genesis boot instructions in it:
>>>>
>>>> #!gpxe
>>>> imgfetch -n kernel http://${next-server}/tftpboot/xcat/genesis.kernel.x86_64 quiet xcatd=10.20.0.1:3001 BOOTIF=01-${netX/machyp}
>>>> imgfetch -n nbfs http://${next-server}/tftpboot/xcat/genesis.fs.x86_64.gz
>>>> imgload kernel
>>>> imgexec kernel
>>>>
>>>> So first it boots with xnba (first stage of boot), it contacts the DHCP
>>>> server which gives it a "next-server" option of itself (saying to the
>>>> client: request the next image from me - the headnode - again), and then
>>>> gives it a boot file with instructions for the next image, then it
>>>> executes it and finally loads genesis. You will also notice that the
>>>> very last option (if it matches nothing else) is yaboot, which is
>>>> another generic image, which will in turn probably request the next
>>>> image. Try watching your log for the tftp daemon messages to see what is
>>>> being sent.
>>>>
>>>> I wonder if you are having problems at the first-stage DHCP redirect
>>>> though. Check your options statements in /etc/dhcpd.conf to see
>>>> where it is directing xNBA images.
>>>>
>>>> Regards,
>>>> Josh Nielsen
>>>>
>>>>
>>>> On Tue, Jan 21, 2014 at 10:26 AM, Jonathan Mills <jonmi...@renci.org> wrote:
>>>>
>>>> Wang,
>>>>
>>>> Thank you for your response. I did some digging and here is what I
>>>> found.
>>>>
>>>> cat /tftpboot/xcat/xnba/nets/10.100.0.0_24
>>>> #!gpxe
>>>> imgfetch -n kernel http://${next-server}/tftpboot/xcat/genesis.kernel.x86_64 quiet xcatd=10.100.0.1:3001 BOOTIF=01-${netX/machyp}
>>>> imgfetch -n nbfs http://${next-server}/tftpboot/xcat/genesis.fs.x86_64.lzma
>>>> imgload kernel
>>>> imgexec kernel
>>>>
>>>>
>>>> cat /tftpboot/pxelinux.cfg/0A6400
>>>> DEFAULT xCAT
>>>> LABEL xCAT
>>>> KERNEL xcat/nbk.x86_64
>>>> APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.100.0.1:3001
>>>>
>>>>
>>>> So, clearly, those things don't match up. That strikes me as an xCAT
>>>> issue, but never mind. I manually modified
>>>> /tftpboot/pxelinux.cfg/0A6400 to make it look like:
>>>>
>>>> DEFAULT xCAT
>>>> LABEL xCAT
>>>> KERNEL xcat/genesis.kernel.x86_64
>>>> APPEND initrd=xcat/genesis.fs.x86_64.lzma quiet xcatd=10.100.0.1:3001 BOOTIF=eth0
>>>>
>>>>
>>>> (It is safe, in this case, to designate BOOTIF as 'eth0' -- with Cisco
>>>> UCS hardware, and using vNICs, the first interface will always show up
>>>> in Linux as eth0 -- at least, that is my experience).
>>>>
>>>> After this change, I was indeed able to PXE boot the first node, and I
>>>> was hopeful that node discovery would then take place. However, this
>>>> still did not occur. On console, I dug into the running genesis image
>>>> on the first node, and I found that it had no ethernet interfaces
>>>> whatsoever, because the genesis kernel has no driver support for Cisco
>>>> UCS hardware.
>>>>
>>>> For example, this is the ethtool output of a Cisco UCS vNIC:
>>>>
>>>> [root@ncsu-hn nets]# ethtool -i eth0
>>>> driver: enic
>>>> version: 2.1.1.39
>>>> firmware-version: 2.0(4b)
>>>> bus-info: 0000:06:00.0
>>>> supports-statistics: yes
>>>> supports-test: no
>>>> supports-eeprom-access: no
>>>> supports-register-dump: no
>>>> supports-priv-flags: no
>>>>
>>>>
>>>> You can see it requires the 'enic' kernel module, usually located at:
>>>> /lib/modules/`uname -r`/kernel/drivers/net/enic/enic.ko
>>>>
>>>> This module isn't found within the genesis image, so the node PXE
>>>> boots, and then can do no more. Node discovery fails.
>>>>
>>>> On 01/20/2014 09:19 PM, Xiao Peng Wang wrote:
>>>> > xCAT is using genesis (an xCAT customized pxe tool) to perform the
>>>> > discovery process. The configuration for genesis is put in
>>>> > /tftpboot/xcat/xnba/nets/ for a specific network. Could you check
>>>> > whether the xnba configuration file for your deployment network has
>>>> > been put in /tftpboot/xcat/xnba/nets/?
>>>> >
>>>> > The prerequisite for booting genesis is that the node has a
>>>> > dynamic IP address. Did you configure the dynamic IP range for your
>>>> > deployment network? Could you take a look at your syslog to see whether
>>>> > the node has sent out a dhcp request and what your dhcp server replied
>>>> > to it?
>>>> >
>>>> > Thanks
>>>> > Best Regards
>>>> > ----------------------------------------------------------------------
>>>> > Wang Xiaopeng (王晓朋)
>>>> > IBM China System Technology Laboratory
>>>> > Tel: 86-10-82453455
>>>> > Email: w...@cn.ibm.com
>>>> > Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
>>>> > Haidian District Beijing P.R.China 100193
>>>> >
>>>> > From: Jonathan Mills <jonmi...@renci.org>
>>>> > To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>,
>>>> > Date: 2014/01/19 06:24
>>>> > Subject: [xcat-user] Frustrating time with sequential node discovery
>>>> >
>>>> > ------------------------------------------------------------------------
>>>> >
>>>> > I'm running xCAT 2.8.3 and CentOS 6.4 atop of Cisco UCS-C hardware. I'm
>>>> > attempting to do a sequential nodediscovery. I've pre-populated the
>>>> > nodelist table with the nodenames, so I shouldn't need to do anything
>>>> > more than
>>>> >
>>>> > nodediscoverystart noderange=node[1-15]
>>>> >
>>>> > However, none of the nodes ever gets discovered.
>>>> >
>>>> > Digging deeper, it seems that none of them ever successfully PXE boot at
>>>> > all. They should be PXE booting off of the genesis netboot image and
>>>> > speaking back to the xcatmaster, correct?
>>>> >
>>>> > When I run 'mknb x86_64', it populates /tftpboot/pxelinux.cfg with
>>>> > entries to non-existent netboot images. Watch:
>>>> >
>>>> > [root@ncsu-hn ~]# rpm -qf /opt/xcat/sbin/mknb
>>>> > xCAT-client-2.8.3-snap201311122316.noarch
>>>> > [root@ncsu-hn ~]# mknb x86_64
>>>> > Creating genesis.fs.x86_64.lzma in /tftpboot/xcat
>>>> > [root@ncsu-hn ~]# cd /tftpboot/pxelinux.cfg/
>>>> > [root@ncsu-hn pxelinux.cfg]# ls
>>>> > 0A6400 0A6500 0A6600 7F 98300D 98300DE6 98300DE7 C0A86B
>>>> > [root@ncsu-hn pxelinux.cfg]# cat *
>>>> > DEFAULT xCAT
>>>> > LABEL xCAT
>>>> > KERNEL xcat/nbk.x86_64
>>>> > APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.100.0.1:3001
>>>> > DEFAULT xCAT
>>>> > LABEL xCAT
>>>> > KERNEL xcat/nbk.x86_64
>>>> > APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.101.0.1:3001
>>>> > DEFAULT xCAT
>>>> > LABEL xCAT
>>>> > KERNEL xcat/nbk.x86_64
>>>> > APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.102.0.1:3001
>>>> > DEFAULT xCAT
>>>> > LABEL xCAT
>>>> > KERNEL xcat/nbk.x86_64
>>>> > APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=127.0.0.1:3001
>>>> > DEFAULT xCAT
>>>> > LABEL xCAT
>>>> > KERNEL xcat/nbk.x86_64
>>>> > APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=152.48.13.3:3001
>>>> > DEFAULT xCAT
>>>> > LABEL xCAT
>>>> > KERNEL xcat/nbk.x86_64
>>>> > APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=152.48.13.230:3001
>>>> > DEFAULT xCAT
>>>> > LABEL xCAT
>>>> > KERNEL xcat/nbk.x86_64
>>>> > APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=152.48.13.231:3001
>>>> > DEFAULT xCAT
>>>> > LABEL xCAT
>>>> > KERNEL xcat/nbk.x86_64
>>>> > APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=192.168.107.10:3001
>>>> > [root@ncsu-hn pxelinux.cfg]# cd ../xcat/
>>>> > [root@ncsu-hn xcat]# ls -la
>>>> > total 21528
>>>> > drwxr-xr-x  4 root root     4096 Jan 17 13:06 .
>>>> > drwxr-xr-x. 7 root root     4096 Jan 18 22:02 ..
>>>> > -rwxr-xr-x  1 root root   242929 Jan 15  2012 elilo-x64.efi
>>>> > -rw-r--r--  1 root root 17573621 Jan 18 22:03 genesis.fs.x86_64.lzma
>>>> > -rwxr-xr-x  1 root root  3986608 Aug  9 06:29 genesis.kernel.x86_64
>>>> > drwxr-xr-x  3 root root     4096 Jan 17 13:06 osimage
>>>> > drwxr-xr-x  3 root root     4096 Dec 23 07:42 xnba
>>>> > -rw-r--r--  1 root root   139200 Oct 28 16:16 xnba.efi
>>>> > -rw-r--r--  1 root root    74792 Oct 28 16:16 xnba.kpxe
>>>> >
>>>> >
>>>> >
>>>> > As you can see....it ought to be netbooting the genesis kernel, but
>>>> > instead all my pxelinux.cfg/* files are instructing clients to boot the
>>>> > non-existent "nbk.x86_64" image.
>>>> >
>>>> > Your advice is appreciated.
>>>> >
>>>> > --
>>>> > Jonathan Mills
>>>> > Systems Administrator
>>>> > Renaissance Computing Institute
>>>> > UNC-Chapel Hill
>>>> >
>>>> >
>>>> > ------------------------------------------------------------------------------
>>>> > CenturyLink Cloud: The Leader in Enterprise Cloud Services.
>>>> > Learn Why More Businesses Are Choosing CenturyLink Cloud For
>>>> > Critical Workloads, Development Environments & Everything In Between.
>>>> > Get a Quote or Start a Free Trial Today.
>>>> >
>>>> > http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
>>>> > _______________________________________________
>>>> > xCAT-user mailing list
>>>> > xCAT-user@lists.sourceforge.net
>>>> > https://lists.sourceforge.net/lists/listinfo/xcat-user
>>>> >
>>>> >
>>>>
>>>> --
>>>> Jonathan Mills
>>>> Systems Administrator
>>>> Renaissance Computing Institute
>>>> UNC-Chapel Hill
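
As a recap of the two-stage selection described in Josh's 11:51 AM message, the if/else chain from his dhcpd.conf excerpt can be modelled roughly as below. This is only a simplified Python sketch of that one excerpt, not how dhcpd evaluates its configuration: the function name choose_bootfile and its parameters are made up for illustration, the defaults are the headnode address and network name from his example, and the final yaboot branch is reduced to a plain fallback (the real condition tests whether a filename is already set).

```python
from typing import Optional


def choose_bootfile(user_class: Optional[str],
                    client_arch: Optional[str],
                    vendor_class: Optional[str] = None,
                    headnode: str = "10.20.0.1",
                    net: str = "10.20.0.0_16") -> str:
    """Mirror the order of the branches in the quoted dhcpd.conf excerpt."""
    script = f"http://{headnode}/tftpboot/xcat/xnba/nets/{net}"
    # Second stage: the client already runs xNBA, so hand it the gpxe script.
    if user_class == "xNBA" and client_arch == "00:00":
        return script
    if user_class == "xNBA" and client_arch == "00:09":
        return script + ".uefi"
    # First stage: plain firmware, so hand it the xNBA loader itself.
    if client_arch == "00:00":
        return "xcat/xnba.kpxe"
    if vendor_class == "Etherboot-5.4":
        return "xcat/xnba.kpxe"
    if client_arch in ("00:07", "00:09"):
        return "xcat/xnba.efi"
    if client_arch == "00:02":
        return "elilo.efi"
    # Simplification: the excerpt only falls back to yaboot when the client
    # has no filename set; that check is omitted here.
    return "/yaboot"


if __name__ == "__main__":
    # First DHCP request from x86 BIOS PXE firmware (architecture 00:00).
    print(choose_bootfile(None, "00:00"))    # xcat/xnba.kpxe
    # Second request from the same node once xNBA is running.
    print(choose_bootfile("xNBA", "00:00"))  # http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16
```

The second return value is the gpxe script that fetches genesis.kernel.x86_64 and the genesis filesystem image, which is why the xnba.kpxe loader and the genesis files both need to be present on the headnode for this path to work.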