Re: [xcat-user] Frustrating time with sequential node discovery
xCAT-nbroot-core* was replaced by xCAT-genesis-* in xCAT 2.8.

Lissa K. Valletta
8-3/B10 Poughkeepsie, NY 12601
(tie 293) 433-3102

From: Xiao Peng Wang w...@cn.ibm.com
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Cc: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 01/22/2014 02:58 AM
Subject: Re: [xcat-user] Frustrating time with sequential node discovery

Why do you say that you need nbk.x86_64? Is this file listed in /tftpboot/xcat/xnba/nets/? With the latest xCAT build, it needs /tftpboot/xcat/genesis.kernel.x86_64 instead of nbk.*

Thanks
Best Regards
--
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: w...@cn.ibm.com
Address: 28, ZhongGuanCun Software Park, No.8 Dong Bei Wang West Road, Haidian District, Beijing, P.R. China 100193

From: Jonathan Mills jonmi...@renci.org
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 2014/01/22 14:10
Subject: Re: [xcat-user] Frustrating time with sequential node discovery

It would seem that what I am missing is the whole of the xCAT-nbroot infrastructure, because it isn't part of xcat-core or xcat-dep, so I didn't grab it. But it just so happens you need it. The file /tftpboot/xcat/nbk.x86_64 is provided by the RPM xCAT-nbkernel-x86_64, which is missing from my yum repo mirror and from my hosts. Anything else I'm missing? Hopefully if I grab correct copies of xCAT-nbkernel and xCAT-nbroot (or xCAT-nbroot2?), node discovery will actually work.

On 1/22/14, 12:08 AM, Xiao Peng Wang wrote:

Both Josh and Russell are correct. xNBA is a customized PXE loader, and genesis is an xCAT-customized diskless Linux system that runs discovery and other tasks like 'bmcsetup'.
It does not need /tftpboot/pxelinux.cfg/* to load genesis. For discovery, if a node is not defined in xCAT, the DHCP configuration in /etc/dhcp/dhcpd.conf or /etc/dhcpd.conf is used to answer the DHCP request from the not-yet-discovered node. Your dhcpd.conf should have the following section for your deployment network; if not, run 'makedhcp -n' to recreate it:

    if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
        always-broadcast on;
        filename = "http://10.1.0.207/tftpboot/xcat/xnba/nets/10.1.0.0_16";
    } else if option user-class-identifier = "xNBA" and option client-architecture = 00:09 { #x86, xCAT Network Boot Agent
        filename = "http://10.1.0.207/tftpboot/xcat/xnba/nets/10.1.0.0_16.uefi";
    } else if option client-architecture = 00:00 { #x86
        filename "xcat/xnba.kpxe";
    } else if option vendor-class-identifier = "Etherboot-5.4" { #x86
        filename "xcat/xnba.kpxe";
    } else if option client-architecture = 00:07 { #x86_64 uefi
        filename "xcat/xnba.efi";
    } else if option client-architecture = 00:09 { #x86_64 uefi alternative id
        filename "xcat/xnba.efi";
    }

During the boot process of a not-yet-discovered node, dhcpd tells the node to load xcat/xnba.kpxe first, and then the configuration file http://<xcat mn>/tftpboot/xcat/xnba/nets/10.1.0.0_16. Then xNBA loads genesis. Take a look at the syslog to see whether xnba was downloaded successfully from the TFTP server, and look into the httpd log to see whether genesis was downloaded successfully.

Thanks
Best Regards
-- Wang Xiaopeng (王晓朋)
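The two log checks suggested above can be scripted. This is a minimal sketch assuming Red Hat-style log locations (/var/log/messages for in.tftpd, /var/log/httpd/access_log for Apache); here-docs with typical (but not guaranteed) log-line formats stand in for the real files.

```shell
node=10.1.0.50
syslog=$(mktemp); httplog=$(mktemp)
cat > "$syslog" <<'EOF'
Jan 22 14:01:02 mn in.tftpd[1234]: RRQ from 10.1.0.50 filename xcat/xnba.kpxe
EOF
cat > "$httplog" <<'EOF'
10.1.0.50 - - [22/Jan/2014:14:01:05 -0500] "GET /tftpboot/xcat/genesis.kernel.x86_64 HTTP/1.1" 200 4194304
EOF
# Stage 1: did the node fetch xNBA over TFTP?
xnba_ok=$(grep -c "RRQ from $node .*xnba" "$syslog")
# Stage 2: did xNBA then fetch the genesis kernel over HTTP?
genesis_ok=$(grep -c "^$node .*genesis" "$httplog")
echo "xnba=$xnba_ok genesis=$genesis_ok"
rm -f "$syslog" "$httplog"
```

On a live management node the same two greps, pointed at the real logs, tell you which stage of the chain-load is failing.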
From: Josh Nielsen jniel...@hudsonalpha.org
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 2014/01/22 05:56
Subject: Re: [xcat-user] Frustrating time with sequential node discovery

Ah, I see what you are saying now. Well, I hope the thread I stumbled on that Jarrod replied to helps figure out why his configuration is looking to the outdated (according to what Jarrod said) configuration files in /tftpboot/pxelinux.cfg/. Looks like it is either /etc/dhcpd.conf or /var/lib/dhcpd/dhcpd.leases related in that case.
Re: [xcat-user] Frustrating time with sequential node discovery
I've been lurking on this discussion, and just checked to see what we've got -- nbroot or genesis -- and we have both of them. I had given up on node discovery years ago; we originally used the switch port numbers and forwarding tables to assign node names. Now I use ASU to collect the MACs and populate the mac table with a bit of grep and awk. But last week we powered on a rack of 20 non-IBM nodes, and I was wishing we had something easier, since ASU didn't work for them.

So I have two questions --

1) Can I safely delete the xCAT-nbroot-core* RPMs?

2) What is the current best-practice method? What about for non-IBM hardware?

No, three questions:

3) How do you get IBM manufacturing to use a specific, different 172.29.X for each rack they build for you? We've had three racks arrive in different months, all with 172.29.101 addresses for the IMMs, and I have to spend 5-10 minutes reprogramming each one. I can't put them on the same network until the conflicts are gone.

What's hard for me is that we get new nodes only a couple of times a year, and I forget everything in between.

Thanks,
-- ddj

On Jan 22, 2014, at 7:12 AM, Lissa Valletta lis...@us.ibm.com wrote: xCAT-nbroot-core* was replaced by xCAT-genesis-* in xCAT 2.8.
Re: [xcat-user] Frustrating time with sequential node discovery
On Jan 22, 2014, at 8:30 AM, Jonathan Mills jonmi...@renci.org wrote: Comments inline...

On 1/22/14, 8:08 AM, David D Johnson wrote: I've been lurking on this discussion, and just checked to see what we've got -- nbroot or genesis -- and we have both of them. I had given up on node discovery years ago; we originally used the switch port numbers and forwarding tables to assign node names. Now I use ASU to collect the MACs and populate the mac table with a bit of grep and awk. But last week we powered on a rack of 20 non-IBM nodes, and I was wishing we had something easier, since ASU didn't work for them.

Precisely! That's what I've been doing for IBM and Dell gear for a long time. In fact, here's what I do:

    for i in `seq 1 100`; do
        MAC=$(rinv node${i} mac | grep 'MAC Address 1' | cut -d' ' -f5)
        chtab node=node${i} mac.mac=$MAC mac.interface=eth0
    done

or similar.

Nice, rinv mac seems to take much less time than asu show --group PXE, and it can run in parallel on a node range. Unfortunately neither works for these (Supermicro) nodes.

So I have two questions -- 1) Can I safely delete the xCAT-nbroot-core* RPMs?

I still don't know! Because if using chain-loading, I don't see how the first stage is installed by the xCAT-genesis-* RPMs.

2) What is the current best-practice method? What about for non-IBM hardware?

This is also what I'm trying to establish. I had been attempting to use sequential node discovery for Cisco UCS B-series equipment, since the switch method would be hard to use (UCS's Fabric Interconnect is kind of like a switch, but not totally).
I am also familiar with the ROCKS Clusters method of sequential node discovery (insert-ethers), which literally scrapes MAC addresses out of the dhcpd lines that appear in /var/log/messages -- IMHO that would have worked far better for xCAT than PXE booting (or chain-loading) the Genesis kernel -- which may or may not have kernel modules for your NIC hardware -- to send messages back to xcatmaster:3001.

Our clusters from 2006-2009 were all based on Rocks, and that was the one feature I miss the most. That approach would work for any hardware type. The Rocks web-GUI database was OK for the small clusters, but they moved to the Rocks CLI for making changes just about the time we went to xcat-2. We in fact used xcat-1 for all the hardware management, rcons, rpower, etc. I miss the ability to hack on the Python scripts.

No, three questions: 3) How do you get IBM manufacturing to use a specific, different 172.29.X for each rack they build for you? We've had three racks arrive in different months, all with 172.29.101 addresses for the IMMs, and I have to spend 5-10 minutes reprogramming each one. I can't put them on the same network until the conflicts are gone.

For a price, IBM has an integration center with technicians who can make such things happen. For another price, they even offer a kind of DMZ they call the yellowzone, where you can SSH into their lab and pre-configure your gear before it ships. But it isn't worth the effort unless you're going to be buying a lot of things with some frequency.

We get the racks prebuilt and shipped from Hong Kong, and they do program the IMM addresses, but they never ask us which rack number to use; it's always A1. If they're going to do it at all, they should do it right.

What's hard for me is that we get new nodes only a couple of times a year, and I forget everything in between.
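The insert-ethers-style scraping mentioned above is only a few lines of awk. A here-doc stands in for /var/log/messages; the field position ($8) assumes the standard ISC dhcpd/syslog layout shown in the sample.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
Jan 22 09:00:01 mn dhcpd: DHCPDISCOVER from 00:1a:64:aa:bb:cc via eth1
Jan 22 09:00:03 mn dhcpd: DHCPDISCOVER from 00:1a:64:aa:bb:cc via eth1
Jan 22 09:00:09 mn dhcpd: DHCPDISCOVER from 00:1a:64:dd:ee:ff via eth1
EOF
# Collect the unique MACs that have asked for an address
macs=$(awk '/DHCPDISCOVER from/ {print $8}' "$log" | sort -u)
echo "$macs"
rm -f "$log"
```

Retries collapse under sort -u, so each booting node shows up once, ready to be mapped to a node name.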
Thanks,
-- ddj
Re: [xcat-user] Frustrating time with sequential node discovery
Answering your item 3: for IBM nodes, we use the following to determine the address for the IMM: 172.29.1xx.y, where xx is the rack number (rack A1, node 1 = 172.29.101.1, which follows the node IP we would assign, 172.20.101.1) and y is the node number in the rack, with the first node in the lowest U location; if the rack is two wide, it runs up the lower-left side and then up the lower-right side. Since your racks came at different times, they were seen as different orders; we will always start at the first rack and go contiguous from there. We assign the BMCs to a group, for example 84bmcperrack (see the hosts table):

    #node,ip,hostnames,otherinterfaces,comments,disable
    84bmcperrack,|\D+(\d+).*$|172.29.(101+(($1-1)/84)).(($1-1)%84+1)|
    idataplex-bmc,|\D+(\d+).*$|172.29.(101+(($1-1)/84)).(($1-1)%84+1)|
    40bmcperrack,|\D+(\d+).*$|172.29.(101+(($1-1)/40)).(($1-1)%40+1)|
    41bmcperrack,|\D+(\d+).*$|172.29.(101+(($1-1)/41)).(($1-1)%41+1)|
    42bmcperrack,|\D+(\d+).*$|172.29.(101+(($1-1)/42)).(($1-1)%42+1)|
    20bmcperrack,|\D+(\d+).*$|172.29.(101+(($1-1)/20)).(($1-1)%20+1)|
    21bmcperrack,|\D+(\d+).*$|172.29.(101+(($1-1)/21)).(($1-1)%21+1)|

You can set up your scheme when you bring up the cluster, and this will all get done during discovery/bmcsetup.

Regards,
Tom

Thomas Alandt
WW Test Engineer Complex Solutions
IBM-ISC
Phone: 919-543-7581 (t/l 441-7581)

From: David D Johnson david_john...@brown.edu
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 01/22/2014 08:11 AM
Subject: Re: [xcat-user] Frustrating time with sequential node discovery

I've been lurking on this discussion, and just checked to see what we've got -- nbroot or genesis -- and we have both of them.
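The 84bmcperrack regex above encodes simple arithmetic; as a sanity check, it can be reproduced in plain shell (bmc_ip is a hypothetical helper name, not an xCAT command).

```shell
# Compute the IMM address from the node number, per the 84bmcperrack scheme.
bmc_ip() {
    n=$1                              # node number parsed from the node name
    third=$(( 101 + (n - 1) / 84 ))   # third octet: rack 1 -> 101, rack 2 -> 102, ...
    fourth=$(( (n - 1) % 84 + 1 ))    # position within the rack, 1..84
    echo "172.29.$third.$fourth"
}
bmc_ip 1    # -> 172.29.101.1  (first node, first rack)
bmc_ip 84   # -> 172.29.101.84 (last node, first rack)
bmc_ip 85   # -> 172.29.102.1  (first node, second rack)
```

This makes it easy to verify a proposed N-per-rack scheme before committing it to the hosts table.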
Re: [xcat-user] Frustrating time with sequential node discovery
I can answer that point from a personal viewpoint -- it's just a pain. A real bad pain, especially when you do not have homogeneous switch models/vendors in the environment. By the time you've finally gotten it to work, you could have just gone node to node, written down the MACs by hand, and populated the table yourself :-) Typically, when doing deployments now where xCAT is utilized, I require the vendor to provide me a list or spreadsheet of node-to-MAC mappings, and I just manually populate the tables myself with a for loop.

On 1/22/2014 10:03 AM, Jarrod B Johnson wrote: I am interested in issues with switch-based discovery that would cause it to be given up on. Sequential or semi-automatic discovery is OK for smallish setups, but scaling it up causes a lot of ambiguity to trudge through.
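Populating the tables from a vendor spreadsheet, as described above, could look like the following sketch. chtab is the real xCAT command, but it is only echoed here (a dry run), so the generated commands can be inspected before running them; the CSV layout is an assumed node,MAC export.

```shell
csv=$(mktemp)
cat > "$csv" <<'EOF'
node001,00:1a:64:aa:bb:01
node002,00:1a:64:aa:bb:02
EOF
# Emit one chtab command per node,MAC pair (drop the echo to apply for real)
cmds=$(while IFS=, read -r node mac; do
    echo chtab "node=$node" "mac.mac=$mac" mac.interface=eth0
done < "$csv")
echo "$cmds"
rm -f "$csv"
```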
Re: [xcat-user] Frustrating time with sequential node discovery
xCAT-cisco works to an extent. It is fabulous for fetching MAC addresses via UCS Manager. However, for me at least, the rsetboot command fails flat out. But worst of all, rpower commands do not shoot the node in the head like an IPMI command; instead, UCS Manager tries to be cute and gracefully shut down the OS. Getting the expected result of an rpower command via xCAT-cisco often means waiting 60 seconds, and occasionally it never works at all. That's why I found the idea attractive: a traditional xCAT setup using IPMI to control UCS nodes, and node discovery to pull in their MAC addresses. Using IPMI with UCS hardware means that commands like 'rinv' don't work, just as with Supermicro gear; some aspects of the hardware aren't exposed through IPMI registers.

On 01/22/2014 11:03 AM, Jarrod B Johnson wrote: Sorry, I haven't been following the thread and will hit a few points to the list in general.

For rinv macs, sadly that's not part of the standards, so we can only pull it off one vendor at a time, hence why rinv mac works for some but not others.

For the questions about UCS, I assume https://github.com/vallard/xCAT-cisco was looked at. I'm not personally familiar with their scheme, but for other blade-oriented solutions, we have used the chassis managers as a topology-cue alternative to the switch.

For scraping dhcpd.leases, that should be a doable script to include. There are cases that require a more thorough investigation than can be achieved in that manner, but it's better than a non-starter for the cases where it doesn't work. We strive to include modern network drivers, and perhaps we should be more aggressive about that.

One thing I've been hoping to do is implement a proxyDHCP server.
That could glean much of the pertinent details for common configuration cases and provide an unambiguous set of candidates for automatic (sequential, switch, or chassis based) or semi-automatic (a scriptable set of candidates to do whatever with) discovery. (One challenge we've had with DHCP lease scraping is the ambiguity of whether something is a node or a piece of other equipment.) I need to see about extending lsslp --flexdiscover to cover the rackmount case for service-processor-based reconfiguration. The good thing about that scheme is that duplicate IPs are fine and get fixed automatically, so long as the IMMs are on the same subnet as a management node.

I am interested in issues with switch-based discovery that would cause it to be given up on. Sequential or semi-automatic discovery is OK for smallish setups, but scaling it up causes a lot of ambiguity to trudge through.
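The dhcpd.leases scraping Jarrod mentions could be sketched like this: extract IP/MAC pairs from ISC dhcpd's lease file. A here-doc stands in for /var/lib/dhcpd/dhcpd.leases; real lease blocks carry more fields, but the lease/hardware-ethernet lines are the standard format.

```shell
leases=$(mktemp)
cat > "$leases" <<'EOF'
lease 10.1.0.50 {
  hardware ethernet 00:1a:64:aa:bb:cc;
}
lease 10.1.0.51 {
  hardware ethernet 00:1a:64:dd:ee:ff;
}
EOF
# Remember the IP from each "lease" line, print it with the MAC that follows
pairs=$(awk '/^lease/ {ip=$2}
             /hardware ethernet/ {mac=$3; sub(/;$/,"",mac); print ip, mac}' "$leases")
echo "$pairs"
rm -f "$leases"
```

As Jarrod notes, the hard part is not the parsing but deciding which of these pairs are compute nodes and which are other equipment on the subnet.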
Re: [xcat-user] Frustrating time with sequential node discovery
I'm no longer clear on what gave us the most grief back 2-3 years ago, when the first add-on racks were added to the cluster. There were some xCAT update growing pains, some bugs that weren't fixed soon enough for us to move on, and the original design/rollout was done in 2009 by an IBM contractor, so we needed to learn what he had done. Some of this was a moving-target problem -- I learned it one way, and by the time I needed to do it again the ground rules had changed, and I'd forgotten a lot of stuff in the meantime. We did not originally buy xCAT support, but we have it now. We also did a CentOS 5 to CentOS 6 upgrade by moving all the xCAT stuff to a new management server, turning off the old DHCP server, and having the diskless nodes reboot on the new one. I never did get around to copying the tabdump switch information. Also, because of partial rack orders, the actual node arrangements got screwy -- like node201-240 on the bottom left and right of one iDataPlex rack, and 241-274 in the upper half. Finally, it's a matter of how long the learning curve is to do it the right way vs. knowing exactly how long it takes to do it with asu/rinv and tabedit mac.

On Jan 22, 2014, at 12:10 PM, Jonathan Mills jonmi...@renci.org wrote: xCAT-cisco works to an extent.
Re: [xcat-user] Frustrating time with sequential node discovery
Whoops, I meant to write Jarrod, not Jarros. I went a little Koine Greek on your name there. Sorry about that. :-)

On Wed, Jan 22, 2014 at 10:22 AM, Josh Nielsen jniel...@hudsonalpha.org wrote: Jarros, I know you haven't been following the whole thread, but Jonathan's problem (which this discussion originated from) is that somehow his installation is still using the /tftpboot/pxelinux.cfg/ files despite being up to date with genesis. I saw a response on the list from you to someone else about a similar problem in November 2013:

http://sourceforge.net/mailarchive/message.php?msg_id=31683484
http://sourceforge.net/mailarchive/message.php?msg_id=31686689

Although I am not experiencing the same problem, I too would be interested in the solution. It has always been challenging for me to understand the xCAT PXE deployment process, which is why I have 5-10 pages of self-written documentation and debugging info in a Google Doc about it. He posted his dhcpd.conf in his latest email. Any thoughts? -Josh

On Wed, Jan 22, 2014 at 10:11 AM, Jarrod B Johnson jbjoh...@us.ibm.com wrote:

1) If running 2.8, go ahead and delete nbroot-core. genesis is far more maintainable and easier to muck with on the endpoint (e.g. it has a full-fledged glibc).

2) The greatest common denominator remains switch-based. It's the only frequently instrumented topology cue that is pretty universal. For bladed solutions, there is potential for the chassis manager to be a good topology cue. I am curious what goes awry with switch-based discovery. Without a topology cue, the choices are sequential discovery (which I frankly haven't used myself) or using/scripting nodediscoverls/nodediscoverdef. I personally think the latter is actually better and can be trivially made into a 'sequential' discovery with straightforward scripting.

3) Tom Alandt is the best person to discuss what can/can't be done by IBM manufacturing. On the other hand, we *could* take some measures to make you impervious to the conflict.
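The "trivially scripted sequential discovery" Jarrod describes could look like the sketch below: hand out names in arrival order from a list of discovered MACs. The input file is a made-up sample (not real nodediscoverls output), and the chdef commands are only echoed as a dry run.

```shell
macs=$(mktemp)
cat > "$macs" <<'EOF'
00:1a:64:aa:bb:01
00:1a:64:aa:bb:02
00:1a:64:aa:bb:03
EOF
# Assign node001, node002, ... in the order the MACs were seen
i=1
cmds=$(while read -r mac; do
    printf 'chdef -t node -o node%03d mac=%s\n' "$i" "$mac"
    i=$((i + 1))
done < "$macs")
echo "$cmds"
rm -f "$macs"
```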
The 'lsslp --flexdiscover' for its respective bits is impervious to IP-conflict issues and will automatically fix them. It's not a huge stretch to make that pan out for rackmount systems (though currently it's hard to pull off without *some* topology cue).

From: David D Johnson david_john...@brown.edu
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 01/22/2014 08:11 AM
Subject: Re: [xcat-user] Frustrating time with sequential node discovery

I've been lurking on this discussion, and just checked to see what we've got -- nbroot or genesis -- and we have both of them.
Lissa K. Valletta
8-3/B10 Poughkeepsie, NY 12601
(tie 293) 433-3102

From: Xiao Peng Wang w...@cn.ibm.com
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Cc: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 01/22/2014 02:58 AM
Subject: Re: [xcat-user] Frustrating time with sequential node discovery

Why do you say that you need nbk.x86_64? Is this file listed in the /tftpboot/xcat/xnba/nets/? With the latest xCAT build, it needs /tftpboot/xcat/genesis.kernel.x86_64 instead of nbk.*

Thanks
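[Editorial note: Jarrod's point above, that nodediscoverls/nodediscoverdef can be trivially scripted into a 'sequential' discovery, might look roughly like the sketch below. This is an illustration only: the canned UUIDs stand in for real nodediscoverls output (whose exact column layout varies by xCAT version, so a filtering awk/grep pass is assumed), and the commands are printed as a dry run rather than executed.]

```shell
# gen_defs: pair each discovered-but-undefined UUID (one per line on stdin)
# with the next sequential node name, and print the nodediscoverdef command
# that would define it. Printing instead of executing keeps this a dry run.
gen_defs() {
    i=1
    while read -r uuid; do
        echo "nodediscoverdef -u $uuid -n node$i"
        i=$((i + 1))
    done
}

# Canned sample UUIDs; in practice you would feed it something like the
# output of `nodediscoverls -t undef` piped through awk to isolate the
# UUID column (an assumption -- check your version's output first).
gen_defs <<'EOF'
6EC2E52F-EB4B-3CFE-AA1B-0D9B3C1DDDE1
0F2EA92F-9E13-4F26-B1AA-5C27A2C9D120
EOF
```

Once the printed commands look right, dropping the echo would actually define the nodes in sequence.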
Re: [xcat-user] Frustrating time with sequential node discovery
xNBA is a customized gpxe image that xCAT uses. NBFS is the older maintenance image that was used if you set your node to boot to shell, or booted a runimage script. NBFS is deprecated, and Genesis replaced NBFS as the maintenance image for these tasks. In a standard 2.8 install, there should no longer be any nbk/nbfs RPMs installed - Genesis replaced them.

perl-xCAT-2.8.3-snap201311122316.noarch
xCAT-2.8.3-snap201311122318.x86_64
xCAT-client-2.8.3-snap201311122316.noarch
xCAT-genesis-base-x86_64-2.8-snap201308090229.noarch
elilo-xcat-3.14-4.noarch
xCAT-server-2.8.3-snap201311122316.noarch
xCAT-genesis-scripts-x86_64-2.8.3-snap201311122318.noarch
ipmitool-xcat-1.8.11-3.x86_64
conserver-xcat-8.1.16-10.x86_64
xCAT-buildkit-2.8.3-snap201311122318.noarch
syslinux-xcat-3.86-2.noarch

On 1/21/2014 2:38 PM, Josh Nielsen wrote:

Hi Jonathan, Yes, I definitely think that would cause a problem. This is jogging my memory, because I think I faced a similar problem when the new Genesis boot loader was rolled out in the first version of xCAT that supported it. I had assumed that only Genesis was needed, but xNBA is still used as an intermediate image even if it is no longer the final image. I will check my yum repos as soon as I can - but by some unfortunate coincidence I just discovered that YUM is not working since our RHEL license expired three days ago (unbeknownst to me until 10 minutes ago). Do you have xCAT-genesis-x86_64 and elilo-xCAT? You may even have to pull xNBA images from an older install(?) and then run mknb to build the images. I remember downloading the tarred files with the RPM manually and creating a local repo for xCAT. Whenever I get YUM back I'll give you more specifics if I can.

-Josh

On Tue, Jan 21, 2014 at 1:54 PM, Jonathan Mills jonmi...@renci.org wrote:

Josh, I don't doubt that you're on to something.
But if this is the case, it means my systems are missing some files, namely:

/tftpboot/xcat/nbk.x86_64
/tftpboot/xcat/nbfs.x86_64.gz

Can you tell me what RPM installed those files on your system? They don't exist on mine, and even a 'yum provides' doesn't find them.

On 01/21/2014 11:51 AM, Josh Nielsen wrote:

Hi Jonathan, It is my understanding, from extensive debugging and notes that I have taken about the xCAT netbooting process in the past, that xCAT uses a two-stage image deployment method. It will first come up with a more generic boot image (normally xnba or sometimes yaboot). When that image contacts the xCAT headnode (or the node handling DHCP requests), the headnode recognizes the current image on the client sending DHCP requests for further boot instructions, and tells the client to load another image based on the subnet and image type it is currently using. For example, my headnode's /etc/dhcpd.conf file has an entry that looks like this:

shared-network eth0 {
  subnet 10.20.0.0 netmask 255.255.0.0 {
    max-lease-time 43200;
    min-lease-time 43200;
    default-lease-time 43200;
    next-server 10.20.0.1;
    option log-servers 10.20.0.1;
    option ntp-servers 10.20.0.1;
    option domain-name "x";
    option domain-name-servers 10.20.0.1;
    if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
      always-broadcast on;
      filename = "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16";
    } else if option user-class-identifier = "xNBA" and option client-architecture = 00:09 { #x86, xCAT Network Boot Agent
      filename = "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16.uefi";
    } else if option client-architecture = 00:00 { #x86
      filename "xcat/xnba.kpxe";
    } else if option vendor-class-identifier = "Etherboot-5.4" { #x86
      filename "xcat/xnba.kpxe";
    } else if option client-architecture = 00:07 { #x86_64 uefi
      filename "xcat/xnba.efi";
    } else if option client-architecture = 00:09 { #x86_64 uefi alternative id
      filename "xcat/xnba.efi";
    } else if option client-architecture = 00:02 { #ia64
      filename "elilo.efi";
    } else if substring(filename,0,1) = null { #otherwise, provide yaboot if the client isn't specific
      filename "/yaboot";
    }
    range dynamic-bootp 10.20.200.254 10.20.254.254;
  }
} # 10.20.0.0/255.255.0.0 subnet_end

So if it boots with the xNBA image, it is then directed to http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16, which has the genesis boot instructions in it:

#!gpxe
imgfetch -n kernel http://${next-server}/tftpboot/xcat/genesis.kernel.x86_64 quiet xcatd=10.20.0.1:3001 BOOTIF=01-${netX/machyp}
imgfetch -n nbfs http://${next-server}/tftpboot/xcat/genesis.fs.x86_64.gz
imgload kernel
imgexec kernel

So first it boots with xnba (first stage of boot), it contacts the DHCP server which gives it
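[Editorial note: the conditional chain in Josh's dhcpd.conf above can be paraphrased as a small dispatch table. The sketch below is illustrative only -- dhcpd evaluates the real logic itself -- and it is simplified (the Etherboot vendor-class branch is omitted); it maps a (user-class, client-architecture) pair to the filename the server would hand back, using the values from the config above.]

```shell
# boot_filename: mimic the dhcpd if/else chain above. $1 is the DHCP
# user-class ("xNBA" on the second boot pass, empty on plain firmware PXE),
# $2 is the client-architecture option. Prints the boot filename dhcpd
# would return. The case patterns are checked top to bottom, like the
# config's if/else chain, so the xNBA branches win when both could match.
boot_filename() {
    case "$1:$2" in
        xNBA:00:00) echo "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16" ;;
        xNBA:00:09) echo "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16.uefi" ;;
        *:00:00)         echo "xcat/xnba.kpxe" ;;  # legacy BIOS x86
        *:00:07|*:00:09) echo "xcat/xnba.efi" ;;   # x86_64 UEFI
        *:00:02)         echo "elilo.efi" ;;       # ia64
        *)               echo "/yaboot" ;;         # fallback
    esac
}

boot_filename ""   "00:00"  # first pass: plain firmware gets xnba.kpxe
boot_filename xNBA "00:00"  # second pass: xNBA gets the nets/ config URL
```

The two calls at the end show why a node hits the server twice: the first DHCP exchange hands out the xNBA binary, and once xNBA itself re-DHCPs with its own user-class, the same server hands out the per-network config that chains into genesis.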
Re: [xcat-user] Frustrating time with sequential node discovery
Russell, That's what I had been thinking.

# rpm -qa | grep -i xcat | sort
conserver-xcat-8.1.16-10.x86_64
elilo-xcat-3.14-4.noarch
ipmitool-xcat-1.8.11-3.x86_64
perl-xCAT-2.8.3-snap201311122316.noarch
syslinux-xcat-3.86-2.noarch
xCAT-2.8.3-snap201311122318.x86_64
xCAT-buildkit-2.8.3-snap201311122318.noarch
xCAT-client-2.8.3-snap201311122316.noarch
xCAT-genesis-base-x86_64-2.8-snap201308090229.noarch
xCAT-genesis-scripts-x86_64-2.8.3-snap201311122318.noarch
xCAT-server-2.8.3-snap201311122316.noarch

If that is the case, I am troubled by the incorrect pxelinux.cfg configuration generated by 'mknb x86_64'. And this is to say nothing of having successful node discovery, which still eludes me.

On 01/21/2014 03:58 PM, Russell Jones wrote:

xNBA is a customized gpxe image that xCAT uses. [...]
Re: [xcat-user] Frustrating time with sequential node discovery
Evidently, though, something in his xCAT setup is creating the files in /tftpboot/pxelinux.cfg/ with reference to xnba, just like my installation. Where does xCAT grab the configuration for that? Maybe it was because I didn't do a completely clean install and did an in-place upgrade, but my cluster actually works perfectly with both xnba and genesis installed, because it uses xnba first to bootstrap and then requests the Genesis image. xCAT must support that scenario, else I haven't the slightest idea by what miracle my installation is running with such a configuration. :-)

-Josh

On Tue, Jan 21, 2014 at 2:58 PM, Russell Jones russell-l...@jonesmail.me wrote:

xNBA is a customized gpxe image that xCAT uses. [...]
Re: [xcat-user] Frustrating time with sequential node discovery
It *should* work with xNBA and Genesis - xNBA is the PXE image that loads Genesis. :-) Genesis is the utility image that handles shell commands, runimages, etc. Don't confuse NBFS with xNBA - NBFS is deprecated in favor of Genesis. xNBA is the gpxe image that loads Genesis or your normal OS image, depending on what you set via nodeset. Genesis would not be able to load without xNBA (or standard PXE), and neither would any netboot images.

On 1/21/2014 3:33 PM, Josh Nielsen wrote:

...my case it still works with both xnba and genesis because of the nature of PXE chainloading. It probably adds deployment time, but it actually works in such a mixed configuration.

-Josh

--
CenturyLink Cloud: The Leader in Enterprise Cloud Services. Learn Why More Businesses Are Choosing CenturyLink Cloud For Critical Workloads, Development Environments Everything In Between. Get a Quote or Start a Free Trial Today. http://pubads.g.doubleclick.net/gampad/clk?id=119420431iu=/4140/ostg.clktrk
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
Re: [xcat-user] Frustrating time with sequential node discovery
It would seem to me that what I am missing is the whole of the xCAT-nbroot infrastructure...because it isn't part of xcat-core, nor xcat-dep. So I didn't grab it. But it just so happens...you need it. The file /tftpboot/xcat/nbk.x86_64 is provided by the RPM xCAT-nbkernel-x86_64, which is missing from my yum repo mirror, and from my hosts. Anything else I'm missing? Hopefully if I grab correct copies of xCAT-nbkernel and xCAT-nbroot (or xCAT-nbroot2?) then node discovery will actually work.

On 1/22/14, 12:08 AM, Xiao Peng Wang wrote:

Both Josh and Russell are correct. xNBA is a customized PXE loader, and genesis is an xCAT-customized diskless linux system that runs discovery and other tasks like 'bmcsetup'. It does not need /tftpboot/pxelinux.cfg/* to load genesis. For discovery, if a node is not defined in xCAT, the dhcp configuration in /etc/dhcp/dhcpd.conf or /etc/dhcpd.conf is used to reply to the dhcp request from the not-yet-discovered node. In your dhcpd.conf, you should have the following part for your deployment network. If not, run 'makedhcp -n' to recreate your dhcpd.conf.
if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
  always-broadcast on;
  filename = "http://10.1.0.207/tftpboot/xcat/xnba/nets/10.1.0.0_16";
} else if option user-class-identifier = "xNBA" and option client-architecture = 00:09 { #x86, xCAT Network Boot Agent
  filename = "http://10.1.0.207/tftpboot/xcat/xnba/nets/10.1.0.0_16.uefi";
} else if option client-architecture = 00:00 { #x86
  filename "xcat/xnba.kpxe";
} else if option vendor-class-identifier = "Etherboot-5.4" { #x86
  filename "xcat/xnba.kpxe";
} else if option client-architecture = 00:07 { #x86_64 uefi
  filename "xcat/xnba.efi";
} else if option client-architecture = 00:09 { #x86_64 uefi alternative id
  filename "xcat/xnba.efi";
}

During the boot process of a not-yet-discovered node, dhcpd will tell the node to load xcat/xnba.kpxe first, and then the configuration file http://<xcat mn>/tftpboot/xcat/xnba/nets/10.1.0.0_16. Then xnba will load genesis. Take a look at the syslog to see whether xnba was downloaded successfully from the tftp server, and look into the httpd log to see whether genesis has been downloaded successfully.

Thanks

Best Regards
--
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: w...@cn.ibm.com
Address: 28, ZhongGuanCun Software Park, No.8 Dong Bei Wang West Road, Haidian District, Beijing, P.R.China 100193

From: Josh Nielsen jniel...@hudsonalpha.org
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 2014/01/22 05:56
Subject: Re: [xcat-user] Frustrating time with sequential node discovery

Ah, I see what you are saying now.
Well, I hope the thread I stumbled on that Jarrod replied to helps figure out why his configuration is looking to the outdated (according to what Jarrod said) configuration files in /tftpboot/pxelinux.cfg/. Looks like it is either /etc/dhcpd.conf or /var/lib/dhcpd/dhcpd.leases related in that case.

On Tue, Jan 21, 2014 at 3:51 PM, Russell Jones russell-l...@jonesmail.me wrote:

It *should* work with xNBA and Genesis - xNBA is the PXE image that loads Genesis. :-) [...]
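[Editorial note: Wang's debugging advice above -- check syslog for the xnba TFTP fetch, then the httpd log for the genesis fetch -- can be wrapped in a small helper. The log paths and the sample log line below are assumptions for a RHEL/CentOS management node; adjust the patterns and paths to match your own logs.]

```shell
# check_stage: report whether a given boot artifact shows up in a log stream.
# Stage 1 is xnba.kpxe fetched over TFTP (syslog); stage 2 is the genesis
# kernel/initrd fetched over HTTP (httpd access log).
check_stage() {  # $1 = artifact name, stdin = log lines
    if grep -qi "$1"; then
        echo "OK: $1 was requested"
    else
        echo "MISSING: $1 never requested"
    fi
}

# Canned syslog-style line for illustration (format is an assumption);
# on a real management node you would run:
#   check_stage 'xnba.kpxe'      < /var/log/messages
#   check_stage 'genesis.kernel' < /var/log/httpd/access_log
check_stage 'xnba.kpxe' <<'EOF'
Jan 22 10:00:01 hn in.tftpd[1234]: RRQ from 10.1.0.50 filename xcat/xnba.kpxe
EOF
```

If stage 1 is missing, the problem is in dhcpd/tftp; if stage 1 is present but stage 2 is missing, the xnba nets/ config or the HTTP side is the suspect.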
Re: [xcat-user] Frustrating time with sequential node discovery
Why do you say that you need nbk.x86_64? Is this file listed in the /tftpboot/xcat/xnba/nets/? With the latest xCAT build, it needs /tftpboot/xcat/genesis.kernel.x86_64 instead of nbk.*

Thanks

Best Regards
--
Wang Xiaopeng (王晓朋)

From: Jonathan Mills jonmi...@renci.org
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 2014/01/22 14:10
Subject: Re: [xcat-user] Frustrating time with sequential node discovery

It would seem to me that what I am missing is the whole of the xCAT-nbroot infrastructure... [...]
Re: [xcat-user] Frustrating time with sequential node discovery
xCAT is using genesis (an xCAT-customized PXE-booted tool) to handle the discovery process. The configuration for genesis is put in /tftpboot/xcat/xnba/nets/ for a specific network. Could you check that the xnba configuration file for your deployment network has been put in /tftpboot/xcat/xnba/nets/? The prerequisite for booting genesis is that the node can get a dynamic IP address. Did you configure the dynamic IP range for your deployment network? Could you take a look at your syslog to see whether the node sent out a dhcp request, and what your dhcp server replied?

Thanks

Best Regards
--
Wang Xiaopeng (王晓朋)

From: Jonathan Mills jonmi...@renci.org
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 2014/01/19 06:24
Subject: [xcat-user] Frustrating time with sequential node discovery

I'm running xCAT 2.8.3 and CentOS 6.4 atop Cisco UCS-C hardware. I'm attempting to do a sequential node discovery. I've pre-populated the nodelist table with the nodenames, so I shouldn't need to do anything more than:

nodediscoverstart noderange=node[1-15]

However, none of the nodes ever gets discovered. Digging deeper, it seems that none of them ever successfully PXE boot at all. They should be PXE booting off of the genesis netboot image and speaking back to the xcatmaster, correct? When I run 'mknb x86_64', it populates /tftpboot/pxelinux.cfg with entries pointing to non-existent netboot images.
Watch:

[root@ncsu-hn ~]# rpm -qf /opt/xcat/sbin/mknb
xCAT-client-2.8.3-snap201311122316.noarch
[root@ncsu-hn ~]# mknb x86_64
Creating genesis.fs.x86_64.lzma in /tftpboot/xcat
[root@ncsu-hn ~]# cd /tftpboot/pxelinux.cfg/
[root@ncsu-hn pxelinux.cfg]# ls
0A6400  0A6500  0A6600  7F  98300D  98300DE6  98300DE7  C0A86B
[root@ncsu-hn pxelinux.cfg]# cat *
DEFAULT xCAT
LABEL xCAT
KERNEL xcat/nbk.x86_64
APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.100.0.1:3001
DEFAULT xCAT
LABEL xCAT
KERNEL xcat/nbk.x86_64
APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.101.0.1:3001
DEFAULT xCAT
LABEL xCAT
KERNEL xcat/nbk.x86_64
APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.102.0.1:3001
DEFAULT xCAT
LABEL xCAT
KERNEL xcat/nbk.x86_64
APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=127.0.0.1:3001
DEFAULT xCAT
LABEL xCAT
KERNEL xcat/nbk.x86_64
APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=152.48.13.3:3001
DEFAULT xCAT
LABEL xCAT
KERNEL xcat/nbk.x86_64
APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=152.48.13.230:3001
DEFAULT xCAT
LABEL xCAT
KERNEL xcat/nbk.x86_64
APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=152.48.13.231:3001
DEFAULT xCAT
LABEL xCAT
KERNEL xcat/nbk.x86_64
APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=192.168.107.10:3001
[root@ncsu-hn pxelinux.cfg]# cd ../xcat/
[root@ncsu-hn xcat]# ls -la
total 21528
drwxr-xr-x  4 root root     4096 Jan 17 13:06 .
drwxr-xr-x. 7 root root     4096 Jan 18 22:02 ..
-rwxr-xr-x  1 root root   242929 Jan 15  2012 elilo-x64.efi
-rw-r--r--  1 root root 17573621 Jan 18 22:03 genesis.fs.x86_64.lzma
-rwxr-xr-x  1 root root  3986608 Aug  9 06:29 genesis.kernel.x86_64
drwxr-xr-x  3 root root     4096 Jan 17 13:06 osimage
drwxr-xr-x  3 root root     4096 Dec 23 07:42 xnba
-rw-r--r--  1 root root   139200 Oct 28 16:16 xnba.efi
-rw-r--r--  1 root root    74792 Oct 28 16:16 xnba.kpxe

As you can see, it ought to be netbooting the genesis kernel, but instead all my pxelinux.cfg/* files are instructing clients to boot the non-existent nbk.x86_64 image. Your advice is appreciated.
--
Jonathan Mills
Systems Administrator
Renaissance Computing Institute
UNC-Chapel Hill
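[Editorial note: a quick way to confirm the stale state Jonathan describes -- pxelinux.cfg entries still pointing at the removed nbk/nbfs images -- is a small grep helper. The canned entry below mimics the cat output he posted; the loop in the comment shows how it might be run against a real /tftpboot.]

```shell
# stale_nbk: read one pxelinux.cfg entry on stdin and report whether it
# still references the pre-2.8 nbk/nbfs netboot images (which Genesis
# replaced, per the discussion above).
stale_nbk() {
    if grep -Eq 'nbk\.|nbfs\.'; then
        echo "STALE: still references nbk/nbfs"
    else
        echo "OK"
    fi
}

# Canned entry matching the output Jonathan posted; on a real system:
#   for f in /tftpboot/pxelinux.cfg/*; do echo "$f: $(stale_nbk < "$f")"; done
stale_nbk <<'EOF'
DEFAULT xCAT
LABEL xCAT
KERNEL xcat/nbk.x86_64
APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.100.0.1:3001
EOF
```

Files flagged STALE are leftovers from an older install; with the xnba/genesis path in place they can simply be removed or regenerated.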