I no longer clearly remember what gave us the most grief 2-3 years ago, when the first add-on racks were added to the cluster. There were some xCAT update growing pains and some bugs that weren't fixed soon enough for us to move on. The original design and rollout was also done in 2009 by an IBM contractor, so we needed to learn what he had done. Some of this was a moving-target problem: I learned it one way, and by the time I needed to do it again the ground rules had changed, and I'd forgotten a lot in the meantime. We did not originally buy xCAT support, but we have it now. We also did a CentOS 5 -> CentOS 6 upgrade by moving all the xCAT stuff to a new management server, turning off the old DHCP server, and having the diskless nodes reboot onto the new one. I never did get around to copying over the switch table information (tabdump switch). Also, because of partial rack orders, the actual node arrangement got screwy: for example, node201-240 are in the bottom left and right of one iDataPlex rack, and 241-274 are in the upper half. Finally, it comes down to how long the learning curve is for doing it the "right" way, versus knowing exactly how long it takes to do it with asu/rinv and tabedit mac.
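The asu/rinv plus MAC-table route described above boils down to two steps: read each node's first MAC from its service processor, then write it into the mac table. The following is a minimal sketch that does this in one pass over a node range instead of node by node. It assumes rinv prints lines of the form "node201: MAC Address 1: <mac>", which is the format the grep/cut loop quoted later in this thread relies on. Like that loop, it uses chtab. The function name rinv_to_chtab and the eth0 default are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: turn the output of "rinv <noderange> mac" into chtab commands.
# ASSUMPTION: the first MAC of each node appears on a line like
#   node201: MAC Address 1: 00:1a:64:aa:bb:cc
# (the format the grep/cut loop quoted in this thread relies on).
# The function only prints commands; nothing is written to the xCAT tables.

rinv_to_chtab() {
    local iface="${1:-eth0}"   # hypothetical default install interface
    awk -v iface="$iface" '
        /MAC Address 1:/ {
            node = $1
            sub(/:$/, "", node)          # "node201:" -> "node201"
            # The MAC is the last whitespace-separated field on the line.
            printf "chtab node=%s mac.mac=%s mac.interface=%s\n", node, $NF, iface
        }
    '
}
```

Example usage: "rinv node201-node274 mac | rinv_to_chtab eth0 > setmacs.sh". Review setmacs.sh, then run it with sh. Printing the commands instead of executing them leaves room to catch odd nodes, such as the screwy rack layouts mentioned above, before anything lands in the mac table.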
On Jan 22, 2014, at 12:10 PM, Jonathan Mills <jonmi...@renci.org> wrote:

> xCAT-cisco works to an extent. It is fabulous for fetching MAC addresses via UCS Manager. However, for me at least, the rsetboot command fails flat out. But worst of all, rpower commands do not "shoot the node in the head" the way an IPMI command does. Instead, UCS Manager tries to be cute and gracefully shut down the OS. Getting the expected result of an rpower command via xCAT-cisco often means waiting 60 seconds, and occasionally it never works at all.
>
> That's why I found the idea attractive: a traditional xCAT setup that uses IPMI to control the UCS nodes, and node discovery to pull in their MAC addresses.
>
> Using IPMI with UCS hardware means that commands like 'rinv' don't work, just as with SuperMicro gear. Some aspects of the hardware aren't exposed through IPMI registers.
>
> On 01/22/2014 11:03 AM, Jarrod B Johnson wrote:
>> Sorry I haven't been following the thread; I'll hit a few points for the list in general.
>>
>> For rinv MACs, sadly that's not part of the standards, so we can only implement it one vendor at a time. That's why rinv mac works for some hardware but not for others.
>>
>> For the questions about UCS, I assume https://github.com/vallard/xCAT-cisco was looked at. I'm not personally familiar with their scheme, but for other blade-oriented solutions we have used the chassis managers as an alternative topology cue to the switch.
>>
>> For scraping dhcpd.leases, that should be a doable script to include. There are cases where a more thorough investigation than can be achieved that way is warranted, but it's better than a non-starter for the cases where the current discovery doesn't work. We strive to include modern network drivers, and perhaps we should be more aggressive about that.
>>
>> One thing I've been hoping to do is implement a proxydhcp server.
>> That could glean many of the pertinent details for common configuration cases and provide a non-ambiguous set of candidates for automatic (sequential, switch-based, or chassis-based) or semi-automatic (a scriptable set of candidates to do whatever you like with) discovery. One challenge we've had with DHCP lease scraping is the ambiguity over whether something is a node or a piece of other equipment.
>>
>> I need to see about extending lsslp --flexdiscover to cover the rackmount case for service-processor-based reconfiguration. The good thing about that scheme is that duplicate IPs are fine and get fixed automatically, as long as the IMMs are on the same subnet as a management node.
>>
>> I am interested in any issues with switch-based discovery that would cause people to give up on it. Sequential or semi-automatic discovery is OK for smallish setups, but scaling it up creates a lot of ambiguity to trudge through.
>>
>> From: David D Johnson <david_john...@brown.edu>
>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>> Date: 01/22/2014 08:53 AM
>> Subject: Re: [xcat-user] Frustrating time with sequential node discovery
>>
>> On Jan 22, 2014, at 8:30 AM, Jonathan Mills <jonmi...@renci.org> wrote:
>>
>>> Comments inline...
>>>
>>> On 1/22/14, 8:08 AM, David D Johnson wrote:
>>>> I've been lurking on this discussion, and just checked to see what we've got (nbroot or genesis). We have both of them.
>>>>
>>>> I had given up on node discovery years ago. We originally used the switch port numbers and forwarding tables to assign node names.
>>>> Now I use ASU to collect the MACs, and populate the mac table with a bit of grep and awk. But last week we powered on a rack of 20 non-IBM nodes, and I was wishing we had something easier, since ASU didn't work for them.
>>>
>>> Precisely! That's what I've been doing for IBM and Dell gear for a long time. In fact, here's what I do:
>>>
>>>     for i in $(seq 1 100)
>>>     do
>>>         # Field 5 of "nodeN: MAC Address 1: <mac>" is the MAC itself
>>>         MAC=$(rinv "node${i}" mac | grep 'MAC Address 1' | cut -d " " -f 5)
>>>         chtab node="node${i}" mac.mac="$MAC" mac.interface="eth0"
>>>     done
>>>
>>> ....or similar.
>>
>> Nice. "rinv mac" seems to take much less time than "asu show --group PXE", and it can run in parallel on a node range. Unfortunately, neither works for these (Supermicro) nodes.
>>
>>>>
>>>> So I have two questions:
>>>> 1) Can I safely delete the xCAT-nbroot-core* RPMs?
>>>
>>> I still don't know! If chain-loading is used, I don't see how the first stage gets installed by the xCAT-genesis-* RPMs.
>>>
>>>> 2) What is the current best-practice method? What about for non-IBM hardware?
>>>
>>> This is also what I'm trying to establish. I had been attempting to use sequential node discovery for Cisco UCS B-series equipment, since the switch method would be hard to use (UCS's Fabric Interconnect is kind of like a switch, but not totally).
>>>
>>> I am also familiar with the ROCKS Clusters method of sequential node discovery (insert-ethers), which literally scrapes MAC addresses out of the dhcpd lines that appear in /var/log/messages. IMHO that would have worked far better for xCAT than PXE booting (or chain-loading) the Genesis kernel to send messages back to xcatmaster:3001, since that kernel may or may not have modules for your NIC hardware.
>>
>> Our clusters from 2006-2009 were all based on Rocks, and that was the one feature I miss the most. This way would work for any hardware type.
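The insert-ethers approach described above can be approximated with a short sketch. It scans dhcpd's syslog output for DHCPDISCOVER lines and assigns sequential names in the order that MACs first appear. It assumes ISC dhcpd's "DHCPDISCOVER from <mac> ... via <iface>" log line format, and that nodes are powered on one at a time in the order you want them numbered. The helper discover_macs and its "node" prefix and starting number are hypothetical. As Jarrod points out, any other equipment that DHCPs on that network will also show up, so review the list before using it.

```shell
#!/usr/bin/env bash
# Sketch of insert-ethers-style sequential discovery from dhcpd syslog lines.
# ASSUMPTIONS: ISC dhcpd logs lines such as
#   Jan 22 08:30:01 mgt dhcpd: DHCPDISCOVER from 00:1a:64:aa:bb:cc via eth1: network 10.1.0.0/16: no free leases
# and nodes were powered on one at a time, in the order they should be numbered.

# Usage: discover_macs PREFIX START < logfile
# Prints "<nodename> <mac>" for each MAC, in order of its first DHCPDISCOVER.
discover_macs() {
    local prefix="${1:-node}" start="${2:-1}"
    awk -v prefix="$prefix" -v n="$start" '
        {
            for (i = 1; i + 2 <= NF; i++) {
                if ($i == "DHCPDISCOVER" && $(i + 1) == "from") {
                    mac = tolower($(i + 2))
                    # Skip MACs that already got a name (nodes retry DISCOVER).
                    if (!(mac in seen)) {
                        seen[mac] = 1
                        printf "%s%d %s\n", prefix, n++, mac
                    }
                    break
                }
            }
        }
    '
}
```

Example usage: "grep dhcpd /var/log/messages | discover_macs node 201". After a sanity check, the resulting name/MAC pairs can be fed to chtab, just as in the rinv loop above. This works for any hardware that PXE boots, because it relies only on the DHCP request and not on a discovery kernel having the right NIC driver.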
>> The Rocks web GUI database was OK for the small clusters, but they moved to the "rocks" CLI for making changes just about the time we went to xCAT 2. We in fact used xCAT 1 for all the hardware management: rcons, rpower, etc. I miss the ability to hack on the Python scripts.
>>
>>>>
>>>> No, three questions:
>>>> 3) How do you get IBM manufacturing to use a specific, different 172.29.X subnet for each rack they build for you? We've had three racks arrive in different months, all with 172.29.101 addresses for the IMMs, and I have to spend 5-10 minutes reprogramming each one. I can't put them on the same network until the conflicts are gone.
>>>
>>> For a price, IBM has an integration center whose technicians can make such things happen. For another price, they even offer a kind of DMZ they call "the yellowzone", where you can SSH into their lab and pre-configure your gear before it ships. But it isn't worth the effort unless you're going to be buying a lot of things with some frequency.
>>
>> We get the racks prebuilt and shipped from Hong Kong, and they do program the IMM addresses, but they never ask us which rack number to use; it's always A1. If they're going to do it at all, they should do it right.
>>
>>>> What's hard for me is that we get new nodes only a couple of times a year, and I forget everything in between.
>>>>
>>>> Thanks,
>>>> -- ddj
>>>>
>>>> On Jan 22, 2014, at 7:12 AM, Lissa Valletta <lis...@us.ibm.com> wrote:
>>>>
>>>>> xCAT-nbroot-core* was replaced by xCAT-genesis-* in xCAT 2.8.
>>>>>
>>>>> Lissa K. Valletta
>>>>> 8-3/B10
>>>>> Poughkeepsie, NY 12601
>>>>> (tie 293) 433-3102
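Lissa's note that xCAT-nbroot-core* was replaced by xCAT-genesis-* in xCAT 2.8 suggests a quick way to tell which generation a management node is actually serving: look at which kernel the per-network xNBA configs point to. The following is a minimal sketch. It assumes the files under /tftpboot/xcat/xnba/nets/ name the kernel file directly, as genesis.kernel.<arch> for genesis or nbk.<arch> for the older nbroot. The helper which_discovery_kernel is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which discovery kernel each xNBA per-network config references.
# ASSUMPTION: files under /tftpboot/xcat/xnba/nets/ mention the kernel by file name
# (genesis.kernel.<arch> for xCAT 2.8+ genesis, nbk.<arch> for the older nbroot).

which_discovery_kernel() {
    local dir="${1:-/tftpboot/xcat/xnba/nets}"
    local f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip an unmatched glob or subdirectories
        if grep -q 'genesis\.kernel' "$f"; then
            echo "$(basename "$f"): genesis"
        elif grep -q 'nbk\.' "$f"; then
            echo "$(basename "$f"): nbroot (nbk)"
        else
            echo "$(basename "$f"): unknown"
        fi
    done
}
```

Example usage: "which_discovery_kernel /tftpboot/xcat/xnba/nets". If a config still references nbk.*, then "rpm -qf" on the kernel file it names shows which package, if any, installed that file.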
>>>>>
>>>>> From: Xiao Peng Wang <w...@cn.ibm.com>
>>>>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>>>>> Cc: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>>>>> Date: 01/22/2014 02:58 AM
>>>>> Subject: Re: [xcat-user] Frustrating time with sequential node discovery
>>>>>
>>>>> Why do you say that you need nbk.x86_64? Is this file listed in /tftpboot/xcat/xnba/nets/?
>>>>>
>>>>> With the latest xCAT build, it needs /tftpboot/xcat/genesis.kernel.x86_64 instead of nbk.*
>>>>>
>>>>> Thanks
>>>>> Best Regards
>>>>> ----------------------------------------------------------------------
>>>>> Wang Xiaopeng (王晓朋)
>>>>> IBM China System Technology Laboratory
>>>>> Tel: 86-10-82453455
>>>>> Email: w...@cn.ibm.com
>>>>> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193
>>>>>
>>>>> From: Jonathan Mills <jonmi...@renci.org>
>>>>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>>>>> Date: 2014/01/22 14:10
>>>>> Subject: Re: [xcat-user] Frustrating time with sequential node discovery
>>>>>
>>>>> It would seem to me that what I am missing is the whole of the xCAT-nbroot infrastructure, because it isn't part of xcat-core or xcat-dep. So I didn't grab it. But it just so happens... you need it.
>>>>>
>>>>> The file /tftpboot/xcat/nbk.x86_64 is provided by the RPM xCAT-nbkernel-x86_64, which is missing from my yum repo mirror and from my hosts.
>>>>>
>>>>> Anything else I'm missing? Hopefully, if I grab correct copies of xCAT-nbkernel and xCAT-nbroot (or xCAT-nbroot2?), node discovery will actually work.
>>>>>
>>>>> On 1/22/14, 12:08 AM, Xiao Peng Wang wrote:
>>>>>> Both Josh and Russell are correct.
>>>>>>
>>>>>> xNBA is a customized PXE, and genesis is an xCAT-customized diskless Linux system that runs discovery and other tasks like 'bmcsetup'. It does not need the files in /tftpboot/pxelinux.cfg/ to load genesis.
>>>>>>
>>>>>> For discovery, if a node is not defined in xCAT, the DHCP configuration in /etc/dhcp/dhcpd.conf or /etc/dhcpd.conf is used to reply to the DHCP request from the undiscovered node.
>>>>>>
>>>>>> Your dhcpd.conf should have the following section for your deployment network. If it doesn't, run 'makedhcp -n' to recreate your dhcpd.conf.
>>>>>>
>>>>>>     if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
>>>>>>         always-broadcast on;
>>>>>>         filename = "http://10.1.0.207/tftpboot/xcat/xnba/nets/10.1.0.0_16";
>>>>>>     } else if option user-class-identifier = "xNBA" and option client-architecture = 00:09 { #x86, xCAT Network Boot Agent
>>>>>>         filename = "http://10.1.0.207/tftpboot/xcat/xnba/nets/10.1.0.0_16.uefi";
>>>>>>     } else if option client-architecture = 00:00 { #x86
>>>>>>         filename "xcat/xnba.kpxe";
>>>>>>     } else if option vendor-class-identifier = "Etherboot-5.4" { #x86
>>>>>>         filename "xcat/xnba.kpxe";
>>>>>>     } else if option client-architecture = 00:07 { #x86_64 uefi
>>>>>>         filename "xcat/xnba.efi";
>>>>>>     } else if option client-architecture = 00:09 { #x86_64 uefi alternative id
>>>>>>         filename "xcat/xnba.efi";
>>>>>>     }
>>>>>>
>>>>>> During the boot process of an undiscovered node, dhcpd tells the node to load xcat/xnba.kpxe first, and then the configuration file http://<xcat mn>/tftpboot/xcat/xnba/nets/10.1.0.0_16. Then xNBA loads genesis.
>>>>>>
>>>>>> Take a look at the syslog to see whether xnba was downloaded successfully from the TFTP server, and look in the httpd log to see whether genesis was downloaded successfully.
>>>>>>
>>>>>> Thanks
>>>>>> Best Regards
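Wang's two log checks (TFTP in the syslog, genesis in the httpd log) can be combined into one sketch. It follows a single node through the chain xnba.kpxe -> nets config -> genesis. The sketch makes two assumptions. First, the TFTP server writes the client IP and the requested file name to syslog; many do when verbose logging is on, but the pattern may need adjusting. Second, httpd uses the Apache common or combined log format. The helper check_boot_chain and its arguments are hypothetical, not xCAT commands.

```shell
#!/usr/bin/env bash
# Sketch: follow one node through the xNBA -> genesis boot chain using the logs.
# ASSUMPTIONS: the TFTP server writes the client IP and the requested file name
# to syslog, and httpd writes Apache common/combined format access logs.

check_boot_chain() {
    local ip="$1" syslog="$2" access_log="$3"
    local tftp=no conf=no genesis=no

    # Stage 1: xnba.kpxe fetched over TFTP (a syslog line naming both IP and file).
    # -w keeps 10.1.0.5 from also matching 10.1.0.50.
    if grep -wF "$ip" "$syslog" | grep -q 'xnba\.kpxe'; then
        tftp=yes
    fi

    # Stages 2 and 3: HTTP GETs that returned status 200.
    # Common log format fields: $1 client IP, $7 request path, $9 status code.
    if awk -v ip="$ip" '$1 == ip && $7 ~ /\/xnba\/nets\// && $9 == 200 { f = 1 } END { exit !f }' "$access_log"; then
        conf=yes
    fi
    if awk -v ip="$ip" '$1 == ip && $7 ~ /genesis/ && $9 == 200 { f = 1 } END { exit !f }' "$access_log"; then
        genesis=yes
    fi

    echo "xnba.kpxe via tftp: $tftp"
    echo "xnba nets config via http: $conf"
    echo "genesis via http: $genesis"
}
```

Example usage: "check_boot_chain 10.1.0.5 /var/log/messages /var/log/httpd/access_log". Here 10.1.0.5 stands for whatever dynamic address dhcpd handed the undiscovered node. The first stage that reports "no" is where the chain broke.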
Well, I hope the thread I stJosh Nielsen >>>>>> ---2014/01/22 05:56:00---Ah, I see what you are saying now. Well, >> I hope >>>>>> the thread I stumbled on that Jarrod replied to help >>>>>> >>>>>> From: Josh Nielsen <jniel...@hudsonalpha.org >>>>> <mailto:jniel...@hudsonalpha.org>> >>>>>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net >>>>> <mailto:xcat-user@lists.sourceforge.net>>, >>>>>> Date: 2014/01/22 05:56 >>>>>> Subject: Re: [xcat-user] Frustrating time with sequential node >> discovery >>>>>> >>>>>> >> ------------------------------------------------------------------------ >>>>>> >>>>>> >>>>>> >>>>>> Ah, I see what you are saying now. Well, I hope the thread I stumbled >>>>>> on that Jarrod replied to helps figure out why his configuration is >>>>>> looking to the outdated (according to what Jarrod said) configuration >>>>>> files in /tftpboot/pxelinux.cfg/. Looks like it is either >>>>>> /etc/dhcpd.conf or /var/lib/dhcpd/dhcpd.leases related in that case. >>>>>> >>>>>> On Tue, Jan 21, 2014 at 3:51 PM, Russell Jones >>>>>> <russell-l...@jonesmail.me <mailto:russell-l...@jonesmail.me>> wrote: >>>>>>> It *should* work with xNBA and Genesis - xNBA is the PXE image that >>>>>>> loads Genesis. :-) >>>>>>> >>>>>>> Genesis is the utility image that handles shell commands, >>>>> runimages, etc. >>>>>>> >>>>>>> Don't confuse NBFS with xNBA - NBFS is deprecated via Genesis. >>>>> xNBA is >>>>>>> the gpxe image that loads Genesis or your normal OS image >>>>> depending on >>>>>>> what you sent via nodeset. Genesis would not be able to load without >>>>>>> xNBA (or standard PXE), and neither would any netboot images. >>>>>>> >>>>>>> On 1/21/2014 3:33 PM, Josh Nielsen wrote: >>>>>>>> my case it still works with >>>>>>>> both xnba and genesis because of the nature of PXE chainloading. It >>>>>>>> probably adds deployment time, but it actually works in such a mixed >>>>>>>> configuration. 
>>>>>>>>
>>>>>>>> -Josh
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> CenturyLink Cloud: The Leader in Enterprise Cloud Services.
>>>>>>> Learn Why More Businesses Are Choosing CenturyLink Cloud For
>>>>>>> Critical Workloads, Development Environments & Everything In Between.
>>>>>>> Get a Quote or Start a Free Trial Today.
>>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
>>>>>>> _______________________________________________
>>>>>>> xCAT-user mailing list
>>>>>>> xCAT-user@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/xcat-user
>>>>>
>>>>> --
>>>>> Jonathan Mills
>>>>> Systems Administrator
>>>>> Renaissance Computing Institute
>>>>> UNC-Chapel Hill