I no longer clearly remember what gave us the most grief 2-3 years ago, when
the first add-on racks were added to the cluster.  There were some xcat
update growing pains and some bugs that weren't fixed soon enough for us to
move on, and the original design/rollout was done in 2009 by an IBM
contractor, so we needed to learn what he had done.  Some of this was a
moving-target problem -- I learned it one way, and by the time I needed to do
it again the ground rules had changed, and I'd forgotten a lot of stuff in
the meantime.  We did not originally buy xcat support, but we have it now.
We also did a centos5->centos6 upgrade by moving all the xcat stuff to a new
mgt server, turning off the old dhcp server, and having the diskless nodes
reboot on the new one.  I never did get around to copying the tabdump switch
information.  Also, because of partial rack orders, the actual node
arrangements got screwy, like node201-240 on the bottom left and right of one
idataplex rack, and 241-274 in the upper half.
Finally, it's a matter of how long the learning curve is to do it the "right"
way vs. knowing exactly how long it takes to do it with asu/rinv and tabedit
mac.
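
For the rinv route, here is a rough sketch of handling a whole node range in
one pass instead of node by node.  It assumes rinv prints lines shaped like
"node12: MAC Address 1: <mac>", which is the shape the grep/cut loop quoted
further down relies on.  rinv_to_chtab is a made-up helper name, and eth0 is
assumed to be the install interface.  It only prints chtab commands, so
nothing in the mac table changes until the output has been reviewed.

```shell
# Read "rinv <noderange> mac"-style lines on stdin and print one chtab
# command per node. Nothing is executed here; this is a dry run.
rinv_to_chtab() {
  awk '
    # Assumed line shape: "node12: MAC Address 1: 00:1a:64:aa:bb:cc"
    $2 == "MAC" && $3 == "Address" && $4 == "1:" {
      node = $1
      sub(/:$/, "", node)   # strip the trailing colon from the node name
      printf "chtab node=%s mac.mac=%s mac.interface=eth0\n", node, $5
    }
  '
}
```

Something like "rinv node201-node274 mac | rinv_to_chtab" would print the
commands for review; pipe them to sh once they look right.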

On Jan 22, 2014, at 12:10 PM, Jonathan Mills <jonmi...@renci.org> wrote:

> xCAT-cisco works to an extent.  It is fabulous for fetching MAC 
> addresses via UCS Manager.  However, for me at least, the rsetboot 
> command fails flat out.  But worst of all is that rpower commands do not 
> "shoot the node in the head" like an IPMI command.  Instead, UCS Manager 
> tries to be cute and gracefully shut down the OS.  To get the expected 
> result of an rpower command via xCAT-cisco often means waiting 60 
> seconds, and occasionally it never works at all.
> 
> That's why I found it attractive, the idea of using a traditional xCAT 
> setup using IPMI to control UCS nodes, and node discovery to pull in 
> their MAC addresses.
> 
> Using IPMI with UCS hardware means that commands like 'rinv' don't work, 
> like with SuperMicro gear.  Some aspects of the hardware aren't exposed 
> through IPMI registers.
> 
> On 01/22/2014 11:03 AM, Jarrod B Johnson wrote:
>> Sorry I haven't been following the thread and will hit a few points to
>> the list in general.
>> 
>> For rinv macs, sadly that's not part of standards, so we can only pull
>> it off one vendor at a time, hence why rinv mac works for some, but not
>> others.
>> 
>> For the questions about UCS, I assume
>> https://github.com/vallard/xCAT-cisco was looked at.  I'm not
>> personally familiar with their scheme, but for other blade-oriented
>> solutions, we have used the chassis managers as a topology cue
>> alternative to switch.
>> 
>> For scraping dhcpd.leases, that should be a doable script to include.
>>  There are cases where a more thorough investigation than can be
>> achieved in that manner is warranted, but it's better than a non-starter
>> for cases where discovery doesn't work.  We strive to include modern
>> network drivers and perhaps we should be more aggressive about that.
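
A minimal sketch of what such a lease-scraping script might look like,
assuming the usual ISC dhcpd layout ("lease <ip> {" followed by a
"hardware ethernet <mac>;" line).  leases_to_macs is a hypothetical helper
name, not something xCAT ships.  As noted further down, it cannot tell a
compute node from any other piece of equipment that asked for an address.

```shell
# Print "MAC IP" pairs from an ISC dhcpd leases file, one line per MAC.
# A later lease for the same MAC overrides an earlier one.
leases_to_macs() {
  awk '
    $1 == "lease" && $3 == "{" { ip = $2 }
    $1 == "hardware" && $2 == "ethernet" {
      mac = $3
      sub(/;$/, "", mac)    # drop the trailing semicolon
      latest[mac] = ip
    }
    END { for (mac in latest) print mac, latest[mac] }
  ' "$1" | sort
}
```

Run it as "leases_to_macs /var/lib/dhcpd/dhcpd.leases" and feed the result
into whatever naming scheme fits the rack.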
>> 
>> One thing I've been hoping to do is implement a proxydhcp server.  That
>> could glean much of the pertinent details for common configuration cases
>> and provide a nonambiguous set of candidates for automatic (sequential,
>> switch, chassis based) or semi-automatic (scriptable set of candidates
>> to do whatever with) discovery (one challenge we've had with dhcp lease
>> scraping is ambiguity of whether something is a node or piece of other
>> equipment).
>> 
>> I need to see about extending lsslp --flexdiscover to cover rackmount
>> case for service processor based reconfiguration.  The good thing about
>> that scheme is that duplicate IPs are fine and get fixed automatically
>> so long as the IMMs are on the same subnet as a management node.
>> 
>> I am interested in issues with switch based discovery that would cause
>> it to be given up on.  Sequential or semi-automatic discovery is ok for
>> smallish setups, but scaling it up causes a lot of ambiguity to trudge
>> through.
>> 
>> 
>> From: David D Johnson <david_john...@brown.edu>
>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>> Date: 01/22/2014 08:53 AM
>> Subject: Re: [xcat-user] Frustrating time with sequential node discovery
>> 
>> ------------------------------------------------------------------------
>> 
>> 
>> 
>> 
>> On Jan 22, 2014, at 8:30 AM, Jonathan Mills <jonmi...@renci.org> wrote:
>> 
>>> Comments inline...
>>> 
>>> On 1/22/14, 8:08 AM, David D Johnson wrote:
>>>> I've been lurking on this discussion, and just checked to see what we've
>>>> got -- nbroot or genesis -- and we have both of them.
>>>> 
>>>> I had given up on node discovery years ago, we originally used the
>>>> switch port numbers and forwarding tables to assign node names.  Now I
>>>> use ASU to collect the macs, and populate the mac table with a bit of
>>>> grep and awk.  But last week we powered on a rack of 20 non-ibm nodes,
>>>> and I was wishing we had something easier since ASU didn't work for
>>>> them.
>>> 
>>> Precisely!  That's what I've been doing for IBM and Dell gear for a long
>>> time.  In fact, here's what I do:
>>> 
>>> for i in $(seq 1 100)
>>> do
>>>  MAC=$(rinv "node${i}" mac | grep 'MAC Address 1' | cut -d " " -f 5)
>>>  # skip nodes that did not report a MAC
>>>  [ -n "$MAC" ] && chtab node="node${i}" mac.mac="$MAC" mac.interface="eth0"
>>> done
>>> 
>>> ....or similar.
>>> 
>> 
>> Nice, "rinv mac" seems to take much less time than "asu show --group
>> PXE", and it can run in parallel on a node range. Unfortunately neither
>> works for these (Supermicro) nodes.
>> 
>>>> 
>>>> So I have two questions --
>>>> 1) Can I safely delete the xCAT-nbroot-core* RPMS ?
>>> 
>>> I still don't know!  Because if using the chain-loading, I don't see how
>>> the first stage is installed by the xCAT-genesis-* RPMS.
>>> 
>>>> 2) What is the current best practice method?  What about for non-ibm
>>>> hardware?
>>> 
>>> This is also what I'm trying to establish.  I had been attempting to use
>>> sequential node discovery for Cisco UCS-B series equipment, since the
>>> switch method would be hard to use (since UCS's Fabric Interconnect is
>>> kinda like a switch but not totally).
>>> 
>>> I am also familiar with ROCKS Clusters method of sequential node
>>> discovery (insert-ethers), which literally scrapes MAC addresses out of
>>> the dhcpd lines which appear in /var/log/messages -- IMHO that would
>>> have worked far better for xCAT than by the method of PXE booting (or
>>> chain-loading) the Genesis kernel -- which may or may not have kernel
>>> modules for your NIC hardware -- to send messages back to
>>> xcatmaster:3001.
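
A sketch of that log-scraping idea, under the assumption that dhcpd logs
lines of the form "dhcpd: DHCPDISCOVER from <mac> via <interface>" to syslog.
This is not how insert-ethers itself is implemented, and discover_macs is an
invented name.

```shell
# Print each MAC that sent a DHCPDISCOVER, in order of first appearance.
discover_macs() {
  awk '
    /DHCPDISCOVER from/ {
      for (i = 1; i + 2 <= NF; i++) {
        if ($i == "DHCPDISCOVER" && $(i + 1) == "from") {
          mac = tolower($(i + 2))
          if (!(mac in seen)) { seen[mac] = 1; print mac }
        }
      }
    }
  ' "$1"
}
```

With nodes powered on one at a time, the order of "discover_macs
/var/log/messages" would match the order of the rack.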
>>> 
>> 
>> Our clusters from 2006-2009 were all based on Rocks, and that was the
>> one feature I miss the most.
>> This way would work for any hardware type.
>> The Rocks web-GUI database was OK for the small clusters, but they moved
>> to the "rocks" cli for making changes just about the time we went to
>> xcat-2.  We in fact used xcat-1 for all the hardware management, rcons,
>> rpower, etc.
>> I miss the ability to hack on the python scripts.
>> 
>>>> 
>>>> No, three questions
>>>> 3) How do you get IBM manufacturing to use a specific different 172.29.X
>>>> for each rack they build for you?
>>>> We've had three racks arrive in different months all with 172.29.101
>>>> addresses for the IMM, and I have to spend 5-10 minutes reprogramming
>>>> each one.  I can't put them on the same network until the conflicts are
>>>> gone.
>>> 
>>> For a price, IBM has an integration center with technicians who can make
>>> such things happen.  For another price, they even offer a kind of DMZ
>>> they call "the yellowzone" where you can SSH into their lab and
>>> pre-configure your gear before it ships.  But it isn't worth the effort
>>> unless you're going to be buying a lot of things with some frequency.
>>> 
>> 
>> We get the racks prebuilt and shipped from Hong Kong, and they do
>> program the IMM addresses, but they never ask us which rack number to
>> use, it's always A1.  If they're going to do it at all, they should do
>> it right.
>> 
>>>> What's hard for me is that we get new nodes only a couple times a year,
>>>> and I forget everything in between.
>>>> 
>>>> Thanks,
>>>> -- ddj
>>>> 
>>>> On Jan 22, 2014, at 7:12 AM, Lissa Valletta <lis...@us.ibm.com
>>>> <mailto:lis...@us.ibm.com>> wrote:
>>>> 
>>>>> xCAT-nbroot-core*   was replaced by  xCAT-genesis-* in xCAT 2.8.
>>>>> 
>>>>> Lissa K. Valletta
>>>>> 8-3/B10
>>>>> Poughkeepsie, NY 12601
>>>>> (tie 293) 433-3102
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> From: Xiao Peng Wang <w...@cn.ibm.com <mailto:w...@cn.ibm.com>>
>>>>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net
>>>>> <mailto:xcat-user@lists.sourceforge.net>>,
>>>>> Cc: xCAT Users Mailing list <xcat-user@lists.sourceforge.net
>>>>> <mailto:xcat-user@lists.sourceforge.net>>
>>>>> Date: 01/22/2014 02:58 AM
>>>>> Subject: Re: [xcat-user] Frustrating time with sequential node discovery
>>>>> 
>>>>> 
>> ------------------------------------------------------------------------
>>>>> 
>>>>> 
>>>>> 
>>>>> Why do you say that you need nbk.x86_64? Is this file listed in the
>>>>> </tftpboot/xcat/xnba/nets/>?
>>>>> 
>>>>> With the latest xCAT build, it needs
>>>>> /tftpboot/xcat/genesis.kernel.x86_64 instead of nbk.*
>>>>> 
>>>>> Thanks
>>>>> Best Regards
>>>>> ----------------------------------------------------------------------
>>>>> Wang Xiaopeng (王晓朋)
>>>>> IBM China System Technology Laboratory
>>>>> Tel: 86-10-82453455
>>>>> Email: w...@cn.ibm.com <mailto:w...@cn.ibm.com>
>>>>> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
>>>>> Haidian District Beijing P.R.China 100193
>>>>> 
>>>>> 
>>>>> From: Jonathan Mills <jonmi...@renci.org <mailto:jonmi...@renci.org>>
>>>>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net
>>>>> <mailto:xcat-user@lists.sourceforge.net>>,
>>>>> Date: 2014/01/22 14:10
>>>>> Subject: Re: [xcat-user] Frustrating time with sequential node discovery
>>>>> 
>> ------------------------------------------------------------------------
>>>>> 
>>>>> 
>>>>> 
>>>>> It would seem to me that what I am missing is the whole of the
>>>>> xCAT-nbroot infrastructure...because it isn't part of xcat-core, nor
>>>>> xcat-dep.  So I didn't grab it.  But it just so happens...you need it.
>>>>> 
>>>>> The file
>>>>> 
>>>>> /tftpboot/xcat/nbk.x86_64
>>>>> 
>>>>> is provided by the RPM xCAT-nbkernel-x86_64, which is missing from my
>>>>> yum repo mirror and from my hosts.
>>>>> 
>>>>> 
>>>>> Anything else I'm missing?  Hopefully if I grab correct copies of
>>>>> xCAT-nbkernel and xCAT-nbroot (or xCAT-nbroot2?) then node discovery
>>>>> will actually work.
>>>>> 
>>>>> On 1/22/14, 12:08 AM, Xiao Peng Wang wrote:
>>>>>> Both Josh and Russell are correct.
>>>>>> 
>>>>>> xNBA is a customized pxe, and genesis is an xCAT-customized diskless
>>>>>> linux system to run discovery and other tasks like 'bmcsetup'. It does
>>>>>> not need the /tftpboot/pxelinux.cfg/.* to load the genesis.
>>>>>> 
>>>>>> For discovery, if a node is not defined in xCAT, the dhcp configuration
>>>>>> in /etc/dhcp/dhcpd.conf or /etc/dhcpd.conf is used to reply to the dhcp
>>>>>> request from the not-discovered node.
>>>>>> 
>>>>>> In your dhcpd.conf, it should have the following part for your
>>>>>> deployment network. If not, run 'makedhcp -n' to recreate your
>>>>>> dhcpd.conf.
>>>>>>    if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
>>>>>>       always-broadcast on;
>>>>>>       filename = "http://10.1.0.207/tftpboot/xcat/xnba/nets/10.1.0.0_16";
>>>>>>    } else if option user-class-identifier = "xNBA" and option client-architecture = 00:09 { #x86, xCAT Network Boot Agent
>>>>>>       filename = "http://10.1.0.207/tftpboot/xcat/xnba/nets/10.1.0.0_16.uefi";
>>>>>>    } else if option client-architecture = 00:00  { #x86
>>>>>>      filename "xcat/xnba.kpxe";
>>>>>>    } else if option vendor-class-identifier = "Etherboot-5.4"  { #x86
>>>>>>      filename "xcat/xnba.kpxe";
>>>>>>    } else if option client-architecture = 00:07 { #x86_64 uefi
>>>>>>       filename "xcat/xnba.efi";
>>>>>>    } else if option client-architecture = 00:09 { #x86_64 uefi alternative id
>>>>>>       filename "xcat/xnba.efi";
>>>>>>    }
>>>>>> 
>>>>>> During the boot process of a not-discovered node, dhcpd will tell the
>>>>>> node to load xcat/xnba.kpxe first and then the configuration file
>>>>>> http://<xcat mn>/tftpboot/xcat/xnba/nets/10.1.0.0_16. Then the xnba
>>>>>> will load the genesis.
>>>>>> 
>>>>>> Take a look at the syslog to see whether the xnba was downloaded
>>>>>> successfully from the tftp server. And look into the httpd log to see
>>>>>> whether the genesis has been downloaded successfully.
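
Those two log checks could be sketched roughly as below.  The log formats are
assumptions: tftp-hpa style "in.tftpd[pid]: RRQ from <ip> filename <file>"
lines in syslog, and an Apache-style access log with quoted "GET <path> ..."
requests.  The function names are made up for illustration.

```shell
# Show TFTP read requests for the xNBA loader found in a syslog file.
xnba_requests() {
  grep -E 'tftpd.*RRQ from .* filename .*xnba' "$1"
}

# Show HTTP GETs for genesis files found in an httpd access log.
genesis_requests() {
  grep -E '"GET [^"]*genesis[^"]*"' "$1"
}
```

On a CentOS-style management node the inputs would typically be
/var/log/messages and /var/log/httpd/access_log (both paths are assumptions
about the local setup).  No output from either function points at the stage
where the boot chain stops.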
>>>>>> 
>>>>>> 
>>>>>> Thanks
>>>>>> Best Regards
>>>>>> ----------------------------------------------------------------------
>>>>>> Wang Xiaopeng (王晓朋)
>>>>>> IBM China System Technology Laboratory
>>>>>> Tel: 86-10-82453455
>>>>>> Email: w...@cn.ibm.com <mailto:w...@cn.ibm.com>
>>>>>> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
>>>>>> Haidian District Beijing P.R.China 100193
>>>>>> 
>>>>>> 
>>>>>> From: Josh Nielsen <jniel...@hudsonalpha.org
>>>>> <mailto:jniel...@hudsonalpha.org>>
>>>>>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net
>>>>> <mailto:xcat-user@lists.sourceforge.net>>,
>>>>>> Date: 2014/01/22 05:56
>>>>>> Subject: Re: [xcat-user] Frustrating time with sequential node discovery
>>>>>> 
>>>>>> 
>> ------------------------------------------------------------------------
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Ah, I see what you are saying now. Well, I hope the thread I stumbled
>>>>>> on that Jarrod replied to helps figure out why his configuration is
>>>>>> looking to the outdated (according to what Jarrod said) configuration
>>>>>> files in /tftpboot/pxelinux.cfg/. Looks like it is either
>>>>>> /etc/dhcpd.conf or /var/lib/dhcpd/dhcpd.leases related in that case.
>>>>>> 
>>>>>> On Tue, Jan 21, 2014 at 3:51 PM, Russell Jones
>>>>>> <russell-l...@jonesmail.me <mailto:russell-l...@jonesmail.me>> wrote:
>>>>>>> It *should* work with xNBA and Genesis - xNBA is the PXE image that
>>>>>>> loads Genesis. :-)
>>>>>>> 
>>>>>>> Genesis is the utility image that handles shell commands, runimages,
>>>>>>> etc.
>>>>>>> 
>>>>>>> Don't confuse NBFS with xNBA - NBFS is deprecated via Genesis. xNBA is
>>>>>>> the gpxe image that loads Genesis or your normal OS image depending on
>>>>>>> what you sent via nodeset. Genesis would not be able to load without
>>>>>>> xNBA (or standard PXE), and neither would any netboot images.
>>>>>>> 
>>>>>>> On 1/21/2014 3:33 PM, Josh Nielsen wrote:
>>>>>>>> my case it still works with
>>>>>>>> both xnba and genesis because of the nature of PXE chainloading. It
>>>>>>>> probably adds deployment time, but it actually works in such a mixed
>>>>>>>> configuration.
>>>>>>>> 
>>>>>>>> -Josh
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>> ------------------------------------------------------------------------------
>>>>>>> CenturyLink Cloud: The Leader in Enterprise Cloud Services.
>>>>>>> Learn Why More Businesses Are Choosing CenturyLink Cloud For
>>>>>>> Critical Workloads, Development Environments & Everything In Between.
>>>>>>> Get a Quote or Start a Free Trial Today.
>>>>>>> 
>>>>>> 
>>>>> 
>> _http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk_
>>>>>>> _______________________________________________
>>>>>>> xCAT-user mailing list
>>>>>>> xCAT-user@lists.sourceforge.net
>>>>> <mailto:xCAT-user@lists.sourceforge.net>
>>>>>>> _https://lists.sourceforge.net/lists/listinfo/xcat-user_
>>>>>> 
>>>>>> 
>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Jonathan Mills
>>>>> Systems Administrator
>>>>> Renaissance Computing Institute
>>>>> UNC-Chapel Hill
>>>>> 
>>>>> 
>>>> 
>>> 
>>> --
>>> Jonathan Mills
>>> Systems Administrator
>>> Renaissance Computing Institute
>>> UNC-Chapel Hill
>>> 
>>> 
>> 
>> 
>> 
> 
> -- 
> Jonathan Mills
> Systems Administrator
> Renaissance Computing Institute
> UNC-Chapel Hill
> 


_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
