Hi Kevin,
 
We have a script, xcatha.py, that can set up a standby xCAT management node.
It is used in one of the xCAT HA solutions. You can use it to set up two inactive xCAT management nodes and then activate one of them. Your scenario is similar to our solution.
You can find the related code and documentation here: https://github.com/xcat2/xcat-extensions/tree/master/HA
I hope it helps.
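
For a rough idea of the flow, a minimal sketch is below. The option names passed to xcatha.py are placeholders of my own; please check the HA README linked above for the real command line.

    # Minimal sketch of the standby-MN flow around xcatha.py (Python 3).
    # NOTE: the option names below are assumptions for illustration only;
    # see the HA README in xcat-extensions for the actual flags.
    import subprocess

    def run(cmd):
        print("+ " + " ".join(cmd))
        subprocess.run(cmd, check=True)

    # On each of the two management nodes: set xCAT up in standby
    # (inactive) mode against shared data and a virtual IP.
    run(["./xcatha.py", "-s",                # hypothetical "setup standby" flag
         "-p", "/xcatdata",                  # hypothetical shared-data path
         "-i", "eth0", "-v", "10.0.0.100"])  # hypothetical NIC / virtual IP

    # Later, on exactly one node: activate it as the running management node.
    run(["./xcatha.py", "-a"])               # hypothetical "activate" flag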
 
 
Best Regards
--------------------------------------------------
Yuan Bai (白媛)

CSTL HPC System Management Development
Tel: 86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus, Ring Building 28,
ZhongGuanCun Software Park, No. 8 Dong Bei Wang West Road, Haidian District,
Beijing, P.R. China 100193

 
 
----- Original message -----
From: Rich Sudlow <r...@nd.edu>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>, David Johnson <david_john...@brown.edu>
Cc:
Subject: Re: [xcat-user] Staging a new management node but keeping it inactive?
Date: Sat, Dec 8, 2018 5:21 AM
 
We do something slightly different: instead of using a DHCP pool, we only allow
DHCP to answer MACs that are known.
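
In xCAT terms that roughly means leaving the network's dynamicrange attribute empty and only running makedhcp for nodes whose MACs are already recorded. A minimal sketch, where "clusternet" and "compute" are placeholder network and group names:

    # Sketch: DHCP answers only known MACs, no dynamic pool (Python 3).
    # "clusternet" and "compute" are placeholder network / group names.
    import subprocess

    def run(cmd):
        subprocess.run(cmd, check=True)

    # Leave the dynamic range empty so dhcpd has no pool to hand out.
    run(["chdef", "-t", "network", "-o", "clusternet", "dynamicrange="])

    # Regenerate dhcpd.conf, then add host entries only for nodes whose
    # MACs are already in the xCAT mac table.
    run(["makedhcp", "-n"])
    run(["makedhcp", "compute"])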


On 12/7/18 1:30 PM, David Johnson wrote:
> Yes, only one can have a dynamic range.  In my case neither of them does,
> since I manually paste the MAC addresses into the mac table.
>
> My issue with deleting first: when I deleted 16 MAC addresses and then got
> sidetracked and went home, those nodes later lost their lease and ended up
> getting evicted from GPFS.  Not an issue if the nodes had been otherwise idle,
> but they had been marked to drain and jobs were still running on them.
>
>   — ddj
>
>> On Dec 7, 2018, at 1:25 PM, Kevin Keane <kke...@sandiego.edu
>> <mailto:kke...@sandiego.edu>> wrote:
>>
>> Thank you, Dave. That is an interesting alternative approach; I might actually
>> consider that.
>>
>> So you are saying that the old and new DHCP servers can run in parallel? I
>> assume that they just can't both have dynamic ranges?
>>
>> I'm not sure I understand what the problem is with deleting a node first, and
>> then adding it on the new system. Even if the lease expires, wouldn't it just
>> reacquire the new one once you create it?
>>
>> _______________________________________________________________________
>> Kevin Keane | Systems Architect | University of San Diego ITS |
>> kke...@sandiego.edu <mailto:kke...@sandiego.edu>
>> Maher Hall, 192 |5998 Alcalá Park | San Diego, CA 92110-2492 | 619.260.6859
>>
>> *REMEMBER! **_No one from IT at USD will ever ask to confirm or supply your
>> password_*.
>> These messages are an attempt to steal your username and password. Please do
>> not reply to, click the links within, or open the attachments of these
>> messages. Delete them!
>>
>>
>>
>>
>> On Thu, Dec 6, 2018 at 4:43 PM <david_john...@brown.edu
>> <mailto:david_john...@brown.edu>> wrote:
>>
>>     We’ve kept parallel clusters on the same network for nearly a year now
>>     while transitioning to RH7 from CentOS 6.
>>     Initially copied the hosts, nodelist, and MAC tables into the new xCAT
>>     database.  Carefully controlled use of makedhcp so that nodes moving to
>>     the new cluster were first added to the new DHCP server and then deleted
>>     from the old. (Didn't want a repeat of what happened when I left some
>>     deleted from the old but not added to the new cluster and they lost
>>     their lease. The postscript hardeths also helped.) Made new images and
>>     used nodeset to point to them. Rebooted and tested.
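
A rough sketch of that per-node sequence, using standard xCAT commands; the management-node hostnames, node name, and osimage name below are placeholders:

    # Sketch: move one node from the old MN's DHCP to the new MN's (Python 3).
    # Order matters: add on the new server before deleting from the old one,
    # so the node never loses its lease.
    import subprocess

    def run(host, cmd):
        # Run an xCAT command on the given management node over ssh.
        subprocess.run(["ssh", host] + cmd, check=True)

    node = "node001"                                    # placeholder node name
    run("new-mn", ["makedhcp", node])                   # add to the new DHCP server
    run("old-mn", ["makedhcp", "-d", node])             # then delete from the old one
    run("new-mn", ["nodeset", node, "osimage=rhels7-compute"])  # placeholder image
    run("new-mn", ["rpower", node, "reset"])            # reboot and test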
>>
>>     Drawback is having to make parallel changes on both management servers all
>>     the time, but we needed both clusters to access gpfs so it was a necessary
>>     evil.
>>
>>       -- ddj
>>     Dave Johnson
>>
>>     On Dec 6, 2018, at 4:23 PM, Kevin Keane <kke...@sandiego.edu
>>     <mailto:kke...@sandiego.edu>> wrote:
>>
>>>     I'm in the middle of upgrading our existing HPC (from RHEL 6 to RHEL 7).
>>>     I'm doing most of my testing on a separate "sandbox" test bed, but now
>>>     I'm close to going live. I'm trying to figure out how to do this with
>>>     minimal disruption.
>>>
>>>     My question: how can I install the new management node and keep it
>>>     *almost* completely operational, without interfering with the existing
>>>     cluster? Is it enough to disable DHCP, or do I need to do anything else?
>>>
>>>     How do I prevent DHCP from accidentally getting enabled before I'm ready?
>>>     Is makedhcp responsible for that?
>>>
>>>     Step-by-step, here is what I plan to do:
>>>
>>>     - Set up the new management node, but keep it inactive.
>>>     - Test
>>>     - Bring down all compute nodes.
>>>     - Via IPMI, reset all the compute nodes' BMC controllers to DHCP
>>>     - Other migration steps (home directories, modifications on the storage
>>>     node, etc.)
>>>     - De-activate the old management node (but keep it running)
>>>     - Activate the new management node.
>>>     - Discover and boot compute nodes
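
For the "reset the BMCs to DHCP" step in the list above, one rough sketch using plain ipmitool; the BMC hostnames and credentials are placeholders:

    # Sketch: switch BMCs to DHCP over IPMI (Python 3).
    # BMC hostnames and credentials are placeholders.
    import subprocess

    BMCS = ["node001-bmc", "node002-bmc"]
    USER, PASSWORD = "ADMIN", "changeme"

    for bmc in BMCS:
        # "lan set 1 ipsrc dhcp" assumes the BMC uses LAN channel 1,
        # which is common but not universal.
        subprocess.run(["ipmitool", "-I", "lanplus", "-H", bmc,
                        "-U", USER, "-P", PASSWORD,
                        "lan", "set", "1", "ipsrc", "dhcp"],
                       check=True)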
>>>
>>>     Is there anything glaringly obvious that I overlooked?
>>>
>>>     Thanks!
>>>
>>>
>>>
>


--
Rich Sudlow
University of Notre Dame
Center for Research Computing - Union Station
506 W. South St
South Bend, In 46601

(574) 631-7258 (office)
(574) 807-1046 (cell)



 
 

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
