Hi Kevin,
We have a script, xcatha.py, that can set up a standby xCAT management node.
It is used in one of our xCAT HA solutions: you can use it to set up two inactive xCAT management nodes and then activate one of them. Your scenario is similar to that solution.
You can find the related code and doc here: https://github.com/xcat2/xcat-extensions/tree/master/HA
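A rough sketch of the flow, assuming the options described in that repository's README (the shared-data path, NIC, and virtual IP below are placeholders, and the option names are only illustrative; please check the README for the exact syntax):

    # On each standby management node: stage xCAT against the shared data
    # directory, but leave the node inactive.
    ./xcatha.py --setup -p /xcatdata -i eth1 -v 10.1.0.99

    # At failover time: activate exactly one of the staged management nodes.
    ./xcatha.py --activate -p /xcatdata -i eth1 -v 10.1.0.99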
I hope it can help you.
Best Regards
--------------------------------------------------
Yuan Bai (白媛)
CSTL HPC System Management Development
Tel:86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28,
ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
Beijing P.R.China 100193
--------------------------------------------------
----- Original message -----
From: Rich Sudlow <r...@nd.edu>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>, David Johnson <david_john...@brown.edu>
Cc:
Subject: Re: [xcat-user] Staging a new management node but keeping it inactive?
Date: Sat, Dec 8, 2018 5:21 AM
We do something slightly different: we don't use a DHCP pool at all, and instead only
allow DHCP to answer MACs that are known.
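In xCAT terms, a rough sketch of that kind of setup (the network object name below is a placeholder; use whatever lsdef -t network reports on your management node):

    # Leave the dynamic pool empty so dhcpd has no range to hand out.
    chdef -t network 10_0_0_0-255_255_0_0 dynamicrange=
    # Regenerate dhcpd.conf and add host entries only for nodes whose MACs
    # are already in the mac table.
    makedhcp -n
    makedhcp -a

With no dynamic range, dhcpd simply has nothing to offer an unknown MAC.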
On 12/7/18 1:30 PM, David Johnson wrote:
> Yes, only one can have a dynamic range. In my case neither of them does,
> since I manually paste the MAC addresses into the mac table.
>
> My issue with deleting first is that when I deleted 16 MAC addresses and then got
> sidetracked and went home, those nodes later lost their leases and ended up getting
> evicted from GPFS. That would not have been an issue if the nodes were otherwise idle,
> but they had only been marked to drain and jobs were still running on them.
>
> — ddj
>
>> On Dec 7, 2018, at 1:25 PM, Kevin Keane <kke...@sandiego.edu
>> <mailto:kke...@sandiego.edu>> wrote:
>>
>> Thank you, Dave. That is an interesting alternative approach; I might actually
>> consider that.
>>
>> So you are saying that the old and new DHCP servers can run in parallel? I
>> assume that they just can't both have dynamic ranges?
>>
>> I'm not sure I understand what the problem is with deleting a node first and
>> then adding it on the new system. Even if the lease expires, wouldn't the node
>> just reacquire a new one once you create it there?
>>
>> _______________________________________________________________________
>> Kevin Keane | Systems Architect | University of San Diego ITS |
>> kke...@sandiego.edu <mailto:kke...@sandiego.edu>
>> Maher Hall, 192 |5998 Alcalá Park | San Diego, CA 92110-2492 | 619.260.6859
>>
>> REMEMBER! No one from IT at USD will ever ask you to confirm or supply your
>> password.
>> These messages are an attempt to steal your username and password. Please do
>> not reply to, click the links within, or open the attachments of these
>> messages. Delete them!
>>
>>
>>
>>
>> On Thu, Dec 6, 2018 at 4:43 PM <david_john...@brown.edu
>> <mailto:david_john...@brown.edu>> wrote:
>>
>> We’ve kept parallel clusters on the same network for nearly a year now
>> while transitioning to RH7 from CentOS 6.
>> Initially we copied the hosts, nodelist, and mac tables into the new xCAT
>> database. We carefully controlled the use of makedhcp so that nodes moving to
>> the new cluster were first added to the new DHCP server and then deleted from
>> the old. (I didn't want a repeat of what happened when I left some nodes deleted
>> from the old cluster but not yet added to the new one, and they lost their lease.
>> The hardeths postscript also helped.) Then make new images, use nodeset to point
>> the nodes at them, reboot, and test.
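>>
>> A rough sketch of that ordering with the stock makedhcp options (the
>> noderange is a placeholder):
>>
>>     # On the NEW management node, where the moving nodes are already
>>     # defined with their MAC addresses:
>>     makedhcp node01-node16
>>     # Only after the new server is answering, remove the entries on the
>>     # OLD management node:
>>     makedhcp -d node01-node16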
>>
>> The drawback is having to make parallel changes on both management servers all
>> the time, but we needed both clusters to access GPFS, so it was a necessary
>> evil.
>>
>> -- ddj
>> Dave Johnson
>>
>> On Dec 6, 2018, at 4:23 PM, Kevin Keane <kke...@sandiego.edu
>> <mailto:kke...@sandiego.edu>> wrote:
>>
>>> I'm in the middle of upgrading our existing HPC cluster (from RHEL 6 to RHEL 7).
>>> I'm doing most of my testing on a separate "sandbox" test bed, but now
>>> I'm close to going live. I'm trying to figure out how to do this with
>>> minimal disruption.
>>>
>>> My question: how can I install the new management node and keep it
>>> *almost* completely operational, without interfering with the existing
>>> cluster? Is it enough to disable DHCP, or do I need to do anything else?
>>>
>>> How do I prevent DHCP from accidentally getting enabled before I'm ready?
>>> Is makedhcp responsible for that?
>>>
>>> Step-by-step, here is what I plan to do:
>>>
>>> - Set up the new management node, but keep it inactive.
>>> - Test
>>> - Bring down all compute nodes.
>>> - Via IPMI, reset all the compute nodes' BMC controllers to DHCP (see the sketch after this list)
>>> - Other migration steps (home directories, modifications on the storage
>>> node, etc.)
>>> - De-activate the old management node (but keep it running)
>>> - Activate the new management node.
>>> - Discover and boot compute nodes
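>>>
>>> A rough sketch of the BMC step, assuming ipmitool is run from each node's
>>> OS before shutdown (the LAN channel number varies by hardware; 1 is only a
>>> common default):
>>>
>>>     ipmitool lan set 1 ipsrc dhcp   # switch the BMC's LAN channel 1 to DHCP
>>>     ipmitool mc reset cold          # restart the BMC so the change takes effect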
>>>
>>> Is there anything glaringly obvious that I overlooked?
>>>
>>> Thanks!
>>>
>
--
Rich Sudlow
University of Notre Dame
Center for Research Computing - Union Station
506 W. South St
South Bend, In 46601
(574) 631-7258 (office)
(574) 807-1046 (cell)
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user