Interesting idea, Nick. I'm not sure how that would work, though: when the nodes first boot they wouldn't know which VLAN to connect to, and I don't have enough control over the switch to set up port-based VLANs.
_______________________________________________________________________
Kevin Keane | Systems Architect | University of San Diego ITS | kke...@sandiego.edu
Maher Hall, 192 | 5998 Alcalá Park | San Diego, CA 92110-2492 | 619.260.6859

On Fri, Dec 7, 2018 at 7:37 PM Nick Evans <nick.c.ev...@gmail.com> wrote:
> Why not run the mgr and data networks for the new cluster in different
> VLANs? That way you can leave DHCP on both systems running and they will
> only respond to requests for their respective clusters.
>
> Nick

On Sat., 8 Dec. 2018, 8:22 am Rich Sudlow <r...@nd.edu> wrote:
> We do something slightly different: we don't use a DHCP pool at all, and
> only allow DHCP to answer MACs that are known.

On 12/7/18 1:30 PM, David Johnson wrote:
> Yes, only one can have a dynamic range. In my case neither of them does,
> since I manually paste the MAC addresses into the mac table.
>
> My issue with deleting first: when I deleted 16 MAC addresses and then got
> sidetracked and went home, those nodes later lost their lease and ended up
> getting evicted from GPFS. Not an issue if the nodes had been otherwise
> idle, but these had been marked to drain and jobs were still running on them.
>
> -- ddj

On Dec 7, 2018, at 1:25 PM, Kevin Keane <kke...@sandiego.edu> wrote:
> Thank you, Dave. That is an interesting alternative approach; I might
> actually consider it.
>
> So you are saying that the old and new DHCP servers can run in parallel?
> I assume they just can't both have dynamic ranges?
>
> I'm not sure I understand what the problem is with deleting a node first
> and then adding it on the new system. Even if the lease expires, wouldn't
> the node just reacquire a lease once you create the new entry?

On Thu, Dec 6, 2018 at 4:43 PM <david_john...@brown.edu> wrote:
> We've kept parallel clusters on the same network for nearly a year now
> while transitioning to RHEL 7 from CentOS 6. Initially we copied the
> hosts, nodelist, and MAC tables into the new xCAT database. We carefully
> controlled the use of makedhcp so that nodes moving to the new cluster
> were first added to the new DHCP server and then deleted from the old.
> (I didn't want a repeat of what happened when I left some nodes deleted
> from the old server but not yet added to the new one, and they lost their
> lease. The hardeths postscript also helped.) Then make new images, use
> nodeset to point the nodes at them, reboot, and test.
>
> Drawback is having to make parallel changes on both management servers
> all the time, but we needed both clusters to access GPFS, so it was a
> necessary evil.
>
> -- ddj
> Dave Johnson
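To make the DHCP side of that concrete, here is a rough sketch. The dhcpd.conf fragment is a hand-written illustration of the "no pool, known MACs only" idea; xCAT's makedhcp manages the per-node entries itself, and every name, MAC, and address below is a made-up placeholder:

    # /etc/dhcp/dhcpd.conf fragment -- no "range" statement, so there is no
    # dynamic pool, and unknown MACs get no answer at all
    subnet 10.1.0.0 netmask 255.255.0.0 {
        option routers 10.1.0.1;
        deny unknown-clients;
    }

    host node042 {
        hardware ethernet aa:bb:cc:dd:ee:42;   # made-up MAC
        fixed-address 10.1.1.42;               # made-up address
    }

And the add-before-delete sequence described above, sketched with standard xCAT commands (again with a made-up node name and osimage name):

    # copy the relevant tables from the old xCAT database once:
    #   (old MN)  tabdump hosts > hosts.csv ; tabdump mac > mac.csv ; tabdump nodelist > nodelist.csv
    #   (new MN)  tabrestore hosts.csv ; tabrestore mac.csv ; tabrestore nodelist.csv

    # on the NEW management node: add the node's DHCP entry first
    makedhcp node042

    # on the OLD management node: only now remove the node's entry
    makedhcp -d node042

    # back on the new management node: point the node at the new image and reboot
    nodeset node042 osimage=rhels7.6-x86_64-netboot-compute
    rpower node042 reset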
On Dec 6, 2018, at 4:23 PM, Kevin Keane <kke...@sandiego.edu> wrote:
> I'm in the middle of upgrading our existing HPC cluster from RHEL 6 to
> RHEL 7. I'm doing most of my testing on a separate "sandbox" test bed,
> but now I'm close to going live, and I'm trying to figure out how to do
> this with minimal disruption.
>
> My question: how can I install the new management node and keep it
> *almost* completely operational, without interfering with the existing
> cluster? Is it enough to disable DHCP, or do I need to do anything else?
> How do I prevent DHCP from accidentally getting enabled before I'm ready?
> Is makedhcp responsible for that?
>
> Step-by-step, here is what I plan to do:
>
> - Set up the new management node, but keep it inactive.
> - Test.
> - Bring down all compute nodes.
> - Via IPMI, reset all the compute nodes' BMC controllers to DHCP.
> - Other migration steps (home directories, modifications on the storage
>   node, etc.).
> - De-activate the old management node (but keep it running).
> - Activate the new management node.
> - Discover and boot the compute nodes.
>
> Is there anything glaringly obvious that I overlooked?
>
> Thanks!
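A rough sketch of what the "keep the new management node's DHCP inactive" and "reset the BMCs to DHCP" steps might look like from the shell. The host name, credentials, and LAN channel number are made-up placeholders, and the exact ipmitool options depend on the BMCs in question:

    # on the new (RHEL 7) management node: make sure dhcpd cannot start until
    # cut-over, and re-check it after any run of makedhcp
    systemctl disable --now dhcpd
    systemctl is-enabled dhcpd        # should report "disabled"

    # reset one node's BMC to DHCP over the existing management network
    ipmitool -I lanplus -H node042-bmc -U ADMIN -P secret lan set 1 ipsrc dhcp
    ipmitool -I lanplus -H node042-bmc -U ADMIN -P secret mc reset cold   # some BMCs only pick up the change after a reset
    ipmitool -I lanplus -H node042-bmc -U ADMIN -P secret lan print 1     # confirm "IP Address Source : DHCP"

Whether makedhcp itself ends up (re)starting dhcpd is worth verifying on the new node before cut-over; checking "systemctl status dhcpd" right after running it is a cheap safeguard.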