Thank you very much Alexey, I will certainly try that and update you on the result.
Best regards!

On Mon, 12 May 2025 at 22:36, <ale...@pavlyuts.ru> wrote:

> Hi,
>
> Incidentally, I use Pacemaker as the base layer of a custom clustering
> solution, and I have a script to rebuild the second node from the first
> one. I can't share the script itself, as it has a lot of
> solution-dependent references, but I can share the sequence to rebuild
> the failed node:
>
> 1. Set up the new node with the same IP and hostname.
> 2. (Optional) Set up passwordless mutual key-based SSH access. It is
>    not necessary, but it makes a lot of things easier.
> 3. Copy these files from the surviving host to the new one:
>    1. /etc/corosync/authkey
>    2. /etc/corosync/corosync.conf
>    3. /etc/drbd.d/*.res
>    4. /etc/pacemaker/authkey
> 4. Set the hacluster user password to the same as it was on the
>    surviving node.
> 5. Re-authenticate the pcs nodes with:
>    pcs host auth <host1_name> <host2_name> -u hacluster -p <ha_cluster_pass>
> 6. Reboot the restored server.
> 7. PROFIT!!!
>
> If you use no arbiter (corosync-qnetd), this should be enough to get
> your new cluster node up and running. If you use corosync-qnetd, you
> also need to restore the corosync-qdevice nssdb keys so the second host
> can connect to the arbiter node:
>
> 1. On the surviving host, extract your arbiter certificate from the nssdb:
>    certutil -L -d /etc/corosync/qdevice/net/nssdb -n 'QNet CA' -r > /root/qnetd-cert.crt
> 2. Copy the certificate to the new host; assume the path on the new
>    host is the same.
> 3. On the new host, initialize a new nssdb with the certificate:
>    corosync-qdevice-net-certutil -i -c /root/qnetd-cert.crt
> 4. Copy the certificate and key at
>    /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12 from the old
>    node to the new one.
> 5. On the new node, import the certificate and key:
>    corosync-qdevice-net-certutil -m -c /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12
> 6. Enable or restart corosync-qdevice:
>    systemctl enable --now corosync-qdevice.service
>    or
>    systemctl restart corosync-qdevice.service
> 7. Enjoy!
>
> That's what works for me in practice, and it is included in the service
> scripts of our product, which is based on Pacemaker.
>
> Hope this could help!
>
> Sincerely,
>
> Alex
>
>
> From: Users <users-boun...@clusterlabs.org> On Behalf Of Fabrizio Ermini
> Sent: Friday, May 9, 2025 5:26 PM
> To: users@clusterlabs.org
> Subject: [ClusterLabs] Rebuild of failed node
>
> Hi all! Freshman here, just joined.
>
> I currently need to rebuild a failed node on a Pacemaker 2.1 /
> Corosync 3.1 two-node cluster with DRBD storage.
>
> I've searched the Pacemaker docs and the list archives, but I haven't
> found a clear guide on how to proceed with this task. So far, I've
> reinstalled a new server, configured the same IP and hostname as the
> failed one, and installed all the software. I've also fixed the DRBD
> layer and started the resync of the volumes. But it's not clear to me
> how to proceed; I've found some hints online pointing to the need to
> manually copy the corosync config, but they were quite old and probably
> obsolete. I'm using pcs as a shell, and I haven't found a command
> designed to replace a node, only commands to add or remove one.
>
> It seems really strange to me that there isn't a guide, since this
> should be a very basic operation and it's quite important to know how
> to do it; HW breaks, as a matter of fact :D
>
> So I'll be very grateful if anyone can point me in the right direction.
>
> Thanks in advance, and best regards
>
> Fabrizio
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
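Alexey's sequence can be sketched as a single script run from the surviving node. This is an illustration of the steps above, not his actual script: the node names, the hacluster password, and the use of root SSH are placeholder assumptions, and by default it only prints each command (dry run) rather than executing it.

```shell
#!/bin/sh
# Dry-run sketch of the rebuild sequence above, run on the SURVIVING node.
# NEW_NODE, HOST1/HOST2 and HA_PASS are placeholders -- adjust for your cluster.
set -eu

RUN=echo                  # "echo" = dry run (print commands); set RUN= to execute
NEW_NODE=node2            # freshly reinstalled node (same IP/hostname as before)
HOST1=node1
HOST2=node2
HA_PASS='secret'          # must match the hacluster password on the surviving node

# Step 3: copy keys and configs from the surviving host to the new one
$RUN scp /etc/corosync/authkey /etc/corosync/corosync.conf "root@$NEW_NODE:/etc/corosync/"
$RUN scp /etc/drbd.d/*.res "root@$NEW_NODE:/etc/drbd.d/"
$RUN scp /etc/pacemaker/authkey "root@$NEW_NODE:/etc/pacemaker/"

# Step 4: set the hacluster password on the new node
$RUN ssh "root@$NEW_NODE" "echo hacluster:$HA_PASS | chpasswd"

# Step 5: re-authenticate the pcs nodes
$RUN pcs host auth "$HOST1" "$HOST2" -u hacluster -p "$HA_PASS"

# Step 6: reboot the restored server
$RUN ssh "root@$NEW_NODE" reboot

# --- Only if using a corosync-qnetd arbiter: restore the qdevice nssdb keys ---
NSSDB=/etc/corosync/qdevice/net/nssdb
# Extract the arbiter CA certificate (sh -c keeps the redirect out of the dry run)
$RUN sh -c "certutil -L -d $NSSDB -n 'QNet CA' -r > /root/qnetd-cert.crt"
$RUN scp /root/qnetd-cert.crt "root@$NEW_NODE:/root/"
$RUN ssh "root@$NEW_NODE" corosync-qdevice-net-certutil -i -c /root/qnetd-cert.crt
$RUN scp "$NSSDB/qdevice-net-node.p12" "root@$NEW_NODE:$NSSDB/"
$RUN ssh "root@$NEW_NODE" corosync-qdevice-net-certutil -m -c "$NSSDB/qdevice-net-node.p12"
$RUN ssh "root@$NEW_NODE" systemctl enable --now corosync-qdevice.service
```

With RUN=echo, the script just prints the command sequence so it can be reviewed before running for real; clear RUN only after checking the printed commands against your own paths and hostnames.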
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/