On 4/30/21 4:04 PM, Matthew Schumacher wrote:
On 4/21/21 11:04 AM, Matthew Schumacher wrote:
On 4/21/21 10:21 AM, Andrei Borzenkov wrote:
If I set the stickiness to 100 then it's a race condition: many times we
get the storage layer migrated without VirtualDomain noticing. But if
the stickiness is not set, then moving a resource causes the cluster to
re-balance, and that makes the VM fail every time, because validation
is one of the first things we do when we migrate the VM and it happens
at the same time as an IP-ZFS-iSCSI move, so the config file goes away
for 5 seconds.
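
For reference, that stickiness is just a cluster-wide default; a minimal
sketch with pcs (the exact subcommand varies by pcs version, newer
releases use "pcs resource defaults update ..."):

  # set a default stickiness of 100 for all resources
  pcs resource defaults resource-stickiness=100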

I'm not sure how to fix this.  The nodes don't have local storage that
Your nodes must have an operating system and the pacemaker stack loaded
from somewhere before they can import the zfs pool.

Yup, and they do.  There are plenty of ways to do this: internal SD card, USB boot, PXE boot, etc.  I prefer this because I don't need to maintain a boot drive, the nodes boot from the exact same image, and I have gobs of memory so the running system can run in a ramdisk.  This also makes it possible to boot my nodes with failed disks/controllers, which makes troubleshooting easier.  I basically made a live CD distro that has everything I need.

I suppose the next step is to see if NFS has some sort of retry mode so
That is what the "hard" mount option is for.

Thanks, I'll take a look.
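
For reference, "hard" is a client-side mount option; a sketch of an
fstab entry (the address and paths below are placeholders, not taken
from this cluster):

  # hypothetical failover IP and export; "hard" makes the client block and
  # retry indefinitely instead of returning an I/O error to the application
  10.0.0.50:/datastore  /srv/vmstore  nfs  hard,timeo=600,retrans=5  0  0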


For others searching the list, I did figure this out.  The problem was the order I was loading the resources in.

This doesn't work because we start the failover IP before ZFS, and ZFS is what starts the NFS share.  That leaves a split second where the IP is up but the NFS server isn't listening yet, so the IP stack answers NFS requests with an RST.  The NFS client reports that to the OS as a hard failure, the VirtualDomain resource sees an invalid config, and things break.

  * Resource Group: IP-ZFS-iSCSI:
    * fence-datastore    (stonith:fence_scsi):     Started node1
    * failover-ip    (ocf::heartbeat:IPaddr):     Started node1
    * zfs-datastore    (ocf::heartbeat:ZFS):     Started node1
    * ZFSiSCSI    (ocf::heartbeat:ZFSiSCSI):     Started node1
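
A quick way to confirm the RST described above is to watch the NFS port
on the failover IP while the group moves (the interface and address here
are placeholders):

  # show only TCP resets to/from the NFS port during the failover
  tcpdump -ni eth0 'host 10.0.0.50 and tcp port 2049 and tcp[tcpflags] & tcp-rst != 0'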

If I change it to this, then NFS requests simply go unanswered and the client just retries until the server is up and the connection is accepted.

  * Resource Group: IP-ZFS-iSCSI:
    * fence-datastore    (stonith:fence_scsi):     Started node1
    * zfs-datastore    (ocf::heartbeat:ZFS):     Started node1
    * ZFSiSCSI    (ocf::heartbeat:ZFSiSCSI):     Started node1
    * failover-ip    (ocf::heartbeat:IPaddr):     Started node1

Originally I didn't do it this way because my iSCSI and NFS stack bind to the failover IP and I was worried stuff wouldn't start until the IP was configured, but that doesn't seem to be a problem.
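
For completeness, the reordering itself just means moving failover-ip to
the end of the group; with a reasonably recent pcs something like this
should do it (older versions may require removing the resource from the
group and re-adding it):

  # move failover-ip so it is started last / stopped first in the group
  pcs resource group add IP-ZFS-iSCSI failover-ip --after ZFSiSCSI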
Hence the idea of using a firewall rule to suppress the negative
response (the RST) while the IP is already up.
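
A minimal, untested sketch of such a rule with iptables (it would have
to be in place before the IP comes up and be removed once the NFS server
is listening):

  # drop outgoing TCP resets from the NFS port so clients silently retry
  iptables -A OUTPUT -p tcp --sport 2049 --tcp-flags RST RST -j DROP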

And thanks for coming back once you got it to work ;-)

Klaus

Matt
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
