>>> Matthew Schumacher <mat...@aptalaska.net> wrote on 21.04.2021 at 18:27 in message
<3dab41c0-d56e-72fa-8a61-70e268a0f...@aptalaska.net>:

> On 4/21/21 12:48 AM, Klaus Wenninger wrote:
>> Just to better understand the issue ...
>> Does the first resource implement storage that is being used
>> by the resource that is being migrated/moved?
>> Or is it just the combination of 2 parallel moves that is
>> overcommitting storage or network?
>> Is it assured that there are no load-scenarios inside these
>> resources that create the same issues as if you migrate/move
>> them?
>>
>> Klaus
>
> Thanks for the help Klaus, I'll spell it out more clearly.
>
> I'm using a resource group that sets up a failover IP address, then mounts
> a ZFS dataset (which exports a configuration directory via NFS), then a
> custom resource called ZFSiSCSI that exports all virtual machine disks
> as iSCSI.
>
> Like this:
>
>   * Resource Group: IP-ZFS-iSCSI:
>     * fence-datastore  (stonith:fence_scsi):       Started node1
>     * failover-ip      (ocf::heartbeat:IPaddr):    Started node1
>     * zfs-datastore    (ocf::heartbeat:ZFS):       Started node1
>     * ZFSiSCSI         (ocf::heartbeat:ZFSiSCSI):  Started node1
>
> Then I create a virtual machine with:
>
>   primitive vm-testvm VirtualDomain params
>     config="/nfs/vm/testvm/testvm.xml" meta allow-migrate=true
>     op monitor timeout=30 interval=10
>
> This works fine because the ZFS storage can be mounted/exported on node1
> or node2, which will have an iSCSI target for each VM bound to the shared
> IP address. I can move the storage to either node, and while there is a
> pause in storage access, it works fine because things move around faster
> than the iSCSI timeout. I can also migrate the VM to either node, because
> when it's started on the target node it can immediately access its iSCSI
> storage regardless of whether the storage is local or not.
>
> The problem is monitoring with VirtualDomain. The
> /usr/lib/ocf/resource.d/heartbeat/VirtualDomain script checks whether
> /nfs/vm/testvm/testvm.xml is available with these lines:
>
>   if [ ! -r $OCF_RESKEY_config ]; then
>       if ocf_is_probe; then
>           ocf_log info "Configuration file $OCF_RESKEY_config not readable during probe."
>
> That causes bash to stat the config file, which, if we are in the middle
> of an IP-ZFS-iSCSI move, will fail (stat returns -1), and VirtualDomain
> then views the VM as dead and hard-resets it.
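As an aside on that check: the virsh-based test suggested further down could
look roughly like this. This is only a rough, untested sketch; the domain
name, config path, and exit codes are placeholders taken from the example
above, not code from the actual agent.

  #!/bin/sh
  # Ask libvirt about the domain first, and only look at the (NFS-backed)
  # config file when libvirt does not know the domain at all.
  DOMAIN="testvm"
  CONFIG="/nfs/vm/testvm/testvm.xml"

  state=$(virsh domstate "$DOMAIN" 2>/dev/null)
  case "$state" in
      running|paused)
          # Domain is alive; a brief NFS outage during a storage-group move
          # does not change the monitor result.
          exit 0
          ;;
      *)
          # Only now does the config file matter (e.g. to decide on a restart).
          if [ -r "$CONFIG" ]; then
              exit 7   # would map to OCF_NOT_RUNNING in a real agent
          else
              exit 1   # config unreadable and domain unknown: genuine error
          fi
          ;;
  esac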
I think it's unsafe to move an iSCSI target between nodes assuming the
initiator won't notice, specifically as iSCSI uses TCP. Don't the initiators
see a "connection reset" when the target moves? (However, if your target ran
in a VM that is live-migrated, it might succeed if the migration is fast
enough.)

> If I set the stickiness to 100, then it's a race condition: many times we
> get the storage layer migrated without VirtualDomain noticing. But if the
> stickiness is not set, then moving a resource causes the cluster to
> re-balance, and the VM fails every time, because validation is one of the
> first things we do when we migrate the VM, and it happens at the same time
> as an IP-ZFS-iSCSI move, so the config file goes away for about 5 seconds.
>
> I'm not sure how to fix this. The nodes don't have any local storage other
> than the ZFS pool, otherwise I'd just create a local config directory and
> glusterfs them together.
>
> I suppose the next step is to see if NFS has some sort of retry mode so
> that bash stat'ing the config file blocks until a timeout. That would

NFS (at least before version 4) always had a mode to wait for the server;
see the "bg" (background) option.

> certainly fix my issue, as that's how the iSCSI stuff works: retry until
> timeout. Another option is to rework VirtualDomain, as stat'ing a config
> file isn't really a good test of whether the domain is working. It makes
> more sense to have it make a virsh call to see if the domain is working,
> and only care about the config file if it's starting the domain.
>
> Ideas welcome!!!!
>
> Matt

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
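On the NFS retry mode mentioned above: a "hard" mount (the default) is what
makes I/O such as the stat of the config file block and retry until the
server answers again, while "bg" only changes how the initial mount attempt
is retried. A minimal illustration; the server name and paths below are
placeholders, not the actual setup:

  # /etc/fstab (illustrative only)
  nfsserver:/nfs/vm   /nfs/vm   nfs   hard,bg   0 0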