>>> lejeczek via Users <[email protected]> wrote on 14.12.2021 at 15:48 in message <[email protected]>:
>> Hi!
>>
>> My guess is that you checked the corresponding logs already; why not show them here?
>> I can imagine that the VMs die rather early after start.
>>
>> Regards,
>> Ulrich
>>
>>>>> lejeczek via Users <[email protected]> wrote on 10.12.2021 at 17:33 in message <df8eac8f-a58e-28e5-53b5-[email protected]>:
>>> Hi guys.
>>>
>>> I quite often - too frequently, in my mind - see a VM of which the cluster says:
>>> -> $ pcs resource status | grep -v disabled
>>> ...
>>>   * c8kubermaster2 (ocf::heartbeat:VirtualDomain): Started dzien
>>> ..
>>>
>>> but that is false, and the cluster itself confirms it:
>>> -> $ pcs resource debug-monitor c8kubermaster2
>>> crm_resource: Error performing operation: Not running
>>> Operation force-check for c8kubermaster2 (ocf:heartbeat:VirtualDomain) returned: 'not running' (7)
>>>
>>> What is the issue here, what might it be, and how best to troubleshoot it?
>>>
>>> -> $ pcs resource config c8kubermaster2
>>> Resource: c8kubermaster2 (class=ocf provider=heartbeat type=VirtualDomain)
>>>   Attributes: config=/var/lib/pacemaker/conf.d/c8kubermaster2.xml hypervisor=qemu:///system migration_transport=ssh
>>>   Meta Attrs: allow-migrate=true failure-timeout=30s
>>>   Operations: migrate_from interval=0s timeout=180s (c8kubermaster2-migrate_from-interval-0s)
>>>               migrate_to interval=0s timeout=180s (c8kubermaster2-migrate_to-interval-0s)
>>>               monitor interval=30s (c8kubermaster2-monitor-interval-30s)
>>>               start interval=0s timeout=90s (c8kubermaster2-start-interval-0s)
>>>               stop interval=0s timeout=90s (c8kubermaster2-stop-interval-0s)
>>>
>>> many thanks, L.

> Not much there in the logs that I could see (which is probably
> why the cluster decides the resource is okay).
> What is the resource's monitor for, if not exactly that - to
> check the state of the resource? Whether it dies early or late
> should not matter.
> What suffices in order to "fix" such a resource false-positive:
> a quick disable/enable of the resource or, as in this very
> instance, rpm updates which restarted the node.
> Again, how might the cluster think the resource is okay while
> debug-monitor shows it's not?
> I only do not know how to reproduce this in a controlled,
> orderly manner.

Maybe try "xentop" or "watch virsh list" on all nodes and watch what is happening.

> thanks, L
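A minimal sketch of the "watch virsh list" suggestion for a qemu/KVM cluster like the one above (xentop is the Xen equivalent and does not apply here). The node name "dzien" is taken from the thread; "node2" is a placeholder for the other node, and passwordless SSH between nodes is assumed:

    # Actual domain state on every node, straight from libvirt:
    for node in dzien node2; do
        echo "== $node =="
        ssh "$node" virsh -c qemu:///system list --all
    done

    # The cluster's view of the same resource:
    pcs resource status | grep c8kubermaster2

    # The RA's monitor action, run directly rather than on the 30s schedule:
    pcs resource debug-monitor c8kubermaster2

If virsh shows the domain shut off on every node while pcs still reports it Started, capturing both outputs together (e.g. under watch -n 5) makes the discrepancy easier to time against the monitor interval.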
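On the log side, each recurring monitor result is recorded by Pacemaker, so the history for this one resource can be pulled out directly. A sketch, assuming the Pacemaker 2.x default log location (older or differently packaged setups may log to syslog instead):

    # Recent operation results Pacemaker recorded for the resource:
    grep -E 'c8kubermaster2.*(monitor|start|stop)' \
        /var/log/pacemaker/pacemaker.log | tail -n 20

    # Fail counts and operation history as the scheduler sees them:
    pcs resource failcount show c8kubermaster2
    crm_mon -1rf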
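As for reproducing it in a controlled manner, one possible approach is to stop the domain behind the cluster's back and watch whether the next scheduled monitor notices. This assumes the libvirt domain name matches the resource name, which the config above suggests but does not prove:

    # On the node currently running the VM: kill the domain outside Pacemaker.
    virsh -c qemu:///system destroy c8kubermaster2

    # Then watch whether the 30s monitor flags it and the cluster recovers it:
    watch -n 5 'pcs resource status | grep c8kubermaster2'

If the cluster recovers the VM promptly every time here, the production failure mode is probably something other than a plain domain death.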
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/