Happy new year!

Can you please re-post this to the Clusterlabs users list?
http://clusterlabs.org/mailman/listinfo/users

This list is being phased out.

digimer

On 05/01/16 04:34 AM, InterNetworX | Michael Rößler wrote:
> Happy new year list,
>
> I have a test environment here for checking Pacemaker. Sometimes our
> KVM hosts with libvirt have trouble responding to the stonith/libvirt
> resource. Pacemaker should work like Zabbix, for example, where a
> service is regarded as failed only after 3 failed monitoring attempts.
> That's why I was searching for a configuration here:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html
>
> But I had no success after hours of trying.
>
> That's the configuration line for stonith/libvirt:
>
> crm configure primitive p_fence_ha3 stonith:external/libvirt params
> hostlist="ha3" hypervisor_uri="qemu+tls://debian1/system" op monitor
> interval="60"
>
> Every 60 seconds Pacemaker runs something like this:
>
> stonith -t external/libvirt hostlist="ha3"
> hypervisor_uri="qemu+tls://debian1/system" -S
> ok
>
> To simulate the unavailability of the KVM host I remove the certificate
> in /etc/libvirt/libvirtd.conf and restart libvirtd. After 60 seconds or
> less I can see the error with "crm status". On the KVM host I then add
> the certificate to /etc/libvirt/libvirtd.conf again and restart libvirtd
> once more. Although libvirt is available again, the stonith resource
> does not start again.
>
> I altered the configuration line for stonith with the following parts:
>
> op monitor interval="60" pcmk_status_retries="3"
> op monitor interval="60" pcmk_monitor_retries="3"
> op monitor interval="60" start-delay=180
> meta migration-threshold="200" failure-timeout="120"
>
> But every time, after the first failed monitor check (within 60 seconds
> or less), Pacemaker stops the resource and does not resume it once
> libvirt is available again.
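
The "regard it as failed only after 3 failed attempts" behaviour described above can be illustrated outside Pacemaker with a small shell wrapper. This is only a sketch under stated assumptions, not Pacemaker's own mechanism and not a fix for the configuration: the function name `check_with_retries`, its argument order, and the delay between attempts are hypothetical and introduced here purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker or cluster-glue):
# run a check command up to <max_tries> times, waiting <delay> seconds
# between attempts, and report failure only if every attempt fails.
#
# Usage: check_with_retries <max_tries> <delay_seconds> <command> [args...]
check_with_retries() {
    max_tries=$1
    delay=$2
    shift 2

    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        # A single successful attempt counts as "service is fine".
        if "$@"; then
            return 0
        fi
        # Wait before the next attempt, but not after the last one.
        if [ "$attempt" -lt "$max_tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done

    # All attempts failed: only now treat the service as failed.
    return 1
}
```

With the status command quoted above, a manual test from a cluster node might look like `check_with_retries 3 20 stonith -t external/libvirt hostlist="ha3" hypervisor_uri="qemu+tls://debian1/system" -S`, where 3 attempts and a 20 second delay are example values only.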
>
> Here is the "crm status" output on Debian 8 (Jessie):
>
> root@ha4:~# crm status
> Last updated: Tue Jan 5 10:04:18 2016
> Last change: Mon Jan 4 18:18:12 2016
> Stack: corosync
> Current DC: ha3 (167772400) - partition with quorum
> Version: 1.1.12-561c4cf
> 2 Nodes configured
> 2 Resources configured
>
> Online: [ ha3 ha4 ]
>
> Service-IP (ocf::heartbeat:IPaddr2): Started ha3
> haproxy (lsb:haproxy): Started ha3
>
> Kind regards
>
> Michael R.
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linuxfoundation.org/mailman/listinfo/openais

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
