On 07/06/2017 04:20 PM, Cesar Hernandez wrote:
>> If node2 is getting the notification of its own fencing, it wasn't
>> successfully fenced. Successful fencing would render it incapacitated
>> (powered down, or at least cut off from the network and any shared
>> resources).
>
> Maybe I don't understand you, or maybe you don't understand me... ;)
> This is the syslog of the machine, where you can see that it has 
> rebooted successfully, and, as I said, it has rebooted successfully 
> every time:

It is not just a question of whether the node was rebooted at all.
Your fence agent must not report success until that reboot has
definitely happened and the node is confirmed down.
Otherwise you will see that message, and the node will have to
somehow cope with the fact that the rest of the cluster already
considers it dead.
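
For illustration only, a minimal Python sketch of that "confirm before
acknowledging" logic follows. The _platform_* helpers are hypothetical
stand-ins (simulated here so the script runs as-is) for whatever API your
custom agent, e.g. st-fence_propio, actually talks to, and the timeout
values are just assumptions:

#!/usr/bin/env python3
# Sketch of the "confirm before acknowledging" behaviour a custom fence
# agent should implement. The _platform_* helpers are stand-ins for the
# real API (cloud provider, PDU, IPMI, ...); they merely simulate a node
# that takes a few seconds to power off.

import sys
import time

POLL_INTERVAL = 2    # seconds between power-state checks (assumed value)
POWER_TIMEOUT = 60   # overall confirmation timeout in seconds (assumed value)

# --- simulated platform backend (replace with your real API calls) ---
_poweroff_requested_at = None

def _platform_power_off(node):
    """Request power-off; in real life this calls your platform API."""
    global _poweroff_requested_at
    _poweroff_requested_at = time.time()

def _platform_power_status(node):
    """Return 'on' or 'off'; simulated as 'off' 5 s after the request."""
    if _poweroff_requested_at and time.time() - _poweroff_requested_at > 5:
        return "off"
    return "on"

# --- the part that matters ---
def fence(node):
    _platform_power_off(node)

    # Do NOT report success yet: keep polling until the platform confirms
    # the node is really off. Returning early is what lets the victim boot,
    # rejoin, and then log "We were allegedly just fenced by ...".
    deadline = time.time() + POWER_TIMEOUT
    while time.time() < deadline:
        if _platform_power_status(node) == "off":
            return 0        # success: the node is verifiably down
        time.sleep(POLL_INTERVAL)

    return 1                # timed out: report failure so the cluster retries

if __name__ == "__main__":
    sys.exit(fence(sys.argv[1] if len(sys.argv) > 1 else "node2"))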

>
> Jul  5 10:41:54 node2 kernel: [    0.000000] Initializing cgroup subsys cpuset
> Jul  5 10:41:54 node2 kernel: [    0.000000] Initializing cgroup subsys cpu
> Jul  5 10:41:54 node2 kernel: [    0.000000] Initializing cgroup subsys 
> cpuacct
> Jul  5 10:41:54 node2 kernel: [    0.000000] Linux version 3.16.0-4-amd64 
> (debian-ker...@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP 
> Debian 3.16.39-1 (2016-12-30)
> Jul  5 10:41:54 node2 kernel: [    0.000000] Command line: 
> BOOT_IMAGE=/boot/vmlinuz-3.16.0-4-amd64 
> root=UUID=711e1ec2-2a36-4405-bf46-44b43cfee42e ro init=/bin/systemd 
> console=ttyS0 console=hvc0
> Jul  5 10:41:54 node2 kernel: [    0.000000] e820: BIOS-provided physical RAM 
> map:
> Jul  5 10:41:54 node2 kernel: [    0.000000] BIOS-e820: [mem 
> 0x0000000000000000-0x000000000009dfff] usable
> Jul  5 10:41:54 node2 kernel: [    0.000000] BIOS-e820: [mem 
> 0x000000000009e000-0x000000000009ffff] reserved
> Jul  5 10:41:54 node2 kernel: [    0.000000] BIOS-e820: [mem 
> 0x00000000000e0000-0x00000000000fffff] reserved
> Jul  5 10:41:54 node2 kernel: [    0.000000] BIOS-e820: [mem 
> 0x0000000000100000-0x000000003fffffff] usable
> Jul  5 10:41:54 node2 kernel: [    0.000000] BIOS-e820: [mem 
> 0x00000000fc000000-0x00000000ffffffff] reserved
> Jul  5 10:41:54 node2 kernel: [    0.000000] NX (Execute Disable) protection: 
> active
> Jul  5 10:41:54 node2 kernel: [    0.000000] SMBIOS 2.4 present.
>
> ...
>
> Jul  5 10:41:54 node2 dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
>
> ...
>
> Jul  5 10:41:54 node2 corosync[585]:   [MAIN  ] Corosync Cluster Engine 
> ('UNKNOWN'): started and ready to provide service.
> Jul  5 10:41:54 node2 corosync[585]:   [MAIN  ] Corosync built-in features: 
> nss
> Jul  5 10:41:54 node2 corosync[585]:   [MAIN  ] Successfully read main 
> configuration file '/etc/corosync/corosync.conf'.
>
> ...
>
> Jul  5 10:41:57 node2 crmd[608]:   notice: Defaulting to uname -n for the 
> local classic openais (with plugin) node name
> Jul  5 10:41:57 node2 crmd[608]:   notice: Membership 4308: quorum acquired
> Jul  5 10:41:57 node2 crmd[608]:   notice: plugin_handle_membership: Node 
> node2[1108352940] - state is now member (was (null))
> Jul  5 10:41:57 node2 crmd[608]:   notice: plugin_handle_membership: Node 
> node11[794540] - state is now member (was (null))
> Jul  5 10:41:57 node2 crmd[608]:   notice: The local CRM is operational
> Jul  5 10:41:57 node2 crmd[608]:   notice: State transition S_STARTING -> 
> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> Jul  5 10:41:57 node2 stonith-ng[604]:   notice: Watching for stonith 
> topology changes
> Jul  5 10:41:57 node2 stonith-ng[604]:   notice: Membership 4308: quorum 
> acquired
> Jul  5 10:41:57 node2 stonith-ng[604]:   notice: plugin_handle_membership: 
> Node node11[794540] - state is now member (was (null))
> Jul  5 10:41:57 node2 stonith-ng[604]:   notice: On loss of CCM Quorum: Ignore
> Jul  5 10:41:58 node2 stonith-ng[604]:   notice: Added 'st-fence_propio:0' to 
> the device list (1 active devices)
> Jul  5 10:41:59 node2 stonith-ng[604]:   notice: Operation reboot of node2 by 
> node11 for crmd.2141@node11.61c3e613: OK
> Jul  5 10:41:59 node2 crmd[608]:     crit: We were allegedly just fenced by 
> node11 for node11!
> Jul  5 10:41:59 node2 corosync[585]:   [pcmk  ] info: pcmk_ipc_exit: Client 
> crmd (conn=0x228d970, async-conn=0x228d970) left
> Jul  5 10:41:59 node2 pacemakerd[597]:  warning: The crmd process (608) can 
> no longer be respawned, shutting the cluster down.
> Jul  5 10:41:59 node2 pacemakerd[597]:   notice: Shutting down Pacemaker
> Jul  5 10:41:59 node2 pacemakerd[597]:   notice: Stopping pengine: Sent -15 
> to process 607
> Jul  5 10:41:59 node2 pengine[607]:   notice: Invoking handler for signal 15: 
> Terminated
> Jul  5 10:41:59 node2 pacemakerd[597]:   notice: Stopping attrd: Sent -15 to 
> process 606
> Jul  5 10:41:59 node2 attrd[606]:   notice: Invoking handler for signal 15: 
> Terminated
> Jul  5 10:41:59 node2 attrd[606]:   notice: Exiting...
> Jul  5 10:41:59 node2 corosync[585]:   [pcmk  ] info: pcmk_ipc_exit: Client 
> attrd (conn=0x2280ef0, async-conn=0x2280ef0) left
>
>


-- 
Klaus Wenninger

Senior Software Engineer, EMEA ENG Openstack Infrastructure

Red Hat

kwenn...@redhat.com   


_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
