Re: [ClusterLabs] All IP resources deleted once a fenced node rejoins
active devices)
Jan 13 19:33:00 [4292] orana stonith-ng: info: stonith_device_remove: Device 'E-3' not found (2 active devices)
Jan 13 19:33:00 [4292] orana stonith-ng: info: stonith_device_remove: Device 'MGMT-FLT' not found (2 active devices)
Jan 13 19:33:00 [4292] orana stonith-ng: info: stonith_device_remove: Device 'M-FLT' not found (2 active devices)
Jan 13 19:33:00 [4292] orana stonith-ng: info: stonith_device_remove: Device 'M-FLT2' not found (2 active devices)
Jan 13 19:33:00 [4292] orana stonith-ng: info: stonith_device_remove: Device 'S-FLT' not found (2 active devices)
Jan 13 19:33:00 [4292] orana stonith-ng: info: stonith_device_remove: Device 'S-FLT2' not found (2 active devices)
Jan 13 19:33:00 [4292] orana stonith-ng: info: update_cib_stonith_devices_v2: Updating device list from the cib: modify nvpair[@id='fence-uc-orana-instance_attributes-delay']
Jan 13 19:33:00 [4292] orana stonith-ng: info: cib_devices_update: Updating devices to version 0.75.0
Jan 13 19:33:00 [4292] orana stonith-ng: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 13 19:33:00 [4292] orana stonith-ng: info: unpack_nodes: Creating a fake local node
Jan 13 19:33:00 [4292] orana stonith-ng: info: stonith_device_register: Overwriting an existing entry for fence-uc-orana from the cib
Jan 13 19:33:00 [4292] orana stonith-ng: notice: stonith_device_register: Added 'fence-uc-orana' to the device list (2 active devices)
Jan 13 19:33:00 [4291] orana cib: info: write_cib_contents: Archived previous version as /var/lib/pacemaker/cib/cib-85.raw
Jan 13 19:33:00 [4291] orana cib: info: write_cib_contents: Wrote version 0.75.0 of the CIB to disk (digest: 4fb4a8ef2f8cde3a07fb30eb706e7e9c)
Jan 13 19:33:00 [4291] orana cib: info: retrieveCib: Reading cluster configuration from: /var/lib/pacemaker/cib/cib.HyXAc3 (digest: /var/lib/pacemaker/cib/cib.lOa3UL)
Jan 13 19:33:01 [4291] orana cib: info: cib_perform_op: Diff: --- 0.75.0 2
Jan 13 19:33:01 [4291] orana cib: info: cib_perform_op: Diff: +++ 0.76.0 (null)
Jan 13 19:33:01 [4291] orana cib: info: cib_perform_op: + /cib: @epoch=76
Jan 13 19:33:01 [4291] orana cib: info: cib_perform_op: ++ /cib/configuration/constraints:
Jan 13 19:33:01 [4291] orana cib: info: cib_process_request: Completed cib_replace operation for section configuration: OK (rc=0, origin=kamet/cibadmin/2, version=0.76.0)
Jan 13 19:33:01 [4291] orana cib: info: write_cib_contents: Archived previous version as /var/lib/pacemaker/cib/cib-86.raw
Jan 13 19:33:01 [4291] orana cib: info: write_cib_contents: Wrote version 0.76.0 of the CIB to disk (digest: df07ff6cbef5a35891d43b89b9ba4371)
Jan 13 19:33:01 [4291] orana cib: info: retrieveCib: Reading cluster configuration from: /var/lib/pacemaker/cib/cib.Pi8ov3 (digest: /var/lib/pacemaker/cib/cib.kYeIwM)
Jan 13 19:33:01 [4291] orana cib: info: cib_perform_op: Diff: --- 0.76.0 2

Any pointers would be helpful.
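For what it's worth, the cib_replace entry above (origin=kamet/cibadmin/2) shows a full replace of the configuration section arriving from cibadmin on kamet just as it rejoined, and a replace that carries an older or trimmed configuration would leave the missing resources orphaned exactly like this. A rough way to see what that replace actually removed, assuming the archived CIB copies named in the log are still on disk:

  # Diff the archived CIB from just before the replace against the one
  # written just after it (cib-85.raw / cib-86.raw per the log above)
  crm_diff --original /var/lib/pacemaker/cib/cib-85.raw \
           --new /var/lib/pacemaker/cib/cib-86.raw

  # Check whether the IPaddr primitives are still present in the
  # configuration at all, as opposed to merely stopped
  cibadmin --query --scope resources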
Thanks
Arjun

On Thu, Jan 14, 2016 at 12:48 PM, Arjun Pandey <apandepub...@gmail.com> wrote:
> Hi
>
> I am running a 2 node cluster with this config on CentOS 6.6:
>
>  Master/Slave Set: foo-master [foo]
>      Masters: [ kamet ]
>      Slaves: [ orana ]
>  fence-uc-orana (stonith:fence_ilo4): Started kamet
>  fence-uc-kamet (stonith:fence_ilo4): Started orana
>  C-3 (ocf::pw:IPaddr): Started kamet
>  C-FLT (ocf::pw:IPaddr): Started kamet
>  C-FLT2 (ocf::pw:IPaddr): Started kamet
>  E-3 (ocf::pw:IPaddr): Started kamet
>  MGMT-FLT (ocf::pw:IPaddr): Started kamet
>  M-FLT (ocf::pw:IPaddr): Started kamet
>  M-FLT2 (ocf::pw:IPaddr): Started kamet
>  S-FLT (ocf::pw:IPaddr): Started kamet
>  S-FLT2 (ocf::pw:IPaddr): Started kamet
>
> I have a multi-state resource foo running in master/slave mode, and the
> IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
> colocation constraint for the IP addresses to be colocated with the
> master. I have also configured fencing, and when I unplug the redundancy
> interface, fencing is triggered correctly. However, once the fenced node
> (kamet) rejoins, I see that all my floating IP resources are deleted and
> the system is left in the state below. If I log into kamet, I can see
> that the floating IP addresses are actually still configured there.
>
> Based on the logs, the IP resources are marked unrunnable and later
> marked as orphaned.
>
>  Master/Slave Set: foo-master [foo]
>      Masters: [ orana ]
>      Slaves: [ kamet ]
>  fence-uc-orana (stonith:fence_ilo4): Started orana
>  fence-uc-kamet (stonith:fence_ilo4): Started orana
>
> CIB state post fencing of kamet:
> <cib crm_feature_set="3.0.9" epoch="72" num_updates="7"
>  validate-with="pacemaker-2.0" have-quorum="1" dc-uuid="orana">
[ClusterLabs] Fwd: Parallel adding of resources
Hi

I am running a 2 node cluster with this config on CentOS 6.6:

 Master/Slave Set: foo-master [foo]
     Masters: [ kamet ]
     Slaves: [ orana ]
 fence-uc-orana (stonith:fence_ilo4): Started kamet
 fence-uc-kamet (stonith:fence_ilo4): Started orana
 C-3 (ocf::pw:IPaddr): Started kamet
 C-FLT (ocf::pw:IPaddr): Started kamet
 C-FLT2 (ocf::pw:IPaddr): Started kamet
 E-3 (ocf::pw:IPaddr): Started kamet
 MGMT-FLT (ocf::pw:IPaddr): Started kamet
 M-FLT (ocf::pw:IPaddr): Started kamet
 M-FLT2 (ocf::pw:IPaddr): Started kamet
 S-FLT (ocf::pw:IPaddr): Started kamet
 S-FLT2 (ocf::pw:IPaddr): Started kamet

I have a multi-state resource foo running in master/slave mode, and the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a colocation constraint for the IP addresses to be colocated with the master. I have also configured fencing, and when I unplug the redundancy interface, fencing is triggered correctly. However, once the fenced node (kamet) rejoins, I see that all my floating IP resources are deleted and the system is left in the state below. If I log into kamet, I can see that the floating IP addresses are actually still configured there.

 Master/Slave Set: foo-master [foo]
     Masters: [ orana ]
     Slaves: [ kamet ]
 fence-uc-orana (stonith:fence_ilo4): Started orana
 fence-uc-kamet (stonith:fence_ilo4): Started orana

CIB state post fencing of kamet.

Attaching the full corosync.log from orana. The interesting parts of the log:

Jan 13 19:32:44 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 13 19:32:44 corosync [QUORUM] Members[2]: 1 2
Jan 13 19:32:44 corosync [QUORUM] Members[2]: 1 2
Jan 13 19:32:44 [4296] orana crmd: info: cman_event_callback: Membership 7044: quorum retained
Jan 13 19:32:44 [4296] orana crmd: notice: crm_update_peer_state: cman_event_callback: Node kamet[2] - state is now member (was lost)
Jan 13 19:32:44 [4296]
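In case the exact wiring matters, this is roughly how the resources and constraints are set up, sketched as pcs commands. The IP address, iLO address, and credentials below are placeholders rather than the real values, and ocf:pw:IPaddr is our modified copy of IPaddr2:

  # One floating IP (repeated for each of the addresses above),
  # colocated with the master role of foo-master
  pcs resource create C-FLT ocf:pw:IPaddr ip=192.0.2.10
  pcs constraint colocation add C-FLT with master foo-master INFINITY

  # Fence device for kamet, kept away from the node it fences
  pcs stonith create fence-uc-kamet fence_ilo4 ipaddr=198.51.100.2 \
      login=admin passwd=secret pcmk_host_list=kamet
  pcs constraint location fence-uc-kamet avoids kamet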
[ClusterLabs] Parallel adding of resources
Hi

I am running a 2 node cluster with this config on CentOS 6.6:

 Master/Slave Set: foo-master [foo]
     Masters: [ messi ]
     Stopped: [ ronaldo ]
 eth1-CP (ocf::pw:IPaddr): Started messi
 eth2-UP (ocf::pw:IPaddr): Started messi
 eth3-UPCP (ocf::pw:IPaddr): Started messi

I have a multi-state resource foo running in master/slave mode, and the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a colocation constraint for the IP addresses to be colocated with the master.

Now there are cases where I have multiple virtual IPs (around 20), and in these cases failover time increases substantially. Based on the logs, what I have observed is that the IPaddr resources are moved sequentially. Is this really the case? Is it possible to specify that they be added simultaneously, since none of them has any correlation with the others? If it is sequential, what is the reason behind it?

Thanks in advance.

Regards
Arjun
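P.S. While digging through the documentation I came across the batch-limit cluster property, which appears to cap how many actions the transition engine executes in parallel; resources with no ordering constraints between them should otherwise be free to start concurrently. If the sequential behaviour is just this throttle, something along these lines might be worth trying (a sketch only; batch-limit=0 is documented, at least in some versions, as meaning no fixed limit):

  # Show which cluster properties are currently set
  pcs property list

  # Remove the fixed cap on concurrent actions
  pcs property set batch-limit=0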
Re: [ClusterLabs] Cluster monitoring
Will have a look. Thanks
Arjun

On Wed, Oct 21, 2015 at 8:26 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> On 10/21/2015 08:24 AM, Michael Schwartzkopff wrote:
> > On Wednesday, 21 October 2015 at 18:50:15, Arjun Pandey wrote:
> >> Hi folks
> >>
> >> I had a question on monitoring of cluster events. Based on the
> >> documentation it seems that cluster monitor is the only method
> >> of monitoring the cluster events. Also since it seems to poll
> >> based on the interval configured it might miss some events. Is
> >> that the case?
> >
> > No. The cluster is event-based, so it won't miss any event. If you
> > use the cluster's tools, they see the events. If you monitor the
> > events you won't miss any either.
>
> FYI, Pacemaker 1.1.14 will have built-in handling of notification
> scripts, without needing a ClusterMon resource. These will be
> event-driven. Andrew Beekhof did a recent blog post about it:
> http://blog.clusterlabs.org/blog/2015/reliable-notifications/
>
> Pacemaker's monitors are polling, at the interval specified when
> configuring the monitor operation. Pacemaker relies on the resource
> agent to return status for monitors, so technically it's up to the
> resource agent whether it can "miss" brief outages that occur between
> polls. All the ones I've looked at would miss them, but generally
> that's considered acceptable if the service is once again fully
> working when the monitor runs (because it implies it recovered itself).
>
> Some people use an external monitoring system (nagios, icinga, zabbix,
> etc.) in addition to Pacemaker's monitors. They can complement each
> other, as the external system can check system parameters outside
> Pacemaker's view and can alert administrators to some early warning
> signs before a resource gets to the point of needing recovery. Of
> course such monitoring systems are also polling at configured intervals.
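If I am reading the blog post right, the 1.1.14 mechanism boils down to two new cluster properties, along these lines (property names as given in the post; the script path is a placeholder, and the script would receive event details in CRM_notify_* environment variables):

  pcs property set notification-agent=/usr/local/bin/cluster_notify.sh
  pcs property set notification-recipient=admin@example.com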
[ClusterLabs] Cluster monitoring
Hi folks

I had a question on monitoring of cluster events. Based on the documentation, it seems that ClusterMon is the only method of monitoring cluster events. Also, since it seems to poll based on the configured interval, it might miss some events. Is that the case? Is there any other alternative available? As of now I'm only looking at ClusterMon, which would be configured with an external program and the interval as part of the resource configuration.

Regards
Arjun
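P.S. For concreteness, the ClusterMon setup I am considering looks roughly like this. /usr/local/bin/notify.sh is a placeholder for the external program; crm_mon invokes it on each event with details passed in CRM_notify_* environment variables:

  # Clone it so a monitor instance runs on every node
  pcs resource create cluster-mon ocf:pacemaker:ClusterMon \
      extra_options="-E /usr/local/bin/notify.sh" \
      op monitor interval=10s --clone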
[ClusterLabs] Cluster node getting stopped from other node (resending mail)
Hi

I am running a 2 node cluster with this config on CentOS 6.5/6.6:

 Master/Slave Set: foo-master [foo]
     Masters: [ messi ]
     Stopped: [ ronaldo ]
 eth1-CP (ocf::pw:IPaddr): Started messi
 eth2-UP (ocf::pw:IPaddr): Started messi
 eth3-UPCP (ocf::pw:IPaddr): Started messi

I have a multi-state resource foo running in master/slave mode, and the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a colocation constraint for the IP addresses to be colocated with the master.

Sometimes when I set up the cluster, I find that one of the nodes (the second node to join) gets stopped, and I see these logs:

2015-06-01T13:55:46.153941+05:30 ronaldo pacemaker: Starting Pacemaker Cluster Manager
2015-06-01T13:55:46.233639+05:30 ronaldo attrd[25988]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
2015-06-01T13:55:46.234162+05:30 ronaldo crmd[25990]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
2015-06-01T13:55:46.234701+05:30 ronaldo attrd[25988]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
2015-06-01T13:55:46.234708+05:30 ronaldo attrd[25988]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)

This looks to be the likely reason:

2015-06-01T13:55:46.254310+05:30 ronaldo crmd[25990]: error: handle_request: We didn't ask to be shut down, yet our DC is telling us to.

2015-06-01T13:55:46.254577+05:30 ronaldo crmd[25990]: notice: do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE origin=route_message ]
2015-06-01T13:55:46.255134+05:30 ronaldo crmd[25990]: notice: lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown... waiting (2 ops remaining)

Based on the logs, pacemaker on the active node was stopping the secondary node every time it joined the cluster. This issue seems similar to:
http://pacemaker.oss.clusterlabs.narkive.com/rVvN8May/node-sends-shutdown-request-to-other-node-error

Packages used:
pacemaker-1.1.12-4.el6.x86_64
pacemaker-libs-1.1.12-4.el6.x86_64
pacemaker-cli-1.1.12-4.el6.x86_64
pacemaker-cluster-libs-1.1.12-4.el6.x86_64
pacemaker-debuginfo-1.1.10-14.el6.x86_64
pcsc-lite-libs-1.5.2-13.el6_4.x86_64
pcs-0.9.90-2.el6.centos.2.noarch
pcsc-lite-1.5.2-13.el6_4.x86_64
pcsc-lite-openct-0.6.19-4.el6.x86_64
corosync-1.4.1-17.el6.x86_64
corosynclib-1.4.1-17.el6.x86_64

Thanks in advance for your help

Regards
Arjun
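P.S. Following the linked thread, the first thing I plan to check is whether both nodes agree on node names and IDs, since a stale or mismatched node entry apparently can make the DC shut down the joining peer. Roughly, on each node (cman/corosync 1.x commands, as this is CentOS 6):

  # This node's own name, and the members of the current partition
  crm_node --name
  crm_node --partition

  # cman's membership view
  cman_tool nodes

  # The <nodes> section the DC is working from
  cibadmin --query --scope nodes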