Re: [ClusterLabs] All IP resources deleted once a fenced node rejoins

2016-01-15 Thread Arjun Pandey
 active devices)
Jan 13 19:33:00 [4292] orana stonith-ng: info:
stonith_device_remove: Device 'E-3' not found (2 active devices)
Jan 13 19:33:00 [4292] orana stonith-ng: info:
stonith_device_remove: Device 'MGMT-FLT' not found (2 active devices)
Jan 13 19:33:00 [4292] orana stonith-ng: info:
stonith_device_remove: Device 'M-FLT' not found (2 active devices)
Jan 13 19:33:00 [4292] orana stonith-ng: info:
stonith_device_remove: Device 'M-FLT2' not found (2 active devices)
Jan 13 19:33:00 [4292] orana stonith-ng: info:
stonith_device_remove: Device 'S-FLT' not found (2 active devices)
Jan 13 19:33:00 [4292] orana stonith-ng: info:
stonith_device_remove: Device 'S-FLT2' not found (2 active devices)
Jan 13 19:33:00 [4292] orana stonith-ng: info:
update_cib_stonith_devices_v2: Updating device list from the cib:
modify nvpair[@id='fence-uc-orana-instance_attributes-delay']
Jan 13 19:33:00 [4292] orana stonith-ng: info: cib_devices_update:
Updating devices to version 0.75.0
Jan 13 19:33:00 [4292] orana stonith-ng:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Jan 13 19:33:00 [4292] orana stonith-ng: info: unpack_nodes:
Creating a fake local node
Jan 13 19:33:00 [4292] orana stonith-ng: info:
stonith_device_register: Overwriting an existing entry for
fence-uc-orana from the cib
Jan 13 19:33:00 [4292] orana stonith-ng:   notice:
stonith_device_register: Added 'fence-uc-orana' to the device list (2
active devices)
Jan 13 19:33:00 [4291] oranacib: info: write_cib_contents:
Archived previous version as /var/lib/pacemaker/cib/cib-85.raw
Jan 13 19:33:00 [4291] oranacib: info: write_cib_contents:
Wrote version 0.75.0 of the CIB to disk (digest:
4fb4a8ef2f8cde3a07fb30eb706e7e9c)
Jan 13 19:33:00 [4291] oranacib: info: retrieveCib:
Reading cluster configuration from: /var/lib/pacemaker/cib/cib.HyXAc3
(digest: /var/lib/pacemaker/cib/cib.lOa3UL)
Jan 13 19:33:01 [4291] oranacib: info: cib_perform_op:
Diff: --- 0.75.0 2
Jan 13 19:33:01 [4291] oranacib: info: cib_perform_op:
Diff: +++ 0.76.0 (null)
Jan 13 19:33:01 [4291] oranacib: info: cib_perform_op: +
/cib:  @epoch=76
Jan 13 19:33:01 [4291] oranacib: info: cib_perform_op: ++
/cib/configuration/constraints:  
Jan 13 19:33:01 [4291] oranacib: info:
cib_process_request: Completed cib_replace operation for section
configuration: OK (rc=0, origin=kamet/cibadmin/2, version=0.76.0)
Jan 13 19:33:01 [4291] oranacib: info: write_cib_contents:
Archived previous version as /var/lib/pacemaker/cib/cib-86.raw
Jan 13 19:33:01 [4291] oranacib: info: write_cib_contents:
Wrote version 0.76.0 of the CIB to disk (digest:
df07ff6cbef5a35891d43b89b9ba4371)
Jan 13 19:33:01 [4291] oranacib: info: retrieveCib:
Reading cluster configuration from: /var/lib/pacemaker/cib/cib.Pi8ov3
(digest: /var/lib/pacemaker/cib/cib.kYeIwM)
Jan 13 19:33:01 [4291] oranacib: info: cib_perform_op:
Diff: --- 0.76.0 2


Any pointers would be helpful.

Thanks
Arjun
On Thu, Jan 14, 2016 at 12:48 PM, Arjun Pandey <apandepub...@gmail.com> wrote:
> Hi
>
> I am running a 2-node cluster with this config on CentOS 6.6
>
> Master/Slave Set: foo-master [foo]
>Masters: [ kamet ]
>Slaves: [ orana ]
> fence-uc-orana (stonith:fence_ilo4): Started kamet
> fence-uc-kamet (stonith:fence_ilo4): Started orana
> C-3 (ocf::pw:IPaddr): Started kamet
> C-FLT (ocf::pw:IPaddr): Started kamet
> C-FLT2 (ocf::pw:IPaddr): Started kamet
> E-3 (ocf::pw:IPaddr): Started kamet
> MGMT-FLT (ocf::pw:IPaddr): Started kamet
> M-FLT (ocf::pw:IPaddr): Started kamet
> M-FLT2 (ocf::pw:IPaddr): Started kamet
> S-FLT (ocf::pw:IPaddr): Started kamet
> S-FLT2 (ocf::pw:IPaddr): Started kamet
>
>
> where I have a multi-state resource foo running in master/slave mode,
> and the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
> colocation constraint for each IP address to be colocated with the master.
> I have also configured fencing, and when I unplug the redundancy
> interface, fencing is triggered correctly. However, once the fenced
> node (kamet) rejoins, I see that all my floating IP resources are
> deleted and the system remains in this state. Also, if I log into kamet,
> I see that the floating IP addresses are actually still present there.
>
> Based on the logs, the IP resources are marked unrunnable and later
> marked as orphaned.
>
>
> Master/Slave Set: foo-master [foo]
>Masters: [ orana ]
>Slaves: [ kamet ]
> fence-uc-orana (stonith:fence_ilo4): Started orana
> fence-uc-kamet (stonith:fence_ilo4): Started orana
>
> CIB state post fencing of kamet:
> <cib crm_feature_set="3.0.9" epoch="72" num_updates="7"
> validate-with="pacemaker-2.0" have-quorum="1" dc-uuid="orana">
> (remainder of the CIB stripped by the list archive)

[ClusterLabs] Fwd: Parallel adding of resources

2016-01-13 Thread Arjun Pandey
Hi

I am running a 2-node cluster with this config on CentOS 6.6

Master/Slave Set: foo-master [foo]
   Masters: [ kamet ]
   Slaves: [ orana ]
fence-uc-orana (stonith:fence_ilo4): Started kamet
fence-uc-kamet (stonith:fence_ilo4): Started orana
C-3 (ocf::pw:IPaddr): Started kamet
C-FLT (ocf::pw:IPaddr): Started kamet
C-FLT2 (ocf::pw:IPaddr): Started kamet
E-3 (ocf::pw:IPaddr): Started kamet
MGMT-FLT (ocf::pw:IPaddr): Started kamet
M-FLT (ocf::pw:IPaddr): Started kamet
M-FLT2 (ocf::pw:IPaddr): Started kamet
S-FLT (ocf::pw:IPaddr): Started kamet
S-FLT2 (ocf::pw:IPaddr): Started kamet


where I have a multi-state resource foo running in master/slave mode,
and the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
colocation constraint for each IP address to be colocated with the master.
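
(For illustration, a colocation constraint of that kind is normally
expressed in the CIB roughly as in the sketch below; the constraint ID is
made up here, and C-3 stands in for any one of the floating IPs listed
above.)

<constraints>
  <!-- keep the floating IP on whichever node holds the Master role of foo-master -->
  <rsc_colocation id="colo-C-3-with-foo-master"
                  rsc="C-3"
                  with-rsc="foo-master"
                  with-rsc-role="Master"
                  score="INFINITY"/>
</constraints>
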
I have also configured fencing, and when I unplug the redundancy
interface, fencing is triggered correctly. However, once the fenced
node (kamet) rejoins, I see that all my floating IP resources are
deleted and the system remains in this state. Also, if I log into kamet,
I see that the floating IP addresses are actually still present there.
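
(For context, a fence_ilo4 stonith primitive such as fence-uc-orana is
typically defined along the lines of the sketch below; the parameter
names are the usual fence_ilo4/stonith ones, and the address, credentials
and IDs are placeholders, not the actual configuration.)

<primitive id="fence-uc-orana" class="stonith" type="fence_ilo4">
  <instance_attributes id="fence-uc-orana-instance_attributes">
    <!-- iLO address and credentials: placeholder values -->
    <nvpair id="fence-uc-orana-ia-ipaddr" name="ipaddr" value="192.0.2.10"/>
    <nvpair id="fence-uc-orana-ia-login"  name="login"  value="admin"/>
    <nvpair id="fence-uc-orana-ia-passwd" name="passwd" value="secret"/>
    <!-- restrict this device to fencing the node orana -->
    <nvpair id="fence-uc-orana-ia-pcmk_host_list" name="pcmk_host_list" value="orana"/>
    <!-- static delay, corresponding to the 'delay' attribute seen in the logs -->
    <nvpair id="fence-uc-orana-ia-delay" name="delay" value="5"/>
  </instance_attributes>
  <operations>
    <op id="fence-uc-orana-monitor-60s" name="monitor" interval="60s"/>
  </operations>
</primitive>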

Master/Slave Set: foo-master [foo]
   Masters: [ orana ]
   Slaves: [ kamet ]
fence-uc-orana (stonith:fence_ilo4): Started orana
fence-uc-kamet (stonith:fence_ilo4): Started orana

CIB state post fencing of kamet:
(the CIB XML was stripped by the list archive and is not preserved here)

Attaching full corosync.log from orana.

Mentioning the interesting parts in the log here.

Jan 13 19:32:44 corosync [TOTEM ] A processor joined or left the
membership and a new membership was formed.
Jan 13 19:32:44 corosync [QUORUM] Members[2]: 1 2
Jan 13 19:32:44 corosync [QUORUM] Members[2]: 1 2
Jan 13 19:32:44 [4296] orana   crmd: info:
cman_event_callback: Membership 7044: quorum retained
Jan 13 19:32:44 [4296] orana   crmd:   notice:
crm_update_peer_state: cman_event_callback: Node kamet[2] - state is
now member (was lost)
Jan 13 19:32:44 [4296] 

[ClusterLabs] Parallel adding of resources

2016-01-07 Thread Arjun Pandey
Hi

I am running a 2-node cluster with this config on CentOS 6.6

Master/Slave Set: foo-master [foo]
Masters: [ messi ]
Stopped: [ ronaldo ]
 eth1-CP(ocf::pw:IPaddr):   Started messi
 eth2-UP(ocf::pw:IPaddr):   Started messi
 eth3-UPCP  (ocf::pw:IPaddr):   Started messi

where I have a multi-state resource foo running in master/slave mode,
and the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
colocation constraint for each IP address to be colocated with the master.

Now there are cases where I have multiple virtual IPs (around 20), and
the failover time increases substantially in these cases. Based on the
logs, what I have observed is that the IPaddr resources are moved
sequentially. Is this really the case? Also, is it possible to specify
that they can be added simultaneously, since none of them has any sort
of correlation with the others?

If it's sequential, what is the reason behind it?
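
(As an aside: within a single transition Pacemaker can execute independent
actions in parallel, bounded by the batch-limit cluster property; a minimal
sketch of setting it in the CIB is below. The property name is standard
Pacemaker, the value is only an example, and whether it helps here depends
on what is actually serialising the IP starts.)

<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <!-- upper bound on the number of actions run in parallel per transition -->
    <nvpair id="cib-bootstrap-options-batch-limit"
            name="batch-limit" value="30"/>
  </cluster_property_set>
</crm_config>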


Thanks in advance.

Regards
Arjun

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Cluster monitoring

2015-10-23 Thread Arjun Pandey
Will have a look.

Thanks
Arjun

On Wed, Oct 21, 2015 at 8:26 PM, Ken Gaillot <kgail...@redhat.com> wrote:

> On 10/21/2015 08:24 AM, Michael Schwartzkopff wrote:
> > On Wednesday, 21 October 2015 at 18:50:15, Arjun Pandey wrote:
> >> Hi folks
> >>
> >> I had a question on monitoring of cluster events. Based on the
> >> documentation it seems that cluster monitor is the only method
> >> of monitoring the cluster events. Also since it seems to poll
> >> based on the interval configured it might miss some events. Is
> >> that the case ?
> >
> > No. The cluster is event-based, so it won't miss any event. If you
> > use the cluster's tools, they see the events. If you monitor the
> > events, you won't miss any either.
>
> FYI, Pacemaker 1.1.14 will have built-in handling of notification
> scripts, without needing a ClusterMon resource. These will be
> event-driven. Andrew Beekhof did a recent blog post about it:
> http://blog.clusterlabs.org/blog/2015/reliable-notifications/
>
> Pacemaker's monitors are polling, at the interval specified when
> configuring the monitor operation. Pacemaker relies on the resource
> agent to return status for monitors, so technically it's up to the
> resource agent whether it can "miss" brief outages that occur between
> polls. All the ones I've looked at would miss them, but generally
> that's considered acceptable if the service is once again fully
> working when the monitor runs (because it implies it recovered itself).
>
> Some people use an external monitoring system (nagios, icinga, zabbix,
> etc.) in addition to Pacemaker's monitors. They can complement each
> other, as the external system can check system parameters outside
> Pacemaker's view and can alert administrators for some early warning
> signs before a resource gets to the point of needing recovery. Of
> course such monitoring systems are also polling at configured intervals.
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Cluster monitoring

2015-10-21 Thread Arjun Pandey
Hi folks

I had a question on monitoring of cluster events. Based on the
documentation, it seems that ClusterMon is the only method of
monitoring cluster events. Also, since it seems to poll at the
configured interval, it might miss some events. Is that the case?

Is there any other alternative available?
As of now I'm only looking at ClusterMon, which would be configured with
an external program and the interval as part of the resource configuration.
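
(For illustration, a ClusterMon resource of that kind is usually set up
roughly as in the sketch below; the notification script path and all IDs
are hypothetical, and the exact crm_mon options available depend on the
installed version.)

<clone id="cluster-mon-clone">
  <primitive id="cluster-mon" class="ocf" provider="pacemaker" type="ClusterMon">
    <instance_attributes id="cluster-mon-ia">
      <!-- -E hands each cluster event to an external notification script -->
      <nvpair id="cluster-mon-ia-extra_options" name="extra_options"
              value="-E /usr/local/bin/cluster_notify.sh"/>
    </instance_attributes>
    <operations>
      <!-- polling interval for the ClusterMon resource itself -->
      <op id="cluster-mon-monitor-10s" name="monitor" interval="10s"/>
    </operations>
  </primitive>
</clone>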


Regards
Arjun
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Cluster node getting stopped from other node(resending mail)

2015-06-30 Thread Arjun Pandey
Hi

I am running a 2-node cluster with this config on CentOS 6.5/6.6

Master/Slave Set: foo-master [foo]
Masters: [ messi ]
Stopped: [ ronaldo ]
 eth1-CP(ocf::pw:IPaddr):   Started messi
 eth2-UP(ocf::pw:IPaddr):   Started messi
 eth3-UPCP  (ocf::pw:IPaddr):   Started messi

where I have a multi-state resource foo running in master/slave mode, and
the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
colocation constraint for each IP address to be colocated with the master.

Sometimes when I set up the cluster, I find that one of the nodes (the
second node that joins) gets stopped, and I see this log.

2015-06-01T13:55:46.153941+05:30 ronaldo pacemaker: Starting Pacemaker
Cluster Manager
2015-06-01T13:55:46.233639+05:30 ronaldo attrd[25988]:   notice:
attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
2015-06-01T13:55:46.234162+05:30 ronaldo crmd[25990]:   notice:
do_state_transition: State transition S_PENDING -> S_NOT_DC [
input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
2015-06-01T13:55:46.234701+05:30 ronaldo attrd[25988]:   notice:
attrd_local_callback: Sending full refresh (origin=crmd)
2015-06-01T13:55:46.234708+05:30 ronaldo attrd[25988]:   notice:
attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
 This looks to be the likely
reason***
2015-06-01T13:55:46.254310+05:30 ronaldo crmd[25990]:error:
handle_request: We didn't ask to be shut down, yet our DC is telling us too
.
*

2015-06-01T13:55:46.254577+05:30 ronaldo crmd[25990]:   notice:
do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP
cause=C_HA_MESSAGE origin=route_message ]
2015-06-01T13:55:46.255134+05:30 ronaldo crmd[25990]:   notice:
lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown...
waiting (2 ops remaining)

Based on the logs, Pacemaker on the active node was stopping the secondary
node every time it joined the cluster. This issue seems similar to
http://pacemaker.oss.clusterlabs.narkive.com/rVvN8May/node-sends-shutdown-request-to-other-node-error

Packages used :-
pacemaker-1.1.12-4.el6.x86_64
pacemaker-libs-1.1.12-4.el6.x86_64
pacemaker-cli-1.1.12-4.el6.x86_64
pacemaker-cluster-libs-1.1.12-4.el6.x86_64
pacemaker-debuginfo-1.1.10-14.el6.x86_64
pcsc-lite-libs-1.5.2-13.el6_4.x86_64
pcs-0.9.90-2.el6.centos.2.noarch
pcsc-lite-1.5.2-13.el6_4.x86_64
pcsc-lite-openct-0.6.19-4.el6.x86_64
corosync-1.4.1-17.el6.x86_64
corosynclib-1.4.1-17.el6.x86_64



Thanks in advance for your help

Regards
Arjun
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org