Re: [ClusterLabs] Master-Slave resource Restarted after configuration change

2016-06-29 Thread Ken Gaillot
On 06/29/2016 01:35 PM, Ilia Sokolinski wrote:
> 
>>
>> I'm not sure there's a way to do this.
>>
>> If a (non-reloadable) parameter changes, the entire clone does need a
>> restart, so the cluster will want all instances to be stopped, before
>> proceeding to start them all again.
>>
>> Your desired behavior couldn't be the default, because not all services
>> would be able to function correctly with a running master using
>> different configuration options than running slaves. In fact, I think it
>> would be rare; consider a typical option for a TCP port -- changing the
>> port in only the slaves would break communication with the master and
>> potentially lead to data inconsistency.
>>
>> Can you give an example of an option that could be handled this way
>> without causing problems?
>>
>> Reload could be a way around this, but not in the way you suggest. If
>> your service really does need to restart after the option change, then
>> reload is not appropriate. However, if you can approach the problem on
>> the application side, and make it able to accept the change without
>> restarting, then you could implement it as a reload in the agent.
>>
> 
> Ken,
> 
> I see what you are saying.
> The parameter we are changing is the Docker image version, so it is not
> possible to reload it without a restart.
> 
> Couple of questions:
> What is a reloadable vs. non-reloadable parameter? Is it the same as
> unique="0" vs. unique="1"?
> We currently set unique="0".

Yes, the cluster considers any parameter with unique=0 as reloadable, if
the resource agent supports the reload action.
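
For example, a reload-capable agent declares the parameter with
unique="0" and advertises a reload action in its metadata. A minimal
sketch of the relevant fragments (the "loglevel" parameter name is made
up for illustration):

    <!-- "loglevel" is a hypothetical reloadable parameter -->
    <parameter name="loglevel" unique="0">
      <content type="string" default="info"/>
    </parameter>

    <actions>
      <action name="start"   timeout="20s"/>
      <action name="stop"    timeout="20s"/>
      <action name="monitor" timeout="20s" interval="10s"/>
      <action name="reload"  timeout="20s"/>
    </actions>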

> When doing repeated experiments, I see that sometimes both Master and Slave
> are reloaded, but sometimes one of them is restarted.
> 
> Why is that?

Good question. I would expect all or no instances of the same clone to
be reloaded.

An otherwise reloadable change may get a restart if a non-reloadable
parameter is changing at the same time. Also, if the reloadable resource
is ordered after another resource that is being restarted, it will get a
restart.
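
For example, with an ordering such as this (resource names are
hypothetical):

    pcs constraint order start db-clone then start app-clone

an otherwise reloadable change to app-clone would still become a restart
whenever db-clone happens to be restarting in the same transition.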

As an aside, I'm not happy with the current implementation of reload.
Using "unique" to determine reloadability was not a good choice; it
should be a separate attribute. More importantly, there's a fundamental
misunderstanding between pacemaker's use of reload and how most resource
agent writers interpret it -- pacemaker calls it when a resource
parameter in the pacemaker configuration changes, but most RAs use it
for a service's native reload of its own configuration file. Those two
use cases need to be separated.
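
If you can make a change reloadable on the application side, the agent's
reload should apply the new OCF_RESKEY_* values to the running service,
rather than re-reading the service's own config file. A rough sketch in
a shell agent, assuming the usual ocf-shellfuncs have been sourced (the
"myapp-ctl" command and "loglevel" parameter are hypothetical):

    myagent_reload() {
        # Push the changed Pacemaker-managed parameter into the running
        # service without restarting it, via the application's own
        # control interface (made up here)
        myapp-ctl set-loglevel "$OCF_RESKEY_loglevel" || return $OCF_ERR_GENERIC
        return $OCF_SUCCESS
    }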

> I looked at the source code allocate.c:check_action_definition(), and it
> seems that there is a meta parameter called "isolation" which affects the
> reload vs. restart decision.
> 
> I can't find any documentation about this "isolation" meta parameter.
> Do you know what it is intended for?

That is a great feature that, unfortunately, completely lacks
documentation and testing. It's a way to run cluster-managed services
inside a Docker container. Documentation/testing are on the to-do list,
but it's a long list ...
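
From a quick look at the source, it appears to be controlled through
resource meta-attributes; something like the line below, though since
the feature is undocumented, treat the attribute name and wrapper path
as unverified:

    pcs resource meta my-rsc \
        isolation-wrapper=/usr/lib/ocf/resource.d/.isolation/docker-wrapper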

> Thanks a lot
> 
> Ilia



Re: [ClusterLabs] Master-Slave resource Restarted after configuration change

2016-06-10 Thread Ferenc Wágner
Ilia Sokolinski writes:

> We have a custom Master-Slave resource running on a 3-node pcs cluster on
> CentOS 7.1
>
> As part of what is supposed to be an NDU (non-disruptive update), we
> update some properties of the resource.
> For some reason this causes both the Master and Slave instances of the
> resource to be restarted.
>
> Since a restart takes a fairly long time for us, the update becomes very
> disruptive.
>
> Is this expected?

Yes, if you changed a parameter declared with unique="1" in your resource
agent metadata.
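
You can check what your agent declares by dumping its metadata and
looking at the unique flags, e.g. (substitute your own agent's
class:provider:type):

    crm_resource --show-metadata ocf:heartbeat:IPaddr2 | grep unique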

> We have not seen this behavior with the previous release of pacemaker.

I'm surprised...
-- 
Feri



[ClusterLabs] Master-Slave resource Restarted after configuration change

2016-06-09 Thread Ilia Sokolinski
Hi,

We have a custom Master-Slave resource running on a 3-node pcs cluster on
CentOS 7.1

As part of what is supposed to be an NDU (non-disruptive update), we
update some properties of the resource.
For some reason this causes both the Master and Slave instances of the
resource to be restarted.

Since a restart takes a fairly long time for us, the update becomes very
disruptive.

Is this expected?
We have not seen this behavior with the previous release of pacemaker.
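
The update itself is a plain single-parameter resource update, e.g.
(the parameter name and value here are placeholders):

    pcs resource update L3 image=registry.example.com/l3:new-tag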


Jun 10 02:06:11 dev-ceph02 crmd[30570]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jun 10 02:06:11 dev-ceph02 attrd[30568]: notice: Updating all attributes after cib_refresh_notify event
Jun 10 02:06:11 dev-ceph02 crmd[30570]: notice: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
Jun 10 02:06:11 dev-ceph02 crmd[30570]: warning: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION

Jun 10 02:06:12 dev-ceph02 pengine[30569]: notice: Restart L3:0 (Master d-l303-a.dev-bos.csdops.net)
Jun 10 02:06:12 dev-ceph02 pengine[30569]: notice: Restart L3:1 (Slave d-l303-b.dev-bos.csdops.net)
Jun 10 02:06:12 dev-ceph02 pengine[30569]: notice: Calculated Transition 4845: /var/lib/pacemaker/pengine/pe-input-2934.bz2
Jun 10 02:06:12 dev-ceph02 crmd[30570]: notice: Initiating action 63: demote L3_demote_0 on d-l303-a.dev-bos.csdops.net
Jun 10 02:06:14 dev-ceph02 crmd[30570]: notice: Initiating action 64: stop L3_stop_0 on d-l303-a.dev-bos.csdops.net
Jun 10 02:06:14 dev-ceph02 crmd[30570]: notice: Initiating action 66: stop L3_stop_0 on d-l303-b.dev-bos.csdops.net
Jun 10 02:06:15 dev-ceph02 crmd[30570]: notice: Initiating action 17: start L3_start_0 on d-l303-a.dev-bos.csdops.net
Jun 10 02:06:15 dev-ceph02 crmd[30570]: notice: Initiating action 18: start L3_start_0 on d-l303-b.dev-bos.csdops.net


Here is the cluster configuration:

pcs status
Cluster name: L3_cluster
Last updated: Fri Jun 10 03:17:31 2016  Last change: Fri Jun 10 02:06:11 2016 by root via cibadmin on d-l303-a.dev-bos.csdops.net
Stack: corosync
Current DC: dev-ceph02.dev-bos.csdops.net (version 1.1.13-a14efad) - partition with quorum
3 nodes and 12 resources configured

Online: [ d-l303-a.dev-bos.csdops.net d-l303-b.dev-bos.csdops.net dev-ceph02.dev-bos.csdops.net ]

Full list of resources:

 idrac-d-l303-b.dev-bos.csdops.net  (stonith:fence_idrac):  Started dev-ceph02.dev-bos.csdops.net
 idrac-d-l303-a.dev-bos.csdops.net  (stonith:fence_idrac):  Started d-l303-b.dev-bos.csdops.net
 noop-dev-ceph02.dev-bos.csdops.net (stonith:fence_noop):   Started d-l303-a.dev-bos.csdops.net
 L3-5bb92-0-ip  (ocf::heartbeat:IPaddr2):   Started d-l303-a.dev-bos.csdops.net
 Master/Slave Set: L3-5bb92-0-master [L3-5bb92-0]
     Masters: [ d-l303-a.dev-bos.csdops.net ]
     Slaves: [ d-l303-b.dev-bos.csdops.net ]
 L3-86a2c-1-ip  (ocf::heartbeat:IPaddr2):   Started d-l303-b.dev-bos.csdops.net
 Master/Slave Set: L3-86a2c-1-master [L3-86a2c-1]
     Masters: [ d-l303-b.dev-bos.csdops.net ]
     Slaves: [ d-l303-a.dev-bos.csdops.net ]
 L3-ip  (ocf::heartbeat:IPaddr2):   Started d-l303-a.dev-bos.csdops.net
 Master/Slave Set: L3-master [L3]
     Masters: [ d-l303-a.dev-bos.csdops.net ]
     Slaves: [ d-l303-b.dev-bos.csdops.net ]

PCSD Status:
  d-l303-b.dev-bos.csdops.net: Online
  d-l303-a.dev-bos.csdops.net: Online
  dev-ceph02.dev-bos.csdops.net: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Rpms:

pcs-0.9.137-13.el7_1.4.x86_64
pacemaker-cluster-libs-1.1.12-22.el7_1.4.x86_64
pacemaker-cli-1.1.12-22.el7_1.4.x86_64
pacemaker-libs-1.1.12-22.el7_1.4.x86_64
pacemaker-1.1.12-22.el7_1.4.x86_64


Thanks a lot

Ilia Sokolinski