Re: [openstack-dev] [TripleO][Heat][Kolla][Magnum] The zen of Heat, containers, and the future of TripleO

2016-04-03 Thread Andrew Beekhof
On Tue, Mar 29, 2016 at 6:02 AM, Dan Prince  wrote:

[...]

> That said, regardless of what we eventually do with Pacemaker or Puppet,
> it should be feasible for them both to co-exist.

The key thing to keep in mind if you're using Puppet to build a
cluster is that if you're doing something to a service that is, or will
be, managed by the cluster, then either:

- the service should not be part of the cluster at that time,
- the cluster needs to be told to ignore the service temporarily, or
- the act of taking the service down or bringing it up needs to be
done via the cluster tools

NOT doing one of those, i.e. telling the cluster "here's a service, make
sure it's available" and then screwing around with it, puts the cluster
and Puppet into conflict (essentially an internal split-brain) that
rarely ends well.
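One way to realize the second option above, sketched with the pcs CLI (the resource name p_haproxy is purely illustrative, not from the thread):

```shell
# Tell Pacemaker to stop managing the resource while Puppet reconfigures it.
pcs resource unmanage p_haproxy

# ... Puppet run restarts/reconfigures the underlying service here ...

# Hand control back to the cluster once Puppet is done.
pcs resource manage p_haproxy

# Alternatively, put the whole cluster into maintenance mode for the duration:
pcs property set maintenance-mode=true
# ... Puppet run ...
pcs property set maintenance-mode=false
```

Either approach avoids the split-brain: the cluster knows it should not react while Puppet is touching the service.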

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2016-03-18 Thread Andrew Beekhof
On Tue, Feb 16, 2016 at 2:58 AM, Bogdan Dobrelya 
wrote:

> Hello!
> A quick status update inline:
>

[snip]


> So, what's next?
>
> - I'm open to merging both of the existing OCF RA solutions [5], [6],
> as proposed by Andrew Beekhof. Let's make it happen.
>

Great :-)
Oyvind (CC'd) is the relevant contact from our side, should he talk to you
or someone else?


>
> - It would be nice to add a Travis CI based gate for the upstream
> rabbitmq-server's HA OCF RA. For now, it relies on Fuel CI gates and
> manual testing with Atlas boxes.
>
> - Please also consider Travis or a similar CI for the resource-agents'
> rabbit-cluster OCF RA as well.
>
> [1] https://github.com/bogdando/rabbitmq-cluster-ocf-vagrant
> [2] https://github.com/bogdando/packer-atlas-example
> [3] https://hub.docker.com/r/bogdando/rabbitmq-cluster-ocf/
> [4] https://hub.docker.com/r/bogdando/rabbitmq-cluster-ocf-wily/
> [5]
>
> https://github.com/rabbitmq/rabbitmq-server/blob/master/scripts/rabbitmq-server-ha.ocf
> [6]
>
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
>
> >
> > I'm also planning to refer to this official RabbitMQ cluster setup guide in
> > the OpenStack HA guide as well [2].
>
> Done, see [7]
>
> [7] http://docs.openstack.org/ha-guide/controller-ha-rabbitmq.html
>
> >
> > PS. Original rabbitmq-users mail thread is here [3].
> > [openstack-operators] cross posted as well.
> >
> > [0] http://www.rabbitmq.com/pacemaker.html
> > [1] https://atlas.hashicorp.com/bogdando/boxes/rabbitmq-cluster-ocf
> > [2] https://bugs.launchpad.net/openstack-manuals/+bug/1497528
> > [3] https://groups.google.com/forum/#!topic/rabbitmq-users/BnoIQJb34Ao
> >
>
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
>


Re: [openstack-dev] [Fuel] HA cluster disk monitoring, failover and recovery

2015-11-17 Thread Andrew Beekhof

> On 18 Nov 2015, at 4:52 AM, Alex Schultz  wrote:
> 
> On Tue, Nov 17, 2015 at 11:12 AM, Vladimir Kuklin  
> wrote:
>> Bogdan
>> 
>> I think we should first check whether attribute deletion leads to the node
>> starting its services or not. From what I read in the official Pacemaker
>> documentation, it should work out of the box without the need to restart the
>> node.
> 
> It does start up the services when the attribute is cleared. QA has a
> test to validate this as part of this change.
> 
>> And by the way, the quote above mentions 'use ONE of the following methods',
>> meaning that we could actually use attribute deletion. The 2nd and the 3rd
>> options do the same - they clear a short-lived node attribute. So we need to
>> figure out why the OCF script does not update the corresponding attribute by
>> itself.
>> 
> 
> https://github.com/ClusterLabs/pacemaker/blob/master/extra/resources/SysInfo#L215-L227
> 
> It doesn't have something that updates it to green because essentially
> when this condition hits, the sysinfo service is also stopped. It has
> no way of knowing when it is cleared because all the resources are
> stopped and there is no longer a service running to reset the
> attribute.

There needs to be a way to mark cluster resources (specifically the sysinfo 
one) as being immune to the “red” condition.
Alas it hasn’t bubbled up the priority list yet.  Complaining definitely helps 
make it more visible :)

In the short-term, a cron job that called the agent would probably do the trick.
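In that spirit, a minimal sketch of such a cron-driven stop-gap, assuming a 512 MB threshold on the root filesystem and using the crm node status-attr command from the SUSE docs (paths and the threshold are illustrative):

```shell
#!/bin/sh
# Stop-gap: clear the transient #health_disk attribute once free space
# recovers, so the node becomes eligible for resources again.
# 512 MB mirrors the threshold discussed in this thread; adjust as needed.
THRESHOLD_MB=512
MOUNT_POINT=/

# Print the free megabytes on a mount point as a bare number.
free_mb() {
    df --output=avail -BM "$1" | tail -n 1 | tr -dc '0-9'
}

if [ "$(free_mb "$MOUNT_POINT")" -gt "$THRESHOLD_MB" ]; then
    # Guarded so the sketch is harmless on hosts where the crm shell is absent.
    if command -v crm >/dev/null 2>&1; then
        crm node status-attr "$(uname -n)" delete "#health_disk"
    fi
fi
```

Dropped into /etc/cron.d with a one-minute period, this would bound the recovery delay without restarting pacemaker.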

>  We would need something outside of pacemaker to mark it OK
> or perhaps write a custom health strategy[0][1] that would not stop
> the sysinfo task and update the ocf script to update the status to
> green if all disks are OK.
> 
> -Alex
> 
> [0] 
> https://github.com/openstack/fuel-library/blob/master/deployment/puppet/cluster/manifests/sysinfo.pp#L50-L55
> [1] http://clusterlabs.org/wiki/SystemHealth
> 
>> 
>> 
>> On Tue, Nov 17, 2015 at 7:03 PM, Bogdan Dobrelya 
>> wrote:
>>> 
>>> On 17.11.2015 15:28, Kyrylo Galanov wrote:
 Hi Team,
>>> 
>>> Hello
>>> 
 
 I have been testing fail-over after free disk space drops below 512 MB.
 (https://review.openstack.org/#/c/240951/)
 The affected node is stopped correctly and services migrate to a healthy
 node.
 
 However, after free disk space rises above 512 MB again, the node does
 not recover its state to operating. Moreover, starting the resources
 manually fails as well. In a nutshell, the pacemaker service / node
 has to be restarted. Detailed information is available
 here:
 https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_configuration_basics_monitor_health.html
 
 How do we address this issue?
>>> 
>>> According to the docs you provided,
>>> " After a node's health status has turned to red, solve the issue that
>>> led to the problem. Then clear the red status to make the node eligible
>>> again for running resources. Log in to the cluster node and use one of
>>> the following methods:
>>> 
>>>Execute the following command:
>>> 
>>>crm node status-attr NODE delete #health_disk
>>> 
>>>Restart OpenAIS on that node.
>>> 
>>>Reboot the node.
>>> 
>>> The node will be returned to service and can run resources again. "
>>> 
>>> So this looks like an expected behaviour!
>>> 
>>> What else could be done:
>>> - We should check if we have this nuance documented, and submit a bug to
>>> fuel-docs team, if not yet there.
>>> - Submitting a bug and inspecting logs would be nice to do as well.
>>> I believe some optimizations may be done, bearing in mind this pacemaker
>>> cluster-recheck-interval and failure-timeout story [0].
>>> 
>>> [0]
>>> http://blog.kennyrasschaert.be/blog/2013/12/18/pacemaker-high-failability/
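The cluster-recheck-interval and failure-timeout tuning referenced in [0] can be sketched as follows (the values and the p_sysinfo resource name are illustrative, not recommendations):

```shell
# Re-evaluate cluster state every minute instead of the 15-minute default,
# so expired failures and cleared attributes are noticed reasonably quickly.
pcs property set cluster-recheck-interval=60s

# Forget a resource's failures after 10 minutes, letting Pacemaker retry it
# on a node where it previously failed.
pcs resource update p_sysinfo meta failure-timeout=600s
```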
>>> 
 
 
 Best regards,
 Kyrylo
 
 
 
 
>>> 
>>> 
>>> --
>>> Best regards,
>>> Bogdan Dobrelya,
>>> Irc #bogdando
>>> 
>> 
>> 
>> 
>> 
>> --
>> Yours Faithfully,
>> Vladimir Kuklin,
>> Fuel Library Tech Lead,
>> Mirantis, Inc.
>> +7 (495) 640-49-04
>> +7 (926) 702-39-68
>> Skype kuklinvv
>> 35bk3, Vorontsovskaya Str.
>> Moscow, Russia,
>> www.mirantis.com
>> www.mirantis.ru
>> vkuk...@mirantis.com
>> 

Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2015-11-15 Thread Andrew Beekhof

> On 13 Nov 2015, at 7:31 AM, Vladimir Kuklin  wrote:
> 
> Hi, Andrew
> 
> Thanks for a quick turnaround.
> 
> > The one I linked to in my original reply does:
> >   
> > https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> 
> I do not have logs from testing this script. Maybe Bogdan has something to
> tell about the results of testing it. At first glance it does not
> contain the gigantic number of workarounds we injected into our script to handle
> various situations when a node fails to join, or tries to join a cluster that
> does not want to accept it (in this case you need to kick it from the cluster
> with forget_cluster_node, which starts an RPC multicall in the rabbitmq internals
> to all cluster nodes, including the dead one, hanging forever). Actually, we
> started a long time ago with an approach similar to the one in the script
> above, but we faced a lot of issues in the case when a node tries to join a
> cluster after a dirty failover or a long time of being out of the cluster.

That's really good info, much appreciated.
Peter, Oyvind: Sounds like it would be worth validating the agent we’re using 
in these types of situations.

Based on that we can plot a path forward.

> I do not have all the logs of which particular cases we were handling while
> introducing that additional logic (it was an agile process, if you know what
> I mean :-) ), but we finally came up with this almost 2K-line script.
> We are actively communicating with Pivotal folks on improving methods of
> monitoring RabbitMQ cluster nodes, or even switching to the RabbitMQ
> clusterer+autocluster plugins and writing a new, smaller and fancier OCF script,
> but this is only planned for future Fuel releases, I guess.

:-)

> 
> >
> > > Changing the state isn’t ideal but there is precedent, the part that has 
> > > me concerned is the error codes coming out of notify.
> > > Apart from producing some log messages, I can’t think how it would 
> > > produce any recovery.
> >
> > > Unless you’re relying on the subsequent monitor operation to notice the 
> > > error state.
> > > I guess that would work but you might be waiting a while for it to notice.
> >
> > Yes, we are relying on subsequent monitor operations. We also have several 
> > OCF check levels to catch a case when one node does not have rabbitmq 
> > application started properly (btw, there was a strange bug that we had to 
> > wait for several non-zero checks to fail to get the resource to restart 
> > http://bugs.clusterlabs.org/show_bug.cgi?id=5243) .
> 
> Regarding this bug - it was very easy to reproduce - just add an additional
> check to the 'Dummy' resource with a non-intersecting interval returning the
> ERR_GENERIC code, alongside the default check returning the SUCCESS code. You
> will find that it restarts only after 2 consecutive failures of the
> non-zero-level check.

Ack. I’ve asked some people to look into it.
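The reproduction Vladimir describes can be sketched with pcs (intervals are illustrative; a modified copy of the Dummy agent that returns OCF_ERR_GENERIC for the non-zero check level is assumed, since the stock agent always succeeds):

```shell
# Two monitor ops on one resource: the default level-0 check and a deeper
# one carrying OCF_CHECK_LEVEL=10 via a non-intersecting interval.
pcs resource create test-dummy ocf:heartbeat:Dummy \
    op monitor interval=10 timeout=20 \
    op monitor interval=17 timeout=20 OCF_CHECK_LEVEL=10
```

Per the bug report, the resource should restart after the first level-10 failure but in practice only does so after two consecutive ones.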

> 
> On Thu, Nov 12, 2015 at 10:58 PM, Andrew Beekhof  wrote:
> 
> > On 12 Nov 2015, at 10:44 PM, Vladimir Kuklin  wrote:
> >
> > Hi, Andrew
> >
> > >Ah good, I understood it correctly then :)
> > > I would be interested in your opinion of how the other agent does the 
> > > bootstrapping (ie. without notifications or master/slave).
> > >That makes sense, the part I’m struggling with is that it sounds like the 
> > >other agent shouldn’t work at all.
> > > Yet we’ve used it extensively and not experienced these kinds of hangs.
> > Regarding other scripts - I am not aware of any other scripts that actually
> > handle a cloned rabbitmq server. I may be mistaken, of course. So if you are
> > aware of scripts that succeed in creating a rabbitmq cluster which actually
> > survives 1-node or all-node failure scenarios and reassembles the cluster
> > automatically - please let us know.
> 
> The one I linked to in my original reply does:
> 
>
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> 
> >
> > > Changing the state isn’t ideal but there is precedent, the part that has 
> > > me concerned is the error codes coming out of notify.
> > > Apart from producing some log messages, I can’t think how it would 
> > > produce any recovery.
> >
> > > Unless you’re relying on the subsequent monitor operation to notice the 
> > > error state.
> > > I guess that would work but you might be waiting a while for it to notice.
> >
> > Yes, we are relying on subsequent monitor operations. We also have several 
> > OCF check levels to catch

Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2015-11-12 Thread Andrew Beekhof

> On 12 Nov 2015, at 10:44 PM, Vladimir Kuklin  wrote:
> 
> Hi, Andrew
> 
> >Ah good, I understood it correctly then :)
> > I would be interested in your opinion of how the other agent does the 
> > bootstrapping (ie. without notifications or master/slave).
> >That makes sense, the part I’m struggling with is that it sounds like the 
> >other agent shouldn’t work at all.
> > Yet we’ve used it extensively and not experienced these kinds of hangs.
> Regarding other scripts - I am not aware of any other scripts that actually
> handle a cloned rabbitmq server. I may be mistaken, of course. So if you are
> aware of scripts that succeed in creating a rabbitmq cluster which actually
> survives 1-node or all-node failure scenarios and reassembles the cluster
> automatically - please let us know.

The one I linked to in my original reply does:

   
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster

> 
> > Changing the state isn’t ideal but there is precedent, the part that has me 
> > concerned is the error codes coming out of notify.
> > Apart from producing some log messages, I can’t think how it would produce 
> > any recovery.
> 
> > Unless you’re relying on the subsequent monitor operation to notice the 
> > error state.
> > I guess that would work but you might be waiting a while for it to notice.
> 
> Yes, we are relying on subsequent monitor operations. We also have several 
> OCF check levels to catch a case when one node does not have rabbitmq 
> application started properly (btw, there was a strange bug that we had to 
> wait for several non-zero checks to fail to get the resource to restart 
> http://bugs.clusterlabs.org/show_bug.cgi?id=5243) .

It appears I misunderstood your bug the first time around :-(
Do you still have logs of this occurring?
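For background, the several OCF check levels discussed here are driven by the OCF_CHECK_LEVEL variable that the cluster passes to the monitor action. A self-contained sketch of how an agent typically dispatches on it (helper names and probes are illustrative, not taken from the Fuel agent):

```shell
# Return codes normally provided by ocf-shellfuncs; hard-coded for the sketch.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_CONFIGURED=6

# Illustrative stand-ins for the agent's real probes.
check_process_running() {
    pgrep -f beam.smp >/dev/null 2>&1 || return "$OCF_ERR_GENERIC"
}
check_rabbit_app_started() {
    rabbitmqctl eval 'rabbit:is_running().' 2>/dev/null | grep -q true \
        || return "$OCF_ERR_GENERIC"
}

# Monitor dispatches on OCF_CHECK_LEVEL, which the cluster sets from the
# monitor op definition (e.g. an op declared with OCF_CHECK_LEVEL=30).
monitor() {
    case "${OCF_CHECK_LEVEL:-0}" in
        0)  check_process_running ;;
        30) check_process_running && check_rabbit_app_started ;;
        *)  return "$OCF_ERR_CONFIGURED" ;;
    esac
}
```

A failing deep check is only seen when the corresponding monitor op next fires, which is exactly the "waiting a while for it to notice" concern raised earlier in the thread.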

> I now remember why we returned errors from notify - for error logging, I guess.
>  
> 
> On Thu, Nov 12, 2015 at 1:30 AM, Andrew Beekhof  wrote:
> 
> > On 11 Nov 2015, at 11:35 PM, Vladimir Kuklin  wrote:
> >
> > Hi, Andrew
> >
> > Let me answer your questions.
> >
> > This agent is active/active, which actually marks one of the nodes as a
> > 'pseudo'-master which is used as a target for other nodes to join to. We 
> > also check which node is a master and use it in monitor action to check 
> > whether this node is clustered with this 'master' node. When we do cluster 
> > bootstrap, we need to decide which node to mark as a master node. Then, 
> > when it starts (actually, promotes), we can finally pick its name through 
> > notification mechanism and ask other nodes to join this cluster.
> 
> Ah good, I understood it correctly then :)
> I would be interested in your opinion of how the other agent does the 
> bootstrapping (ie. without notifications or master/slave).
> 
> >
> > Regarding disconnect_node+forget_cluster_node, this is quite simple - we
> > need to eject the node from the cluster. Otherwise it is mentioned in the list
> > of cluster nodes, and a lot of cluster actions, e.g. list_queues, will hang
> > forever, as will the forget_cluster_node action.
> 
> That makes sense, the part I’m struggling with is that it sounds like the 
> other agent shouldn’t work at all.
> Yet we’ve used it extensively and not experienced these kinds of hangs.
> 
> >
> > We also handle this case whenever a node leaves the cluster. If you 
> > remember, I wrote an email to Pacemaker ML regarding getting notifications 
> > on node unjoin event '[openstack-dev] [Fuel][Pacemaker][HA] Notifying 
> > clones of offline nodes’.
> 
> Oh, I recall that now.
> 
> > So we went another way and added a dbus daemon listener that does the same 
> > when a node leaves the corosync cluster (we know that this is a little bit racy,
> > but disconnect+forget actions pair is idempotent).
> >
> > Regarding notification commands - we changed the behaviour to one that
> > fitted our use cases better and passed our destructive tests. It could be
> > Pacemaker-version dependent, so I agree we should consider changing this 
> > behaviour. But so far it worked for us.
> 
> Changing the state isn’t ideal but there is precedent, the part that has me 
> concerned is the error codes coming out of notify.
> Apart from producing some log messages, I can’t think how it would produce 
> any recovery.
> 
> Unless you’re relying on the subsequent monitor operation to notice the error 
> state.
> I guess that would work but you might be waiting a while for it to notice.
> 
> >
> > On Wed, Nov 11, 2015 at 2:12 PM, Andrew Beekhof  wrote:
> >
> &

Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2015-11-11 Thread Andrew Beekhof

> On 11 Nov 2015, at 11:35 PM, Vladimir Kuklin  wrote:
> 
> Hi, Andrew
> 
> Let me answer your questions.
> 
> This agent is active/active, which actually marks one of the nodes as a
> 'pseudo'-master which is used as a target for other nodes to join to. We also 
> check which node is a master and use it in monitor action to check whether 
> this node is clustered with this 'master' node. When we do cluster bootstrap, 
> we need to decide which node to mark as a master node. Then, when it starts 
> (actually, promotes), we can finally pick its name through notification 
> mechanism and ask other nodes to join this cluster. 

Ah good, I understood it correctly then :)
I would be interested in your opinion of how the other agent does the 
bootstrapping (ie. without notifications or master/slave).

> 
> Regarding disconnect_node+forget_cluster_node, this is quite simple - we need
> to eject the node from the cluster. Otherwise it is mentioned in the list of
> cluster nodes, and a lot of cluster actions, e.g. list_queues, will hang
> forever, as will the forget_cluster_node action.

That makes sense, the part I’m struggling with is that it sounds like the other 
agent shouldn’t work at all.
Yet we’ve used it extensively and not experienced these kinds of hangs.

> 
> We also handle this case whenever a node leaves the cluster. If you remember, 
> I wrote an email to Pacemaker ML regarding getting notifications on node 
> unjoin event '[openstack-dev] [Fuel][Pacemaker][HA] Notifying clones of 
> offline nodes’.

Oh, I recall that now.

> So we went another way and added a dbus daemon listener that does the same 
> when a node leaves the corosync cluster (we know that this is a little bit racy, but
> disconnect+forget actions pair is idempotent).
> 
> Regarding notification commands - we changed the behaviour to one that fitted
> our use cases better and passed our destructive tests. It could be
> Pacemaker-version dependent, so I agree we should consider changing this 
> behaviour. But so far it worked for us.

Changing the state isn’t ideal but there is precedent, the part that has me 
concerned is the error codes coming out of notify.
Apart from producing some log messages, I can’t think how it would produce any 
recovery.

Unless you’re relying on the subsequent monitor operation to notice the error 
state.
I guess that would work but you might be waiting a while for it to notice.

> 
> On Wed, Nov 11, 2015 at 2:12 PM, Andrew Beekhof  wrote:
> 
> > On 11 Nov 2015, at 6:26 PM, bdobre...@mirantis.com wrote:
> >
> > Thank you Andrew.
> > Answers below.
> > >>>
> > Sounds interesting, can you give any comment about how it differs to the 
> > other[i] upstream agent?
> > Am I right that this one is effectively A/P and wont function without some 
> > kind of shared storage?
> > Any particular reason you went down this path instead of full A/A?
> >
> > [i]
> > https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> > <<<
> > It is based on multistate clone notifications. It requires nothing shared
> > but the Corosync information base (CIB), where all Pacemaker resources are
> > stored anyway. And it is fully A/A.
> 
> Oh!  So I should skip the A/P parts before "Auto-configuration of a cluster 
> with a Pacemaker”?
> Is the idea that the master mode is for picking a node to bootstrap the 
> cluster?
> 
> If so I don’t believe that should be necessary provided you specify 
> ordered=true for the clone.
> This allows you to assume in the agent that your instance is the only one 
> currently changing state (by starting or stopping).
> I notice that rabbitmq.com explicitly sets this to false… any particular 
> reason?
> 
> 
> Regarding the pcs command to create the resource, you can simplify it to:
> 
> pcs resource create --force --master p_rabbitmq-server 
> ocf:rabbitmq:rabbitmq-server-ha \
>   erlang_cookie=DPMDALGUKEOMPTHWPYKC node_port=5672 \
>   op monitor interval=30 timeout=60 \
>   op monitor interval=27 role=Master timeout=60 \
>   op monitor interval=103 role=Slave timeout=60 OCF_CHECK_LEVEL=30 \
>   meta notify=true ordered=false interleave=true master-max=1 
> master-node-max=1
> 
> If you update the stop/start/notify/promote/demote timeouts in the agent’s 
> metadata.
> 
> 
> Lines 1602,1565,1621,1632,1657, and 1678 have the notify command returning an 
> error.
> Was this logic tested? Because pacemaker does not currently support/allow 
> notify actions to fail.
> IIRC pacemaker simply ignores them.
> 
> Modifying the resource state in notifications is also highly unusual.
> What was the reason for that?
> 
> I notice that on node

Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2015-11-11 Thread Andrew Beekhof

> On 11 Nov 2015, at 6:26 PM, bdobre...@mirantis.com wrote:
> 
> Thank you Andrew.
> Answers below.
> >>>
> Sounds interesting, can you give any comment about how it differs to the 
> other[i] upstream agent?
> Am I right that this one is effectively A/P and won't function without some
> kind of shared storage?
> Any particular reason you went down this path instead of full A/A?
> 
> [i] 
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> <<<
> It is based on multistate clone notifications. It requires nothing shared but
> the Corosync information base (CIB), where all Pacemaker resources are stored
> anyway. And it is fully A/A.

Oh!  So I should skip the A/P parts before "Auto-configuration of a cluster 
with a Pacemaker”? 
Is the idea that the master mode is for picking a node to bootstrap the cluster?

If so I don’t believe that should be necessary provided you specify 
ordered=true for the clone.
This allows you to assume in the agent that your instance is the only one 
currently changing state (by starting or stopping).
I notice that rabbitmq.com explicitly sets this to false… any particular reason?
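For reference, the clone ordering being discussed is a meta attribute on the master/clone resource; under an assumed resource name it would look something like:

```shell
# Serialize start/stop of clone instances so only one instance changes state
# at a time; with this, the agent could bootstrap without a master role.
# p_rabbitmq-server-master is a placeholder for the actual master resource id.
pcs resource meta p_rabbitmq-server-master ordered=true
```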


Regarding the pcs command to create the resource, you can simplify it to:

pcs resource create --force --master p_rabbitmq-server 
ocf:rabbitmq:rabbitmq-server-ha \
  erlang_cookie=DPMDALGUKEOMPTHWPYKC node_port=5672 \
  op monitor interval=30 timeout=60 \
  op monitor interval=27 role=Master timeout=60 \
  op monitor interval=103 role=Slave timeout=60 OCF_CHECK_LEVEL=30 \
  meta notify=true ordered=false interleave=true master-max=1 master-node-max=1

If you update the stop/start/notify/promote/demote timeouts in the agent’s 
metadata.


Lines 1602,1565,1621,1632,1657, and 1678 have the notify command returning an 
error.
Was this logic tested? Because pacemaker does not currently support/allow 
notify actions to fail.
IIRC pacemaker simply ignores them.

Modifying the resource state in notifications is also highly unusual.
What was the reason for that?

I notice that on node down, this agent makes disconnect_node and 
forget_cluster_node calls.
The other upstream agent does not; do you have any information about the bad
things that might happen as a result?

Basically I’m looking for what each option does differently/better with a view 
to converging on a single implementation. 
I don’t much care in which location it lives.

I’m CC’ing the other upstream maintainer, it would be good if you guys could 
have a chat :-)

> All running rabbit nodes may process AMQP connections. The master state is only
> the cluster's initial join point, at which the other slaves may join it.
> Note, here you can find events flow charts as well [0]
> [0] https://www.rabbitmq.com/pacemaker.html
> Regards,
> Bogdan




Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2015-11-09 Thread Andrew Beekhof

> On 23 Oct 2015, at 7:01 PM, Bogdan Dobrelya  wrote:
> 
> Hello.
> I'm glad to announce that the pacemaker OCF resource agent for
> rabbitmq clustering, which was originally born in the Fuel project, is now
> available and maintained upstream! It will be shipped with the
> rabbitmq-server 3.5.7 package (to be released by November 2015).
> 
> You can read about this OCF agent in the official guide [0] (flow charts
> for promote/demote/start/stop actions in progress).

Sounds interesting, can you give any comment about how it differs to the 
other[i] upstream agent?
Am I right that this one is effectively A/P and won't function without some kind
of shared storage?
Any particular reason you went down this path instead of full A/A?

[i] 
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster

> 
> And you can try it as a tiny cluster example with a Vagrant box for
> Atlas [1]. Note, this only installs an Ubuntu box with a
> Corosync/Pacemaker & RabbitMQ clusters running, no Fuel or OpenStack
> required :-)
> 
> I'm also planning to refer to this official RabbitMQ cluster setup guide in
> the OpenStack HA guide as well [2].
> 
> PS. Original rabbitmq-users mail thread is here [3].
> [openstack-operators] cross posted as well.
> 
> [0] http://www.rabbitmq.com/pacemaker.html
> [1] https://atlas.hashicorp.com/bogdando/boxes/rabbitmq-cluster-ocf
> [2] https://bugs.launchpad.net/openstack-manuals/+bug/1497528
> [3] https://groups.google.com/forum/#!topic/rabbitmq-users/BnoIQJb34Ao
> 
> -- 
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
> 




Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active

2015-08-06 Thread Andrew Beekhof

> On 5 Aug 2015, at 1:34 am, Joshua Harlow  wrote:
> 
> Philipp Marek wrote:
>>> If we end up using a DLM then we have to detect when the connection to
>>> the DLM is lost on a node and stop all ongoing operations to prevent
>>> data corruption.
>>> 
>>> It may not be trivial to do, but we will have to do it in any solution
>>> we use, even on my last proposal that only uses the DB in Volume Manager
>>> we would still need to stop all operations if we lose connection to the
>>> DB.
>> 
>> Well, is it already decided that Pacemaker would be chosen to provide HA in
>> OpenStack? There's been a talk "Pacemaker: the PID 1 of OpenStack", IIRC.
>>
>> I know that Pacemaker's been pushed aside in an earlier ML post, but IMO
>> there's already *so much* been done for HA in Pacemaker that OpenStack
>> should just use it.
>> 
>> All HA nodes need to participate in a Pacemaker cluster - and if one node
>> loses connection, all its services will get stopped automatically (by
>> Pacemaker) - or the node gets fenced.
>> 
>> 
>> No need to invent some sloppy scripts to do exactly the tasks (badly!) that
>> the Linux HA Stack has been providing for quite a few years.
>> 
>> 
>> Yes, Pacemaker needs learning - but not more than any other involved
>> project, and there are already quite a few here, which have to be known to
>> any operator or developer already.
>> 
>> 
>> (BTW, LINBIT sells training for the Linux HA Cluster Stack - and yes,
>>  I work for them ;)
> 
> So just a piece of information, but Yahoo (the company I work for, with VMs
> in the tens of thousands, and baremetal in much more than that...) hasn't
> used pacemaker, and in all honesty this is the first project (openstack) I
> have heard of that needs such a solution. I feel that we really should be
> building our services better so that they can be A-A, vs having to depend on
> another piece of software to get around our 'sloppiness' (for lack of a
> better word).

HA is a deceptively hard problem.
There is really no need for every project to attempt to solve it on their own.
Having everyone consuming/calculating a different membership list is a very 
good way to go insane.

Aside from the usual bugs, the HA space lends itself to making simplifying 
assumptions early on, only to trap you with them down the road.
It's even worse if you're trying to bolt it on after the fact...

Perhaps try to think of pacemaker as a distributed finite state machine instead
of a cluster manager.
That is part of the value we bring to projects like galera and rabbitmq.

Sure they are A-A, and once they’re up they can survive many failures, but 
bringing them up can be non-trivial.
We also provide the additional context (e.g. quorum and fencing) that allows more
kinds of failures to be safely recovered from.

Something to think about perhaps.

— Andrew

> 
> Nothing against pacemaker personally... IMHO it just doesn't feel like we are 
> doing this right if we need such a product in the first place.
> 
>> 
> 




Re: [openstack-dev] [Keystone][Fernet] HA SQL backend for Fernet keys

2015-08-03 Thread Andrew Beekhof

> On 3 Aug 2015, at 8:02 pm, Sergii Golovatiuk  wrote:
> 
> Hi,
> 
> I agree with Bogdan that key rotation procedure should be part of HA solution.

These things don’t usually have to be an either/or situation.
Why not create one script that does the work and can be called manually if 
desired, but also create an agent that pacemaker can use to call it at the 
appropriate time?
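A rough sketch of that split (every name, path, and the extra "rotate" action
here is an illustrative assumption, not something from this thread): the agent
is a thin shim, and the real work lives in a standalone script that an
operator, cron, or pacemaker can all invoke.

```shell
# Thin OCF-style shim: all real work is delegated to a standalone script,
# so an operator, cron, or pacemaker can trigger the same rotation logic.
# ROTATE_SCRIPT, KEY_REPO, and the "rotate" action are assumptions.

ROTATE_SCRIPT="${ROTATE_SCRIPT:-/usr/local/bin/rotate-fernet-keys}"
KEY_REPO="${KEY_REPO:-/etc/keystone/fernet-keys}"

fernet_agent() {
    case "$1" in
        start|stop)
            return 0 ;;                    # nothing long-running to manage
        monitor)
            # "Healthy" here just means the key repository is present.
            if [ -d "$KEY_REPO" ]; then
                return 0
            else
                return 7                   # 7 = OCF_NOT_RUNNING
            fi
            ;;
        rotate)
            "$ROTATE_SCRIPT" ;;            # pacemaker and humans share this
        *)  return 3 ;;                    # 3 = OCF_ERR_UNIMPLEMENTED
    esac
}
```

The point of this shape is that the script stays useful on its own; wiring it
into pacemaker adds scheduling and cluster awareness without duplicating
logic.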

> If you make a simple script then this script will be a single point of 
> failure. It requires operator's attention so it may lead to human errors 
> also. Adding monitoring around it or expiration time is not a solution either.
> 
> There are couple of approaches how to make 'key rotation' HA ready.
> 
> 1. Make it as part of pacemaker OCF script. In this case pacemaker will 
> select the node which will be Master. It will be responsible for key 
> generations. In this case OCF script should have logic how to distribute 
> keys. It may be puppet or some rsync wrappers like lsyncd or special function 
> in OCF script itself. In this case when master is dead, pacemaker will elect 
> a new master while old one is down.
> 
> 2. Make keystone HA ready by itself. In this case, all logic of distributed 
> system should be covered in keystone. keystone should be able to detect 
> peers, it should have some consensus algorithms (PAXOS, RAFT, ZAB). Using 
> this algorithm master should be elected. Master should generate keys and 
> distribute them somehow to all other peers. Key distribution may be done via 
> rsync or using memcache/db as centralized storage for keys. Master may send a 
> event to all peers or peers may check memcache/db periodically.
> 
> 
> 
> 
> 
> --
> Best regards,
> Sergii Golovatiuk,
> Skype #golserge
> IRC #holser
> 
> On Mon, Aug 3, 2015 at 2:37 AM, David Medberry  wrote:
> Glad to see you weighed in on this. -d
> 
> On Sat, Aug 1, 2015 at 3:50 PM, Matt Fischer  wrote:
> Agree that you guys are way overthinking this. You don't need to rotate keys 
> at exactly the same time, we do it in within a one or two hours typically 
> based on how our regions are setup. We do it with puppet, puppet runs on one 
> keystone node at a time and drops the keys into place. The actual rotation 
> and generation we handle with a script that then proposes the new key 
> structure as a review which is then approved and deployed via the normal 
> process. For this process I always drop keys 0, 1, 2 into place, I'm not 
> bumping the numbers like the normal rotations do.
> 
> We had also considered ansible which would be perfect for this, but that 
> makes our ability to set up throwaway environments with a single click a bit 
> more complicated. If you don't have that requirement, a simple ansible script 
> is what you should do. 
> 
> 
> On Sat, Aug 1, 2015 at 3:41 PM, Clint Byrum  wrote:
> Excerpts from Boris Bobrov's message of 2015-08-01 14:18:21 -0700:
> > On Saturday 01 August 2015 16:27:17 bdobre...@mirantis.com wrote:
> > > I suggest to use pacemaker multistate clone resource to rotate and
> > > rsync fernet tokens from local directories across cluster nodes. The
> > > resource prototype is described here:
> > > https://etherpad.openstack.org/p/fernet_tokens_pacemaker
> > > Pros: Pacemaker will care about CAP/split-brain stuff for us, we just
> > > design rotate and rsync logic. Also no shared FS/DB involved but only
> > > Corosync CIB - to store few internal resource state related params,
> > > not tokens. Cons: Keystone nodes hosting fernet tokens directories
> > > must be members of pacemaker cluster. Also custom OCF script should
> > > be created to implement this.
> > > Regards,
> > > Bogdan Dobrelya.
> > > IRC: bogdando
> >
> > Looks complex.
> >
> > I suggest this kind of bash or python script, running on Fuel master node:
> >
> > 0. Check that all controllers are online;
> > 1. Go to one of the controllers, rotate keys there;
> > 2. Fetch key 0 from there;
> > 3. For each other controller rotate keys there and put the 0-key instead of
> > their new 0-key.
> > 4. If any of the nodes fail to get new keys (because they went offline or
> > for some other reason) revert the rotate (move the key with the biggest
> > index back to 0).
> >
> > The script can be launched by cron or by button in Fuel.
> >
> > I don't see anything critically bad if one rotation/sync event fails.
> >
> 
> This too is overly complex and will cause failures. If you replace key 0,
> you will stop validating tokens that were encrypted with the old key 0.
> 
> You simply need to run rotate on one, and then rsync that key repository
> to all of the others. You _must not_ run rotate again until you rsync to
> all of the others, since the key 0 from one rotation becomes the primary
> token encrypting key going forward, so you need it to get pushed out to
> all nodes as 0 first.
> 
> Don't over think it. Just read http://lbragstad.com/?p=133 and it will
> remain simple.
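A minimal sketch of the procedure described above, rotate on exactly one node
and push the whole repository everywhere before rotating again (KEY_REPO,
PEERS, and the function name are assumptions; `keystone-manage fernet_rotate`
and `rsync` are the real tools):

```shell
# Sketch of the rotate-then-sync rule: key 0 from a rotation becomes the
# new primary key, so it must reach every node before the next rotation.

KEY_REPO="${KEY_REPO:-/etc/keystone/fernet-keys}"
PEERS="${PEERS:-ctrl-2 ctrl-3}"   # every keystone node except this one

rotate_and_sync() {
    # 1. Rotate locally on ONE node only.
    keystone-manage fernet_rotate \
        --keystone-user keystone --keystone-group keystone || return 1

    # 2. Push the whole repository to ALL peers before any further
    #    rotation, so the new primary key lands everywhere first.
    local peer
    for peer in $PEERS; do
        rsync -a --delete "$KEY_REPO/" "$peer:$KEY_REPO/" || return 1
    done
}
```

Run from cron or a deploy tool, this stays within the constraint Clint
describes: never rotate again until the previous sync completed on every node.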
> 

Re: [openstack-dev] [Fuel] Speed Up RabbitMQ Recovering

2015-05-19 Thread Andrew Beekhof

> On 20 May 2015, at 6:05 am, Andrew Woodward  wrote:
> 
> 
> 
> On Thu, May 7, 2015 at 5:01 PM Andrew Beekhof  wrote:
> 
> > On 5 May 2015, at 1:19 pm, Zhou Zheng Sheng / 周征晟  
> > wrote:
> >
> > Thank you Andrew.
> >
> > on 2015/05/05 08:03, Andrew Beekhof wrote:
> >>> On 28 Apr 2015, at 11:15 pm, Bogdan Dobrelya  
> >>> wrote:
> >>>
> >>>> Hello,
> >>> Hello, Zhou
> >>>
> >>>> I am using Fuel 6.0.1 and find that RabbitMQ recovery time is long after
> >>>> power failure. I have a running HA environment, then I reset power of
> >>>> all the machines at the same time. I observe that after reboot it
> >>>> usually takes 10 minutes for the RabbitMQ cluster to appear running
> >>>> master-slave mode in pacemaker. If I power off all the 3 controllers and
> >>>> only start 2 of them, the downtime sometimes can be as long as 20 
> >>>> minutes.
> >>> Yes, this is a known issue [0]. Note, there were many bugfixes, like
> >>> [1],[2],[3], merged for MQ OCF script, so you may want to try to
> >>> backport them as well by the following guide [4]
> >>>
> >>> [0] https://bugs.launchpad.net/fuel/+bug/1432603
> >>> [1] https://review.openstack.org/#/c/175460/
> >>> [2] https://review.openstack.org/#/c/175457/
> >>> [3] https://review.openstack.org/#/c/175371/
> >>> [4] https://review.openstack.org/#/c/170476/
> >> Is there a reason you’re using a custom OCF script instead of the 
> >> upstream[a] one?
> >> Please have a chat with David (the maintainer, in CC) if there is 
> >> something you believe is wrong with it.
> >>
> >> [a] 
> >> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> >
> > I'm using the OCF script from the Fuel project, specifically from the
> > "6.0" stable branch [alpha].
> 
> Ah, I’m still learning who is who... I thought you were part of that project 
> :-)
> 
> >
> > Comparing with upstream OCF code, the main difference is that Fuel
> > RabbitMQ OCF is a master-slave resource. Fuel RabbitMQ OCF does more
> > bookkeeping, for example, blocking client access when RabbitMQ cluster
> > is not ready. I believe the upstream OCF should be OK to use as well
> > after I read the code, but it might not fit into the Fuel project. As
> > far as I test, the Fuel OCF script is good except sometimes the full
> > reassemble time is long, and as I find out, it is mostly because the
> > Fuel MySQL Galera OCF script keeps pacemaker from promoting RabbitMQ
> > resource, as I mentioned in the previous emails.
> >
> > Maybe Vladimir and Sergey can give us more insight on why Fuel needs a
> > master-slave RabbitMQ.
> 
> That would be good to know.
> Browsing the agent, promote seems to be a no-op if rabbit is already running.
> 
> 
> As to the master/slave question: it comes from how the OCF script is 
> structured to deal with rabbit's poor ability to handle itself in some 
> scenarios. Hopefully the state transition diagram [5] is enough to clarify 
> what's going on.
> 
> [5] http://goo.gl/PPNrw7

Not really.
It seems to be under the impression that you can skip ‘started’ and go directly 
from stopped to master.


Re: [openstack-dev] [Fuel] Speed Up RabbitMQ Recovering

2015-05-07 Thread Andrew Beekhof

> On 5 May 2015, at 7:52 pm, Bogdan Dobrelya  wrote:
> 
> On 05.05.2015 04:32, Andrew Beekhof wrote:
>> 
>> 
>> [snip]
>> 
>> 
>> Technically it calculates an ordered graph of actions that need to be 
>> performed for a set of related resources.
>> You can see an example of the kinds of graphs it produces at:
>> 
>>   
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/s-config-testing-changes.html
>> 
>> There is a more complex one which includes promotion and demotion on the 
>> next page.
>> 
>> The number of actions that can run at any one time is therefore limited by
>> - the value of batch-limit (the total number of in-flight actions)
>> - the number of resources that do not have ordering constraints between them 
>> (eg. rsc{1,2,3} in the above example)  
>> 
>> So in the above example, if batch-limit >= 3, the monitor_0 actions will 
>> still all execute in parallel.
>> If batch-limit == 2, one of them will be deferred until the others complete.
>> 
>> Processing of the graph stops the moment any action returns a value that was 
>> not expected.
>> If that happens, we wait for currently in-flight actions to complete, 
>> re-calculate a new graph based on the new information and start again.
>> 
>> 
>> First we do a non-recurring monitor (*_monitor_0) to check what state the 
>> resource is in.
>> We can’t assume its off because a) we might have crashed, b) the admin might 
>> have accidentally configured it to start at boot or c) the admin may have 
>> asked us to re-check everything.
>> 
>> 
>> Also important to know, the order of actions is:

I should clarify something here:

   s/actions is/actions for each resource is/

>> 
>> 1. any necessary demotions
>> 2. any necessary stops
>> 3. any necessary starts
>> 4. any necessary promotions
>> 
>> 
> 
> Thank you for explaining this, Andrew!
> 
> So, in the context of the given two example DB(MySQL) and
> messaging(RabbitMQ) resources:
> 
> "The problem is that pacemaker can only promote a resource after it
> detects the resource is started. During a full reassemble, in the first
> transition batch, pacemaker starts all the resources including MySQL and
> RabbitMQ. Pacemaker issues resource agent "start" invocation in parallel
> and reaps the results.
> For a multi-state resource agent like RabbitMQ, pacemaker needs the
> start result reported in the first batch, then transition engine and
> policy engine decide if it has to retry starting or promote, and put
> this new transition job into a new batch."
> 
> So, for given example, it looks like we currently have:
> _batch start_
> ...
> 3. DB, messaging resources start in one batch

Since there is no dependency between them, yes.

> 4. messaging resource promote blocked by the step 3 completion
> _batch end_

Not quite, I wasn’t as clear as I could have been in my previous email.

We won’t promote Rabbit instances until they have all been started.
However we don’t need to wait for all the DBs to finish starting (again, 
because there is no dependency between them) before we begin promoting Rabbit.

So a single transition that did this is totally possible:

t0.  Begin transition
t1.  Rabbit start node1(begin)
t2.  DB start node 3   (begin)
t3.  DB start node 2   (begin)
t4.  Rabbit start node2(begin)
t5.  Rabbit start node3(begin)
t6.  DB start node 1   (begin)
t7.  Rabbit start node2(complete)
t8.  Rabbit start node1(complete)
t9.  DB start node 3   (complete)
t10. Rabbit start node3(complete)
t11. Rabbit promote node 1 (begin)
t12. Rabbit promote node 3 (begin)
t13. Rabbit promote node 2 (begin)
... etc etc ...

For something like cinder however, these are some of the dependencies we define:

pcs constraint order start keystone-clone then cinder-api-clone
pcs constraint order start cinder-api-clone then cinder-scheduler-clone
pcs constraint order start galera-master then keystone-clone

So first all the galera instances must be started. Then we can begin to promote 
some.
Once all the promotions complete, then we can start the keystone instances.
Once all the keystone instances are up, then we can bring up the cinder API 
instances, which allows us to start the scheduler, etc etc.

And assuming nothing fails, this can all happen in one transition.

Bottom line: Pacemaker will do as much as it can as soon as it can.  
The only restrictions are ordering constraints you specify, the batch-limit, 
and each master/slave (or clone) resource’s _internal_ 
demote->stop->start->promote ordering.

Am I making it better or worse?
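A toy model may make it concrete (this is illustrative bash, not pacemaker
code; the `before:after` edge syntax is made up and the constraint graph is
assumed acyclic). Actions whose ordering dependencies are satisfied run
together in a "wave", capped by a batch-limit, mirroring the two restrictions
described above.

```shell
# Toy scheduler: $1 = batch-limit, remaining args = ordering edges
# written "before:after". Prints one line per wave of parallel actions.
schedule() {
    local limit=$1; shift
    local edges=("$@")
    local remaining=() finished=" "
    local e n
    # Collect every action mentioned in any edge, preserving order.
    for e in "${edges[@]}"; do
        for n in "${e%%:*}" "${e##*:}"; do
            [[ " ${remaining[*]} " == *" $n "* ]] || remaining+=("$n")
        done
    done
    local wave=0
    while ((${#remaining[@]})); do
        local runnable=() blocked=() count=0 ok
        for n in "${remaining[@]}"; do
            ok=1
            # Blocked if any predecessor has not finished yet.
            for e in "${edges[@]}"; do
                [[ ${e##*:} == "$n" && $finished != *" ${e%%:*} "* ]] && ok=0
            done
            if ((ok)) && ((count < limit)); then
                runnable+=("$n")          # fires in this wave
                count=$((count + 1))      # batch-limit caps in-flight actions
            else
                blocked+=("$n")           # deferred to a later wave
            fi
        done
        wave=$((wave + 1))
        echo "wave $wave: ${runnable[*]}"
        for n in "${runnable[@]}"; do finished="$finished$n "; done
        remaining=("${blocked[@]}")
    done
}

schedule 10 galera:keystone keystone:cinder-api
# wave 1: galera
# wave 2: keystone
# wave 3: cinder-api
```

With a generous batch-limit only the ordering edges matter; shrink the limit
(e.g. `schedule 2 a:d b:d c:d`) and even independent actions get split across
extra waves, which is the deferral behaviour described for batch-limit == 2.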

> 
> Does this

Re: [openstack-dev] [Fuel] Speed Up RabbitMQ Recovering

2015-05-07 Thread Andrew Beekhof

> On 5 May 2015, at 9:30 pm, Zhou Zheng Sheng / 周征晟  
> wrote:
> 
> Thank you Andrew. Sorry for misspelling your name in the previous email.
> 
> on 2015/05/05 14:25, Andrew Beekhof wrote:
>>> On 5 May 2015, at 2:31 pm, Zhou Zheng Sheng / 周征晟  
>>> wrote:
>>> 
>>> Thank you Bogdan for clearing the pacemaker promotion process for me.
>>> 
>>> on 2015/05/05 10:32, Andrew Beekhof wrote:
>>>>> On 29 Apr 2015, at 5:38 pm, Zhou Zheng Sheng / 周征晟 
>>>>>  wrote:
>>>> [snip]
>>>> 
>>>>> Batch is a pacemaker concept I found when I was reading its
>>>>> documentation and code. There is a "batch-limit: 30" in the output of
>>>>> "pcs property list --all". The pacemaker official documentation
>>>>> explanation is that it's "The number of jobs that the TE is allowed to
>>>>> execute in parallel." From my understanding, pacemaker maintains cluster
>>>>> states, and when we start/stop/promote/demote a resource, it triggers a
>>>>> state transition. Pacemaker puts as many as possible transition jobs
>>>>> into a batch, and process them in parallel.
>>>> Technically it calculates an ordered graph of actions that need to be 
>>>> performed for a set of related resources.
>>>> You can see an example of the kinds of graphs it produces at:
>>>> 
>>>>  
>>>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/s-config-testing-changes.html
>>>> 
>>>> There is a more complex one which includes promotion and demotion on the 
>>>> next page.
>>>> 
>>>> The number of actions that can run at any one time is therefore limited by
>>>> - the value of batch-limit (the total number of in-flight actions)
>>>> - the number of resources that do not have ordering constraints between 
>>>> them (eg. rsc{1,2,3} in the above example)  
>>>> 
>>>> So in the above example, if batch-limit >= 3, the monitor_0 actions will 
>>>> still all execute in parallel.
>>>> If batch-limit == 2, one of them will be deferred until the others 
>>>> complete.
>>>> 
>>>> Processing of the graph stops the moment any action returns a value that 
>>>> was not expected.
>>>> If that happens, we wait for currently in-flight actions to complete, 
>>>> re-calculate a new graph based on the new information and start again.
>>> So can I infer the following statement? In a big cluster with many
>>> resources, chances are some resource agent actions return unexpected
>>> values,
>> The size of the cluster shouldn’t increase the chance of this happening 
>> unless you’ve set the timeouts too aggressively.
> 
> If there are many types of resource agents, and any one of them is not
> well written, it might cause trouble, right?

Yes, but really only for the things that depend on it.

For example if resources B, C, D, E all depend (in some way) on A, then their 
startup is going to be delayed.
But F, G, H and J will be able to start while we wait around for B to time out.

> 
>>> and if any of the in-flight action timeout is long, it would
>>> block pacemaker from re-calculating a new transition graph?
>> Yes, but it’s actually an argument for making the timeouts longer, not 
>> shorter.
>> Setting the timeouts too aggressively actually increases downtime because of 
>> all the extra delays and recovery it induces.
>> So set them to be long enough that there is unquestionably a problem if you 
>> hit them.
>> 
>> But we absolutely recognise that starting/stopping a database can take a 
>> very long time comparatively and that it shouldn’t block recovery of other 
>> unrelated services.
>> I would expect to see this land in Pacemaker 1.1.14
> 
> It will be great to see this in Pacemaker 1.1.14. From my experience
> using Pacemaker, I think customized resource agents are possibly the
> weakest part.

This is why we encourage people wanting new agents to get involved with the 
upstream resource-agents project :-)

> This feature should improve the handling of resource
> action timeouts.
> 
>>> I see the
>>> current batch-limit is 30 and I tried to increase it to 100, but did not
>>> help.
>> Correct.  It only puts an upper limit on the number of in-flight actions, 
>> actions still need to wait for all their dependencies to complete before 
>> executing.
>> 
>>> I'm sure that the c

Re: [openstack-dev] [Fuel] Speed Up RabbitMQ Recovering

2015-05-07 Thread Andrew Beekhof

> On 5 May 2015, at 1:19 pm, Zhou Zheng Sheng / 周征晟  
> wrote:
> 
> Thank you Andrew.
> 
> on 2015/05/05 08:03, Andrew Beekhof wrote:
>>> On 28 Apr 2015, at 11:15 pm, Bogdan Dobrelya  wrote:
>>> 
>>>> Hello,
>>> Hello, Zhou
>>> 
>>>> I am using Fuel 6.0.1 and find that RabbitMQ recovery time is long after
>>>> power failure. I have a running HA environment, then I reset power of
>>>> all the machines at the same time. I observe that after reboot it
>>>> usually takes 10 minutes for the RabbitMQ cluster to appear running
>>>> master-slave mode in pacemaker. If I power off all the 3 controllers and
>>>> only start 2 of them, the downtime sometimes can be as long as 20 minutes.
>>> Yes, this is a known issue [0]. Note, there were many bugfixes, like
>>> [1],[2],[3], merged for MQ OCF script, so you may want to try to
>>> backport them as well by the following guide [4]
>>> 
>>> [0] https://bugs.launchpad.net/fuel/+bug/1432603
>>> [1] https://review.openstack.org/#/c/175460/
>>> [2] https://review.openstack.org/#/c/175457/
>>> [3] https://review.openstack.org/#/c/175371/
>>> [4] https://review.openstack.org/#/c/170476/
>> Is there a reason you’re using a custom OCF script instead of the 
>> upstream[a] one?
>> Please have a chat with David (the maintainer, in CC) if there is something 
>> you believe is wrong with it.
>> 
>> [a] 
>> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> 
> I'm using the OCF script from the Fuel project, specifically from the
> "6.0" stable branch [alpha].

Ah, I’m still learning who is who... I thought you were part of that project 
:-) 

> 
> Comparing with upstream OCF code, the main difference is that Fuel
> RabbitMQ OCF is a master-slave resource. Fuel RabbitMQ OCF does more
> bookkeeping, for example, blocking client access when RabbitMQ cluster
> is not ready. I believe the upstream OCF should be OK to use as well
> after I read the code, but it might not fit into the Fuel project. As
> far as I test, the Fuel OCF script is good except sometimes the full
> reassemble time is long, and as I find out, it is mostly because the
> Fuel MySQL Galera OCF script keeps pacemaker from promoting RabbitMQ
> resource, as I mentioned in the previous emails.
> 
> Maybe Vladimir and Sergey can give us more insight on why Fuel needs a
> master-slave RabbitMQ.

That would be good to know.
Browsing the agent, promote seems to be a no-op if rabbit is already running.

> I see Vladimir and Sergey works on the original
> Fuel blueprint "RabbitMQ cluster" [beta].
> 
> [alpha]
> https://github.com/stackforge/fuel-library/blob/stable/6.0/deployment/puppet/nova/files/ocf/rabbitmq
> [beta]
> https://blueprints.launchpad.net/fuel/+spec/rabbitmq-cluster-controlled-by-pacemaker
> 
>>>> I have a little investigation and find out there are some possible causes.
>>>> 
>>>> 1. MySQL Recovery Takes Too Long [1] and Blocking RabbitMQ Clustering in
>>>> Pacemaker
>>>> 
>>>> The pacemaker resource p_mysql start timeout is set to 475s. Sometimes
>>>> MySQL-wss fails to start after power failure, and pacemaker would wait
>>>> 475s before retry starting it. The problem is that pacemaker divides
>>>> resource state transitions into batches. Since RabbitMQ is master-slave
>>>> resource, I assume that starting all the slaves and promoting master are
>>>> put into two different batches. If unfortunately starting all RabbitMQ
>>>> slaves are put in the same batch as MySQL starting, even if RabbitMQ
>>>> slaves and all other resources are ready, pacemaker will not continue
>>>> but just wait for MySQL timeout.
>>> Could you please elaborate on what the same/different batches are for MQ
>>> and DB? Note, there are MQ clustering logic flow charts available here
>>> [5] and we're planning to release a dedicated technical bulletin for this.
>>> 
>>> [5] http://goo.gl/PPNrw7
>>> 
>>>> I can re-produce this by hard powering off all the controllers and start
>>>> them again. It's more likely to trigger MySQL failure in this way. Then
>>>> I observe that if there is one cloned mysql instance not starting, the
>>>> whole pacemaker cluster gets stuck and does not emit any log. On the
>>>> host of the failed instance, I can see a mysql resource agent process
>>>> calling the sleep command. If I kill that process, the pacemaker comes
>>>>

Re: [openstack-dev] [Fuel] Speed Up RabbitMQ Recovering

2015-05-04 Thread Andrew Beekhof

> On 5 May 2015, at 2:31 pm, Zhou Zheng Sheng / 周征晟  
> wrote:
> 
> Thank you Bogdan for clearing the pacemaker promotion process for me.
> 
> on 2015/05/05 10:32, Andrew Beekhof wrote:
>>> On 29 Apr 2015, at 5:38 pm, Zhou Zheng Sheng / 周征晟  
>>> wrote:
>> [snip]
>> 
>>> Batch is a pacemaker concept I found when I was reading its
>>> documentation and code. There is a "batch-limit: 30" in the output of
>>> "pcs property list --all". The pacemaker official documentation
>>> explanation is that it's "The number of jobs that the TE is allowed to
>>> execute in parallel." From my understanding, pacemaker maintains cluster
>>> states, and when we start/stop/promote/demote a resource, it triggers a
>>> state transition. Pacemaker puts as many as possible transition jobs
>>> into a batch, and process them in parallel.
>> Technically it calculates an ordered graph of actions that need to be 
>> performed for a set of related resources.
>> You can see an example of the kinds of graphs it produces at:
>> 
>>   
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/s-config-testing-changes.html
>> 
>> There is a more complex one which includes promotion and demotion on the 
>> next page.
>> 
>> The number of actions that can run at any one time is therefore limited by
>> - the value of batch-limit (the total number of in-flight actions)
>> - the number of resources that do not have ordering constraints between them 
>> (eg. rsc{1,2,3} in the above example)  
>> 
>> So in the above example, if batch-limit >= 3, the monitor_0 actions will 
>> still all execute in parallel.
>> If batch-limit == 2, one of them will be deferred until the others complete.
>> 
>> Processing of the graph stops the moment any action returns a value that was 
>> not expected.
>> If that happens, we wait for currently in-flight actions to complete, 
>> re-calculate a new graph based on the new information and start again.
> So can I infer the following statement? In a big cluster with many
> resources, chances are some resource agent actions return unexpected
> values,

The size of the cluster shouldn’t increase the chance of this happening unless 
you’ve set the timeouts too aggressively.

> and if any of the in-flight action timeout is long, it would
> block pacemaker from re-calculating a new transition graph?

Yes, but it’s actually an argument for making the timeouts longer, not shorter.
Setting the timeouts too aggressively actually increases downtime because of 
all the extra delays and recovery it induces.
So set them to be long enough that there is unquestionably a problem if you hit 
them.

But we absolutely recognise that starting/stopping a database can take a very 
long time comparatively and that it shouldn’t block recovery of other unrelated 
services.
I would expect to see this land in Pacemaker 1.1.14


> I see the
> current batch-limit is 30 and I tried to increase it to 100, but did not
> help.

Correct.  It only puts an upper limit on the number of in-flight actions, 
actions still need to wait for all their dependencies to complete before 
executing.

> I'm sure that the cloned MySQL Galera resource is not related to
> master-slave RabbitMQ resource. I don't find any dependency, order or
> rule connecting them in the cluster deployed by Fuel [1].

In general it should not have needed to wait, but if you send me a crm_report 
covering the period you’re talking about I’ll be able to comment specifically 
about the behaviour you saw.

> 
> Is there anything I can do to make sure all the resource actions return
> expected values in a full reassembling?

In general, if we say ‘start’, do your best to start, or return ‘0’ if you 
were already started.
Likewise for stop.

Otherwise its really specific to your agent.
For example an IP resource just needs to add itself to an interface - it can’t 
do much differently; if it times out then the system must be very, very busy.

The only other thing I would say is:
- avoid blocking calls where possible
- have empathy for the machine (do as little as is needed)
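A skeleton of that idempotency rule (illustrative only; `is_running`,
`do_start`, and `do_stop` stand in for whatever service-specific checks and
commands a real agent performs):

```shell
# "start" succeeds if already started; "stop" succeeds if already stopped.
# Return codes follow OCF conventions (0 = success, 7 = not running).
ra_action() {
    case "$1" in
        start)
            is_running && return 0    # already started: report success
            do_start                  # otherwise do our best to start
            ;;
        stop)
            is_running || return 0    # already stopped: report success
            do_stop
            ;;
        monitor)
            if is_running; then return 0; else return 7; fi  # OCF_NOT_RUNNING
            ;;
        *)  return 3 ;;               # 3 = OCF_ERR_UNIMPLEMENTED
    esac
}
```

Calling `ra_action start` twice in a row returns success both times, which is
exactly what lets pacemaker re-probe and re-issue actions safely.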

> Is it because node-1 and node-2
> happen to boot faster than node-3 and form a cluster, when node-3 joins,
> it triggers new state transition? Or may because some resources are
> already started, so pacemaker needs to stop them firstly?

We only stop them if they shouldn’t yet be running (i.e. a colocation or 
ordering dependency is also not yet started).


> Does setting
> default-resource-stickiness to 1 help?

From 0 or INFINITY?

> 
> I also tried "crm history XXX" commands in a live and correct cluster,

I’m not familiar with t

Re: [openstack-dev] [Fuel] Speed Up RabbitMQ Recovering

2015-05-04 Thread Andrew Beekhof

> On 29 Apr 2015, at 5:38 pm, Zhou Zheng Sheng / 周征晟  
> wrote:

[snip]

> Batch is a pacemaker concept I found when I was reading its
> documentation and code. There is a "batch-limit: 30" in the output of
> "pcs property list --all". The pacemaker official documentation
> explanation is that it's "The number of jobs that the TE is allowed to
> execute in parallel." From my understanding, pacemaker maintains cluster
> states, and when we start/stop/promote/demote a resource, it triggers a
> state transition. Pacemaker puts as many as possible transition jobs
> into a batch, and process them in parallel.

Technically it calculates an ordered graph of actions that need to be performed 
for a set of related resources.
You can see an example of the kinds of graphs it produces at:

   
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/s-config-testing-changes.html

There is a more complex one which includes promotion and demotion on the next 
page.

The number of actions that can run at any one time is therefore limited by
- the value of batch-limit (the total number of in-flight actions)
- the number of resources that do not have ordering constraints between them 
(eg. rsc{1,2,3} in the above example)  

So in the above example, if batch-limit >= 3, the monitor_0 actions will still 
all execute in parallel.
If batch-limit == 2, one of them will be deferred until the others complete.

Processing of the graph stops the moment any action returns a value that was 
not expected.
If that happens, we wait for currently in-flight actions to complete, 
re-calculate a new graph based on the new information and start again.

> 
> The problem is that pacemaker can only promote a resource after it
> detects the resource is started.

First we do a non-recurring monitor (*_monitor_0) to check what state the 
resource is in.
We can’t assume its off because a) we might have crashed, b) the admin might 
have accidentally configured it to start at boot or c) the admin may have asked 
us to re-check everything.

> During a full reassemble, in the first
> transition batch, pacemaker starts all the resources including MySQL and
> RabbitMQ. Pacemaker issues resource agent "start" invocation in parallel
> and reaps the results.
> 
> For a multi-state resource agent like RabbitMQ, pacemaker needs the
> start result reported in the first batch, then transition engine and
> policy engine decide if it has to retry starting or promote, and put
> this new transition job into a new batch.

Also important to know, the order of actions is:

1. any necessary demotions
2. any necessary stops
3. any necessary starts
4. any necessary promotions





Re: [openstack-dev] [Fuel] Speed Up RabbitMQ Recovering

2015-05-04 Thread Andrew Beekhof

> On 28 Apr 2015, at 11:15 pm, Bogdan Dobrelya  wrote:
> 
>> Hello,
> 
> Hello, Zhou
> 
>> 
>> I am using Fuel 6.0.1 and find that RabbitMQ recovery time is long after
>> power failure. I have a running HA environment, then I reset power of
>> all the machines at the same time. I observe that after reboot it
>> usually takes 10 minutes for the RabbitMQ cluster to appear running
>> master-slave mode in pacemaker. If I power off all the 3 controllers and
>> only start 2 of them, the downtime sometimes can be as long as 20 minutes.
> 
> Yes, this is a known issue [0]. Note, there were many bugfixes, like
> [1],[2],[3], merged for MQ OCF script, so you may want to try to
> backport them as well by the following guide [4]
> 
> [0] https://bugs.launchpad.net/fuel/+bug/1432603
> [1] https://review.openstack.org/#/c/175460/
> [2] https://review.openstack.org/#/c/175457/
> [3] https://review.openstack.org/#/c/175371/
> [4] https://review.openstack.org/#/c/170476/

Is there a reason you’re using a custom OCF script instead of the upstream[a] 
one?
Please have a chat with David (the maintainer, in CC) if there is something you 
believe is wrong with it.

[a] 
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster

> 
>> 
>> I have a little investigation and find out there are some possible causes.
>> 
>> 1. MySQL Recovery Takes Too Long [1] and Blocking RabbitMQ Clustering in
>> Pacemaker
>> 
>> The pacemaker resource p_mysql start timeout is set to 475s. Sometimes
>> MySQL-wss fails to start after power failure, and pacemaker would wait
>> 475s before retry starting it. The problem is that pacemaker divides
>> resource state transitions into batches. Since RabbitMQ is master-slave
>> resource, I assume that starting all the slaves and promoting master are
>> put into two different batches. If unfortunately starting all RabbitMQ
>> slaves are put in the same batch as MySQL starting, even if RabbitMQ
>> slaves and all other resources are ready, pacemaker will not continue
>> but just wait for MySQL timeout.
> 
> Could you please elaborate on what the same/different batches are for MQ
> and DB? Note, there are MQ clustering logic flow charts available here
> [5] and we're planning to release a dedicated technical bulletin for this.
> 
> [5] http://goo.gl/PPNrw7
> 
>> 
>> I can re-produce this by hard powering off all the controllers and start
>> them again. It's more likely to trigger MySQL failure in this way. Then
>> I observe that if there is one cloned mysql instance not starting, the
>> whole pacemaker cluster gets stuck and does not emit any log. On the
>> host of the failed instance, I can see a mysql resource agent process
>> calling the sleep command. If I kill that process, the pacemaker comes
>> back alive and RabbitMQ master gets promoted. In fact this long timeout
>> is blocking every resource from state transition in pacemaker.
>> 
>> This maybe a known problem of pacemaker and there are some discussions
>> in Linux-HA mailing list [2]. It might not be fixed in the near future.
>> It seems that, in general, it's bad to have long timeouts in state transition
>> actions (start/stop/promote/demote). There may be another way to
>> implement the MySQL-wss resource agent: use a short start timeout and
>> monitor the wss cluster state in the monitor action.
> 
> This is very interesting, thank you! I believe all commands in the MySQL RA
> OCF script should likewise be wrapped with timeout -SIGTERM or -SIGKILL,
> as we did for the MQ RA OCF, and there should not be any sleep calls. I
> created a bug for this [6].
> 
> [6] https://bugs.launchpad.net/fuel/+bug/1449542
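
[Editorial note: the wrapping suggested above can be done with coreutils timeout(1). A minimal runnable sketch — "sleep 60" stands in for a hanging mysqld/wsrep call inside the resource agent, and the 1s/5s values are illustrative only:]

```shell
#!/bin/sh
# Bound a potentially hanging step: SIGTERM after 1s, then SIGKILL 5s
# later if the child ignores SIGTERM. "sleep 60" stands in for a hung
# mysqld call made by the resource agent.
timeout -k 5 1 sleep 60
rc=$?
echo "bounded call exited with rc=$rc"   # 124 means the deadline fired
```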
> 
>> 
>> I also found a fix that improves the MySQL start timeout [3]. It shortens the
>> timeout to 300s. At the time of sending this email, I cannot find it in the
>> stable/6.0 branch. Maybe the maintainer needs to cherry-pick it to
>> stable/6.0?
>> 
>> [1] https://bugs.launchpad.net/fuel/+bug/1441885
>> [2] http://lists.linux-ha.org/pipermail/linux-ha/2014-March/047989.html
>> [3] https://review.openstack.org/#/c/171333/
>> 
>> 
>> 2. RabbitMQ Resource Agent Breaks Existing Cluster
>> 
>> Reading the code of the RabbitMQ resource agent, I find it does the
>> following to start the RabbitMQ master-slave cluster.
>> On all the controllers:
>> (1) Start Erlang beam process
>> (2) Start RabbitMQ App (If failed, reset mnesia DB and cluster state)
>> (3) Stop RabbitMQ App but do not stop the beam process
>> 
>> Then in pacemaker, all the RabbitMQ instances are in slave state. After
>> pacemaker determines the master, it does the following.
>> On the to-be-master host:
>> (4) Start RabbitMQ App (If failed, reset mnesia DB and cluster state)
>> On the slaves hosts:
>> (5) Start RabbitMQ App (If failed, reset mnesia DB and cluster state)
>> (6) Join RabbitMQ cluster of the master host
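
[Editorial note: the sequence above can be sketched as the shell calls the agent makes. rabbitmqctl is stubbed here so the sketch runs without RabbitMQ installed; the real agent calls the actual binary and handles the reset-on-failure branches, and "node-1" is a placeholder master name.]

```shell
#!/bin/sh
# Stub: print the command instead of really talking to RabbitMQ.
rabbitmqctl() { echo "rabbitmqctl $*"; }

# (2)-(3): every node probes the app on top of a running beam, then stops it
rabbitmqctl start_app || rabbitmqctl force_reset    # reset mnesia on failure
rabbitmqctl stop_app

# (4): the node pacemaker promotes starts the app for real
rabbitmqctl start_app

# (5)-(6): each slave starts its app, then joins the master's cluster
# (in practice join_cluster requires the app to be stopped first)
rabbitmqctl start_app
rabbitmqctl join_cluster "rabbit@node-1"
```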
>> 
> 
> Yes, something like that. As I mentioned, there were several bug fixes
> in the 6.1 dev, and you can also check the MQ clustering flow charts.
> 
>> As far as I can understand,

Re: [openstack-dev] [nova] Host health monitoring

2015-01-11 Thread Andrew Beekhof

> On 9 Jan 2015, at 5:37 am, Joe Gordon  wrote:
> 
> 
> 
> On Sun, Jan 4, 2015 at 7:08 PM, Andrew Beekhof  wrote:
> 
> > On 9 Dec 2014, at 1:20 am, Roman Dobosz  wrote:
> >
> > On Wed, 3 Dec 2014 08:44:57 +0100
> > Roman Dobosz  wrote:
> >
> >> I've just started to work on the topic of detection if host is alive or
> >> not: https://blueprints.launchpad.net/nova/+spec/host-health-monitoring
> >>
> >> I'll appreciate any comments :)
> >
> > I've submitted another blueprint, which is closely bound to the previous
> > one:
> > https://blueprints.launchpad.net/nova/+spec/pacemaker-servicegroup-driver
> >
> > The idea behind those two blueprints is to make Nova aware of host
> > status, not only of the services that run on it. Bringing in Pacemaker as a
> > driver for servicegroup will provide us with two things: fencing and
> > reliable information about host state, so we can avoid situations where
> > some actions misinterpret service state as host state.
> >
> > Comments?
> 
> I would rather move the servicegroup concept to use tooz and put things like 
> Pacemaker in there (https://review.openstack.org/138607)

Perhaps I'm missing something obvious, but from looking at git, it's unclear to 
me how tooz addresses the fencing requirement.

There also seems to be an assumption that all services are completely 
independent - is that correct?
In practice this is rarely the case (or worse, it pushes the 'wait and retry' 
logic onto the individual services).

Finally, could you clarify what you mean by "put things like Pacemaker in 
there"?
Presumably "there" is a tooz driver of some kind?  What would that achieve?



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Host health monitoring

2015-01-04 Thread Andrew Beekhof

> On 9 Dec 2014, at 1:20 am, Roman Dobosz  wrote:
> 
> On Wed, 3 Dec 2014 08:44:57 +0100
> Roman Dobosz  wrote:
> 
>> I've just started to work on the topic of detection if host is alive or
>> not: https://blueprints.launchpad.net/nova/+spec/host-health-monitoring
>> 
>> I'll appreciate any comments :)
> 
> I've submitted another blueprint, which is closely bound to the previous one: 
> https://blueprints.launchpad.net/nova/+spec/pacemaker-servicegroup-driver
> 
> The idea behind those two blueprints is to make Nova aware of host 
> status, not only of the services that run on it. Bringing in Pacemaker as a 
> driver for servicegroup will provide us with two things: fencing and 
> reliable information about host state, so we can avoid situations where 
> some actions misinterpret service state as host state.
> 
> Comments?

Sounds like an excellent idea. Is there code for these blueprints? If so, how 
do I get to see it?


Re: [openstack-dev] [Fuel] Waiting for Haproxy backends

2014-11-19 Thread Andrew Beekhof

> On 20 Nov 2014, at 6:55 am, Sergii Golovatiuk  
> wrote:
> 
> Hi crew,
> 
> Please see my inline comments.
> 
> Hi Everyone,
> 
> I was reading the blueprints mentioned here and thought I'd take the 
> opportunity to introduce myself and ask a few questions.
> For those that don't recognise my name, Pacemaker is my baby - so I take a 
> keen interest in helping people have a good experience with it :)
> 
> A couple of items stood out to me (apologies if I repeat anything that is 
> already well understood):
> 
> * Operations with CIB utilizes almost 100% of CPU on the Controller
> 
>  We introduced a new CIB algorithm in 1.1.12 which is O(2) faster/less 
> resource hungry than prior versions.
>  I would be interested to hear your experiences with it if you are able to 
> upgrade to that version.
>  
> Our team is aware of that. That's a really nice improvement. Thank you very 
> much for it. We've prepared all the packages, though we are in feature freeze. 
> Pacemaker 1.1.12 will be added in the next release.
>  
> * Corosync shutdown process takes a lot of time
> 
>  Corosync (and Pacemaker) can shut down incredibly quickly.
>  If corosync is taking a long time, it will be because it is waiting for 
> pacemaker, and pacemaker is almost always waiting for one of the 
> clustered services to shut down.
> 
> As part of this improvement, we have an idea to split the signalling (corosync) 
> and resource management (pacemaker) layers by specifying
> service { 
>name: pacemaker
>ver:  1
> }
> 
> and creating an upstart script to set the start ordering. That will allow us to:
> 
> 1. Create some notifications in puppet for pacemaker
> 2. Restart and manage corosync and pacemaker independently
> 3. Use respawn in upstart to restart corosync or pacemaker
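
[Editorial note: an upstart ordering job of the kind described might look like the sketch below; the path and stanza details depend on the distribution's packaging and are assumptions here.]

```
# /etc/init/pacemaker.conf  (sketch, assuming Ubuntu upstart packaging)
description "Pacemaker, started only once corosync is up"
start on started corosync
stop on stopping corosync
respawn
exec /usr/sbin/pacemakerd
```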
> 
> 
> * Current Fuel Architecture is limited to Corosync 1.x and Pacemaker 1.x
> 
>  Corosync 2 is really the way to go.
>  Is there something in particular that is holding you back?
>  Also, out of interest, are you using cman or the pacemaker plugin?
> 
> We use almost standard corosync 1.x and pacemaker from CentOS 6.5

Please be aware that the plugin is not long for this world on CentOS.
It was already removed once (in 6.4 betas) and is not even slightly tested at 
RH and about the only ones using it upstream are SUSE.

http://blog.clusterlabs.org/blog/2013/pacemaker-on-rhel6-dot-4/ has some 
relevant details.
The short version is that I would really encourage a transition to CMAN (which 
is really just corosync 1.x plus a more mature and better tested plugin from 
the corosync people).
See http://clusterlabs.org/quickstart-redhat.html , its really quite painless.

> and Ubuntu 12.04. However, we've prepared corosync 2.x and pacemaker 1.1.12 
> packages. We also have updated puppet manifests in review. As was said above, 
> we can't just add them at the end of the development cycle.

Yep, makes sense.

>  
> 
> *  Diff operations against the Corosync CIB require saving data to a file
>   rather than keeping all data in memory
> 
>  Can someone clarify this one for me?
>  
> That's our implementation for puppet. We can't just use shadows in a distributed 
> environment, so we run 
> 
>  Also, I notice that the corosync init script has been modified to set/unset 
> maintenance-mode with cibadmin.
>  Any reason not to use crm_attribute instead?  You might find it's a less 
> fragile solution than a hard-coded diff.
>  
> Can you give a particular line where you see that?  

I saw it in one of the bugs:
   https://bugs.launchpad.net/fuel/+bug/1340172

Maybe it is no longer accurate

> 
> * The debug process for OCF scripts is not unified and requires a lot of
>  actions from the Cloud Operator
> 
>  Two things to mention here... the first is crm_resource 
> --force-(start|stop|check) which queries the cluster for the resource's 
> definition but runs the command directly. 
>  Combined with -V, this means that you get to see everything the agent is 
> doing.
> 
> We write many OCF scripts of our own. We just need to see how an OCF script 
> behaves. ocf_tester is not enough for our cases.

Agreed. ocf_tester is more for out-of-cluster regression testing, not really 
good for debugging a running cluster.

> I'll try whether crm_resource -V --force-start is better.
>  
> 
>  Also, pacemaker now supports the ability for agents to emit specially 
> formatted error messages that are stored in the cib and can be shown back to 
> users.
>  This can make things much less painful for admins. Look for 
> PCMK_OCF_REASON_PREFIX in the upstream resource-agents project.
> 
> Thank you for tip. 
> 
> 
> * Openstack services are not managed by Pacemaker
> 
> The general idea is to have all OpenStack services under pacemaker control 
> rather than split between upstart and pacemaker. It will be very handy for 
> operators to see the status of all services from one console. It will also 
> give us the flexibility to have more complex service verification checks in 
> the monitor function.
>  
> 
>  Oh?
> 
> * Compute nodes aren't in the Pacemaker cluster, hence are lacking a viable
>  control plane for their compute/nova services.

Re: [openstack-dev] [Fuel] Waiting for Haproxy backends

2014-11-18 Thread Andrew Beekhof
Hi Everyone,

I was reading the blueprints mentioned here and thought I'd take the 
opportunity to introduce myself and ask a few questions.
For those that don't recognise my name, Pacemaker is my baby - so I take a keen 
interest in helping people have a good experience with it :)

A couple of items stood out to me (apologies if I repeat anything that is 
already well understood):

* Operations with CIB utilizes almost 100% of CPU on the Controller

 We introduced a new CIB algorithm in 1.1.12 which is O(2) faster/less resource 
hungry than prior versions.
 I would be interested to hear your experiences with it if you are able to 
upgrade to that version.

* Corosync shutdown process takes a lot of time

 Corosync (and Pacemaker) can shut down incredibly quickly. 
 If corosync is taking a long time, it will be because it is waiting for 
pacemaker, and pacemaker is almost always waiting for one of the clustered 
services to shut down.

* Current Fuel Architecture is limited to Corosync 1.x and Pacemaker 1.x

 Corosync 2 is really the way to go.
 Is there something in particular that is holding you back?
 Also, out of interest, are you using cman or the pacemaker plugin?

*  Diff operations against the Corosync CIB require saving data to a file
  rather than keeping all data in memory

 Can someone clarify this one for me?

 Also, I notice that the corosync init script has been modified to set/unset 
maintenance-mode with cibadmin.
 Any reason not to use crm_attribute instead?  You might find it's a less 
fragile solution than a hard-coded diff.
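
[Editorial note: for reference, the crm_attribute form suggested here is a one-liner in each direction. In this sketch crm_attribute is stubbed so the example runs without a live cluster; on a real node, drop the stub and run the commands as-is.]

```shell
#!/bin/sh
# Stub so this sketch runs anywhere; remove on a real cluster node.
crm_attribute() { echo "crm_attribute $*"; }

# Put the cluster into maintenance mode before poking at services...
crm_attribute --type crm_config --name maintenance-mode --update true
# ...and take it out again afterwards.
crm_attribute --type crm_config --name maintenance-mode --delete
```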

* The debug process for OCF scripts is not unified and requires a lot of
 actions from the Cloud Operator

 Two things to mention here... the first is crm_resource 
--force-(start|stop|check) which queries the cluster for the resource's 
definition but runs the command directly.
 Combined with -V, this means that you get to see everything the agent is doing.

 Also, pacemaker now supports the ability for agents to emit specially 
formatted error messages that are stored in the cib and can be shown back to 
users.
 This can make things much less painful for admins. Look for 
PCMK_OCF_REASON_PREFIX in the upstream resource-agents project.


* Openstack services are not managed by Pacemaker

 Oh?

* Compute nodes aren't in the Pacemaker cluster, hence are lacking a viable
 control plane for their compute/nova services.

 pacemaker-remoted might be of some interest here.  
 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Remote/index.html


* Creating and committing shadows not only adds constant pain with dependencies 
and unneeded complexity, but also rewrites cluster attributes and even other 
changes if you mess up the ordering, and it's really hard to debug.

 Is this still an issue?  I'm reasonably sure this is specific to the way crmsh 
uses shadows.  
 Using the native tools it should be possible to commit only the delta, so any 
other changes that occur while you're updating the shadow would not be an 
issue, and existing attributes wouldn't be rewritten.

* Restarting resources via Puppet's pacemaker service provider restarts them 
even if they are running on other nodes, and it sometimes impacts the cluster.

 Not available yet, but upstream there is now a smart --restart option for 
crm_resource which can optionally take a --host parameter.
 Sounds like it would be useful here.  
 
http://blog.clusterlabs.org/blog/2014/feature-spotlight-smart-resource-restart-from-the-command-line/

* An attempt to stop or restart corosync service brings down a lot of resources 
and probably will fail and bring down the entire deployment.

 That sounds deeply worrying.  Details?

* Controllers other than the first download the configured CIB and immediately 
start all cloned resources before they are configured, so they have to be 
cleaned up later.

 By this you mean clones are being started on nodes which do not have the 
software? Or before the ordering/colocation constraints have been configured?


> On 15 Nov 2014, at 10:31 am, Sergii Golovatiuk  
> wrote:
> 
> +1 for ha-pacemaker-improvements
> 
> --
> Best regards,
> Sergii Golovatiuk,
> Skype #golserge
> IRC #holser
> 
> On Fri, Nov 14, 2014 at 11:51 PM, Dmitry Borodaenko 
>  wrote:
> Good plan, but I really hate the name of this blueprint. I think we
> should stop lumping different unrelated HA improvements into a single
> blueprint with a generic name like that, especially when we already
> had a blueprint with essentially the same name
> (ha-pacemaker-improvements). There's nothing wrong with having 4
> trivial but specific blueprints instead of one catch-all.
> 
> On Wed, Nov 12, 2014 at 4:10 AM, Aleksandr Didenko
>  wrote:
> > Hi,
> >
> > in order to make sure some critical Haproxy backends are running (like mysql
> > or keystone) before proceeding with deployment, we use execs like [1] or
> > [2].
> >
> > We're currently working on a minor improvements of those execs, but there is
> > another approach - we can replace tho