Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2016-04-08 Thread Bogdan Dobrelya
On 03/17/2016 10:01 AM, Bogdan Dobrelya wrote:
> On 03/17/2016 12:17 AM, Andrew Beekhof wrote:
>>
>>
>> On Tue, Feb 16, 2016 at 2:58 AM, Bogdan Dobrelya wrote:
>>
>> Hello!
>> A quick status update inline:
>>
>>
>> [snip] 
>>
>>
>> So, what's next?
>>
>> - I'm open for merging both [5], [6] of the existing OCF RA solutions,
>> as it was proposed by Andrew Beekhof. Let's make it happen.
>>
>>
>> Great :-)
>> Oyvind (CC'd) is the relevant contact from our side, should he talk to
>> you or someone else?
> 
> Yes, perhaps we should make a follow-up and define the plan for merging
> the agents.

An update!
The follow-up was very productive; the upstream OCF RA merge plan is:

1. The rabbitmq-server repo [0] will be the destination for the merged
OCF RA.
2. Fork it [1] for the merge process and enable a Travis CI check for
incoming pull requests. Note that the check exists upstream as well but
has yet to be enabled properly. We also decided to run tests against *only*
a cluster assembled with the merged OCF RA, without other components like
OpenStack or a DB cluster.
3. Update the Travis CI check (see the example job script [2]) to cover the
use cases of the second OCF RA (from the resource-agents repo) that we want
to merge with, so that the test case benefits all sides of the merge process
(a rough sketch of what such a check boils down to follows after this list).
4. Start submitting patches to the merged RA.
5. Once finished, submit it to the upstream destination from step #1, or
alternatively submit each (stable) patch there as it lands.
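
To make this more concrete, here is a rough sketch of what such a check
could boil down to. It is *not* the actual travis_test_ocf_ra.sh from [2];
the node name "n1" and the exact verification commands are only
illustrative assumptions:

#!/bin/bash
# Sketch: assemble the cluster from the Vagrantfile in [3] using the Docker
# provider, then verify the Pacemaker resource and the RabbitMQ cluster
# membership. Node name "n1" is an assumption.
set -e
vagrant up --provider docker
# One-shot cluster status; the multistate rabbitmq resource should be
# reported as started/promoted on the nodes.
vagrant ssh n1 -c 'sudo crm_mon -1'
# Verify that all rabbit nodes actually joined the RabbitMQ cluster.
vagrant ssh n1 -c 'sudo rabbitmqctl cluster_status'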

Action items:
* Peter Lemenkov (petro) to try a dev env via vagrant scripts [3] as
well as check it against the (RHOS/resource-agents?) packages containing
the 2nd upstream OCF RA.
* Oyvind Albrigtsen (e-ddie) to investigate ASL/GPL licensing questions.
Hopefully no issues are expected, and the merged OCF RA should be licensed
under ASL v2.0.
* Bogdan Dobrelya (bogdando) to investigate destructive test cases for
the travis CI check as well. The plan is to use custom jepsen tests [4]
for that.

An update on item #2:
I created the aforementioned jepsen test [4] to verify rabbit cluster
recovery after network partitions. The Travis CI check can now run it as
well [5], although it is a bit flaky there - it may time out, but it works
well on my dev box :) Usage instructions can be found in the README [3].
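
For reference, running the jepsen test locally is roughly the following,
assuming Leiningen is installed and the cluster from [3] is already up; the
repo layout is taken from the URL in [4], and the exact invocation may
differ from what the README [3] describes:

# Clone the jepsen fork on the rabbit_pcmk branch (see [4]) and run the
# RabbitMQ/Pacemaker scenario with Leiningen. Target node addresses are
# expected to come from the vagrant cluster in [3] (an assumption here).
git clone -b rabbit_pcmk https://github.com/bogdando/jepsen
cd jepsen/rabbitmq_ocf_pcmk
lein test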

[0]
https://github.com/rabbitmq/rabbitmq-server/blob/master/scripts/rabbitmq-server-ha.ocf
[1]
https://github.com/bogdando/rabbitmq-server/blob/master/scripts/rabbitmq-server-ha.ocf
[2]
https://github.com/bogdando/rabbitmq-server/blob/rabbit_ocf_ra_travis/scripts/travis_test_ocf_ra.sh
[3] https://github.com/bogdando/rabbitmq-cluster-ocf-vagrant
[4] https://github.com/bogdando/jepsen/tree/rabbit_pcmk/rabbitmq_ocf_pcmk
[5] https://travis-ci.org/bogdando/rabbitmq-server/jobs/121704254

> 
>>  
>>
>>
>> - Would be nice to make Travis CI based gate to the upstream
>> rabbitmq-server's HA OCF RA. As for now, it relies on Fuel CI gates and
>> manual testing with atlas boxes.
>>
>> - Please also consider Travis or a suchlike CI for the resource-agents'
>> rabbit-cluster OCF RA as well.
>>
>> [1] https://github.com/bogdando/rabbitmq-cluster-ocf-vagrant
>> [2] https://github.com/bogdando/packer-atlas-example
>> [3] https://hub.docker.com/r/bogdando/rabbitmq-cluster-ocf/
>> [4] https://hub.docker.com/r/bogdando/rabbitmq-cluster-ocf-wily/
>> [5]
>> 
>> https://github.com/rabbitmq/rabbitmq-server/blob/master/scripts/rabbitmq-server-ha.ocf
>> [6]
>> 
>> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
>>
>> >
>> > I'm also planning to refer this official RabbitMQ cluster setup guide in
>> > the OpenStack HA guide as well [2].
>>
>> Done, see [7]
>>
>> [7] http://docs.openstack.org/ha-guide/controller-ha-rabbitmq.html
>>
>> >
>> > PS. Original rabbitmq-users mail thread is here [3].
>> > [openstack-operators] cross posted as well.
>> >
>> > [0] http://www.rabbitmq.com/pacemaker.html
>> > [1] https://atlas.hashicorp.com/bogdando/boxes/rabbitmq-cluster-ocf
>> > [2] https://bugs.launchpad.net/openstack-manuals/+bug/1497528
>> > [3] https://groups.google.com/forum/#!topic/rabbitmq-users/BnoIQJb34Ao
>> >
>>
>>
>> --
>> Best regards,
>> Bogdan Dobrelya,
>> Irc #bogdando
>>

Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2016-03-18 Thread Andrew Beekhof
On Tue, Feb 16, 2016 at 2:58 AM, Bogdan Dobrelya wrote:

> Hello!
> A quick status update inline:
>

[snip]


> So, what's next?
>
> - I'm open for merging both [5], [6] of the existing OCF RA solutions,
> as it was proposed by Andrew Beekhof. Let's make it happen.
>

Great :-)
Oyvind (CC'd) is the relevant contact from our side, should he talk to you
or someone else?


>
> - Would be nice to make Travis CI based gate to the upstream
> rabbitmq-server's HA OCF RA. As for now, it relies on Fuel CI gates and
> manual testing with atlas boxes.
>
> - Please also consider Travis or a suchlike CI for the resource-agents'
> rabbit-cluster OCF RA as well.
>
> [1] https://github.com/bogdando/rabbitmq-cluster-ocf-vagrant
> [2] https://github.com/bogdando/packer-atlas-example
> [3] https://hub.docker.com/r/bogdando/rabbitmq-cluster-ocf/
> [4] https://hub.docker.com/r/bogdando/rabbitmq-cluster-ocf-wily/
> [5]
>
> https://github.com/rabbitmq/rabbitmq-server/blob/master/scripts/rabbitmq-server-ha.ocf
> [6]
>
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
>
> >
> > I'm also planning to refer this official RabbitMQ cluster setup guide in
> > the OpenStack HA guide as well [2].
>
> Done, see [7]
>
> [7] http://docs.openstack.org/ha-guide/controller-ha-rabbitmq.html
>
> >
> > PS. Original rabbitmq-users mail thread is here [3].
> > [openstack-operators] cross posted as well.
> >
> > [0] http://www.rabbitmq.com/pacemaker.html
> > [1] https://atlas.hashicorp.com/bogdando/boxes/rabbitmq-cluster-ocf
> > [2] https://bugs.launchpad.net/openstack-manuals/+bug/1497528
> > [3] https://groups.google.com/forum/#!topic/rabbitmq-users/BnoIQJb34Ao
> >
>
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
>


Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2016-03-18 Thread Bogdan Dobrelya
On 03/17/2016 12:17 AM, Andrew Beekhof wrote:
> 
> 
> On Tue, Feb 16, 2016 at 2:58 AM, Bogdan Dobrelya wrote:
> 
> Hello!
> A quick status update inline:
> 
> 
> [snip] 
> 
> 
> So, what's next?
> 
> - I'm open for merging both [5], [6] of the existing OCF RA solutions,
> as it was proposed by Andrew Beekhof. Let's make it happen.
> 
> 
> Great :-)
> Oyvind (CC'd) is the relevant contact from our side, should he talk to
> you or someone else?

Yes, perhaps we should make a follow-up and define the plan for merging
the agents.

>  
> 
> 
> - Would be nice to make Travis CI based gate to the upstream
> rabbitmq-server's HA OCF RA. As for now, it relies on Fuel CI gates and
> manual testing with atlas boxes.
> 
> - Please also consider Travis or a suchlike CI for the resource-agents'
> rabbit-cluster OCF RA as well.
> 
> [1] https://github.com/bogdando/rabbitmq-cluster-ocf-vagrant
> [2] https://github.com/bogdando/packer-atlas-example
> [3] https://hub.docker.com/r/bogdando/rabbitmq-cluster-ocf/
> [4] https://hub.docker.com/r/bogdando/rabbitmq-cluster-ocf-wily/
> [5]
> 
> https://github.com/rabbitmq/rabbitmq-server/blob/master/scripts/rabbitmq-server-ha.ocf
> [6]
> 
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> 
> >
> > I'm also planning to refer this official RabbitMQ cluster setup guide in
> > the OpenStack HA guide as well [2].
> 
> Done, see [7]
> 
> [7] http://docs.openstack.org/ha-guide/controller-ha-rabbitmq.html
> 
> >
> > PS. Original rabbitmq-users mail thread is here [3].
> > [openstack-operators] cross posted as well.
> >
> > [0] http://www.rabbitmq.com/pacemaker.html
> > [1] https://atlas.hashicorp.com/bogdando/boxes/rabbitmq-cluster-ocf
> > [2] https://bugs.launchpad.net/openstack-manuals/+bug/1497528
> > [3] https://groups.google.com/forum/#!topic/rabbitmq-users/BnoIQJb34Ao
> >
> 
> 
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
> 


-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando



Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2016-02-15 Thread Bogdan Dobrelya
Hello!
A quick status update inline:

On 23.10.2015 10:01, Bogdan Dobrelya wrote:
> Hello.
> I'm glad to announce that the pacemaker OCF resource agent for the
> rabbitmq clustering, which was born in the Fuel project initially, now
> available and maintained upstream! It will be shipped with the
> rabbitmq-server 3.5.7 package (release by November, 2015)
> 
> You can read about this OCF agent in the official guide [0] (flow charts
> for promote/demote/start/stop actions in progress).
> 
> And you can try it as a tiny cluster example with a Vagrant box for
> Atlas [1]. Note, this only installs an Ubuntu box with a
> Corosync/Pacemaker & RabbitMQ clusters running, no Fuel or OpenStack
> required :-)

- Extracted the Vagrantfile and cluster provisioning scripts to a separate
repo [1]. The packer example repo [2] now manages only Atlas and Docker
(new!) builds.

- Added Docker images for Ubuntu Trusty [3] and Wily [4]. Only the latter
works stably, though. For Ubuntu Trusty there are vagrant boxes working
without issues, so this is likely a Docker-only problem. Perhaps I can
build new images for Xenial or other distros as well.

- The Vagrantfile can now also deploy with the Docker provider, although a
few hacks are needed to work around features not yet implemented in
Vagrant's Docker support.

So, what's next?

- I'm open to merging both [5] and [6] of the existing OCF RA solutions, as
proposed by Andrew Beekhof. Let's make it happen.

- It would be nice to add a Travis CI based gate for the upstream
rabbitmq-server HA OCF RA. For now, it relies on Fuel CI gates and
manual testing with Atlas boxes.

- Please also consider Travis or a similar CI for the resource-agents'
rabbitmq-cluster OCF RA as well.

[1] https://github.com/bogdando/rabbitmq-cluster-ocf-vagrant
[2] https://github.com/bogdando/packer-atlas-example
[3] https://hub.docker.com/r/bogdando/rabbitmq-cluster-ocf/
[4] https://hub.docker.com/r/bogdando/rabbitmq-cluster-ocf-wily/
[5]
https://github.com/rabbitmq/rabbitmq-server/blob/master/scripts/rabbitmq-server-ha.ocf
[6]
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster

> 
> I'm also planning to refer this official RabbitMQ cluster setup guide in
> the OpenStack HA guide as well [2].

Done, see [7]

[7] http://docs.openstack.org/ha-guide/controller-ha-rabbitmq.html

> 
> PS. Original rabbitmq-users mail thread is here [3].
> [openstack-operators] cross posted as well.
> 
> [0] http://www.rabbitmq.com/pacemaker.html
> [1] https://atlas.hashicorp.com/bogdando/boxes/rabbitmq-cluster-ocf
> [2] https://bugs.launchpad.net/openstack-manuals/+bug/1497528
> [3] https://groups.google.com/forum/#!topic/rabbitmq-users/BnoIQJb34Ao
> 


-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando



Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2015-11-15 Thread Andrew Beekhof

> On 13 Nov 2015, at 7:31 AM, Vladimir Kuklin  wrote:
> 
> Hi, Andrew
> 
> Thanks for a quick turnaround.
> 
> > The one I linked to in my original reply does:
> >   
> > https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> 
> I do not have logs of testing of this script. Maybe, Bogdan has something to 
> tell about results of testing this script. From the first glance it does not 
> contain gigantic amount of workarounds we injected into our script to handle 
> various situations when a node fails to join or tries  to join a cluster that 
> does not want to accept it (in this case you need to kick it from the cluster 
> with forget_cluster_node and it starts an RPC multicall in rabbitmq internals 
> to all cluster nodes, including the dead one, hanging forever). Actually, we 
> started a long time ago with an approach similar to the one in the script 
> above, but we faced a lot of issues in the case when a node tries to join a 
> cluster after a dirty failover or a long time of being out of the cluster.

That's really good info, much appreciated.
Peter, Oyvind: Sounds like it would be worth validating the agent we’re using 
in these types of situations.

Based on that we can plot a path forward.

> I do not have all the logs of which particular cases we were handling while 
> introducing that additional logic (it was an agile process, if you know what 
> I mean :-) ), but we finally came up with this almost 2K lines code script. 
> We are actively communicating with Pivotal folks on improving methods of 
> monitoring RabbitMQ cluster nodes or even switching to RabbitMQ 
> clusterer+autocluster plugins and writing new smaller and fancier OCF script, 
> but this is only in plans for further Fuel releases, I guess.

:-)

> 
> >
> > > Changing the state isn’t ideal but there is precedent, the part that has 
> > > me concerned is the error codes coming out of notify.
> > > Apart from producing some log messages, I can’t think how it would 
> > > produce any recovery.
> >
> > > Unless you’re relying on the subsequent monitor operation to notice the 
> > > error state.
> > > I guess that would work but you might be waiting a while for it to notice.
> >
> > Yes, we are relying on subsequent monitor operations. We also have several 
> > OCF check levels to catch a case when one node does not have rabbitmq 
> > application started properly (btw, there was a strange bug that we had to 
> > wait for several non-zero checks to fail to get the resource to restart 
> > http://bugs.clusterlabs.org/show_bug.cgi?id=5243) .
> 
> Regarding this bug - it was very easy to reproduce - just add additional 
> check to 'Dummy' resource with non-intersecting interval returning 
> ERR_GENERIC code and the default check returning SUCCESS code. You will find 
> that it is restarting only after 2 consecutive failures of non-zero level 
> check.

Ack. I’ve asked some people to look into it.

> 
> On Thu, Nov 12, 2015 at 10:58 PM, Andrew Beekhof  wrote:
> 
> > On 12 Nov 2015, at 10:44 PM, Vladimir Kuklin  wrote:
> >
> > Hi, Andrew
> >
> > >Ah good, I understood it correctly then :)
> > > I would be interested in your opinion of how the other agent does the 
> > > bootstrapping (ie. without notifications or master/slave).
> > >That makes sense, the part I’m struggling with is that it sounds like the 
> > >other agent shouldn’t work at all.
> > > Yet we’ve used it extensively and not experienced these kinds of hangs.
> > Regarding other scripts - I am not aware of any other scripts that actually 
> > handle cloned rabbitmq server. I may be mistaken, of course. So if you are 
> > aware if these scripts succeed in creating rabbitmq cluster which actually 
> > survives 1-node or all-node failure scenarios and reassembles the cluster 
> > automatically - please, let us know.
> 
> The one I linked to in my original reply does:
> 
>
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> 
> >
> > > Changing the state isn’t ideal but there is precedent, the part that has 
> > > me concerned is the error codes coming out of notify.
> > > Apart from producing some log messages, I can’t think how it would 
> > > produce any recovery.
> >
> > > Unless you’re relying on the subsequent monitor operation to notice the 
> > > error state.
> > > I guess that would work but you might be waiting a while for it to notice.
> >
> > Yes, we are relying on subsequent monitor operations. We also have several 
> > OCF check levels to catch a case when one node does not have rabbitmq 
> > application started properly (btw, there was a strange bug that we had to 
> > wait for several non-zero checks to fail to get the resource to restart 
> > http://bugs.clusterlabs.org/show_bug.cgi?id=5243) .
> 
> It appears I misunderstood your bug the first time around :-(
> Do you still have logs of this occurring?
> 
> > I now remember, why we did notify errors - for error logging, I guess.
> >
> >
> > On 

Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2015-11-12 Thread Vladimir Kuklin
Hi, Andrew

Thanks for a quick turnaround.

> The one I linked to in my original reply does:
>
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster

I do not have logs of testing of this script. Maybe Bogdan has something
to tell about the results of testing it. At first glance it does not
contain the gigantic amount of workarounds we injected into our script to
handle various situations when a node fails to join, or tries to join a
cluster that does not want to accept it (in this case you need to kick it
from the cluster with forget_cluster_node, which starts an RPC multicall
in the rabbitmq internals to all cluster nodes, including the dead one,
hanging forever). Actually, we started a long time ago with an approach
similar to the one in the script above, but we faced a lot of issues in
the case when a node tries to join a cluster after a dirty failover or
after a long time of being out of the cluster. I do not have all the logs
of which particular cases we were handling while introducing that
additional logic (it was an agile process, if you know what I mean :-) ),
but we finally came up with this almost 2K-line script. We are actively
communicating with the Pivotal folks on improving the methods of monitoring
RabbitMQ cluster nodes, or even switching to the RabbitMQ
clusterer+autocluster plugins and writing a new, smaller and fancier OCF
script, but this is only planned for further Fuel releases, I guess.

>
> > Changing the state isn’t ideal but there is precedent, the part that
has me concerned is the error codes coming out of notify.
> > Apart from producing some log messages, I can’t think how it would
produce any recovery.
>
> > Unless you’re relying on the subsequent monitor operation to notice the
error state.
> > I guess that would work but you might be waiting a while for it to
notice.
>
> Yes, we are relying on subsequent monitor operations. We also have
several OCF check levels to catch a case when one node does not have
rabbitmq application started properly (btw, there was a strange bug that we
had to wait for several non-zero checks to fail to get the resource to
restart http://bugs.clusterlabs.org/show_bug.cgi?id=5243) .

Regarding this bug - it was very easy to reproduce: just add an additional
check to the 'Dummy' resource with a non-intersecting interval returning the
ERR_GENERIC code, while the default check returns the SUCCESS code. You will
find that the resource is restarted only after 2 consecutive failures of the
non-zero-level check.
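
A minimal sketch of such a reproducer configuration, assuming a locally
patched copy of the Dummy agent whose depth-10 monitor is made to return
$OCF_ERR_GENERIC (the resource name and intervals are only an example):

# Create a Dummy resource with two monitor operations: the default one at
# depth 0 (returns success) and an extra one at OCF_CHECK_LEVEL=10 with a
# non-intersecting interval (assumed to be patched locally so that it
# returns $OCF_ERR_GENERIC).
pcs resource create test-dummy ocf:pacemaker:Dummy \
  op monitor interval=10 timeout=20 \
  op monitor interval=17 timeout=20 OCF_CHECK_LEVEL=10
# Then watch how many depth-10 failures accumulate before Pacemaker
# restarts the resource (-f shows fail counts).
crm_mon -1 -f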

On Thu, Nov 12, 2015 at 10:58 PM, Andrew Beekhof 
wrote:

>
> > On 12 Nov 2015, at 10:44 PM, Vladimir Kuklin 
> wrote:
> >
> > Hi, Andrew
> >
> > >Ah good, I understood it correctly then :)
> > > I would be interested in your opinion of how the other agent does the
> bootstrapping (ie. without notifications or master/slave).
> > >That makes sense, the part I’m struggling with is that it sounds like
> the other agent shouldn’t work at all.
> > > Yet we’ve used it extensively and not experienced these kinds of hangs.
> > Regarding other scripts - I am not aware of any other scripts that
> actually handle cloned rabbitmq server. I may be mistaken, of course. So
> if you are aware if these scripts succeed in creating rabbitmq cluster
> which actually survives 1-node or all-node failure scenarios and
> reassembles the cluster automatically - please, let us know.
>
> The one I linked to in my original reply does:
>
>
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
>
> >
> > > Changing the state isn’t ideal but there is precedent, the part that
> has me concerned is the error codes coming out of notify.
> > > Apart from producing some log messages, I can’t think how it would
> produce any recovery.
> >
> > > Unless you’re relying on the subsequent monitor operation to notice
> the error state.
> > > I guess that would work but you might be waiting a while for it to
> notice.
> >
> > Yes, we are relying on subsequent monitor operations. We also have
> several OCF check levels to catch a case when one node does not have
> rabbitmq application started properly (btw, there was a strange bug that we
> had to wait for several non-zero checks to fail to get the resource to
> restart http://bugs.clusterlabs.org/show_bug.cgi?id=5243) .
>
> It appears I misunderstood your bug the first time around :-(
> Do you still have logs of this occurring?
>
> > I now remember, why we did notify errors - for error logging, I guess.
> >
> >
> > On Thu, Nov 12, 2015 at 1:30 AM, Andrew Beekhof 
> wrote:
> >
> > > On 11 Nov 2015, at 11:35 PM, Vladimir Kuklin 
> wrote:
> > >
> > > Hi, Andrew
> > >
> > > Let me answer your questions.
> > >
> > > This agent is active/active which actually marks one of the node as
> 'pseudo'-master which is used as a target for other nodes to join to. We
> also check which node is a master and use it in monitor action to check
> whether this node is clustered with this 'master' node. When we do cluster
> bootstrap, we ne

Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2015-11-12 Thread Andrew Beekhof

> On 12 Nov 2015, at 10:44 PM, Vladimir Kuklin  wrote:
> 
> Hi, Andrew
> 
> >Ah good, I understood it correctly then :)
> > I would be interested in your opinion of how the other agent does the 
> > bootstrapping (ie. without notifications or master/slave).
> >That makes sense, the part I’m struggling with is that it sounds like the 
> >other agent shouldn’t work at all.
> > Yet we’ve used it extensively and not experienced these kinds of hangs.
> Regarding other scripts - I am not aware of any other scripts that actually 
> handle cloned rabbitmq server. I may be mistaken, of course. So if you are 
> aware if these scripts succeed in creating rabbitmq cluster which actually 
> survives 1-node or all-node failure scenarios and reassembles the cluster 
> automatically - please, let us know.

The one I linked to in my original reply does:

   
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster

> 
> > Changing the state isn’t ideal but there is precedent, the part that has me 
> > concerned is the error codes coming out of notify.
> > Apart from producing some log messages, I can’t think how it would produce 
> > any recovery.
> 
> > Unless you’re relying on the subsequent monitor operation to notice the 
> > error state.
> > I guess that would work but you might be waiting a while for it to notice.
> 
> Yes, we are relying on subsequent monitor operations. We also have several 
> OCF check levels to catch a case when one node does not have rabbitmq 
> application started properly (btw, there was a strange bug that we had to 
> wait for several non-zero checks to fail to get the resource to restart 
> http://bugs.clusterlabs.org/show_bug.cgi?id=5243) .

It appears I misunderstood your bug the first time around :-(
Do you still have logs of this occurring?

> I now remember, why we did notify errors - for error logging, I guess.
>  
> 
> On Thu, Nov 12, 2015 at 1:30 AM, Andrew Beekhof  wrote:
> 
> > On 11 Nov 2015, at 11:35 PM, Vladimir Kuklin  wrote:
> >
> > Hi, Andrew
> >
> > Let me answer your questions.
> >
> > This agent is active/active which actually marks one of the node as 
> > 'pseudo'-master which is used as a target for other nodes to join to. We 
> > also check which node is a master and use it in monitor action to check 
> > whether this node is clustered with this 'master' node. When we do cluster 
> > bootstrap, we need to decide which node to mark as a master node. Then, 
> > when it starts (actually, promotes), we can finally pick its name through 
> > notification mechanism and ask other nodes to join this cluster.
> 
> Ah good, I understood it correctly then :)
> I would be interested in your opinion of how the other agent does the 
> bootstrapping (ie. without notifications or master/slave).
> 
> >
> > Regarding disconnect_node+forget_cluster_node this is quite simple - we 
> > need to eject node from the cluster. Otherwise it is mentioned in the list 
> > of cluster nodes and a lot of cluster actions, e.g. list_queues, will hang 
> > forever as well as forget_cluster_node action.
> 
> That makes sense, the part I’m struggling with is that it sounds like the 
> other agent shouldn’t work at all.
> Yet we’ve used it extensively and not experienced these kinds of hangs.
> 
> >
> > We also handle this case whenever a node leaves the cluster. If you 
> > remember, I wrote an email to Pacemaker ML regarding getting notifications 
> > on node unjoin event '[openstack-dev] [Fuel][Pacemaker][HA] Notifying 
> > clones of offline nodes’.
> 
> Oh, I recall that now.
> 
> > So we went another way and added a dbus daemon listener that does the same 
> > when a node leaves the corosync cluster (we know that this is a little bit racy, 
> > but disconnect+forget actions pair is idempotent).
> >
> > Regarding notification commands - we changed behaviour to the one that 
> > fitted our use cases better and passed our destructive tests. It could be 
> > Pacemaker-version dependent, so I agree we should consider changing this 
> > behaviour. But so far it worked for us.
> 
> Changing the state isn’t ideal but there is precedent, the part that has me 
> concerned is the error codes coming out of notify.
> Apart from producing some log messages, I can’t think how it would produce 
> any recovery.
> 
> Unless you’re relying on the subsequent monitor operation to notice the error 
> state.
> I guess that would work but you might be waiting a while for it to notice.
> 
> >
> > On Wed, Nov 11, 2015 at 2:12 PM, Andrew Beekhof  wrote:
> >
> > > On 11 Nov 2015, at 6:26 PM, bdobre...@mirantis.com wrote:
> > >
> > > Thank you Andrew.
> > > Answers below.
> > > >>>
> > > Sounds interesting, can you give any comment about how it differs to the 
> > > other[i] upstream agent?
> > > Am I right that this one is effectively A/P and wont function without 
> > > some kind of shared storage?
> > > Any particular reason you went down this path instead of full A/A?
> > >
> > > [i]
> > > https://github

Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2015-11-12 Thread Vladimir Kuklin
Hi, Andrew

>Ah good, I understood it correctly then :)
> I would be interested in your opinion of how the other agent does the
bootstrapping (ie. without notifications or master/slave).
>That makes sense, the part I’m struggling with is that it sounds like the
other agent shouldn’t work at all.
> Yet we’ve used it extensively and not experienced these kinds of hangs.
Regarding other scripts - I am not aware of any other scripts that actually
handle a cloned rabbitmq server. I may be mistaken, of course. So if you are
aware of scripts that succeed in creating a rabbitmq cluster which actually
survives 1-node or all-node failure scenarios and reassembles the cluster
automatically - please, let us know.

> Changing the state isn’t ideal but there is precedent, the part that has
me concerned is the error codes coming out of notify.
> Apart from producing some log messages, I can’t think how it would
produce any recovery.

> Unless you’re relying on the subsequent monitor operation to notice the
error state.
> I guess that would work but you might be waiting a while for it to notice.

Yes, we are relying on subsequent monitor operations. We also have several
OCF check levels to catch the case where one node does not have the rabbitmq
application started properly (btw, there was a strange bug where we had to
wait for several non-zero-level checks to fail to get the resource to
restart: http://bugs.clusterlabs.org/show_bug.cgi?id=5243). I now remember
why we return errors from notify - for error logging, I guess.


On Thu, Nov 12, 2015 at 1:30 AM, Andrew Beekhof  wrote:

>
> > On 11 Nov 2015, at 11:35 PM, Vladimir Kuklin 
> wrote:
> >
> > Hi, Andrew
> >
> > Let me answer your questions.
> >
> > This agent is active/active which actually marks one of the node as
> 'pseudo'-master which is used as a target for other nodes to join to. We
> also check which node is a master and use it in monitor action to check
> whether this node is clustered with this 'master' node. When we do cluster
> bootstrap, we need to decide which node to mark as a master node. Then,
> when it starts (actually, promotes), we can finally pick its name through
> notification mechanism and ask other nodes to join this cluster.
>
> Ah good, I understood it correctly then :)
> I would be interested in your opinion of how the other agent does the
> bootstrapping (ie. without notifications or master/slave).
>
> >
> > Regarding disconnect_node+forget_cluster_node this is quite simple - we
> need to eject node from the cluster. Otherwise it is mentioned in the list
> of cluster nodes and a lot of cluster actions, e.g. list_queues, will hang
> forever as well as forget_cluster_node action.
>
> That makes sense, the part I’m struggling with is that it sounds like the
> other agent shouldn’t work at all.
> Yet we’ve used it extensively and not experienced these kinds of hangs.
>
> >
> > We also handle this case whenever a node leaves the cluster. If you
> remember, I wrote an email to Pacemaker ML regarding getting notifications
> on node unjoin event '[openstack-dev] [Fuel][Pacemaker][HA] Notifying
> clones of offline nodes’.
>
> Oh, I recall that now.
>
> > So we went another way and added a dbus daemon listener that does the
> same when a node leaves the corosync cluster (we know that this is a little bit
> racy, but disconnect+forget actions pair is idempotent).
> >
> > Regarding notification commands - we changed behaviour to the one that
> fitted our use cases better and passed our destructive tests. It could be
> Pacemaker-version dependent, so I agree we should consider changing this
> behaviour. But so far it worked for us.
>
> Changing the state isn’t ideal but there is precedent, the part that has
> me concerned is the error codes coming out of notify.
> Apart from producing some log messages, I can’t think how it would produce
> any recovery.
>
> Unless you’re relying on the subsequent monitor operation to notice the
> error state.
> I guess that would work but you might be waiting a while for it to notice.
>
> >
> > On Wed, Nov 11, 2015 at 2:12 PM, Andrew Beekhof 
> wrote:
> >
> > > On 11 Nov 2015, at 6:26 PM, bdobre...@mirantis.com wrote:
> > >
> > > Thank you Andrew.
> > > Answers below.
> > > >>>
> > > Sounds interesting, can you give any comment about how it differs to
> the other[i] upstream agent?
> > > Am I right that this one is effectively A/P and wont function without
> some kind of shared storage?
> > > Any particular reason you went down this path instead of full A/A?
> > >
> > > [i]
> > >
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> > > <<<
> > > It is based on multistate clone notifications. It requires nothing
> shared but Corosync info base CIB where all Pacemaker resources stored
> anyway.
> > > And it is fully A/A.
> >
> > Oh!  So I should skip the A/P parts before "Auto-configuration of a
> cluster with a Pacemaker”?
> > Is the idea that the master mode is for picking a node to bootstrap the
> cluster?
> >

Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2015-11-11 Thread Andrew Beekhof

> On 11 Nov 2015, at 11:35 PM, Vladimir Kuklin  wrote:
> 
> Hi, Andrew
> 
> Let me answer your questions.
> 
> This agent is active/active which actually marks one of the node as 
> 'pseudo'-master which is used as a target for other nodes to join to. We also 
> check which node is a master and use it in monitor action to check whether 
> this node is clustered with this 'master' node. When we do cluster bootstrap, 
> we need to decide which node to mark as a master node. Then, when it starts 
> (actually, promotes), we can finally pick its name through notification 
> mechanism and ask other nodes to join this cluster. 

Ah good, I understood it correctly then :)
I would be interested in your opinion of how the other agent does the 
bootstrapping (ie. without notifications or master/slave).

> 
> Regarding disconnect_node+forget_cluster_node this is quite simple - we need 
> to eject node from the cluster. Otherwise it is mentioned in the list of 
> cluster nodes and a lot of cluster actions, e.g. list_queues, will hang 
> forever as well as forget_cluster_node action. 

That makes sense, the part I’m struggling with is that it sounds like the other 
agent shouldn’t work at all.
Yet we’ve used it extensively and not experienced these kinds of hangs.

> 
> We also handle this case whenever a node leaves the cluster. If you remember, 
> I wrote an email to Pacemaker ML regarding getting notifications on node 
> unjoin event '[openstack-dev] [Fuel][Pacemaker][HA] Notifying clones of 
> offline nodes’.

Oh, I recall that now.

> So we went another way and added a dbus daemon listener that does the same 
> when a node leaves the corosync cluster (we know that this is a little bit racy, but 
> disconnect+forget actions pair is idempotent).
> 
> Regarding notification commands - we changed behaviour to the one that fitted 
> our use cases better and passed our destructive tests. It could be 
> Pacemaker-version dependent, so I agree we should consider changing this 
> behaviour. But so far it worked for us.

Changing the state isn’t ideal but there is precedent, the part that has me 
concerned is the error codes coming out of notify.
Apart from producing some log messages, I can’t think how it would produce any 
recovery.

Unless you’re relying on the subsequent monitor operation to notice the error 
state.
I guess that would work but you might be waiting a while for it to notice.

> 
> On Wed, Nov 11, 2015 at 2:12 PM, Andrew Beekhof  wrote:
> 
> > On 11 Nov 2015, at 6:26 PM, bdobre...@mirantis.com wrote:
> >
> > Thank you Andrew.
> > Answers below.
> > >>>
> > Sounds interesting, can you give any comment about how it differs to the 
> > other[i] upstream agent?
> > Am I right that this one is effectively A/P and won't function without some 
> > kind of shared storage?
> > Any particular reason you went down this path instead of full A/A?
> >
> > [i]
> > https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> > <<<
> > It is based on multistate clone notifications. It requires nothing shared 
> > but Corosync info base CIB where all Pacemaker resources stored anyway.
> > And it is fully A/A.
> 
> Oh!  So I should skip the A/P parts before "Auto-configuration of a cluster 
> with a Pacemaker”?
> Is the idea that the master mode is for picking a node to bootstrap the 
> cluster?
> 
> If so I don’t believe that should be necessary provided you specify 
> ordered=true for the clone.
> This allows you to assume in the agent that your instance is the only one 
> currently changing state (by starting or stopping).
> I notice that rabbitmq.com explicitly sets this to false… any particular 
> reason?
> 
> 
> Regarding the pcs command to create the resource, you can simplify it to:
> 
> pcs resource create --force --master p_rabbitmq-server 
> ocf:rabbitmq:rabbitmq-server-ha \
>   erlang_cookie=DPMDALGUKEOMPTHWPYKC node_port=5672 \
>   op monitor interval=30 timeout=60 \
>   op monitor interval=27 role=Master timeout=60 \
>   op monitor interval=103 role=Slave timeout=60 OCF_CHECK_LEVEL=30 \
>   meta notify=true ordered=false interleave=true master-max=1 
> master-node-max=1
> 
> If you update the stop/start/notify/promote/demote timeouts in the agent’s 
> metadata.
> 
> 
> Lines 1602,1565,1621,1632,1657, and 1678 have the notify command returning an 
> error.
> Was this logic tested? Because pacemaker does not currently support/allow 
> notify actions to fail.
> IIRC pacemaker simply ignores them.
> 
> Modifying the resource state in notifications is also highly unusual.
> What was the reason for that?
> 
> I notice that on node down, this agent makes disconnect_node and 
> forget_cluster_node calls.
> The other upstream agent does not, do you have any information about the bad 
> things that might happen as a result?
> 
> Basically I’m looking for what each option does differently/better with a 
> view to converging on a single implementation.
> I don’t much care in which location it lives.
> 
>

Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2015-11-11 Thread Vladimir Kuklin
Hi, Andrew

Let me answer your questions.

This agent is active/active, but it actually marks one of the nodes as a
'pseudo'-master, which is used as a target for the other nodes to join. We
also check which node is the master and use it in the monitor action to
check whether this node is clustered with that 'master' node. When we do
cluster bootstrap, we need to decide which node to mark as the master node.
Then, when it starts (actually, promotes), we can finally pick its name
through the notification mechanism and ask the other nodes to join this
cluster.
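
To illustrate the "pick its name through the notification mechanism" part,
here is a heavily simplified sketch (not our actual RA code) of how a clone
notify handler can learn the promoted node's name from the standard
Pacemaker notification environment variables and join it:

# Sketch only: a real agent also handles start/stop/demote notifications,
# retries and error reporting. The variable names are the standard Pacemaker
# notification environment; the rabbitmqctl sequence is the usual manual
# join procedure.
case "${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}" in
    post-promote)
        master_node="${OCF_RESKEY_CRM_meta_notify_promote_uname}"
        # Join the local rabbit to the cluster bootstrapped on the master.
        rabbitmqctl stop_app
        rabbitmqctl join_cluster "rabbit@${master_node}"
        rabbitmqctl start_app
        ;;
esac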

Regarding disconnect_node+forget_cluster_node, this is quite simple - we
need to eject the node from the cluster. Otherwise it remains in the list of
cluster nodes, and a lot of cluster actions, e.g. list_queues, will hang
forever, as will the forget_cluster_node action itself.
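
For readers unfamiliar with that ejection step, it roughly amounts to the
following on the command line (node names are placeholders; the RA drives
this programmatically and with much more error handling):

# Drop the Erlang distribution link to the dead node, then remove it from
# the Mnesia cluster so that cluster-wide commands such as list_queues no
# longer hang waiting for it.
rabbitmqctl eval 'disconnect_node(list_to_atom("rabbit@failed-node")).'
rabbitmqctl forget_cluster_node rabbit@failed-node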

We also handle this case whenever a node leaves the cluster. If you
remember, I wrote an email to the Pacemaker ML regarding getting
notifications on the node unjoin event, '[openstack-dev]
[Fuel][Pacemaker][HA] Notifying clones of offline nodes'. So we went another
way and added a dbus daemon listener that does the same when a node leaves
the corosync cluster (we know that this is a little bit racy, but the
disconnect+forget action pair is idempotent).

Regarding notification commands - we changed the behaviour to the one that
fitted our use cases better and passed our destructive tests. It could be
Pacemaker-version dependent, so I agree we should consider changing this
behaviour. But so far it has worked for us.

On Wed, Nov 11, 2015 at 2:12 PM, Andrew Beekhof  wrote:

>
> > On 11 Nov 2015, at 6:26 PM, bdobre...@mirantis.com wrote:
> >
> > Thank you Andrew.
> > Answers below.
> > >>>
> > Sounds interesting, can you give any comment about how it differs to the
> other[i] upstream agent?
> > Am I right that this one is effectively A/P and won't function without
> some kind of shared storage?
> > Any particular reason you went down this path instead of full A/A?
> >
> > [i]
> >
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> > <<<
> > It is based on multistate clone notifications. It requires nothing
> shared but Corosync info base CIB where all Pacemaker resources stored
> anyway.
> > And it is fully A/A.
>
> Oh!  So I should skip the A/P parts before "Auto-configuration of a
> cluster with a Pacemaker”?
> Is the idea that the master mode is for picking a node to bootstrap the
> cluster?
>
> If so I don’t believe that should be necessary provided you specify
> ordered=true for the clone.
> This allows you to assume in the agent that your instance is the only one
> currently changing state (by starting or stopping).
> I notice that rabbitmq.com explicitly sets this to false… any particular
> reason?
>
>
> Regarding the pcs command to create the resource, you can simplify it to:
>
> pcs resource create --force --master p_rabbitmq-server
> ocf:rabbitmq:rabbitmq-server-ha \
>   erlang_cookie=DPMDALGUKEOMPTHWPYKC node_port=5672 \
>   op monitor interval=30 timeout=60 \
>   op monitor interval=27 role=Master timeout=60 \
>   op monitor interval=103 role=Slave timeout=60 OCF_CHECK_LEVEL=30 \
>   meta notify=true ordered=false interleave=true master-max=1
> master-node-max=1
>
> If you update the stop/start/notify/promote/demote timeouts in the agent’s
> metadata.
>
>
> Lines 1602,1565,1621,1632,1657, and 1678 have the notify command returning
> an error.
> Was this logic tested? Because pacemaker does not currently support/allow
> notify actions to fail.
> IIRC pacemaker simply ignores them.
>
> Modifying the resource state in notifications is also highly unusual.
> What was the reason for that?
>
> I notice that on node down, this agent makes disconnect_node and
> forget_cluster_node calls.
> The other upstream agent does not, do you have any information about the
> bad things that might happen as a result?
>
> Basically I’m looking for what each option does differently/better with a
> view to converging on a single implementation.
> I don’t much care in which location it lives.
>
> I’m CC’ing the other upstream maintainer, it would be good if you guys
> could have a chat :-)
>
> > All running rabbit nodes may process AMQP connections. Master state is
> only for a cluster initial point at which other slaves may join to it.
> > Note, here you can find events flow charts as well [0]
> > [0] https://www.rabbitmq.com/pacemaker.html
> > Regards,
> > Bogdan
> >

Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2015-11-11 Thread Andrew Beekhof

> On 11 Nov 2015, at 6:26 PM, bdobre...@mirantis.com wrote:
> 
> Thank you Andrew.
> Answers below.
> >>>
> Sounds interesting, can you give any comment about how it differs to the 
> other[i] upstream agent?
> Am I right that this one is effectively A/P and won't function without some 
> kind of shared storage?
> Any particular reason you went down this path instead of full A/A?
> 
> [i] 
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
> <<<
> It is based on multistate clone notifications. It requires nothing shared but 
> Corosync info base CIB where all Pacemaker resources stored anyway.
> And it is fully A/A.

Oh!  So I should skip the A/P parts before "Auto-configuration of a cluster 
with a Pacemaker”? 
Is the idea that the master mode is for picking a node to bootstrap the cluster?

If so I don’t believe that should be necessary provided you specify 
ordered=true for the clone.
This allows you to assume in the agent that your instance is the only one 
currently changing state (by starting or stopping).
I notice that rabbitmq.com explicitly sets this to false… any particular reason?
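
For reference, flipping that meta attribute is a one-liner with pcs; the
master resource name below assumes the naming that would result from the pcs
example further down in this mail, so adjust it to the actual deployment:

# Let Pacemaker serialize state changes of the clone instances, so the agent
# can assume it is the only instance currently starting or stopping.
pcs resource meta p_rabbitmq-server-master ordered=true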


Regarding the pcs command to create the resource, you can simplify it to:

pcs resource create --force --master p_rabbitmq-server 
ocf:rabbitmq:rabbitmq-server-ha \
  erlang_cookie=DPMDALGUKEOMPTHWPYKC node_port=5672 \
  op monitor interval=30 timeout=60 \
  op monitor interval=27 role=Master timeout=60 \
  op monitor interval=103 role=Slave timeout=60 OCF_CHECK_LEVEL=30 \
  meta notify=true ordered=false interleave=true master-max=1 master-node-max=1

If you update the stop/start/notify/promote/demote timeouts in the agent’s 
metadata.


Lines 1602, 1565, 1621, 1632, 1657, and 1678 have the notify command
returning an error.
Was this logic tested? Because pacemaker does not currently support/allow 
notify actions to fail.
IIRC pacemaker simply ignores them.

Modifying the resource state in notifications is also highly unusual.
What was the reason for that?

I notice that on node down, this agent makes disconnect_node and 
forget_cluster_node calls.
The other upstream agent does not, do you have any information about the bad 
things that might happen as a result?

Basically I’m looking for what each option does differently/better with a view 
to converging on a single implementation. 
I don’t much care in which location it lives.

I’m CC’ing the other upstream maintainer, it would be good if you guys could 
have a chat :-)

> All running rabbit nodes may process AMQP connections. Master state is only 
> for a cluster initial point at which other slaves may join to it.
> Note, here you can find events flow charts as well [0]
> [0] https://www.rabbitmq.com/pacemaker.html
> Regards,
> Bogdan




[openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2015-11-10 Thread bdobrelia
Thank you Andrew.
Answers below.
>>>
Sounds interesting, can you give any comment about how it differs to the 
other[i] upstream agent?
Am I right that this one is effectively A/P and won't function without some kind 
of shared storage?
Any particular reason you went down this path instead of full A/A?

[i] 
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster
<<<
It is based on multistate clone notifications. It requires nothing shared but
the Corosync information base (CIB), where all Pacemaker resources are stored
anyway. And it is fully A/A: all running rabbit nodes may process AMQP
connections. The Master state is only an initial point for the cluster, at
which the other slaves may join it.
Note: you can find the event flow charts here as well [0].
[0] https://www.rabbitmq.com/pacemaker.html


Regards,

Bogdan


Re: [openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2015-11-09 Thread Andrew Beekhof

> On 23 Oct 2015, at 7:01 PM, Bogdan Dobrelya  wrote:
> 
> Hello.
> I'm glad to announce that the pacemaker OCF resource agent for the
> rabbitmq clustering, which was born in the Fuel project initially, now
> available and maintained upstream! It will be shipped with the
> rabbitmq-server 3.5.7 package (release by November, 2015)
> 
> You can read about this OCF agent in the official guide [0] (flow charts
> for promote/demote/start/stop actions in progress).

Sounds interesting, can you give any comment about how it differs from the 
other[i] upstream agent?
Am I right that this one is effectively A/P and won't function without some kind 
of shared storage?
Any particular reason you went down this path instead of full A/A?

[i] 
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/rabbitmq-cluster

> 
> And you can try it as a tiny cluster example with a Vagrant box for
> Atlas [1]. Note, this only installs an Ubuntu box with a
> Corosync/Pacemaker & RabbitMQ clusters running, no Fuel or OpenStack
> required :-)
> 
> I'm also planning to refer this official RabbitMQ cluster setup guide in
> the OpenStack HA guide as well [2].
> 
> PS. Original rabbitmq-users mail thread is here [3].
> [openstack-operators] cross posted as well.
> 
> [0] http://www.rabbitmq.com/pacemaker.html
> [1] https://atlas.hashicorp.com/bogdando/boxes/rabbitmq-cluster-ocf
> [2] https://bugs.launchpad.net/openstack-manuals/+bug/1497528
> [3] https://groups.google.com/forum/#!topic/rabbitmq-users/BnoIQJb34Ao
> 
> -- 
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
> 




[openstack-dev] [HA][RabbitMQ][messaging][Pacemaker][operators] Improved OCF resource agent for dynamic active-active mirrored clustering

2015-10-23 Thread Bogdan Dobrelya
Hello.
I'm glad to announce that the Pacemaker OCF resource agent for RabbitMQ
clustering, which was born in the Fuel project initially, is now available
and maintained upstream! It will be shipped with the rabbitmq-server 3.5.7
package (to be released by November 2015).

You can read about this OCF agent in the official guide [0] (flow charts
for the promote/demote/start/stop actions are in progress).

And you can try it as a tiny cluster example with a Vagrant box for
Atlas [1]. Note: this only installs an Ubuntu box with Corosync/Pacemaker
and RabbitMQ clusters running; no Fuel or OpenStack required :-)

I'm also planning to reference this official RabbitMQ cluster setup guide
in the OpenStack HA guide as well [2].

PS. Original rabbitmq-users mail thread is here [3].
[openstack-operators] cross posted as well.

[0] http://www.rabbitmq.com/pacemaker.html
[1] https://atlas.hashicorp.com/bogdando/boxes/rabbitmq-cluster-ocf
[2] https://bugs.launchpad.net/openstack-manuals/+bug/1497528
[3] https://groups.google.com/forum/#!topic/rabbitmq-users/BnoIQJb34Ao

-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando
