Hi Everyone,

I was reading the blueprints mentioned here and thought I'd take the 
opportunity to introduce myself and ask a few questions.
For those that don't recognise my name, Pacemaker is my baby - so I take a keen 
interest in helping people have a good experience with it :)

A couple of items stood out to me (apologies if I repeat anything that is 
already well understood):

* Operations with CIB utilizes almost 100% of CPU on the Controller

 We introduced a new CIB algorithm in 1.1.12 which is significantly faster and 
less resource hungry than prior versions.
 I would be interested to hear your experiences with it if you are able to 
upgrade to that version.

* Corosync shutdown process takes a lot of time

 Corosync (and Pacemaker) can shut down incredibly quickly. 
 If corosync is taking a long time, it will be because it is waiting for 
pacemaker, and pacemaker is almost always waiting for one of the clustered 
services to shut down.

* Current Fuel Architecture is limited to Corosync 1.x and Pacemaker 1.x

 Corosync 2 is really the way to go.
 Is there something in particular that is holding you back?
 Also, out of interest, are you using cman or the pacemaker plugin?

*  Diff operations against the Corosync CIB require saving data to a file
  rather than keeping all data in memory

 Can someone clarify this one for me?

 Also, I notice that the corosync init script has been modified to set/unset 
maintenance-mode with cibadmin.
 Any reason not to use crm_attribute instead?  You might find it's a less 
fragile solution than a hard-coded diff.
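
 For example, something like this (a sketch; maintenance-mode is the standard 
cluster option, but verify the flags against your Pacemaker version):

```shell
# Toggle maintenance-mode via crm_attribute rather than a hard-coded diff.
# crm_attribute reads the current CIB itself, so no saved diff can go stale.
crm_attribute --type crm_config --name maintenance-mode --update true

# ... perform the maintenance work ...

# Deleting the attribute restores normal operation.
crm_attribute --type crm_config --name maintenance-mode --delete
```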

* The debug process for OCF scripts is not unified and requires a lot of
 actions from the Cloud Operator

 Two things to mention here... the first is crm_resource 
--force-(start|stop|check) which queries the cluster for the resource's 
definition but runs the command directly.
 Combined with -V, this means that you get to see everything the agent is doing.
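
 A sketch of what that looks like (the resource name p_neutron-l3-agent is 
just a placeholder):

```shell
# Run the agent's actions directly on this node, using the definition
# (parameters, meta attributes) stored in the cluster.
crm_resource --resource p_neutron-l3-agent --force-check -V
crm_resource --resource p_neutron-l3-agent --force-stop -VV

# Each additional -V raises verbosity, so you can watch every step the
# OCF script takes without digging through the cluster logs.
crm_resource --resource p_neutron-l3-agent --force-start -VV
```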

 Also, pacemaker now supports the ability for agents to emit specially 
formatted error messages that are stored in the cib and can be shown back to 
users.
 This can make things much less painful for admins. Look for 
PCMK_OCF_REASON_PREFIX in the upstream resource-agents project.
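
 Roughly, an agent does something like this (a hypothetical fragment; upstream 
resource-agents provides an ocf_exit_reason helper in ocf-shellfuncs, and the 
exact prefix string is defined there):

```shell
# Hypothetical start action of an OCF agent. The "ocf-exit-reason:" prefix
# is what pacemaker scans for (see PCMK_OCF_REASON_PREFIX); the message is
# stored in the CIB and shown back to the admin.
my_start() {
    if ! /usr/sbin/mydaemon --config /etc/mydaemon.conf; then
        echo "ocf-exit-reason:mydaemon failed to start, check /etc/mydaemon.conf" >&2
        return "$OCF_ERR_GENERIC"
    fi
    return "$OCF_SUCCESS"
}
```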


* Openstack services are not managed by Pacemaker

 Oh?

* Compute nodes aren't in the Pacemaker cluster and hence lack a viable
 control plane for their compute/nova services.

 pacemaker-remoted might be of some interest here.  
 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Remote/index.html
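
 The rough shape of it (a hedged sketch; package and service names vary by 
distro, and the node name "compute-1" is just an example):

```shell
# On each compute node: run the lightweight remote daemon instead of the
# full stack. It authenticates with the cluster's /etc/pacemaker/authkey.
yum install -y pacemaker-remote resource-agents
systemctl enable --now pacemaker_remote

# On one of the full cluster nodes: represent the compute node as a
# remote-node resource ("compute-1" must be resolvable).
pcs resource create compute-1 ocf:pacemaker:remote
```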


* Creating and committing shadows not only adds constant pain with dependencies 
and unneeded complexity, but also rewrites cluster attributes and even other 
changes if you mess up the ordering, and it’s really hard to debug.

 Is this still an issue?  I'm reasonably sure this is specific to the way crmsh 
uses shadows.  
 Using the native tools it should be possible to commit only the delta, so any 
other changes that occur while you're updating the shadow would not be an 
issue, and existing attributes wouldn't be rewritten.
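
 Something along these lines (a sketch; verify --batch and --patch against 
your tool versions, and p_dummy is only an example change):

```shell
# Work on a shadow copy of the CIB, then apply only the resulting delta.
crm_shadow --batch --create test        # snapshot the live CIB
export CIB_shadow=test                  # subsequent tools edit the shadow
pcs resource create p_dummy ocf:heartbeat:Dummy   # example staged change
crm_shadow --diff > /tmp/shadow.patch   # capture just what changed
unset CIB_shadow
cibadmin --patch --xml-file /tmp/shadow.patch     # apply the delta live
```

 Because only the patch is applied, concurrent changes to the live CIB 
survive, and untouched attributes are never rewritten.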

* Restarting resources by Puppet’s pacemaker service provider restarts them 
even if they are running on other nodes and it sometimes impacts the cluster.

 Not available yet, but upstream there is now a smart --restart option for 
crm_resource which can optionally take a --host parameter.
 Sounds like it would be useful here.  
 
http://blog.clusterlabs.org/blog/2014/feature-spotlight-smart-resource-restart-from-the-command-line/
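
 Usage would look something like this (hypothetical resource and host names):

```shell
# Restart a resource, optionally on a single node only; crm_resource works
# out which dependent resources must be stopped and started along with it.
crm_resource --resource p_rabbitmq-server --restart --host node-2.domain.tld
```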

* An attempt to stop or restart corosync service brings down a lot of resources 
and probably will fail and bring down the entire deployment.

 That sounds deeply worrying.  Details?

* Controllers other than the first download the configured CIB and immediately 
start all cloned resources before they are configured, so they have to be 
cleaned up later.

 By this you mean clones are being started on nodes which do not have the 
software? Or before the ordering/colocation constraints have been configured?


> On 15 Nov 2014, at 10:31 am, Sergii Golovatiuk <sgolovat...@mirantis.com> 
> wrote:
> 
> +1 for ha-pacemaker-improvements
> 
> --
> Best regards,
> Sergii Golovatiuk,
> Skype #golserge
> IRC #holser
> 
> On Fri, Nov 14, 2014 at 11:51 PM, Dmitry Borodaenko 
> <dborodae...@mirantis.com> wrote:
> Good plan, but I really hate the name of this blueprint. I think we
> should stop lumping different unrelated HA improvements into a single
> blueprint with a generic name like that, especially when we already
> had a blueprint with essentially the same name
> (ha-pacemaker-improvements). There's nothing wrong with having 4
> trivial but specific blueprints instead of one catch-all.
> 
> On Wed, Nov 12, 2014 at 4:10 AM, Aleksandr Didenko
> <adide...@mirantis.com> wrote:
> > HI,
> >
> > in order to make sure some critical Haproxy backends are running (like mysql
> > or keystone) before proceeding with deployment, we use execs like [1] or
> > [2].
> >
> > We're currently working on a minor improvements of those execs, but there is
> > another approach - we can replace those execs with puppet resource providers
> > and move all the iterations/loops/timeouts logic there. Also we should fail
> > catalog compilation/run if those resource providers are not able to ensure
> > needed Haproxy backends are up and running. Because there is no point to
> > proceed with deployment if keystone is not running, for example.
> >
> > If no one objects, I can start implementing this for Fuel-6.1. We can
> > address it as a part of pacemaker improvements BP [3] or create a new BP.
> >
> > [1]
> > https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/osnailyfacter/manifests/cluster_ha.pp#L551-L572
> > [2]
> > https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/openstack/manifests/ha/mysqld.pp#L28-L33
> > [3] https://blueprints.launchpad.net/fuel/+spec/pacemaker-improvements
> >
> > Regards,
> > Aleksandr Didenko
> >
> >
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> 
> 
> 
> --
> Dmitry Borodaenko
> 

