On 12/09/2013, at 4:44 PM, Lars Marowsky-Bree <l...@suse.com> wrote:
> On 2013-09-12T14:34:02, Andrew Beekhof <and...@beekhof.net> wrote:
>
>>> Well, they're all doing something completely different.
>> No, they're all crude approximations designed to stop the cluster as a whole
>> from using up so much cpu/network/etc that recovery introduces more failures
>> than it resolves.
>
> OK. Though they do affect the limit on very different levels - which
> sort of makes some sense, because there are limitations on different
> levels, and at best we want to use them all.
>
>>> The max_children prevent a given node from being overloaded by
>>> concurrent operations.
>> At the expense of introducing other failures... such as "I fired off
>> an action N seconds ago with a timeout < N and still haven't heard
>> back" which was possible if batch-limit and max children were too out
>> of balance.
>
> Yes. That was very rare, but could happen.
>
>> Which is why any limiting needs to happen centrally on the DC.
>
> On the other hand, the DC cannot possibly limit concurrent monitor
> operations (since it isn't involved). Arguably, for nodes hosting 100+
> resources, there is some value in limiting parallelism on those. But I'd
> be happy if they were smartly staggered.

Yep, the more we can do without pestering the admin the better.

>
>> As above, the rate limiting needs to happen on the DC which lends
>> itself to being a property of the cib and/or transition graph rather
>> than defined in sysconfig.
>
> I'd be quite happy with that.
>
> The most directly equivalent solution would be to number the per-node
> in-flight operations similar to what migration-threshold does. (I think
> we can safely continue to treat all resources as equal to start with.)

Agreed. Perhaps even repurpose/rename migration-threshold for the task?
Or is this typically set much lower than max children?
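The following sketch makes the "limit centrally on the DC" idea concrete by capping in-flight actions per node. It is written in Python purely for illustration. The class and its method names are hypothetical, not anything in Pacemaker. It shows one property: the DC only hands an action to a node when that node is under its cap. This avoids the batch-limit/max-children imbalance described above, where an action sits in a node-local queue while its timeout is already running.

```python
from collections import defaultdict, deque


class NodeActionThrottle:
    """Hypothetical DC-side limiter: caps in-flight actions per node.

    Not Pacemaker code; only a sketch of "limit centrally on the DC".
    """

    def __init__(self, per_node_limit: int):
        if per_node_limit < 1:
            raise ValueError("per_node_limit must be >= 1")
        self.limit = per_node_limit
        self.in_flight = defaultdict(int)   # node -> actions sent, result not yet back
        self.pending = defaultdict(deque)   # node -> actions held back on the DC

    def submit(self, node: str, action: str) -> list:
        """Queue an action; return whatever can be sent to the node now."""
        self.pending[node].append(action)
        return self._drain(node)

    def complete(self, node: str) -> list:
        """Record a result from the node; return newly releasable actions."""
        if self.in_flight[node] == 0:
            raise ValueError(f"no action in flight on {node}")
        self.in_flight[node] -= 1
        return self._drain(node)

    def _drain(self, node: str) -> list:
        # Dispatch only while the node is under its cap. If an action's
        # timeout starts at dispatch, nothing waits on the node with a
        # running clock (the batch-limit vs. max-children problem).
        sent = []
        while self.pending[node] and self.in_flight[node] < self.limit:
            sent.append(self.pending[node].popleft())
            self.in_flight[node] += 1
        return sent
```

With a limit of 2, a third action for the same node is held on the DC until one of the first two reports back. Other nodes are unaffected.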
>
> Though the transition from an environment variable to a CIB node
> attribute (inherited from a cluster-property, I assume) is going to suck
> for the upgrade path :-/
>
>
> Regards,
>     Lars
>
> -- 
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
> HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
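One way to soften the upgrade path would be to resolve the per-node limit with a fixed precedence: node attribute first, then cluster property, then the old sysconfig environment variable, then a built-in default. This is a minimal sketch under explicit assumptions. The attribute/property name max-inflight-actions, the precedence order and the default of 4 are all hypothetical. LRMD_MAX_CHILDREN is assumed to be the legacy variable's name. None of this describes how Pacemaker actually resolves its options.

```python
import os
from typing import Mapping, Optional

DEFAULT_LIMIT = 4  # hypothetical built-in default


def resolve_action_limit(node_attrs: Mapping[str, str],
                         cluster_props: Mapping[str, str],
                         env: Optional[Mapping[str, str]] = None) -> int:
    """Pick the per-node in-flight action limit.

    Assumed precedence (illustrative only):
      1. node attribute     "max-inflight-actions" (hypothetical name)
      2. cluster property   "max-inflight-actions" (hypothetical name)
      3. legacy sysconfig   LRMD_MAX_CHILDREN (assumed name), so existing
         tuning survives an upgrade
      4. built-in default
    """
    if env is None:
        env = os.environ
    for source, key in ((node_attrs, "max-inflight-actions"),
                        (cluster_props, "max-inflight-actions"),
                        (env, "LRMD_MAX_CHILDREN")):
        raw = source.get(key)
        if raw is None:
            continue
        try:
            value = int(raw)
        except ValueError:
            continue  # malformed value: fall through to the next source
        if value > 0:
            return value
    return DEFAULT_LIMIT
```

Because the legacy variable is only consulted when neither CIB source sets a value, an admin who never touches the new setting keeps the old behaviour after upgrading.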