On 11 Nov 2013, at 11:48 pm, yusuke iida <yusk.i...@gmail.com> wrote:
> I also checked how the graph was executed.
> Since the number of pending jobs is capped at 16 partway through, I
> judge that batch-limit is taking effect.
> Looking at this, even while jobs are restricted by batch-limit, two or
> more jobs are always fired every second.
> Each of these jobs returns a result, which generates a CIB
> synchronization message.
> The node that keeps receiving these synchronization messages processes
> them preferentially and postpones its internal IPC messages.
> I think that is what caused the timeout.

What load-threshold were you running this with?

I see this in the logs:

    "Host vm10 supports a maximum of 4 jobs and throttle mode 0100. New job limit is 1"

Have you set LRMD_MAX_CHILDREN=4 on these nodes?
I wouldn't recommend that for a single-core VM; I'd let the default of
2*cores be used.

Also, I'm not seeing "Extreme CIB load detected" in the logs.
Are these still single-core machines?
If so, it would suggest that something about:

    if(cores == 1) {
        cib_max_cpu = 0.4;
    }
    if(throttle_load_target > 0.0 && throttle_load_target < cib_max_cpu) {
        cib_max_cpu = throttle_load_target;
    }
    if(load > 1.5 * cib_max_cpu) {
        /* Can only happen on machines with a low number of cores */
        crm_notice("Extreme %s detected: %f", desc, load);
        mode |= throttle_extreme;
    }

is wrong.

What was load-threshold configured as?
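To work out by hand whether "Extreme CIB load detected" should have fired for a given load, the check quoted above can be pulled out into a small standalone sketch. The function names and the `default_max_cpu` parameter are hypothetical: the excerpt does not show what `cib_max_cpu` starts at on multi-core hosts, so it is passed in. The sketch also returns a flag instead of OR-ing `throttle_extreme` into a mode word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper mirroring the excerpt: computes the CIB CPU ceiling.
 * default_max_cpu is an assumption; the excerpt does not show the starting
 * value of cib_max_cpu on multi-core hosts. */
float effective_cib_max_cpu(int cores, float default_max_cpu,
                            float throttle_load_target)
{
    float cib_max_cpu = default_max_cpu;

    assert(cores >= 0);
    if (cores == 1) {
        /* Single-core hosts get a fixed, lower ceiling */
        cib_max_cpu = 0.4f;
    }
    if (throttle_load_target > 0.0f && throttle_load_target < cib_max_cpu) {
        /* A configured load target can only lower the ceiling */
        cib_max_cpu = throttle_load_target;
    }
    return cib_max_cpu;
}

/* Hypothetical helper: true when load exceeds 1.5x the ceiling, i.e. the
 * case in which the excerpt logs the "Extreme ... detected" notice. */
bool cib_load_is_extreme(float load, int cores, float default_max_cpu,
                         float throttle_load_target)
{
    float cib_max_cpu =
        effective_cib_max_cpu(cores, default_max_cpu, throttle_load_target);

    if (load > 1.5f * cib_max_cpu) {
        printf("Extreme CIB load detected: %f\n", load);
        return true;
    }
    return false;
}
```

For example, on a single-core node with no load target (0.0) the ceiling is 0.4, so the notice should appear once the CIB load exceeds 0.6; a throttle_load_target of 0.2 lowers that trigger point to 0.3.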
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org