>>> Jorge Fábregas <[email protected]> wrote on 30.12.2015 at 17:53 in
message <[email protected]>:
> Hi,
>
> We're having some issues with a particular oversubscribed hypervisor
> (cpu-wise) where we run SLES 11 SP4 guests. I had to increase many
> timeouts on the cluster to cope with this:

Hi!

(I'm late.) As Kai pointed out, Domain-0 is scheduled like any Dom-U, so
either never oversubscribe CPUs or reserve a few CPUs for Domain-0. Think
of Domain-0 as a virtual I/O server; then it's obvious that the I/O
server needs CPU cycles for guest I/O to complete.

Regards,
Ulrich

>
> - Corosync's token timeout (from the default of 5 secs to 30 seconds)
> - SBD's watchdog & msgwait (from 15/30 to 30/60 respectively)
> - Pacemaker's resource-monitoring timeouts
>
> I know the consequence for doing all this will be *slow reaction times*
> but it's all I can do in the meantime.
>
> However, when the hypervisor is at 100% full CPU utilization I still get
> these messages:
>
> sbd: :WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
> logd: WARN: G_CH_prepare_int: working on IPC channel took 220 ms (> 100 ms)
> sbd: WARN: Pacemaker state outdated (age: 4)
> sbd: info: Pacemaker health check: OK
> sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
> logd: WARN: G_CH_check_int: working on IPC channel took 150 ms (> 100 ms)
> sbd: WARN: Latency: 4 exceeded threshold 3 on disk /dev/mapper/clustersbd
> sbd: WARN: Servant for /dev/mapper/clustersbd outdated (age: 5)
> sbd: WARN: Majority of devices lost - surviving on pacemaker
>
> Is this latency configurable? It keeps mentioning "threshold 3". Is that
> 3 seconds? How does it relate to the following parameters?
>
> ==Dumping header on disk /dev/mapper/clustersbd
> Header version     : 2.1
> UUID               : 54597871-2392-475f-ba2d-71bdf92c36b5
> Number of slots    : 255
> Sector size        : 512
> Timeout (watchdog) : 30
> Timeout (allocate) : 2
> Timeout (loop)     : 1
> Timeout (msgwait)  : 60
> ==Header on disk /dev/mapper/clustersbd is dumped
>
> I'm using the -P option with sbd so I know it will not fence the system
> as long as the node's health is ok (as reported by Pacemaker). I'd
> still like to find out if the latency mentioned is configurable or is it
> safe to ignore.
>
> Thanks!
>
> Regards,
> Jorge
>
> _______________________________________________
> Users mailing list: [email protected]
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
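
P.S. When timeouts are raised by hand, as with the 15/30 to 30/60 change quoted above, it can help to sanity-check the values in the on-disk header afterwards. The following is only a rough sketch in Python, not part of sbd: it parses the header dump text quoted in this thread and checks that msgwait is at least twice the watchdog timeout. That 2:1 ratio is an assumption on my part, taken from the ratio of the values quoted above; the names parse_sbd_timeouts and check_timeouts are made up for illustration.

```python
import re

# Header dump copied from the message quoted in this thread.
SAMPLE_DUMP = """\
==Dumping header on disk /dev/mapper/clustersbd
Header version     : 2.1
UUID               : 54597871-2392-475f-ba2d-71bdf92c36b5
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 30
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 60
==Header on disk /dev/mapper/clustersbd is dumped
"""

# Matches lines such as "Timeout (watchdog) : 30".
TIMEOUT_RE = re.compile(r"^Timeout \((\w+)\)\s*:\s*(\d+)\s*$", re.MULTILINE)


def parse_sbd_timeouts(dump_text):
    """Return the 'Timeout (...)' fields of a header dump as {name: seconds}."""
    return {name: int(value) for name, value in TIMEOUT_RE.findall(dump_text)}


def check_timeouts(timeouts):
    """Return a list of problems found; an empty list means none were found.

    Assumption (not stated by sbd itself in this thread): msgwait should be
    at least twice the watchdog timeout.
    """
    problems = []
    watchdog = timeouts.get("watchdog")
    msgwait = timeouts.get("msgwait")
    if watchdog is None or msgwait is None:
        problems.append("watchdog or msgwait timeout missing from header dump")
    elif msgwait < 2 * watchdog:
        problems.append(
            "msgwait (%d) is less than twice watchdog (%d)" % (msgwait, watchdog)
        )
    return problems


if __name__ == "__main__":
    timeouts = parse_sbd_timeouts(SAMPLE_DUMP)
    print(timeouts)
    print(check_timeouts(timeouts) or "msgwait/watchdog ratio looks consistent")
```

Note that this only looks at the header timeouts. It says nothing about the "threshold 3" latency warning, which does not appear among the dumped header fields.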
