Thanks, resolved.
I ran into the following libqb issue:
https://github.com/ClusterLabs/libqb/issues/139
https://github.com/ClusterLabs/libqb/pull/141
Applying commit 7f56f58 to libqb 0.17.1 fixed my problem.
https://github.com/davidvossel/libqb/commit/7f56f583d891859c94b24db0ec38a301c3f3466a.patch
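For anyone hitting the same thing, applying that commit to a 0.17.1 source tree looks roughly like this. This is only a sketch: the `v0.17.1` tag name and the build steps are assumptions about the libqb repo layout, and the patch may need fuzz depending on your exact tree.

```shell
# Sketch: fetch the fix and apply it to a libqb 0.17.1 checkout.
# Assumes git and curl are installed; verify the tag name in your clone.
git clone https://github.com/ClusterLabs/libqb.git
cd libqb
git checkout v0.17.1
# GitHub's .patch URLs serve mailbox-format patches, so git am applies them.
curl -LO https://github.com/davidvossel/libqb/commit/7f56f583d891859c94b24db0ec38a301c3f3466a.patch
git am 7f56f583d891859c94b24db0ec38a301c3f3466a.patch
./autogen.sh && ./configure && make && sudo make install
```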
2015-08-06 1:57 GMT+02:00 Pallai Roland pall...@magex.hu:
hi,
I've built a recent cluster stack from source on Debian Jessie and I
can't get rid of CPU spikes. Corosync blocks the entire system for
seconds on every simple transition; it even stalls itself:
drbdtest1 corosync[4734]: [MAIN ] Corosync main process was not
scheduled for 2590.4512 ms (threshold is 2400. ms). Consider token
timeout increase.
and even drbd:
drbdtest1 kernel: drbd p1: PingAck did not arrive in time.
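For context on where that 2400 ms threshold comes from: my reading of the corosync pause detector is that it warns when the main process was not scheduled for more than 80% of the token timeout, which lines up with the token: 3000 setting in the corosync.conf below. Treat the 0.8 factor as an assumption and check your corosync version's sources:

```python
# The 2400 ms threshold in the warning appears to be 80% of the
# configured token timeout (token: 3000 in corosync.conf below).
# The 0.8 factor is my reading of corosync, not a documented constant.
token_ms = 3000
threshold_ms = token_ms * 0.8
print(threshold_ms)  # 2400.0
```

So a pause of 2590 ms exceeded the threshold but not the full 3000 ms token timeout, which is why the node logged a warning rather than being declared dead.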
My previous build (corosync 1.4.6, libqb 0.17.0, pacemaker 1.1.12) works
fine on these nodes with the same corosync/pacemaker setup.
What should I try? It's a test environment and the issue is 100%
reproducible within seconds. Network traffic is minimal all the time and
there is no I/O load.
*Pacemaker config:*
node 167969573: drbdtest1
node 167969574: drbdtest2
primitive drbd_p1 ocf:linbit:drbd \
params drbd_resource=p1 \
op monitor interval=30
primitive drbd_p2 ocf:linbit:drbd \
params drbd_resource=p2 \
op monitor interval=30
primitive dummy_test ocf:pacemaker:Dummy \
meta allow-migrate=true \
params state=/var/run/activenode
primitive fence_libvirt stonith:external/libvirt \
params hostlist=drbdtest1,drbdtest2 hypervisor_uri=qemu+ssh://libvirt-fencing@mgx4/system \
op monitor interval=30
primitive fs_boot Filesystem \
params device=/dev/null directory=/boot fstype=* \
meta is-managed=false \
op monitor interval=20 timeout=40 on-fail=block OCF_CHECK_LEVEL=20
primitive fs_f1 Filesystem \
params device=/dev/drbd/by-res/p1 directory=/mnt/p1 fstype=ext4 options=commit=60,barrier=0,data=writeback \
op monitor interval=20 timeout=40 \
op start timeout=300 interval=0 \
op stop timeout=180 interval=0
primitive ip_10.3.3.138 IPaddr2 \
params ip=10.3.3.138 cidr_netmask=32 \
op monitor interval=10s timeout=20s
primitive sysinfo ocf:pacemaker:SysInfo \
op start timeout=20s interval=0 \
op stop timeout=20s interval=0 \
op monitor interval=60s
group dummy-group dummy_test
ms ms_drbd_p1 drbd_p1 \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
ms ms_drbd_p2 drbd_p2 \
meta master-max=2 master-node-max=1 clone-max=2 notify=true
clone fencing_by_libvirt fence_libvirt \
meta globally-unique=false
clone fs_boot_clone fs_boot
clone sysinfos sysinfo \
meta globally-unique=false
location fs1_on_high_load fs_f1 \
rule -inf: cpu_load gte 4
colocation dummy_coloc inf: dummy-group ms_drbd_p2:Master
colocation f1a-coloc inf: fs_f1 ms_drbd_p1:Master
colocation f1b-coloc inf: fs_f1 fs_boot_clone:Started
order dummy_order inf: ms_drbd_p2:promote dummy-group:start
order orderA inf: ms_drbd_p1:promote fs_f1:start
property cib-bootstrap-options: \
dc-version=1.1.13-6052cd1 \
cluster-infrastructure=corosync \
expected-quorum-votes=2 \
no-quorum-policy=ignore \
symmetric-cluster=true \
placement-strategy=default \
last-lrm-refresh=1438735742 \
have-watchdog=false
property cib-bootstrap-options-stonith: \
stonith-enabled=true \
stonith-action=reboot
rsc_defaults rsc-options: \
resource-stickiness=100
*corosync.conf:*
totem {
    version: 2
    token: 3000
    token_retransmits_before_loss_const: 10
    clear_node_high_bit: yes
    crypto_cipher: none
    crypto_hash: none
    interface {
        ringnumber: 0
        bindnetaddr: 10.3.3.37
        mcastaddr: 225.0.0.37
        mcastport: 5403
        ttl: 1
    }
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
}
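Unrelated to the scheduling issue, but since this is a two-node cluster relying on no-quorum-policy=ignore in Pacemaker: corosync 2.x votequorum also offers a two_node flag for this situation. A sketch of what the quorum section could look like (check votequorum(5) for your version before adopting it):

```
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
```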
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org