Hello list,

Could you please help me debug a resource that is not started after a node failover?
Here is the configuration I'm testing: a 3-node cluster (KVM VMs) with the following nodes and resources:

    node 10: aic-controller-58055.test.domain.local
    node 6: aic-controller-50186.test.domain.local
    node 9: aic-controller-12993.test.domain.local
    primitive cmha cmha \
        params conffile="/etc/cmha/cmha.conf" daemon="/usr/bin/cmhad" pidfile="/var/run/cmha/cmha.pid" user=cmha \
        meta failure-timeout=30 resource-stickiness=1 target-role=Started migration-threshold=3 \
        op monitor interval=10 on-fail=restart timeout=20 \
        op start interval=0 on-fail=restart timeout=60 \
        op stop interval=0 on-fail=block timeout=90
    primitive sysinfo_aic-controller-12993.test.domain.local ocf:pacemaker:SysInfo \
        params disk_unit=M disks="/ /var/log" min_disk_free=512M \
        op monitor interval=15s
    primitive sysinfo_aic-controller-50186.test.domain.local ocf:pacemaker:SysInfo \
        params disk_unit=M disks="/ /var/log" min_disk_free=512M \
        op monitor interval=15s
    primitive sysinfo_aic-controller-58055.test.domain.local ocf:pacemaker:SysInfo \
        params disk_unit=M disks="/ /var/log" min_disk_free=512M \
        op monitor interval=15s
    location cmha-on-aic-controller-12993.test.domain.local cmha 100: aic-controller-12993.test.domain.local
    location cmha-on-aic-controller-50186.test.domain.local cmha 100: aic-controller-50186.test.domain.local
    location cmha-on-aic-controller-58055.test.domain.local cmha 100: aic-controller-58055.test.domain.local
    location sysinfo-on-aic-controller-12993.test.domain.local sysinfo_aic-controller-12993.test.domain.local inf: aic-controller-12993.test.domain.local
    location sysinfo-on-aic-controller-50186.test.domain.local sysinfo_aic-controller-50186.test.domain.local inf: aic-controller-50186.test.domain.local
    location sysinfo-on-aic-controller-58055.test.domain.local sysinfo_aic-controller-58055.test.domain.local inf: aic-controller-58055.test.domain.local
    property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.14-70404b0 \
        cluster-infrastructure=corosync \
        cluster-recheck-interval=15s \
        no-quorum-policy=stop \
        stonith-enabled=false \
        start-failure-is-fatal=false \
        symmetric-cluster=false \
        node-health-strategy=migrate-on-red \
        last-lrm-refresh=1470334410

When all 3 nodes are online, everything seems OK. This is the output of scoreshow.sh:

    Resource                                        Score      Node                                    Stickiness  #Fail  Migration-Threshold
    cmha                                            -INFINITY  aic-controller-12993.test.domain.local  1           0
    cmha                                            101        aic-controller-50186.test.domain.local  1           0
    cmha                                            -INFINITY  aic-controller-58055.test.domain.local  1           0
    sysinfo_aic-controller-12993.test.domain.local  INFINITY   aic-controller-12993.test.domain.local  0           0
    sysinfo_aic-controller-50186.test.domain.local  -INFINITY  aic-controller-50186.test.domain.local  0           0
    sysinfo_aic-controller-58055.test.domain.local  INFINITY   aic-controller-58055.test.domain.local  0           0

The problem starts when one node (aic-controller-50186) goes offline. The cmha resource gets stuck in the Stopped state. Here is the scoreshow.sh output at that point:

    Resource  Score      Node                                    Stickiness  #Fail  Migration-Threshold
    cmha      -INFINITY  aic-controller-12993.test.domain.local  1           0
    cmha      -INFINITY  aic-controller-50186.test.domain.local  1           0
    cmha      -INFINITY  aic-controller-58055.test.domain.local  1           0

Even though the resource has target-role=Started, Pacemaker skips it. In the logs I see:

    pengine: info: native_print: cmha (ocf::heartbeat:cmha): Stopped
    pengine: info: native_color: Resource cmha cannot run anywhere
    pengine: info: LogActions: Leave cmha (Stopped)

To recover the cmha resource I need to run either of:

    1) crm resource cleanup cmha
    2) crm resource reprobe

After either of these commands, the resource is picked up by Pacemaker again and I see valid scores:

    Resource  Score      Node                                    Stickiness  #Fail  Migration-Threshold
    cmha      100        aic-controller-58055.test.domain.local  1           0      3
    cmha      101        aic-controller-12993.test.domain.local  1           0      3
    cmha      -INFINITY  aic-controller-50186.test.domain.local  1           0      3

So my questions are: why doesn't cluster-recheck work here, and should it do a reprobe? How can I make migration work, or what did I miss in the configuration that prevents it?

Versions: corosync 2.3.4, pacemaker 1.1.14
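
In case it helps anyone reproduce this, the stuck state can be spotted mechanically from the scoreshow.sh output: a resource whose score is -INFINITY on every listed node is the one the policy engine reports as "cannot run anywhere". Below is a minimal sketch in Python, assuming the output has one row per line with whitespace-separated columns (resource, score, node, ...). The function name stuck_resources is my own, not part of Pacemaker or scoreshow.sh, and the sample rows are the ones from the failed state above.

```python
from collections import defaultdict


def stuck_resources(score_table: str) -> list:
    """Return resources whose score is -INFINITY on every listed node.

    Assumes whitespace-separated rows: resource, score, node, ...
    (hypothetical helper, not part of Pacemaker or scoreshow.sh).
    """
    scores = defaultdict(list)
    for line in score_table.splitlines():
        fields = line.split()
        # Skip blank lines, short lines, and the header row.
        if len(fields) < 3 or fields[0] == "Resource":
            continue
        scores[fields[0]].append(fields[1])
    return sorted(
        name
        for name, values in scores.items()
        if all(value == "-INFINITY" for value in values)
    )


# Rows copied from the scoreshow.sh output taken after the node went offline.
SAMPLE = """\
Resource Score Node Stickiness #Fail Migration-Threshold
cmha -INFINITY aic-controller-12993.test.domain.local 1 0
cmha -INFINITY aic-controller-50186.test.domain.local 1 0
cmha -INFINITY aic-controller-58055.test.domain.local 1 0
"""

if __name__ == "__main__":
    print(stuck_resources(SAMPLE))  # prints ['cmha']
```

This only reads the score table; it does not explain why the scores end up at -INFINITY, which is the actual question.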

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org