[ClusterLabs] Pacemaker crashed and produce a coredump file

2020-07-29 Thread lkxjtu
Hi Reid Wahl, more log information is below. The reason seems to be that communication with DBUS timed out. Any suggestions? 1672712 Jul 24 21:20:17 [3945305] B0610011 lrmd: info: pcmk_dbus_timeout_dispatch:Timeout 0x147bbd0 expired 1672713 Jul 24 21:20:17 [3945305]

[ClusterLabs] Pacemaker crashed and produce a coredump file

2020-07-29 Thread lkxjtu
RPM Version Information: corosync-2.3.4-7.el7_2.1.x86_64 pacemaker-1.1.12-22.el7.x86_64 Coredump file backtrace: ``` warning: .dynamic section for "/lib64/libk5crypto.so.3" is not at the expected address (wrong library or version mismatch?) Missing separate debuginfo for Try: yum

[ClusterLabs] Keep printing "Sent 0 CPG messages" in corosync.log

2018-09-29 Thread lkxjtu
Corosync.log has kept printing the following logs for several days. What's wrong with the corosync cluster? Now the cpu load is not high. Cluster version information: [root@paas-controller-172-167-40-24:~]$ rpm -q corosync corosync-2.4.0-9.el7_4.2.x86_64 [root@paas-controller-172-167-40-24:~]$

[ClusterLabs] The crmd process exited: Generic Pacemaker error (201)

2018-09-29 Thread lkxjtu
Version information [root@paas-controller-172-167-40-24:~]$ rpm -q corosync corosync-2.4.0-9.el7_4.2.x86_64 [root@paas-controller-172-167-40-24:~]$ rpm -q pacemaker pacemaker-1.1.16-12.el7_4.2.x86_64 The crmd process exited with error code of 201. The pacemakerd process tried to fork 100

[ClusterLabs] How does failure-timeout works, will the resource not be scheduled when setting too short?

2018-05-19 Thread lkxjtu
I have two pacemaker resources. We call them A and B. For environmental reasons, their start methods and monitor methods always return failure (OCF_ERR_GENERIC). The following are their configurations (the cluster property start-failure-is-fatal is false): primitive A A \ op

Re: [ClusterLabs] What is the mechanism for pacemaker to recovery resources

2018-05-10 Thread lkxjtu
Great! These two parameters (batch-limit & node-action-limit) solve my problem. Thank you very much! By the way, is there any way to know the number of parallel actions on a node and in the cluster? At 2018-05-10 20:56:27, "lkxjtu" <lkx...@163.com> wrote: On Tue, 2018-05-08 at

[ClusterLabs] What is the mechanism for pacemaker to recovery resources

2018-05-08 Thread lkxjtu
I have a three-node cluster of about 50 resources. When I reboot all three nodes at the same time, I observe the resources with "crm status". I found that pacemaker starts 3-5 resources at a time, from top to bottom, rather than starting all of them at the same time. Is there a parameter that controls this? It seems to

Re: [ClusterLabs] Pacemaker resources are not scheduled

2018-04-16 Thread lkxjtu
> Lkxjtu, > On 14/04/18 00:16 +0800, lkxjtu wrote: >> My cluster version: >> Corosync 2.4.0 >> Pacemaker 1.1.16 >>>> There are many resource anomalies. Some resources are only monitored >> and not recovered. Some resources are not monitored or recovered

[ClusterLabs] Pacemaker resources are not scheduled

2018-04-13 Thread lkxjtu
My cluster version: Corosync 2.4.0 Pacemaker 1.1.16 There are many resource anomalies. Some resources are only monitored and not recovered. Some resources are not monitored or recovered. Only one resource of vnm is scheduled normally, but this resource cannot be started because other resources

[ClusterLabs] What does these logs mean in corosync.log

2018-02-12 Thread lkxjtu
These logs are both printed when the system is abnormal, and I am confused about what they mean. Does anyone know what they mean? Thank you very much. corosync version 2.4.0 pacemaker version 1.1.16 1) Feb 01 10:57:58 [18927] paas-controller-192-167-0-2 crmd: warning: find_xml_node:Could

Re: [ClusterLabs] Pacemaker resource start delay when there are another resource is starting

2017-11-09 Thread lkxjtu
"Ken Gaillot" <kgail...@redhat.com> wrote: >On Sat, 2017-11-04 at 22:46 +0800, lkxjtu wrote: >> >> >> >Another possibility would be to have the start return immediately, >> and >> >make the monitor artificially return success for the first 10 >>

Re: [ClusterLabs] Pacemaker resource start delay when there are another resource is starting

2017-11-04 Thread lkxjtu
redhat.com> wrote: >On Sat, 2017-10-28 at 01:11 +0800, lkxjtu wrote: >> >> Thank you for your response! This means that there shouldn't be a long >> "sleep" in the ocf script. >> If my service takes 10 minutes from service starting to healthcheck >>

Re: [ClusterLabs] Pacemaker resource start delay when there are another resource is starting

2017-10-27 Thread lkxjtu
hether some of > the actions in the second transition would be needed regardless of > whether the pending actions succeeded or failed, but in practice, that > would be difficult to implement (and possibly take more time to > calculate than is desirable in a recovery situation). > On F

[ClusterLabs] Pacemaker resource start delay when there are another resource is starting

2017-10-27 Thread lkxjtu
I have two clone resources in my corosync/pacemaker cluster. They are fm_mgt and logserver. Both use ocf resource agents. fm_mgt takes 1 minute to start the service (the ocf start function runs for 1 minute). Configured as below: # crm configure show node 168002177: 192.168.2.177 node 168002178:

Re: [ClusterLabs] dependency of pacemaker resources starting

2017-10-27 Thread lkxjtu
Does anyone know the answer to this question? Best regards. Sent from the NetEase Mail mobile app. On 2017-10-25 at 23:10, lkxjtu wrote: My problem is about pacemaker. For example, the pacemaker cluster has two resources, both with ocf resource agents. One resource is starting (calling the ocf start function), needing, say, 1

[ClusterLabs] dependency of pacemaker resources starting

2017-10-25 Thread lkxjtu
My problem is about pacemaker. For example, the pacemaker cluster has two resources, both with ocf resource agents. One resource is starting (calling the ocf start function), which takes, say, 1 minute; during this 1 minute, if another resource's monitor fails, pacemaker will not