Hi all,

Jan Friesse <[email protected]> wrote:
> >>>There is really no help. It's best to make sure corosync is scheduled
> >regularly.
> >I may sound silly, but how can I do it?
>
> It's actually very hard to say. Pauses like 30 sec is really unusual
> and shouldn't happen (specially with RT scheduling). It is usually
> happening on VM where host is overcommitted.

It's funny that you are discussing this just as my team is seeing it
happen fairly regularly in VMs on an overcommitted host. In other
words, I can confirm that Jan's statement above is true.

Like Konstiantyn, we have also sometimes seen no fencing occur as a
result of these pauses, e.g.:

Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [MAIN ] Corosync main process was not scheduled for 7343.1909 ms (threshold is 4000.0000 ms). Consider token timeout increase.
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [TOTEM ] A processor failed, forming new configuration.
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] CLM CONFIGURATION CHANGE
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] New Configuration:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] #011r(0) ip(192.168.2.82)
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] #011r(0) ip(192.168.2.84)
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] Members Left:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] Members Joined:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 32: memb=2, new=0, lost=0
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [pcmk ] info: pcmk_peer_update: memb: d52-54-77-77-77-01 1084752466
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [pcmk ] info: pcmk_peer_update: memb: d52-54-77-77-77-02 1084752468
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] CLM CONFIGURATION CHANGE
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] New Configuration:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] #011r(0) ip(192.168.2.82)
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] #011r(0) ip(192.168.2.84)
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] Members Left:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CLM ] Members Joined:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 32: memb=2, new=0, lost=0
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [pcmk ] info: pcmk_peer_update: MEMB: d52-54-77-77-77-01 1084752466
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [pcmk ] info: pcmk_peer_update: MEMB: d52-54-77-77-77-02 1084752468
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [CPG ] chosen downlist: sender r(0) ip(192.168.2.82) ; members(old:2 left:0)
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]: [MAIN ] Completed service synchronization, ready to provide service.

I don't understand why it claims that a processor failed and that it
is forming a new configuration, when the resulting configuration
appears no different from before: no members joined or left. Can
anyone explain this?

_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
