Hello,

Here's another problem we're having:

Jul 31 03:51:02 node01 corosync[5870]: [TOTEM ] Process pause detected for 11149 ms, flushing membership messages.
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] CLM CONFIGURATION CHANGE
Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] New Configuration:
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] r(0) ip(192.168.1.1) r(1) ip(x.y.z.3)
Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] Members Left:
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] r(0) ip(192.168.1.2) r(1) ip(x.y.z.1)
Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] Members Joined:
Jul 31 03:51:11 node01 corosync[5870]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 9708: memb=1, new=0, lost=1
Jul 31 03:51:11 node01 corosync[5870]: [pcmk ] info: pcmk_peer_update: memb: node01 16885952
Jul 31 03:51:11 node01 corosync[5870]: [pcmk ] info: pcmk_peer_update: lost: node02 33663168
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] CLM CONFIGURATION CHANGE
Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] New Configuration:
Jul 31 03:51:11 node01 corosync[5870]: [CLM ] r(0) ip(192.168.1.1) r(1) ip(x.y.z.3)
Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] Members Left:
Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] Members Joined:
Jul 31 03:51:11 node01 crmd: [5912]: notice: ais_dispatch_message: Membership 9708: quorum lost

Node01 gets STONITH'd shortly after that. The logs give no indication whatsoever that this was about to happen: for at least half an hour beforehand there is only the normal status-message noise from monitor ops etc.

This is what node02 logs at the same time:

Jul 31 03:51:01 node02 corosync[5810]: [TOTEM ] A processor failed, forming new configuration.
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] CLM CONFIGURATION CHANGE
Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] New Configuration:
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] r(0) ip(192.168.1.2) r(1) ip(x.y.z.1)
Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] Members Left:
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] r(0) ip(192.168.1.1) r(1) ip(x.y.z.3)
Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] Members Joined:
Jul 31 03:51:11 node02 corosync[5810]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 9708: memb=1, new=0, lost=1
Jul 31 03:51:11 node02 corosync[5810]: [pcmk ] info: pcmk_peer_update: memb: node02 33663168
Jul 31 03:51:11 node02 corosync[5810]: [pcmk ] info: pcmk_peer_update: lost: node01 16885952
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] CLM CONFIGURATION CHANGE
Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] New Configuration:
Jul 31 03:51:11 node02 corosync[5810]: [CLM ] r(0) ip(192.168.1.2) r(1) ip(x.y.z.1)
Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] Members Left:
Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] Members Joined:

What does "Process pause detected" mean?
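
In case it helps with checking older logs for earlier, shorter pauses: here is a minimal sketch that pulls the reported pause durations out of syslog lines. The regular expression is only an assumption based on the single TOTEM line above, not on the corosync source, and find_pauses is just a name made up for this example.

```python
import re

# Pattern derived from the one "Process pause detected" line quoted above.
# It is an assumption about the message format, not taken from corosync itself.
PAUSE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+corosync\[\d+\]:\s+"
    r"\[TOTEM\s*\]\s+Process pause detected for (?P<ms>\d+) ms"
)


def find_pauses(lines):
    """Return (timestamp, host, pause_ms) for each pause message in lines."""
    hits = []
    for line in lines:
        match = PAUSE_RE.match(line)
        if match:
            hits.append(
                (match.group("stamp"), match.group("host"), int(match.group("ms")))
            )
    return hits


if __name__ == "__main__":
    # Two lines copied from the node01 excerpt above; only the first is a pause.
    sample = [
        "Jul 31 03:51:02 node01 corosync[5870]: [TOTEM ] Process pause detected "
        "for 11149 ms, flushing membership messages.",
        "Jul 31 03:51:11 node01 corosync[5870]: [CLM ] CLM CONFIGURATION CHANGE",
    ]
    for stamp, host, pause_ms in find_pauses(sample):
        print(f"{stamp} {host}: paused for {pause_ms} ms")
```

To run it over a real log, pass an open file object instead of the sample list; the function only iterates over lines.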

Quoting from my other recent post about the backup ring sporadically being marked faulty:

|We're running a two-node cluster with redundant rings.
|Ring 0 is a 10 GB direct connection; ring 1 consists of two 1GB interfaces that are bonded in
|active-backup mode and routed through two independent switches for each node. The ring 1 network
|is our "normal" 1G LAN and should only be used in case the direct 10G connection should fail.
|
|Corosync Cluster Engine, version '1.3.1'
|Copyright (c) 2006-2009 Red Hat, Inc.
|
|It's the version that comes with SLES11-SP1-HA.

Thanks in advance!

--
Sebastian

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
