[Pacemaker] TOTEM: Process pause detected? Leading to STONITH...

Sebastian Kaps Thu, 04 Aug 2011 06:35:20 -0700

Hello,

here's another problem we're having:

Jul 31 03:51:02 node01 corosync[5870]:  [TOTEM ] Process pause detected for 11149 ms, flushing membership messages.

Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] CLM CONFIGURATION CHANGE

Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] New Configuration:

Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] r(0) ip(192.168.1.1) r(1) ip(x.y.z.3)

Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] Members Left:

Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] r(0) ip(192.168.1.2) r(1) ip(x.y.z.1)

Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] Members Joined:

Jul 31 03:51:11 node01 corosync[5870]:  [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 9708: memb=1, new=0, lost=1

Jul 31 03:51:11 node01 corosync[5870]:  [pcmk  ] info: pcmk_peer_update: memb: node01 16885952

Jul 31 03:51:11 node01 corosync[5870]:  [pcmk  ] info: pcmk_peer_update: lost: node02 33663168

Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] CLM CONFIGURATION CHANGE

Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] New Configuration:

Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] r(0) ip(192.168.1.1) r(1) ip(x.y.z.3)

Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] Members Left:
Jul 31 03:51:11 node01 corosync[5870]:  [CLM   ] Members Joined:

Jul 31 03:51:11 node01 crmd: [5912]: notice: ais_dispatch_message: Membership 9708: quorum lost

Node01 gets STONITH'd shortly after that. There is no indication whatsoever in the logs that this would happen. For at least half an hour before that, there's only the normal status-message noise from monitor ops etc.
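
For anyone who wants to check their own syslog for the same message, here is a minimal Python sketch that pulls the pause duration out of lines shaped like the TOTEM line quoted above. The regular expression is inferred from that single log line only, and the names `PAUSE_RE` and `find_pauses` are made up for illustration; other corosync versions may format the message differently.

```python
import re

# Pattern inferred from the one TOTEM line quoted in this post (an assumption,
# not an official format). \s* tolerates missing or extra whitespace.
PAUSE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) corosync\[\d+\]:\s*"
    r"\[TOTEM\s*\]\s*Process pause detected\s*for (?P<ms>\d+) ms"
)

def find_pauses(lines):
    """Return (timestamp, host, pause_ms) for every matching log line."""
    found = []
    for line in lines:
        m = PAUSE_RE.search(line)
        if m:
            found.append((m.group("stamp"), m.group("host"), int(m.group("ms"))))
    return found

# Sample input copied from the node01 excerpt above.
sample = [
    "Jul 31 03:51:02 node01 corosync[5870]: [TOTEM ] Process pause detected "
    "for 11149 ms, flushing membership messages.",
    "Jul 31 03:51:11 node01 corosync[5870]: [CLM ] CLM CONFIGURATION CHANGE",
]
pauses = find_pauses(sample)
for stamp, host, ms in pauses:
    print(f"{stamp} {host}: paused {ms / 1000:.1f} s")
```

Feeding it the lines of a real log file (for example by iterating over an open file object) works the same way as the in-memory sample.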

Jul 31 03:51:01 node02 corosync[5810]:  [TOTEM ] A processor failed, forming new configuration.

Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] CLM CONFIGURATION CHANGE

Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] New Configuration:

Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] r(0) ip(192.168.1.2) r(1) ip(x.y.z.1)

Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] Members Left:

Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] r(0) ip(192.168.1.1) r(1) ip(x.y.z.3)

Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] Members Joined:

Jul 31 03:51:11 node02 corosync[5810]:  [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 9708: memb=1, new=0, lost=1

Jul 31 03:51:11 node02 corosync[5810]:  [pcmk  ] info: pcmk_peer_update: memb: node02 33663168

Jul 31 03:51:11 node02 corosync[5810]:  [pcmk  ] info: pcmk_peer_update: lost: node01 16885952

Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] CLM CONFIGURATION CHANGE

Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] New Configuration:

Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] r(0) ip(192.168.1.2) r(1) ip(x.y.z.1)

Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] Members Left:
Jul 31 03:51:11 node02 corosync[5810]:  [CLM   ] Members Joined:
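
A side note on reading the pcmk_peer_update lines: the numbers next to the node names (16885952 and 33663168) appear to be the ring 0 IPv4 addresses stored as 32-bit integers, which is what one would expect when no explicit nodeid is configured. The Python sketch below checks that arithmetic. The little-endian byte order is an assumption that happens to fit these two values, and `nodeid_to_ip` is a made-up helper name, not a corosync or Pacemaker function.

```python
import socket
import struct

def nodeid_to_ip(nodeid):
    """Interpret a numeric node ID as an IPv4 address.

    Assumption: the ID holds the four address bytes in little-endian
    order, which matches the values seen in the logs above.
    """
    # Pack as little-endian unsigned 32-bit, then format the 4 bytes
    # as a dotted quad.
    return socket.inet_ntoa(struct.pack("<I", nodeid))

for name, nodeid in (("node01", 16885952), ("node02", 33663168)):
    print(name, nodeid, "->", nodeid_to_ip(nodeid))
```

Both results match the r(0) addresses shown in the CLM membership lines, so the IDs and the ring 0 addresses are consistent with each other.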

What does "Process pause detected" mean?

Quoting from my other recent post regarding the backup ring sporadically being marked faulty:


|We're running a two-node cluster with redundant rings.

|Ring 0 is a 10 Gbit direct connection; ring 1 consists of two 1 Gbit interfaces that are bonded in
|active-backup mode and routed through two independent switches for each node. The ring 1 network
|is our "normal" 1 Gbit LAN and should only be used in case the direct 10 Gbit connection should fail.

|
|Corosync Cluster Engine, version '1.3.1'
|Copyright (c) 2006-2009 Red Hat, Inc.
|
|It's the version that comes with SLES11-SP1-HA.

Thanks in advance!

--
Sebastian

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
