Which version of corosync?

On 08/02/2011 07:35 AM, Sebastian Kaps wrote:
> Hi,
>
> we're running a two-node cluster with redundant rings.
> Ring 0 is a 10 Gb direct connection; ring 1 consists of two 1 Gb
> interfaces that are bonded in active-backup mode and routed through two
> independent switches for each node. The ring 1 network is our "normal"
> 1G LAN and should only be used if the direct 10G connection fails.
> I often (once a day on average, I'd guess) see that ring 1 (and only that
> one) is marked as FAULTY without any obvious reason.
>
> Aug 2 08:56:15 node02 corosync[5752]: [TOTEM ] Retransmit List: c76 c7a c7c c7e c80 c82 c84
> Aug 2 08:56:15 node02 corosync[5752]: [TOTEM ] Retransmit List: c82
> Aug 2 08:56:15 node02 corosync[5752]: [TOTEM ] Marking seqid 568416 ringid 1 interface x.y.z.1 FAULTY - administrative intervention required.
>
> Whenever I see this, I check whether the other node's address can be
> pinged (I have never seen any connectivity problems there), then re-enable
> the ring with "corosync-cfgtool -r", and everything looks OK for a while
> (i.e. hours or days).
>
> How could I find out why this happens?
> What do these "Retransmit List" or seqid (sequence id, I assume?) values
> tell me?
> Is it safe to re-enable the second ring when the partner node can be
> pinged successfully?
>
> The totem section of our config looks like this:
>
> totem {
>     rrp_mode: passive
>     join: 60
>     max_messages: 20
>     vsftype: none
>     consensus: 10000
>     secauth: on
>     token_retransmits_before_loss_const: 10
>     threads: 16
>     token: 10000
>     version: 2
>     interface {
>         bindnetaddr: 192.168.1.0
>         mcastaddr: 239.250.1.1
>         mcastport: 5405
>         ringnumber: 0
>     }
>     interface {
>         bindnetaddr: x.y.z.0
>         mcastaddr: 239.250.1.2
>         mcastport: 5415
>         ringnumber: 1
>     }
>     clear_node_high_bit: yes
> }
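One way to start on "how could I find out why this happens?" is to correlate each FAULTY marking with the retransmit activity logged just before it. Below is a minimal Python sketch, assuming the syslog lines look exactly like the three quoted above (other corosync versions may word these messages differently). The names `scan` and `FaultEvent` are hypothetical helpers, not part of corosync. The "Retransmit List" entries appear to be message sequence numbers printed in hex, so the sketch keeps them as strings and simply counts how often each one recurs.

```python
import re
import sys
from collections import Counter
from typing import Iterable, NamedTuple

# Patterns modelled on the TOTEM messages quoted in this thread (assumption:
# the wording is the same in the corosync version actually in use).
RETRANSMIT_RE = re.compile(r"\[TOTEM \] Retransmit List: (?P<seqs>[0-9a-f ]+)$")
FAULTY_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8}) .*\[TOTEM \] Marking seqid (?P<seqid>\d+) "
    r"ringid (?P<ring>\d+) interface (?P<iface>\S+) FAULTY"
)


class FaultEvent(NamedTuple):
    timestamp: str
    ring: int
    interface: str
    retransmits_before: int  # "Retransmit List" lines seen since the previous fault


def scan(lines: Iterable[str]) -> tuple[list[FaultEvent], Counter]:
    """Return the FAULTY events and a count of every retransmitted sequence number."""
    faults: list[FaultEvent] = []
    seq_counts: Counter = Counter()
    pending = 0
    for line in lines:
        line = line.rstrip("\n")
        m = RETRANSMIT_RE.search(line)
        if m:
            pending += 1
            seq_counts.update(m.group("seqs").split())
            continue
        m = FAULTY_RE.search(line)
        if m:
            faults.append(
                FaultEvent(m.group("ts"), int(m.group("ring")), m.group("iface"), pending)
            )
            pending = 0  # start counting afresh for the next fault
    return faults, seq_counts


if __name__ == "__main__":
    # Usage: python scan_totem.py /var/log/messages [...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            faults, counts = scan(fh)
        for f in faults:
            print(f"{path}: {f.timestamp} ring {f.ring} ({f.interface}) FAULTY "
                  f"after {f.retransmits_before} retransmit-list lines")
        print(f"{path}: most retransmitted seqs: {counts.most_common(5)}")
```

If every FAULTY marking on ring 1 is preceded by a burst of retransmits, that would point towards packet loss on that ring (for example around a bonding failover) rather than a problem a simple ping would reveal; this is a reading aid, not a diagnosis.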
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker