Re: [Pacemaker] never ending election

David Riccitelli Tue, 05 Aug 2008 04:18:38 -0700

Here is the log from the other node (node #1):

https://share.acrobat.com/adc/document.do?docid=4f7ed8e7-33ca-47fc-89e5-50bb8b26f656

This line marks the moment I manually killed heartbeat on node #2:

Aug 1 12:16:13 rmefp-srv01x heartbeat: [28575]: WARN: node rmefp-srv02x: is dead

Best regards,

David Riccitelli

________________________________________________________________________

David Riccitelli

e-mail: [EMAIL PROTECTED]

skype: ziodave

phone: +39.0658318336

Rome - tel.+39.0658318301 fax.+39.0658318303 P.I. 04856801008

Please consider the environment and don't print this e-mail unless you really need to.

PRIVACY NOTICE
The information transmitted in this e-mail and its attachments is intended solely for the addressee and must be treated as
confidential; its disclosure and use are prohibited. Disclosure or communication by anyone other than the addressee
is prohibited under art. 616 et seq. of the Italian Criminal Code (c.p.) and Legislative Decree no. 196/03.
If you have received this e-mail and its attachments in error, please destroy everything you received
and notify the sender by the same means.
________________________________________________________________________

On 04/Aug/08, at 17:19, Andrew Beekhof wrote:

On Mon, Aug 4, 2008 at 16:53, David Riccitelli <[EMAIL PROTECTED]> wrote:
The logs for the second node are here:
https://share.acrobat.com/adc/document.do?docid=144a8a57-4c6a-46d9-bfc4-cad7dd31fc02
I don't have the one for the first node at the moment.

Unfortunately I need both, since both think they should win the
election and I need to figure out which one is right (and thus where
the bug is).

The log starts with this line:

Aug 1 11:50:09 rmefp-srv02x heartbeat: [20780]: WARN: node rmefp-srv01x: is dead

which is when I removed the two network cables from the first node. The last
meaningful line, I believe, is this one:

Aug 1 12:19:51 rmefp-srv02x crmd: [20793]: info: do_election_check: Still waiting on 2 non-votes (2 total)

because the line that follows it was logged when I forced a restart of the
heartbeat service (on the second node):

Aug 1 12:19:55 rmefp-srv02x heartbeat: [6404]: info: No log entry found in ha.cf -- use logd
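To line up the two nodes' logs around the events above (the cable pull, the stuck election, the forced restart), a small helper can split each syslog line into its fields and pull the vote counts out of the crmd election-check messages. This is a hypothetical Python sketch, not part of heartbeat or Pacemaker: the field layout is assumed from the log lines quoted in this thread, and `parse_line`, `pending_votes` and `LogEntry` are illustrative names.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical helper: layout assumed from the log lines quoted in this thread,
# "<Mon> <day> <HH:MM:SS> <host> <daemon>: [<pid>]: <level>: <message>".
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<daemon>[\w-]+):\s+\[(?P<pid>\d+)\]:\s+"
    r"(?P<level>\w+):\s+(?P<message>.*)$"
)

# crmd election progress text, e.g. "Still waiting on 2 non-votes (2 total)"
VOTES_RE = re.compile(r"Still waiting on (\d+) non-votes \((\d+) total\)")


class LogEntry(NamedTuple):
    timestamp: str  # syslog timestamp; carries no year
    host: str
    daemon: str
    pid: int
    level: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one unwrapped syslog line into fields; None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(
        timestamp=f"{m['month']} {m['day']} {m['time']}",
        host=m["host"],
        daemon=m["daemon"],
        pid=int(m["pid"]),
        level=m["level"],
        message=m["message"],
    )


def pending_votes(entry: LogEntry) -> Optional[Tuple[int, int]]:
    """Return (still waiting, total) from a crmd election-check line, else None."""
    if entry.daemon != "crmd":
        return None
    m = VOTES_RE.search(entry.message)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


if __name__ == "__main__":
    stuck = parse_line(
        "Aug 1 12:19:51 rmefp-srv02x crmd: [20793]: info: do_election_check: "
        "Still waiting on 2 non-votes (2 total)"
    )
    # prints: rmefp-srv02x Aug 1 12:19:51 (2, 2)
    print(stuck.host, stuck.timestamp, pending_votes(stuck))
```

Applied to both nodes' logs, this makes it easier to compare the two sides of the election by time; keep in mind that syslog timestamps carry no year and are only comparable across nodes if their clocks are synchronized.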

Best regards,
David Riccitelli


On 04/Aug/08, at 13:11, Andrew Beekhof wrote:

Hard to say what's going on based on this log fragment.
Can you put the full logs from both nodes somewhere?

On Sun, Aug 3, 2008 at 11:18, David Riccitelli <[EMAIL PROTECTED]> wrote:

Hello there,

Can somebody help me with this problem?

I have 2 identical nodes, node #1 and node #2. Both run CentOS 5 with the
current versions of heartbeat (2.1.3) and pacemaker (0.6.5).

Each node has 2 network ports bonded together (mode 1, active-backup).
Bonding is configured and working fine.

The nodes have one resource configured, and I must say everything works
fine. All the tests I'm running show perfect failovers, except one:

1. node #1 has the resource, node #2 is waiting,
2. I remove both network cables from node #1,
3. node #2 doesn't sense node #1 anymore and believes it is dead,
4. node #2 brings up the resource,
5. then I put node #1 back on the network; I expect the nodes to see each
   other and one of the two to give up the resource,
6. node #1 and node #2 do see each other and start counting election votes,
   but this goes on indefinitely and the resource stays active on both nodes
   at the same time.

Logs (the same on both nodes; this pattern repeats forever, until heartbeat is
manually stopped on one of the nodes):

_______________________________________________
Pacemaker mailing list
[email protected]
http://list.clusterlabs.org/mailman/listinfo/pacemaker
