Re: Problems with Pacemaker + Corosync after reboot

2010-12-29 Thread Peter Beck
On Sun, 2010-12-19 at 21:06 -0300, Daniel Bareiro wrote:

 # ps auxf
 [...]
 root  1508  0.1  1.9 182624  4880 ?        Ssl  15:52   0:22 /usr/sbin/corosync
 root  1539  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
 root  1540  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
 root  1541  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
 root  1542  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
 root  1543  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
 root  1544  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync

Hi Daniel,

Stefan Voelkel has just filed a bug report about this issue:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=608269

Regards
Peter


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/1293637381.26493.9.ca...@peanut.datentraeger.li



Re: Problems with Pacemaker + Corosync after reboot

2010-12-25 Thread Peter Beck
On Sun, 2010-12-19 at 21:06 -0300, Daniel Bareiro wrote:
 # ps auxf
 [...]
 root  1508  0.1  1.9 182624  4880 ?        Ssl  15:52   0:22 /usr/sbin/corosync
 root  1539  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync

Hi Daniel

have you tried killing corosync with 'killall -9 corosync' and then
restarting it with '/etc/init.d/corosync start'?

That brings my nodes back: after doing this, both nodes here rejoin the
cluster. But it does not solve the issue, because I have to do it again
after every reboot. Maybe corosync starts too early during boot and one
of the services it depends on is not ready yet?

Best Regards
Peter



-- 
Archive: http://lists.debian.org/1293314479.2491.10.ca...@peanut.datentraeger.li



Re: Problems with Pacemaker + Corosync after reboot

2010-12-22 Thread Peter Beck
On Sun, 2010-12-19 at 21:06 -0300, Daniel Bareiro wrote:
 Hi all!
 
 I'm beginning to test HA clusters on GNU/Linux, and for that I
 decided to try Pacemaker + Corosync on Debian Lenny following this
 howto [1].
 
 Both packages were installed from the Backports repositories. However,
 I am observing that if I reboot a node after configuring it, the node
 fails to rejoin the cluster after booting.

Hi there,

I am trying the same with Squeeze (in VMs) and I see the same issues.
Sometimes it seems to work fine, but then the same problem comes back,
with only corosync running. I also bought the cluster book from
O'Reilly [1] (no idea whether it is available in English), but I still
have no clue what I am doing wrong. I haven't found much useful
documentation (besides the links you've already mentioned).

I've heard that Pacemaker with Corosync causes a lot of issues, that it
is not very reliable, and that it is better to run Pacemaker with
Heartbeat. Is this true?

Regards
Peter

[1] http://www.oreilly.de/catalog/pdf_linuxhacluster2ger/index.html



-- 
Archive: http://lists.debian.org/1293063058.3531.19.ca...@peanut.datentraeger.li



Problems with Pacemaker + Corosync after reboot

2010-12-19 Thread Daniel Bareiro
Hi all!

I'm beginning to test HA clusters on GNU/Linux, and for that I decided
to try Pacemaker + Corosync on Debian Lenny following this howto [1].

Both packages were installed from the Backports repositories. However,
I am observing that if I reboot a node after configuring it, the node
fails to rejoin the cluster after booting.

This is what I see in /var/log/daemon.log:

--
Dec 19 17:13:13 atlantis corosync[1508]:   [pcmk  ] WARN: route_ais_message: Sending message to local.crmd failed: unknown (rc=-2)
Dec 19 17:13:13 atlantis corosync[1508]:   [pcmk  ] WARN: route_ais_message: Sending message to local.cib failed: unknown (rc=-2)
Dec 19 17:13:13 atlantis corosync[1508]:   [pcmk  ] WARN: route_ais_message: Sending message to local.attrd failed: unknown (rc=-2)
Dec 19 17:13:13 atlantis corosync[1508]:   [pcmk  ] WARN: route_ais_message: Sending message to local.cib failed: unknown (rc=-2)
Dec 19 17:13:14 atlantis corosync[1508]:   [pcmk  ] WARN: route_ais_message: Sending message to local.cib failed: unknown (rc=-2)
Dec 19 17:13:14 atlantis corosync[1508]:   [pcmk  ] WARN: route_ais_message: Sending message to local.cib failed: unknown (rc=-2)
Dec 19 17:13:21 atlantis corosync[1508]:   [TOTEM ] A processor failed, forming new configuration.
Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 72: memb=1, new=0, lost=1
Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] info: pcmk_peer_update: memb: atlantis 335544586
Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] info: pcmk_peer_update: lost: daedalus 369099018
Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 72: memb=1, new=0, lost=0
Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] info: pcmk_peer_update: MEMB: atlantis 335544586
Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] info: ais_mark_unseen_peer_dead: Node daedalus was not seen in the previous transition
Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] info: update_member: Node 369099018/daedalus is now: lost
Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] info: send_member_notification: Sending membership update 72 to 0 children
Dec 19 17:13:25 atlantis corosync[1508]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 19 17:13:25 atlantis corosync[1508]:   [MAIN  ] Completed service synchronization, ready to provide service.
--


# ps auxf
[...]
root  1508  0.1  1.9 182624  4880 ?        Ssl  15:52   0:22 /usr/sbin/corosync
root  1539  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
root  1540  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
root  1541  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
root  1542  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
root  1543  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
root  1544  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync


From what I see in the howto, the output should be something like this:


root 29980  0.0  0.8  44304  3808 ?        Ssl  20:55   0:00 /usr/sbin/corosync
root 29986  0.0  2.4  10812 10812 ?        SLs  20:55   0:00  \_ /usr/lib/heartbeat/stonithd
102  29987  0.0  0.8  13012  3804 ?        S    20:55   0:00  \_ /usr/lib/heartbeat/cib
root 29988  0.0  0.4   5444  1800 ?        S    20:55   0:00  \_ /usr/lib/heartbeat/lrmd
102  29989  0.0  0.5  12364  2368 ?        S    20:55   0:00  \_ /usr/lib/heartbeat/attrd
102  29990  0.0  0.5   8604  2304 ?        S    20:55   0:00  \_ /usr/lib/heartbeat/pengine
102  29991  0.0  0.6  12648  3080 ?        S    20:55   0:00  \_ /usr/lib/heartbeat/crmd
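
To make that comparison quickly on a rebooted node, one can check whether
corosync has spawned the Pacemaker daemons listed in the howto's output.
The function below is a hypothetical helper (missing_pcmk_daemons is not
part of Corosync or Pacemaker). It is only a minimal sketch: it
pattern-matches process names in ps output and assumes each daemon shows
up as ".../<name>" at the end of its line, with no arguments, as in the
expected output above.

```shell
#!/bin/sh
# Hypothetical helper, not part of Corosync or Pacemaker.
# Reads `ps auxfww`-style text on stdin and prints which of the Pacemaker
# daemons from the howto's expected output are absent.
# Assumption: each daemon appears as ".../<name>" at the end of its line.
missing_pcmk_daemons() {
    input=$(cat)
    missing=""
    for d in stonithd cib lrmd attrd pengine crmd; do
        # Look for a line ending in "/<daemon>", e.g. "/usr/lib/heartbeat/cib".
        if ! printf '%s\n' "$input" | grep -q "/$d\$"; then
            missing="$missing $d"
        fi
    done
    if [ -z "$missing" ]; then
        echo "all Pacemaker daemons present"
    else
        echo "missing:$missing"
    fi
}

# Usage on a live node ("ww" keeps ps from truncating the COMMAND column):
#   ps auxfww | missing_pcmk_daemons
```

Fed the process list from the rebooted node above, it reports all six
daemons as missing, because every child of corosync is just another
/usr/sbin/corosync process.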


I also tried compiling Pacemaker from source following these steps [2],
but I get the same result.


Thanks in advance for your reply.

Regards,
Daniel

[1] http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo
[2] http://www.clusterlabs.org/wiki/Install#Building_from_Source
-- 
Fingerprint: BFB3 08D6 B4D1 31B2 72B9  29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux Lenny - Linux user #188.598

