On Mon, Dec 20, 2010 at 12:55 AM, Daniel Bareiro <[email protected]> wrote:
> Hi all!
>
> I hope this is the right group to discuss my problem.
>
> I'm beginning to test HA clusters with GNU/Linux, and for that I decided
> to try Pacemaker + Corosync on Debian Lenny, following this howto [1].
>
> Both packages were installed from the Backports repositories. However,
> if I reboot a node after configuring it, the node fails to rejoin the
> cluster once it has booted.
>
> This is what I see in /var/log/daemon.log:
>
> --------------------------------------------------------------------------
> Dec 19 17:13:13 atlantis corosync[1508]:   [pcmk  ] WARN: route_ais_message: Sending message to local.crmd failed: unknown (rc=-2)
> Dec 19 17:13:13 atlantis corosync[1508]:   [pcmk  ] WARN: route_ais_message: Sending message to local.cib failed: unknown (rc=-2)
> Dec 19 17:13:13 atlantis corosync[1508]:   [pcmk  ] WARN: route_ais_message: Sending message to local.attrd failed: unknown (rc=-2)
> Dec 19 17:13:13 atlantis corosync[1508]:   [pcmk  ] WARN: route_ais_message: Sending message to local.cib failed: unknown (rc=-2)
> Dec 19 17:13:14 atlantis corosync[1508]:   [pcmk  ] WARN: route_ais_message: Sending message to local.cib failed: unknown (rc=-2)
> Dec 19 17:13:14 atlantis corosync[1508]:   [pcmk  ] WARN: route_ais_message: Sending message to local.cib failed: unknown (rc=-2)
> Dec 19 17:13:21 atlantis corosync[1508]:   [TOTEM ] A processor failed, forming new configuration.
> Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 72: memb=1, new=0, lost=1
> Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] info: pcmk_peer_update: memb: atlantis 335544586
> Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] info: pcmk_peer_update: lost: daedalus 369099018
> Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 72: memb=1, new=0, lost=0
> Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] info: pcmk_peer_update: MEMB: atlantis 335544586
> Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] info: ais_mark_unseen_peer_dead: Node daedalus was not seen in the previous transition
> Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] info: update_member: Node 369099018/daedalus is now: lost
> Dec 19 17:13:25 atlantis corosync[1508]:   [pcmk  ] info: send_member_notification: Sending membership update 72 to 0 children
> Dec 19 17:13:25 atlantis corosync[1508]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Dec 19 17:13:25 atlantis corosync[1508]:   [MAIN  ] Completed service synchronization, ready to provide service.
> --------------------------------------------------------------------------
>
>
> # ps auxf
> [...]
> root      1508  0.1  1.9 182624  4880 ?        Ssl  15:52   0:22 /usr/sbin/corosync
> root      1539  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
> root      1540  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
> root      1541  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
> root      1542  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
> root      1543  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
> root      1544  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync

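You're hitting a deadlock between the calls to fork() and exec() when
Pacemaker tries to start its daemons. That would explain why the child
processes in your ps output still show up as /usr/sbin/corosync instead
of cib, crmd and the rest.
This is the reason we created the MCP (the Pacemaker Master Control
Process):

http://theclusterguy.clusterlabs.org/post/907043024/introducing-the-pacemaker-master-control-process-for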
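
If you want to check for this symptom on other nodes, here is a rough
Python sketch (not a Pacemaker or Corosync tool). It scans `ps auxf`
output for children in the process tree whose command is still the
parent's binary. The function name stuck_children and the rule "a child
that still shows /usr/sbin/corosync never reached exec()" are my own
assumptions for illustration; the sample lines are taken from the ps
output in this thread.

```python
import re

# Matches the "\_ <command>" tree marker that `ps auxf` prints for children.
CHILD_MARKER = re.compile(r"\\_\s+(\S+)")


def stuck_children(ps_output, parent_cmd="/usr/sbin/corosync"):
    """Return PIDs of tree children whose command equals parent_cmd.

    Assumption (heuristic only): such a child was forked by the parent
    but has not replaced its process image with exec() yet.
    """
    pids = []
    for line in ps_output.splitlines():
        match = CHILD_MARKER.search(line)
        if match and match.group(1) == parent_cmd:
            # In `ps aux` style output the PID is the second column.
            pids.append(int(line.split()[1]))
    return pids


# Sample lines copied from the ps output quoted in this thread.
SAMPLE = r"""
root      1508  0.1  1.9 182624  4880 ?        Ssl  15:52   0:22 /usr/sbin/corosync
root      1539  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
root      1540  0.0  1.2 168144  3240 ?        S    15:52   0:00  \_ /usr/sbin/corosync
"""

print(stuck_children(SAMPLE))  # prints [1539, 1540]
```

On a healthy node the children are the Pacemaker daemons (cib, crmd and
so on), so the list should come back empty. A non-empty list is only a
hint, since a child can legitimately be caught for a brief moment
between fork() and exec().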

>
>
> From what I see in the howto, the output should be something like this:
>
>
> root     29980  0.0  0.8  44304  3808 ?        Ssl  20:55   0:00 /usr/sbin/corosync
> root     29986  0.0  2.4  10812 10812 ?        SLs  20:55   0:00  \_ /usr/lib/heartbeat/stonithd
> 102      29987  0.0  0.8  13012  3804 ?        S    20:55   0:00  \_ /usr/lib/heartbeat/cib
> root     29988  0.0  0.4   5444  1800 ?        S    20:55   0:00  \_ /usr/lib/heartbeat/lrmd
> 102      29989  0.0  0.5  12364  2368 ?        S    20:55   0:00  \_ /usr/lib/heartbeat/attrd
> 102      29990  0.0  0.5   8604  2304 ?        S    20:55   0:00  \_ /usr/lib/heartbeat/pengine
> 102      29991  0.0  0.6  12648  3080 ?        S    20:55   0:00  \_ /usr/lib/heartbeat/crmd
>
>
>
> I also tried compiling Pacemaker following these steps [2], but I got
> the same result.
>
>
>
> Thanks in advance for your reply.
>
> Regards,
> Daniel
>
> [1] http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo
> [2] http://www.clusterlabs.org/wiki/Install#Building_from_Source
> --
> Daniel Bareiro - GNU/Linux registered user #188.598
> Proudly running Debian GNU/Linux with uptime:
> 20:31:04 up 67 days, 20:57, 10 users,  load average: 0.11, 0.05, 0.01
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
>
> iEYEARECAAYFAk0Om2MACgkQZpa/GxTmHTdoogCeLw6ysNseW4V/K9Mcto8FsqAA
> /bEAn0W0lwyse9qw/hp8gR+ITsOGs9pB
> =v4rt
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
>
