Re: [Pacemaker] Debian Unstable (sid) Problem with Pacemaker/Corosync Apache HA-Load Balanced cluster

Nick Khamis Sat, 01 Oct 2011 05:46:44 -0700

Can you post your CRM configuration (the output of "crm configure show"), please?
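
Something along these lines should capture it. This is only a minimal sketch, assuming the crm shell (crmsh) or at least cibadmin is installed on the node; the function name dump_cluster_config is made up for illustration.

```shell
#!/bin/sh
# Print the live cluster configuration so it can be pasted into a reply.
# Assumption: the crm shell (crmsh) is installed; cibadmin is used as a fallback.
# dump_cluster_config is a hypothetical helper name, not part of Pacemaker.
dump_cluster_config() {
    if command -v crm >/dev/null 2>&1; then
        crm configure show      # human-readable CRM configuration
    elif command -v cibadmin >/dev/null 2>&1; then
        cibadmin --query        # raw CIB XML
    else
        echo "neither crm nor cibadmin found in PATH" >&2
        return 1
    fi
}

# Usage: dump_cluster_config > crm-config.txt
```

The Apache2 primitive and its params are the interesting part, since they are not in your mail.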

Nick.


On Sat, Oct 1, 2011 at 6:32 AM, Miltiadis Koutsokeras
<m.koutsoke...@biovista.com> wrote:
> Hello everyone,
>
> My goal is to build a round-robin balanced, HA Apache web server cluster. The
> main purpose is to balance HTTP requests evenly between the nodes and have one
> machine pick up all requests if and ONLY if the others are not available at the
> moment. The cluster will be accessible only from the internal network. Any
> advice on this will be highly appreciated (resources to use, services to
> install and configure, etc.). After walking through the ClusterLabs
> documentation, I think the proper deployment is an active/active
> Pacemaker-managed cluster.
>
> I'm trying to follow the "Clusters from Scratch" article in order to build a
> 2-node cluster on an experimental setup:
>
> 2 GNU/Linux Debian Unstable (sid) virtual machines (Kernel 3.0.0-1-686-pae,
> Apache/2.2.21 (Debian)) on the same LAN.
>
> node-0 IP: 192.168.0.101
> node-1 IP: 192.168.0.102
> Desired Cluster Virtual IP: 192.168.0.100
>
> The two nodes are set up to communicate with proper SSH keys, and it works
> flawlessly. They can also reach each other by short name:
>
> root@node-0:~# ssh node-1 -- hostname
> node-1
>
> root@node-1:~# ssh node-0 -- hostname
> node-0
>
> My problem is that although I've reached the part where the ClusterIP
> resource is set up properly, the Apache resource does not get started on
> either node. The logs do not contain a message explaining the failure in
> detail, even with debug messages enabled. All related messages report unknown
> errors while trying to start the service, and after a while the cluster
> manager gives up. From the messages it seems the manager is getting unexpected
> exit codes from the Apache resource. The server-status URL is accessible from
> 127.0.0.1 on both nodes.
>
> root@node-0:~# crm_mon -1
> ============
> Last updated: Fri Sep 30 14:04:55 2011
> Stack: openais
> Current DC: node-1 - partition with quorum
> Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
>
> Online: [ node-1 node-0 ]
>
>  ClusterIP    (ocf::heartbeat:IPaddr2):    Started node-1
>
> Failed actions:
>    Apache2_monitor_0 (node=node-0, call=3, rc=1, status=complete): unknown error
>    Apache2_start_0 (node=node-0, call=5, rc=1, status=complete): unknown error
>    Apache2_monitor_0 (node=node-1, call=8, rc=1, status=complete): unknown error
>    Apache2_start_0 (node=node-1, call=10, rc=1, status=complete): unknown error
>
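
For what it's worth, rc=1 is OCF_ERR_GENERIC, the catch-all exit code of an OCF resource agent, which is why crm_mon can only say "unknown error". A probe (the monitor_0 operation) of a stopped resource is expected to return 7 (OCF_NOT_RUNNING), which matches the "target: 7 vs. rc: 1" lines in your crmd log. A small sketch for decoding the codes; the function name ocf_rc_name is made up for illustration:

```shell
#!/bin/sh
# Translate a numeric OCF resource-agent exit code into its symbolic name.
# ocf_rc_name is a hypothetical helper, not something shipped with Pacemaker.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo "unknown ($1)" ;;
    esac
}

# Usage: ocf_rc_name 1
```

So the agent itself is failing in some unspecific way on both nodes, rather than simply reporting that Apache is stopped.
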
> Let's check the logs for this resource:
>
> root@node-0:~# grep ERROR.*Apache2 /var/log/corosync/corosync.log
> (Nothing)
>
> root@node-0:~# grep WARN.*Apache2 /var/log/corosync/corosync.log
> Sep 30 14:04:23 node-0 lrmd: [2555]: WARN: Managed Apache2:monitor process 2802 exited with return code 1.
> Sep 30 14:04:30 node-0 lrmd: [2555]: WARN: Managed Apache2:start process 2942 exited with return code 1.
>
> root@node-1:~# grep ERROR.*Apache2 /var/log/corosync/corosync.log
> Sep 30 14:04:23 node-1 pengine: [1676]: ERROR: native_create_actions: Resource Apache2 (ocf::apache) is active on 2 nodes attempting recovery
>
> root@node-1:~# grep WARN.*Apache2 /var/log/corosync/corosync.log
> Sep 30 14:04:23 node-1 lrmd: [1674]: WARN: Managed Apache2:monitor process 3006 exited with return code 1.
> Sep 30 14:04:23 node-1 crmd: [1677]: WARN: status_from_rc: Action 5 (Apache2_monitor_0) on node-1 failed (target: 7 vs. rc: 1): Error
> Sep 30 14:04:23 node-1 crmd: [1677]: WARN: status_from_rc: Action 7 (Apache2_monitor_0) on node-0 failed (target: 7 vs. rc: 1): Error
> Sep 30 14:04:23 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
> Sep 30 14:04:23 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
> Sep 30 14:04:30 node-1 crmd: [1677]: WARN: status_from_rc: Action 10 (Apache2_start_0) on node-0 failed (target: 0 vs. rc: 1): Error
> Sep 30 14:04:30 node-1 crmd: [1677]: WARN: update_failcount: Updating failcount for Apache2 on node-0 after failed start: rc=1 (update=INFINITY, time=1317380670)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
> Sep 30 14:04:36 node-1 lrmd: [1674]: WARN: Managed Apache2:start process 3146 exited with return code 1.
> Sep 30 14:04:36 node-1 crmd: [1677]: WARN: status_from_rc: Action 9 (Apache2_start_0) on node-1 failed (target: 0 vs. rc: 1): Error
> Sep 30 14:04:36 node-1 crmd: [1677]: WARN: update_failcount: Updating failcount for Apache2 on node-1 after failed start: rc=1 (update=INFINITY, time=1317380676)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
>
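
Since the cluster logs only show the exit code, it may help to run the resource agent by hand and read what it prints. A rough sketch, assuming the usual Debian locations for the agent (/usr/lib/ocf/resource.d/heartbeat/apache) and for the Apache config (/etc/apache2/apache2.conf); I have not seen your configuration, so adjust both. The function name run_agent_action is made up for illustration.

```shell
#!/bin/sh
# Run one action of an OCF resource agent outside the cluster and report its exit code.
# Assumptions: the default OCF_ROOT and configfile values below are typical Debian
# paths, not values taken from the poster's configuration.
# run_agent_action is a hypothetical helper, not part of Pacemaker.
run_agent_action() {
    agent=$1    # e.g. /usr/lib/ocf/resource.d/heartbeat/apache
    action=$2   # e.g. monitor, start, stop
    if [ ! -x "$agent" ]; then
        echo "resource agent not found: $agent" >&2
        return 5    # OCF_ERR_INSTALLED
    fi
    # Resource parameters reach the agent as OCF_RESKEY_<name> environment variables.
    OCF_ROOT=${OCF_ROOT:-/usr/lib/ocf} \
    OCF_RESKEY_configfile=${OCF_RESKEY_configfile:-/etc/apache2/apache2.conf} \
        "$agent" "$action"
    rc=$?
    echo "$action exited with rc=$rc"
    return "$rc"
}

# Usage: run_agent_action /usr/lib/ocf/resource.d/heartbeat/apache monitor
```

Once the agent behaves, the fail count (currently INFINITY on both nodes) has to be cleared before the cluster will try again, e.g. with "crm resource cleanup Apache2".
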
> Any suggestions?
>
> File /etc/corosync/corosync.conf (only the changes are shown here; see the
> attachment for the full file):
>
> # Please read the openais.conf.5 manual page
>
> totem {
>
> ... (Default)
>
>     interface {
>         # The following values need to be set based on your environment
>         ringnumber: 0
>         bindnetaddr: 192.168.0.0
>         mcastaddr: 226.94.1.1
>         mcastport: 5405
>     }
> }
>
> ... (Default)
>
> service {
>     # Load the Pacemaker Cluster Resource Manager
>     ver:       1
>     name:      pacemaker
> }
>
> ... (Default)
>
> logging {
>        fileline: off
>        to_stderr: no
>        to_logfile: yes
>        logfile: /var/log/corosync/corosync.log
>        to_syslog: no
>        syslog_facility: daemon
>        debug: on
>        timestamp: on
>        logger_subsys {
>                subsys: AMF
>                debug: off
>                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
>        }
> }
>
> --
> Koutsokeras Miltiadis M.Sc.
> Software Engineer
> Biovista Inc.
>
> US Offices
> 2421 Ivy Road
> Charlottesville, VA 22903
> USA
> T: +1.434.971.1141
> F: +1.434.971.1144
>
> European Offices
> 34 Rodopoleos Street
> Ellinikon, Athens 16777
> GREECE
> T: +30.210.9629848
> F: +30.210.9647606
>
> www.biovista.com
>
> Biovista is a privately held biotechnology company that finds novel uses for
> existing drugs, and profiles their side effects using their mechanism of
> action. Biovista develops its own pipeline of drugs in CNS, oncology,
> auto-immune and rare diseases. Biovista is collaborating with
> biopharmaceutical companies on indication expansion and de-risking of their
> portfolios and with the FDA on adverse event prediction.
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>
