Can you post your crm configuration (the output of "crm configure show"), please?

Nick.
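A side note for reading the logs quoted below: the rc values are OCF resource agent exit codes. The probe (Apache2_monitor_0) is expected to return 7 (not running) on a node where Apache is stopped, but it returned 1 (generic error), which the cluster reports as "unknown error". Here is a minimal shell sketch of that mapping, assuming the standard OCF return code numbering. The function name ocf_rc_name is hypothetical and not part of Pacemaker or the resource-agents package.

```shell
#!/bin/sh
# Map an OCF resource-agent exit code to its symbolic name.
# ocf_rc_name is a hypothetical helper for illustration only; it is not
# shipped with Pacemaker or resource-agents.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# The probe in the quoted logs reports "target: 7 vs. rc: 1".
echo "expected: $(ocf_rc_name 7), got: $(ocf_rc_name 1)"
```

Both the probe and the start return 1 on both nodes, which suggests the agent is hitting an error of its own rather than simply finding Apache stopped. That is why the resource definition is the first thing to look at.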

On Sat, Oct 1, 2011 at 6:32 AM, Miltiadis Koutsokeras <m.koutsoke...@biovista.com> wrote:

> Hello everyone,
>
> My goal is to build a round-robin balanced, HA Apache web server cluster.
> The main purpose is to balance HTTP requests evenly between the nodes and
> have one machine pick up all requests if and ONLY if the others are not
> available at the moment. The cluster will be accessible only from the
> internal network. Any advice on this will be highly appreciated (resources
> to use, services to install and configure, etc.). After walking through the
> ClusterLabs documentation, I think the proper deployment is an
> active/active Pacemaker-managed cluster.
>
> I'm trying to follow the "Cluster from scratch" article in order to build a
> 2-node cluster on an experimental setup:
>
> 2 GNU/Linux Debian Unstable (sid) virtual machines (kernel 3.0.0-1-686-pae,
> Apache/2.2.21 (Debian)) on the same LAN.
>
> node-0 IP: 192.168.0.101
> node-1 IP: 192.168.0.102
> Desired Cluster Virtual IP: 192.168.0.100
>
> The two nodes are set up to communicate with proper SSH keys and it works
> flawlessly. They can also reach each other by short name:
>
> root@node-0:~# ssh node-1 -- hostname
> node-1
>
> root@node-1:~# ssh node-0 -- hostname
> node-0
>
> My problem is that although I've reached the part where the ClusterIP
> resource is set up properly, the Apache resource does not get started on
> either node. The logs do not have a message explaining the failure in
> detail, even with debug messages enabled. All related messages report
> unknown errors while trying to start the service, and after a while the
> cluster manager gives up. From the messages it seems like the manager is
> getting unexpected exit codes from the Apache resource. The server-status
> URL is accessible from 127.0.0.1 on both nodes.
>
> root@node-0:~# crm_mon -1
> ============
> Last updated: Fri Sep 30 14:04:55 2011
> Stack: openais
> Current DC: node-1 - partition with quorum
> Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
>
> Online: [ node-1 node-0 ]
>
> ClusterIP (ocf::heartbeat:IPaddr2): Started node-1
>
> Failed actions:
>     Apache2_monitor_0 (node=node-0, call=3, rc=1, status=complete): unknown error
>     Apache2_start_0 (node=node-0, call=5, rc=1, status=complete): unknown error
>     Apache2_monitor_0 (node=node-1, call=8, rc=1, status=complete): unknown error
>     Apache2_start_0 (node=node-1, call=10, rc=1, status=complete): unknown error
>
> Let's check the logs for this resource:
>
> root@node-0:~# grep ERROR.*Apache2 /var/log/corosync/corosync.log
> (Nothing)
>
> root@node-0:~# grep WARN.*Apache2 /var/log/corosync/corosync.log
> Sep 30 14:04:23 node-0 lrmd: [2555]: WARN: Managed Apache2:monitor process 2802 exited with return code 1.
> Sep 30 14:04:30 node-0 lrmd: [2555]: WARN: Managed Apache2:start process 2942 exited with return code 1.
>
> root@node-1:~# grep ERROR.*Apache2 /var/log/corosync/corosync.log
> Sep 30 14:04:23 node-1 pengine: [1676]: ERROR: native_create_actions: Resource Apache2 (ocf::apache) is active on 2 nodes attempting recovery
>
> root@node-1:~# grep WARN.*Apache2 /var/log/corosync/corosync.log
> Sep 30 14:04:23 node-1 lrmd: [1674]: WARN: Managed Apache2:monitor process 3006 exited with return code 1.
> Sep 30 14:04:23 node-1 crmd: [1677]: WARN: status_from_rc: Action 5 (Apache2_monitor_0) on node-1 failed (target: 7 vs. rc: 1): Error
> Sep 30 14:04:23 node-1 crmd: [1677]: WARN: status_from_rc: Action 7 (Apache2_monitor_0) on node-0 failed (target: 7 vs. rc: 1): Error
> Sep 30 14:04:23 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
> Sep 30 14:04:23 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
> Sep 30 14:04:30 node-1 crmd: [1677]: WARN: status_from_rc: Action 10 (Apache2_start_0) on node-0 failed (target: 0 vs. rc: 1): Error
> Sep 30 14:04:30 node-1 crmd: [1677]: WARN: update_failcount: Updating failcount for Apache2 on node-0 after failed start: rc=1 (update=INFINITY, time=1317380670)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
> Sep 30 14:04:31 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
> Sep 30 14:04:36 node-1 lrmd: [1674]: WARN: Managed Apache2:start process 3146 exited with return code 1.
> Sep 30 14:04:36 node-1 crmd: [1677]: WARN: status_from_rc: Action 9 (Apache2_start_0) on node-1 failed (target: 0 vs. rc: 1): Error
> Sep 30 14:04:36 node-1 crmd: [1677]: WARN: update_failcount: Updating failcount for Apache2 on node-1 after failed start: rc=1 (update=INFINITY, time=1317380676)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
> Sep 30 14:04:37 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-0: unknown error (1)
> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-0: unknown error (1)
> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
> Sep 30 14:13:38 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_monitor_0 on node-1: unknown error (1)
> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: unpack_rsc_op: Processing failed op Apache2_start_0 on node-1: unknown error (1)
> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-1 after 1000000 failures (max=1000000)
> Sep 30 14:13:52 node-1 pengine: [1676]: WARN: common_apply_stickiness: Forcing Apache2 away from node-0 after 1000000 failures (max=1000000)
>
> Any suggestions?
>
> File /etc/corosync/corosync.conf (only the changes are shown here; see the
> attachment for the full file)
>
> # Please read the openais.conf.5 manual page
>
> totem {
>
>     ... (Default)
>
>     interface {
>         # The following values need to be set based on your environment
>         ringnumber: 0
>         bindnetaddr: 192.168.0.0
>         mcastaddr: 226.94.1.1
>         mcastport: 5405
>     }
> }
>
> ... (Default)
>
> service {
>     # Load the Pacemaker Cluster Resource Manager
>     ver: 1
>     name: pacemaker
> }
>
> ... (Default)
>
> logging {
>     fileline: off
>     to_stderr: no
>     to_logfile: yes
>     logfile: /var/log/corosync/corosync.log
>     to_syslog: no
>     syslog_facility: daemon
>     debug: on
>     timestamp: on
>     logger_subsys {
>         subsys: AMF
>         debug: off
>         tags: enter|leave|trace1|trace2|trace3|trace4|trace6
>     }
> }
>
> --
> Koutsokeras Miltiadis M.Sc.
> Software Engineer
> Biovista Inc.
>
> US Offices
> 2421 Ivy Road
> Charlottesville, VA 22903
> USA
> T: +1.434.971.1141
> F: +1.434.971.1144
>
> European Offices
> 34 Rodopoleos Street
> Ellinikon, Athens 16777
> GREECE
> T: +30.210.9629848
> F: +30.210.9647606
>
> www.biovista.com
>
> Biovista is a privately held biotechnology company that finds novel uses for
> existing drugs, and profiles their side effects using their mechanism of
> action. Biovista develops its own pipeline of drugs in CNS, oncology,
> auto-immune and rare diseases. Biovista is collaborating with
> biopharmaceutical companies on indication expansion and de-risking of their
> portfolios and with the FDA on adverse event prediction.
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker