Re: [Pacemaker] Fencing dependency between bare metal host and its VM guests
Andrei Borzenkov <arvidj...@gmail.com> writes:

[...]

>> Now I have one issue: when the bare metal host on which the VM is
>> running dies, the VM is lost and can not be fenced. Is there a way to
>> make pacemaker ACK the fencing of the VM running on a host when the
>> host is fenced itself?
>
> Yes, you can define multiple stonith agents and priority between them.
>
> http://clusterlabs.org/wiki/Fencing_topology

Hello,

If I understand correctly, fencing topology is the way to have several
fencing devices for a node and try them consecutively until one works.

In my configuration, I group the VM stonith agents with the
corresponding VM resource, to make them move together[1].

Here is my use case:

1. Resource ONE-Frontend-Group runs on nebula1
2. nebula1 is fenced
3. node one-frontend can not be fenced

Is there a way to say that the life of node one-frontend is tied to the
state of resource ONE-Frontend? In which case, when node nebula1 is
fenced, pacemaker should be aware that resource ONE-Frontend is not
running any more, so node one-frontend is OFFLINE and not UNCLEAN.

Regards.

Footnotes:
[1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022671.html

--
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

node $id=1084811078 nebula1
node $id=1084811079 nebula2
node $id=1084811080 nebula3
node $id=108488 quorum \
        attributes standby=on
node $id=108489 one-frontend
primitive ONE-Datastores ocf:heartbeat:Filesystem \
        params device=/dev/one-fs/datastores directory=/var/lib/one/datastores fstype=gfs2 \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=100 \
        op monitor interval=20 timeout=40
primitive ONE-Frontend ocf:heartbeat:VirtualDomain \
        params config=/var/lib/one/datastores/one/one.xml \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=100 \
        utilization cpu=1 hv_memory=1024
primitive ONE-vg ocf:heartbeat:LVM \
        params volgrpname=one-fs \
        op start interval=0 timeout=30 \
        op stop interval=0 timeout=30 \
        op monitor interval=60 timeout=30
primitive Quorum-Node ocf:heartbeat:VirtualDomain \
        params config=/var/lib/libvirt/qemu/pcmk/quorum.xml \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=100 \
        utilization cpu=1 hv_memory=1024
primitive Stonith-ONE-Frontend stonith:external/libvirt \
        params hostlist=one-frontend hypervisor_uri=qemu:///system pcmk_host_list=one-frontend pcmk_host_check=static-list \
        op monitor interval=30m
primitive Stonith-Quorum-Node stonith:external/libvirt \
        params hostlist=quorum hypervisor_uri=qemu:///system pcmk_host_list=quorum pcmk_host_check=static-list \
        op monitor interval=30m
primitive Stonith-nebula1-IPMILAN stonith:external/ipmi \
        params hostname=nebula1-ipmi ipaddr=XXX.XXX.XXX.XXX interface=lanplus userid=USER passwd=PASSWORD1 passwd_method=env priv=operator pcmk_host_list=nebula1 pcmk_host_check=static-list \
        op monitor interval=30m \
        meta target-role=Started
primitive Stonith-nebula2-IPMILAN stonith:external/ipmi \
        params hostname=nebula2-ipmi ipaddr=YYY.YYY.YYY.YYY interface=lanplus userid=USER passwd=PASSWORD2 passwd_method=env priv=operator pcmk_host_list=nebula2 pcmk_host_check=static-list \
        op monitor interval=30m \
        meta target-role=Started
primitive Stonith-nebula3-IPMILAN stonith:external/ipmi \
        params hostname=nebula3-ipmi ipaddr=ZZZ.ZZZ.ZZZ.ZZZ interface=lanplus userid=USER passwd=PASSWORD3 passwd_method=env priv=operator pcmk_host_list=nebula3 pcmk_host_check=static-list \
        op monitor interval=30m \
        meta target-role=Started
primitive clvm ocf:lvm2:clvmd \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=100 \
        op monitor interval=60 timeout=90
primitive dlm ocf:pacemaker:controld \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=100 \
        op monitor interval=60 timeout=60
group ONE-Frontend-Group Stonith-ONE-Frontend ONE-Frontend \
        meta target-role=Started
group ONE-Storage dlm clvm ONE-vg ONE-Datastores
group Quorum-Node-Group Stonith-Quorum-Node Quorum-Node \
        meta target-role=Started
clone ONE-Storage-Clone ONE-Storage \
        meta interleave=true target-role=Started
location Nebula1-does-not-fence-itslef Stonith-nebula1-IPMILAN \
        rule $id=Nebula1-does-not-fence-itslef-rule 50: #uname eq nebula2 \
        rule $id=Nebula1-does-not-fence-itslef-rule-0 40: #uname eq nebula3
location Nebula2-does-not-fence-itslef Stonith-nebula2-IPMILAN \
        rule $id=Nebula2-does-not-fence-itslef-rule 50: #uname eq nebula3 \
        rule $id=Nebula2-does-not-fence-itslef-rule-0 40: #uname eq nebula1
location Nebula3-does-not-fence-itslef Stonith-nebula3-IPMILAN \
        rule
Re: [Pacemaker] Fencing dependency between bare metal host and its VM guests
I think the suggestion was to put shooting the host in the fencing path
of a VM. This way, if you can't get the host to fence the VM (because the
host is already dead), you just check whether the host was fenced.

Daniel Dehennin <daniel.dehen...@baby-gnu.org> wrote:

> Andrei Borzenkov <arvidj...@gmail.com> writes:
>
> [...]
>
>>> Now I have one issue: when the bare metal host on which the VM is
>>> running dies, the VM is lost and can not be fenced. Is there a way to
>>> make pacemaker ACK the fencing of the VM running on a host when the
>>> host is fenced itself?
>>
>> Yes, you can define multiple stonith agents and priority between them.
>>
>> http://clusterlabs.org/wiki/Fencing_topology
>
> Hello,
>
> If I understand correctly, fencing topology is the way to have several
> fencing devices for a node and try them consecutively until one works.
>
> In my configuration, I group the VM stonith agents with the
> corresponding VM resource, to make them move together[1].
>
> Here is my use case:
>
> 1. Resource ONE-Frontend-Group runs on nebula1
> 2. nebula1 is fenced
> 3. node one-frontend can not be fenced
>
> Is there a way to say that the life of node one-frontend is tied to the
> state of resource ONE-Frontend? In which case, when node nebula1 is
> fenced, pacemaker should be aware that resource ONE-Frontend is not
> running any more, so node one-frontend is OFFLINE and not UNCLEAN.
>
> Regards.
>
> Footnotes:
> [1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022671.html
>
> --
> Daniel Dehennin
> Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

--
Sent from K-9 Mail.
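As a rough illustration of that approach with the crm shell, using the
resource and node names from the configuration earlier in this thread and
purely as an untested sketch, a fencing topology for the VM node could
register the libvirt agent as the first level and the host's IPMI agent
as the second:

    # Level 1: fence the VM through libvirt on its host.
    # Level 2: if that fails (host already dead), fall back to the host's IPMI device.
    # Caveat (assumption): the IPMI primitive must also claim the VM node,
    # e.g. by extending its pcmk_host_list, before stonithd will use it for one-frontend.
    crm configure fencing_topology \
        one-frontend: Stonith-ONE-Frontend Stonith-nebula1-IPMILAN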
[Pacemaker] Losing corosync communication clusterwide
Hello,

I just had an issue on my pacemaker setup: my dlm/clvm/gfs2 stack was
blocked. The “dlm_tool ls” command told me “wait ringid”. The corosync-*
commands hang (like corosync-quorumtool). The pacemaker “crm_mon”
displays nothing wrong.

I'm using Ubuntu Trusty Tahr:

- corosync 2.3.3-1ubuntu1
- pacemaker 1.1.10+git20130802-1ubuntu2.1

My cluster was manually rebooted.

Any idea how to debug such a situation?

Regards.

--
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Re: [Pacemaker] How to avoid CRM sending stop when ha.cf gets 2nd node configured
On Sat, Nov 08, 2014 at 12:58:36AM +0000, aridh bose wrote:
> Hi,
>
> While using heartbeat and pacemaker, is it possible to bring up the first
> node as Master, followed by the second node as Slave, without causing any
> issues to the first node? Currently, I see a couple of problems in
> achieving this:
>
> 1. Assuming I am not using mcast communication, heartbeat is mandating me
>    to configure the second node's info either in ha.cf or in the /etc/hosts
>    file with its associated IP address. Why can't the first node come up by
>    itself as Master to start with?
> 2. If I update ha.cf with the 2nd node's info and use 'heartbeat -r', CRM
>    first sends a stop on the Master before sending a start.
>
> Appreciate any help or pointers.

Regardless of what you do there, or why, or on which communication stack:
how about you first put pacemaker into maintenance-mode, then do your
re-architecting of your cluster, and once you are satisfied with the new
cluster, you take it out of maintenance mode again?

At least that is one of the intended use cases for maintenance mode.

--
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
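For reference, the maintenance-mode round trip Lars describes can be as
small as the following crm-shell sketch (only an outline: the cluster
property is standard Pacemaker, the verification step in the middle is up
to you):

    # Resources keep running, but Pacemaker stops monitoring and managing them.
    crm configure property maintenance-mode=true

    # ... rework ha.cf / membership, restart heartbeat, check the new topology ...
    crm_mon -1

    # Hand control back once everything looks right.
    crm configure property maintenance-mode=false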
[Pacemaker] Intermittent Failovers: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Hey Team,

I'm receiving some strange intermittent failovers on a two-node cluster
(happens once every week or two). When this happens, both nodes are
unavailable; one node will be marked offline and the other will be shown
as unclean. Any help on this would be massively appreciated. Thanks.

Running Ubuntu 12.04 (64-bit)
Pacemaker 1.1.6-2ubuntu3.3
Corosync 1.4.2-2ubuntu0.2

Here are the logs:

Nov 08 14:26:26 corosync [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x12bebe0, async-conn=0x12bebe0) left
Nov 08 14:26:26 corosync [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Nov 08 14:26:27 corosync [pcmk ] info: pcmk_ipc_exit: Client attrd (conn=0x12d0230, async-conn=0x12d0230) left
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x12c7d80, async-conn=0x12c7d80) left
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x12c3a20, async-conn=0x12c3a20) left
Nov 08 14:26:32 corosync [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Nov 08 14:26:32 corosync [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12bebe0 for stonith-ng/0
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12c2f40 for attrd/0
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12c72a0 for cib/0
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Sending membership update 12 to cib
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12cb600 for crmd/0
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Sending membership update 12 to crmd

Output of crm configure show:

node p-sbc3 \
        attributes standby=off
node p-sbc4 \
        attributes standby=off
primitive fs lsb:FSSofia \
        op monitor interval=2s enabled=true timeout=10s on-fail=standby \
        meta target-role=Started
primitive fs-ip ocf:heartbeat:IPaddr2 \
        params ip=10.100.0.90 nic=eth0:0 cidr_netmask=24 \
        op monitor interval=10s
primitive fs-ip2 ocf:heartbeat:IPaddr2 \
        params ip=10.100.0.99 nic=eth0:1 cidr_netmask=24 \
        op monitor interval=10s
group cluster_services fs-ip fs-ip2 fs \
        meta target-role=Started
property $id=cib-bootstrap-options \
        dc-version=1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c \
        cluster-infrastructure=openais \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        last-lrm-refresh=1348755080 \
        no-quorum-policy=ignore
rsc_defaults $id=rsc-options \
        resource-stickiness=100
Re: [Pacemaker] Losing corosync communication clusterwide
I think you don't have fencing configured in your cluster.

2014-11-10 17:02 GMT+01:00 Daniel Dehennin <daniel.dehen...@baby-gnu.org>:

> Daniel Dehennin <daniel.dehen...@baby-gnu.org> writes:
>
>> Hello,
>>
>> I just had an issue on my pacemaker setup: my dlm/clvm/gfs2 stack was
>> blocked. The “dlm_tool ls” command told me “wait ringid”.
>
> It happened again:
>
> root@nebula2:~# dlm_tool ls
> dlm lockspaces
> name          datastores
> id            0x1b61ba6a
> flags         0x0004 kern_stop
> change        member 4 joined 1 remove 0 failed 0 seq 3,3
> members       1084811078 1084811079 1084811080 108489
> new change    member 3 joined 0 remove 1 failed 1 seq 4,4
> new status    wait ringid
> new members   1084811078 1084811079 1084811080
>
> name          clvmd
> id            0x4104eefa
> flags         0x0004 kern_stop
> change        member 4 joined 1 remove 0 failed 0 seq 3,3
> members       1084811078 1084811079 1084811080 108489
> new change    member 3 joined 0 remove 1 failed 1 seq 4,4
> new status    wait ringid
> new members   1084811078 1084811079 1084811080
>
> root@nebula2:~# dlm_tool status
> cluster nodeid 1084811079 quorate 1 ring seq 21372 21372
> daemon now 8351 fence_pid 0
> fence 108489 nodedown pid 0 actor 0 fail 1415634527 fence 0 now 1415634734
> node 1084811078 M add 5089 rem 0 fail 0 fence 0 at 0 0
> node 1084811079 M add 5089 rem 0 fail 0 fence 0 at 0 0
> node 1084811080 M add 5089 rem 0 fail 0 fence 0 at 0 0
> node 108489 X add 5766 rem 8144 fail 8144 fence 0 at 0 0
>
> Any idea?
>
> --
> Daniel Dehennin
> Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

--
this is my life and I live it as long as God wills
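If fencing really is the missing piece, a couple of quick checks can
confirm it. A hedged sketch, assuming the crm shell and the standard
Pacemaker tools are on the path:

    crm configure show | grep -E 'stonith-enabled|stonith:'   # property and any stonith primitives
    stonith_admin --list-registered                            # devices stonithd actually has registered
    dlm_tool dump | tail -n 20                                 # what dlm_controld thinks about the pending fence

A dlm lockspace stuck in “wait ringid” with a failed member will not
recover until that node has been successfully fenced (or the fence has
been acknowledged by other means), which matches the output quoted above.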
Re: [Pacemaker] Fencing dependency between bare metal host and its VM guests
On Mon, 10 Nov 2014 10:07:18 +0100, Tomasz Kontusz <tomasz.kont...@gmail.com> wrote:

> I think the suggestion was to put shooting the host in the fencing path
> of a VM. This way, if you can't get the host to fence the VM (because
> the host is already dead), you just check whether the host was fenced.

Exactly. One thing I do not know is how it will behave in the case of
multiple VMs on the same host, i.e. whether pacemaker will try to fence
the host for every VM, or will recognize that all VMs are dead after the
first time the agent is invoked.

> Daniel Dehennin <daniel.dehen...@baby-gnu.org> wrote:
>
>> Andrei Borzenkov <arvidj...@gmail.com> writes:
>>
>> [...]
>>
>>>> Now I have one issue: when the bare metal host on which the VM is
>>>> running dies, the VM is lost and can not be fenced. Is there a way
>>>> to make pacemaker ACK the fencing of the VM running on a host when
>>>> the host is fenced itself?
>>>
>>> Yes, you can define multiple stonith agents and priority between them.
>>>
>>> http://clusterlabs.org/wiki/Fencing_topology
>>
>> Hello,
>>
>> If I understand correctly, fencing topology is the way to have several
>> fencing devices for a node and try them consecutively until one works.
>>
>> In my configuration, I group the VM stonith agents with the
>> corresponding VM resource, to make them move together[1].
>>
>> Here is my use case:
>>
>> 1. Resource ONE-Frontend-Group runs on nebula1
>> 2. nebula1 is fenced
>> 3. node one-frontend can not be fenced
>>
>> Is there a way to say that the life of node one-frontend is tied to the
>> state of resource ONE-Frontend? In which case, when node nebula1 is
>> fenced, pacemaker should be aware that resource ONE-Frontend is not
>> running any more, so node one-frontend is OFFLINE and not UNCLEAN.
>>
>> Regards.
>>
>> Footnotes:
>> [1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022671.html
>>
>> --
>> Daniel Dehennin
>> Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
>> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Re: [Pacemaker] Losing corosync communication clusterwide
Hanging corosync sounds like libqb problems: trusty comes with 0.16,
which likes to hang from time to time. Try building libqb 0.17.

Daniel Dehennin <daniel.dehen...@baby-gnu.org> wrote:

> Hello,
>
> I just had an issue on my pacemaker setup: my dlm/clvm/gfs2 stack was
> blocked. The “dlm_tool ls” command told me “wait ringid”. The corosync-*
> commands hang (like corosync-quorumtool). The pacemaker “crm_mon”
> displays nothing wrong.
>
> I'm using Ubuntu Trusty Tahr:
>
> - corosync 2.3.3-1ubuntu1
> - pacemaker 1.1.10+git20130802-1ubuntu2.1
>
> My cluster was manually rebooted.
>
> Any idea how to debug such a situation?
>
> Regards.
>
> --
> Daniel Dehennin
> Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

--
Sent from K-9 Mail.
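If you go that route, a rough outline of checking and replacing the
library (the package name, repository URL and tag are assumptions, so
verify them against your distribution and the libqb releases):

    dpkg -l libqb0                          # Trusty ships libqb 0.16.x
    git clone https://github.com/ClusterLabs/libqb.git
    cd libqb && git checkout v0.17.0
    ./autogen.sh && ./configure --prefix=/usr && make && sudo make install
    sudo ldconfig
    # corosync and pacemaker must be restarted to pick up the new library.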
Re: [Pacemaker] Losing corosync communication clusterwide
emmanuel segura <emi2f...@gmail.com> writes:

> I think you don't have fencing configured in your cluster.

I have fencing configured and working, modulo fencing VMs on a dead
host[1].

Regards.

Footnotes:
[1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022965.html

--
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Re: [Pacemaker] Losing corosync communication clusterwide
Tomasz Kontusz <tomasz.kont...@gmail.com> writes:

> Hanging corosync sounds like libqb problems: trusty comes with 0.16,
> which likes to hang from time to time. Try building libqb 0.17.

Thanks, I'll look at this.

Is there a way to get back to a normal state without rebooting all the
machines and interrupting services?

I thought about a lightweight version of something like:

1. stop pacemaker on all nodes without doing anything to the resources,
   so they all continue to run
2. stop corosync on all nodes
3. start corosync on all nodes
4. start pacemaker on all nodes; as the services are already running,
   nothing needs to be done

I looked in the documentation but failed to find any kind of cluster
management best practices.

Regards.

--
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
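Pacemaker does support roughly this: with maintenance-mode set, the
cluster stack can be stopped and started without touching the resources
it manages. A hedged sketch of the sequence above, assuming the Ubuntu
init scripts and the bare metal node names from this thread, and that the
fencing implications of a temporarily blind cluster have been considered:

    # 1. Tell Pacemaker to leave resources alone (stored in the CIB, so it survives the restart).
    crm configure property maintenance-mode=true

    # 2. Stop the stack everywhere; managed services keep running.
    for n in nebula1 nebula2 nebula3; do
        ssh "$n" 'service pacemaker stop && service corosync stop'
    done

    # 3. Start it again everywhere.
    for n in nebula1 nebula2 nebula3; do
        ssh "$n" 'service corosync start && service pacemaker start'
    done

    # 4. Once every node has rejoined and resource state looks sane, resume management.
    crm_mon -1
    crm configure property maintenance-mode=false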
Re: [Pacemaker] Losing corosync communication clusterwide
On 11 Nov 2014, at 4:39 am, Daniel Dehennin <daniel.dehen...@baby-gnu.org> wrote:

> emmanuel segura <emi2f...@gmail.com> writes:
>
>> I think you don't have fencing configured in your cluster.
>
> I have fencing configured and working, modulo fencing VMs on a dead
> host[1].

Are you saying that the host and the VMs running inside it are both part
of the same cluster?

> Regards.
>
> Footnotes:
> [1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022965.html
>
> --
> Daniel Dehennin
> Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Re: [Pacemaker] DRBD with Pacemaker on CentOS 6.5
Hi,

DocumentRoot is still set to /var/www/html.

ls -al /var/www/html shows different things on the 2 nodes.

node01:

total 28
drwxr-xr-x. 3 root root  4096 Nov 11 12:25 .
drwxr-xr-x. 6 root root  4096 Jul 23 22:18 ..
-rw-r--r--. 1 root root    50 Oct 28 18:00 index.html
drwx------. 2 root root 16384 Oct 28 17:59 lost+found

node02 only has index.html, no lost+found, and it's a different version
of the file.

Status URL is enabled in both nodes.

On Oct 30, 2014 11:14 AM, Andrew Beekhof <and...@beekhof.net> wrote:

> On 29 Oct 2014, at 1:01 pm, Sihan Goi <gois...@gmail.com> wrote:
>
>> Hi, I've never used crm_report before. I just read the man file and
>> generated a tarball from 1-2 hours before I reconfigured all the DRBD
>> related resources. I've put the tarball here -
>> https://www.dropbox.com/s/suj9pttjp403msv/unexplained-apache-failure.tar.bz2?dl=0
>>
>> Hope you can help figure out what I'm doing wrong. Thanks for the help!
>
> Oct 28 18:13:38 node02 Filesystem(WebFS)[29940]: INFO: Running start for /dev/drbd/by-res/wwwdata on /var/www/html
> Oct 28 18:13:39 node02 kernel: EXT4-fs (drbd1): mounted filesystem with ordered data mode. Opts:
> Oct 28 18:13:39 node02 crmd[9870]: notice: process_lrm_event: LRM operation WebFS_start_0 (call=164, rc=0, cib-update=298, confirmed=true) ok
> Oct 28 18:13:39 node02 crmd[9870]: notice: te_rsc_command: Initiating action 7: start WebSite_start_0 on node02 (local)
> Oct 28 18:13:39 node02 apache(WebSite)[30007]: ERROR: Syntax error on line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory
>
> Is DocumentRoot still set to /var/www/html?
> If so, what happens if you run 'ls -al /var/www/html' in a shell?
>
> Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: apache not running
> Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
>
> Did you enable the status url?
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_enable_the_apache_status_url.html
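The two listings hint that only node01 currently has the DRBD-backed
filesystem mounted (a freshly created ext4 filesystem carries a
lost+found directory), while node02 is showing the underlying mount
point. A few commands that would confirm this on each node; a hedged
sketch, with the resource and device names taken from the quoted logs:

    mount | grep /var/www/html      # is anything actually mounted there?
    cat /proc/drbd                  # DRBD connection state and Primary/Secondary roles
    drbdadm role wwwdata            # role of the wwwdata resource on this node
    crm_mon -1                      # where Pacemaker currently runs WebFS and WebSite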