Re: [Pacemaker] Fencing dependency between bare metal host and its VM guests
Andrei Borzenkov <arvidj...@gmail.com> writes:

[...]

>> Now I have one issue: when the bare metal host on which the VM is
>> running dies, the VM is lost and can not be fenced. Is there a way to
>> make pacemaker ACK the fencing of the VM running on a host when the
>> host is fenced itself?
>
> Yes, you can define multiple stonith agents and priority between them.
>
> http://clusterlabs.org/wiki/Fencing_topology

Hello,

If I understand correctly, fencing topology is the way to have several
fencing devices for a node and try them consecutively until one works.

In my configuration, I group the VM stonith agents with the
corresponding VM resource, to make them move together[1].

Here is my use case:

1. Resource ONE-Frontend-Group runs on nebula1
2. nebula1 is fenced
3. node one-frontend can not be fenced

Is there a way to say that the life of node one-frontend is tied to the
state of resource ONE-Frontend? In which case, when node nebula1 is
fenced, pacemaker should be aware that resource ONE-Frontend is not
running any more, so node one-frontend is OFFLINE and not UNCLEAN.

Regards.

Footnotes:
[1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022671.html

--
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

node $id=1084811078 nebula1
node $id=1084811079 nebula2
node $id=1084811080 nebula3
node $id=108488 quorum \
        attributes standby=on
node $id=108489 one-frontend
primitive ONE-Datastores ocf:heartbeat:Filesystem \
        params device=/dev/one-fs/datastores directory=/var/lib/one/datastores fstype=gfs2 \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=100 \
        op monitor interval=20 timeout=40
primitive ONE-Frontend ocf:heartbeat:VirtualDomain \
        params config=/var/lib/one/datastores/one/one.xml \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=100 \
        utilization cpu=1 hv_memory=1024
primitive ONE-vg ocf:heartbeat:LVM \
        params volgrpname=one-fs \
        op start interval=0 timeout=30 \
        op stop interval=0 timeout=30 \
        op monitor interval=60 timeout=30
primitive Quorum-Node ocf:heartbeat:VirtualDomain \
        params config=/var/lib/libvirt/qemu/pcmk/quorum.xml \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=100 \
        utilization cpu=1 hv_memory=1024
primitive Stonith-ONE-Frontend stonith:external/libvirt \
        params hostlist=one-frontend hypervisor_uri=qemu:///system pcmk_host_list=one-frontend pcmk_host_check=static-list \
        op monitor interval=30m
primitive Stonith-Quorum-Node stonith:external/libvirt \
        params hostlist=quorum hypervisor_uri=qemu:///system pcmk_host_list=quorum pcmk_host_check=static-list \
        op monitor interval=30m
primitive Stonith-nebula1-IPMILAN stonith:external/ipmi \
        params hostname=nebula1-ipmi ipaddr=XXX.XXX.XXX.XXX interface=lanplus userid=USER passwd=PASSWORD1 passwd_method=env priv=operator pcmk_host_list=nebula1 pcmk_host_check=static-list \
        op monitor interval=30m \
        meta target-role=Started
primitive Stonith-nebula2-IPMILAN stonith:external/ipmi \
        params hostname=nebula2-ipmi ipaddr=YYY.YYY.YYY.YYY interface=lanplus userid=USER passwd=PASSWORD2 passwd_method=env priv=operator pcmk_host_list=nebula2 pcmk_host_check=static-list \
        op monitor interval=30m \
        meta target-role=Started
primitive Stonith-nebula3-IPMILAN stonith:external/ipmi \
        params hostname=nebula3-ipmi ipaddr=ZZZ.ZZZ.ZZZ.ZZZ interface=lanplus userid=USER passwd=PASSWORD3 passwd_method=env priv=operator pcmk_host_list=nebula3 pcmk_host_check=static-list \
        op monitor interval=30m \
        meta target-role=Started
primitive clvm ocf:lvm2:clvmd \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=100 \
        op monitor interval=60 timeout=90
primitive dlm ocf:pacemaker:controld \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=100 \
        op monitor interval=60 timeout=60
group ONE-Frontend-Group Stonith-ONE-Frontend ONE-Frontend \
        meta target-role=Started
group ONE-Storage dlm clvm ONE-vg ONE-Datastores
group Quorum-Node-Group Stonith-Quorum-Node Quorum-Node \
        meta target-role=Started
clone ONE-Storage-Clone ONE-Storage \
        meta interleave=true target-role=Started
location Nebula1-does-not-fence-itslef Stonith-nebula1-IPMILAN \
        rule $id=Nebula1-does-not-fence-itslef-rule 50: #uname eq nebula2 \
        rule $id=Nebula1-does-not-fence-itslef-rule-0 40: #uname eq nebula3
location Nebula2-does-not-fence-itslef Stonith-nebula2-IPMILAN \
        rule $id=Nebula2-does-not-fence-itslef-rule 50: #uname eq nebula3 \
        rule $id=Nebula2-does-not-fence-itslef-rule-0 40: #uname eq nebula1
location Nebula3-does-not-fence-itslef Stonith-nebula3-IPMILAN \
        rule
Re: [Pacemaker] Fencing dependency between bare metal host and its VM guests
I think the suggestion was to put shooting the host in the fencing path
of a VM. This way, if you can't get the host to fence the VM (because the
host is already dead), you just check whether the host was fenced.

Daniel Dehennin <daniel.dehen...@baby-gnu.org> wrote:

> Andrei Borzenkov <arvidj...@gmail.com> writes:
>
> [...]
>
>>> Now I have one issue: when the bare metal host on which the VM is
>>> running dies, the VM is lost and can not be fenced. Is there a way to
>>> make pacemaker ACK the fencing of the VM running on a host when the
>>> host is fenced itself?
>>
>> Yes, you can define multiple stonith agents and priority between them.
>>
>> http://clusterlabs.org/wiki/Fencing_topology
>
> Hello,
>
> If I understand correctly, fencing topology is the way to have several
> fencing devices for a node and try them consecutively until one works.
>
> In my configuration, I group the VM stonith agents with the
> corresponding VM resource, to make them move together[1].
>
> Here is my use case:
>
> 1. Resource ONE-Frontend-Group runs on nebula1
> 2. nebula1 is fenced
> 3. node one-frontend can not be fenced
>
> Is there a way to say that the life of node one-frontend is tied to the
> state of resource ONE-Frontend? In which case, when node nebula1 is
> fenced, pacemaker should be aware that resource ONE-Frontend is not
> running any more, so node one-frontend is OFFLINE and not UNCLEAN.
>
> Regards.
>
> Footnotes:
> [1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022671.html
>
> --
> Daniel Dehennin
> Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

--
Sent from K-9 Mail.
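As a rough illustration of that approach with the crm shell, using the
resource and node names from the configuration earlier in this thread and
purely as an untested sketch, a fencing topology for the VM node could
register the libvirt agent as the first level and the host's IPMI agent
as the second:

    # Level 1: fence the VM through libvirt on its host.
    # Level 2: if that fails (host already dead), fall back to the host's IPMI device.
    # Caveat (assumption): the IPMI primitive must also claim the VM node,
    # e.g. by extending its pcmk_host_list, before stonithd will use it for one-frontend.
    crm configure fencing_topology \
        one-frontend: Stonith-ONE-Frontend Stonith-nebula1-IPMILAN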
[Pacemaker] Losing corosync communication clusterwide
Hello,

I just had an issue on my pacemaker setup: my dlm/clvm/gfs2 stack was
blocked. The “dlm_tool ls” command told me “wait ringid”. The corosync-*
commands hang (like corosync-quorumtool). The pacemaker “crm_mon”
displays nothing wrong.

I'm using Ubuntu Trusty Tahr:

- corosync 2.3.3-1ubuntu1
- pacemaker 1.1.10+git20130802-1ubuntu2.1

My cluster was manually rebooted.

Any idea how to debug such a situation?

Regards.

--
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Re: [Pacemaker] How to avoid CRM sending stop when ha.cf gets 2nd node configured
On Sat, Nov 08, 2014 at 12:58:36AM +0000, aridh bose wrote:
> Hi,
>
> While using heartbeat and pacemaker, is it possible to bring up the first
> node as Master, followed by the second node as Slave, without causing any
> issues to the first node? Currently, I see a couple of problems in
> achieving this:
>
> 1. Assuming I am not using mcast communication, heartbeat is mandating me
>    to configure the second node's info either in ha.cf or in the /etc/hosts
>    file with its associated IP address. Why can't the first node come up by
>    itself as Master to start with?
> 2. If I update ha.cf with the 2nd node's info and use 'heartbeat -r', CRM
>    first sends a stop on the Master before sending a start.
>
> Appreciate any help or pointers.

Regardless of what you do there, or why, or on which communication stack:
how about you first put pacemaker into maintenance-mode, then do your
re-architecting of your cluster, and once you are satisfied with the new
cluster, you take it out of maintenance mode again?

At least that is one of the intended use cases for maintenance mode.

--
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
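For reference, the maintenance-mode round trip Lars describes can be as
small as the following crm-shell sketch (only an outline: the cluster
property is standard Pacemaker, the verification step in the middle is up
to you):

    # Resources keep running, but Pacemaker stops monitoring and managing them.
    crm configure property maintenance-mode=true

    # ... rework ha.cf / membership, restart heartbeat, check the new topology ...
    crm_mon -1

    # Hand control back once everything looks right.
    crm configure property maintenance-mode=false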
[Pacemaker] Intermittent Failovers: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Hey Team,

I'm receiving some strange intermittent failovers on a two-node cluster
(happens once every week or two). When this happens, both nodes are
unavailable; one node will be marked offline and the other will be shown
as unclean. Any help on this would be massively appreciated. Thanks.

Running Ubuntu 12.04 (64-bit)
Pacemaker 1.1.6-2ubuntu3.3
Corosync 1.4.2-2ubuntu0.2

Here are the logs:

Nov 08 14:26:26 corosync [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x12bebe0, async-conn=0x12bebe0) left
Nov 08 14:26:26 corosync [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Nov 08 14:26:27 corosync [pcmk ] info: pcmk_ipc_exit: Client attrd (conn=0x12d0230, async-conn=0x12d0230) left
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x12c7d80, async-conn=0x12c7d80) left
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x12c3a20, async-conn=0x12c3a20) left
Nov 08 14:26:32 corosync [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Nov 08 14:26:32 corosync [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12bebe0 for stonith-ng/0
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12c2f40 for attrd/0
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12c72a0 for cib/0
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Sending membership update 12 to cib
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12cb600 for crmd/0
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Sending membership update 12 to crmd

Output of crm configure show:

node p-sbc3 \
        attributes standby=off
node p-sbc4 \
        attributes standby=off
primitive fs lsb:FSSofia \
        op monitor interval=2s enabled=true timeout=10s on-fail=standby \
        meta target-role=Started
primitive fs-ip ocf:heartbeat:IPaddr2 \
        params ip=10.100.0.90 nic=eth0:0 cidr_netmask=24 \
        op monitor interval=10s
primitive fs-ip2 ocf:heartbeat:IPaddr2 \
        params ip=10.100.0.99 nic=eth0:1 cidr_netmask=24 \
        op monitor interval=10s
group cluster_services fs-ip fs-ip2 fs \
        meta target-role=Started
property $id=cib-bootstrap-options \
        dc-version=1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c \
        cluster-infrastructure=openais \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        last-lrm-refresh=1348755080 \
        no-quorum-policy=ignore
rsc_defaults $id=rsc-options \
        resource-stickiness=100
Re: [Pacemaker] Losing corosync communication clusterwide
I think you don't have fencing configured in your cluster.

2014-11-10 17:02 GMT+01:00 Daniel Dehennin <daniel.dehen...@baby-gnu.org>:

> Daniel Dehennin <daniel.dehen...@baby-gnu.org> writes:
>
>> Hello,
>>
>> I just had an issue on my pacemaker setup: my dlm/clvm/gfs2 stack was
>> blocked. The “dlm_tool ls” command told me “wait ringid”.
>
> It happened again:
>
> root@nebula2:~# dlm_tool ls
> dlm lockspaces
> name          datastores
> id            0x1b61ba6a
> flags         0x0004 kern_stop
> change        member 4 joined 1 remove 0 failed 0 seq 3,3
> members       1084811078 1084811079 1084811080 108489
> new change    member 3 joined 0 remove 1 failed 1 seq 4,4
> new status    wait ringid
> new members   1084811078 1084811079 1084811080
>
> name          clvmd
> id            0x4104eefa
> flags         0x0004 kern_stop
> change        member 4 joined 1 remove 0 failed 0 seq 3,3
> members       1084811078 1084811079 1084811080 108489
> new change    member 3 joined 0 remove 1 failed 1 seq 4,4
> new status    wait ringid
> new members   1084811078 1084811079 1084811080
>
> root@nebula2:~# dlm_tool status
> cluster nodeid 1084811079 quorate 1 ring seq 21372 21372
> daemon now 8351 fence_pid 0
> fence 108489 nodedown pid 0 actor 0 fail 1415634527 fence 0 now 1415634734
> node 1084811078 M add 5089 rem 0 fail 0 fence 0 at 0 0
> node 1084811079 M add 5089 rem 0 fail 0 fence 0 at 0 0
> node 1084811080 M add 5089 rem 0 fail 0 fence 0 at 0 0
> node 108489 X add 5766 rem 8144 fail 8144 fence 0 at 0 0
>
> Any idea?
>
> --
> Daniel Dehennin
> Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

--
this is my life and I live it as long as God wills
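If fencing really is the missing piece, a couple of quick checks can
confirm it. A hedged sketch, assuming the crm shell and the standard
Pacemaker tools are on the path:

    crm configure show | grep -E 'stonith-enabled|stonith:'   # property and any stonith primitives
    stonith_admin --list-registered                            # devices stonithd actually has registered
    dlm_tool dump | tail -n 20                                 # what dlm_controld thinks about the pending fence

A dlm lockspace stuck in “wait ringid” with a failed member will not
recover until that node has been successfully fenced (or the fence has
been acknowledged by other means), which matches the output quoted above.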
Re: [Pacemaker] Fencing dependency between bare metal host and its VM guests
On Mon, 10 Nov 2014 10:07:18 +0100, Tomasz Kontusz <tomasz.kont...@gmail.com> wrote:

> I think the suggestion was to put shooting the host in the fencing path
> of a VM. This way, if you can't get the host to fence the VM (because
> the host is already dead), you just check whether the host was fenced.

Exactly. One thing I do not know is how it will behave in the case of
multiple VMs on the same host, i.e. whether pacemaker will try to fence
the host for every VM, or will recognize that all VMs are dead after the
first time the agent is invoked.

> Daniel Dehennin <daniel.dehen...@baby-gnu.org> wrote:
>
>> Andrei Borzenkov <arvidj...@gmail.com> writes:
>>
>> [...]
>>
>>>> Now I have one issue: when the bare metal host on which the VM is
>>>> running dies, the VM is lost and can not be fenced. Is there a way
>>>> to make pacemaker ACK the fencing of the VM running on a host when
>>>> the host is fenced itself?
>>>
>>> Yes, you can define multiple stonith agents and priority between them.
>>>
>>> http://clusterlabs.org/wiki/Fencing_topology
>>
>> Hello,
>>
>> If I understand correctly, fencing topology is the way to have several
>> fencing devices for a node and try them consecutively until one works.
>>
>> In my configuration, I group the VM stonith agents with the
>> corresponding VM resource, to make them move together[1].
>>
>> Here is my use case:
>>
>> 1. Resource ONE-Frontend-Group runs on nebula1
>> 2. nebula1 is fenced
>> 3. node one-frontend can not be fenced
>>
>> Is there a way to say that the life of node one-frontend is tied to the
>> state of resource ONE-Frontend? In which case, when node nebula1 is
>> fenced, pacemaker should be aware that resource ONE-Frontend is not
>> running any more, so node one-frontend is OFFLINE and not UNCLEAN.
>>
>> Regards.
>>
>> Footnotes:
>> [1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022671.html
>>
>> --
>> Daniel Dehennin
>> Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
>> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Re: [Pacemaker] Losing corosync communication clusterwide
Hanging corosync sounds like libqb problems: trusty comes with 0.16,
which likes to hang from time to time. Try building libqb 0.17.

Daniel Dehennin <daniel.dehen...@baby-gnu.org> wrote:

> Hello,
>
> I just had an issue on my pacemaker setup: my dlm/clvm/gfs2 stack was
> blocked. The “dlm_tool ls” command told me “wait ringid”. The corosync-*
> commands hang (like corosync-quorumtool). The pacemaker “crm_mon”
> displays nothing wrong.
>
> I'm using Ubuntu Trusty Tahr:
>
> - corosync 2.3.3-1ubuntu1
> - pacemaker 1.1.10+git20130802-1ubuntu2.1
>
> My cluster was manually rebooted.
>
> Any idea how to debug such a situation?
>
> Regards.
>
> --
> Daniel Dehennin
> Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

--
Sent from K-9 Mail.
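If you go that route, a rough outline of checking and replacing the
library (the package name, repository URL and tag are assumptions, so
verify them against your distribution and the libqb releases):

    dpkg -l libqb0                          # Trusty ships libqb 0.16.x
    git clone https://github.com/ClusterLabs/libqb.git
    cd libqb && git checkout v0.17.0
    ./autogen.sh && ./configure --prefix=/usr && make && sudo make install
    sudo ldconfig
    # corosync and pacemaker must be restarted to pick up the new library.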
Re: [Pacemaker] Losing corosync communication clusterwide
emmanuel segura <emi2f...@gmail.com> writes:

> I think you don't have fencing configured in your cluster.

I have fencing configured and working, modulo fencing VMs on a dead
host[1].

Regards.

Footnotes:
[1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022965.html

--
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Re: [Pacemaker] Losing corosync communication clusterwide
Tomasz Kontusz <tomasz.kont...@gmail.com> writes:

> Hanging corosync sounds like libqb problems: trusty comes with 0.16,
> which likes to hang from time to time. Try building libqb 0.17.

Thanks, I'll look at this.

Is there a way to get back to a normal state without rebooting all the
machines and interrupting services?

I thought about a lightweight version of something like:

1. stop pacemaker on all nodes without doing anything to the resources,
   so they all continue to run
2. stop corosync on all nodes
3. start corosync on all nodes
4. start pacemaker on all nodes; as the services are already running,
   nothing needs to be done

I looked in the documentation but failed to find any kind of cluster
management best practices.

Regards.

--
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
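Pacemaker does support roughly this: with maintenance-mode set, the
cluster stack can be stopped and started without touching the resources
it manages. A hedged sketch of the sequence above, assuming the Ubuntu
init scripts and the bare metal node names from this thread, and that the
fencing implications of a temporarily blind cluster have been considered:

    # 1. Tell Pacemaker to leave resources alone (stored in the CIB, so it survives the restart).
    crm configure property maintenance-mode=true

    # 2. Stop the stack everywhere; managed services keep running.
    for n in nebula1 nebula2 nebula3; do
        ssh "$n" 'service pacemaker stop && service corosync stop'
    done

    # 3. Start it again everywhere.
    for n in nebula1 nebula2 nebula3; do
        ssh "$n" 'service corosync start && service pacemaker start'
    done

    # 4. Once every node has rejoined and resource state looks sane, resume management.
    crm_mon -1
    crm configure property maintenance-mode=false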
Re: [Pacemaker] Losing corosync communication clusterwide
On 11 Nov 2014, at 4:39 am, Daniel Dehennin <daniel.dehen...@baby-gnu.org> wrote:

> emmanuel segura <emi2f...@gmail.com> writes:
>
>> I think you don't have fencing configured in your cluster.
>
> I have fencing configured and working, modulo fencing VMs on a dead
> host[1].

Are you saying that the host and the VMs running inside it are both part
of the same cluster?

> Regards.
>
> Footnotes:
> [1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022965.html
>
> --
> Daniel Dehennin
> Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Re: [Pacemaker] DRBD with Pacemaker on CentOS 6.5
Hi,

DocumentRoot is still set to /var/www/html.

ls -al /var/www/html shows different things on the 2 nodes.

node01:

total 28
drwxr-xr-x. 3 root root  4096 Nov 11 12:25 .
drwxr-xr-x. 6 root root  4096 Jul 23 22:18 ..
-rw-r--r--. 1 root root    50 Oct 28 18:00 index.html
drwx------. 2 root root 16384 Oct 28 17:59 lost+found

node02 only has index.html, no lost+found, and it's a different version
of the file.

Status URL is enabled in both nodes.

On Oct 30, 2014 11:14 AM, Andrew Beekhof <and...@beekhof.net> wrote:

> On 29 Oct 2014, at 1:01 pm, Sihan Goi <gois...@gmail.com> wrote:
>
>> Hi, I've never used crm_report before. I just read the man file and
>> generated a tarball from 1-2 hours before I reconfigured all the DRBD
>> related resources. I've put the tarball here -
>> https://www.dropbox.com/s/suj9pttjp403msv/unexplained-apache-failure.tar.bz2?dl=0
>>
>> Hope you can help figure out what I'm doing wrong. Thanks for the help!
>
> Oct 28 18:13:38 node02 Filesystem(WebFS)[29940]: INFO: Running start for /dev/drbd/by-res/wwwdata on /var/www/html
> Oct 28 18:13:39 node02 kernel: EXT4-fs (drbd1): mounted filesystem with ordered data mode. Opts:
> Oct 28 18:13:39 node02 crmd[9870]: notice: process_lrm_event: LRM operation WebFS_start_0 (call=164, rc=0, cib-update=298, confirmed=true) ok
> Oct 28 18:13:39 node02 crmd[9870]: notice: te_rsc_command: Initiating action 7: start WebSite_start_0 on node02 (local)
> Oct 28 18:13:39 node02 apache(WebSite)[30007]: ERROR: Syntax error on line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory
>
> Is DocumentRoot still set to /var/www/html?
> If so, what happens if you run 'ls -al /var/www/html' in a shell?
>
> Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: apache not running
> Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
>
> Did you enable the status url?
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_enable_the_apache_status_url.html
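The two listings hint that only node01 currently has the DRBD-backed
filesystem mounted (a freshly created ext4 filesystem carries a
lost+found directory), while node02 is showing the underlying mount
point. A few commands that would confirm this on each node; a hedged
sketch, with the resource and device names taken from the quoted logs:

    mount | grep /var/www/html      # is anything actually mounted there?
    cat /proc/drbd                  # DRBD connection state and Primary/Secondary roles
    drbdadm role wwwdata            # role of the wwwdata resource on this node
    crm_mon -1                      # where Pacemaker currently runs WebFS and WebSite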