Re: [Pacemaker] Losing corosync communication clusterwide

2014-11-11 Andrew Beekhof

 On 11 Nov 2014, at 10:12 pm, Daniel Dehennin daniel.dehen...@baby-gnu.org 
 wrote:
 
 Andrew Beekhof and...@beekhof.net writes:
 
 
 [...]
 
 I have fencing configured and working, modulo fencing VMs on dead host[1].
 
 Are you saying that the host and the VMs running inside it are both part of 
 the same cluster?
 
 Yes, one of the VMs needs to access the GFS2 filesystem like the nodes;
 the other VM is a quorum node (standby=on).

That sounds like a recipe for disaster, to be honest.
If you want VMs to be part of a cluster, it would be advisable to have their
host(s) in a different one.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Losing corosync communication clusterwide

2014-11-10 Daniel Dehennin
Hello,

I just had an issue with my Pacemaker setup: my dlm/clvm/gfs2 stack was
blocked.

The “dlm_tool ls” command reported “wait ringid”.

The corosync-* commands hung (e.g. corosync-quorumtool).

Pacemaker's “crm_mon” displayed nothing wrong.

I'm using Ubuntu Trusty Tahr:

- corosync 2.3.3-1ubuntu1
- pacemaker 1.1.10+git20130802-1ubuntu2.1

My cluster was manually rebooted.

Any idea how to debug such a situation?

Regards.
-- 
Daniel Dehennin
Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Losing corosync communication clusterwide

2014-11-10 emmanuel segura
I think you don't have fencing configured in your cluster.

2014-11-10 17:02 GMT+01:00 Daniel Dehennin daniel.dehen...@baby-gnu.org:
 Daniel Dehennin daniel.dehen...@baby-gnu.org writes:

 Hello,

 I just have an issue on my pacemaker setup, my dlm/clvm/gfs2 was
 blocked.

 The “dlm_tool ls” command told me “wait ringid”.

 It happened again:

 root@nebula2:~# dlm_tool ls
 dlm lockspaces
 name          datastores
 id            0x1b61ba6a
 flags         0x0004 kern_stop
 change        member 4 joined 1 remove 0 failed 0 seq 3,3
 members       1084811078 1084811079 1084811080 108489
 new change    member 3 joined 0 remove 1 failed 1 seq 4,4
 new status    wait ringid
 new members   1084811078 1084811079 1084811080

 name          clvmd
 id            0x4104eefa
 flags         0x0004 kern_stop
 change        member 4 joined 1 remove 0 failed 0 seq 3,3
 members       1084811078 1084811079 1084811080 108489
 new change    member 3 joined 0 remove 1 failed 1 seq 4,4
 new status    wait ringid
 new members   1084811078 1084811079 1084811080
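A minimal Python sketch for spotting stuck lockspaces in output like the `dlm_tool ls` listing above. It assumes the layout shown in this thread (a `name` line opening each lockspace block, and a `new status` line when a membership change is pending); the function name `pending_status` and the shortened sample text are illustrative only, not part of any dlm tooling.

```python
import re
from typing import Dict, Optional

# Shortened sample based on the `dlm_tool ls` output in this thread;
# real output may use different spacing or extra fields.
SAMPLE = """\
dlm lockspaces
name          datastores
id            0x1b61ba6a
flags         0x0004 kern_stop
new status    wait ringid

name          clvmd
id            0x4104eefa
new status    wait ringid
"""


def pending_status(ls_output: str) -> Dict[str, str]:
    """Map each lockspace name to its 'new status' value, when one is reported.

    Assumes each lockspace block starts with a 'name <lockspace>' line.
    Lockspaces without a 'new status' line are left out of the result.
    """
    result: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in ls_output.splitlines():
        line = raw.strip()
        name = re.match(r"name\s+(\S+)$", line)
        if name:
            current = name.group(1)
            continue
        status = re.match(r"new status\s+(.+)$", line)
        if status and current is not None:
            result[current] = status.group(1).strip()
    return result


if __name__ == "__main__":
    stuck = [ls for ls, st in pending_status(SAMPLE).items() if st == "wait ringid"]
    print(stuck)  # ['datastores', 'clvmd']
```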

 root@nebula2:~# dlm_tool status
 cluster nodeid 1084811079 quorate 1 ring seq 21372 21372
 daemon now 8351 fence_pid 0
 fence 108489 nodedown pid 0 actor 0 fail 1415634527 fence 0 now 1415634734
 node 1084811078 M add 5089 rem 0 fail 0 fence 0 at 0 0
 node 1084811079 M add 5089 rem 0 fail 0 fence 0 at 0 0
 node 1084811080 M add 5089 rem 0 fail 0 fence 0 at 0 0
 node 108489 X add 5766 rem 8144 fail 8144 fence 0 at 0 0
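The node lines of the `dlm_tool status` output above can be checked mechanically for a node that failed but was never fenced. This Python sketch treats the `fail` and `fence` fields as values where 0 means "never happened"; that reading is an assumption drawn from the output in this thread, not documented dlm_tool semantics. The function name `failed_unfenced_nodes` is hypothetical.

```python
import re
from typing import List

# Matches node lines such as:
#   node 108489 X add 5766 rem 8144 fail 8144 fence 0 at 0 0
# Field names are taken from the output pasted in this thread.
NODE_RE = re.compile(
    r"node\s+(\d+)\s+([MX])\s+add\s+(\d+)\s+rem\s+(\d+)\s+fail\s+(\d+)\s+fence\s+(\d+)"
)


def failed_unfenced_nodes(status_output: str) -> List[int]:
    """Return ids of nodes marked X with a non-zero 'fail' and a zero 'fence'.

    Assumption: a 0 in these fields means the event never happened.
    """
    nodes: List[int] = []
    for raw in status_output.splitlines():
        m = NODE_RE.match(raw.strip())
        if m is None:
            continue  # not a per-node line (e.g. 'cluster ...' or 'fence ...')
        nodeid, member, _add, _rem, fail, fence = m.groups()
        if member == "X" and int(fail) != 0 and int(fence) == 0:
            nodes.append(int(nodeid))
    return nodes


SAMPLE = """\
cluster nodeid 1084811079 quorate 1 ring seq 21372 21372
node 1084811078 M add 5089 rem 0 fail 0 fence 0 at 0 0
node 108489 X add 5766 rem 8144 fail 8144 fence 0 at 0 0
"""

if __name__ == "__main__":
    print(failed_unfenced_nodes(SAMPLE))  # [108489]
```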

 Any idea?
 --
 Daniel Dehennin




-- 
this is my life and I'll live it for as long as God wills



Re: [Pacemaker] Losing corosync communication clusterwide

2014-11-10 Tomasz Kontusz
A hanging corosync sounds like libqb problems: Trusty ships libqb 0.16,
which likes to hang from time to time. Try building libqb 0.17.
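To see which libqb a node actually has before rebuilding, a small Python sketch can compare the installed version against 0.17. It assumes a Debian/Ubuntu system where the runtime library package is named `libqb0` (an assumption; `dpkg -l | grep libqb` shows the real name) and only compares the leading major.minor numbers of the version string. The helper names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a Debian-style version string."""
    # Drop an epoch prefix such as "1:" if present.
    version = version.split(":", 1)[-1]
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(m.group(1)), int(m.group(2))


def older_than(version: str, minimum: Tuple[int, int] = (0, 17)) -> bool:
    """True if the version's (major, minor) sorts below `minimum`."""
    return major_minor(version) < minimum


def installed_version(package: str = "libqb0") -> Optional[str]:
    """Ask dpkg-query for a package version; None if dpkg or the package is missing.

    The default package name "libqb0" is an assumption for Ubuntu Trusty.
    """
    try:
        out = subprocess.run(
            ["dpkg-query", "-W", "--showformat=${Version}", package],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return None  # not a dpkg-based system
    if out.returncode != 0:
        return None  # package not installed or unknown
    return out.stdout.strip() or None


if __name__ == "__main__":
    v = installed_version()
    if v is None:
        print("libqb0 not found via dpkg-query")
    else:
        print(v, "older than 0.17:", older_than(v))
```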



Re: [Pacemaker] Losing corosync communication clusterwide

2014-11-10 Daniel Dehennin
emmanuel segura emi2f...@gmail.com writes:

 I think, you don't have fencing configured in your cluster.

I have fencing configured and working, modulo fencing VMs on a dead host[1].

Regards.

Footnotes: 
[1]  http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022965.html

-- 
Daniel Dehennin


Re: [Pacemaker] Losing corosync communication clusterwide

2014-11-10 Daniel Dehennin
Tomasz Kontusz tomasz.kont...@gmail.com writes:

 Hanging corosync sounds like libqb problems: trusty comes with 0.16,
 which likes to hang from time to time. Try building libqb 0.17.

Thanks, I'll look at this.

Is there a way to get back to a normal state without rebooting all
machines and interrupting services?

I was thinking of a lightweight procedure along these lines:

1. Stop Pacemaker on all nodes without touching the resources, so they
   all keep running.

2. Stop corosync on all nodes.

3. Start corosync on all nodes.

4. Start Pacemaker on all nodes; since the services are still running,
   nothing needs to be done.
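The four steps above could be scripted roughly as follows. This Python sketch only builds and prints a dry-run command list; it assumes crmsh (`crm`), `service` init scripts as on Trusty, passwordless `ssh` to each node, and hypothetical node names. Using the `maintenance-mode` cluster property to keep Pacemaker from acting on resources during step 1 is this sketch's own assumption, not something proposed in the thread, and whether the dlm/clvm/gfs2 stack survives corosync being stopped underneath it is not established here.

```python
from typing import List

# Hypothetical node names; replace with the real cluster members.
NODES = ["node1", "node2", "node3"]


def restart_plan(nodes: List[str]) -> List[str]:
    """Return the commands for the four-step restart, in order (dry run only)."""
    # Assumption: maintenance mode stops Pacemaker from starting, stopping
    # or moving resources while the stack is restarted (step 1).
    plan = ["crm configure property maintenance-mode=true"]
    plan += [f"ssh {n} service pacemaker stop" for n in nodes]   # step 1
    plan += [f"ssh {n} service corosync stop" for n in nodes]    # step 2
    plan += [f"ssh {n} service corosync start" for n in nodes]   # step 3
    plan += [f"ssh {n} service pacemaker start" for n in nodes]  # step 4
    # Hand resources back to Pacemaker once the cluster status looks sane.
    plan.append("crm configure property maintenance-mode=false")
    return plan


if __name__ == "__main__":
    for cmd in restart_plan(NODES):
        print(cmd)
```

Printing rather than executing keeps the sequence reviewable; each step has to complete on every node before the next one starts.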

I looked through the documentation but failed to find any cluster
management best practices.

Regards.
-- 
Daniel Dehennin


Re: [Pacemaker] Losing corosync communication clusterwide

2014-11-10 Andrew Beekhof

 On 11 Nov 2014, at 4:39 am, Daniel Dehennin daniel.dehen...@baby-gnu.org 
 wrote:
 
 emmanuel segura emi2f...@gmail.com writes:
 
 I think, you don't have fencing configured in your cluster.
 
 I have fencing configured and working, modulo fencing VMs on dead host[1].

Are you saying that the host and the VMs running inside it are both part of the 
same cluster?

 
 Regards.
 
 Footnotes: 
 [1]  http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022965.html
 
 -- 
 Daniel Dehennin