[Pacemaker] pacemaker-remote debian wheezy

2015-01-12 Thread Thomas Manninger
Hi,



What is the best way to install the pacemaker-remote package in a Debian wheezy VM? The package is not available in the Debian repository.



Thanks!



Regards,

Thomas

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pacemaker-mgmt (pygui) on fedora 20

2015-01-12 Thread Dori Seliskar
On Thursday, January 08, 2015 02:25:50 pm Gao,Yan wrote:

 Actually, for corosync-2, the decent way would be an init script or a
 systemd service file for mgmtd to start it after pacemaker. Patches are
 welcome ;)
 

Your wish is my command ;)
You can find a systemd pacemaker-mgmt.service file at
http://ds.delo.si/pacemaker-mgmt/fedora20/

and also RPMs for Fedora 20, so the install instructions are much simpler now. ;)
I used/massaged a spec file you provided on the mailing list some time ago.
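For anyone who would rather write their own unit instead of downloading mine, a
minimal sketch could look like the following (the mgmtd path and the foreground
behaviour are assumptions of this example; adjust them to wherever your
pacemaker-mgmt build installs the daemon):

[Unit]
Description=Pacemaker GUI management daemon (mgmtd)
# assumption: the cluster stack unit is called pacemaker.service
After=pacemaker.service
Requires=pacemaker.service

[Service]
# assumption: mgmtd stays in the foreground when started like this;
# switch to Type=forking if your build daemonizes itself
Type=simple
ExecStart=/usr/lib64/heartbeat/mgmtd
Restart=on-failure

[Install]
WantedBy=multi-user.target

Drop the file into /etc/systemd/system/ and enable it with
systemctl enable pacemaker-mgmt.service.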

 Hmm, I guess sbd has failed to be built in that repo. Anyway, here is
 the sbd upstream:
 https://github.com/l-mb/sbd

In the meantime sbd has found its way into Rawhide, so I used it from there:
https://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/source/SRPMS/s/sbd-1.2.1-1.src.rpm

Best regards,
ds

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Corosync fails to start when NIC is absent

2015-01-12 Thread Kostiantyn Ponomarenko
According to https://access.redhat.com/solutions/638843 , the
interface that is defined in corosync.conf must be present in the
system (see the ROOT CAUSE section at the bottom of the article).
To confirm that, I ran a couple of tests.

Here is the relevant part of the corosync.conf file, in free form (the
original config file is also attached):
===
rrp_mode: passive
ring0_addr is defined in corosync.conf
ring1_addr is defined in corosync.conf
===
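(For readers trying to reproduce this, a rough sketch of the relevant corosync 2.x
settings with two rings; the addresses below are placeholders, not the ones from
the attached file:)

totem {
    version: 2
    transport: udpu
    rrp_mode: passive
}

nodelist {
    node {
        nodeid: 1
        # placeholder addresses: ring0 on the first NIC, ring1 on the second
        ring0_addr: 192.168.1.11
        ring1_addr: 169.254.1.11
    }
    node {
        nodeid: 2
        ring0_addr: 192.168.1.12
        ring1_addr: 169.254.1.12
    }
}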

---

Two-node cluster

---

Test #1:
--
The IP for ring0 is not defined on the system:
--
Start Corosync simultaneously on both nodes.
Corosync fails to start.
From the logs:
Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] parse error in
config: No interfaces defined
Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] Corosync Cluster
Engine exiting with status 8 at main.c:1343.
Result: Corosync and Pacemaker are not running.

Test #2:
--
The IP for ring1 is not defined on the system:
--
Start Corosync simultaneously on both nodes.
Corosync starts.
Start Pacemaker simultaneously on both nodes.
Pacemaker fails to start.
From the logs, the last writes from the corosync:
Jan 8 16:31:29 daemon.err27 corosync[3728]: [TOTEM ] Marking ringid 0
interface 169.254.1.3 FAULTY
Jan 8 16:31:30 daemon.notice29 corosync[3728]: [TOTEM ] Automatically
recovered ring 0
Result: Corosync and Pacemaker are not running.


Test #3:

rrp_mode: active leads to the same result, except that the Corosync and Pacemaker
init scripts report their status as running.
But /var/log/cluster/corosync.log still shows a lot of errors like:
Jan 08 16:30:47 [4067] A6-402-1 cib: error: pcmk_cpg_dispatch: Connection
to the CPG API failed: Library error (2)

Result: Corosync and Pacemaker report their status as running, but
crm_mon cannot connect to the cluster database, and half of
Pacemaker's services are not running (including the Cluster Information Base
(CIB)).


---

For single-node mode

---

The IP for ring0 is not defined on the system:

Corosync fails to start.

The IP for ring1 is not defined on the system:

Corosync and Pacemaker are started.

It is possible that the configuration will be applied successfully (about 50% of the time);

it is possible that the cluster will not run any resources;

it is possible that the node cannot be put into standby mode (it shows a
communication error);

and it is possible that the cluster runs all resources, but the applied
configuration is not guaranteed to be fully loaded (some rules can be
missing).


---

Conclusions:

---

It is possible that in some rare cases (see the comments to the bug) the
cluster will work, but in that case its working state is unstable and the
cluster can stop working at any moment.


So, is this correct? Do my assumptions make any sense? I couldn't find any other
explanation on the net ...



Thank you,
Kostya

On Fri, Jan 9, 2015 at 11:10 AM, Kostiantyn Ponomarenko 
konstantin.ponomare...@gmail.com wrote:

 Hi guys,

 Corosync fails to start if a network interface that is defined in its
 configuration is not present in the system.
 Even with rrp_mode: passive the problem is the same when at least one of the
 defined network interfaces is not configured in the system.

 Is this the expected behavior?
 I thought that when you use redundant rings, it is enough to have at least
 one NIC configured in the system. Am I wrong?

 Thank you,
 Kostya



corosync.conf
Description: Binary data
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Some questions on the current state

2015-01-12 Thread Andreas Mock
Hi all,

almost always, when I'm forced to do major upgrades
of our core machines in terms of hardware and/or software (OS),
I have to take a look at the current state of Pacemaker-based
HA. Things move on and things change: projects
converge and diverge, tools/toolchains come and go, and
distributions' marketing strategies change. Therefore I want
to ask the following questions in the hope that list members
who are deeply involved can answer them easily.

1) Are there pacemaker packages for RHEL 6.6 and clones?
If yes, where?

2) How can I build a pacemaker 1.1.12 package myself from
the git sources?

3) How can I get the current versions of pcs and/or crmsh?
Is pcs competitive with crmsh these days?

4) Is the pacemaker HA solution of RHEL 7.x still bound to the use
of cman?

5) Where can I find a current, workable version of the agents
for RHEL 6.6 (and clones) and RHEL 7.x?

It would be really nice if someone could give answers or
helpful pointers for answering the questions on my own.

Thank you all in advance.

Best regards
Andreas Mock



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Problems with SBD

2015-01-12 Thread Oriol Mula-Valls
Thanks a lot, Lars. I took advantage of a crash last week to add the -P
parameter.

I'll read the sbd man page more carefully to increase the IO timeout.
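For the record, a sketch of what the relevant /etc/sysconfig/sbd entries might
end up looking like, following Lars's pointers below (the device id is a made-up
example; the exact IO-timeout option is described in the sbd man page and is not
guessed at here):

# stable by-id path instead of /dev/sdc1 (example id - substitute your own)
SBD_DEVICE="/dev/disk/by-id/scsi-3600140512345678901234567890abcd-part1"
# -W: use the hardware watchdog, -P: enable the pacemaker integration
SBD_OPTS="-W -P"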

Kind regards,
Oriol

On Wed, Jan 7, 2015 at 12:09 PM, Lars Marowsky-Bree l...@suse.com wrote:

 On 2015-01-04T19:49:58, Oriol Mula-Valls omv.li...@gmail.com wrote:

  I have a two node system with SLES 11 SP3 (pacemaker-1.1.9-0.19.102,
  corosync-1.4.5-0.18.15, sbd-1.1-0.13.153). Since December we have had
  several reboots of the system due to SBD: on the 22nd, 24th and 26th. The last
  reboot happened yesterday, January 3rd. The message is the same every time.
  /var/log/messages:Jan  3 11:55:08 kernighan sbd: [7879]: info: Cancelling
  IO request due to timeout (rw=0)
  /var/log/messages:Jan  3 11:55:08 kernighan sbd: [7879]: ERROR: mbox read
  failed in servant.
  /var/log/messages:Jan  3 11:55:08 kernighan sbd: [7878]: WARN: Servant
 for
  /dev/sdc1 (pid: 7879) has terminated
  /var/log/messages:Jan  3 11:55:08 kernighan sbd: [7878]: WARN: Servant
 for
  /dev/sdc1 outdated (age: 4)
  /var/log/messages:Jan  3 11:55:08 kernighan sbd: [8183]: info: Servant
  starting for device /dev/sdc1
  /var/log/messages:Jan  3 11:55:11 kernighan sbd: [8183]: info: Cancelling
  IO request due to timeout (rw=0)
  /var/log/messages:Jan  3 11:55:11 kernighan sbd: [8183]: ERROR: Unable to
  read header from device 5
  /var/log/messages:Jan  3 11:55:11 kernighan sbd: [8183]: ERROR: Not a
 valid
  header on /dev/sdc1
  /var/log/messages:Jan  3 11:55:11 kernighan sbd: [7878]: WARN: Servant
 for
  /dev/sdc1 (pid: 8183) has terminated
  /var/log/messages:Jan  3 11:55:11 kernighan sbd: [7878]: WARN: Latency:
 No
  liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
 
  The sbd device is an iSCSI drive shared from a Synology box.
 
  Could anyone provide me some guidance on what's happening, please?

 Those are pretty clearly IO errors due to high latency. You may need to
 increase the IO timeout, and/or figure out why the IO to your Synology
 box sometimes stalls for multiple seconds. See the manpage for this; you
 can add the required flag to /etc/sysconfig/sbd - SBD_OPTS.

 You also should use a stable name (/dev/disk/by-id/...) rather than
 /dev/sdc1 - note that /dev/sdX may not be stable over reboots or iSCSI
 restarts.

 Further, you can avoid the reboots by enabling the pacemaker
 integration. See the manpage for details on what that flag does. (-P)
 That will be the default in later sbd versions for releases after SLE HA
 11.



 Regards,
 Lars

 --
 Architect Storage/HA
 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Jennifer Guild,
 Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nürnberg)
 Experience is the name everyone gives to their mistakes. -- Oscar Wilde


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Some questions on the current state

2015-01-12 Thread Trevor Hemsley
On 12/01/15 15:09, Andreas Mock wrote:
 Hi all,

 almost always when I'm forced to do some major upgrades
 to our core machines in terms of hardware and/or software (OS)
 I'm forced to have a look at the current state of pacemaker 
 based HA. Things are going on and things change. Projects
 converge and diverge, tool(s)/chains come and go and
 distributions' marketing strategies change. Therefore I want
 to ask the following question in the hope list members
 deeply involved can answer easily.

 1) Are there pacemaker packages for RHEL 6.6 and clones?
 If yes, where?

In the CentOS (etc) base/updates repos. For RHEL they're in the HA channel.


 2) How can I create a pacemaker package 1.1.12 on my own from
 the git sources?
It's already in base/updates.


 3) How can I get the current versions of pcs and/or crmsh?
 Is pcs competitive to crmsh meanwhile?
pcs is in el6.6 and now includes pcsd. You can get crmsh from an
opensuse build repo for el6.

 4) Is the pacemaker HA solution of RHEL 7.x still bound to use
 of cman?
No

 5) Where can I find a current, workable version of the agents
 for RHEL 6.6 (and clones) and RHEL 7.x?
Probably you want the resource-agents package.

T

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Some questions on the current state

2015-01-12 Thread Andreas Mock
Hi Trevor,

thank you for answering so fast.

2) Besides the fact that rpm packages are available, do
you know how to build rpm packages from the git repository?

4) Is RHEL 7.x using corosync 2.x and pacemaker plugin
for cluster membership?

Best regards
Andreas Mock


 -----Original Message-----
 From: Trevor Hemsley [mailto:thems...@voiceflex.com]
 Sent: Monday, 12 January 2015 16:42
 To: The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] Some questions on the current state
 
 On 12/01/15 15:09, Andreas Mock wrote:
  Hi all,
 
  almost always when I'm forced to do some major upgrades
  to our core machines in terms of hardware and/or software (OS)
  I'm forced to have a look at the current state of pacemaker
  based HA. Things are going on and things change. Projects
  converge and diverge, tool(s)/chains come and go and
  distributions' marketing strategies change. Therefore I want
  to ask the following question in the hope list members
  deeply involved can answer easily.
 
  1) Are there pacemaker packages for RHEL 6.6 and clones?
  If yes, where?
 
 In the CentOS (etc) base/updates repos. For RHEL they're in the HA
 channel.
 
 
  2) How can I create a pacemaker package 1.1.12 on my own from
  the git sources?
 It's already in base/updates.
 
 
  3) How can I get the current versions of pcs and/or crmsh?
  Is pcs competitive to crmsh meanwhile?
 pcs is in el6.6 and now includes pcsd. You can get crmsh from an
 opensuse build repo for el6.
 
  4) Is the pacemaker HA solution of RHEL 7.x still bound to use
  of cman?
 No
 
  5) Where can I find a current, workable version of the agents
  for RHEL 6.6 (and clones) and RHEL 7.x?
 Probably you want the resource-agents package.
 
 T
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Some questions on the current state

2015-01-12 Thread Andreas Mock
Hi David,

thank you for your answers.

Best regards
Andreas Mock


-----Original Message-----
From: David Vossel [mailto:dvos...@redhat.com]
Sent: Monday, 12 January 2015 18:28
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Some questions on the current state



- Original Message -
 Hi Trevor,
 
 thank you for answering so fast.
 
 2) Besides the fact that rpm packages are available do you know how to 
 make rpm packages from git repository?

./autogen.sh && ./configure && make rpm

That will generate rpms from the source tree.

 4) Is RHEL 7.x using corosync 2.x and pacemaker plugin for cluster 
 membership?

No. RHEL 7.x uses corosync 2.x and the new corosync votequorum API.
The plugins are a thing of the past for RHEL 7.

 Best regards
 Andreas Mock
 
 
  -----Original Message-----
  From: Trevor Hemsley [mailto:thems...@voiceflex.com]
  Sent: Monday, 12 January 2015 16:42
  To: The Pacemaker cluster resource manager
  Subject: Re: [Pacemaker] Some questions on the current state
  
  On 12/01/15 15:09, Andreas Mock wrote:
   Hi all,
  
   almost always when I'm forced to do some major upgrades to our 
   core machines in terms of hardware and/or software (OS) I'm forced 
   to have a look at the current state of pacemaker based HA. Things 
   are going on and things change. Projects converge and diverge, 
   tool(s)/chains come and go and distributions marketing strategies 
   change. Therefore I want to ask the following question in the hope 
   list members deeply involved can answer easily.
  
   1) Are there pacemaker packages for RHEL 6.6 and clones?
   If yes, where?
  
  In the CentOS (etc) base/updates repos. For RHEL they're in the HA 
  channel.
  
  
   2) How can I create a pacemaker package 1.1.12 on my own from the 
   git sources?
  It's already in base/updates.
  
  
   3) How can I get the current versions of pcs and/or crmsh?
   Is pcs competitive to crmsh meanwhile?
  pcs is in el6.6 and now includes pcsd. You can get crmsh from an 
  opensuse build repo for el6.
  
   4) Is the pacemaker HA solution of RHEL 7.x still bound to use of 
   cman?
  No
  
   5) Where can I find a current, workable version of the agents for 
   RHEL 6.6 (and clones) and RHEL 7.x?
  Probably you want the resource-agents package.
  
  T
  
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
  Project Home: http://www.clusterlabs.org Getting started: 
  http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org Getting started: 
 http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org Getting started: 
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Help needed to configure MySQL Cluster using Pacemaker, Corosync, DRBD and PCS

2015-01-12 Thread Shameer Babu
Hi Andrew,

Could you be a bit more specific about "didn't work"?

I have installed MariaDB and tried to add the mysql resource (simply
created the resource) using the bare command pcs resource create MysQL
ocf:heartbeat:mysql . I know this is incomplete and have removed that resource.

Please see my existing cluster status:
---
PCS Status:
---
Every 2.0s: pcs status   Mon Jan 12
14:56:38 2015

Cluster name: MyCluster
Last updated: Mon Jan 12 14:56:38 2015
Last change: Mon Jan 12 14:56:25 2015 via cibadmin on webserver-01
Stack: corosync
Current DC: webserver-02 (2) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
2 Nodes configured
2 Resources configured


Online: [ webserver-01 webserver-02 ]

Full list of resources:

 ClusterIP  (ocf::heartbeat:IPaddr2):   Started webserver-01

PCSD Status:
  webserver-01: Online
  webserver-02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled

DRBD status in webserver-01: (This may be used to set up MySQL)
-
[root@webserver-01 ~]# cat /proc/drbd
version: 8.4.5 (api:1/proto:86-101)
GIT-hash: 1d360bde0e095d495786eaeb2a1ac76888e4db96 build by mockbuild@,
2014-08-17 22:54:26

 2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-
ns:0 nr:12288 dw:12288 dr:728 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f
oos:0
--
Databases:
--
MariaDB is installed on both nodes

[root@webserver-01 ~]# mysql -V
mysql  Ver 15.1 Distrib 5.5.40-MariaDB, for Linux (x86_64) using readline
5.1


My aim is to create a MySQL/MariaDB cluster using Pacemaker and DRBD on CentOS 7.
Could you please guide me on this or point me to any related articles?
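As a rough sketch of the direction I have in mind (resource names, the DRBD
resource name mysql_drbd, the mount point and the timeouts below are just
examples, not a tested configuration):

pcs resource create MysqlData ocf:linbit:drbd drbd_resource=mysql_drbd \
    op monitor interval=30s
pcs resource master MysqlDataClone MysqlData master-max=1 master-node-max=1 \
    clone-max=2 clone-node-max=1 notify=true

# filesystem on top of the DRBD device, then MariaDB itself
pcs resource create MysqlFS ocf:heartbeat:Filesystem device=/dev/drbd1 \
    directory=/var/lib/mysql fstype=xfs
pcs resource create MysqlDB ocf:heartbeat:mysql datadir=/var/lib/mysql \
    op start timeout=60s op stop timeout=60s op monitor interval=20s timeout=30s

# keep everything with the DRBD master and start it in order
pcs constraint colocation add MysqlFS with MysqlDataClone INFINITY with-rsc-role=Master
pcs constraint order promote MysqlDataClone then start MysqlFS
pcs constraint colocation add MysqlDB with MysqlFS INFINITY
pcs constraint order MysqlFS then MysqlDB
pcs constraint colocation add ClusterIP with MysqlDB INFINITY
pcs constraint order MysqlDB then ClusterIP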

Regards,
Shameer


On Mon, Jan 12, 2015 at 6:20 AM, Andrew Beekhof and...@beekhof.net wrote:


  On 11 Jan 2015, at 5:39 pm, Shameer Babu shameerbab...@gmail.com
 wrote:
 
  Hi,
 
  I have configured Apache cluster by referring you document
 http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf and it was good
 and working. Now I would like to configure a simple MySQL cluster using
 pacemaker,corosync,DRBD and Pcs in CentOS 7. I was trying add the mysql
 resource but didn't work.

 Could you be a bit more specific about "didn't work"?

  In CentOS 7 the database is MariaDB. I would like to configure the
 resource mysql/mariadb for two nodes. Can any body help me on this?
 
  I have configured the following till now:
 
  1. Installed pcs,mariadb,pacemaker,corosync and DRBD
  2. Configured the resource Virtual IP and it is working fine
  3. Configured DRBD to sync files in both nodes and /dev/drbd1 is ready
 to use
 
  Please help me to configure the resource maraidb/mysql for my cluster
 
  Regards,
  Shameer
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Avoid one node from being a target for resources migration

2015-01-12 Thread Dmitry Koterov

 1. install the resource related packages on node3 even though you never
 want
 them to run there. This will allow the resource-agents to verify the
 resource
 is in fact inactive.


Thanks, your advice helped: I installed all the services on node3 as well
(including DRBD, but without its configs) and stopped+disabled them. Then I
added the following line to my configuration:

location loc_drbd drbd rule -inf: #uname eq node3

So node3 is never a target for DRBD, and this helped: crm node standby
node1 doesn't try to use node3 anymore.

But I have another (related) issue. If some node (e.g. node1) becomes
isolated from the other 2 nodes, how can I force it to shut down its services? I
cannot use IPMI-based fencing/stonith, because there are no reliable
connections between the nodes at all (the nodes are in geo-distributed
datacenters), and an IPMI call to shut down one node from another node is
impossible.

E.g. initially I have the following:

# crm status
Online: [ node1 node2 node3 ]
Master/Slave Set: ms_drbd [drbd]
 Masters: [ node2 ]
 Slaves: [ node1 ]
Resource Group: server
 fs (ocf::heartbeat:Filesystem):Started node2
 postgresql (lsb:postgresql):   Started node2
 bind9  (lsb:bind9):Started node2
 nginx  (lsb:nginx):Started node2

Then I turn on firewall on node2 to isolate it from the outside internet:

root@node2:~# iptables -A INPUT -p tcp --dport 22 -j ACCEPT
root@node2:~# iptables -A OUTPUT -p tcp --sport 22 -j ACCEPT
root@node2:~# iptables -A INPUT -i lo -j ACCEPT
root@node2:~# iptables -A OUTPUT -o lo -j ACCEPT
root@node2:~# iptables -P INPUT DROP; iptables -P OUTPUT DROP

Then I see that, although node2 clearly knows it's isolated (it doesn't see
the other 2 nodes and does not have quorum), it does not stop its services:

root@node2:~# crm status
Online: [ node2 ]
OFFLINE: [ node1 node3 ]
Master/Slave Set: ms_drbd [drbd]
 Masters: [ node2 ]
 Stopped: [ node1 node3 ]
Resource Group: server
 fs (ocf::heartbeat:Filesystem): Started node2
 postgresql (lsb:postgresql): Started node2
 bind9 (lsb:bind9): Started node2
 nginx (lsb:nginx): Started node2

So is there a way to tell pacemaker to shut down a node's services when it
becomes isolated?



On Mon, Jan 12, 2015 at 8:25 PM, David Vossel dvos...@redhat.com wrote:



 - Original Message -
  Hello.
 
  I have 3-node cluster managed by corosync+pacemaker+crm. Node1 and Node2
 are
  DRBD master-slave, also they have a number of other services installed
  (postgresql, nginx, ...). Node3 is just a corosync node (for quorum), no
  DRBD/postgresql/... are installed at it, only corosync+pacemaker.
 
  But when I add resources to the cluster, a part of them are somehow
 moved to
  node3 and since then fail. Note than I have a colocation directive to
  place these resources to the DRBD master only and location with -inf
 for
  node3, but this does not help - why? How to make pacemaker not run
 anything
  at node3?
 
  All the resources are added in a single transaction: cat config.txt |
 crm -w
  -f- configure where config.txt contains directives and commit
 statement
  at the end.
 
  Below are crm status (error messages) and crm configure show outputs.
 
 
  root@node3:~# crm status
  Current DC: node2 (1017525950) - partition with quorum
  3 Nodes configured
  6 Resources configured
  Online: [ node1 node2 node3 ]
  Master/Slave Set: ms_drbd [drbd]
  Masters: [ node1 ]
  Slaves: [ node2 ]
  Resource Group: server
  fs (ocf::heartbeat:Filesystem): Started node1
  postgresql (lsb:postgresql): Started node3 FAILED
  bind9 (lsb:bind9): Started node3 FAILED
  nginx (lsb:nginx): Started node3 (unmanaged) FAILED
  Failed actions:
  drbd_monitor_0 (node=node3, call=744, rc=5, status=complete,
  last-rc-change=Mon Jan 12 11:16:43 2015, queued=2ms, exec=0ms): not
  installed
  postgresql_monitor_0 (node=node3, call=753, rc=1, status=complete,
  last-rc-change=Mon Jan 12 11:16:43 2015, queued=8ms, exec=0ms): unknown
  error
  bind9_monitor_0 (node=node3, call=757, rc=1, status=complete,
  last-rc-change=Mon Jan 12 11:16:43 2015, queued=11ms, exec=0ms): unknown
  error
  nginx_stop_0 (node=node3, call=767, rc=5, status=complete,
 last-rc-change=Mon
  Jan 12 11:16:44 2015, queued=1ms, exec=0ms): not installed

 Here's what is going on. Even when you say never run this resource on
 node3
 pacemaker is going to probe for the resource regardless on node3 just to
 verify
 the resource isn't running.

 The failures you are seeing monitor_0 failed indicate that pacemaker
 failed
 to be able to verify resources are running on node3 because the related
 packages for the resources are not installed. Given pacemaker's default
 behavior I'd expect this.

 You have two options.

 1. install the resource related packages on node3 even though you never
 want
 them to run there. This will allow the resource-agents to verify the
 resource
 is in fact inactive.

 2. If you are using the current master 

Re: [Pacemaker] Avoid one node from being a target for resources migration

2015-01-12 Thread Andrew Beekhof

 On 13 Jan 2015, at 4:25 am, David Vossel dvos...@redhat.com wrote:
 
 
 
 - Original Message -
 Hello.
 
 I have 3-node cluster managed by corosync+pacemaker+crm. Node1 and Node2 are
 DRBD master-slave, also they have a number of other services installed
 (postgresql, nginx, ...). Node3 is just a corosync node (for quorum), no
 DRBD/postgresql/... are installed at it, only corosync+pacemaker.
 
 But when I add resources to the cluster, a part of them are somehow moved to
 node3 and since then fail. Note than I have a colocation directive to
 place these resources to the DRBD master only and location with -inf for
 node3, but this does not help - why? How to make pacemaker not run anything
 at node3?
 
 All the resources are added in a single transaction: cat config.txt | crm -w
 -f- configure where config.txt contains directives and commit statement
 at the end.
 
 Below are crm status (error messages) and crm configure show outputs.
 
 
 root@node3:~# crm status
 Current DC: node2 (1017525950) - partition with quorum
 3 Nodes configured
 6 Resources configured
 Online: [ node1 node2 node3 ]
 Master/Slave Set: ms_drbd [drbd]
 Masters: [ node1 ]
 Slaves: [ node2 ]
 Resource Group: server
 fs (ocf::heartbeat:Filesystem): Started node1
 postgresql (lsb:postgresql): Started node3 FAILED
 bind9 (lsb:bind9): Started node3 FAILED
 nginx (lsb:nginx): Started node3 (unmanaged) FAILED
 Failed actions:
 drbd_monitor_0 (node=node3, call=744, rc=5, status=complete,
 last-rc-change=Mon Jan 12 11:16:43 2015, queued=2ms, exec=0ms): not
 installed
 postgresql_monitor_0 (node=node3, call=753, rc=1, status=complete,
 last-rc-change=Mon Jan 12 11:16:43 2015, queued=8ms, exec=0ms): unknown
 error
 bind9_monitor_0 (node=node3, call=757, rc=1, status=complete,
 last-rc-change=Mon Jan 12 11:16:43 2015, queued=11ms, exec=0ms): unknown
 error
 nginx_stop_0 (node=node3, call=767, rc=5, status=complete, last-rc-change=Mon
 Jan 12 11:16:44 2015, queued=1ms, exec=0ms): not installed
 
 Here's what is going on. Even when you say never run this resource on node3
 pacemaker is going to probe for the resource regardless on node3 just to 
 verify
 the resource isn't running.
 
 The failures you are seeing monitor_0 failed indicate that pacemaker failed
 to be able to verify resources are running on node3 because the related 
 packages for the resources are not installed. Given pacemaker's default
 behavior I'd expect this.
 
 You have two options.
 
 1. install the resource related packages on node3 even though you never want
 them to run there. This will allow the resource-agents to verify the resource
 is in fact inactive.

or 1b. delete the agent too. Recent versions of pacemaker should handle this
case correctly.

 
 2. If you are using the current master branch of pacemaker, there's a new
 location constraint option called 'resource-discovery=always|never|exclusive'.
 If you add the 'resource-discovery=never' option to your location constraint
 that attempts to keep resources from node3, you'll avoid having pacemaker
 perform the 'monitor_0' actions on node3 as well.
 
 -- Vossel
 
 
 root@node3:~# crm configure show | cat
 node $id=1017525950 node2
 node $id=13071578 node3
 node $id=1760315215 node1
 primitive drbd ocf:linbit:drbd \
 params drbd_resource=vlv \
 op start interval=0 timeout=240 \
 op stop interval=0 timeout=120
 primitive fs ocf:heartbeat:Filesystem \
 params device=/dev/drbd0 directory=/var/lib/vlv.drbd/root
 options=noatime,nodiratime fstype=xfs \
 op start interval=0 timeout=300 \
 op stop interval=0 timeout=300
 primitive postgresql lsb:postgresql \
 op monitor interval=10 timeout=60 \
 op start interval=0 timeout=60 \
 op stop interval=0 timeout=60
 primitive bind9 lsb:bind9 \
 op monitor interval=10 timeout=60 \
 op start interval=0 timeout=60 \
 op stop interval=0 timeout=60
 primitive nginx lsb:nginx \
 op monitor interval=10 timeout=60 \
 op start interval=0 timeout=60 \
 op stop interval=0 timeout=60
 group server fs postgresql bind9 nginx
 ms ms_drbd drbd meta master-max=1 master-node-max=1 clone-max=2
 clone-node-max=1 notify=true
 location loc_server server rule $id=loc_server-rule -inf: #uname eq node3
 colocation col_server inf: server ms_drbd:Master
 order ord_server inf: ms_drbd:promote server:start
 property $id=cib-bootstrap-options \
 stonith-enabled=false \
 last-lrm-refresh=1421079189 \
 maintenance-mode=false
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: 

Re: [Pacemaker] Avoid one node from being a target for resources migration

2015-01-12 Thread Andrew Beekhof

 On 13 Jan 2015, at 7:56 am, Dmitry Koterov dmitry.kote...@gmail.com wrote:
 
 1. install the resource related packages on node3 even though you never want
 them to run there. This will allow the resource-agents to verify the resource
 is in fact inactive.
 
 Thanks, your advise helped: I installed all the services at node3 as well 
 (including DRBD, but without it configs) and stopped+disabled them. Then I 
 added the following line to my configuration:
 
 location loc_drbd drbd rule -inf: #uname eq node3
 
 So node3 is never a target for DRBD, and this helped: crm nodr standby 
 node1 doesn't tries to use node3 anymore.
 
 But I have another (related) issue. If some node (e.g. node1) becomes 
 isolated from other 2 nodes, how to force it to shutdown its services? I 
 cannot use IPMB-based fencing/stonith, because there are no reliable 
 connections between nodes at all (the nodes are in geo-distributed 
 datacenters), and IPMI call to shutdown a node from another node is 
 impossible.
 
 E.g. initially I have the following:
 
 # crm status
 Online: [ node1 node2 node3 ]
 Master/Slave Set: ms_drbd [drbd]
  Masters: [ node2 ]
  Slaves: [ node1 ]
 Resource Group: server
  fs (ocf::heartbeat:Filesystem):Started node2
  postgresql (lsb:postgresql):   Started node2
  bind9  (lsb:bind9):Started node2
  nginx  (lsb:nginx):Started node2
 
 Then I turn on firewall on node2 to isolate it from the outside internet:
 
 root@node2:~# iptables -A INPUT -p tcp --dport 22 -j ACCEPT
 root@node2:~# iptables -A OUTPUT -p tcp --sport 22 -j ACCEPT
 root@node2:~# iptables -A INPUT -i lo -j ACCEPT
 root@node2:~# iptables -A OUTPUT -o lo -j ACCEPT
 root@node2:~# iptables -P INPUT DROP; iptables -P OUTPUT DROP
 
 Then I see that, although node2 clearly knows it's isolated (it doesn't see 
 other 2 nodes and does not have quorum)

We don't know that - there are several algorithms for calculating quorum and
the information isn't included in your output.
Are you using cman or corosync underneath pacemaker? Corosync version?
Pacemaker version? Have you set no-quorum-policy?
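(If the goal is for an inquorate partition to stop everything, that is the
no-quorum-policy=stop behaviour; a crmsh one-liner to make it explicit, in case
it has been changed somewhere:

crm configure property no-quorum-policy=stop

Note that stop is already the default, so if services keep running, the
interesting question is whether the partition really considers itself
inquorate.)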

 , it does not stop its services:
 
 root@node2:~# crm status
 Online: [ node2 ]
 OFFLINE: [ node1 node3 ]
 Master/Slave Set: ms_drbd [drbd]
  Masters: [ node2 ]
  Stopped: [ node1 node3 ]
 Resource Group: server
  fs   (ocf::heartbeat:Filesystem):Started node2
  postgresql   (lsb:postgresql):   Started node2
  bind9(lsb:bind9):Started node2
  nginx(lsb:nginx):Started node2
 
 So is there a way to say pacemaker to shutdown nodes' services when they 
 become isolated?
 
 
 
 On Mon, Jan 12, 2015 at 8:25 PM, David Vossel dvos...@redhat.com wrote:
 
 
 - Original Message -
  Hello.
 
  I have 3-node cluster managed by corosync+pacemaker+crm. Node1 and Node2 are
  DRBD master-slave, also they have a number of other services installed
  (postgresql, nginx, ...). Node3 is just a corosync node (for quorum), no
  DRBD/postgresql/... are installed at it, only corosync+pacemaker.
 
  But when I add resources to the cluster, a part of them are somehow moved to
  node3 and since then fail. Note than I have a colocation directive to
  place these resources to the DRBD master only and location with -inf for
  node3, but this does not help - why? How to make pacemaker not run anything
  at node3?
 
  All the resources are added in a single transaction: cat config.txt | crm 
  -w
  -f- configure where config.txt contains directives and commit statement
  at the end.
 
  Below are crm status (error messages) and crm configure show outputs.
 
 
  root@node3:~# crm status
  Current DC: node2 (1017525950) - partition with quorum
  3 Nodes configured
  6 Resources configured
  Online: [ node1 node2 node3 ]
  Master/Slave Set: ms_drbd [drbd]
  Masters: [ node1 ]
  Slaves: [ node2 ]
  Resource Group: server
  fs (ocf::heartbeat:Filesystem): Started node1
  postgresql (lsb:postgresql): Started node3 FAILED
  bind9 (lsb:bind9): Started node3 FAILED
  nginx (lsb:nginx): Started node3 (unmanaged) FAILED
  Failed actions:
  drbd_monitor_0 (node=node3, call=744, rc=5, status=complete,
  last-rc-change=Mon Jan 12 11:16:43 2015, queued=2ms, exec=0ms): not
  installed
  postgresql_monitor_0 (node=node3, call=753, rc=1, status=complete,
  last-rc-change=Mon Jan 12 11:16:43 2015, queued=8ms, exec=0ms): unknown
  error
  bind9_monitor_0 (node=node3, call=757, rc=1, status=complete,
  last-rc-change=Mon Jan 12 11:16:43 2015, queued=11ms, exec=0ms): unknown
  error
  nginx_stop_0 (node=node3, call=767, rc=5, status=complete, 
  last-rc-change=Mon
  Jan 12 11:16:44 2015, queued=1ms, exec=0ms): not installed
 
 Here's what is going on. Even when you say never run this resource on node3
 pacemaker is going to probe for the resource regardless on node3 just to 
 verify
 the resource isn't running.
 
 The failures you are seeing monitor_0 failed indicate that pacemaker failed
 to be able 

[Pacemaker] Unique clone instance is stopped too early on move

2015-01-12 Thread Vladislav Bogdanov
Hi Andrew, David, all.

I have found a somewhat strange operation ordering during transition execution.

Could you please look at the following partial configuration (crmsh syntax)?

===
...
clone cl-broker broker \
meta interleave=true target-role=Started
clone cl-broker-vips broker-vips \
meta clone-node-max=2 globally-unique=true interleave=true 
resource-stickiness=0 target-role=Started
clone cl-ctdb ctdb \
meta interleave=true target-role=Started
colocation broker-vips-with-broker inf: cl-broker-vips cl-broker
colocation broker-with-ctdb inf: cl-broker cl-ctdb
order broker-after-ctdb inf: cl-ctdb cl-broker
order broker-vips-after-broker 0: cl-broker cl-broker-vips
...
===

After I put one node into standby and then back online, I see the following
transition (relevant excerpt):

===
 * Pseudo action:   cl-broker-vips_stop_0
 * Resource action: broker-vips:1   stop on c-pa-0
 * Pseudo action:   cl-broker-vips_stopped_0
 * Pseudo action:   cl-ctdb_start_0
 * Resource action: ctdbstart on c-pa-1
 * Pseudo action:   cl-ctdb_running_0
 * Pseudo action:   cl-broker_start_0
 * Resource action: ctdbmonitor=1 on c-pa-1
 * Resource action: broker  start on c-pa-1
 * Pseudo action:   cl-broker_running_0
 * Pseudo action:   cl-broker-vips_start_0
 * Resource action: broker  monitor=1 on c-pa-1
 * Resource action: broker-vips:1   start on c-pa-1
 * Pseudo action:   cl-broker-vips_running_0
 * Resource action: broker-vips:1   monitor=3 on c-pa-1
===

What could be the reason to stop the unique clone instance so early during a move?

I tried different clone/order configurations, including 
cl-broker-vips:interleave=false, broker-vips-after-broker:score=inf and
broker-vips-after-broker:symmetrical=false, but the picture is always the same:
broker-vips:1 is stopped before everything else.

Complete crm_report is available if needed.


Best,
Vladislav

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] [Planning] Organizing HA Summit 2015

2015-01-12 Thread Digimer

Hi all,

  With Fabio away for now, I (and others) are working on the final 
preparations for the summit. This is your chance to speak up and 
influence the planning! Objections/suggestions? Speak now please. :)


  In particular, please raise topics you want to discuss. Either add 
them to the wiki directly or email me and I will update the wiki for 
you. (Note that registration is closed because of spammers, if you want 
an account just let me know and I'll open it back up).


The plan is:

* Informal atmosphere with limited structure to make sure key topics are 
addressed.


Two ways topics will be discussed:

** Someone will guide a given topic they want to raise for ~45 minutes, 
with 15 minutes for Q&A


** Round-table style discussion with no one person leading (though it 
would be nice to have someone taking notes).


People presenting are asked not to use slides. Hand-outs are fine and 
either a white-board or paper flip-board will be available for 
illustrating ideas and fleshing out concepts.


The summit will start at 9:00 and go until 17:00. We'll go for a 
semi-official summit dinner and drinks around 6pm on the 4th (location 
to be determined). Those staying in Brno are more than welcome to join 
an informal dinner and drinks (and possibly some sight-seeing, etc) the 
evening of the 5th.


Any concerns/comments/suggestions, please speak up ASAP!

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Avoid one node from being a target for resources migration

2015-01-12 Thread David Vossel


- Original Message -
 Hello.
 
 I have 3-node cluster managed by corosync+pacemaker+crm. Node1 and Node2 are
 DRBD master-slave, also they have a number of other services installed
 (postgresql, nginx, ...). Node3 is just a corosync node (for quorum), no
 DRBD/postgresql/... are installed at it, only corosync+pacemaker.
 
 But when I add resources to the cluster, a part of them are somehow moved to
 node3 and since then fail. Note than I have a colocation directive to
 place these resources to the DRBD master only and location with -inf for
 node3, but this does not help - why? How to make pacemaker not run anything
 at node3?
 
 All the resources are added in a single transaction: cat config.txt | crm -w
 -f- configure where config.txt contains directives and commit statement
 at the end.
 
 Below are crm status (error messages) and crm configure show outputs.
 
 
 root@node3:~# crm status
 Current DC: node2 (1017525950) - partition with quorum
 3 Nodes configured
 6 Resources configured
 Online: [ node1 node2 node3 ]
 Master/Slave Set: ms_drbd [drbd]
 Masters: [ node1 ]
 Slaves: [ node2 ]
 Resource Group: server
 fs (ocf::heartbeat:Filesystem): Started node1
 postgresql (lsb:postgresql): Started node3 FAILED
 bind9 (lsb:bind9): Started node3 FAILED
 nginx (lsb:nginx): Started node3 (unmanaged) FAILED
 Failed actions:
 drbd_monitor_0 (node=node3, call=744, rc=5, status=complete,
 last-rc-change=Mon Jan 12 11:16:43 2015, queued=2ms, exec=0ms): not
 installed
 postgresql_monitor_0 (node=node3, call=753, rc=1, status=complete,
 last-rc-change=Mon Jan 12 11:16:43 2015, queued=8ms, exec=0ms): unknown
 error
 bind9_monitor_0 (node=node3, call=757, rc=1, status=complete,
 last-rc-change=Mon Jan 12 11:16:43 2015, queued=11ms, exec=0ms): unknown
 error
 nginx_stop_0 (node=node3, call=767, rc=5, status=complete, last-rc-change=Mon
 Jan 12 11:16:44 2015, queued=1ms, exec=0ms): not installed

Here's what is going on. Even when you say "never run this resource on node3",
pacemaker is going to probe for the resource on node3 regardless, just to verify
the resource isn't running.

The failures you are seeing ("monitor_0" failed) indicate that pacemaker failed
to verify whether the resources are running on node3, because the related
packages for the resources are not installed. Given pacemaker's default
behavior I'd expect this.

You have two options.

1. install the resource related packages on node3 even though you never want
them to run there. This will allow the resource-agents to verify the resource
is in fact inactive.

2. If you are using the current master branch of pacemaker, there's a new
location constraint option called 'resource-discovery=always|never|exclusive'.
If you add the 'resource-discovery=never' option to your location constraint
that attempts to keep resources from node3, you'll avoid having pacemaker
perform the 'monitor_0' actions on node3 as well.
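For illustration, in the raw CIB this is just one extra attribute on the
location constraint; something along these lines (the id and node name are only
examples):

<rsc_location id="loc-server-node3" rsc="server" node="node3"
    score="-INFINITY" resource-discovery="never"/>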

-- Vossel

 
 root@node3:~# crm configure show | cat
 node $id=1017525950 node2
 node $id=13071578 node3
 node $id=1760315215 node1
 primitive drbd ocf:linbit:drbd \
 params drbd_resource=vlv \
 op start interval=0 timeout=240 \
 op stop interval=0 timeout=120
 primitive fs ocf:heartbeat:Filesystem \
 params device=/dev/drbd0 directory=/var/lib/vlv.drbd/root
 options=noatime,nodiratime fstype=xfs \
 op start interval=0 timeout=300 \
 op stop interval=0 timeout=300
 primitive postgresql lsb:postgresql \
 op monitor interval=10 timeout=60 \
 op start interval=0 timeout=60 \
 op stop interval=0 timeout=60
 primitive bind9 lsb:bind9 \
 op monitor interval=10 timeout=60 \
 op start interval=0 timeout=60 \
 op stop interval=0 timeout=60
 primitive nginx lsb:nginx \
 op monitor interval=10 timeout=60 \
 op start interval=0 timeout=60 \
 op stop interval=0 timeout=60
 group server fs postgresql bind9 nginx
 ms ms_drbd drbd meta master-max=1 master-node-max=1 clone-max=2
 clone-node-max=1 notify=true
 location loc_server server rule $id=loc_server-rule -inf: #uname eq node3
 colocation col_server inf: server ms_drbd:Master
 order ord_server inf: ms_drbd:promote server:start
 property $id=cib-bootstrap-options \
 stonith-enabled=false \
 last-lrm-refresh=1421079189 \
 maintenance-mode=false
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Some questions on the current state

2015-01-12 Thread David Vossel


- Original Message -
 Hi Trevor,
 
 thank you for answering so fast.
 
 2) Besides the fact that rpm packages are available do
 you know how to make rpm packages from git repository?

./autogen.sh && ./configure && make rpm

That will generate rpms from the source tree.
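Roughly, the whole sequence for 1.1.12 would be something like this (the tag
name is from memory - check git tag for the exact spelling, and install the
usual autotools/devel build dependencies first):

git clone https://github.com/ClusterLabs/pacemaker.git
cd pacemaker
git checkout Pacemaker-1.1.12
./autogen.sh && ./configure && make rpm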

 4) Is RHEL 7.x using corosync 2.x and pacemaker plugin
 for cluster membership?

No. RHEL 7.x uses corosync 2.x and the new corosync votequorum API.
The plugins are a thing of the past for RHEL 7.

 Best regards
 Andreas Mock
 
 
  -----Original Message-----
  From: Trevor Hemsley [mailto:thems...@voiceflex.com]
  Sent: Monday, 12 January 2015 16:42
  To: The Pacemaker cluster resource manager
  Subject: Re: [Pacemaker] Some questions on the current state
  
  On 12/01/15 15:09, Andreas Mock wrote:
   Hi all,
  
   almost always when I'm forced to do some major upgrades
   to our core machines in terms of hardware and/or software (OS)
   I'm forced to have a look at the current state of pacemaker
   based HA. Things are going on and things change. Projects
   converge and diverge, tool(s)/chains come and go and
   distributions' marketing strategies change. Therefore I want
   to ask the following question in the hope list members
   deeply involved can answer easily.
  
   1) Are there pacemaker packages for RHEL 6.6 and clones?
   If yes, where?
  
  In the CentOS (etc) base/updates repos. For RHEL they're in the HA
  channel.
  
  
   2) How can I create a pacemaker package 1.1.12 on my own from
   the git sources?
  It's already in base/updates.
  
  
   3) How can I get the current versions of pcs and/or crmsh?
   Is pcs competitive to crmsh meanwhile?
  pcs is in el6.6 and now includes pcsd. You can get crmsh from an
  opensuse build repo for el6.
  
   4) Is the pacemaker HA solution of RHEL 7.x still bound to use
   of cman?
  No
  
   5) Where can I find a current, workable version of the agents
   for RHEL 6.6 (and clones) and RHEL 7.x?
  Probably you want the resource-agents package.
  
  T
  
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pacemaker-remote debian wheezy

2015-01-12 Thread David Vossel


- Original Message -
 Hi,
 What is the best way to install the pacemaker-remote package in a Debian
 wheezy VM? The package is not available in the Debian repository.

I have no clue.

I just want to point out, if your host OS is debian wheezy and the 
pacemaker-remote
package is in fact unavailable, it is possible the version of pacemaker shipped
with wheezy doesn't even have the capability of managing pacemaker_remote nodes.

-- Vossel 

 Thanks!
 Regards,
 Thomas
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Avoid one node from being a target for resources migration

2015-01-12 Thread Dmitry Koterov
Hello.

I have a 3-node cluster managed by corosync+pacemaker+crm. Node1 and Node2
are a DRBD master-slave pair; they also have a number of other services installed
(postgresql, nginx, ...). Node3 is just a corosync node (for quorum): no
DRBD/postgresql/... is installed on it, only corosync+pacemaker.

But when I add resources to the cluster, some of them are somehow moved
to node3 and then fail. Note that I have a colocation directive to
place these resources on the DRBD master only and a location rule with -inf for
node3, but this does not help - why? How can I make pacemaker not run anything
on node3?

All the resources are added in a single transaction: cat config.txt | crm
-w -f- configure, where config.txt contains the directives and a commit
statement at the end.

Below are crm status (error messages) and crm configure show outputs.


root@node3:~# crm status
Current DC: node2 (1017525950) - partition with quorum
3 Nodes configured
6 Resources configured
Online: [ node1 node2 node3 ]
Master/Slave Set: ms_drbd [drbd]
 Masters: [ node1 ]
 Slaves: [ node2 ]
Resource Group: server
 fs (ocf::heartbeat:Filesystem): Started node1
 postgresql (lsb:postgresql): Started node3 FAILED
 bind9 (lsb:bind9): Started node3 FAILED
 nginx (lsb:nginx): Started node3 (unmanaged) FAILED
Failed actions:
drbd_monitor_0 (node=node3, call=744, rc=5, status=complete,
last-rc-change=Mon Jan 12 11:16:43 2015, queued=2ms, exec=0ms): not
installed
postgresql_monitor_0 (node=node3, call=753, rc=1, status=complete,
last-rc-change=Mon Jan 12 11:16:43 2015, queued=8ms, exec=0ms): unknown
error
bind9_monitor_0 (node=node3, call=757, rc=1, status=complete,
last-rc-change=Mon Jan 12 11:16:43 2015, queued=11ms, exec=0ms): unknown
error
nginx_stop_0 (node=node3, call=767, rc=5, status=complete,
last-rc-change=Mon Jan 12 11:16:44 2015, queued=1ms, exec=0ms): not
installed


root@node3:~# crm configure show | cat
node $id=1017525950 node2
node $id=13071578 node3
node $id=1760315215 node1
primitive drbd ocf:linbit:drbd \
params drbd_resource=vlv \
op start interval=0 timeout=240 \
op stop interval=0 timeout=120
primitive fs ocf:heartbeat:Filesystem \
params device=/dev/drbd0 directory=/var/lib/vlv.drbd/root
options=noatime,nodiratime fstype=xfs \
op start interval=0 timeout=300 \
op stop interval=0 timeout=300
primitive postgresql lsb:postgresql \
op monitor interval=10 timeout=60 \
op start interval=0 timeout=60 \
op stop interval=0 timeout=60
primitive bind9 lsb:bind9 \
op monitor interval=10 timeout=60 \
op start interval=0 timeout=60 \
op stop interval=0 timeout=60
primitive nginx lsb:nginx \
op monitor interval=10 timeout=60 \
op start interval=0 timeout=60 \
op stop interval=0 timeout=60
group server fs postgresql bind9 nginx
ms ms_drbd drbd meta master-max=1 master-node-max=1 clone-max=2
clone-node-max=1 notify=true
location loc_server server rule $id=loc_server-rule -inf: #uname eq node3
colocation col_server inf: server ms_drbd:Master
order ord_server inf: ms_drbd:promote server:start
property $id=cib-bootstrap-options \
stonith-enabled=false \
last-lrm-refresh=1421079189 \
maintenance-mode=false
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pacemaker-remote debian wheezy

2015-01-12 Thread Ken Gaillot

On 01/12/2015 12:34 PM, David Vossel wrote:

- Original Message -

What is the best way to install the pacemaker-remote package in a Debian
wheezy VM? The package is not available in the Debian repository.


I have no clue.

I just want to point out, if your host OS is debian wheezy and the 
pacemaker-remote
package is in fact unavailable, it is possible the version of pacemaker shipped
with wheezy doesn't even have the capability of managing pacemaker_remote nodes.

-- Vossel


Wheezy's pacemaker 1.1.7 does not support pacemaker-remote; jessie's 
1.1.10 should work in a jessie VM, but be aware that pacemaker-remote 
has received improvements and bugfixes since then.


Of course you can compile 1.1.12 yourself (and optionally use 
checkinstall to make .deb's, see https://wiki.debian.org/CheckInstall). 
Unfortunately you can't backport the 1.1.10 jessie packages (which 
normally would be pretty easy) because the dependencies get too hairy 
(in particular you wind up needing a newer version of gcc than is in 
wheezy).
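
If you do build it yourself, a rough sketch of the procedure (the package names 
are from memory and probably incomplete - ./configure will complain about 
anything that is still missing):

apt-get install build-essential autoconf automake libtool pkg-config checkinstall \
    libglib2.0-dev libxml2-dev libxslt1-dev libbz2-dev uuid-dev libqb-dev
# plus the corosync development packages matching your corosync version
git clone https://github.com/ClusterLabs/pacemaker.git
cd pacemaker && git checkout Pacemaker-1.1.12
./autogen.sh && ./configure && make
checkinstall make install    # wraps the install step into a .deb you can remove cleanly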


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org