Re: [Pacemaker] Fw: new important message

2016-02-18 Thread Serge Dubrouski
Got hacked?

On Thu, Feb 18, 2016, 7:53 PM Dejan Muhamedagic 
wrote:

> Hello!
>
>
>
> New message, please read http://estoncamlievler76.com/leaving.php
> 
>
>
>
> Dejan Muhamedagic
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Suggestions for managing HA of containers from within a Pacemaker container?

2015-02-24 Thread Serge Dubrouski
Pacemaker as a scheduler in Mesos or Kubernetes does sound like a very
interesting idea. Packaging Corosync into super-privileged containers still
doesn't make much sense to me. What's the point of isolating something
and then giving it all permissions on the host machine?

On Mon, Feb 23, 2015 at 5:20 PM, Andrew Beekhof and...@beekhof.net wrote:


  On 10 Feb 2015, at 1:45 pm, Serge Dubrouski serge...@gmail.com wrote:
 
  Hello Steve,
 
  Are you sure that Pacemaker is the right product for your project? Have
 you checked Mesos/Marathon or Kubernetes? Those are frameworks being
 developed for managing containers.

 And in a few years they'll work out that they need some HA features and
 try to retrofit them :-)
 In the meantime pacemaker is actually rather good at managing containers
 already and knows a thing or two about HA and how to bring up a complex
 stack of services.

 The one thing that would be really interesting in this area is using
 our policy engine as the Kubernetes scheduler.
 So many ideas and so little time.

 
  On Sat Feb 07 2015 at 1:19:15 PM Steven Dake (stdake) std...@cisco.com
 wrote:
  Hi,
 
  I am working on Containerizing OpenStack in the Kolla project (
 http://launchpad.net/kolla).  One of the key things we want to do over
 the next few months is add H/A support to our container tech.  David Vossel
 had suggested using systemctl to monitor the containers themselves by
 running healthchecking scripts within the containers.  That idea is sound.
 
  There is another technology called “super-privileged containers”.
 Essentially it allows more host access for the container, allowing the
 treatment of Pacemaker as a container rather than an RPM or DEB file.  I’d
 like corosync to run in a separate container.  These containers will
 communicate using their normal mechanisms in a super-privileged mode.  We
 will implement this in Kolla.
 
  Where I am stuck is how Pacemaker within a container controls other
 containers in the host OS. One way I have considered is using the docker
 --pid=host flag, allowing Pacemaker to communicate directly with the host
 systemctl process. The catch is that our containers don’t run via
 systemctl, but instead via shell scripts that are executed by third-party
 deployment software.
 
  An example:
  Let's say a rabbitmq container wants to run:
 
  The user would run
  kolla-mgr deploy messaging
 
  This would run a small bit of code to launch the docker container set
 for messaging.
 
  Could pacemaker run something like
 
  kolla-mgr status messaging
 
  To control the lifecycle of the processes?
 
  Or would we be better off with some systemd integration with kolla-mgr?
 
  Thoughts welcome
 
  Regards,
  -steve






-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Suggestions for managing HA of containers from within a Pacemaker container?

2015-02-09 Thread Serge Dubrouski
Hello Steve,

Are you sure that Pacemaker is the right product for your project? Have you
checked Mesos/Marathon or Kubernetes? Those are frameworks being developed
for managing containers.

On Sat Feb 07 2015 at 1:19:15 PM Steven Dake (stdake) std...@cisco.com
wrote:

  Hi,

  I am working on Containerizing OpenStack in the Kolla project (
 http://launchpad.net/kolla).  One of the key things we want to do over
 the next few months is add H/A support to our container tech.  David Vossel
 had suggested using systemctl to monitor the containers themselves by
 running healthchecking scripts within the containers.  That idea is sound.

  There is another technology called “super-privileged containers”.
 Essentially it allows more host access for the container, allowing the
 treatment of Pacemaker as a container rather than an RPM or DEB file.  I’d
 like corosync to run in a separate container.  These containers will
 communicate using their normal mechanisms in a super-privileged mode.  We
 will implement this in Kolla.

  Where I am stuck is how Pacemaker within a container controls other
 containers in the host OS. One way I have considered is using the docker
 --pid=host flag, allowing Pacemaker to communicate directly with the host
 systemctl process. The catch is that our containers don’t run via
 systemctl, but instead via shell scripts that are executed by third-party
 deployment software.

  An example:
 Let's say a rabbitmq container wants to run:

  The user would run
 kolla-mgr deploy messaging

  This would run a small bit of code to launch the docker container set
 for messaging.

  Could pacemaker run something like

 kolla-mgr status messaging

  To control the lifecycle of the processes?

  Or would we be better off with some systemd integration with kolla-mgr?

  Thoughts welcome

  Regards,
 -steve

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Where the heck is Beekhof?

2013-12-01 Thread Serge Dubrouski
Nope, you need three to always have a quorum.
On Dec 1, 2013 9:43 AM, Arnold Krille arn...@arnoldarts.de wrote:

 On Thu, 28 Nov 2013 12:04:01 +1100 Andrew Beekhof and...@beekhof.net
 wrote:
  If you find yourself asking $subject at some point in the next couple
  of months, the answer is that I'm taking leave to look after our new
  son (Lawson Tiberius Beekhof) who was born on Tuesday.

 Congrats!

 And remember: If you want HA, you gotta have two :-P

 - Arnold



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] named

2013-06-05 Thread Serge Dubrouski
There should be an OCF version of the named RA available in the
resource-agents package.
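If it is, the create command would look something like this (a sketch only:
the agent's parameter is named_config rather than configfile, and the path is
just taken from your message):

pcs resource create Named ocf:heartbeat:named \
    named_config=/var/named/chroot/etc/named.conf \
    op monitor interval=30s
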
On Jun 5, 2013 5:45 AM, David Coulson da...@davidcoulson.net wrote:


 On 6/5/13 2:30 PM, paul wrote:

 Hi. I have followed the Clusters from scratch PDF and have a working two
 node active passive cluster with ClusterIP, WebDataClone,WebFS and
 WebSite working. I am using BIND DNS to direct my websites to the
 cluster address. When I perform a failover which works ok I have to
 restart BIND on the now active node since the ClusterIP wasn't available
 on this node when BIND originally started even though I have listen-on
 the ClusterIP listed in my named.conf. I need DNS to also be part of the
 cluster! There is ocf:redhat:named.sh listed but I have no idea how to
 incorporate this into my cluster. I have tried pcs resource create Named
 ocf:redhat:named.sh configfile=/var/named/chroot/etc/named.conf op
 monitor interval=30s but I seem to get a 'basename: missing operand' and
 an 'invalid name of service' error. Could someone tell me what the create
 command should be? Thanks. Paul

 I'm running RHEL6 and just use the named LSB script which ships with it. Didn't
 make any modifications.

 I have also created a resource which does a 'rndc reload', which is
 sufficient to cause named to bind to the new address.

 primitive re-named-reload ocf:heartbeat:anything \
 params binfile=/usr/sbin/rndc cmdline_options=reload

 So just have the reload run after your IP resource starts (create a
 group or a colocation/order constraint).


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] failover after killing a process monitored by a resource ?

2012-02-18 Thread Serge Dubrouski
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-failure-migration.html
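
In short, that section boils down to setting a migration threshold (and,
optionally, a failure timeout) on the resource. A sketch against your config
(the values are illustrative):

primitive apacheServer ocf:heartbeat:apache \
	params httpd=/usr/sbin/httpd configfile=/etc/httpd/conf/httpd.conf \
	op monitor interval=60s \
	meta migration-threshold=1 failure-timeout=120s

With migration-threshold=1 a single monitor failure forces the resource (and
its group) off the node, and failure-timeout lets the failcount expire so it
can run there again later.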


On Sat, Feb 18, 2012 at 3:21 PM, Sébastien ROHAUT
sebastien.roh...@free.fr wrote:

 Hi,

 I have a very basic pacemaker conf, with two resources (ipaddr2 and
 apache) in the same group.

 If I kill my httpd process, the cluster respawns it on the same node:

 Feb 18 23:06:15 node1 pengine: [1173]: notice: common_apply_stickiness:
 apacheServer can fail 99 more times on node2 before being forced off
 ...
 Feb 18 23:06:16 node1 crmd: [1174]: info: match_graph_event: Action
 apacheServer_start_0 (10) confirmed on node2 (rc=0)

 Is it possible to make the resource automatically move to the other node
 when the process is killed? How?

 My conf :
 node node1
 node node2
 primitive apacheIP ocf:heartbeat:IPaddr \
params ip=192.168.1.80 cidr_netmask=32 nic=eth0 \
op monitor interval=30s \
meta target-role=Started
 primitive apacheServer ocf:heartbeat:apache \
params httpd=/usr/sbin/httpd 
 configfile=/etc/httpd/conf/httpd.conf
 \
op monitor interval=60s start=40s stop=60s \
meta target-role=Started
 group webserver apacheIP apacheServer
 property $id=cib-bootstrap-options \
	dc-version=1.1.6-4.fc16-89678d4947c5bd466e2f31acd58ea4e1edb854d5
 \
	cluster-infrastructure=openais \
expected-quorum-votes=2 \
stonith-enabled=false \
no-quorum-policy=ignore \
last-lrm-refresh=1329602712
 rsc_defaults $id=rsc-options \
resource-stickiness=100

 Thanks for your help.





-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pgsql agent won't start postgresql

2012-01-24 Thread Serge Dubrouski
Rex -

To get a quicker answer you should use this list.

Anyway, it's hard to tell what the problem is without log files and crm_mon
output.

Does your DRBD work all right? Are you able to start PostgreSQL outside of
Pacemaker?
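
A quick manual check along these lines, with the paths from your config
below, usually tells a lot:

cat /proc/drbd   # both nodes should show cs:Connected and disks UpToDate
sudo -u postgres /usr/lib/postgresql/9.1/bin/pg_ctl \
    -D /var/lib/postgresql/9.1/main start
sudo -u postgres /usr/lib/postgresql/9.1/bin/psql -c 'SELECT 1'

(Run this on the node where db_fs is mounted, outside of Pacemaker's control.)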

On Tue, Jan 24, 2012 at 1:50 PM, Rex Zhen rex.z...@nextissuemedia.com wrote:

 Hi Sir,

 PostgreSQL won't start through the resource agent.

 Here is my configuration file:

 node db-3
 node db-4
 primitive db_drbd ocf:linbit:drbd \
 params drbd_resource=drbd_pgsql \
 op monitor interval=60
 primitive db_fs ocf:heartbeat:Filesystem \
 params device=/dev/drbd0 directory=/var/lib/postgresql fstype=ext4 \
 meta target-role=Started
 primitive db_ip ocf:heartbeat:IPaddr2 \
 params ip=192.168.100.63 \
 op monitor interval=10s \
 meta target-role=Started
 primitive db_stonith stonith:external/ssh \
 params hostlist=db-3 db-4
 primitive pgsql ocf:heartbeat:pgsql \
 params pgctl=/usr/lib/postgresql/9.1/bin/pg_ctl
 psql=/usr/lib/postgresql/9.1/bin/psql
 pgdata=/var/lib/postgresql/9.1/main pgdba=postgres start_opt= \
 op monitor interval=10s timeout=20s depth=0 \
 meta target-role=Started
 group postgres db_fs db_ip
 ms db_ms db_drbd \
 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
 notify=true
 clone fencing db_stonith \
 meta target-role=Started
 colocation fs-with-maskter inf: db_fs db_ms:Master
 colocation ip-with-maskter inf: db_ip db_ms:Master
 colocation pg-with-fs inf: pgsql db_fs
 order drbd-after-fs inf: db_ms:promote postgres:start
 property $id=cib-bootstrap-options \
 dc-version=1.0.9-unknown \
 cluster-infrastructure=openais \
 expected-quorum-votes=2 \
 default-resource-stickiness=100 \
 no-quorum-policy=ignore


 Rex Zhen




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Postgresql streaming replication failover - RAneeded

2012-01-17 Thread Serge Dubrouski
Alex -

Have you tried this RA: https://github.com/t-matsuo/resource-agents/ ?

So far it's the best candidate for an official PostgreSQL RA with streaming
replication support. However, be advised that it's still under development.

It'll be very interesting to hear your opinion on it.

On Tue, Jan 17, 2012 at 6:49 AM, Alexander Tkachev a.tkac...@cti.ru wrote:

 Hello Brett,

 Is it possible to get your solution now?

 --
 With best regards,
 Alex Tkachev,
 IPTV Engineer,
 CTI, Moscow, Russia


 We have created just such a multi-state RA, which incorporates a design to


 manage failover, failback, and fallback (regular backups).  Please give
 us a
 few days - a member of my team is removing any product-specifics from it,
 and
 we'll post it shortly.

 Brett





-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Postgresql streaming replication failover - RA needed

2011-12-12 Thread Serge Dubrouski
On Mon, Dec 12, 2011 at 5:32 AM, Takatoshi MATSUO matsuo@gmail.com wrote:

 Hello

 2011/12/12 Serge Dubrouski serge...@gmail.com:
 
 
  On Thu, Dec 8, 2011 at 10:34 PM, Takatoshi MATSUO matsuo@gmail.com
  wrote:
 
  Hi Attila
 
  2011/12/8 Attila Megyeri amegy...@minerva-soft.com:
   Hi Takatoshi,
  
   One strange thing I noticed and could probably be improved.
   When there is data inconsistency, I have the following node
 properties:
  
   * Node psql2:
  + default_ping_set  : 100
  + master-postgresql:1   : -INFINITY
  + pgsql-data-status : DISCONNECT
  + pgsql-status  : HS:alone
   * Node psql1:
  + default_ping_set  : 100
  + master-postgresql:0   : 1000
  + master-postgresql:1   : -INFINITY
  + pgsql-data-status : LATEST
  + pgsql-master-baseline : 58:4B20
  + pgsql-status  : PRI
  
   This is fine, and understandable - but I can see this only if I do a
   crm_mon -A.
  
   My problem is, that CRM shows the following:
  
   Master/Slave Set: db-ms-psql [postgresql]
   Masters: [ psql1 ]
   Slaves: [ psql2 ]
  
    So if I monitor the system from crm_mon, HAWK or other tools - I have
    no indication at all that the slave is running in an inconsistent mode.
  
   I would expect the RA to stop the psql2 node in such cases, because:
    - It is running, but has non-up-to-date data, therefore no one will use
    it (the slave IP points to the master as well, which is good)
    - In CRM status everything looks perfect, even though it is NOT perfect
    and admin intervention is required.
  
  
   Shouldn't the disconnected PSQL server be stopped instead?
 
  hmm..
   It's not better to stop the PostgreSQL server.
   The RA cannot know whether PostgreSQL is disconnected because of
   inconsistent data, a network outage,
   starting up, and so on.
 
 
   Why does it matter? If the state is degraded and inconsistent and there
   is no way to fix it from inside the RA, the RA should probably stop it.

  In this case, the HS's data may be consistent, but the primary doesn't have
  enough WALs, or the HS doesn't have enough WAL archives, to stay in
  replication mode.
  Unfortunately this RA doesn't calculate the number of WALs.


Honestly, I don't know how to handle this better. Pacemaker doesn't have a
concept of a degraded node state.



   Let's say that there is pgpool running in front of the cluster; keeping
   an inconsistent node up would lead to routing SQL queries to it and
   possibly getting wrong results.
 

 It doesn't happen in my sample configuration.
 vip-slave is up at the master when the slave is not HS:sync.


So you have a VIP for each slave node?



 
 
 
  How about using dummy RA such as vip-slave?
  ---
  primitive runningSlaveOK ocf:heartbeat:Dummy
  .(snip)
 
  location rsc_location-dummy runningSlaveOK \
  rule  200: pgsql-status eq HS:sync
  ---

 
   That probably fixes the visibility issue. What about notifications on the
   DISCONNECT state? How would an administrator know that the cluster is
   inconsistent? Maybe the better option in this case would be collocating a
   MailTo resource with HS:alone?

 Yes, it's a good idea if you want to receive notifications.


 Regards,
 Takatoshi MATSUO





-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Postgresql streaming replication failover - RA needed

2011-12-11 Thread Serge Dubrouski
On Thu, Dec 8, 2011 at 10:34 PM, Takatoshi MATSUO matsuo@gmail.com wrote:

 Hi Attila

 2011/12/8 Attila Megyeri amegy...@minerva-soft.com:
  Hi Takatoshi,
 
  One strange thing I noticed and could probably be improved.
  When there is data inconsistency, I have the following node properties:
 
  * Node psql2:
 + default_ping_set  : 100
 + master-postgresql:1   : -INFINITY
 + pgsql-data-status : DISCONNECT
 + pgsql-status  : HS:alone
  * Node psql1:
 + default_ping_set  : 100
 + master-postgresql:0   : 1000
 + master-postgresql:1   : -INFINITY
 + pgsql-data-status : LATEST
 + pgsql-master-baseline : 58:4B20
 + pgsql-status  : PRI
 
  This is fine, and understandable - but I can see this only if I do a
 crm_mon -A.
 
  My problem is, that CRM shows the following:
 
  Master/Slave Set: db-ms-psql [postgresql]
  Masters: [ psql1 ]
  Slaves: [ psql2 ]
 
   So if I monitor the system from crm_mon, HAWK or other tools - I have no
 indication at all that the slave is running in an inconsistent mode.
 
  I would expect the RA to stop the psql2 node in such cases, because:
   - It is running, but has non-up-to-date data, therefore no one will use
 it (the slave IP points to the master as well, which is good)
   - In CRM status everything looks perfect, even though it is NOT perfect
 and admin intervention is required.
 
 
  Shouldn't the disconnected PSQL server be stopped instead?

 hmm..
  It's not better to stop the PostgreSQL server.
  The RA cannot know whether PostgreSQL is disconnected because of
  inconsistent data, a network outage,
  starting up, and so on.


Why does it matter? If the state is degraded and inconsistent and there is
no way to fix it from inside the RA, the RA should probably stop it. Let's
say that there is pgpool running in front of the cluster; keeping an
inconsistent node up would lead to routing SQL queries to it and
possibly getting wrong results.




 How about using dummy RA such as vip-slave?
 ---
 primitive runningSlaveOK ocf:heartbeat:Dummy
 .(snip)

 location rsc_location-dummy runningSlaveOK \
 rule  200: pgsql-status eq HS:sync
 ---


That probably fixes the visibility issue. What about notifications on the
DISCONNECT state? How would an administrator know that the cluster is
inconsistent? Maybe the better option in this case would be collocating a
MailTo resource with HS:alone?
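
Something along these lines might do, mirroring the vip-slave location rule
above (the address is illustrative):

primitive admin-alert ocf:heartbeat:MailTo \
    params email=admin@example.com subject="pgsql replication degraded"
location alert-on-hs-alone admin-alert \
    rule 200: pgsql-status eq HS:alone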




 Regards,
 Takatoshi MATSUO





-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Postgresql streaming replication failover - RA needed

2011-11-16 Thread Serge Dubrouski
On Wed, Nov 16, 2011 at 12:55 PM, Attila Megyeri
amegy...@minerva-soft.com wrote:

 Hi Florian,

 -Original Message-
 From: Florian Haas [mailto:flor...@hastexo.com]
 Sent: 16 November 2011 11:49
 To: The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA
 needed

 Hi Attila,

 On 2011-11-16 10:27, Attila Megyeri wrote:
  Hi All,
 
 
 
  We have a two-node PostgreSQL 9.1 system configured using streaming
  replication (active/active with a read-only slave).
 
  We want to automate the failover process and I couldn't really find a
  resource agent that could do the job.

 That is correct; the pgsql resource agent (unlike its mysql counterpart)
 does not support streaming replication. We've had a contributor submit a
 patch at one point, but it was somewhat ill-conceived and thus did not make
 it into the upstream repo. The relevant thread is here:

 http://lists.linux-ha.org/pipermail/linux-ha-dev/2011-February/018195.html

 Would you feel comfortable modifying the pgsql resource agent to support
 replication? If so, we could revisit this issue and potentially add
 streaming replication support to pgsql.


 Well, I'm not sure I would be able to make that change. Failover is
 relatively easy to do, but I really have no idea how to do the failback part.


And that's exactly the reason why I haven't implemented it yet. With the
way replication is currently done in PostgreSQL there is no easy way to
switch between roles, or at least I don't know of such a way.
Implementing just failover functionality, by creating a trigger file on the
slave server when the master fails, doesn't amount to a full
master-slave implementation, in my opinion.
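
For reference, the trigger-file mechanism is just PostgreSQL 9.x's
recovery.conf on the standby, roughly (values illustrative):

standby_mode = 'on'
primary_conninfo = 'host=192.168.0.10 port=5432 user=replicator'
trigger_file = '/tmp/pgsql.trigger'

Touching /tmp/pgsql.trigger promotes the standby to primary, but nothing
demotes or rebuilds the old master, which is exactly the missing failback
half.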


 I will definitely have to sort this out somehow; I am just unsure
 whether I will try to use the repmgr mentioned in the video, or Pacemaker
 with some level of customization...

 Is the resource agent that you mentioned available somewhere?

 Thanks.
 Attila







-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Postgres RA won't start

2011-10-13 Thread Serge Dubrouski
On Thu, Oct 13, 2011 at 4:29 AM, Lars Ellenberg
lars.ellenb...@linbit.com wrote:

 On Wed, Oct 12, 2011 at 07:41:20PM -0600, Serge Dubrouski wrote:
  On Wed, Oct 12, 2011 at 9:20 AM, Amar Prasovic a...@linux.org.ba
 wrote:
 
    Thank you all for the tips and suggestions. I managed to configure postgres
    so it actually starts.

    First, I updated resource-agents (Florian, thanks for the tip; I still
    don't know how I managed to miss that :) )
    Second, I deleted the postgres primitive, cleared all failcounts, and
    configured it again like this:
  
   primitive postgres_res ocf:heartbeat:pgsql \
   params pgctl=/usr/lib/postgresql/8.4/bin/pg_ctl
   psql=/usr/bin/psql start_opt= pgdata=/var/lib/postgresql/8.4/main
   config=/etc/postgresql/8.4/main/postgresql.conf pgdba=postgres \
  
   op start interval=0 timeout=120s \
   op stop interval=0 timeout=120s \
   op monitor interval=30s timeout=30s depth=0
  
   After that, it all worked like a charm.
  
   However, I noticed some strange output in the log file, it wasn't there
   before I updated the resource-agents.
  
   Here is the extract from the syslog:
  
   http://pastebin.com/ybPi0VMp
  
   (postgres_res:monitor:stderr) [: 647: monitor: unexpected operator
  
   This error is actually reported with any operator. I tried to start the
   script from CLI, I got the same thing with ./pgsql start, ./pgsql
 status,
   ./pgsql stop
  
 
  Weird. I don't know what to tell you. The RA is basically all right; it just
  misses one not very important fix. On my system (CentOS 5, PostgreSQL 8.4
  or 9.0) it doesn't produce any errors. If I understand your log right, the
  problem is in line 647 of the RA, which is:
 
  [ $1 == validate-all ] && exit $rc

  == != =


Theoretically yes: = is the POSIX string comparison, while == is a bashism.
But why would it create a problem on Debian and not on CentOS, and why has
nobody else reported this issue so far?

BTW, other RAs use the == operator as well: apache, LVM, portblock,
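
Thinking about it more, the difference is probably /bin/sh itself: the RA runs
under #!/bin/sh, which Debian points at dash (POSIX only) while CentOS points
at bash, and dash rejects the == bashism with exactly this kind of error:

$ bash -c '[ x == x ] && echo ok'
ok
$ dash -c '[ x == x ] && echo ok'
dash: 1: [: x: unexpected operator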


 Make that [ $1 = validate-all ] && exit $rc


 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com





-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Postgres RA won't start

2011-10-12 Thread Serge Dubrouski
On Wed, Oct 12, 2011 at 9:20 AM, Amar Prasovic a...@linux.org.ba wrote:

 Thank you all for the tips and suggestions. I managed to configure postgres so
 it actually starts.

 First, I updated resource-agents (Florian, thanks for the tip; I still don't
 know how I managed to miss that :) )
 Second, I deleted the postgres primitive, cleared all failcounts, and configured
 it again like this:

 primitive postgres_res ocf:heartbeat:pgsql \
 params pgctl=/usr/lib/postgresql/8.4/bin/pg_ctl
 psql=/usr/bin/psql start_opt= pgdata=/var/lib/postgresql/8.4/main
 config=/etc/postgresql/8.4/main/postgresql.conf pgdba=postgres \

 op start interval=0 timeout=120s \
 op stop interval=0 timeout=120s \
 op monitor interval=30s timeout=30s depth=0

 After that, it all worked like a charm.

 However, I noticed some strange output in the log file, it wasn't there
 before I updated the resource-agents.

 Here is the extract from the syslog:

 http://pastebin.com/ybPi0VMp

 (postgres_res:monitor:stderr) [: 647: monitor: unexpected operator

 This error is actually reported with any operator. I tried to start the
 script from CLI, I got the same thing with ./pgsql start, ./pgsql status,
 ./pgsql stop


Weird. I don't know what to tell you. The RA is basically all right; it just
misses one not very important fix. On my system (CentOS 5, PostgreSQL 8.4 or
9.0) it doesn't produce any errors. If I understand your log right, the problem
is in line 647 of the RA, which is:

[ $1 == validate-all ] && exit $rc

I do not see why it would complain on this line.


 Here is the pgsql script I am using:

 http://pastebin.com/55mKNDCM

 P.S. You can ignore the nginx errors in syslog; I will open a new topic about
 that





-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Postgres RA won't start

2011-10-11 Thread Serge Dubrouski
I don't have too much of experience with pacemaker in Devian. I'd also
suggest getting the latest version of pgsql RA from git, though if your
basic package is too old there could be conflicts.
 On Oct 11, 2011 9:11 AM, Amar Prasovic a...@linux.org.ba wrote:


 What version of the resource-agents package do you use? Old versions of pgsql
 depended on the fuser tool being installed; otherwise it could fail with that
 error code.


 Hello Serge,

 thank you for your answer.

 I don't have any resource-agents installed. The system is Debian Squeeze
 6.0.3 and it automatically installed cluster-agents 1.0.3-3.1

 When I try to install resource-agents I run into dependency problems:

 webnode01 postgresql # apt-get install resource-agents
 Reading package lists... Done
 Building dependency tree
 Reading state information... Done
 Some packages could not be installed. This may mean that you have
 requested an impossible situation or if you are using the unstable
 distribution that some required packages have not yet been created
 or been moved out of Incoming.
 The following information may help to resolve the situation:

 The following packages have unmet dependencies:
  resource-agents : Depends: libplumb2 but it is not going to be installed
Depends: libplumbgpl2 but it is not going to be
 installed
 E: Broken packages

 When I try to install libplumb2, the installation wants to remove
 pacemaker:

 webnode01 postgresql # apt-get install libplumb2
 Reading package lists... Done
 Building dependency tree
 Reading state information... Done
 The following packages were automatically installed and are no longer
 required:
   libsensors4 libsnmp15 libheartbeat2 corosync libnspr4-0d libtimedate-perl
 libsnmp-base openhpid libcurl3 libssh2-1 lm-sensors libopenhpi2 fancontrol
 libopenipmi0 libperl5.10 libesmtp5 libcorosync4 libnet1 libnss3-1d
 Use 'apt-get autoremove' to remove them.
 The following extra packages will be installed:
   libpils2
 The following packages will be REMOVED:
   cluster-agents cluster-glue libcluster-glue pacemaker
 The following NEW packages will be installed:
   libpils2 libplumb2
 0 upgraded, 2 newly installed, 4 to remove and 0 not upgraded.
 Need to get 115 kB of archives.
 After this operation, 5,874 kB disk space will be freed.
 Do you want to continue [Y/n]? n
 Abort.

 Can I do something with fuser tools?



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Reloading a resource after a failover

2011-10-08 Thread Serge Dubrouski
Hello -

It's there. And here is the basic configuration that works all right for me.
Unfortunately I didn't find a way to make Pacemaker run a reload on one
resource after the start of another, so I just put an order constraint on the
start action. BIND does a quick restart when the IP fails over:

node cs51
node cs52
primitive Bind ocf:heartbeat:named \
params monitor_request=localhost.localdomain \
monitor_response=127.0.0.1 \
op monitor interval=30s timeout=20s
primitive BindIP ocf:heartbeat:IPaddr2 \
params ip=192.168.1.130 \
op monitor interval=30s timeout=20s
clone BindClone Bind \
meta clone-max=2 clone-node-max=1 target-role=Started
colocation IP_to_Bind +inf: BindIP BindClone
order Bind_Start_After_IP_Start +inf: BindIP:start BindClone:start

OS: CentOS 5.7
Pacemaker: pacemaker-1.1.5-1.1.el5
ResourceAgents: resource-agents-3.9.2-1el5 + named RA from GIT
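
If your resource-agents package doesn't include the named RA yet, it can be
dropped into the heartbeat provider directory by hand and sanity-checked,
roughly:

cp named /usr/lib/ocf/resource.d/heartbeat/named
chmod 755 /usr/lib/ocf/resource.d/heartbeat/named
ocf-tester -n Bind /usr/lib/ocf/resource.d/heartbeat/named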


On Wed, Oct 5, 2011 at 5:08 AM, Max Williams max.willi...@betfair.com wrote:

  OK I’ll wait until it is in GIT.


 Is there a way I could edit the LSB RA to take ‘reload’? Where is the RA
 file for LSB init scripts? Or is it built in to Pacemaker?


 Thanks,

 Max


 From: Serge Dubrouski [mailto:serge...@gmail.com]
 Sent: 05 October 2011 04:32

 To: The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] Reloading a resource after a failover



 Please also notice that the named RA in git is a little bit broken. You
 need to apply a patch that I recently sent, or wait a little bit till it's
 applied to GIT.


 On Tue, Oct 4, 2011 at 8:08 AM, Florian Haas flor...@hastexo.com wrote:

 On 2011-10-04 12:26, Max Williams wrote:
  Thanks Serge.
 
  How exactly do I tell pacemaker to use this RA? Is there an ‘import’
  command I need to run?

 No, you just drop the resource agent in the appropriate provider
 directory (/usr/lib/ocf/resource.d/provider/) and set its x bit. Then
 it immediately becomes available to the LRM and Pacemaker.


  Also, does this mean I need to change some of the paths in the RA?

 Yes it does. Well, really it means you're using an elderly version of
 the resource agents package which expects the heartbeat shellfuncs
 library in a different location than the new RA.


  [root@host001 ~]# ocf-tester -n named
  /usr/lib/ocf/resource.d/heartbeat/named
 
  Beginning tests for /usr/lib/ocf/resource.d/heartbeat/named...
 
  /usr/lib/ocf/resource.d/heartbeat/named: line 16:
  /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs: No such file or directory
 
  * rc=1: Your agent has too restrictive permissions: should be 755
 
  /usr/lib/ocf/resource.d/heartbeat/named: line 16:
  /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs: No such file or directory

 If you change the path in that line 16 to
 /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs (note leading dot in
 filename and resource.d, not lib), then the RA should be able to find
 the shell function library.

 What's a much better idea though is of course to move to a recent
 version of resource-agents, but that may or may not be an option for you.

 Hope this helps.
 Cheers,
 Florian

 --
 Need help with High Availability?
 http://www.hastexo.com/now





 


 --
 Serge Dubrouski.

 

 





-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Reloading a resource after a failover

2011-10-04 Thread Serge Dubrouski
Please also notice that the named RA in git is a little bit broken. You need
to apply a patch that I recently sent, or wait a little bit till it's applied
to GIT.

On Tue, Oct 4, 2011 at 8:08 AM, Florian Haas flor...@hastexo.com wrote:

 On 2011-10-04 12:26, Max Williams wrote:
  Thanks Serge.
 
  How exactly do I tell pacemaker to use this RA? Is there an ‘import’
  command I need to run?

 No, you just drop the resource agent in the appropriate provider
 directory (/usr/lib/ocf/resource.d/provider/) and set its x bit. Then
 it immediately becomes available to the LRM and Pacemaker.

  Also, does this mean I need to change some of the paths in the RA?

 Yes it does. Well, really it means you're using an elderly version of
 the resource agents package which expects the heartbeat shellfuncs
 library in a different location than the new RA.

  [root@host001 ~]# ocf-tester -n named
  /usr/lib/ocf/resource.d/heartbeat/named
 
  Beginning tests for /usr/lib/ocf/resource.d/heartbeat/named...
 
  /usr/lib/ocf/resource.d/heartbeat/named: line 16:
  /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs: No such file or directory
 
  * rc=1: Your agent has too restrictive permissions: should be 755
 
  /usr/lib/ocf/resource.d/heartbeat/named: line 16:
  /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs: No such file or directory

 If you change the path in that line 16 to
 /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs (note leading dot in
 filename and resource.d, not lib), then the RA should be able to find
 the shell function library.

 What's a much better idea though is of course to move to a recent
 version of resource-agents, but that may or may not be an option for you.

 Hope this helps.
 Cheers,
 Florian

 --
 Need help with High Availability?
 http://www.hastexo.com/now






-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Trouble with ordering

2011-10-03 Thread Serge Dubrouski
On Mon, Oct 3, 2011 at 7:16 AM, Gerald Vogt v...@spamcop.net wrote:

 On 03.10.11 03:47, Serge Dubrouski wrote:
  As I wrote before: you should be able to test this easily by sending
 a
  STOP signal to the named process. At least in this situation I see
 that
  the rndc stop doesn't return before those 60s.
 
 
   Indeed you are right. Thanks for catching that. Attached is the patch that
   fixes this issue. It also makes the rndc and host commands configurable.
 
  Please take a look at the patch and if it's all right I'll ask pacemaker
  team to push it into git.

 The 2 extra <parameters> opening tags are too much.


Thanks.



 In addition, I don't think you can check for name resolution of
 localhost.localdomain. localhost.localdomain is set in /etc/hosts
 but the DNS server doesn't know about domain localdomain...


Usually both localhost and localhost.localdomain are in /etc/hosts, but that
doesn't matter because BIND never uses it. The difference is that for
localhost, BIND will try to resolve localhost.yourdomain, taking
yourdomain from /etc/resolv.conf, and for localhost.localdomain, BIND will
try to resolve localdomain. I'm not sure which is more standard: to have
localhost in your zone file or to support localdomain as a zone. But for now
let's go your way with the simple localhost.

 Otherwise, the first test with a stopped named works nicely. The process

is killed in time and restarted... I'll have to do a few more tests.


Thanks again.


 -Gerald

 Patch against your patched version:

 *** named.ORIG  2011-10-03 14:28:07.0 +0200
 --- named   2011-10-03 15:03:34.0 +0200
 ***************
 *** 25,31 ****
  OCF_RESKEY_named_rootdir_default=
  OCF_RESKEY_named_options_default=
  OCF_RESKEY_named_keytab_file_default=
 ! OCF_RESKEY_monitor_request_default=localhost.localdomain
  OCF_RESKEY_monitor_response_default=127.0.0.1
  OCF_RESKEY_monitor_ip_default=127.0.0.1

 --- 25,31 ----
  OCF_RESKEY_named_rootdir_default=
  OCF_RESKEY_named_options_default=
  OCF_RESKEY_named_keytab_file_default=
 ! OCF_RESKEY_monitor_request_default=localhost
  OCF_RESKEY_monitor_response_default=127.0.0.1
  OCF_RESKEY_monitor_ip_default=127.0.0.1

 ***************
 *** 80,86 ****
  <content type="string" default="${OCF_RESKEY_named_default}" />
  </parameter>

 - <parameters>
  <parameter name="rndc" unique="0" required="0">
  <longdesc lang="en">
  Path to the rndc command.
 --- 80,85 ----
 ***************
 *** 89,95 ****
  <content type="string" default="${OCF_RESKEY_rndc_default}" />
  </parameter>

 - <parameters>
  <parameter name="host" unique="0" required="0">
  <longdesc lang="en">
  Path to the host command.
 --- 88,93 ----





-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Trouble with ordering

2011-10-02 Thread Serge Dubrouski
On Sun, Oct 2, 2011 at 12:31 AM, Gerald Vogt v...@spamcop.net wrote:

 On 02.10.11 03:18, Serge Dubrouski wrote:
  1. You expect rndc and host to be in $PATH. At the same time the path
 to
  named can be configured. I think consequently, the same should apply
 to
  rndc and host as they are bind utils.
 
  On our CentOS servers we run the latest version of bind, compiled
 from
  source and installed in a custom path which is added in /etc/profile.
  For some reason /etc/profile doesn't seem to apply to the ocf scripts
  thus the script doesn't find rndc or host unless I extend PATH
 manually
  at the beginning of the script.
 
 
   We had some discussion around this and finally decided to leave it up
   to the sysadmin to make sure that both tools are available in PATH. One
   can always create a couple of symlinks to cover it.

 But isn't it inconsistent that you can set the named path as a parameter
 but not rndc or host? named, rndc, and host all come out of a BIND
 installation and they all run on the same host...

  2. In the stop function you call rndc stop to stop the daemon.
  However, if the daemon hangs, rndc will hang. Thus pacemaker runs
 into a
  timeout and kills the ocf script, leading to a failed stop.
 
 
   You didn't read the code carefully again. Yes, it does exactly what you
   want, or at least it's supposed to:
 
   if ! $RNDC stop >/dev/null; then

 The problem is your script never gets beyond this line. rndc tries to
 contact named, which is hanging. I don't know what timeout rndc has
 exactly but at least on our CentOS installation it doesn't time out
 within 60s.

 60s is currently the timeout we have set in the primitive declaration.
 Thus after 60s pacemaker assumes your script is hanging and kills your
 script with TERM.

 As I wrote before: you should be able to test this easily by sending a
 STOP signal to the named process. At least in this situation I see that
 the rndc stop doesn't return before those 60s.


Indeed you are right. Thanks for catching that. Attached is the patch that fixes
this issue. It also makes the rndc and host commands configurable.

Please take a look at the patch and if it's all right I'll ask pacemaker
team to push it into git.

Thanks again.



  kill `cat ${OCF_RESKEY_named_pidfile}`
  fi
 
   if [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
# Allow 2/3 of the action timeout for the orderly shutdown
# (The origin unit is ms, hence the conversion)
timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
  else
timeout=20
  fi
 
  while named_status ; do
  if [ $timeout -ge ${OCF_RESKEY_named_stop_timeout} ]; then
  break
  else
  sleep 1
  timeout=$((timeout++))
  fi
  done
 
   #If still up
   if named_status 2>&1; then
       ocf_log err "named is still up! Killing";
       kill -9 `cat ${OCF_RESKEY_named_pidfile}`
   fi
 
 
  I think the ocf script should have its own timeout and abort the rndc
  call if it takes too long and then try to kill the server.
 
 
  See above.
 
 
 
  To test send a STOP signal to named and wait...

 Gerald





-- 
Serge Dubrouski.
diff --git a/heartbeat/named b/heartbeat/named
index 8d15db6..e115eaf 100755
--- a/heartbeat/named
+++ b/heartbeat/named
@@ -15,23 +15,23 @@
 : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
 . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
 
-# Used binaries
-RNDC=rndc
-HOST=host
-
 #Defaults
 OCF_RESKEY_named_default=/usr/sbin/named
+OCF_RESKEY_rndc_default=/usr/sbin/rndc
+OCF_RESKEY_host_default=/usr/bin/host
 OCF_RESKEY_named_user_default=named
 OCF_RESKEY_named_config_default=
 OCF_RESKEY_named_pidfile_default=/var/run/named/named.pid
 OCF_RESKEY_named_rootdir_default=
 OCF_RESKEY_named_options_default=
 OCF_RESKEY_named_keytab_file_default=
-OCF_RESKEY_monitor_request_default=localhost
+OCF_RESKEY_monitor_request_default=localhost.localdomain
 OCF_RESKEY_monitor_response_default=127.0.0.1
 OCF_RESKEY_monitor_ip_default=127.0.0.1
 
 : ${OCF_RESKEY_named=${OCF_RESKEY_named_default}}
+: ${OCF_RESKEY_rndc=${OCF_RESKEY_rndc_default}}
+: ${OCF_RESKEY_host=${OCF_RESKEY_host_default}}
 : ${OCF_RESKEY_named_user=${OCF_RESKEY_named_user_default}}
 : ${OCF_RESKEY_named_config=${OCF_RESKEY_named_config_default}}
 : ${OCF_RESKEY_named_pidfile=${OCF_RESKEY_named_pidfile_default}}
@@ -80,6 +80,24 @@ Path to the named command.
 <content type="string" default="${OCF_RESKEY_named_default}" />
 </parameter>
 
+<parameters>
+<parameter name="rndc" unique="0" required="0">
+<longdesc lang="en">
+Path to the rndc command

Re: [Pacemaker] Trouble with ordering

2011-10-01 Thread Serge Dubrouski
On Sat, Oct 1, 2011 at 2:49 PM, Gerald Vogt v...@spamcop.net wrote:

 On 01.10.11 04:53, Serge Dubrouski wrote:
  Technically, I don't want the cluster to control the service in the
  meaning of starting and stopping. The cluster controls the IP
 addresses
  and moves them between nodes. The dns service resource is supposed to
  provide a check that the dns service is working on the node and
 migrate
  the service and most important the IP address if it becomes
  unresponsive.
 
  I didn't look at the concept of clones, yet. Maybe I took a
 completely
  wrong approach to what I am trying to do.
 
 
   I think that clones are a really good solution for this situation. You can
   configure BIND as a clone service with different configurations, though.
   One node will be master, another slave. You can also have a floating VIP
   tied to any of the nodes but collocated with the running BIND. If BIND
   dies for some reason, Pacemaker will move your IP to the surviving node.
   You can add sending additional alarms.

 Thanks a lot! Just learned a couple of things.


I'm glad it helped.



 I have removed my own script. Installed yours and set it up. Configured
 a clone.

 primitive bind ocf:heartbeat:named ...
 clone bind-clone bind

 Then bind is kept running on all nodes and is only shutdown if it fails.
 If necessary named is restarted. Great.

 Then I colocate my ip resources with the clone:

 colocation ns1-ip-bind inf: nsi1-ip bind-clone
 colocation ns2-ip-bind inf: nsi2-ip bind-clone

 Thus the service IP addresses only run on nodes where bind is active. If
 bind fails on a node the ip address is moved.

 Two notes (regarding the latest version on github):

 1. You expect rndc and host to be in $PATH. At the same time the path to
 named can be configured. I think consequently, the same should apply to
 rndc and host as they are bind utils.

 On our CentOS servers we run the latest version of bind, compiled from
 source and installed in a custom path which is added in /etc/profile.
 For some reason /etc/profile doesn't seem to apply to the ocf scripts
 thus the script doesn't find rndc or host unless I extend PATH manually
 at the beginning of the script.


We had some discussion around this and finally decided to leave it up to
the sysadmin to make sure that both tools are available in PATH. One
can always create a couple of symlinks to cover it.



 2. In the stop function you call rndc stop to stop the daemon.
 However, if the daemon hangs, rndc will hang. Thus pacemaker runs into a
 timeout and kills the ocf script, leading to a failed stop.


You didn't read the code carefully again. Yes, it does exactly what you want,
or at least it's supposed to:

if ! $RNDC stop >/dev/null; then
kill `cat ${OCF_RESKEY_named_pidfile}`
fi

if [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
  # Allow 2/3 of the action timeout for the orderly shutdown
  # (The origin unit is ms, hence the conversion)
  timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
else
  timeout=20
fi

while named_status ; do
if [ $timeout -ge ${OCF_RESKEY_named_stop_timeout} ]; then
break
else
sleep 1
timeout=$((timeout++))
fi
done

#If still up
if named_status 2>&1; then
    ocf_log err "named is still up! Killing";
    kill -9 `cat ${OCF_RESKEY_named_pidfile}`
fi


 I think the ocf script should have its own timeout and abort the rndc
 call if it takes too long and then try to kill the server.


See above.



 To test send a STOP signal to named and wait...


 But otherwise, great script.

 Thanks!

 Gerald









-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Trouble with ordering

2011-09-30 Thread Serge Dubrouski
An OCF script for BIND was recently added to resource-agents on GitHub.
Could you please try to use that one?
On Sep 30, 2011 2:09 AM, Gerald Vogt v...@spamcop.net wrote:
 Hi!

 I am running a cluster with 3 nodes. These nodes provide dns service.
 The purpose of the cluster is to have our two dns service ip addresses
 online at all times. I use IPaddr2 and that part works.

 Now I try to extend our setup to check the dns service itself. So far,
 if a dns server on any node stops or hangs the cluster won't notice.
 Thus, I wrote a custom ocf script to check whether the dns service on
 a node is operational (i.e. if the dns server is listening on the ip
 address and whether it responds to a dns request).

 All cluster nodes are slave dns servers, therefore the dns server
 process is running at all times to get zone transfers from the dns
 master.

 Obviously, the dns service resource must be colocated with the IP
 address resource. However, as the dns server is running at all times,
 the dns service resource must be started or stopped after the ip
 address. This leads me to something like this:

 primitive ns1-ip ocf:heartbeat:IPaddr2 ...
 primitive ns1-dns ocf:custom:dns op monitor interval=30s

 colocation dns-ip1 inf: ns1-dns ns1-ip
 order ns1-ip-dns inf: ns1-ip ns1-dns symmetrical=false

 Problem 1: it seems as if the order constraint does not wait for an
 operation on the first resource to finish before it starts the
 operation on the second. When I migrate an IP address to another node
 the stop operation on ns1-dns will fail because the ip address is
 still active on the network interface. I have worked around this by
 checking for the IP address on the interface in the stop part of my
 dns script and sleeping 5 seconds if it is still there before checking
 again and continuing.

 Shouldn't the stop on ns1-ip first finish before the node initiates
 the stop on ns1-dns?

 Problem 2: if the dns service fails, e.g. hangs, the monitor operation
 fails. Thus, the cluster wants to migrate the ip address and service
 to another node. However, it first initiates a stop on ns1-dns and
 then on ns1-ip.

 What I need is ns1-ip to stop before ns1-dns. But this seems
 impossible to configure. The order constraint only says what operation
 is executed on ns1-dns depending on the status of ns1-ip. It says what
 happens after something. It cannot say what happens before something.
 Is that correct? Or am I missing a configuration option?

 Thanks,

 Gerald

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Trouble with ordering

2011-09-30 Thread Serge Dubrouski
Maybe you didn't look carefully, but that script does exactly that: it
monitors both the process and the service. Also, if you want the cluster to
control your service, it has to be able to start and stop it. You can
configure your service as a clone and it'll be up on several nodes.
But if you don't want to use it, you don't have to.
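
One more note on the ordering questions below: with symmetrical=false your
constraint only covers the start direction, so the two stop operations run
unordered. The stop sequence you want can be expressed explicitly with an
asymmetric constraint on the stop actions. A minimal sketch in crm syntax,
using your resource names (the constraint id is invented):

order ns1-ip-dns-stop inf: ns1-ip:stop ns1-dns:stop symmetrical=false

This makes pacemaker finish stopping ns1-ip before it stops ns1-dns,
independently of the start order.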
On Sep 30, 2011 6:02 AM, Gerald Vogt v...@spamcop.net wrote:
 On 30.09.11 13:41, Serge Dubrouski wrote:
 An OCF script for bind was recently added to resource-agents on github.
 Could you please try to use that one?

 Which script where?

 The one you have posted here:

http://lists.linux-ha.org/pipermail/linux-ha-dev/attachments/20110712/e1a1e792/attachment.obj

 doesn't do what I need. I don't want to start or stop the name server.
 The name server (bind process) is supposed to be running all the time to
 get updates from the master.

 The script also doesn't check whether the process is working or not. The
 process could be running but not responding.

 My script tests whether bind is listening on the resource ip address and
 whether it resolves one of our domains. If it does, it's O.K. If not, it
 fails.

 The script is working properly. I just need to tell pacemaker that
 ns1-ip should always be stopped before ns1-dns. That's not the case if
 ns1-dns monitor returns failure.

 Thanks!

 Gerald

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Trouble with ordering

2011-09-30 Thread Serge Dubrouski
On Fri, Sep 30, 2011 at 9:20 AM, Gerald Vogt v...@spamcop.net wrote:

 On 30.09.11 15:03, Serge Dubrouski wrote:
  May be you didn't look carefully but that script does exactly that, it
  monitors process and service. Also if you want cluster to control your
  service, it has to be able to start and stop it. You can configure your
  service as a clone and it'll be up on several nodes.
  But if you don't want to use it you don't have to.

 You are right. I did not look at the monitor function. I checked the
 status function and thought it would be in there if it checked it.


That's one of the main differences between LSB and OCF RAs.


 Technically, I don't want the cluster to control the service in the
 meaning of starting and stopping. The cluster controls the IP addresses
 and moves them between nodes. The dns service resource is supposed to
 provide a check that the dns service is working on the node and migrate
 the service and most important the IP address if it becomes unresponsive.

 I didn't look at the concept of clones, yet. Maybe I took a completely
 wrong approach to what I am trying to do.


I think that clones are a really good solution for this situation. You can
configure BIND as a clone service, with different configurations though: one
node will be master, another slave. You can also have a floating VIP tied to
any of the nodes but colocated with the running BIND. If BIND dies for some
reason, pacemaker will move your IP to the surviving node. You can also add
sending of additional alarms.
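
A minimal sketch of that layout in crm syntax (the resource names and the
VIP address are invented for illustration):

primitive dns ocf:heartbeat:named \
    op monitor interval=30s timeout=60s
primitive dns-vip ocf:heartbeat:IPaddr2 \
    params ip=192.0.2.53 cidr_netmask=24
clone dns-clone dns
colocation vip-with-dns inf: dns-vip dns-clone
order dns-before-vip inf: dns-clone dns-vip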


 The cluster until recently only operated the two DNS service IP
 addresses ns1-ip and ns2-ip for our LAN. Three nodes are used to provide
 redundancy in case one node fails. This way our two DNS server IPs are
 active at all times.

 Bind is running on all three nodes. Bind is configured to scan for
 interface changes every 60s. The three nodes are configured as slave
 servers, getting notified of zone updates by the master server.


Here you have several options. You can either schedule a reload operation
for the named RA in the cluster, or you can try to create an order
constraint like somebody else suggested:

order named-service-clone-after-Cluster_IP inf: Cluster_IP:start
Named_Service:reload



 This works in regard to node failures and similar. If a node crashes the
 IP address is moved to another node.

 The problem is if the node is still up but the named process becomes
 unresponsive and is hanging. The cluster wouldn't notice this.


With the named RA it will.
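
A hedged sketch of what that deeper check can look like (parameter names as
in the named RA; the values shown are, I believe, its defaults):

primitive dns ocf:heartbeat:named \
    params monitor_request=localhost \
           monitor_response=127.0.0.1 \
           monitor_ip=127.0.0.1 \
    op monitor interval=30s timeout=60s

The monitor action resolves monitor_request against monitor_ip and checks
for monitor_response in the answer, so a hung named fails the check even
while the process is still alive.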



 If I understand your script correctly, it starts and stops the named
 process. If I do this, the node which is not running the dns server
 won't get zone updates, i.e. if it starts it has outdated zone files.


As I said earlier, you can configure it as a clone. In this case the
cluster will start it on all nodes and will do the monitoring.


 Now if the master server is accessible and running at the time of start
 the dns server gets updated quickly. The trouble is if the master is
 down, too, the dns server will provide outdated dns information until
 the master is running again.




 That seems to me the problem when the bind process is started and
 stopped on the nodes and that I was trying to avoid. IMHO the named
 process can be running all the time, thus getting zone notifies in the
 usual manner.


It depends on how you populate your zones. If you put your zone and config
files on a shared device (DRBD or so), then you can fail them over along
with the IP and restart BIND with each failover. If you want to use
master/slave BIND replication, then you obviously need to have them both
running at all times, and then you need to use clones.


 But maybe I am not getting what clones do. I think so far I didn't quite
 get what they do exactly from the guides in respect to what I am trying
 to achieve.

 Maybe you can give me a hint how I would achieve this with a clone,
 running named on all nodes at all times and moving the service IP
 addresses between nodes in case a node or dns server fails or hangs.


 Thanks!

 Gerald

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Reloading a resource after a failover

2011-09-29 Thread Serge Dubrouski
What kind of RA do you use? The LSB one doesn't support reload; the OCF one
does. You need to get the OCF RA from github.

On Thu, Sep 29, 2011 at 3:04 AM, Max Williams max.willi...@betfair.com wrote:

  Yes this is what I would like to do. Ideally have named as a clone and
 then have an order like this:

 crm(live)configure# order named-service-clone-after-Cluster_IP inf:
 Cluster_IP:start Named_Service:reload

 But it gives this error:

 ERROR: bad resource action/instance definition: Named_Service:reload


 Is there a way to achieve this?

 Can I edit an RA file so that it takes reload as an action? If so, which
 file?

 Or configure pacemaker to run an external command when a resource fails
 over or moves?


 Many thanks,

 Max


 *From:* Serge Dubrouski [mailto:serge...@gmail.com]
 *Sent:* 28 September 2011 18:17
 *To:* The Pacemaker cluster resource manager
 *Subject:* Re: [Pacemaker] Reloading a resource after a failover


 Put bind itself under pacemaker control. You can use the LSB RA or the OCF
 RA that I recently created.

 On Sep 28, 2011 10:46 AM, Max Williams max.willi...@betfair.com wrote:
  Hi,
  I have a pair of clustered DNS servers with a virtual IP (VIP)
 configured. The problem is that when the VIP fails over, named on the new
 host of the VIP will not listen on port 53/UDP of the VIP until it is
 reloaded (I think this is because this daemon uses UDP, not TCP).
 
  So I'd like to be able to reload named after a failover of the VIP
 address. Is this possible?
 
  I could do it by configuring named as a cloned resource and then
 configuring an order so that it is restarted when the VIP fails over or
 moves but I would much rather have named reload instead of restart.
 
  Any ideas? I'd rather not have to resort to a wrapper script or anything
 like that.
 
  Thanks,
  Max
 
  
 
  
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

 

 

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Reloading a resource after a failover

2011-09-29 Thread Serge Dubrouski
Here:
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/named

Let me know how it works for you.
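
In case it helps, a hedged sketch of dropping the standalone RA file into
place by hand (the paths and the raw URL are assumptions; adjust for your
distribution):

wget -O /usr/lib/ocf/resource.d/heartbeat/named \
  https://raw.github.com/ClusterLabs/resource-agents/master/heartbeat/named
chmod 755 /usr/lib/ocf/resource.d/heartbeat/named
crm ra info ocf:heartbeat:named   # verify the cluster can see it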
On Sep 29, 2011 8:25 AM, Max Williams max.willi...@betfair.com wrote:
 Yes I was using the LSB RA. Can you give me a link to the OCF RA on
github?
 Thanks,
 Max

 From: Serge Dubrouski [mailto:serge...@gmail.com]
 Sent: 29 September 2011 13:19
 To: The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] Reloading a resource after a failover

 What kind of RA do you use? The LSB one doesn't support reload; the OCF one
does. You need to get the OCF RA from github.
 On Thu, Sep 29, 2011 at 3:04 AM, Max Williams max.willi...@betfair.com
mailto:max.willi...@betfair.com wrote:
 Yes this is what I would like to do. Ideally have named as a clone and
then have an order like this:
 crm(live)configure# order named-service-clone-after-Cluster_IP inf:
Cluster_IP:start Named_Service:reload
 But it gives this error:
 ERROR: bad resource action/instance definition: Named_Service:reload

 Is there a way to achieve this?
 Can I edit an RA file so that it takes reload as an action? If so, which
file?
 Or configure pacemaker to run an external command when a resource fails
over or moves?

 Many thanks,
 Max

 From: Serge Dubrouski [mailto:serge...@gmail.commailto:serge...@gmail.com
]
 Sent: 28 September 2011 18:17
 To: The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] Reloading a resource after a failover


 Put bind itself under pacemaker control. You can use the LSB RA or the OCF
RA that I recently created.
 On Sep 28, 2011 10:46 AM, Max Williams max.willi...@betfair.commailto:
max.willi...@betfair.com wrote:
 Hi,
 I have a pair of clustered DNS servers with a virtual IP (VIP)
configured. The problem is that when the VIP fails over, named on the new
host of the VIP will not listen on port 53/UDP of the VIP until it is
reloaded (I think this is because this daemon uses UDP, not TCP).

 So I'd like to be able to reload named after a failover of the VIP
address. Is this possible?

 I could do it by configuring named as a cloned resource and then
configuring an order so that it is restarted when the VIP fails over or
moves but I would much rather have named reload instead of restart.

 Any ideas? I'd rather not have to resort to a wrapper script or anything
like that.

 Thanks,
 Max

 

 

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.orgmailto:
Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

 

 

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.orgmailto:
Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



 --
 Serge Dubrouski.

 

 
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Reloading a resource after a failover

2011-09-28 Thread Serge Dubrouski
Put bind itself under pacemaker control. You can use the LSB RA or the OCF
RA that I recently created.
 On Sep 28, 2011 10:46 AM, Max Williams max.willi...@betfair.com wrote:
 Hi,
 I have a pair of clustered DNS servers with a virtual IP (VIP) configured.
The problem is that when the VIP fails over, named on the new host of the
VIP will not listen on port 53/UDP of the VIP until it is reloaded (I think
this is because this daemon uses UDP, not TCP).

 So I'd like to be able to reload named after a failover of the VIP
address. Is this possible?

 I could do it by configuring named as a cloned resource and then
configuring an order so that it is restarted when the VIP fails over or
moves but I would much rather have named reload instead of restart.

 Any ideas? I'd rather not have to resort to a wrapper script or anything
like that.

 Thanks,
 Max

 

 

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [RPMs] clusterlabs.org and epel-6 ?

2011-07-11 Thread Serge Dubrouski
Since the question was asked: is it possible to update those epel repos (at
least epel-5, both i386 and x86_64) with the latest build (3.9.2) of
resource-agents? The current state is inconsistent: some repos have 1.0.1,
some 1.0.4.

On Mon, Jul 11, 2011 at 8:01 PM, Andrew Beekhof and...@beekhof.net wrote:

 On Tue, Jul 12, 2011 at 11:57 AM, Thomas Guthmann
 tguthm...@iseek.com.au wrote:
  Hi,
 
  On clusterlabs.org, there are 2 repositories epel-5 and epel-6.

 How many nano-seconds has centos-6 been out for?

  However the
  epel-6 repository [1] is empty. Do you guys plan to use it or should we
  build our own RPMs because it's never going to happen ?
 
  I believe, like many, with (finally) Centos 6 out we would like to start
 to
  upgrade our clusters to if possible corosync 1.3.x and Pacemaker 1.1.x
 
  So, can you tell us if el6 RPMs are on the roadmap or not ? And if yes,
 an
  approx. date when they could be released.

 I don't believe there is any need for additional RPMs.  Pacemaker
 should already be in CentOS6

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Pacemaker and LDAP (389 Directory Service)

2011-06-28 Thread Serge Dubrouski
I have a similar setup running with this simple configuration:

node mexico
node washington
primitive LDAP lsb:openldap \
op monitor interval=30s
primitive masterIP ocf:heartbeat:IPaddr \
params ip=XXX.XXX.XXX.XXX cidr_netmask=21 \
meta migration-threshold=3
primitive pingd ocf:pacemaker:ping \
params dampen=5s multiplier=1000 host_list=XXX.XXX.XXX.XXX \
op monitor interval=20s timeout=15s
clone LDAP-clone LDAP \
meta target-role=Started is-managed=true
clone pingd-clone pingd \
meta target-role=Started
location cli-prefer-masterIP masterIP \
rule $id=cli-prefer-rule-masterIP inf: #uname eq washington
location connected masterIP \
rule $id=connected-rule -inf: not_defined pingd or pingd lte 0
location primNode masterIP \
rule $id=prefered_primNode 500: #uname eq washington
colocation IP_with_LDAP inf: masterIP LDAP-clone

Not on Amazon cloud but on real hardware, though. You can ignore the pingd
part.

On Mon, Jun 27, 2011 at 4:56 PM, veghead s...@studyblue.com wrote:

 Serge Dubrouski sergeyfd@... writes:
  On Mon, Jun 27, 2011 at 3:33 PM, veghead sean at studyblue.com
 wrote:
  If I remove the co-location, won't the elastic_ip resource just stay
 where it
  is? Regardless of what happens to LDAP?
 
  Right. That's why I think that you don't really want to do it. You have
  to make sure that your IP is up where your LDAP is up.

 Okay. So I took a step and revamped the configuration to test the
 elastic_ip
 less frequently and with a long timeout. I committed the changes, but crm
 status doesn't reflect the resources in question.

 Here's the new config:

 ---snip---
 # crm configure show
 node $id=d2b294cf-328f-4481-aa2f-cc7b553e6cde ldap1.example.ec2
 node $id=e2a2e42e-1644-4f7d-8e54-71e1f7531e08 ldap2.example.ec2
 primitive elastic_ip lsb:elastic-ip \
op monitor interval=30 timeout=300 on-fail=ignore
 requires=nothing
 primitive ldap lsb:dirsrv \
op monitor interval=15s on-fail=standby requires=nothing
 clone ldap-clone ldap
 colocation ldap-with-eip inf: elastic_ip ldap-clone
 order ldap-after-eip inf: elastic_ip ldap-clone
 property $id=cib-bootstrap-options \
dc-version=1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87 \
cluster-infrastructure=Heartbeat \
stonith-enabled=false \
no-quorum-policy=ignore \
stop-all-resources=true
 rsc_defaults $id=rsc-options \
resource-stickiness=100
 ---snip---

 And here's the output from crm status:

 ---snip---
 # crm status
 
 Last updated: Mon Jun 27 18:50:14 2011
 Stack: Heartbeat
 Current DC: ldap2.studyblue.ec2 (e2a2e42e-1644-4f7d-8e54-71e1f7531e08) -
 partition with quorum
 Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
 2 Nodes configured, unknown expected votes
 2 Resources configured.
 

 Online: [ ldap1.example.ec2 ldap2.example.ec2 ]
 ---snip---

 I restarted the nodes one at a time - first I restarted ldap2, then I
 restarted
 ldap1. When ldap1 went down, ldap2 stopped the ldap resource and didn't
 make any
 attempt to start the elastic_ip resource:

 ---snip---
 pengine: [12910]: notice: unpack_config: On loss of CCM Quorum: Ignore
 pengine: [12910]: info: unpack_config: Node scores: 'red' = -INFINITY,
 'yellow'
 = 0, 'green' = 0
 pengine: [12910]: info: determine_online_status: Node ldap2.example.ec2 is
 online
 pengine: [12910]: notice: native_print: elastic_ip   (lsb:elastic-ip):
 Stopped
 pengine: [12910]: notice: clone_print:  Clone Set: ldap-clone
 pengine: [12910]: notice: short_print:  Stopped: [ ldap:0 ldap:1 ]
 pengine: [12910]: notice: LogActions: Leave   resource elastic_ip
 (Stopped)
 pengine: [12910]: notice: LogActions: Leave   resource ldap:0(Stopped)
 pengine: [12910]: notice: LogActions: Leave   resource ldap:1(Stopped)
 ---snip---

 After heartbeat/pacemaker came back up on ldap1, it terminated the ldap
 service
 on ldap1. Now I'm just confused.


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Pacemaker and LDAP (389 Directory Service)

2011-06-27 Thread Serge Dubrouski
If you want to make your LDAP independent from IP just remove your
collocation:

colocation ldap-with-eip inf: elastic_ip ldap-clone

But I'd rather try to find out why monitoring for the IP fails. Maybe it
just needs an increased timeout on the monitor operation, though it looks
like you've already increased it. What's in your log files when that
monitor fails?
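
A couple of hedged starting points (the log location is an assumption;
adjust to wherever heartbeat logs on your system):

grep elastic_ip /var/log/messages   # lrmd/pengine entries for the failed op
crm resource failcount elastic_ip show ldap1.example.ec2
crm_resource --cleanup --resource elastic_ip   # clear the failure once fixed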

On Mon, Jun 27, 2011 at 9:39 AM, veghead s...@studyblue.com wrote:

 veghead sean@... writes:
  Pair of LDAP servers running 389 (formerly Fedora DS) in
  high availability using Pacemaker with a floating IP.
  In addition, 389 supports multi-master replication,
  where all changes on one node are automatically
  replicated on one or more other nodes.

 I'm so close, but I'm still having issues. I'm running these on EC2 using
 an
 ElasticIP as the floating ip. Unfortunately, I have found that requests
 for
 the status of the ElasticIP occasionally fail for no apparent reason, even
 though the ElasticIP is actually working fine. Once they fail, that
 triggers a
 failover and creates a mess.

 What I'd like to do is:

 * Run LDAP service on both nodes
 * Ignore the status of the ElasticIP resource and only trigger a fail-over
 when
 the LDAP service fails.

 I feel like my config is close, but the cluster keeps wanting to stop the
 resources.

 Here's my current config:

 ---snip---
 primitive elastic_ip lsb:elastic-ip \
op monitor interval=0 timeout=300 on-fail=ignore
 requires=nothing
 primitive ldap lsb:dirsrv \
op monitor interval=15s on-fail=standby requires=nothing
 clone ldap-clone ldap
 colocation ldap-with-eip inf: elastic_ip ldap-clone
 order ldap-after-eip inf: elastic_ip ldap-clone
 property $id=cib-bootstrap-options \
dc-version=1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87 \
cluster-infrastructure=Heartbeat \
stonith-enabled=false \
no-quorum-policy=ignore \
stop-all-resources=true
 rsc_defaults $id=rsc-options \
resource-stickiness=100
 ---snip---

 Any suggestions as to what I'm doing wrong?



 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Pacemaker and LDAP (389 Directory Service)

2011-06-27 Thread Serge Dubrouski
On Mon, Jun 27, 2011 at 3:33 PM, veghead s...@studyblue.com wrote:

 Sorry for the questions. Some days my brain is just slow. :)

 Serge Dubrouski sergeyfd@... writes:
  If you want to make your LDAP independent from IP just remove your
  collocation:colocation ldap-with-eip inf: elastic_ip ldap-clone

 Is that really what I want to do? I mean, I need the elastic ip assigned to
 ~one~ of the machines... And if LDAP fails on that machine, I need
 Pacemaker to
 start the Elastic IP on the other machine.

 If I remove the co-location, won't the elastic_ip resource just stay where
 it
 is? Regardless of what happens to LDAP?


Right. That's why I think that you don't really want to do it. You have to
make sure that your IP is up where your LDAP is up.



  But I'd rather try to find out why monitoring for the IP fails. Maybe
  it just needs an increased timeout on the monitor operation, though it
  looks like you've already increased it. What's in your log files
  when that monitor fails?

 Originally, I had the monitor on the elastic_ip resource set to 10 seconds.
 The
 error in the logs was:

 ---snip---
 pengine: [16980]: notice: unpack_rsc_op: Operation elastic_ip_monitor_0
 found
 resource elastic_ip active on ldap1.example.ec2
 pengine: [16980]: WARN: unpack_rsc_op: Processing failed op
 elastic_ip_monitor_1 on ldap1.example.ec2: unknown exec error (-2)
 pengine: [16980]: WARN: unpack_rsc_op: Processing failed op
 elastic_ip_stop_0 on
 ldap1.example.ec2: unknown exec error (-2)
 pengine: [16980]: info: native_add_running: resource elastic_ip isnt
 managed


Why is it unmanaged?


 pengine: [16980]: notice: unpack_rsc_op: Operation ldap:1_monitor_0 found
 resource ldap:1 active on ldap2.example.ec2
 pengine: [16980]: WARN: unpack_rsc_op: Processing failed op
 elastic_ip_start_0
 on ldap2.example.ec2: unknown exec error (-2)
 pengine: [16980]: notice: native_print: elastic_ip   (lsb:elastic-ip):
 Started ldap1.example.ec2 (unmanaged) FAILED
 pengine: [16980]: notice: clone_print:  Clone Set: ldap-clone
 pengine: [16980]: notice: short_print:  Stopped: [ ldap:0 ldap:1 ]
 pengine: [16980]: info: get_failcount: elastic_ip has failed INFINITY times
 on
 ldap1.example.ec2
 pengine: [16980]: WARN: common_apply_stickiness: Forcing elastic_ip away
 from
 ldap1.example.ec2 after 100 failures (max=100)
 pengine: [16980]: info: get_failcount: elastic_ip has failed INFINITY times
 on
 ldap2.example.ec2
 pengine: [16980]: WARN: common_apply_stickiness: Forcing elastic_ip away
 from
 ldap2.example.ec2 after 100 failures (max=100)
 pengine: [16980]: info: native_color: Unmanaged resource elastic_ip
 allocated to
 'nowhere': failed
 pengine: [16980]: notice: RecurringOp:  Start recurring monitor (15s) for
 ldap:0
 on ldap1.example.ec2
 pengine: [16980]: notice: RecurringOp:  Start recurring monitor (15s) for
 ldap:1
 on ldap2.example.ec2
 pengine: [16980]: notice: LogActions: Leave   resource elastic_ip
 (Started unmanaged)
 pengine: [16980]: notice: LogActions: Start   ldap:0
 (ldap1.example.ec2)
 pengine: [16980]: notice: LogActions: Start   ldap:1
 (ldap2.example.ec2)
 ---snip---

 Now that I have set the monitor interval for the elastic_ip resource to
 0, it
 keeps thinking everything is either stopped or should be stopped:


You can't have the monitoring interval set to 0. It makes no sense and is
actually reserved for probes AFAIK.
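
A minimal sketch of the intended shape (the interval value is just an
example):

primitive elastic_ip lsb:elastic-ip \
    op monitor interval=30s timeout=300 on-fail=ignore requires=nothing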



 ---snip---
 pengine: [7287]: notice: unpack_rsc_op: Operation elastic_ip_monitor_0
 found
 resource elastic_ip active on ldap1.example.ec2
 pengine: [7287]: notice: unpack_rsc_op: Operation ldap:0_monitor_0 found
 resource ldap:0 active on ldap2.example.ec2
 pengine: [7287]: notice: native_print: elastic_ip (lsb:elastic-ip):
 Stopped
 pengine: [7287]: notice: clone_print:  Clone Set: ldap-clone
 pengine: [7287]: notice: short_print:  Stopped: [ ldap:0 ldap:1 ]
 pengine: [7287]: notice: LogActions: Leave   resource elastic_ip  (Stopped)
 pengine: [7287]: notice: LogActions: Leave   resource ldap:0  (Stopped)
 pengine: [7287]: notice: LogActions: Leave   resource ldap:1  (Stopped)
 ---snip---

 Very strange.


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] How to tell pacemaker to start exportfs after filesystem resource

2011-06-21 Thread Serge Dubrouski
2011/6/21 Aleksander Malaev amal...@alt-lan.ru

 Sure, I'm using an order constraint.
 But it seems that it doesn't check the monitor of the previously started
 resource.


Seems like you don't have an order constraint that would tie clone-share to
clone-fs, making it start sharing only after mounting.
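
A minimal sketch, using the resource names from the configuration quoted
below:

order ord-fs-share inf: clone-fs clone-share

Note that the order constraints in the posted configuration use a score of
0, which is only advisory; a mandatory inf: constraint actually enforces
the start sequence.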



 2011/6/21 Dejan Muhamedagic deja...@fastmail.fm

 Hi,

 On Mon, Jun 20, 2011 at 11:40:04PM +0400, Александр Малаев wrote:
  Hello,
 
  I have configured a pacemaker+ocfs2 cluster with shared storage connected
  by FC.
  Now I need to set up an NFS export in Active/Active mode, and I added all
  the needed resources and wrote the order of starting.
  But when a node is starting after reboot I get a race condition between
  the Filesystem resource and exportfs.
  Exportfs couldn't start because the ocfs2 mountpoint isn't mounted yet.
 
  How do I tell the exportfs resource to start once the filesystem resource
  is ready?

 Use the order constraint? Or did I miss something? You already
 have some order constraints defined, so you should be able to
 manage.

 Thanks,

 Dejan

  crm config is the following:
  node msk-nfs-gw01
  node msk-nfs-gw02
  primitive nfs-kernel-server lsb:nfs-kernel-server \
  op monitor interval=10s timeout=30s
  primitive ping ocf:pacemaker:ping \
  params host_list=10.236.22.35 multiplier=100 name=ping \
  op monitor interval=20s timeout=60s \
  op start interval=0 timeout=60s
  primitive portmap upstart:portmap \
  op monitor interval=10s timeout=30s
  primitive res-dlm ocf:pacemaker:controld \
  op monitor interval=120s
  primitive res-fs ocf:heartbeat:Filesystem \
  params device=/dev/mapper/mpath0 directory=/media/media0
  fstype=ocfs2 \
  op monitor interval=120s
  primitive res-nfs1-ip ocf:heartbeat:IPaddr2 \
  params ip=10.236.22.38 cidr_netmask=27 nic=bond0 \
  op monitor interval=30s
  primitive res-nfs2-ip ocf:heartbeat:IPaddr2 \
  params ip=10.236.22.39 cidr_netmask=27 nic=bond0 \
  op monitor interval=30s
  primitive res-o2cb ocf:pacemaker:o2cb \
  op monitor interval=120s
  primitive res-share ocf:heartbeat:exportfs \
  params directory=/media/media0/nfsroot/export1 clientspec=
  10.236.22.0/24 options=rw,async,no_subtree_check,no_root_squash
 fsid=1
  \
  op monitor interval=10s timeout=30s \
  op start interval=10 timeout=40s \
  op stop interval=0 timeout=40s
  primitive st-null stonith:null \
  params hostlist=msk-nfs-gw01 msk-nfs-gw02
  group nfs portmap nfs-kernel-server
  clone clone-dlm res-dlm \
  meta globally-unique=false interleave=true
  clone clone-fs res-fs \
  meta globally-unique=false interleave=true
  clone clone-nfs nfs \
  meta globally-unique=false interleave=true
  clone clone-o2cb res-o2cb \
  meta globally-unique=false interleave=true
  clone clone-share res-share \
  meta globally-unique=false interleave=true
  clone fencing st-null
  clone ping_clone ping \
  meta globally-unique=false
  location nfs1-ip-on-nfs1 res-nfs1-ip 50: msk-nfs-gw01
  location nfs2-ip-on-nfs2 res-nfs2-ip 50: msk-nfs-gw02
  colocation col-fs-o2cb inf: clone-fs clone-o2cb
  colocation col-nfs-fs inf: clone-nfs clone-fs
  colocation col-o2cb-dlm inf: clone-o2cb clone-dlm
  colocation col-share-nfs inf: clone-share clone-nfs
  order ord-dlm-o2cb 0: clone-dlm clone-o2cb
  order ord-nfs-share 0: clone-nfs clone-share
  order ord-o2cb-fs 0: clone-o2cb clone-fs
  order ord-o2cb-nfs 0: clone-fs clone-nfs
  order ord-share-nfs1 0: clone-share res-nfs1-ip
  order ord-share-nfs2 0: clone-share res-nfs2-ip
  property $id=cib-bootstrap-options \
  dc-version=1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3 \
  cluster-infrastructure=openais \
  expected-quorum-votes=2 \
  stonith-enabled=true \
  no-quorum-policy=ignore \
  last-lrm-refresh=1308040111
 
  --
  Best Regards
  Alexander Malaev

  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started:
 http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




 --
 Best regards,
 Alexander Malaev
 +7-962-938-9323

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org

Re: [Pacemaker] strange monitor behaviour

2011-06-07 Thread Serge Dubrouski
This question pops up over and over again. Pacemaker has to make sure that
your resources aren't up anywhere in the cluster before starting them up on
the designated nodes. That means that it has to be able to run
status/monitor operations for all configured resources on all configured
nodes. You can't just add a 3rd quorum node into the cluster; you have to
make sure that all the RAs that you use can run on that 3rd node properly.
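
One hedged way to check that in advance is to exercise the agent by hand on
the quorum node, e.g. with ocf-tester from the resource-agents package (the
resource name and parameter here are taken from the configuration below):

ocf-tester -n iscsi_web_target \
    -o iqn=iqn.2010-06.playrix.local:san.web \
    /usr/lib/ocf/resource.d/heartbeat/iSCSITarget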

On Tue, Jun 7, 2011 at 1:58 AM, ruslan usifov ruslan.usi...@gmail.com wrote:

 Hello

 I have 3 node cluster (in future we add another one node) with follow
 configuration:

 crm(live)configure# show
 node drbd1
 node drbd2
 node drbd3
 primitive drbd_web ocf:linbit:drbd \
 params drbd_resource=web \
 op monitor interval=10s timeout=60s
 primitive drbd_web-U ocf:linbit:drbd \
 params drbd_resource=web-U \
 op monitor interval=10s timeout=60s
 primitive iscsi_ip_web ocf:heartbeat:IPaddr2 \
 params ip=192.168.19.91 nic=eth1:1 cidr_netmask=24
 primitive iscsi_web_target ocf:heartbeat:iSCSITarget \
 params iqn=iqn.2010-06.playrix.local:san.web \
 op monitor interval=10s timeout=30s
 primitive iscsi_web_target_lun0 ocf:heartbeat:iSCSILogicalUnit \
 params lun=0 path=/dev/drbd10
 target_iqn=iqn.2010-06.playrix.local:san.web
 group iscsi_web iscsi_ip_web iscsi_web_target iscsi_web_target_lun0
 ms ms_drbd_web drbd_web \
 meta master-max=1 master-node-max=1 clone-max=2
 clone-node-max=1 notify=true globally-unique=false
 target-role=Started is-managed=true
 ms ms_drbd_web-U drbd_web-U \
 meta master-max=1 master-node-max=1 clone-max=1
 clone-node-max=1 notify=true is-managed=true globally-unique=false
 location ms_drbd_web-U_on_drbd1_or_drbd2 ms_drbd_web-U \
 rule $id=ms_drbd_web-U_on_drbd1_or_drbd2-rule -inf: #uname ne
 drbd1 and #uname ne drbd2
 location ms_drbd_web_on_drbd1_or_drbd2 ms_drbd_web \
 rule $id=ms_drbd_web_on_drbd1_or_drbd2-rule -inf: #uname ne drbd1
 and #uname ne drbd2
 colocation drbd_web-U_on_drbd_web inf: ms_drbd_web-U:Master
 ms_drbd_web:Master
 colocation iscsi_ip_web_on_drbd_web inf: iscsi_ip_web ms_drbd_web:Master
 colocation iscsi_web_on_drbd_web-U inf: iscsi_web ms_drbd_web-U:Master
 order iscsi_web_after_ms_drbd_web-U inf: ms_drbd_web-U:start iscsi_web
 order ms_drbd_web-U_after_iscsi_ip_web inf: iscsi_ip_web:start
 ms_drbd_web-U:start
 order ms_drbd_web-U_before_ms_drbd_web inf: ms_drbd_web:promote
 iscsi_ip_web:start
 property $id=cib-bootstrap-options \
 dc-version=1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1 \
 cluster-infrastructure=openais \
 expected-quorum-votes=3 \
 stonith-enabled=false \
 last-lrm-refresh=1307432239 \
 symmetric-cluster=true



 In this configuration I want all resources to run only on the drbd1 and
 drbd2 nodes. As I understand it, with the location constraints I should
 reach this objective, and all resources must run on drbd1 and drbd2. But I
 got the following error:

 Failed actions:
 iscsi_web_target_monitor_0 (node=drbd3, call=5, rc=6, status=complete):
 not configured
 iscsi_web_target_lun0_monitor_0 (node=drbd3, call=6, rc=6,
 status=complete): not configured



 And I am confused: why drbd3? Nothing is supposed to run or be monitored
 there. Please explain this behavior if possible.



 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] strange monitor behaviour

2011-06-07 Thread Serge Dubrouski
No, the RA acts like it should. It can't find the necessary software and
returns OCF_NOT_CONFIGURED; all RAs act this way. You have to install all
the software used in your cluster on all nodes, even if you are not
actually planning to run that software on some of them.

On Tue, Jun 7, 2011 at 8:02 AM, ruslan usifov ruslan.usi...@gmail.com wrote:

 Thanks for the reply, I understand this now. I think that this is a problem
 with the ocf:heartbeat:iSCSITarget RA, which returns an improper return
 code when no iSCSI target software is installed.

 2011/6/7 Serge Dubrouski serge...@gmail.com

 This question pops up over and over again. Pacemaker has to make sure
 that your resources aren't up anywhere in the cluster before starting them
 up on the designated nodes. That means that it has to be able to run
 status/monitor operations for all configured resources on all configured
 nodes. You can't just add a 3rd quorum node into the cluster; you have to
 make sure that all the RAs that you use can run on that 3rd node properly.

 On Tue, Jun 7, 2011 at 1:58 AM, ruslan usifov ruslan.usi...@gmail.com wrote:

 Hello

 I have 3 node cluster (in future we add another one node) with follow
 configuration:

 crm(live)configure# show
 node drbd1
 node drbd2
 node drbd3
 primitive drbd_web ocf:linbit:drbd \
 params drbd_resource=web \
 op monitor interval=10s timeout=60s
 primitive drbd_web-U ocf:linbit:drbd \
 params drbd_resource=web-U \
 op monitor interval=10s timeout=60s
 primitive iscsi_ip_web ocf:heartbeat:IPaddr2 \
 params ip=192.168.19.91 nic=eth1:1 cidr_netmask=24
 primitive iscsi_web_target ocf:heartbeat:iSCSITarget \
 params iqn=iqn.2010-06.playrix.local:san.web \
 op monitor interval=10s timeout=30s
 primitive iscsi_web_target_lun0 ocf:heartbeat:iSCSILogicalUnit \
 params lun=0 path=/dev/drbd10
 target_iqn=iqn.2010-06.playrix.local:san.web
 group iscsi_web iscsi_ip_web iscsi_web_target iscsi_web_target_lun0
 ms ms_drbd_web drbd_web \
 meta master-max=1 master-node-max=1 clone-max=2
 clone-node-max=1 notify=true globally-unique=false
 target-role=Started is-managed=true
 ms ms_drbd_web-U drbd_web-U \
 meta master-max=1 master-node-max=1 clone-max=1
 clone-node-max=1 notify=true is-managed=true globally-unique=false
 location ms_drbd_web-U_on_drbd1_or_drbd2 ms_drbd_web-U \
 rule $id=ms_drbd_web-U_on_drbd1_or_drbd2-rule -inf: #uname ne
 drbd1 and #uname ne drbd2
 location ms_drbd_web_on_drbd1_or_drbd2 ms_drbd_web \
 rule $id=ms_drbd_web_on_drbd1_or_drbd2-rule -inf: #uname ne
 drbd1 and #uname ne drbd2
 colocation drbd_web-U_on_drbd_web inf: ms_drbd_web-U:Master
 ms_drbd_web:Master
 colocation iscsi_ip_web_on_drbd_web inf: iscsi_ip_web ms_drbd_web:Master
 colocation iscsi_web_on_drbd_web-U inf: iscsi_web ms_drbd_web-U:Master
 order iscsi_web_after_ms_drbd_web-U inf: ms_drbd_web-U:start iscsi_web
 order ms_drbd_web-U_after_iscsi_ip_web inf: iscsi_ip_web:start
 ms_drbd_web-U:start
 order ms_drbd_web-U_before_ms_drbd_web inf: ms_drbd_web:promote
 iscsi_ip_web:start
 property $id=cib-bootstrap-options \
 dc-version=1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1 \
 cluster-infrastructure=openais \
 expected-quorum-votes=3 \
 stonith-enabled=false \
 last-lrm-refresh=1307432239 \
 symmetric-cluster=true



 In this configuration I want all resources to run only on the drbd1 and
 drbd2 nodes. As I understand it, with the location constraints I should
 reach this objective, and all resources must run on drbd1 and drbd2. But I
 got the following error:

 Failed actions:
 iscsi_web_target_monitor_0 (node=drbd3, call=5, rc=6,
 status=complete): not configured
 iscsi_web_target_lun0_monitor_0 (node=drbd3, call=6, rc=6,
 status=complete): not configured



 And I am confused: why drbd3? Nothing is supposed to run or be monitored
 there. Please explain this behavior if possible.



 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




 --
 Serge Dubrouski.

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux

Re: [Pacemaker] strange monitor behaviour

2011-06-07 Thread Serge Dubrouski
On Tue, Jun 7, 2011 at 9:04 AM, ruslan usifov ruslan.usi...@gmail.com wrote:

 Hm, the drbd RA, for example, works perfectly if drbd isn't installed on
 the node. Also I found that the probe error disappears when I add the
 implementation parameter to the target and lun RAs.


That's really strange because the DRBD RA has code like this:

drbd_init() {
check_binary $DRBDADM

..

drbd_validate_all () {
# First check the configuration file
if [ -n $DRBDCONF ]  [ ! -f $DRBDCONF ]; then
ocf_log err Configuration file does not exist: $DRBDCONF
return $OCF_ERR_CONFIGURED
fi

.

So it has to fail if it can't find binary or config file.


iSCSILogicalUnit, in its turn, checks for this:

iSCSILogicalUnit_validate() {
# Do we have all required variables?
for var in implementation target_iqn lun path; do
param=OCF_RESKEY_${var}
if [ -z ${!param} ]; then
ocf_log error Missing resource parameter \$var\!
exit $OCF_ERR_CONFIGURED
fi
done


So those parameters have to be set as far as I understand.

I've never used iSCSI resources but the basic idea is the same: all the RAs
used in your cluster have to be able to confirm that all configured
resources are down on any cluster node. It doesn't matter what kind of
colocation rules you use.



 PS: great thanks for your explanations


 2011/6/7 Serge Dubrouski serge...@gmail.com

 No, the RA acts like it should. It can't find the necessary software and
 returns OCF_NOT_CONFIGURED; all RAs act this way. You have to install all
 the software used in your cluster on all nodes, even if you are not
 actually planning to run that software on some of them.


 On Tue, Jun 7, 2011 at 8:02 AM, ruslan usifov ruslan.usi...@gmail.com wrote:

 Thanks for the reply, I understand this now. I think that this is a problem
 with the ocf:heartbeat:iSCSITarget RA, which returns an improper return
 code when no iSCSI target software is installed.

 2011/6/7 Serge Dubrouski serge...@gmail.com

 This question pops up over and over again. Pacemaker has to make sure
 that your resources aren't up anywhere in the cluster before starting them
 up on the designated nodes. That means that it has to be able to run
 status/monitor operations for all configured resources on all configured
 nodes. You can't just add a 3rd quorum node into the cluster; you have to
 make sure that all the RAs that you use can run on that 3rd node properly.

 On Tue, Jun 7, 2011 at 1:58 AM, ruslan usifov 
 ruslan.usi...@gmail.com wrote:

 Hello

 I have 3 node cluster (in future we add another one node) with follow
 configuration:

 crm(live)configure# show
 node drbd1
 node drbd2
 node drbd3
 primitive drbd_web ocf:linbit:drbd \
 params drbd_resource=web \
 op monitor interval=10s timeout=60s
 primitive drbd_web-U ocf:linbit:drbd \
 params drbd_resource=web-U \
 op monitor interval=10s timeout=60s
 primitive iscsi_ip_web ocf:heartbeat:IPaddr2 \
 params ip=192.168.19.91 nic=eth1:1 cidr_netmask=24
 primitive iscsi_web_target ocf:heartbeat:iSCSITarget \
 params iqn=iqn.2010-06.playrix.local:san.web \
 op monitor interval=10s timeout=30s
 primitive iscsi_web_target_lun0 ocf:heartbeat:iSCSILogicalUnit \
 params lun=0 path=/dev/drbd10
 target_iqn=iqn.2010-06.playrix.local:san.web
 group iscsi_web iscsi_ip_web iscsi_web_target iscsi_web_target_lun0
 ms ms_drbd_web drbd_web \
 meta master-max=1 master-node-max=1 clone-max=2
 clone-node-max=1 notify=true globally-unique=false
 target-role=Started is-managed=true
 ms ms_drbd_web-U drbd_web-U \
 meta master-max=1 master-node-max=1 clone-max=1
 clone-node-max=1 notify=true is-managed=true globally-unique=false
 location ms_drbd_web-U_on_drbd1_or_drbd2 ms_drbd_web-U \
 rule $id=ms_drbd_web-U_on_drbd1_or_drbd2-rule -inf: #uname ne
 drbd1 and #uname ne drbd2
 location ms_drbd_web_on_drbd1_or_drbd2 ms_drbd_web \
 rule $id=ms_drbd_web_on_drbd1_or_drbd2-rule -inf: #uname ne
 drbd1 and #uname ne drbd2
 colocation drbd_web-U_on_drbd_web inf: ms_drbd_web-U:Master
 ms_drbd_web:Master
 colocation iscsi_ip_web_on_drbd_web inf: iscsi_ip_web
 ms_drbd_web:Master
 colocation iscsi_web_on_drbd_web-U inf: iscsi_web ms_drbd_web-U:Master
 order iscsi_web_after_ms_drbd_web-U inf: ms_drbd_web-U:start iscsi_web
 order ms_drbd_web-U_after_iscsi_ip_web inf: iscsi_ip_web:start
 ms_drbd_web-U:start
 order ms_drbd_web-U_before_ms_drbd_web inf: ms_drbd_web:promote
 iscsi_ip_web:start
 property $id=cib-bootstrap-options \
 dc-version=1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1 \
 cluster-infrastructure=openais \
 expected-quorum-votes=3 \
 stonith-enabled=false \
 last-lrm-refresh=1307432239 \
 symmetric-cluster=true



 In this configuration I want all resources to run only on the drbd1 and
 drbd2 nodes. As I understand it, with the location constraints I should
 reach this objective, and all resources must run

Re: [Pacemaker] strange monitor behaviour

2011-06-07 Thread Serge Dubrouski
On Tue, Jun 7, 2011 at 9:55 AM, Dejan Muhamedagic deja...@fastmail.fm wrote:

 On Tue, Jun 07, 2011 at 09:47:17AM -0600, Serge Dubrouski wrote:
  On Tue, Jun 7, 2011 at 9:39 AM, Dejan Muhamedagic deja...@fastmail.fm
 wrote:
 
   Hi,
  
   On Tue, Jun 07, 2011 at 08:45:07AM -0600, Serge Dubrouski wrote:
 No, the RA acts like it should. It can't find the necessary software and
 returns OCF_NOT_CONFIGURED; all RAs act this way. You have to install
 all the software used in your cluster on all nodes even if you are not
 actually planning to run that software on some of them.
  
   This is not so. A resource agent should be able to figure out if
   there's no software installed and then return NOT_INSTALLED or
   NOT_RUNNING.
  
 
  That changes the exit code but doesn't change the requirement to have
  that software installed and able to report that it's down.

 If it's not installed, then it cannot run. The difference between
 NOT_INSTALLED and NOT_CONFIGURED is that in the former case
 pacemaker won't try to start the resource on that node whereas in
 the latter it will give up on the resource completely.


Thanks for the clarifications. That explains Ruslan's case better now. DRBD
isn't installed, so it returns NOT_INSTALLED, and that's treated as
definitely DOWN by Pacemaker when it runs status/monitor operations. iSCSI,
in its turn, is installed but not configured, and that's treated as state
UNKNOWN. Am I correct?
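
If so, the distinction looks roughly like this inside an agent's monitor
action. A minimal sketch (the helper and variable names come from
ocf-shellfuncs; the daemon name and parameter are invented):

#!/bin/sh
. ${OCF_ROOT:-/usr/lib/ocf}/resource.d/heartbeat/.ocf-shellfuncs

my_monitor() {
    # Binary missing: return NOT_INSTALLED. A probe treats the resource
    # as simply down on this node, and pacemaker won't place it here.
    have_binary mydaemon || return $OCF_ERR_INSTALLED
    # Binary present but a required parameter missing: return
    # NOT_CONFIGURED. Pacemaker gives up on the resource cluster-wide.
    [ -n "$OCF_RESKEY_config" ] || return $OCF_ERR_CONFIGURED
    pgrep -x mydaemon >/dev/null && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}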


 Cheers,

 Dejan

   Thanks,
  
   Dejan
  
On Tue, Jun 7, 2011 at 8:02 AM, ruslan usifov 
 ruslan.usi...@gmail.com
   wrote:
   
 Thanks for the reply, I understand this now. I think that this is a
 problem with the ocf:heartbeat:iSCSITarget RA, which returns an improper
 return code when no iSCSI target software is installed.

 2011/6/7 Serge Dubrouski serge...@gmail.com

 This question pops up over and over again. Pacemaker has to make sure
 that your resources aren't up anywhere in the cluster before starting
 them up on the designated nodes. That means that it has to be able to
 run status/monitor operations for all configured resources on all
 configured nodes. You can't just add a 3rd quorum node into the cluster;
 you have to make sure that all the RAs that you use can run on that 3rd
 node properly.

 On Tue, Jun 7, 2011 at 1:58 AM, ruslan usifov 
   ruslan.usi...@gmail.com wrote:

 Hello

 I have 3 node cluster (in future we add another one node) with
 follow
 configuration:

 crm(live)configure# show
 node drbd1
 node drbd2
 node drbd3
 primitive drbd_web ocf:linbit:drbd \
 params drbd_resource=web \
 op monitor interval=10s timeout=60s
 primitive drbd_web-U ocf:linbit:drbd \
 params drbd_resource=web-U \
 op monitor interval=10s timeout=60s
 primitive iscsi_ip_web ocf:heartbeat:IPaddr2 \
 params ip=192.168.19.91 nic=eth1:1 cidr_netmask=24
 primitive iscsi_web_target ocf:heartbeat:iSCSITarget \
 params iqn=iqn.2010-06.playrix.local:san.web \
 op monitor interval=10s timeout=30s
 primitive iscsi_web_target_lun0 ocf:heartbeat:iSCSILogicalUnit \
 params lun=0 path=/dev/drbd10
 target_iqn=iqn.2010-06.playrix.local:san.web
 group iscsi_web iscsi_ip_web iscsi_web_target
 iscsi_web_target_lun0
 ms ms_drbd_web drbd_web \
 meta master-max=1 master-node-max=1 clone-max=2
 clone-node-max=1 notify=true globally-unique=false
 target-role=Started is-managed=true
 ms ms_drbd_web-U drbd_web-U \
 meta master-max=1 master-node-max=1 clone-max=1
 clone-node-max=1 notify=true is-managed=true
   globally-unique=false
 location ms_drbd_web-U_on_drbd1_or_drbd2 ms_drbd_web-U \
 rule $id=ms_drbd_web-U_on_drbd1_or_drbd2-rule -inf:
 #uname
   ne
 drbd1 and #uname ne drbd2
 location ms_drbd_web_on_drbd1_or_drbd2 ms_drbd_web \
 rule $id=ms_drbd_web_on_drbd1_or_drbd2-rule -inf:
 #uname ne
 drbd1 and #uname ne drbd2
 colocation drbd_web-U_on_drbd_web inf: ms_drbd_web-U:Master
 ms_drbd_web:Master
 colocation iscsi_ip_web_on_drbd_web inf: iscsi_ip_web
   ms_drbd_web:Master
 colocation iscsi_web_on_drbd_web-U inf: iscsi_web
   ms_drbd_web-U:Master
 order iscsi_web_after_ms_drbd_web-U inf: ms_drbd_web-U:start
   iscsi_web
 order ms_drbd_web-U_after_iscsi_ip_web inf: iscsi_ip_web:start
 ms_drbd_web-U:start
 order ms_drbd_web-U_before_ms_drbd_web inf: ms_drbd_web:promote
 iscsi_ip_web:start
 property $id=cib-bootstrap-options \

 dc-version=1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
   \
 cluster-infrastructure=openais \
 expected-quorum-votes=3 \
 stonith-enabled=false \
 last-lrm-refresh=1307432239 \
 symmetric-cluster=true



 In this configuration i want that all resources ran only on drbd1

Re: [Pacemaker] Load Balancing Failover with Pacemaker

2011-06-06 Thread Serge Dubrouski
Such documentation does not exist because such a configuration does not
exist. What exactly do you mean by active/active in this case?


On Sun, Jun 5, 2011 at 11:31 PM, Sachin Gokhale 
sysad...@indicussoftware.com wrote:

 Where do I get an active/active setup document for postgres? I searched
 the clusterlabs site... but didn't get any help.

 sachin

 On Fri, 20 May 2011 08:12:09 -0600
 Serge Dubrouski serge...@gmail.com wrote:

  On Fri, May 20, 2011 at 7:56 AM, Daniel Bozeman 
  daniel.boze...@americanroamer.com wrote:
 
   Sachin,
  
   Check out pgpool-II (http://pgpool.projects.postgresql.org/). Using
   pgpool, you can use postgres 9.0 streaming replication to set up a
 master
   server and a hot-standby which can accept read-only queries. pgpool
 will
   automatically send read-only transactions to the hot-standby, while
 writes
   will be sent to the master database and replicated to the standby using
   postgres streaming replication. Pgpool will also manage failover and
 online
   recovery of your postgres instances. You can then use Pacemaker to make
   pgpool and a virtual IP pointing to it highly available.
  
   That being said, implementing a reliable active/passive cluster with
   Pacemaker is not trivial. I am currently using LSB to manage pgpool,
 but I
   plan to write my own OCF resource or test/modify the one provided with
 the
   older pgpool-ha project.
  
 
  Depends on what you call active/passive. Active/passive can be easily
  implemented with DRBD replication, and there are at least two HOW-TO
  documents on the clusterlabs web site. Streaming replication is more like
  active/active, and that one isn't trivial at all, and not because of
  Pacemaker but because of the way streaming replication is implemented in
  PostgreSQL. If it were close to Oracle DataGuard, for example, it would
  be much easier.
 
 
   Take care,
  
   Daniel
  
   On May 20, 2011, at 7:37 AM, Serge Dubrouski wrote:
  
  
  
   On Thu, May 19, 2011 at 11:07 PM, Sachin Gokhale 
   sysad...@indicussoftware.com wrote:
  
   Hello,
  
   I have two nodes active/passive cluster created using documentation
 from
   clusterlabs. Replication is working fine on my setup. I am using this
   for Postgresql database.
  
   But I am not getting how I configure or test the failover scenario, and
   how I can add a load balancing feature for my Postgres database.
  
  
   Do you need read-only load balancing or read/write? The first one is
   possible with PGSQL hot-standby databases; the second one isn't possible
   in PGSQL AFAIK.
  
  
  
   Looking for your kind help.
   Thanks in advance.
  
   Regards,
  
   Sachin
   **
   Sachin  N. Gokhale
   System Administrator
   sysad...@indicussoftware.com
   Indicus Software Pvt. Ltd.
   28' Varshanand Society.
   Sinhgad Road Pune - 411038
   Tel. No. -  91-20-24341287/88
   Fax No. - 91-20-24341289
   **
  
  
   ___
   Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
   http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
   Project Home: http://www.clusterlabs.org
   Getting started:
 http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
   Bugs:
  
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  
  
  
  
   --
   Serge Dubrouski.
   ___
   Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
   http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
   Project Home: http://www.clusterlabs.org
   Getting started:
 http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
   Bugs:
  
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  
  
  
  
  
   ___
   Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
   http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
   Project Home: http://www.clusterlabs.org
   Getting started:
 http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
   Bugs:
  
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  
  
 
 
  --
  Serge Dubrouski.

 **
 Sachin  N. Gokhale
 System Administrator
 sysad...@indicussoftware.com
 Indicus Software Pvt. Ltd.
 28' Varshanand Society.
 Sinhgad Road Pune - 411038
 Tel. No. -  91-20-24341287/88
 Fax No. - 91-20-24341289
 **


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org

Re: [Pacemaker] corosync-quorumtool configuration

2011-05-27 Thread Serge Dubrouski
On Fri, May 27, 2011 at 1:39 PM, Roman Schartlmüller 
roman_schartlmuel...@gmx.at wrote:


  Original-Nachricht 
  Datum: Fri, 27 May 2011 20:30:47 +0200
  Von: Jelle de Jong jelledej...@powercraft.nl
  An: pacemaker@oss.clusterlabs.org
  Betreff: Re: [Pacemaker] corosync-quorumtool configuration

  On 27-05-11 18:19, Roman Schartlmüller wrote:
   Can anybody please help me with configuration? Maybe there is something
   wrong.
 
  Maybe lose the no-quorum-policy=ignore option. And an HTML mail with
  tables, really :D
 
  Kind regards,
 
  Jelle de Jong
 
 Sorry, I did not know that HTML emails are unwanted. no-quorum-policy =
 ignore means that the node should not worry about the loss of quorum. But
 the quorum is lost anyway.

 I would like to know how to increase the votes of a node. Somehow, the
 value remains at 1.


I don't think that feature of Corosync is supported by Pacemaker. I
may be wrong here. So far the rule of thumb has been: a 2-node cluster never
has quorum, so add one more node.



 May 27 21:13:16 corosync [pcmk  ] info: config_find_init: Local handle:
 4835695805891346437 for quorum
 May 27 21:13:16 corosync [pcmk  ] info: config_find_next: Processing
 additional quorum options...
 May 27 21:13:16 corosync [pcmk  ] info: get_config_opt: Found
 'corosync_votequorum' for option: provider
 May 27 21:13:16 corosync [pcmk  ] info: update_member: Node Node1 now has 1
 quorum votes (was 0)
 May 27 21:13:16 corosync [QUORUM] Using quorum provider corosync_votequorum
 May 27 21:13:16 corosync [SERV  ] Service engine loaded: corosync votes
 quorum service v0.91
 May 27 21:13:16 corosync [SERV  ] Service engine loaded: corosync cluster
 quorum service v0.1
 May 27 21:13:18 corosync [VOTEQ ] quorum regained, resuming activity
 May 27 21:13:18 corosync [pcmk  ] info: update_member: Node Node2 now has 1
 quorum votes (was 0)
 --
 Mit freundlichen Grüßen

 Roman Schartlmüller

 NEU: FreePhone - kostenlos mobil telefonieren!
 Jetzt informieren: http://www.gmx.net/de/go/freephone

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] corosync-quorumtool configuration

2011-05-27 Thread Serge Dubrouski
On Fri, May 27, 2011 at 1:46 PM, Serge Dubrouski serge...@gmail.com wrote:



 On Fri, May 27, 2011 at 1:39 PM, Roman Schartlmüller 
 roman_schartlmuel...@gmx.at wrote:


  Original-Nachricht 
  Datum: Fri, 27 May 2011 20:30:47 +0200
  Von: Jelle de Jong jelledej...@powercraft.nl
  An: pacemaker@oss.clusterlabs.org
  Betreff: Re: [Pacemaker] corosync-quorumtool configuration

  On 27-05-11 18:19, Roman Schartlmüller wrote:
   Can anybody please help me with configuration? Maybe there is
 something
   wrong.
 
  Maybe lose the no-quorum-policy=ignore option. And an HTML mail with
  tables, really :D
 
  Kind regards,
 
  Jelle de Jong
 
 Sorry, I did not know that HTML emails are unwanted. no-quorum-policy =
 ignore means that the node should not worry about the loss of quorum. But
 the quorum is lost anyway.

 I would like to know how to increase the votes of a node. Somehow, the
 value remains at 1.


 I don't think that feature of Corosync is supported by Pacemaker. I
 may be wrong here. So far the rule of thumb has been: a 2-node cluster never
 has quorum, so add one more node.


And if you think about it more, it shouldn't work in a 2-node cluster
anyway. Let's say you set up one node with 1 vote, another node with 2
votes, and expected votes set to 2 as well. Now pretend that your node with
2 votes dies: you lose your cluster that way. Using nodes with more than 1
vote makes sense for clusters with an even number of nodes that can be
split equally; then the partition that includes the node with the higher
number of votes wins.
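
A quick worked example of that last case, with made-up numbers: take a
4-node cluster where node1 carries 2 votes and the other three nodes carry 1
vote each, so expected_votes = 5 and quorum = floor(5/2) + 1 = 3. If the
cluster splits into two halves of two nodes each, the partition containing
node1 holds 2 + 1 = 3 votes and keeps quorum, while the other partition
holds only 2 and loses it.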





 May 27 21:13:16 corosync [pcmk  ] info: config_find_init: Local handle:
 4835695805891346437 for quorum
 May 27 21:13:16 corosync [pcmk  ] info: config_find_next: Processing
 additional quorum options...
 May 27 21:13:16 corosync [pcmk  ] info: get_config_opt: Found
 'corosync_votequorum' for option: provider
 May 27 21:13:16 corosync [pcmk  ] info: update_member: Node Node1 now has
 1 quorum votes (was 0)
 May 27 21:13:16 corosync [QUORUM] Using quorum provider
 corosync_votequorum
 May 27 21:13:16 corosync [SERV  ] Service engine loaded: corosync votes
 quorum service v0.91
 May 27 21:13:16 corosync [SERV  ] Service engine loaded: corosync cluster
 quorum service v0.1
 May 27 21:13:18 corosync [VOTEQ ] quorum regained, resuming activity
 May 27 21:13:18 corosync [pcmk  ] info: update_member: Node Node2 now has
 1 quorum votes (was 0)
 --
 Mit freundlichen Grüßen

 Roman Schartlmüller

 NEU: FreePhone - kostenlos mobil telefonieren!
 Jetzt informieren: http://www.gmx.net/de/go/freephone

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




 --
 Serge Dubrouski.




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Required resources for stateful clones

2011-05-20 Thread Serge Dubrouski
!)

 #
 # Resource stickiness pumped up to 1000 :  #
 #

 $CRM_MASTER -v $PROMOTE_WERT_HOCH || exit $(echo "crm_master could not change the Master's status!")

 
 # Success! #
 

 return $OCF_SUCCESS

 }


 ##

 Thanks!


And what about demote? Switching a standby into a primary using trigger files
changes the TIMELINE in the DB, and that invalidates all other standby
databases as well as the previous master database. After that you have to
restore them from a fresh backup made on the new master. This particular
behavior has stopped me from implementing Master/Slave functionality in the
pgsql RA so far.

BTW, why is pgsql set to is-managed="false" in your configuration? With this
setting the cluster will keep monitoring it but won't take any other actions,
AFAIK.
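
For readers following the thread, a minimal sketch of the trigger-file
promotion being discussed (the path is made up; this reflects PostgreSQL
9.0 behavior as I understand it):

    # recovery.conf on the standby names a trigger file:
    trigger_file = '/var/lib/pgsql/failover.trigger'

    # creating that file makes the standby leave recovery and start a
    # new TIMELINE, which is exactly what invalidates the old master
    # and any other standbys:
    touch /var/lib/pgsql/failover.trigger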


Éamon



  Unfortunately, promote/demote doesn't work. ocf-tester tries to use the
  crm_attribute -N pgsql1 -n master-pgrql-replication-agent -l reboot -v
  100, but the (unmanaged) resources don't accept the score change.
 
  I'm pretty sure that I just need to be hit with a clue stick and would
 be
  grateful for any help.
 
  Thanks,
 
  ?amon
 



 --
 Serge Dubrouski.
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Load Balancing Failover with Pacemaker

2011-05-20 Thread Serge Dubrouski
On Thu, May 19, 2011 at 11:07 PM, Sachin Gokhale 
sysad...@indicussoftware.com wrote:

 Hello,

 I have a two-node active/passive cluster created using documentation from
 clusterlabs. Replication is working fine on my setup. I am using this
 for a PostgreSQL database.

 But I am not getting how I can configure or test a failover scenario, and
 how I can add a load-balancing feature for my Postgres database.


Do you need read-only load balancing or read/write? The first one is possible
with PGSQL hot-standby databases; the second one isn't possible in PGSQL, AFAIK.



 Looking for your kind help.
 Thanks in advance.

 Regards,

 Sachin
 **
 Sachin  N. Gokhale
 System Administrator
 sysad...@indicussoftware.com
 Indicus Software Pvt. Ltd.
 28' Varshanand Society.
 Sinhgad Road Pune - 411038
 Tel. No. -  91-20-24341287/88
 Fax No. - 91-20-24341289
 **


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Required resources for stateful clones

2011-05-20 Thread Serge Dubrouski
An instance that has recovery.conf always tries to start as
standby. You have to have the master's IP address there and the path to
archived log files.
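
A minimal recovery.conf sketch for such a standby (host and paths are made
up for illustration):

    standby_mode = 'on'
    # the master's address goes here:
    primary_conninfo = 'host=192.168.0.10 port=5432 user=postgres'
    # and the path to the archived WAL files:
    restore_command = 'cp /var/lib/pgsql/archive/%f "%p"'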




  
  
   # If both files exist or can be created, then the failover fun can start.
  
   ocf_log info "$OCF_RESKEY_trigger_file was created."
   ocf_log info "$OCF_RESKEY_recovery_conf exists and can be copied to the correct location."
  
   # Sometimes, the master needs a bit of time to take the reins. So...
  
   while :
   do
   pgsql_monitor warn
   rc=$?
  
   if [ $rc -eq $OCF_RUNNING_MASTER ]; then
   break;
   fi
  
   ocf_log debug "Postgres Server could not be promoted. Please wait..."
  
   sleep 1
  
   done
  
   ocf_log info "Postgres Server has been promoted. Please check on the previous master."
  
   #
   #Attributes Update: #
   #
  
   $ATTRD_UPDATER -n $PGSQL_STATUS_NAME -v "PRI" || exit $(echo "Eh! attrd_updater is not working!")
  
   #
   # Resource stickiness pumped up to 1000 :  #
   #
  
   $CRM_MASTER -v $PROMOTE_WERT_HOCH || exit $(echo "crm_master could not change the Master's status!")
  
   
   # Success! #
   
  
   return $OCF_SUCCESS
  
   }
  
  
  
 
 ##
  
   Thanks!
  
  
  And what about demote? Switching a standby into a primary using trigger
  files changes the TIMELINE in the DB, and that invalidates all other
  standby databases as well as the previous master database. After that you
  have to restore them from a fresh backup made on the new master. This
  particular behavior has stopped me from implementing Master/Slave
  functionality in the pgsql RA so far.
 
  BTW, why is pgsql set to is-managed="false" in your configuration? With
  this setting the cluster will keep monitoring it but won't take any other
  actions, AFAIK.

 Demote? Well, seeing as neither promote nor demote actually worked for me,
 I thought I would start small.


It doesn't work because you have it in an unmanaged state, I think.



 As far as the trigger file switching goes, you're of course completely
 right. This behavior isn't really a big deal in my environment, as it's
 meant as more of a test and we want to bring the demoted servers back up
 manually, but I can see that it would cause a lot of problems in a more


That means that the demote operation would have to stop the master server,
which isn't the best behavior, IMHO.



 complex environment. When I tested the failover functionality without
 pacemaker, I had to perform a fresh backup even if I waited less than 30s
 to bring the old master back up as a standby.

 I guess that with 9.1 this will be easier...

 I unmanaged the resources so that my test agent would handle them. Is this
 incorrect?


Again, I think you are wrong. In this mode Pacemaker won't call your RA to
promote/demote or fail over your resource.




 
 
  ?amon
  
  
  
Unfortunately, promote/demote doesn't work. ocf-tester tries to use
 the
crm_attribute -N pgsql1 -n master-pgrql-replication-agent -l reboot
 -v
100, but the (unmanaged) resources don't accept the score change.
   
I'm pretty sure that I just need to be hit with a clue stick and
 would
   be
grateful for any help.
   
Thanks,
   
?amon
   
  
  
  
   --
   Serge Dubrouski.
   ___
   Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
   http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
   Project Home: http://www.clusterlabs.org
   Getting started:
 http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
   Bugs:
  
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

  
  
 
 
  --
  Serge Dubrouski.
 
  --

 
  ___
  Pacemaker mailing list
  Pacemaker@oss.clusterlabs.org
 
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 
  End of Pacemaker Digest, Vol 42, Issue 53
  *

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org

Re: [Pacemaker] Required resources for stateful clones (Serge Dubrouski)

2011-05-20 Thread Serge Dubrouski
 a failover with a trigger file, it
 wants
   to find a recovery.conf, which it then processes (checking the
 archive for
   missing updates etc.) and renames (after noticing the trigger file).
  
   I assumed that this would work in exactly the same way with Streaming
   Replication.
  
   Am I wrong?
 
 
  I think so. You have to have recovery.conf when you start your standby,
  not the master. Actually, an instance that has recovery.conf always tries
  to start as standby. You have to have the master's IP address there and
  the path to archived log files.
 
 

 So the failover behavior in binary replication and streaming replication is
 different? Or is the wiki entry just antiquated?


This is correct: http://wiki.postgresql.org/wiki/Streaming_Replication



  
  


 # If both files exist or can be created, then the failover fun can start.

 ocf_log info "$OCF_RESKEY_trigger_file was created."
 ocf_log info "$OCF_RESKEY_recovery_conf exists and can be copied to the correct location."

 # Sometimes, the master needs a bit of time to take the reins.
 So...

 while :
 do
 pgsql_monitor warn
 rc=$?

 if [ $rc -eq $OCF_RUNNING_MASTER ]; then
 break;
 fi

 ocf_log debug "Postgres Server could not be promoted. Please wait..."

 sleep 1

 done

 ocf_log info "Postgres Server has been promoted. Please check on the previous master."

 #
 #Attributes Update: #
 #

 $ATTRD_UPDATER -n $PGSQL_STATUS_NAME -v "PRI" || exit $(echo "Eh! attrd_updater is not working!")

 #
 # Resource stickiness pumped up to 1000 :  #
 #

 $CRM_MASTER -v $PROMOTE_WERT_HOCH || exit $(echo "crm_master could not change the Master's status!")

 
 # Success! #
 

 return $OCF_SUCCESS

 }



   
  
 
 ##

 Thanks!


And what about demote? Switching a standby into a primary using trigger
files changes the TIMELINE in the DB, and that invalidates all other
standby databases as well as the previous master database. After that you
have to restore them from a fresh backup made on the new master. This
particular behavior has stopped me from implementing Master/Slave
functionality in the pgsql RA so far.

BTW, why is pgsql set to is-managed="false" in your configuration? With
this setting the cluster will keep monitoring it but won't take any other
actions, AFAIK.
  
   Demote? Well, seeing as neither promote nor demote actually worked for
 me,
   I thought I would start small.
  
 
  It doesn't work because you have it in an unmanaged state, I think.
 

 I'm using the ocf-tester utility to test the agent. Won't there be a
 conflict if I try to have the cluster manage the resources and then try to
 wrest its control away with my own testing agent?


 
  
   As far as the trigger file switching goes, you're of course completely
   right. This behavior isn't really a big deal in my environment, as it's
   meant as more of a test and we want to bring the demoted servers back up
   manually, but I can see that it would cause a lot of problems in a more
 
 
  That means that the demote operation would have to stop the master server,
  which isn't the best behavior, IMHO.
 

 I don't disagree. This was the policy that was agreed upon, so it's more
 of a political issue, really.

 Would you prefer putting it into RO mode?

 
 
   complex environment. When I tested the failover functionality without
   pacemaker, I had to perform a fresh backup even if I waited less than
   30s to bring the old master back up as a standby.
  
   I guess that with 9.1 this will be easier...
  
   I unmanaged the resources so that my test agent would handle them. Is
 this
   incorrect?
  
 
  Again, I think you are wrong. In this mode Pacemaker won't call your RA to
  promote/demote or fail over your resource.
 
 
  
  
   
   
?amon



  Unfortunately, promote/demote doesn't work. ocf-tester tries to
 use
   the
  crm_attribute -N pgsql1 -n master-pgrql-replication-agent -l
 reboot
   -v
  100, but the (unmanaged) resources don't accept the score
 change.
 
  I'm pretty sure that I just need to be hit with a clue stick and
   would
 be
  grateful for any help.
 
  Thanks,
 
  ?amon
 



 --
 Serge Dubrouski.
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started

Re: [Pacemaker] Load Balancing Failover with Pacemaker

2011-05-20 Thread Serge Dubrouski
On Fri, May 20, 2011 at 7:56 AM, Daniel Bozeman 
daniel.boze...@americanroamer.com wrote:

 Sachin,

 Check out pgpool-II (http://pgpool.projects.postgresql.org/). Using
 pgpool, you can use postgres 9.0 streaming replication to set up a master
 server and a hot-standby which can accept read-only queries. pgpool will
 automatically send read-only transactions to the hot-standby, while writes
 will be sent to the master database and replicated to the standby using
 postgres streaming replication. Pgpool will also manage failover and online
 recovery of your postgres instances. You can then use Pacemaker to make
 pgpool and a virtual IP pointing to it highly available.
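
 For illustration, a rough crm sketch of that last idea; the resource
 names, the IP address and the lsb:pgpool2 script name are all assumptions,
 not part of the setup described above:

     primitive p_pgpool lsb:pgpool2 \
         op monitor interval="30s"
     primitive p_vip ocf:heartbeat:IPaddr2 \
         params ip="192.168.1.50" cidr_netmask="24" \
         op monitor interval="30s"
     # keep the virtual IP and pgpool together on one node
     group g_pgpool p_vip p_pgpool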

 That being said, implementing a reliable active/passive cluster with
 Pacemaker is not trivial. I am currently using LSB to manage pgpool, but I
 plan to write my own OCF resource or test/modify the one provided with the
 older pgpool-ha project.


Depends on what you call "Active/Passive". Active/Passive can be easily
implemented with DRBD replication, and there are at least two HOW-TO documents
on the clusterlabs web site. Streaming replication is more like Active/Active,
and that one isn't trivial at all; and not because of Pacemaker, but because
of the way Streaming replication is implemented in PostgreSQL. If it were
closer to Oracle DataGuard, for example, it would be much easier.


 Take care,

 Daniel

 On May 20, 2011, at 7:37 AM, Serge Dubrouski wrote:



 On Thu, May 19, 2011 at 11:07 PM, Sachin Gokhale 
 sysad...@indicussoftware.com wrote:

 Hello,

 I have a two-node active/passive cluster created using documentation from
 clusterlabs. Replication is working fine on my setup. I am using this
 for a PostgreSQL database.

 But I am not getting how I can configure or test a failover scenario, and
 how I can add a load-balancing feature for my Postgres database.


 Do you need read-only load balancing or read/write? The first one is possible
 with PGSQL hot-standby databases; the second one isn't possible in PGSQL, AFAIK.



 Looking for your kind help.
 Thanks in advance.

 Regards,

 Sachin
 **
 Sachin  N. Gokhale
 System Administrator
 sysad...@indicussoftware.com
 Indicus Software Pvt. Ltd.
 28' Varshanand Society.
 Sinhgad Road Pune - 411038
 Tel. No. -  91-20-24341287/88
 Fax No. - 91-20-24341289
 **


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




 --
 Serge Dubrouski.
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker





 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Required resources for stateful clones

2011-05-19 Thread Serge Dubrouski
On Thu, May 19, 2011 at 5:05 AM, Eamon Roque eamon.ro...@lex-com.netwrote:

 Hi,

 I've put together a cluster of two nodes running a databank without shared
 storage. Both nodes replicate data between them, which is taken care of by
 the databank itself.

 I have a resource for the databank and ip. I then created a stateful clone
 from the databank resource. I created colocation rules joining the
 databank-ms-clone and ip:

 node pgsqltest1
 node pgsqltest2
 primitive Postgres-IP ocf:heartbeat:IPaddr2 \
 params ip=10.19.57.234 cidr_netmask=32 \
 op monitor interval=30s \
 meta is-managed=false
 primitive resPostgres ocf:heartbeat:pgsql \
 params pgctl=/opt/PostgreSQL/9.0/bin/pg_ctl
 pgdata=/opt/PostgreSQL/9.0/data psql=/opt/PostgreSQL/9.0/bin/psql
 pgdba=postgres \
 op monitor interval=1min \
 meta is-managed=false
 ms msPostgres resPostgres \
 meta master-max=1 master-node-max=1 clone-max=2
 clone-node-max=1 notify=true target-role=started
 colocation colPostgres inf: Postgres-IP msPostgres:Master
 order ordPostgres inf: msPostgres:promote Postgres-IP:start
 property $id=cib-bootstrap-options \
 dc-version=1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5 \
 cluster-infrastructure=openais \
 expected-quorum-votes=2 \
 stonith-enabled=false \
 no-quorum-policy=ignore \
 last-lrm-refresh=1302707146
 rsc_defaults $id=rsc-options \
 resource-stickiness=200
 op_defaults $id=op_defaults-options \
 record-pending=false

 The normal postgres agent doesn't support this functionality, but I've put
 together my own using the mysql agent as a model. Before running the script
 through ocf-tester, I unmanage the postgres resource.


Could you show how you implemented promote/demote for pgsql?



 Unfortunately, promote/demote doesn't work. ocf-tester tries to use the
 crm_attribute -N pgsql1 -n master-pgrql-replication-agent -l reboot -v
 100, but the (unmanaged) resources don't accept the score change.

 I'm pretty sure that I just need to be hit with a clue stick and would be
 grateful for any help.

 Thanks,

 Éamon

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Move resources on failure

2011-05-08 Thread Serge Dubrouski
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-resource-options.html#id649472

Look for migration-threshold.

Do not use the LSB script for apache; instead, carefully study all the
features of the OCF one.
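
For example, something like this would move the resource away after its
first failure (a sketch only; the values are illustrative, and with the
existing colocation constraint the IP would follow it):

    primitive website ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="15s" \
        meta migration-threshold="1" failure-timeout="60s"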


On Sun, May 8, 2011 at 8:45 AM, Sergey V. Arlashin 
maillist.arlas...@yandex.ru wrote:

 Hallo!
 I'm a newbie and I've just set up a small cluster wich consists of two
 nodes with apache and one mutual ip address. I use corosync and pacemaker.
 When I kill apache process crm restarts it so that apache continues
 working, but I want (in case of apache process failure) to make crm move the
 whole bunch of resources to another node instead.
 Is it possible?

 This is my config

 crm(live)configure# show
 node centos1 \
attributes standby=off
 node centos2 \
attributes standby=off
 primitive mutip1 ocf:heartbeat:IPaddr \
params ip=192.168.1.200 cidr_netmask=255.255.255.255 nic=eth0
 \
op monitor interval=5s timeout=20s
 primitive website lsb:httpd \
op monitor interval=15 timeout=15 start-delay=15 \
meta target-role=Started
 colocation ipapache inf: mutip1 website
 order apache-after-ip inf: mutip1 website
 property $id=cib-bootstrap-options \
dc-version=1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87 \
cluster-infrastructure=openais \
expected-quorum-votes=2 \
no-quorum-policy=ignore \
stonith-enabled=false


 ---
 WBR, Sergey



 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Very strange behavior on asymmetric cluster

2011-03-21 Thread Serge Dubrouski
On Sat, Mar 19, 2011 at 4:14 PM, Pavel Levshin pa...@levshin.spb.ru wrote:
 19.03.2011 19:10, Dan Frincu:

 Even if that is set, we need to verify that the resources are, indeed,
 NOT running where they shouldn't be; remember, it is our job to ensure
 that the configured policy is enforced. So, we probe them everywhere to
 ensure they are indeed not around, and stop them if we find them.

 Again, WHY do you need to verify things which cannot happen by setup? If
 some resource cannot, REALLY CANNOT, exist on a node, and the administrator
 can confirm this, why rely on the network, cluster stack, resource agents,
 electricity in the power outlet, etc. to verify that 2+2 is still 4?

 Don't want to step on any toes or anything, mainly because me stepping on
 somebody's toes without the person wearing a pair of steel-toe cap boots
 would leave them toeless, but I've been hearing the ranting go on and on and
 just felt like maybe something's missing from the picture, specifically, an
 example for why checking for resources on passive nodes is a good thing,
 which I haven't seen thus far.

 ...

 Ok, so far it sounds perfect, but what happens if on the secondary/passive
 node, someone starts the service, by user error, by upgrading the software
 and thus activating its automatic startup at the given runlevel and
 restarting the secondary node (common practice when performing upgrades in a
 cluster environment), etc. If Pacemaker were not to check all the nodes for
 the service being active or not = epic fail. Its state-based model, where
 it maintains a state of the resources and performs the necessary actions to
 bring the cluster to that state is what saves us from the epic fail
 moment.

 Surely you are right. Resources must be monitored on standby nodes to
 prevent such a scenario. You can screw up your setup in many other ways,
 however. And pacemaker (1.0.10, at least) does not execute a recurring
 monitor on the passive node, so you may start your service by hand, and it
 will go unnoticed for quite some time.

 What I am talking about is monitoring (probing) of a resource on a node
 where this resource cannot exist. For example, say you have five nodes in
 your cluster and a DRBD resource, which can, by its nature, work on no more
 than two nodes. Then the other three of your nodes will occasionally be
 probed for that resource. If that action fails, the resource will be
 restarted everywhere. If that node cannot be fenced, the resource will be
 dead.

As far as I understand, that would require a definition of a quorum
node or another special kind of node where a resource cannot exist.
Figuring out such a role from location/colocation rules seems too
complex to me. The idea of a quorum node was abandoned long ago in
favor of some other features/project that Lars mentioned earlier.


 There is still at least one case when such a failure may happen even if the
 RA is perfect: a misbehaving or highly overloaded node may cause an RA
 timeout. And bugs or configuration errors may, of course.

 A resource should not depend on unrelated things, such as nodes which have
 no connection to the resource. Then the resource will be more stable.

 I'm trying to be impartial here, although I may be biased by my experience
 to rule in favor of Pacemaker, but here's a thought, it's a free world, we
 all have the freedom of speech, which I'm also exercising at the moment,
 want something done, do it yourself, patches are being accepted, don't have
 the time, ask people for their help, in a polite manner, wait for them to
 reply, kindly ask them again (and prayers are heard, Steven Dake released
 http://www.mail-archive.com/openais@lists.linux-foundation.org/msg06072.html  a
 patch for automatic redundant ring recovery, thank you Steven), want
 something done fast, pay some developers to do it for you, say the folks
 over at www.linbit.com wouldn't mind some sponsorship (and I'm not
 affiliated with them in any way, believe it or not, I'm actually doing this
 without external incentives, from the kindness of my heart so to speak).

 My goal for now is to make the problem clear to the team. It is doubtful
 that such a patch will be accepted without that, given current reaction.
 Moreover, it is not clear how to fix the problem to the best advantage.

 This cluster stack is brilliant. It's a pity to see how it fails to keep a
 resource running while it is relatively simple to avoid unneeded downtime.

 Thank you for participating.


 P.S. There is a crude workaround: op monitor interval="0" timeout="10"
 on_fail="nothing". Obviously, it has its own deficiencies.


 --
 Pavel Levshin


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker





-- 
Serge Dubrouski

Re: [Pacemaker] Very strange behavior on asymmetric cluster

2011-03-21 Thread Serge Dubrouski
On Mon, Mar 21, 2011 at 10:43 AM, Carlos G Mendioroz t...@huapi.ba.ar wrote:

 Serge Dubrouski @ 21/03/2011 13:10 -0300 dixit:

 What I am talking about is monitoring (probing) of a resource on a node
 where this resource cannot be exist.

 As far as I understand that would require a definition of a quorum
 node or another special kind of node where resource cannot exist.
 Figuring out a a such role from location/collocation rules seems to
 complex to me. The idea of quorum node was abandoned by long ago in
 favor for some other features/project that Lars mentioned earlier.

 There is already a location rule, and a minus-infinity value.

 Is that value being used dynamically? If not, could it be used
 as a marker for "this (resource) cannot possibly run on this node,
 so monitoring is not necessary"?

It is used dynamically quite often. For example, moving a resource off of
one node creates such a location rule. Does that mean that along with
moving the resource Pacemaker has to stop monitoring it on the node it
left? I don't think so.


 --
 Carlos G Mendioroz  t...@huapi.ba.ar  LW7 EQI  Argentina

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Validate strategy for RA on DRBD standby node

2011-02-24 Thread Serge Dubrouski
Why are you trying to start LDAP on a node where you don't have your
DRBD resource mounted? Having LDAP up on both nodes would make sense
if you were building an active/active LDAP cluster with syncrepl or
any other replication mechanism. In that case you'd set it up as M/S
and/or as a clone and would have to provide access to the config file
on both nodes. In the active/passive case you have to colocate your LDAP
resource with your DRBD and filesystem resources, and Pacemaker won't
try to start LDAP on a node that doesn't have DRBD activated and the
filesystem mounted.
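
Roughly like this, with made-up resource names (not your actual config):

    colocation ldap_with_fs inf: p_ldap p_fs
    order ldap_after_fs inf: p_fs p_ldap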

On Thu, Feb 24, 2011 at 6:06 AM, David McCurley m...@fabric.com wrote:
 Pacemaker and list newbie here :)

 I'm writing a resource adapter in python for the newer release of OpenLDAP 
 but I need some pointers on a strategy for the validate function in a certain 
 case.  (In python because the more advanced shell scripting hurts my head :). 
  Here is the situation:

 The config file for OpenLDAP is stored in /etc/ldap/slapd.d/cn=config.ldif.  
 This is on a DRBD active-passive system and the /etc/ldap directory is 
 actually a symlink to the DRBD controlled share /vcoreshare/etc/ldap.  The 
 real config file is at /vcoreshare/etc/ldap/slapd.d/cn=config.ldif.

 So I'm trying to be very judicious with every function and validation, 
 checking file permissions, etc.  But the problem is that 
 /etc/ldap/slapd.d/cn=config.ldif is only present on the active DRBD node.  My 
 validate function checks that the file is readable by the user/group that 
 slapd is to run as.  Now, as soon as I start ldap in the cluster, it starts 
 fine, but validate fails on the standby node (because the DRBD volume isn't 
 mounted) and crm_mon shows a failed action:
 --
 
 Last updated: Wed Feb 23 07:35:19 2011
 Stack: openais
 Current DC: vcoresrv1 - partition with quorum
 Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
 2 Nodes configured, 2 expected votes
 5 Resources configured.
 

 Online: [ vcoresrv1 vcoresrv2 ]

 fs_vcoreshare   (ocf::heartbeat:Filesystem):    Started vcoresrv1
  Master/Slave Set: ms_drbd_vcoreshare
     Masters: [ vcoresrv1 ]
     Slaves: [ vcoresrv2 ]
 clusterip       (ocf::heartbeat:IPaddr2):       Started vcoresrv1
 clusteripsourcing       (ocf::heartbeat:IPsrcaddr):     Started vcoresrv1

 Failed actions:
    ldap_monitor_0 (node=vcoresrv2, call=130, rc=5, status=complete): not 
 installed
 -

 Is there a way for my RA to know that it is being called on the active node 
 instead of the passive node?  Or more generally, what would anyone recommend 
 here?  I really didn't want to write the resource adapter so it would be 
 specific to our setup (e.g. checking to make sure the DRBD mount is readable 
 before looking for the config files).  Maybe Pacemaker passes in some extra 
 env variable that can be used?

 I'm reluctant to post the code for the RA here in the list because it is 450 
 lines.  But, here is the logic for the validate function:

 if the appropriate slapd user and group do not exist:
   return OCF_ERR_INSTALLED
 if the ldap config file doesn't exist or isn't readable by the slapd user:
   return OCF_ERR_INSTALLED
 if the ldap binary doesn't exist or isn't executable:
   return OCF_ERR_INSTALLED
 return OCF_SUCCESS

 Or maybe I'm overdoing it in my tests or have misinterpreted the OCF 
 Resource Agent Developer's Guide?

 Any advice or guidance / clarification appreciated.

 Thanks,

 Mac

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Validate strategy for RA on DRBD standby node

2011-02-24 Thread Serge Dubrouski
Ahh! I see, you need to use the ocf_is_probe function in your RA to
isolate that case.
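
For a shell RA the usual pattern looks roughly like this (a sketch;
CONFIG_FILE is a placeholder, and since your agent is in Python the
equivalent check there is "the action is a monitor and
OCF_RESKEY_CRM_meta_interval is 0"):

    # before failing validation because a file is missing:
    if ocf_is_probe; then
        # a probe on the standby may legitimately find the DRBD-backed
        # config missing; report "not running" instead of "not installed"
        [ -r "$CONFIG_FILE" ] || return $OCF_NOT_RUNNING
    fi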

On Thu, Feb 24, 2011 at 9:17 AM, David McCurley m...@fabric.com wrote:
 I'm not trying to start it.  The problem is that my validate function was 
 failing.  Here is the case:

 Deploy RA on both nodes (master DRBD and slave).
 Edit the crm config to add the ldap resource, colocation, etc.
 Save the config and Pacemaker attempts to start the LDAP, but it also runs a 
 check on both the master and the slave, and my validate was failing on the 
 slave since it didn't have the file system resources for ldap available.

 We are in the active/passive case, so it is a problem with my code when PM 
 runs the monitor/validate check on the slave.  The live ldap instance is 
 colocated with DRBD and the filesystem, e.g. from crm configure show:

 node vcoresrv1 \
        attributes standby=off
 node vcoresrv2 \
        attributes standby=off
 primitive clusterip ocf:heartbeat:IPaddr2 \
        params ip=192.168.1.4 cidr_netmask=24 nic=eth0 iflabel=cip \
        op monitor interval=30s
 primitive clusteripsourcing ocf:heartbeat:IPsrcaddr \
        params ipaddress=192.168.1.4 \
        op monitor interval=10 timeout=20s depth=0
 primitive ldap ocf:fabric:openldap \
    op monitor interval=10
 primitive drbd_vcoreshare ocf:linbit:drbd \
        params drbd_resource=r0 \
        op start interval=0 timeout=240s \
        op stop interval=0 timeout=100s \
        op promote interval=0 timeout=90s \
        op demote interval=0 timeout=90s \
        op monitor interval=15s
 primitive fs_vcoreshare ocf:heartbeat:Filesystem \
        params device=/dev/drbd/by-res/r0 directory=/vcoreshare 
 fstype=ext4 \
        op start interval=0 timeout=60s \
        op stop interval=0 timeout=60s
 ms ms_drbd_vcoreshare drbd_vcoreshare \
        meta master-max=1 master-node-max=1 clone-max=2 
 clone-node-max=1 notify=true
 colocation clusterip_with_vcoreshare inf: clusterip fs_vcoreshare
 colocation ipsourcing_with_clusterip inf: clusteripsourcing clusterip
 colocation vcoreshare_on_drbd inf: fs_vcoreshare ms_drbd_vcoreshare:Master
 colocation ldap_with_vcoreshare inf: ldap fs_vcoreshare
 order clusterip_after_vcoreshare inf: fs_vcoreshare clusterip
 order ldap_after_clusterip inf: clusterip ldap
 order ipsourcing_after_clusterip inf: clusterip clusteripsourcing
 order vcoreshare_after_drbd inf: ms_drbd_vcoreshare:promote 
 fs_vcoreshare:start
 property $id=cib-bootstrap-options \
        dc-version=1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd \
        cluster-infrastructure=openais \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        no-quorum-policy=ignore
 rsc_defaults $id=rsc-options \
        resource-stickiness=100


 - Original Message -
 From: Serge Dubrouski serge...@gmail.com
 To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Sent: Thursday, February 24, 2011 11:05:56 AM
 Subject: Re: [Pacemaker] Validate strategy for RA on DRBD standby node

 Why are you trying to start LDAP on a node where you don't have your
 DRBD resource mounted. Having LDAP up on both nodes would make sense
 if you were building an active/active LDAP cluster with syncrepl or
 any other replication mechanism. In that case you'd set it up and M/S
 and or as a clone and would have to provide access to the config file
 on both nodes. In active/passive case you have to collocate your LDAP
 resource with your DRBD and filesystem resources and Pacemaker won't
 try to start LDAP on a node that doesn't have DRBD activated and
 filesystem mounted.

 On Thu, Feb 24, 2011 at 6:06 AM, David McCurley m...@fabric.com
 wrote:
  Pacemaker and list newbie here :)
 
  I'm writing a resource adapter in python for the newer release of
  OpenLDAP but I need some pointers on a strategy for the validate
  function in a certain case.  (In python because the more advanced
  shell scripting hurts my head :).  Here is the situation:
 
  The config file for OpenLDAP is stored in
  /etc/ldap/slapd.d/cn=config.ldif.  This is on a DRBD
  active-passive system and the /etc/ldap directory is actually a
  symlink to the DRBD controlled share /vcoreshare/etc/ldap.  The
  real config file is at
  /vcoreshare/etc/ldap/slapd.d/cn=config.ldif.
 
  So I'm trying to be very judicious with every function and
  validation, checking file permissions, etc.  But the problem is
  that /etc/ldap/slapd.d/cn=config.ldif is only present on the
  active DRBD node.  My validate function checks that the file is
  readable by the user/group that slapd is to run as.  Now, as soon
  as I start ldap in the cluster, it starts fine, but validate fails
  on the standby node (because the DRBD volume isn't mounted) and
  crm_mon shows a failed action:
  --
  
  Last updated: Wed Feb 23 07:35:19 2011
  Stack: openais
  Current DC: vcoresrv1 - partition with quorum
  Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
  2 Nodes

Re: [Pacemaker] Apache server won't start

2011-02-24 Thread Serge Dubrouski
The Apache RA relies on mod_status and monitors the /server-status page.
When you added your load balancer you redirected all requests to your
Tomcat instances, which don't have a /server-status page. You either have
to change your Apache configuration to keep /server-status in place, or
change your Pacemaker configuration to use another page for monitoring.
Check out the statusurl parameter of the Apache RA.
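
For example (a sketch; adjust the paths to your real setup), keep
/server-status out of the balancer and tell the RA where to look:

    # httpd.conf: serve the status page locally instead of proxying it;
    # the exclusion must come before the general "ProxyPass /" rule
    <Location /server-status>
        SetHandler server-status
        Order deny,allow
        Deny from all
        Allow from 127.0.0.1
    </Location>
    ProxyPass /server-status !

    # crm: point the RA at the status page explicitly
    primitive WebSite ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
               statusurl="http://localhost/server-status" \
        op monitor interval="1min"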

On Thu, Feb 24, 2011 at 1:58 PM, Diego Bevilaqua
diego.bevilaqu...@gmail.com wrote:
 Hello to you all.
 I started researching Pacemaker/Corosync recently, as the company I work at
 wants to change the current high availability infrastructure. I decided to
 start by following the Cluster from Scratch step-to-step, and got a running
 environment. As I saw it running smoothly - I could access, from any
 machine, the address I had for the cluster, and it returned the page
 according to the node Apache server was running on - , I decided to change
 some configurations so as to test it running our application. Apache, in the
 setup I am trying, is being used as a load balancer for some machines running
 a cluster of Tomcat servlet containers, which in turn run our application.
 However, when I inserted the changes for Apache to serve as a reverse proxy
 for the Tomcat machines, it simply stopped working =\
 The changes I made to the default Apache httpd.conf file were these:

 # Load-balance configuration (mod_proxy)
 Header add Set-Cookie ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/
 env=BALANCER_ROUTE_CHANGED
 <Proxy balancer://tomcatCluster/>
 BalancerMember http://xxx.xxx.xxx.xxx:/ route=1
 BalancerMember http://xxx.xxx.xxx.xxx:/ route=2
 ProxySet stickysession=ROUTEID
 </Proxy>
 ProxyPass / balancer://tomcatCluster/
 ProxyPassReverse  /  balancer://tomcatCluster/

 The error I'm getting is this:

 Failed actions:
     WebSite_start_0 (node=worker-1, call=5, rc=1, status=complete): unknown
 error

 Strange thing is that, if I just run the apache service (/etc/init.d/httpd
 start) it runs smoothly, and I can run the application using the machine's
 IP in any of my network nodes. I really searched all over for some reason
 for this, but couldn't find any, and the error message (unknown error)
 really didn't help much =\
 I would really appreciate any help you could give me.
 --
 Diego Bevilaqua
 diego.bevilaqu...@gmail.com

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker





-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] ifstatus OCF RA

2011-02-24 Thread Serge Dubrouski
I wonder if it would be possible to make Pacemaker move a virtual
IP from one interface to another (not from one node to another) using
an RA like this one.

On Thu, Feb 24, 2011 at 4:16 AM, Vladislav Bogdanov
bub...@hoster-ok.com wrote:
 23.02.2011 11:53, Vladislav Bogdanov wrote:

  Also note that this is
   - linux specific
   - requires kernel >= 2.6.33,
   afaict no /sys/class/net/*/speed before that

 Ahm, I was not aware of that. I need to look at this again because I
 need this to run on RHEL6 too. Does anybody know whether it has this in
 sysfs? Let's delay this for a bit.

 Quick answer to myself, EL6 has it with 2.6.32.
 I'd probably better test for /sys/class/net/lo/speed rather than for the
 kernel version.

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Config Sanity Check - Postgres, OCFS2 etc

2011-02-10 Thread Serge Dubrouski
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker





-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Colocation / Order question

2010-12-09 Thread Serge Dubrouski
Why not just include those IPs in the groups?

Are there any plans to support groups that can include other groups?
Basically it's still that colocate((A or B), C) that you already
mentioned, but having the ability to write "group GroupA resB" would be
good too.

On Thu, Dec 9, 2010 at 10:28 AM, Andrew Beekhof and...@beekhof.net wrote:
 On Thu, Dec 9, 2010 at 2:45 PM, Vadym Chepkov vchep...@gmail.com wrote:

 symmetrical-cluster=false
 ?

 nope


 Cheers,
 Vadym

 On Dec 9, 2010 5:09 AM, Andrew Beekhof and...@beekhof.net wrote:
 On Wed, Dec 8, 2010 at 11:31 PM, Vadym Chepkov vchep...@gmail.com wrote:
 On Dec 8, 2010, at 9:43 AM, Michael Schwartzkopff wrote:

 Hi,

 I have two groups, each located on one node. So GroupA on nodeA and
 GroupB on nodeB.

 I want to make a third resource (e.g. an IP address) run on any node
 where one of the above groups is running, i.e. on nodeA or nodeB.

 Can I realize this with constraint sets? Is it possible to have simple
 constraints?

 Thanks for any hints.

 colocation col1 500: IP-Address GroupA
 colocation col2 500: IP-Address GroupB

 :)

 Would mostly work but if neither A nor B are running then the IP would
 still be active.
 I've wanted to implement colocate((A or B), C) for a while now but
 haven't had the bandwidth.





 Michael

 --
 Dr. Michael Schwartzkopff
 Guardinistr. 63
 81375 München

 Tel: (0163) 172 50 98
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Wiki

2010-10-29 Thread Serge Dubrouski
Hello -

I'd like to translate some documents from the Clusterlabs Wiki site into
Russian. How do I create a version of a page in a particular language?

-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Failover domains?

2010-10-26 Thread Serge Dubrouski
On Tue, Oct 26, 2010 at 12:24 PM, David Quenzler quenz...@gmail.com wrote:
 How about something like...

 Cluster with 4 nodes: node1 node2 node3 node4

 ResourceA runs only on node1 and node2, never on node3 or node4
 ResourceB runs only on node3 and node4, never on node1 or node2

Just curious, what is the point of building a 4-node cluster in this
case instead of 2 clusters of 2 nodes each?


 On 10/26/10, Pavlos Parissis pavlos.paris...@gmail.com wrote:
 On 25 October 2010 19:50, David Quenzler quenz...@gmail.com wrote:

 Is there a way to limit failover behavior to a subset of cluster nodes
 or pin a resource to a node?


 Yes, there is a way.

 Make sure you have an asymmetric cluster by setting symmetric-cluster to
 false
 and then configure your location constraints accordingly in order to have
 the failover domains you wish.

 Here is an example from my cluster where I have 3 nodes and 2 resource
 groups. Each resource group has a unique primary node but both of them
 share a secondary node.

 location PrimaryNode-pbx_service_01 pbx_service_01 200: node-01
 location PrimaryNode-pbx_service_02 pbx_service_02 200: node-02

 location SecondaryNode-pbx_service_01 pbx_service_01 10: node-03
 location SecondaryNode-pbx_service_02 pbx_service_02 10: node-03


 Cheers,
 Pavlos


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Failover domains?

2010-10-26 Thread Serge Dubrouski
On Tue, Oct 26, 2010 at 12:59 PM, David Quenzler quenz...@gmail.com wrote:
 Other than single point of control, there is no reason multiple
 clusters could not be used.
 What would the best practice be?

For me, simpler is better, and again for me that would be 2 clusters
with a simple configuration. As for a single point of control, as far as
I know the GUI can connect to a remote cluster and control it.

BTW, it would be good if the CRM shell could do the same, but I doubt
that's possible.


 On 10/26/10, Serge Dubrouski serge...@gmail.com wrote:
 On Tue, Oct 26, 2010 at 12:24 PM, David Quenzler quenz...@gmail.com wrote:
 How about something like...

 Cluster with 4 nodes: node1 node2 node3 node4

 ResourceA runs only on node1 and node2, never on node3 or node4
 ResourceB runs only on node3 and node4, never on node1 or node2

  Just curious, what is the point of building a 4-node cluster in this
  case instead of 2 clusters of 2 nodes each?


 On 10/26/10, Pavlos Parissis pavlos.paris...@gmail.com wrote:
 On 25 October 2010 19:50, David Quenzler quenz...@gmail.com wrote:

 Is there a way to limit failover behavior to a subset of cluster nodes
 or pin a resource to a node?


 Yes, there is a way.

  Make sure you have an asymmetric cluster by setting symmetric-cluster to
  false
  and then configure your location constraints accordingly in order to have
  the failover domains you wish.

  Here is an example from my cluster where I have 3 nodes and 2 resource
  groups. Each resource group has a unique primary node but both of them
  share a secondary node.

 location PrimaryNode-pbx_service_01 pbx_service_01 200: node-01
 location PrimaryNode-pbx_service_02 pbx_service_02 200: node-02

 location SecondaryNode-pbx_service_01 pbx_service_01 10: node-03
 location SecondaryNode-pbx_service_02 pbx_service_02 10: node-03


 Cheers,
 Pavlos


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




 --
 Serge Dubrouski.

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] custom path to LSB compliant script in primitive?

2010-10-06 Thread Serge Dubrouski
LSB resources do not accept any parameters, so there is no way to pass a
custom path to the script. You have to convert it to an OCF agent.
Since you already started changing it, you are halfway done :-)
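For a rough idea, the skeleton of such a wrapper could look like the
sketch below (the meta-data XML and proper exit-code mapping are omitted,
and the .ocf-shellfuncs path may differ between versions):

#!/bin/sh
# Sketch of an OCF wrapper for a script living outside /etc/init.d.
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs

RP=/opt/x/y/x/rp        # the custom install path from the question

case "$1" in
    start)        $RP start ;;   # a real agent must map to OCF exit codes
    stop)         $RP stop ;;
    monitor)      if $RP status >/dev/null 2>&1; then
                      exit $OCF_SUCCESS
                  else
                      exit $OCF_NOT_RUNNING
                  fi ;;
    validate-all) check_binary $RP ;;
    meta-data)    echo "<!-- the usual resource-agent XML goes here -->" ;;
    *)            exit $OCF_ERR_UNIMPLEMENTED ;;
esac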

On Wed, Oct 6, 2010 at 2:36 PM, Craig Hurley li...@thehurley.com wrote:
 Hello,

 Is it possible to provide a custom path to an LSB-compliant script in a
 primitive?  To get it working I copied my script to /etc/init.d/rp and
 I'm using the following config:

 primitive p_rp lsb:rp \
        op monitor interval=30s \
        meta target-role=Started

 ...  but I'd like to leave the startup script in its default install
 path of /opt/x/y/x/rp and modify the config to point at that default
 path.

 Is this possible?
 Is this documented somewhere?

 Thanks,
 Craig.

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] howto stop node from becoming primary

2010-08-12 Thread Serge Dubrouski
The right answer to this question is to use STONITH. Another possible
option is to use pingd to configure a node to never run any resources
while it's disconnected from the network.
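Something along these lines (resource and group names are placeholders):

primitive pingd ocf:pacemaker:pingd \
        params host_list="192.168.2.1" multiplier="1000" dampen="5s"
clone pingd-clone pingd
location run-only-when-connected myGroup \
        rule -inf: not_defined pingd or pingd lte 0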

On Thu, Aug 12, 2010 at 3:42 PM,  lxnf9...@comcast.net wrote:
 I have managed to get a 2-node cluster working with the exception of one
 thing: if the nodes lose connectivity, they each think the other is dead
 and both become primaries.
 I do not want this to happen.
 Suggestions please.

 [r...@abctest2 ~]# crm configure show
 node abctest1
 node abctest2
 primitive apache ocf:heartbeat:apache \
        params configfile=/etc/httpd/conf/httpd.conf \
        op monitor interval=60s timeout=40s \
        meta target-role=Started
 primitive clusterip1 ocf:heartbeat:IPaddr2 \
        params ip=172.17.32.11 cidr_netmask=32 \
        op monitor interval=30s \
        meta target-role=Started
 primitive clusterip2 ocf:heartbeat:IPaddr2 \
        params ip=172.17.32.12 cidr_netmask=32 \
        op monitor interval=30s \
        meta target-role=Started
 primitive drbd0 ocf:linbit:drbd \
        params drbd_resource=r0 \
        op monitor interval=59s role=Master timeout=30s \
        op monitor interval=60s role=Slave timeout=30s
 primitive fs0 ocf:heartbeat:Filesystem \
        params fstype=ext3 directory=/drbd_r0/ device=/dev/drbd0 \
        meta target-role=Started
 primitive mysql ocf:heartbeat:mysql \
        params binary=/usr/bin/mysqld_safe datadir=/var/lib/mysql
 socket=/var/lib/mysql/mysql.sock log=/var/log/mysqld.log
 pid=/var/run/mysqld/mysqld.pid user=mysql \
        op monitor interval=30s
 primitive tomcat ocf:heartbeat:tomcat \
        params java_home=/opt/IBMJava2-142
 catalina_home=/opt/apache-tomcat-4.1.40 \
        op monitor interval=10s timeout=30s depth=0 \
        meta target-role=Started
 ms ms-drbd0 drbd0 \
        meta clone-max=2 notify=true globally-unique=false
 target-role=Master is-managed=true
 colocation apache-on-fs0 inf: apache fs0
 colocation clusterip1-on-drbd inf: clusterip1 ms-drbd0:Master
 colocation clusterip2-on-drbd inf: clusterip2 ms-drbd0:Master
 colocation fs-on-drbd inf: fs0 ms-drbd0:Master
 colocation mysql-on-fs0 inf: mysql fs0
 colocation tomcat-on-fs0 inf: tomcat fs0
 order apache-after-clusterip1 inf: clusterip1 apache
 order apache-after-clusterip2 inf: clusterip2 apache
 order apache-after-fs0 inf: fs0 apache
 order apache-after-mysql inf: mysql apache
 order apache-after-tomcat inf: tomcat apache
 order fs0-after-drbd0 inf: ms-drbd0:promote fs0:start
 order mysql-after-fs0 inf: fs0 mysql
 order tomcat-after-fs0 inf: fs0 tomcat
 property $id=cib-bootstrap-options \
        dc-version=1.0.9-89bd754939df5150de7cd76835f98fe90851b677 \
        cluster-infrastructure=openais \
        expected-quorum-votes=2 \
        last-lrm-refresh=1281644620 \
        stonith-enabled=false \
        no-quorum-policy=ignore
 rsc_defaults $id=rsc-options \
        resource-stickiness=100


 Thanks
 Richard

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Stop one instance in a clone

2010-07-02 Thread Serge Dubrouski
On Fri, Jul 2, 2010 at 7:27 AM, Dejan Muhamedagic deja...@fastmail.fm wrote:
 Hi,

 On Thu, Jul 01, 2010 at 10:23:41PM +0200, Andrew Beekhof wrote:
 On Wed, Jun 30, 2010 at 7:07 PM, Dejan Muhamedagic deja...@fastmail.fm 
 wrote:
  Hi,
 
  On Wed, Jun 30, 2010 at 10:57:21AM -0600, Serge Dubrouski wrote:
  Hello -
 
  Is there any way to stop an instance of a cloned resource on a
  particular node using crm shell?
 
  How would you stop it with crm_resource?
 
  Perhaps with one -inf location constraint for the target node?

 Yep, thats the recommended way.
 Actually it would be pretty neat if the shell did this automagically
 if the user asked to stop a clone and specified a node.

 Not a bad idea. That would require a bit more syntax. Can you
 please open an enhancement bugzilla.


 Bug 2447

 Thanks,

 Dejan

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Stop one instance in a clone

2010-06-30 Thread Serge Dubrouski
Hello -

Is there any way to stop an instance of a cloned resource on a
particular node using crm shell?

-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Stop one instance in a clone

2010-06-30 Thread Serge Dubrouski
On Wed, Jun 30, 2010 at 11:07 AM, Dejan Muhamedagic deja...@fastmail.fm wrote:
 Hi,

 On Wed, Jun 30, 2010 at 10:57:21AM -0600, Serge Dubrouski wrote:
 Hello -

 Is there any way to stop an instance of a cloned resource on a
 particular node using crm shell?

 How would you stop it with crm_resource?

Actually I don't know. So let me change my question: is there any way
to stop just one instance of a cloned resource using any of the Pacemaker
commands? Let's say I need to do maintenance on one of the nodes but I
don't want to completely stop corosync/heartbeat there. One solution
is to put the cloned resource into unmanaged state and then stop it
manually on the given node. But in that case I have to make sure that
all dependent resources were migrated off that node first.

Changing clone-max to clone-max - 1 and manually stopping the
resource instance might also work.
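In crm shell terms that would be roughly (the clone name is hypothetical):

# option 1: stop managing the clone, then stop the instance by hand
crm resource unmanage myClone
# option 2: shrink the clone by one instance
crm resource meta myClone set clone-max 1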




 Perhaps with one -inf location constraint for the target node?

 Thanks,

 Dejan


 --
 Serge Dubrouski.

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Stop one instance in a clone

2010-06-30 Thread Serge Dubrouski
On Wed, Jun 30, 2010 at 11:56 AM, Dejan Muhamedagic deja...@fastmail.fm wrote:
 On Wed, Jun 30, 2010 at 11:16:13AM -0600, Serge Dubrouski wrote:
 On Wed, Jun 30, 2010 at 11:07 AM, Dejan Muhamedagic deja...@fastmail.fm 
 wrote:
  Hi,
 
  On Wed, Jun 30, 2010 at 10:57:21AM -0600, Serge Dubrouski wrote:
  Hello -
 
  Is there any way to stop an instance of a cloned resource on a
  particular node using crm shell?
 
  How would you stop it with crm_resource?

 Actually I don't know. So let me change my question: is there any way
 to stop just one instance of a cloned resource using any of the Pacemaker
 commands? Let's say I need to do maintenance on one of the nodes but I
 don't want to completely stop corosync/heartbeat there. One solution
 is to put the cloned resource into unmanaged state and then stop it
 manually on the given node. But in that case I have to make sure that
 all dependent resources were migrated off that node first.

 Changing clone-max to clone-max - 1 and manually stopping the
 resource instance might also work.

 Yes, but you can't control the node.

  Perhaps with one -inf location constraint for the target node?

 Did you miss this one or was it too silly to consider?

It absolutely wasn't and in fact it works too.
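For the archive, such a constraint is a one-liner (clone and node names
are placeholders):

location ban-myClone-on-node1 myClone -inf: node1

Deleting the constraint afterwards lets the instance start again.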


 Cheers,

 Dejan

  Thanks,
 
  Dejan
 
 
  --
  Serge Dubrouski.
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: 
  http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: 
  http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
 



 --
 Serge Dubrouski.

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Patch for pgsql RA

2010-05-17 Thread Serge Dubrouski
Hello -

Here is a follow-up patch for today's incident with the pgsql script. The
patch is intended to cover an issue with a missing /sbin/fuser tool:

--- /usr/lib/ocf/resource.d/heartbeat/pgsql	2010-05-03 01:20:16.0 -0600
+++ /usr/lib/ocf/resource.d/heartbeat/pgsql.new	2010-05-17 11:32:33.0 -0600
@@ -291,7 +291,7 @@
     if [ -f $PIDFILE ]
     then
         PID=`head -n 1 $PIDFILE`
-        kill -0 $PID >/dev/null 2>&1 && fuser $OCF_RESKEY_pgdata 2>&1 | grep $PID >/dev/null 2>&1
+        kill -0 $PID >/dev/null 2>&1 && /sbin/fuser $OCF_RESKEY_pgdata 2>&1 | grep $PID >/dev/null 2>&1
         return $?
     fi
 
@@ -359,6 +359,11 @@
         return $OCF_ERR_INSTALLED
     fi
 
+    if ! have_binary /sbin/fuser
+    then
+        return $OCF_ERR_INSTALLED
+    fi
+
     return $OCF_SUCCESS
 }



It also makes sense to update the SPEC file for the resource-agents package
with a dependency on the psmisc package.
-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] Patch for pgsql RA

2010-05-17 Thread Serge Dubrouski
I'm really sorry, but please disregard that patch. More appropriate
would be the two attached patches.

On Mon, May 17, 2010 at 11:35 AM, Serge Dubrouski serge...@gmail.com wrote:
 Hello -

 Here is a follow-up patch for today's incident with the pgsql script. The
 patch is intended to cover an issue with a missing /sbin/fuser tool:

 --- /usr/lib/ocf/resource.d/heartbeat/pgsql	2010-05-03 01:20:16.0 -0600
 +++ /usr/lib/ocf/resource.d/heartbeat/pgsql.new	2010-05-17 11:32:33.0 -0600
 @@ -291,7 +291,7 @@
      if [ -f $PIDFILE ]
      then
          PID=`head -n 1 $PIDFILE`
 -        kill -0 $PID >/dev/null 2>&1 && fuser $OCF_RESKEY_pgdata 2>&1 | grep $PID >/dev/null 2>&1
 +        kill -0 $PID >/dev/null 2>&1 && /sbin/fuser $OCF_RESKEY_pgdata 2>&1 | grep $PID >/dev/null 2>&1
          return $?
      fi
 
 @@ -359,6 +359,11 @@
          return $OCF_ERR_INSTALLED
      fi
 
 +    if ! have_binary /sbin/fuser
 +    then
 +        return $OCF_ERR_INSTALLED
 +    fi
 +
      return $OCF_SUCCESS
  }



 It also makes sense to update the SPEC file for the resource-agents package
 with a dependency on the psmisc package.
 --
 Serge Dubrouski.




-- 
Serge Dubrouski.
--- a/heartbeat/pgsql	2010-05-03 01:20:16.0 -0600
+++ b/heartbeat/pgsql	2010-05-17 11:41:55.0 -0600
@@ -291,7 +291,7 @@
     if [ -f $PIDFILE ]
     then
         PID=`head -n 1 $PIDFILE`
-        kill -0 $PID >/dev/null 2>&1 && fuser $OCF_RESKEY_pgdata 2>&1 | grep $PID >/dev/null 2>&1
+        kill -0 $PID >/dev/null 2>&1 && $FUSER $OCF_RESKEY_pgdata 2>&1 | grep $PID >/dev/null 2>&1
         return $?
     fi
 
@@ -343,22 +343,27 @@
 
 # Validate most critical parameters
 pgsql_validate_all() {
-    if ! have_binary $SH
+    if ! check_binary $SH
     then
         return $OCF_ERR_INSTALLED
     fi
 
-    if ! have_binary $OCF_RESKEY_pgctl
+    if ! check_binary $OCF_RESKEY_pgctl
     then
         return $OCF_ERR_INSTALLED
     fi
 
-    if ! have_binary $OCF_RESKEY_psql
+    if ! check_binary $OCF_RESKEY_psql
     then
         return $OCF_ERR_INSTALLED
     fi
 
+    if ! check_binary $FUSER
+    then
+        return $OCF_ERR_INSTALLED
+    fi
+
     return $OCF_SUCCESS
 }
--- a/heartbeat/.ocf-binaries	2010-05-03 01:20:16.0 -0600
+++ b/heartbeat/.ocf-binaries	2010-05-17 11:45:06.0 -0600
@@ -15,13 +15,13 @@
 : ${SH:=/bin/sh}
 : ${TEST:=/usr/bin/test}
 : ${TESTPROG:=/usr/bin/test}
+: ${FUSER:=/sbin/fuser}
 
 # Entries that should probably be removed
 : ${BASENAME:=basename}
 : ${BLOCKDEV:=blockdev}
 : ${CAT:=cat}
 : ${FSCK:=fsck}
-: ${FUSER:=fuser}
 : ${GETENT:=getent}
 : ${GREP:=grep}
 : ${IFCONFIG:=ifconfig}
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Re: [Pacemaker] Patch for pgsql RA

2010-05-17 Thread Serge Dubrouski
Thanks. Then the patch will look like this:

--- a/heartbeat/pgsql	2010-05-03 01:20:16.0 -0600
+++ b/heartbeat/pgsql	2010-05-17 13:43:04.0 -0600
@@ -343,21 +343,10 @@
 
 # Validate most critical parameters
 pgsql_validate_all() {
-    if ! have_binary $SH
-    then
-        return $OCF_ERR_INSTALLED
-    fi
-
-    if ! have_binary $OCF_RESKEY_pgctl
-    then
-        return $OCF_ERR_INSTALLED
-    fi
-
-
-    if ! have_binary $OCF_RESKEY_psql
-    then
-        return $OCF_ERR_INSTALLED
-    fi
+    check_binary $SH
+    check_binary $OCF_RESKEY_pgctl
+    check_binary $OCF_RESKEY_psql
+    check_binary fuser
 
     return $OCF_SUCCESS
 }

The only reason for using $FUSER was the fact that it already existed
in .ocf-binaries, though marked for future removal. There is no real
preference for one way over the other.

On Mon, May 17, 2010 at 1:32 PM, Florian Haas florian.h...@linbit.com wrote:
 Hi Serge,

 On 05/17/2010 07:50 PM, Serge Dubrouski wrote:
 I'm really sorry, but please disregard that patch. More appropriate
 would be the two attached patches.

  # Validate most critical parameters
  pgsql_validate_all() {
 -    if ! have_binary $SH
 +    if ! check_binary $SH
      then
          return $OCF_ERR_INSTALLED
      fi

 You can just do check_binary. No need for have_binary and any if test;
 check_binary just exits with $OCF_ERR_INSTALLED if it doesn't find the
 binary.

 Wrt the $FUSER patch, is it really necessary to specify the full path to
 fuser? Other RAs simply do check_binary "binary", relying on the binary to
 be found in the $PATH.
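 (Roughly, check_binary in .ocf-shellfuncs amounts to the following;
 this is a paraphrase from memory, not the exact source:)

 check_binary () {
     if ! have_binary "$1"; then
         ocf_log err "Setup problem: couldn't find command: $1"
         exit $OCF_ERR_INSTALLED
     fi
 }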

 Cheers,
 Florian


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf




-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] primitive dlm ocf:pacemaker:controld -- WARNING: dlm: default-action-timeout 20s for start is smaller than the advised 90

2010-04-20 Thread Serge Dubrouski

-- 
Serge Dubrouski.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] Announce: Pacemaker 1.0.8 released

2010-03-17 Thread Serge Dubrouski
Hello -

It looks like there is a package name conflict on the RedHat
distribution, at least on CentOS. RedHat has its own resource-agents
package:


# yum search resource-agents
== Matched: resource-agents ===
resource-agents.i386 : Reusable cluster resource scripts
resource-agents.noarch : Open Source HA Resource Agents for Red Hat Cluster

The first one is the Pacemaker/Heartbeat package, the second one is the RedHat one.


On Wed, Mar 17, 2010 at 7:08 AM, Andrew Beekhof and...@beekhof.net wrote:
 Pacemaker 1.0.8 was tagged and released last night.

 You can read the full announcement and changelog at:
   http://theclusterguy.clusterlabs.org/post/452813842/pacemaker-1-0-8-released

 Updated packages for rpm-based distros are also now available at the usual 
 location.
 See the following link for more details:
  http://www.clusterlabs.org/wiki/Install#RPM

 -- Andrew




 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker





-- 
Serge Dubrouski.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] Announce: Pacemaker 1.0.8 released

2010-03-17 Thread Serge Dubrouski
It doesn't, at least the one for FC10.

On Wed, Mar 17, 2010 at 1:31 PM, Andrew Beekhof and...@beekhof.net wrote:
 On Wed, Mar 17, 2010 at 8:16 PM, Serge Dubrouski serge...@gmail.com wrote:
 Hello -

 It looks like there is a package name conflict on the RedHat
 distribution, at least on CentOS. RedHat has its own resource-agents
 package:


 # yum search resource-agents
 == Matched: resource-agents ===
 resource-agents.i386 : Reusable cluster resource scripts
 resource-agents.noarch : Open Source HA Resource Agents for Red Hat Cluster

 The first one is the Pacemaker/Heartbeat package, the second one is the RedHat one.

 Yep, just go with the Red Hat one, it also contains our OCF agents



 On Wed, Mar 17, 2010 at 7:08 AM, Andrew Beekhof and...@beekhof.net wrote:
 Pacemaker 1.0.8 was tagged and released last night.

 You can read the full announcement and changelog at:
   
 http://theclusterguy.clusterlabs.org/post/452813842/pacemaker-1-0-8-released

 Updated packages for rpm-based distros are also now available at the usual 
 location.
 See the following link for more details:
  http://www.clusterlabs.org/wiki/Install#RPM

 -- Andrew




 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker





 --
 Serge Dubrouski.

 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker


 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


[Pacemaker] 10.8 and pingd problem

2010-03-17 Thread Serge Dubrouski
Hello -

I've just installed the brand new Pacemaker 1.0.8 and ran into a problem
with pingd. When Corosync/Pacemaker/pingd start, the pingd attribute for
the nodes doesn't get initialized, so the resources in the cluster stay
down. Then if I change the configuration (delete/add the pingd resource or
clone, stop/start a node, etc.) the pingd attribute gets set and the
resources start up. My configuration looks like this and used to work fine
with 1.0.7:

primitive child_DoFencing stonith:external/xen0 \
meta $id=primitive-child_DoFencing.meta \
params hostlist="fc-node1 fc-node2" dom0="home"
primitive drbd0 ocf:heartbeat:drbd \
params drbd_resource=drbd0 \
op monitor interval=59s role=Master timeout=10s \
op monitor interval=60s role=Slave timeout=10s
primitive fs0 ocf:heartbeat:Filesystem \
meta $id=primitive-fs0.meta \
params fstype=ext2 directory=/mnt device=/dev/drbd0
primitive myIP ocf:heartbeat:IPaddr \
meta $id=primitive-myIP.meta \
params ip=192.168.1.130 \
op monitor interval=30s timeout=30s \
op start interval=0s timeout=30s \
op stop interval=0s timeout=30s
primitive myPgsql ocf:heartbeat:pgsql \
meta $id=primitive-myPgsql.meta \
params ctl_opt=-w \
op monitor interval=30s timeout=30s \
op start interval=0s timeout=30s \
op stop interval=0s timeout=30s
primitive pingd ocf:pacemaker:pingd \
params dampen=5s multiplier=1000 host_list=192.168.2.1
group myGroup myIP fs0 myPgsql \
meta $id=group-myGroup.meta
ms ms-drbd0 drbd0 \
meta clone-max=2 clone-node-max=1 master-max=1
master-node-max=1 notify=yes globally-unique=false \
meta target-role=Started
clone DoFencing child_DoFencing \
meta clone-max=2 clone-node-max=1
clone pingd-clone pingd \
meta target-role=Started
location connected myGroup \
rule $id=connected-rule -inf: not_defined pingd or pingd lte 0
location primNode myGroup \
rule $id=prefered_primNode 1000: #uname eq fc-node1
colocation myGroup_on_drbd0 inf: myGroup ms-drbd0:Master
order drbd0_before_myGroup : ms-drbd0:promote myGroup:start
property $id=cib-bootstrap-options \
default-resource-stickiness=600 \
default-resource-failure-stickiness=-520 \
dc-version=1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe \
cluster-infrastructure=openais \
expected-quorum-votes=2 \
last-lrm-refresh=1254058067 \
no-quorum-policy=ignore


-- 
Serge Dubrouski.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] 10.8 and pingd problem

2010-03-17 Thread Serge Dubrouski
Both versions of pingd exhibit the same problem for me. After a fresh
start the CIB doesn't get updated with pingd attributes for the cluster
nodes, so my location rule:

location connected myGroup \
rule $id=connected-rule -inf: not_defined pingd or pingd lte 0

prevents resources from starting. After I manually stop/start
pingd-clone, the CIB receives the following records:

  <nvpair id="status-fc-node1-pingd" name="pingd" value="1000"/>
  <nvpair id="status-fc-node2-pingd" name="pingd" value="1000"/>

and everything works fine.
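One quick way to watch whether the attribute ever arrives is to dump the
node attributes, assuming this crm_mon build supports the flag:

crm_mon -1 -A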

On Wed, Mar 17, 2010 at 2:46 PM, Andrew Beekhof and...@beekhof.net wrote:
 On Wed, Mar 17, 2010 at 9:10 PM, Serge Dubrouski serge...@gmail.com wrote:
 Hello -

 I've just installed the brand new Pacemaker 1.0.8 and ran into a problem
 with pingd. When Corosync/Pacemaker/pingd start, the pingd attribute for
 the nodes doesn't get initialized, so the resources in the cluster stay
 down. Then if I change the configuration (delete/add the pingd resource or
 clone, stop/start a node, etc.) the pingd attribute gets set and the
 resources start up. My configuration looks like this and used to work fine with 1.0.7:

 Strange, the only changes to pingd were spelling ones.
 Perhaps try ocf:pacemaker:ping instead, it uses the system ping binary
 and is therefore possibly more reliable.
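 A sketch of the equivalent primitive, keeping the same parameters
 (untested here; note that ping updates the attribute from its monitor
 action, so it needs a monitor op):

 primitive pingd ocf:pacemaker:ping \
        params dampen="5s" multiplier="1000" host_list="192.168.2.1" \
        op monitor interval="15s"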

 primitive child_DoFencing stonith:external/xen0 \
        meta $id=primitive-child_DoFencing.meta \
        params hostlist=fc-node1 fc-node2 dom0=home
 primitive drbd0 ocf:heartbeat:drbd \
        params drbd_resource=drbd0 \
        op monitor interval=59s role=Master timeout=10s \
        op monitor interval=60s role=Slave timeout=10s
 primitive fs0 ocf:heartbeat:Filesystem \
        meta $id=primitive-fs0.meta \
        params fstype=ext2 directory=/mnt device=/dev/drbd0
 primitive myIP ocf:heartbeat:IPaddr \
        meta $id=primitive-myIP.meta \
        params ip=192.168.1.130 \
        op monitor interval=30s timeout=30s \
        op start interval=0s timeout=30s \
        op stop interval=0s timeout=30s
 primitive myPgsql ocf:heartbeat:pgsql \
        meta $id=primitive-myPgsql.meta \
        params ctl_opt=-w \
        op monitor interval=30s timeout=30s \
        op start interval=0s timeout=30s \
        op stop interval=0s timeout=30s
 primitive pingd ocf:pacemaker:pingd \
        params dampen=5s multiplier=1000 host_list=192.168.2.1
 group myGroup myIP fs0 myPgsql \
        meta $id=group-myGroup.meta
 ms ms-drbd0 drbd0 \
        meta clone-max=2 clone-node-max=1 master-max=1
 master-node-max=1 notify=yes globally-unique=false \
        meta target-role=Started
 clone DoFencing child_DoFencing \
        meta clone-max=2 clone-node-max=1
 clone pingd-clone pingd \
        meta target-role=Started
 location connected myGroup \
        rule $id=connected-rule -inf: not_defined pingd or pingd lte 0
 location primNode myGroup \
        rule $id=prefered_primNode 1000: #uname eq fc-node1
 colocation myGroup_on_drbd0 inf: myGroup ms-drbd0:Master
 order drbd0_before_myGroup : ms-drbd0:promote myGroup:start
 property $id=cib-bootstrap-options \
        default-resource-stickiness=600 \
        default-resource-failure-stickiness=-520 \
        dc-version=1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe \
        cluster-infrastructure=openais \
        expected-quorum-votes=2 \
        last-lrm-refresh=1254058067 \
        no-quorum-policy=ignore


 --
 Serge Dubrouski.

 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker


 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] DRBD and fencing

2010-03-10 Thread Serge Dubrouski
On Wed, Mar 10, 2010 at 6:59 PM, Martin Aspeli optilude+li...@gmail.com wrote:
 Serge Dubrouski wrote:

 On Wed, Mar 10, 2010 at 5:30 PM, Martin Aspelioptilude+li...@gmail.com
  wrote:

 Martin Aspeli wrote:

 Hi folks,

 Let's say have a two-node cluster with DRBD and OCFS2, with a database
 server that's supposed to be active on one node at a time, using the
 OCFS2 partition for its data store.

 If we detect a failure on the active node and fail the database over to
 the other node, we need to fence off the shared storage in case the
 active node is still writing to it.

 Can this be done in such a way that the local DRBD/OCFS2 refuses to
 accept writes from the now-presumed-dead node? I guess this would be
 similar to putting an access rule on a SAN to block off the previously
 active node from attempting to read or write any data.

 Is this feasible?

 We went off on a side-track, I think, but I'd still like to know the
 answer:
 Can one fence at the DRBD level?

  From the thread, it sounds like we'll not use OCFS2 for the Postgres
 data
 store, but would still use DRBD, e.g. with ext4 or whatever. The fencing
 problem would then be equally, if not more, acute.

 It's basically between doing something at the DRBD level, if that's
 feasible, or using the DRAC IPMI device on our server to shoot it.

 But if you implement fencing on Pacemaker level and include your
 DRBD/Filesystem resource into Pacemaker configuration you'll be fine.

 Sorry, I don't quite understand what you mean.

 What would fencing on the Pacemaker level look like? Certainly, DRBD would
 be managed by the cluster.


That means that you have to implement STONITH through DRAC or any
other device that provides fencing capability. Then, if Pacemaker
detects a split-brain situation, it will kill one of the nodes.
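For example, through the DRAC's IPMI interface it could look roughly like
this (the plugin and parameter names are from memory; verify them with the
stonith tool before relying on this):

primitive st-node1 stonith:external/ipmi \
        params hostname="node1" ipaddr="10.0.0.10" \
        userid="root" passwd="secret" interface="lan"
location st-node1-placement st-node1 -inf: node1

The location constraint keeps the device that fences node1 from running
on node1 itself.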



-- 
Serge Dubrouski.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] Fencing with iDrac 6 Enterprise

2010-03-08 Thread Serge Dubrouski
On Sun, Mar 7, 2010 at 9:00 PM, Martin Aspeli optilude+li...@gmail.com wrote:
 Hi,

 We have a two-node cluster of Dell servers. They have an iDRAC 6 Enterprise
 each. The cluster is also backed up by a UPS with a diesel generator.

Don't forget that to make it reliable you have to put on UPS not
only the cluster nodes but also all the networking equipment that
connects your nodes and the DRAC ports.


 I realise on-board devices like the DRAC are not ideal for fencing, but it's
 probably the best we're going to be able to do. However, I've read in some
 manuals that DRAC 6 is troublesome, and that the drac STONITH agent in
 Pacemaker only deals with version 5.

 Is this still current? Can anyone point me to any documentation or examples
 of configuring iDRAC 6 Enterprise for STONITH, if indeed it's possible?

 Thanks!
 Martin


 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] Properly fencing Postgres

2010-03-05 Thread Serge Dubrouski
 I don't know if the pgsql RA can support cold standby
 instances.


In my opinion a cold standby is a server that has access to the data
files, where PostgreSQL is down but can be brought up at any time. The
pgsql RA does exactly that, provided other resources give it access to
the data. What the pgsql RA doesn't do is data synchronization in a
master/slave way. But as far as I understand that is not required here.
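In crm terms that is the usual DRBD + Filesystem + pgsql stack, roughly
like this (names, devices and paths are placeholders, and ms-drbd0 is
assumed to be the DRBD master/slave resource):

primitive myIP ocf:heartbeat:IPaddr2 \
        params ip="10.0.0.100" cidr_netmask="24"
primitive fs0 ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/var/lib/pgsql" fstype="ext3"
primitive myPgsql ocf:heartbeat:pgsql \
        op monitor interval="30s"
group dbGroup myIP fs0 myPgsql
colocation db-on-drbd inf: dbGroup ms-drbd0:Master
order drbd-before-db inf: ms-drbd0:promote dbGroup:start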

 Also note:

  - We intend to use the IPaddr2 resource agent to cluster an IP
 address across master and slave.

  - I'm not sure we need to use Pacemaker to manage HAProxy on slave;
 it will simply not be used until the IP address fails over to slave.

 The difference is that if it fails, the cluster won't be able to
 help you. Otherwise, you can configure it as a cloned resource.

  - To deal with sometimes severe peaks in our load, we'll have
 HAProxy on master send certain requests to the live Zope app
 server processes on slave. HAProxy deals with Zope processes going
 up and down, so we don't really need to cluster these per se.

  - Zope communicates with Postgres. We intend that connection string
 to use the floating IP address, so that if Postgres fails over to
 slave, Zope will be unaware.

  - Memcached is used by Zope to cache certain Postgres database
 queries, so it would be similar. We can have this on hot standby (if
 that's easier?) since it only manages data in local RAM, but the
 memcached connection string would use the floating IP address too.

  - Zope writes certain blob files to the filesystem. All Zope
 clients (across both servers) need a shared blob directory. They do
 implement locking on this directory, so concurrent access is not a
 problem.

 Now for the bits I'm less sure about:

  - We were thinking to create a DRBD partition with OCFS2 for
 Postgres data + blob data. IIUC, that setup handles multiple nodes
 writing, so the blob storage should work fine (since Zope will
 ensure integrity of the directory).

 OK.

  - We were thinking to use the pgsql resource agent that comes with
 Pacemaker to manage Postgres.

  - The postgres data would need fencing when failing over, from what
 I understand. I read the notes that using an on-board device like
 Dell's DRAC to implement STONITH is not a good idea. We don't have
 the option at this stage to buy a UPS-based solution (we do have
 UPS, but it can't be used to cut power to individual servers). We do
 have two pairs of NICs in each server, one of which would be used
 crossover between master and slave.

 The problem with lights-out devices such as DRAC is that if they
 lose power then fencing doesn't work. But if you have them
 connected to UPS which is reliable then DRAC should be OK.

 Given this, what is the best way to implement fencing in this
 situation? Could we use DRBD to just refuse master write access to
 the slave disk? Could we accept a bit more risk and say that STONITH
 will succeed even if *communication* with the DRAC fails, but will
 try to use DRAC if it can reach it?

 This is not possible. If the fencing action fails, the cluster
 won't make any progress.

 This may solve the fencing
 indefinitely problem when postgres is failing over due to a power
 outage on master, and Pacemaker can't find DRAC to kill master.

 On two-node clusters fencing replaces quorum so it is
 indispensable.

  - If HAProxy or memcached on master fails (in the software sense),
 we'd need to fail over the floating IP address so that the front-end
 firewall and the Zope connection strings would continue to work,
 even though we have hot standby's on slave. Is this the right thing
 to do?

 Looks like it.

 If so, I'd appreciate some pointers on how to configure this.
 There are no resource agents that ship with Pacemaker I can find for
 memcached/HAProxy, though perhaps it'd be better to create them and
 let Pacemaker manage everything?

 It is also possible to use init scripts (lsb). I guess that those
 exist, just test them thoroughly. If you let the cluster manage
 them, they can be monitored.
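 E.g., a cloned LSB resource for HAProxy can be as simple as this
 sketch (assuming a working, LSB-compliant /etc/init.d/haproxy):

 primitive haproxy lsb:haproxy \
        op monitor interval="30s"
 clone haproxy-clone haproxy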

 In that case, how do we manage the
 connection string issue (making sure Zope talks to the right thing)
 if not by failing over an IP address?

 You lost me here. The IP address is going to fail over. I don't
 see where the connection string issue is.

 Thanks,

 Dejan

 Thanks a lot!

 Martin


 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] Properly fencing Postgres

2010-03-05 Thread Serge Dubrouski
On Fri, Mar 5, 2010 at 7:56 AM, Martin Aspeli optilude+li...@gmail.com wrote:
 Hi Serge,

Hello -


 I don't know if the pgsql RA can support cold standby
 instances.


 In my opinion a cold standby is a server that has access to the data
 files, where PostgreSQL is down but can be brought up at any time. The
 pgsql RA does exactly that, provided other resources give it access to
 the data. What the pgsql RA doesn't do is data synchronization in a
 master/slave way. But as far as I understand that is not required here.

 Well, that was certainly the way I envisaged it working, with data
 synchronisation being provided by DRBD + OCFS2, and data integrity ensured
 by STONITH with the Dell DRAC.

 That said, if there are better solutions, I'd love to hear about them.

This is the right solution, in my opinion, for the LAN installation.
Data replication is good for geographically distributed installations,
like a primary site and a DR site.



 Martin


 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] Announce: Clusters From Scratch

2009-09-21 Thread Serge Dubrouski
Do you actually have to use the PDF format? I'd like to try translating
these documents into Russian, but I don't have a PDF writer.

On Mon, Sep 21, 2009 at 5:43 AM, Dejan Muhamedagic deja...@fastmail.fm wrote:
 Hi,

 On Mon, Sep 21, 2009 at 10:59:39AM +0200, Andrew Beekhof wrote:
 I'm pleased to announce the first of a new series of step-by-step guides
 for Pacemaker.

 Great.

 This installment covers installation, the creation of an active/passive
 cluster and its conversion to active/active.

 Technologies used include:

       • Fedora 11 as the host operating system
       • OpenAIS to provide messaging and membership services,
       • Pacemaker to perform resource management,
       • DRBD as a cost-effective alternative to shared storage,
       • OCFS2 as the cluster filesystem (in active/active mode)
       • The crm shell for displaying the configuration and making changes
       • Apache as the example service.

 The PDF is available from the Documentation page or directly via 
 http://www.clusterlabs.org/mediawiki/images/9/9d/Clusters_from_Scratch_-_Apache_on_Fedora11.pdf

 Future guides are anticipated to include MySQL, mail servers and
 asymmetrical clusters.
 Feedback and suggestions for additional topics are welcome.

 I didn't have time to get to read the text, but here, using xpdf,
 the images are all blurred (low resolution?). It's on a 1400x1050
 display. I have to scale the document to 200% in order to be able
 to read it.

 Thanks,

 Dejan

 -- Andrew




 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] Announce: Clusters From Scratch

2009-09-21 Thread Serge Dubrouski
On Mon, Sep 21, 2009 at 11:09 AM, Andrew Beekhof and...@beekhof.net wrote:
 On Mon, Sep 21, 2009 at 2:23 PM, Serge Dubrouski serge...@gmail.com wrote:
 Do you actually have to use PDF format? I'd try to translate these
 documents to Russian but I don't have PDF Writer.

 Me neither.  It's just how my app exports them best.
 But why would that matter if you're translating?  Wouldn't it be just
 as easy to create a new document?

 (Unless you mean to ask if you have to use PDF as well, which would be
 no.  Use whatever makes you happy :-)


I just wanted to keep things consistent. So what do you use to create
those docs?


-- 
Serge Dubrouski.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] postgresql RA problem [SOLVED]

2009-09-20 Thread Serge Dubrouski
Can you provide your change as a diff patch?

On Sat, Sep 19, 2009 at 11:22 PM, E-Blokos in...@e-blokos.com wrote:
 For those who are interested in more password security with PostgreSQL:
 until now the RA didn't work if the db user in pg_hba.conf was set to
 anything other than trust, because the psql command would always show a
 password prompt, which breaks the RA script.
 So I updated the pgsql RA to use PostgreSQL with more security.

 Regards

 Franck Chionna
 --
 This message has been scanned for viruses and
 dangerous content by MailScanner, and is
 believed to be clean.
 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker





-- 
Serge Dubrouski.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] ocf apache problem

2009-09-15 Thread Serge Dubrouski
To help out with pgsql and apache I'll need log files and xml or crm
configuration.

On Tue, Sep 15, 2009 at 12:20 PM, E-Blokos in...@e-blokos.com wrote:

 - Original Message - From: Andrew Beekhof and...@beekhof.net
 To: pacema...@clusterlabs.org
 Sent: Tuesday, September 15, 2009 1:43 PM
 Subject: Re: [Pacemaker] ocf apache problem


 Based on?

 On Tue, Sep 15, 2009 at 5:46 PM, J. Davin Flatten jflat...@iso-ne.com
 wrote:

 I am having the exact same issue. Any ideas?


 On 09/15/2009 10:21 AM, E-Blokos wrote:

 Hi,

 I tried to load the OCF apache RA (latest version)
 as a clone on 4 nodes, but apache2 starts
 and fails infinitely.
 In the log:
 INFO: apache not running
 Sep 15 10:19:21 node132 apache[28428]: INFO: waiting for apache
 /usr/local/apache/conf/httpd.conf to come up

 Thanks

 Franck Chionna
 --
 This message has been scanned for viruses and
 dangerous content by MailScanner, and is
 believed to be clean.

 
 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker


 --
 System Administrator
 ISO New England
 1 Sullivan Road
 Holyoke, MA 01040
 1-413-535-4087

 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker



 Sorry, it's Fedora 10.
 I also tried other OCF RAs like pgsql and it's the same:
 stop -> ok, restart -> fails infinitely.

 Thanks

 Franck Chionna



 --
 This message has been scanned for viruses and
 dangerous content by MailScanner, and is
 believed to be clean.


 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker




-- 
Serge Dubrouski.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker