Re: [Linux-ha-dev] patch: RA conntrackd: Request state info on startup

2012-07-24 Thread Dominik Klein
Found a minor issue: When the active host is fenced and returns to the cluster, it does not request the current connection tracking states. Therefore state information might be lost. This patch fixes that. Any comments? I'm currently doing another conntrackd project and therefore using the

[Linux-ha-dev] patch: RA conntrackd: Request state info on startup

2012-07-18 Thread Dominik Klein
Hi people currently doing another conntrackd project and therefore using the code once again (jippie :)). Found a minor issue: When the active host is fenced and returns to the cluster, it does not request the current connection tracking states. Therefore state information might be lost. This

Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover

2011-08-29 Thread Dominik Klein
Node level failure is detected on the communications layer, i.e. heartbeat or corosync. That software is run with realtime priority. So it keeps working just fine (use tcpdump on the healthy node to verify). So pacemaker on the healthy node does not know that the other node has a problem and

Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover

2011-08-29 Thread Dominik Klein
On 08/29/2011 09:51 AM, Dominik Klein wrote: Node level failure is detected on the communications layer, i.e. heartbeat or corosync. That software is run with realtime priority. So it keeps working just fine (use tcpdump on the healthy node to verify). So pacemaker on the healthy node does not

Re: [Linux-ha-dev] Patch: VirtualDomain - fix probe if config is not on shared storage

2011-06-28 Thread Dominik Klein
There did not have to be a negative location constraint up to now, because the cluster took care of that. Only because it didn't work correctly. Okay. Actually, this is a wanted setup. It happened that VM configs were changed in ways that led to a VM not being startable any more. For

Re: [Linux-ha-dev] Patch: VirtualDomain - fix probe if config is not on shared storage

2011-06-27 Thread Dominik Klein
On 06/27/2011 11:09 AM, Dejan Muhamedagic wrote: Hi Dominik, On Fri, Jun 24, 2011 at 03:50:40PM +0200, Dominik Klein wrote: Hi Dejan, this way, the cluster never learns that it can't start a resource on that node. This resource depends on shared storage. So, the cluster won't try

Re: [Linux-ha-dev] Patch: VirtualDomain - fix probe if config is not on shared storage

2011-06-27 Thread Dominik Klein
With the agent before the mentioned patch, during probe of a newly configured resource, the cluster would have learned that the VM is not available on one of the nodes (ERR_INSTALLED), so it would never start the resource there. This is exactly the problem with shared storage setups, where

Re: [Linux-ha-dev] Patch: VirtualDomain - fix probe if config is not on shared storage

2011-06-26 Thread Dominik Klein
I'm not sure my fix is correct. According to https://github.com/ClusterLabs/resource-agents/commit/96ff8e9ad3d4beca7e063beef156f3b838a798e1#heartbeat/VirtualDomain this is a regression which was introduced in April '11. So the fix should be the other way around: Introduce a parameter that

[Linux-ha-dev] Patch: VirtualDomain - fix probe if config is not on shared storage

2011-06-24 Thread Dominik Klein
This fixes the issue described yesterday. Comments? Regards Dominik exporting patch: # HG changeset patch # User Dominik Klein dominik.kl...@gmail.com # Date 1308909599 -7200 # Node ID 2b1615aaca2c90f2f4ab93eb443e5902906fb28a # Parent 7a11934b142d1daf42a04fbaa0391a3ac47cee4c RA VirtualDomain

Re: [Linux-ha-dev] Patch: VirtualDomain - fix probe if config is not on shared storage

2011-06-24 Thread Dominik Klein
Hi Dejan, this way, the cluster never learns that it can't start a resource on that node. I don't consider this a solution. Regards Dominik ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org

[Linux-ha-dev] VirtualDomain issue

2011-06-22 Thread Dominik Klein
Hi code snippet from http://hg.linux-ha.org/agents/raw-file/7a11934b142d/heartbeat/VirtualDomain (which I believe is the current version) VirtualDomain_Validate_All() { snip if [ ! -r $OCF_RESKEY_config ]; then if ocf_is_probe; then ocf_log info Configuration file

Re: [Linux-HA] How to start Pacemaker in unmanaged mode ?

2011-05-11 Thread Dominik Klein
through some crm commands or similar (crm_attributes, etc.) but they are not taken into account before the 60s have ended. Alain Dominik Klein wrote: Just write it to the xml on all nodes? On 05/10/2011 01:23 PM, Alain.Moulle wrote: Sorry I meant directly with is_managed=false of course

Re: [Linux-HA] How to start Pacemaker in unmanaged mode ?

2011-05-11 Thread Dominik Klein
On 05/11/2011 10:24 AM, Alain.Moulle wrote: Hi Dominik, I just have tried again : service corosync stop on both nodes node1 node2 remove the cib.xml on node2 vi cib.xml on node1 set the property maintenance-mode=true (nvpair id=cib-bootstrap-options-maintenance-mode

Re: [Linux-HA] How to start Pacemaker in unmanaged mode ?

2011-05-10 Thread Dominik Klein
Just write it to the xml on all nodes? On 05/10/2011 01:23 PM, Alain.Moulle wrote: Sorry I meant directly with is_managed=false of course ! Alain ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha

Re: [Linux-HA] ocf:pacemaker:ping: dampen

2011-04-29 Thread Dominik Klein
It waits $dampen before changes are pushed to the cib, so that occasional ICMP hiccups do not produce an unintended failover. At least that's my understanding. Regards Dominik On 04/29/2011 09:22 AM, Ulrich Windl wrote: Hi, I think the description for dampen in OCF:pacemaker:ping

Re: [Linux-HA] ocf:pacemaker:ping: dampen

2011-04-29 Thread Dominik Klein
correcto wow. again! :)

Re: [Linux-ha-dev] New OCF RA: symlink

2011-04-21 Thread Dominik Klein
Am I too paranoid? I don't think you are. Some non-root practically being able to remove any file is certainly a valid concern. Thing is: I needed an RA that configured a cronjob. Florian suggested writing the symlink RA instead, that could manage symlinks. Apparently there was an IRC discussion

Re: [Linux-ha-dev] libglue2 dependency missing in cluster-glue

2011-03-17 Thread Dominik Klein
Mornin Dejan, The reason was that libglue2 and cluster-glue were not installed from the clusterlabs repository, as the rest of the packages were, but instead they were pulled from the original opensuse repository in an older version. This is what I found in pacemaker.spec.in in the

Re: [Linux-ha-dev] libglue2 dependency missing in cluster-glue

2011-03-17 Thread Dominik Klein
This is what I found in pacemaker.spec.in in the repository: Requires(pre): cluster-glue = 1.0.6 The 1.0.10 rpm from clusterlabs for opensuse 11.2 just says cluster-glue afaict: rpm -qR pacemaker cluster-glue resource-agents python = 2.4 libpacemaker3 = 1.0.10-1.4 libesmtp net-snmp

[Linux-ha-dev] libglue2 dependency missing in cluster-glue

2011-03-16 Thread Dominik Klein
Hi as some of you might have seen on the pacemaker list, I tried to install a 3 node cluster and there were ipc issues reported by the cib and therefore the cluster could not start correctly. The reason was that libglue2 and cluster-glue were not installed from the clusterlabs repository, as the

Re: [Linux-ha-dev] libglue2 dependency missing in cluster-glue

2011-03-16 Thread Dominik Klein
Hi Dejan The reason was that libglue2 and cluster-glue were not installed from the clusterlabs repository, as the rest of the packages were, but instead they were pulled from the original opensuse repository in an older version. This is what I found in pacemaker.spec.in in the repository:

Re: [Linux-HA] stonith + APC Masterswitch (AP9225 + AP9616)

2011-02-25 Thread Dominik Klein
You could also try apcmastersnmp. Got that to work with apc devices which did not work with the telnet thing. As long as they didn't change mibs (which I don't know whether they have). Might be worth a shot. Regards Dominik On 02/25/2011 02:24 AM, Avestan wrote: Hello Dejan, As I am

Re: [Linux-ha-dev] Feedback on conntrackd RA by Dominik Klein

2011-02-14 Thread Dominik Klein
Thanks for inclusion. While looking through the pushed changes, I spotted two meta-data typos. See trivial patch. Regards Dominik Applied and pushed with two minor edits. Thanks a lot! Cheers, Florian --- conntrackd.orig 2011-02-14 11:43:22.0 +0100 +++ conntrackd 2011-02-14

Re: [Linux-ha-dev] Feedback on conntrackd RA by Dominik Klein

2011-02-11 Thread Dominik Klein
@@ # An OCF RA for conntrackd # http://conntrack-tools.netfilter.org/ # -# Copyright (c) 2010 Dominik Klein +# Copyright (c) 2011 Dominik Klein # # This program is free software; you can redistribute it and/or modify # it under the terms of version 2 of the GNU General Public License as @@ -25,11

Re: [Linux-ha-dev] Feedback on conntrackd RA by Dominik Klein

2011-02-11 Thread Dominik Klein
Maybe you applied the s/100/$slavescore patch someone sent a couple weeks ago. I used the last version from thread New stateful RA: conntrackd dated october 27th 3:29pm. Anyway, here's my version. Regards Dominik On 02/11/2011 01:36 PM, Florian Haas wrote: On 2011-02-11 09:48, Dominik Klein

Re: [Linux-ha-dev] Feedback on conntrackd RA by Dominik Klein

2011-02-08 Thread Dominik Klein
Not yet. That's why I wrote soon_-ish_ ;) Any release coming up you want to include this in? any news on this? Cheers, Florian

Re: [Linux-ha-dev] Report on conntrackd RA

2011-01-31 Thread Dominik Klein
Hi thanks for testing and feedback. On 01/27/2011 01:37 PM, Marjan, BlatnikČŠŽ wrote: Conntrackd RA from Dominik Klein works. We can now successfully migrate/fail from one node to another one. At the beginning, we had problems with failing. After reboot/fail, the slave was not synced

Re: [Linux-ha-dev] Feedback on conntrackd RA by Dominik Klein

2011-01-31 Thread Dominik Klein
at the mailing list we found a couple of options although we only fully evaluated the RA produced by Dominik Klein as it appears to be more feature complete than the alternative. For a full description of his RA please see his original thread[2]. So far throughout testing we have been very pleased

Re: [Linux-ha-dev] Feedback on conntrackd RA by Dominik Klein

2011-01-31 Thread Dominik Klein
Or, put differently: is us tracking the supposed state really necessary, or can we inquire it from the service somehow? From the submitted RA: # You can't query conntrackd whether it is master or slave. It can be both at the same time. # This RA creates a statefile

Re: [Linux-ha-dev] New stateful RA: conntrackd

2010-10-27 Thread Dominik Klein
# http://conntrack-tools.netfilter.org/ # # Copyright (c) 2010 Dominik Klein # # This program is free software; you can redistribute it and/or modify # it under the terms of version 2 of the GNU General Public License as # published by the Free Software Foundation. # # This program is distributed

[Linux-ha-dev] New stateful RA: conntrackd

2010-10-15 Thread Dominik Klein
. -- IN-telegence GmbH Oskar-Jäger-Str. 125 50825 Köln Registergericht AG Köln - HRB 34038 USt-ID DE210882245 Geschäftsführende Gesellschafter: Christian Plätke und Holger Jansen #!/bin/bash # # # An OCF RA for conntrackd # http://conntrack-tools.netfilter.org/ # # Copyright (c) 2010 Dominik Klein

Re: [Linux-HA] Need suggestion on STONITH device

2010-04-07 Thread Dominik Klein
budgets for this. Anyway, again, thanks for your advice. I'm going to do some research on them. On Thu, Apr 1, 2010 at 6:38 AM, Dominik Klein d...@in-telegence.net wrote: Tony Gan wrote: Hi, For a two-node cluster, what are the best STONITH devices? Currently I am using Dell's iDrac

Re: [Linux-HA] Need suggestion on STONITH device

2010-04-01 Thread Dominik Klein
Tony Gan wrote: Hi, For a two-node cluster, what are the best STONITH devices? Currently I am using Dell's iDrac for STONITH device. It works pretty well. However the biggest problem for iDrac or any other lights-out devices is that they share power supply with hosts machines. Once an

Re: [Linux-HA] messages from existing hearbeat on the same lan

2010-01-19 Thread Dominik Klein
Aclhk Aclhk wrote: On the same lan, there are already two heartbeat node 136pri and 137sec. I setup another 2 nodes with heartbeat. they keep receiving the following messages: heartbeat[9931]: 2010/01/19_10:53:01 WARN: string2msg_ll: node [136pri] failed authentication heartbeat[9931]:

Re: [Linux-ha-dev] ulimit in ocf scripts

2010-01-13 Thread Dominik Klein
Andrew Beekhof wrote: On Tue, Jan 12, 2010 at 10:43 AM, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: On 01/12/2010 10:39 AM, Florian Haas wrote: Why not simply set that for root at boot? (it rhymes too :) because i do not like the idea that each and every process gets elevated limits by

Re: [Linux-ha-dev] [PATCH]Support of stop escalation for mysql-RA.

2009-12-01 Thread Dominik Klein
I'd suggest an approach like Florian's from the Virtualdomain RA. Here's a quote, guess you get the idea. shutdown_timeout=$((($OCF_RESKEY_CRM_meta_timeout/1000)-5)) Regards Dominik Dejan Muhamedagic wrote: Hi Hideo-san, On Mon, Nov 30, 2009 at 11:00:05AM +0900, renayama19661...@ybb.ne.jp
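The quoted one-liner derives a stop deadline from the operation timeout the cluster hands to the agent. A minimal sketch of that arithmetic, assuming the timeout arrives in milliseconds (as the division by 1000 suggests) and that a 5 second safety margin is wanted; the function name `shutdown_timeout_secs` is hypothetical and not part of any RA:

```shell
#!/bin/sh
# Hypothetical helper: convert an operation timeout given in milliseconds
# into a shutdown deadline in seconds, leaving a 5 second margin so the
# agent can still escalate before the cluster's own timeout expires.
shutdown_timeout_secs() {
    meta_timeout_ms=$1
    echo $(( (meta_timeout_ms / 1000) - 5 ))
}
```

Inside an agent this might be called as `shutdown_timeout_secs "$OCF_RESKEY_CRM_meta_timeout"`. For very short operation timeouts the result can be zero or negative, so a real agent would probably want a lower bound.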

Re: [Linux-HA] heartbeat - execute a script on a running node when the other node is back?

2009-11-16 Thread Dominik Klein
Tomasz Chmielewski wrote: Dejan Muhamedagic wrote: Hi, On Sun, Nov 15, 2009 at 09:09:53PM +0100, Tomasz Chmielewski wrote: I have two nodes, node_1 and node_2. node_2 was down, but is now up. How can I execute a custom script on node_1 when it detects that node_2 is back? That's not

Re: [Linux-HA] Restrict resources to specific nodes only

2009-09-29 Thread Dominik Klein
Kenneth Simbron wrote: Hi, Is there a way to restrict some resources to work only on specific nodes and other resources on another nodes? http://clusterlabs.org/mediawiki/images/f/fb/Configuration_Explained.pdf Read up on location constraints. Regards Dominik

Re: [Linux-ha-dev] Monitor operation for the Filesystem RA

2009-09-16 Thread Dominik Klein
Dejan Muhamedagic wrote: Hi Florian, On Wed, Sep 16, 2009 at 08:25:30AM +0200, Florian Haas wrote: Lars, Dejan, as discussed on #linux-ha yesterday, I've pushed a small changeset to the Filesystem RA that implements a monitor operation which checks whether I/O on the mounted filesystem is

Re: [Linux-HA] how to get group members

2009-08-19 Thread Dominik Klein
Ivan Gromov wrote: Hi, all How to get group members? I use crm_resource -x -t group -r group_Name. Can I get members without xml part? What about crm configure show group-name ? Regards Dominik

Re: [Linux-HA] Constraints works for one resource but not for another

2009-08-17 Thread Dominik Klein
Tobias Appel wrote: Hi, I have a very weird error with heartbeat version 2.14. I have two IPMI resources for my two nodes. The configuration is posted here: http://pastebin.com/m52c1809c node1 is named nagios1 node2 is named nagios2 now I have ipmi_nagios1 (which should run on

Re: [Linux-HA] Pacemaker 1.4 HBv2 1.99 // About quorum choice (contd.)

2009-08-07 Thread Dominik Klein
Alain.Moulle wrote: Hello Andrew, Could you explain why this functionality is no longer available (configuration lines remain in ha.cf) ? ipfail was replaced by pingd in v2. That was in the very first version of v2 afaik. And how should we proceed to avoid split-brain cases in a two-nodes

Re: [Linux-HA] Command to see if a resource is started or not

2009-08-05 Thread Dominik Klein
Tobias Appel wrote: Hi, I need a command to see if a resource is started or not. Somehow my IPMI resource does not always start, especially on one node (for example if I reboot the node, or have a failover). There is no error and nothing, it just does nothing at all. Usually I have to

Re: [Linux-HA] Command to see if a resource is started or not

2009-08-05 Thread Dominik Klein
Tobias Appel wrote: On 08/05/2009 10:30 AM, Dominik Klein wrote: Tobias Appel wrote: So all I need is a command line tool to check whether a resource is currently started or not. I tried to check the resources with the failcount command, but it's always 0. And the crm_resource command is used

Re: [Linux-HA] Pacemaker 1.4 HBv2 1.99 // About quorum choice (contd.)

2009-08-05 Thread Dominik Klein
Alain.Moulle wrote: Thanks Andrew, 1. So my understanding is that in a more than 2 nodes cluster , if two nodes are failed, the have_quorum is set to 0 by the cluster software and the behavior is chosen by the administrator with the no-quorum-policy parameter. So the question is now : what is

Re: [Linux-HA] updating cib without status attributes

2009-07-28 Thread Dominik Klein
You can try to compose the output of cibadmin -Q -o crm_config|resources|constraints to something usable for you. looks like I have to run the command once for each type and then concatenate the results. That's sort of what I meant to say. Sorry for being unclear. Regards Dominik
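Running the query once per section and concatenating the results, as described above, can be sketched as a small loop. This is only an illustration: the function name `dump_cib_sections` and the `CIBADMIN` override variable are hypothetical, while `cibadmin -Q -o <section>` is the command quoted in the thread.

```shell
#!/bin/sh
# Hypothetical helper: query each configuration section of the CIB in turn
# and write the results one after another to stdout. The status section is
# never requested, so no status attributes appear in the output.
# CIBADMIN is an assumed override, handy for trying this without a cluster.
dump_cib_sections() {
    for section in crm_config resources constraints; do
        "${CIBADMIN:-cibadmin}" -Q -o "$section" || return 1
    done
}
```

Something like `dump_cib_sections > config-only.xml` would then collect the output. Note that the result is three XML fragments in sequence rather than one well-formed document, so it may need wrapping before being fed to other tools.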

Re: [Linux-HA] Adding to a group without downtime

2009-07-28 Thread Dominik Klein
Gavin Hamill wrote: Hi :) I'm using the Lenny packages http://people.debian.org/~madkiss/ha/ and have been enjoying success with pacemaker + heartbeat (I've used a heartbeat v1 config for years without problems). I have a few IPaddr2 primitives in groups, but I'd like to understand how I

Re: [Linux-HA] updating cib without status attributes

2009-07-27 Thread Dominik Klein
Is there a query/config dump setting that will dump the running config to the command line without the status attributes? cibadmin -Q -o configuration

Re: [Linux-HA] updating cib without status attributes

2009-07-27 Thread Dominik Klein
Dave Augustus wrote: On Mon, 2009-07-27 at 15:09 +0200, Dominik Klein wrote: Is there a query/config dump setting that will dump the running config to the command line without the status attributes? cibadmin -Q -o configuration What a quick reply! However I get: Call cib_query

Re: [Linux-HA] Stonith with APC Smart UPS 1000 +Network ManagementCard

2009-07-10 Thread Dominik Klein
Ehlers, Kolja wrote: Yeah it supports SSH but if I log in using SSH there is just a menu to configure the card. Since I can enter only 2 digits at that prompt 1- Control 2- Diagnostics 3- Configuration 4- Detailed Status 5- About UPS ESC- Back, ENTER-

Re: [Linux-HA] Stonith with APC Smart UPS1000 +Network ManagementCard

2009-07-10 Thread Dominik Klein
...@lists.linux-ha.org] On behalf of Dominik Klein Sent: Friday, 10 July 2009 08:27 To: General Linux-HA mailing list Subject: Re: [Linux-HA] Stonith with APC Smart UPS1000 +Network ManagementCard Ehlers, Kolja wrote: Yeah it supports SSH but if I log in using SSH there is just a menu

Re: [Linux-HA] Resource set question

2009-07-10 Thread Dominik Klein
Steinhauer Juergen wrote: Hi guys! In my cluster setup, I have 6 IP addresses which should be started in parallel for speed purpose, and two apps, depending on the six addresses. What would be the best way to configure this? Putting all IPs in a group will start them one after another.

Re: [Linux-HA] Master-slave, stopping a slave.

2009-07-08 Thread Dominik Klein
c smith wrote: Hi- I currently implement DRBD with Pacemaker. The DRBD resource is configured as a multi-state Master-slave resource in which node1 is the default master and node2 is the default slave. I am putting together a backup system that will run some automated scheduled tasks on

Re: [Linux-HA] Master-slave, stopping a slave.

2009-07-08 Thread Dominik Klein
c smith wrote: Dominik- Thanks for the reply.. I'm aware that the documents advise against it, but surely there must be a way. I was just looking at the new DRBD 8.3.2. It includes a fencing handler script that, upon failure of a DRBD master, adds a -INFINITY location constraint into the

Re: [Linux-HA] Add resource to a group

2009-06-26 Thread Dominik Klein
On 6/25/09 1:20 AM, Dominik Klein d...@in-telegence.net wrote: David Hoskinson wrote: Thanks Got it going again. However my amavisd service fails with an unknown exec error. It's the only one that won't work, and isn't related to the group question. I have it set up the same as postfix

Re: [Linux-HA] Add resource to a group

2009-06-25 Thread Dominik Klein
David Hoskinson wrote: Thanks Got it going again. However my amavisd service fails with an unknown exec error. It's the only one that won't work, and isn't related to the group question. I have it set up the same as postfix, dovecot, etc. Primitive amavisd lsb:amavisd op monitor

Re: [Linux-HA] Failover problem

2009-06-25 Thread Dominik Klein
The default value for stonith-enabled is true. If you however do not have a stonith device, that will give you an endless loop of unsuccessfully trying to shoot the other node before doing anything else to the resources the dead node was running. try crm configure property stonith-enabled=false

Re: [Linux-HA] Monitoring resources

2009-05-26 Thread Dominik Klein
Koen Verwimp wrote: Hi! I have defined a resource called rg_alfresco_ip. This resource consists of an OCF script (AlfrescoIP). This script is a copy of IPAddr but with a customized status/monitoring procedure. group id= rg_alfresco_ip primitive class=ocf

Re: [Linux-HA] Problems With SLES11 + DRBD

2009-05-04 Thread Dominik Klein
darren.mans...@opengi.co.uk wrote: Hello everyone. Long post, sorry. I've been trying to get SLES11 with Pacemaker 1.0 / OpenAIS working for most of this week without success so far. I thought I may as well bundle my problems into one mail to see if anyone can offer any advice.

Re: [Linux-HA] Problems With SLES11 + DRBD

2009-05-04 Thread Dominik Klein
Dominik Klein wrote: darren.mans...@opengi.co.uk wrote: Hello everyone. Long post, sorry. I've been trying to get SLES11 with Pacemaker 1.0 / OpenAIS working for most of this week without success so far. I thought I may as well bundle my problems into one mail to see if anyone can offer

[Linux-ha-dev] Patch: RA mysql

2009-04-24 Thread Dominik Klein
Trivial. See attached patch. Regards Dominik exporting patch: # HG changeset patch # User Dominik Klein d...@in-telegence.net # Date 1240578752 -7200 # Node ID 2d97904c385cc9b4779286001611bd748f48589d # Parent 60cc2d6eee88ff6c2dedf7b539b9ee018efda6da Low: RA mysql: Correctly remove eventually

Re: [Linux-HA] Assymetric Clustering

2009-04-22 Thread Dominik Klein
fsalas wrote: Hi, I'm quite new to clustering and HeartBeat, but as far as I know, a very nice package. Well, here is my problem, I'm willing to set up a cluster for a small enterprise that will have several services located in virtual machines, to make it simpler, let's say we have four

Re: [Linux-HA] Restart a service without the dependent services restarting?

2009-04-20 Thread Dominik Klein
Noah Miller wrote: Hi - Is it possible to restart a clustered service (v2 cluster) without its dependent services also stopping and starting? When the constraint score is advisory (0), dependencies should not be restarted, but then they are not really dependencies in the sense of the word.

Re: [Linux-HA] Re: Stopping the Heartbeat daemon does not stop the DRBD Daemon

2009-04-03 Thread Dominik Klein
Joe Bill wrote: Stopping the Heartbeat daemon (service heartbeat stop) does not stop the DRBD daemon even if it is one of the resources. - Heartbeat and DRBD are 2 different products/packages - Like most services, DRBD doesn't need Heartbeat to run. You can set up and run DRBD

Re: [Linux-HA] Re: Re: Re: Stopping the Heartbeat daemon does not stop the DRBD Daemon

2009-04-03 Thread Dominik Klein
- The DRBD daemons provide the communication interface for each network volume and are therefore an integral part of the volume management. Without the DRBD daemons, you (manually) and Heartbeat (automagically) could not handle the DRBD volumes. Just to avoid confusion: There is no such thing

Re: [Linux-HA] HA Books

2009-04-02 Thread Dominik Klein
darren.mans...@opengi.co.uk wrote: Hi. Can anyone recommend any good books about HA with regards to the latest incarnations such as Pacemaker etc? I understand enough about the CRM and heartbeat 2 to get by but lots of the stuff on this list still goes over my head. Thanks. Darren

Re: [Linux-HA] Stopping the Heartbeat daemon does not stop the DRBD Daemon

2009-04-02 Thread Dominik Klein
Jerome Yanga wrote: Stopping the Heartbeat daemon (service heartbeat stop) does not stop the DRBD daemon even if it is one of the resources. # service heartbeat stop Stopping High-Availability services: [ OK ] # service drbd

Re: [Linux-HA] pingd/pacemaker

2009-04-01 Thread Dominik Klein
I know. But this attribute does not exist in my setup. pacemaker version 1.0.1-1. Is this a feature of 1.0.2? 1.0.1 is 4 months old. The RA was updated with those features 3 months ago. So basically, yes. You could still update the single RA from the mercurial repository though. Regards

Re: [Linux-HA] showscores.sh for pacemaker 1.0.2

2009-04-01 Thread Dominik Klein
So here's an update. Michael Schwartzkopff pointed out a bug regarding groups. That has been fixed now and the appropriate values should be shown. Thanks! There's not been a lot of feedback, is it because nobody uses the script or does it just work for you? Regards Dominik Dominik Klein wrote

Re: [Linux-HA] Heartbeat v2 stickiness, score and more

2009-04-01 Thread Dominik Klein
florian.engelm...@bt.com wrote: Hello, I spent the whole afternoon to search for a good heartbeat v2 documentation, but it looks like this is somehow difficult. Maybe someone in here can help me? Anyway I have a short question about stickiness. I only know about sun cluster but I have to

Re: [Linux-HA] pingd/pacemaker

2009-03-31 Thread Dominik Klein
Michael Schwartzkopff wrote: Hi, I am testing the pingd from the provider pacemaker. As Dominik told me, there is no need to define ping nodes in the ha.cf any more. OK so far. As I see pingd tries to reach all pingnodes of the hostlist attribute every 10 seconds. Is it possible to

Re: [Linux-HA] pingd/pacemaker

2009-03-31 Thread Dominik Klein
Michael Schwartzkopff wrote: On Tuesday, 31 March 2009 15:27:47, Dominik Klein wrote: Michael Schwartzkopff wrote: Hi, I am testing the pingd from the provider pacemaker. As Dominik told me, there is no need to define ping nodes in the ha.cf any more. OK so far. As I see pingd tries

Re: [Linux-HA] Beginner questions

2009-03-24 Thread Dominik Klein
Juha Heinanen wrote: Juha Heinanen writes: the real problem is that start of mysql server by pacemaker stops altogether after a few manual stops (/etc/init.d/mysql stop). i think i figured this out. when pacemaker needed to start my mysql-server resource three times on node lenny1,

Re: [Linux-HA] Beginner questions

2009-03-23 Thread Dominik Klein
Les Mikesell wrote: My first HA setup is for a squid proxy where all I need is to move an IP address to a backup server if the primary fails (and the cache can just rebuild on its own). This seems to work, but will only fail if the machine goes down completely or the primary IP is

Re: [Linux-HA] Beginner questions

2009-03-23 Thread Dominik Klein
Juha Heinanen wrote: Dominik Klein writes: Heartbeat in v1 mode (haresources configuration) cannot do any resource level monitoring itself. You'd need to do that externally by any means. yes, in v2 mode i have managed to make pacemaker to monitor resources, for example, like

Re: [Linux-HA] maintenance-mode of pengine

2009-03-23 Thread Dominik Klein
Michael Schwartzkopff wrote: Hi, In the metadata of the pengine I found the attribute maintenance-mode. I did not find any documentation about it. The long description also says: Should the cluster Anybody knows what this options does? Thanks. It disables resource management when

Re: [Linux-HA] expected-quorum-votes

2009-03-23 Thread Dominik Klein
crmd metadata tells me that expected-quorum-votes are used to calculate quorum in openais based clusters. Its default value is 2. Do I have to change this value if I have 3 or more nodes in a OpenAIS based cluster? No. It is automatically adjusted by the cluster. Regards Dominik

Re: [Linux-HA] Heartbeat degrades drbd resource

2009-03-23 Thread Dominik Klein
You cannot use drbd in heartbeat the way you configured it. Please refer to http://wiki.linux-ha.org/DRBD/HowTov2 and (if that wasn't made clear enough on the page) make sure the first thing you do is upgrade your cluster software. Read here on how to do that: http://clusterlabs.org/wiki/Install

Re: [Linux-HA] Heartbeat degrades drbd resource

2009-03-23 Thread Dominik Klein
Dominik Klein wrote: You cannot use drbd in heartbeat the way you configured it. Please refer to http://wiki.linux-ha.org/DRBD/HowTov2 Sorry, copy/paste error. I meant to say http://www.clusterlabs.org/wiki/DRBD_HowTo_1.0

Re: [Linux-HA] Beginner questions

2009-03-23 Thread Dominik Klein
Is there some documentation available for openais? I can't even find a good description of what it does or why you would use it. Also, will this help with my 2nd question: having a few spares for a large number of servers? While my objective with the squid cache is to proxy everything

Re: [Linux-HA] drbd RA issue in (heartbeat 2.1.4 + drbd-8.3.0)

2009-03-19 Thread Dominik Klein
Dejan Muhamedagic wrote: Hi, On Wed, Mar 18, 2009 at 11:37:27AM -0700, Neil Katin wrote: Dejan Muhamedagic wrote: Hi, On Tue, Mar 17, 2009 at 11:56:04AM +0530, Arun G wrote: Hi, I observed below error message when I upgraded drbd to drbd-8.3.0 in heartbeat 2.1.4 cluster on

Re: [Linux-HA] pingnodes in openais

2009-03-18 Thread Dominik Klein
Michael Schwartzkopff wrote: Hi, As far as I know pingnodes have to be configured in heartbeat. heartbeat pings the nodes and updates the CIB. Where can I configure pingnodes, when I use OpenAIS as the cluster stack? Create a pingd clone resource in the CIB. It's the preferred way of

Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

2009-03-09 Thread Dominik Klein
Hi Jerome Yanga wrote: Dominik, As usual, you are right on the money. I should have caught that myself. Thank you for catching that for me. What happened was that I used a different server to compile DRBD and I had assumed that Nomen and Rubic (my test nodes) were on the same kernel.

Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

2009-03-04 Thread Dominik Klein
Hi Jerome Yanga wrote: Hi! I am having issues with getting DRBD to work with Pacemaker. I can get Pacemaker and DRBD run individually but not DRBD managed by Pacemaker. I tried following the instruction in the site below but the resources will not go online.

[Linux-HA] showscores.sh for pacemaker 1.0.2

2009-03-03 Thread Dominik Klein
Hi I made the necessary changes to the showscores script to work with pacemaker 1.0.2. Please test and report problems. Has been reported to work by some people and should go into the repository soon. Still, I'd like more people to test and confirm. Important changes: * correctly fetch

Re: [Linux-HA] showscores for pacamaker-1.0

2009-03-02 Thread Dominik Klein
showscores gives me: ~# ./showscores.sh Resource Score Node Stickiness #Fail Fail-Stickiness 50 0 on 50

Re: [Linux-HA] HA debug message

2009-02-16 Thread Dominik Klein
Tears ! wrote: Dear members! I have installed heartbeat for the first time on Slackware 12.2. I have enabled debugging in ha.cf. Here are some debug messages I want to describe. Feb 14 23:01:15 haServer1 heartbeat: [15131]: WARN: Core dumps could be lost if multiple dumps occur. Feb 14

Re: [Linux-HA] DRBD in a 2 node cluster

2009-02-12 Thread Dominik Klein
v2.1 to 2.9 but must have missed this bit. user land and kernel module all report the same version. I am on my way into the office now and I will apply the changes once there thanks again Jason 2009/2/12 Dominik Klein d...@in-telegence.net Right, this one looks better. I'll refer

Re: [Linux-HA] Is it possible to cleanly take down a resource in a v1 config?

2009-02-12 Thread Dominik Klein
Hi heartbeat in v1 mode does not do resource monitoring by itself. So if you did not set up any custom resource monitoring, you can just stop your application in whatever way you normally do that and re-start it whenever you like. v1 clusters will not notice. They only see node state changes.

[Linux-ha-dev] Patch: RA anything

2009-02-11 Thread Dominik Klein
if you can improve things. Regards Dominik exporting patch: # HG changeset patch # User Dominik Klein d...@in-telegence.net # Date 1234350091 -3600 # Node ID 04533b37813c8be009814f52de7b14ff65bf9862 # Parent 90ff997faa7288248ac57583b0c03df4c8e41bda RA: anything. Implement most of lmbs suggestions

Re: [Linux-HA] failed dependencies while installing heartbeat 2.99.2-6.1

2009-02-11 Thread Dominik Klein
Gerd König wrote: Hi Dominik, thanks for answering quickly, but there were no dependencies found: #zypper search openipmi * Reading installed packages [100%] No possible dependencies found. Do I need some additional software repositories ? I don't think so. The packages should

Re: [Linux-HA] Failovercluster considered one node down but state transition did not happen succesfully

2009-02-11 Thread Dominik Klein
Zemke, Kai wrote: Hi, I'm running a two node failover cluster. Yesterday the cluster tried to manage a state transition. In the log files I found the following entries: heartbeat[6905]: 2009/02/10_21:45:55 WARN: node nagios-drbd2: is dead heartbeat[6905]: 2009/02/10_21:45:55

Re: [Linux-HA] DRBD in a 2 node cluster

2009-02-11 Thread Dominik Klein
archive :) Regards Dominik Thanks Jason 2009/2/11 Dominik Klein d...@in-telegence.net Hi Jason any chance you started drbd at boot or the drbd device was active at the time you started the cluster resource? If so, read the introduction of the howto again and correct your setup. Jason

Re: [Linux-HA] DRBD in a 2 node cluster

2009-02-11 Thread Dominik Klein
The archive only contains info for one node and the logfile is empty. Did you use appropriate -f time and does ssh work between the nodes? So far, nothing obvious to me except for the order between your FS and DRBD lacking the role definition, but that's not what your problem is about (yet *g*).

Re: [Linux-HA] DRBD in a 2 node cluster

2009-02-11 Thread Dominik Klein
Right, this one looks better. I'll refer to nodes as 1001 and 1002. 1002 is your DC. You have stonith enabled, but no stonith devices. Disable stonith or get and configure a stonith device (_please_ don't use ssh). 1002 ha-log lines 926:939, node 1002 wants to shoot 1001, but cannot (l 978).

Re: [Linux-HA] DRBD in a 2 node cluster

2009-02-11 Thread Dominik Klein
Dominik Klein wrote: Right, this one looks better. I'll refer to nodes as 1001 and 1002. 1002 is your DC. You have stonith enabled, but no stonith devices. Disable stonith or get and configure a stonith device (_please_ don't use ssh). 1002 ha-log lines 926:939, node 1002 wants to shoot

Re: [Linux-HA] DRBD in a 2 node cluster

2009-02-10 Thread Dominik Klein
Jason Fitzpatrick wrote: Hi All I am having a hell of a time trying to get heartbeat to fail over my DRBD harddisk and am hoping for some help. I have a 2 node cluster, heartbeat is working as I am able to fail over IP Addresses and services successfully, but when I try to fail over my DRBD

Re: [Linux-HA] DRBD in a 2 node cluster

2009-02-10 Thread Dominik Klein
make it primary on either node Thanks Jason 2009/2/10 Jason Fitzpatrick jayfitzpatr...@gmail.com Thanks, This was the latest version in the Fedora Repos, I will upgrade and see what happens Jason 2009/2/10 Dominik Klein d...@in-telegence.net Jason Fitzpatrick wrote: Hi All I am

Re: [Linux-HA] failed dependencies while installing heartbeat 2.99.2-6.1

2009-02-10 Thread Dominik Klein
Gerd König wrote: Hello list, I wanted to start with heartbeat using the latest sources for OpenSuse10.3 64bit. I've downloaded these rpm's: heartbeat-2.99.2-6.1.x86_64.rpm heartbeat-common-2.99.2-6.1.x86_64.rpm heartbeat-debuginfo-2.99.2-6.1.x86_64.rpm

Re: [Linux-HA] OCF_ERROR_GENERIC

2009-02-04 Thread Dominik Klein
It is OCF_ERR_GENERIC, not OCF_ERROR_GENERIC. Read /usr/lib/ocf/resource.d/heartbeat/.ocf-returncodes. You can also use ocf-tester to test your OCF script. Regards Dominik lakshmipadmaja maddali wrote: Hi all, I have a strange issue, that ocf_error_generic is being ignored at times.
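
A possible explanation for the "ignored at times" symptom, sketched under assumptions: in a POSIX shell an unset variable expands to nothing, so `return $OCF_ERROR_GENERIC` becomes a bare `return`, which hands back the exit status of whatever command ran last. The function names below are hypothetical and the return-code values are defined inline rather than sourced from .ocf-returncodes; this is not taken from the original poster's script.

```shell
#!/bin/sh
# Standard OCF return code values, defined inline for this sketch
# (a real resource agent would source them from .ocf-returncodes).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical function using the correct variable name.
monitor_correct() {
    true                        # stand-in for a last command that succeeded
    return $OCF_ERR_GENERIC     # expands to "return 1"
}

# Hypothetical function using the misspelled, unset variable name.
monitor_misspelled() {
    true                        # stand-in for a last command that succeeded
    return $OCF_ERROR_GENERIC   # expands to a bare "return"
}

monitor_correct;    rc_correct=$?
monitor_misspelled; rc_misspelled=$?
echo "correct=$rc_correct misspelled=$rc_misspelled"
```

With the misspelled name the reported status depends on the preceding command, so the error would only show up when that command happened to fail. Running the agent through ocf-tester, as suggested above, is the more reliable check.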
