Re: [Linux-HA] How to debug corosync?

2011-05-08 Thread Stallmann, Andreas
Hi! Try corosync-objctl runtime.totem.pg.mrp.srp.members. You should see something like: Actually (honestly!) this command does not return anything. For corosync-objctl I can see a whole lot of objects of the type/class runtime.totem.pg.mrp, but none of the type members. The corosync

Re: [Linux-HA] Auto Failback despite location constrain

2011-04-29 Thread Stallmann, Andreas
. Cheers and thanks again for your support, Andreas -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Stallmann, Andreas Sent: Friday, 29 April 2011 10:39 To: General Linux-HA mailing list Subject: Re: [Linux-HA

[Linux-HA] How to debug corosync?

2011-04-28 Thread Stallmann, Andreas
Hi! In one of my clusters I disconnect one of the nodes (say app01) from the network. App02 takes over the resources as it should. Nice. When I reconnect app01 to the network, crm_mon on app01 continues to report app02 as offline and crm_mon on app02 does the same for app01. Still, no errors are

[Linux-HA] Auto Failback despite location constrain

2011-04-28 Thread Stallmann, Andreas
Hi! I configured my nodes *not* to auto failback after a defective node comes back online. This worked nicely for a while, but now it doesn't (and, honestly, I do not know what was changed in the meantime). What we do: We disconnect the two (virtual) interfaces of our node mgmt01 (running on

[Linux-HA] Pingd does not react as expected = split brain

2011-04-27 Thread Stallmann, Andreas
Hi! I have two cluster nodes, both running pingd (as a clone), to keep resources from starting on nodes which have no obvious connection to the network. The ping-nodes are: -appl01 (10.10.10.202) -appl02 (10.10.10.203) -Default GW (10.10.10.254) Before shutting down

Re: [Linux-HA] Pingd does not react as expected = split brain

2011-04-27 Thread Stallmann, Andreas
Hi Lars, Hi Lars! You are exercising complete cluster communication loss. Which is cluster split brain. Correct, yes. If you are specifically exercising cluster split brain, why are you surprised that you get exactly that? Because ping(d) is supposed to keep resources from starting on

Re: [Linux-HA] Pingd does not react as expected = split brain

2011-04-27 Thread Stallmann, Andreas
Hi Andrew, According to your configuration, it can be up to 60s before we'll detect a change in external connectivity. That's plenty of time for the cluster to start resources. Maybe shortening the monitor interval will help you. TNX for the suggestion, I'll try that. Any suggestions on

[Linux-HA] Comparison opterators in location constraints

2011-04-21 Thread Stallmann, Andreas
Hi! I tried to compare a value returned by ping(d) to a value given in a location constraint: location only-if-connected nag_grp \ rule $id=only-if-connected-rule -inf: not_defined pingd or pingd lte 2000 I thought, lte stands for [l]ess [t]h[e]n. That's obviously wrong, because when I

[Linux-HA] Resource-Group won't start - crm_mon does not react - no failures shown

2011-04-12 Thread Stallmann, Andreas
Hi! We've got a pretty straightforward and easy configuration: Corosync 1.2.1 / Pacemaker 2.0.0 on OpenSuSE 11.3 running DRBD (M/S), Ping (clone), and a resource-group, containing a shared IP, tomcat and mysql (where the datafiles of mysql reside on the DRBD). The cluster consists of two

Re: [Linux-HA] 3+node clusters?

2011-04-04 Thread Stallmann, Andreas
Hi there, I asked the same question some time ago and received no suitable answer so far. DRBD [1] does no proper replication over three nodes; it's basically still a Two-Node-RAID-1 with a third node, which doesn't really take part in the cluster but receives replication data as kind of a

[Linux-HA] Update of/change to the vmware-stonith-Script: How to contribute

2011-03-25 Thread Stallmann, Andreas
Hi! Just yesterday I made some changes to the vmware-stonith-script, so that it's possible to shutdown/start/reset nodes on vmware hosts, even if the cluster nodes are spread over several vmware hosts (where the vmware hosts are not clustered themselves and thus aren't reachable over the same

Re: [Linux-HA] Load CRM-Konfiguration from file

2011-03-10 Thread Stallmann, Andreas
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Dejan Muhamedagic I tried crm -f filename and crm filename to no avail. crm then commits the changes line-by-line immediately, which can lead to undesirable side effects (because some

[Linux-HA] Load CRM-Konfiguration from file

2011-03-09 Thread Stallmann, Andreas
Hi there, is it possible to exchange a complete CIB with another CIB? The background is that we have to roll out the same cluster in different customer environments with different IPs / networks. Instead of manipulating the CIB by hand via CRM, I'd rather replace placeholders in a template

Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-25 Thread Stallmann, Andreas
Hi! I'm combining both your answers into one mail; I hope that's all right with you. For now, I need an interim solution, which is, as of now, stonith via suicide. Doesn't work as suicide is not considered reliable - by definition the remaining nodes have no way to verify that the fencing

Re: [Linux-HA] Problems starting apache

2011-02-24 Thread Stallmann, Andreas
Hi! I still have problems getting apache up and running via pacemaker. To do some bugtracking, I tried to figure out how and when the script /usr/lib/ocf/resource.d/heartbeat/apache is called. Strangely, it doesn't seem to be called with the start-Parameter at all. Date: Thu Feb 24 11:01:47

Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-24 Thread Stallmann, Andreas
Hi! TNX for your answer. We will switch to sbd after the shared storage has been set up. For now, I need an interim solution, which is, as of now, stonith via suicide. My configuration doesn't work, though. I tried: ~~Output from crm configure show~~ primitive suicide_res

Re: [Linux-HA] HowTo correctly set up stonith:suicide (was AW: Looking for a suitable Stonith Solution)

2011-02-24 Thread Stallmann, Andreas
By the way: stonith -t suicide -T off mgmt03 works nicely. Thus the command itself is working. Cheers folks, and thanks again (in advance) for your help, Andreas CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef. Registergericht/Registration Court:

[Linux-HA] Looking for a suitable Stonith Solution

2011-02-23 Thread Stallmann, Andreas
Hello! I'm currently looking for a suitable stonith solution for our environment: 1. We have three cluster nodes running OpenSuSE 10.3 with corosync and pacemaker. 2. The nodes reside on two VMware ESXi-Servers ( v. 4.1.0) in two locations, where one VMware Server hosts two, the other hosts

Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-23 Thread Stallmann, Andreas
Hi! - (3) rules out sbd, as this method requires access to a physical device that offers the shared storage. Am I right? The manual explicitly says that sbd may not even be used on a DRBD-Partition. Question: Is there a way to insert the sbd-Header on a mounted drive instead of a

Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-23 Thread Stallmann, Andreas
Hi there! ... Please no-one try a loop-mounted image file on NFS ;-) Even though in theory it may work, if you mount -o sync ... *Ouch* ... Does this help? http://www.linux-ha.org/w/index.php?title=SBD_Fencing&diff=481&oldid=97 Yes, this helps... somehow. Well, I should use iSCSI to share

[Linux-HA] Still problems with split brain

2008-05-09 Thread Stallmann, Andreas
Hi there, we're still in deep sh** with heartbeat and drbd in a split brain scenario. We have the following set up: - A two node active/passive cluster (heartbeat 2.1.3 without crm) - Dopd with drbd-peer-outdater (the newest ones, patched). - Ipfail Still, if we disconnect one host from the

RE: [Linux-HA] Still problems with split brain

2008-05-09 Thread Stallmann, Andreas
Hi! Another issue adding to the problem described before: For testing purposes, we set in ha.cf: auto_failback off Still, when our old primary comes back, it takes over the resources! Gnagnagnagna...! Please, make it stop! *sigh* It seems that the setting has no consequence at all! Thanks

RE: [Linux-HA] New questions relating to: Methods of dealing withnetwork fail(ure/over)

2008-04-21 Thread Stallmann, Andreas
Hi there! Thank you Dominik. dopd works just fine in heartbeat 2.1.3-21.1 together with drbd 8.2.5-3 and solved our problem. Kind regards, Andreas -- CONET Solutions GmbH Andreas Stallmann, Senior Consultant --- CONET Solutions GmbH, Theodor-Heuss-Allee 19,

[Linux-HA] New questions relating to: Methods of dealing with network fail(ure/over)

2008-04-11 Thread Stallmann, Andreas
Hi there! I have set up a two-node heartbeat cluster running apache and drbd. Everything went fine until we tested a split brain scenario. In this case, when we detach both network cables from one host, we get a two-primary situation. I read in the thread methods of dealing with network