Re: [Linux-HA] How to debug corosync?

2011-05-08 Thread Stallmann, Andreas
Hi! > Try "corosync-objctl runtime.totem.pg.mrp.srp.members". You should see > something like: Actually (honestly!) this command does not return anything. For corosync-objctl I can see a whole lot of objects of the type/class runtime.totem.pg.mrp, but none of the type "members". The corosync
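
One way to double-check whether that subtree exists at all is to grep the full object database dump (a sketch):

    # dump the whole corosync object database and look for the members subtree
    corosync-objctl | grep members

If nothing shows up there either, the totem layer never registered a membership, which points at corosync itself rather than at pacemaker.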

Re: [Linux-HA] Auto Failback despite location constraint

2011-04-29 Thread Stallmann, Andreas
not desirable. Cheers and thanks again for your support, Andreas -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Stallmann, Andreas Sent: Friday, 29 April 2011 10:39 To: General Linux-HA mailing list Subject

Re: [Linux-HA] How to debug corosync?

2011-04-29 Thread Stallmann, Andreas
Hi! > Just on a punt... There's not a (partial) firewall running on app02, is there? No, no iptables running anywhere and no layer 3 switches around which could do any filtering. How do you debug corosync? Every command I find to debug corosync shows that everything is all right. Still, both n
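
Two quick checks that are often useful in this situation (a sketch; the interface name and the port are assumptions based on corosync defaults):

    # ring status as corosync itself sees it
    corosync-cfgtool -s
    # verify that totem traffic actually arrives on the cluster interface
    tcpdump -ni eth0 udp port 5405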

Re: [Linux-HA] Auto Failback despite location constraint

2011-04-29 Thread Stallmann, Andreas
Hi! > If the resource ends up on the non-preferred node, those settings will cause > it to have > an equal score on both nodes, so it should stay put. > If you want to verify, try "ptest -Ls" to see what scores each resource has. Great, that's the command I was looking for! Before the failover t
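
For the archive, a minimal invocation to watch the scores around a failover (the resource name is a placeholder):

    # show allocation scores computed from the live CIB
    ptest -Ls | grep my_resource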

[Linux-HA] Auto Failback despite location constraint

2011-04-28 Thread Stallmann, Andreas
Hi! I configured my nodes *not* to auto failback after a defective node comes back online. This worked nicely for a while, but now it doesn't (and, honestly, I do not know what was changed in the meantime). What we do: We disconnect the two (virtual) interfaces of our node mgmt01 (running on v
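
The usual knob for "no automatic failback" is a resource stickiness high enough to outweigh the location preferences; a minimal sketch (the value is illustrative):

    crm configure rsc_defaults resource-stickiness=200
    # the stickiness must be larger than the score of any location
    # constraint preferring the original node, or the resource moves back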

[Linux-HA] How to debug corosync?

2011-04-28 Thread Stallmann, Andreas
Hi! In one of my clusters I disconnect one of the nodes (say app01) from the network. App02 takes over the resources as it should. Nice. When I reconnect app01 to the network, crm_mon on app01 continues to report app02 as "offline" and crm_mon on app02 does the same for app01. Still, no errors ar

Re: [Linux-HA] Pingd does not react as expected => split brain

2011-04-27 Thread Stallmann, Andreas
Hi Andrew, > According to your configuration, it can be up to 60s before we'll detect a > change in external connectivity. > Thats plenty of time for the cluster to start resources. > Maybe shortening the monitor interval will help you. TNX for the suggestion, I'll try that. Any suggestions on r
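
A sketch of what a shorter detection window could look like in the ping(d) primitive (only the relevant lines; the exact values are assumptions):

    params dampen="5s" multiplier="1000" ... \
    op monitor interval="10s" timeout="60s"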

Re: [Linux-HA] Pingd does not react as expected => split brain

2011-04-27 Thread Stallmann, Andreas
Hi Lars, Hi Lars! > You are exercising complete cluster communication loss. > Which is cluster split brain. Correct, yes. > If you are specifically exercising cluster split brain, why are you surprised > that you get exactly that? Because ping(d) is supposed to keep resources from starting on

[Linux-HA] Pingd does not react as expected => split brain

2011-04-27 Thread Stallmann, Andreas
Hi! I've got two cluster nodes, both running pingd (as a clone), to keep resources from starting on nodes which have no obvious connection to the network. The ping nodes are: -appl01 (10.10.10.202) -appl02 (10.10.10.203) -Default GW (10.10.10.254) Before shutting down
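
A minimal sketch of such a setup with the ocf:pacemaker:ping agent and the three ping nodes listed above (resource names are assumptions):

    primitive pingd_res ocf:pacemaker:ping \
        params host_list="10.10.10.202 10.10.10.203 10.10.10.254" \
               multiplier="1000" dampen="5s" \
        op monitor interval="15s" timeout="60s"
    clone pingd_clone pingd_res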

[Linux-HA] Comparison operators in location constraints

2011-04-21 Thread Stallmann, Andreas
Hi! I tried to compare a value returned by ping(d) to a value given in a location constraint: location only-if-connected nag_grp \ rule $id="only-if-connected-rule" -inf: not_defined pingd or pingd lte 2000 I thought lte stands for "[l]ess [t]h[e]n". That's obviously wrong, because whe
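
lte does stand for "less than or equal"; the more likely pitfall is that the comparison is done as a string unless a numeric type is forced. A sketch of the rule with an explicit numeric comparison (assuming current crm shell syntax):

    location only-if-connected nag_grp \
        rule $id="only-if-connected-rule" -inf: not_defined pingd or pingd number:lte 2000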

[Linux-HA] Resource-Group won't start - crm_mon does not react - no failures shown

2011-04-12 Thread Stallmann, Andreas
Hi! We've got a pretty straightforward and easy configuration: Corosync 1.2.1 / Pacemaker 2.0.0 on OpenSuSE 11.3 running DRBD (M/S), Ping (clone), and a resource-group, containing a shared IP, tomcat and mysql (where the datafiles of mysql reside on the DRBD). The cluster consists of two virtua
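
In a layout like that the group normally has to be tied to the DRBD master side with a colocation and an order constraint; a minimal sketch (ms_drbd and app_grp are assumed names, not from the thread):

    group app_grp sharedIP mysql tomcat
    colocation app_grp_on_drbd_master inf: app_grp ms_drbd:Master
    order app_grp_after_drbd inf: ms_drbd:promote app_grp:start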

Re: [Linux-HA] 3+node clusters?

2011-04-04 Thread Stallmann, Andreas
Hi there, I asked the same question some time ago and have received no suitable answer so far. DRBD [1] does not do "proper" replication across three nodes; it's basically still a two-node RAID-1 with a third node which doesn't really take part in the cluster but receives replication data as kind of a "b

[Linux-HA] Update of/change to the vmware-stonith-Script: How to contribute

2011-03-25 Thread Stallmann, Andreas
Hi! Just yesterday I made some changes to the vmware-stonith-script, so that it's possible to shut down/start/reset nodes on vmware hosts, even if the cluster nodes are spread over several vmware hosts (where the vmware hosts are not clustered themselves and thus aren't reachable over the same IP

Re: [Linux-HA] Load CRM configuration from file

2011-03-10 Thread Stallmann, Andreas
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Dejan Muhamedagic >> I tried "crm -f filename" and "crm > the changes line-by-line immediately, which can lead to undesirable >> side effects (because some primitives start at once, where I actual
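
One way to avoid the line-by-line behaviour is to load the whole file as a single transaction; a sketch (the file name is a placeholder):

    # apply the complete configuration file as one change set
    crm configure load update /tmp/cluster.crm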

[Linux-HA] Load CRM configuration from file

2011-03-09 Thread Stallmann, Andreas
Hi there, is it possible to exchange a complete CIB with another CIB? The background is that we have to roll out the same cluster in different customer environments with different IPs / networks. Instead of manipulating the CIB by hand via CRM, I'd rather replace placeholders in a "template ci
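
A sketch of the template approach (placeholder markers, file names and the sed call are assumptions, not from the thread):

    # substitute site-specific values in a template written in crm shell syntax
    sed -e 's/@SHARED_IP@/10.10.10.200/g' -e 's/@NETWORK@/10.10.10.0/g' \
        template.crm > site.crm
    # replace the existing configuration with the generated one
    crm configure load replace site.crm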

Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-03-02 Thread Stallmann, Andreas
Hi Andrew, >> If "suicide" is no supported fencing option, why is it still included with >> stonith? > Left over from heartbeat v1 days I guess. > Could also be a testing-only device like ssh. www.clusterlabs.org tells me, you're the Pacemaker project leader. Would you, by chance, know who main

[Linux-HA] Recommendations for a three node storage cluster

2011-03-01 Thread Stallmann, Andreas
Hi there, I'm currently trying to set up a three-node active/passive/passive storage cluster. My first thought was to use DRBD. The fact that drbd wouldn't want to come up led me to realise that DRBD, as of now, supports three-node clusters only by means of a "stacked resource". My ques

Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-25 Thread Stallmann, Andreas
Hi! I'm combining both your answers into one mail; I hope that's all right with you. > >For now, I need an interim solution, which is, as of now, stonith via > >suicide. > Doesn't work as suicide is not considered reliable - by definition the > remaining nodes have no way to verify that the fencin

Re: [Linux-HA] HowTo correctly set up stonith:suicide (was Re: Looking for a suitable Stonith Solution)

2011-02-24 Thread Stallmann, Andreas
By the way: stonith -t suicide -T off mgmt03 works nicely. Thus the command itself is working. Cheers folks, and thanks again (in advance) for your help, Andreas

[Linux-HA] HowTo correctly set up stonith:suicide (was Re: Looking for a suitable Stonith Solution)

2011-02-24 Thread Stallmann, Andreas
Hi again! I tried to think my setup through again, but I'm still not coming to any sensible conclusion. The stonith:suicide resource was set up as a clone resource, because that's how it's done in all the examples I found. Well - I didn't find a single example on "suicide", but that's at lea

Re: [Linux-HA] Problems starting apache

2011-02-24 Thread Stallmann, Andreas
Hi! First: I set up my configuration anew, and it works. I didn't change that much, just set the monitor-action differently from before. Instead of: > webserver_ressource ocf:heartbeat:apache \ > params httpd="/usr/sbin/httpd2-prefork" \ > op start interval="0" timeout="40s" \ >
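
For reference, a primitive along those lines could look roughly like this (the configfile and statusurl values are assumptions; the thread itself only shows the httpd parameter):

    primitive webserver_ressource ocf:heartbeat:apache \
        params httpd="/usr/sbin/httpd2-prefork" \
               configfile="/etc/apache2/httpd.conf" \
               statusurl="http://localhost/server-status" \
        op start interval="0" timeout="40s" \
        op monitor interval="10s" timeout="20s"

The monitor action of the apache agent fetches the status URL, so mod_status has to be reachable there.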

Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-24 Thread Stallmann, Andreas
Hi! TNX for your answer. We will switch to sbd after the shared storage has been set up. For now, I need an interim solution, which is, as of now, stonith via suicide. My configuration doesn't work, though. I tried: ~~Output from crm configure show~~ primitive suicide_res stonith:
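
A minimal sketch of a suicide stonith resource as it appears in most examples (whether it should be cloned is exactly the open question of this thread; names are assumptions):

    primitive suicide_res stonith:suicide
    clone suicide_clone suicide_res
    property stonith-enabled="true"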

Re: [Linux-HA] Problems starting apache

2011-02-24 Thread Stallmann, Andreas
Hi! I still have problems getting apache up and running via pacemaker. To track the bug down, I tried to figure out how and when the script /usr/lib/ocf/resource.d/heartbeat/apache is called. Strangely, it doesn't seem to be called with the "start" parameter at all. Date: Thu Feb 24 11:01:47
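
To watch the agent outside pacemaker, it can be driven by ocf-tester; a sketch (resource name and parameter values are assumptions):

    ocf-tester -n apache_test \
        -o httpd="/usr/sbin/httpd2-prefork" \
        -o configfile="/etc/apache2/httpd.conf" \
        /usr/lib/ocf/resource.d/heartbeat/apache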

[Linux-HA] Problems starting apache

2011-02-23 Thread Stallmann, Andreas
Hi there, I'm afraid I'm asking a question that several other people have asked before. Believe me, I think I've tried everything from the posts I've found so far. I'm currently trying to get my apache webserver started by pacemaker. Here's the config: primitive sharedIP ocf:heartbeat:IPaddr2 \

Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-23 Thread Stallmann, Andreas
Hi there! ... > Please no-one try a loop-mounted image file on NFS ;-) Even though in theory > it may work, if you mount -o sync ... > *Outch* ... > Does this help? > http://www.linux-ha.org/w/index.php?title=SBD_Fencing&diff=481&oldid=97 Yes, this helps... somehow. Well, I should use iSCSI to s
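
Heavily condensed, the usual sbd setup on a small shared (e.g. iSCSI) LUN looks roughly like this (the device path is a placeholder; double-check against the SBD_Fencing page above):

    # initialise the sbd header on the shared device (destroys its contents)
    sbd -d /dev/disk/by-id/my-iscsi-lun create
    sbd -d /dev/disk/by-id/my-iscsi-lun dump
    # stonith resource pointing at that device
    crm configure primitive sbd_fence stonith:external/sbd \
        params sbd_device="/dev/disk/by-id/my-iscsi-lun"

The sbd daemon itself also has to run on every node, otherwise fence requests written to the device are never acted upon.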

Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-23 Thread Stallmann, Andreas
Hi! >> - (3) rules out sbd, as this method requires access to a physical device >> that offers the shared storage. Am I right? The manual explicitly says that >> sbd may not even be used on a DRBD partition. Question: Is there a way to >> insert the sbd header on a mounted drive instead of a

[Linux-HA] Looking for a suitable Stonith Solution

2011-02-23 Thread Stallmann, Andreas
Hello! I'm currently looking for a suitable stonith solution for our environment: 1. We have three cluster nodes running OpenSuSE 10.3 with corosync and pacemaker. 2. The nodes reside on two VMware ESXi-Servers ( v. 4.1.0) in two locations, where one VMware Server hosts two, the other hosts on

RE: [Linux-HA] Still problems with split brain

2008-05-09 Thread Stallmann, Andreas
Hi! Another issue adding to the problem described before: for testing purposes, we set in ha.cf: auto_failback off Still, when our old primary comes back, it takes over the resources! Gnagnagnagna...! Please, make it stop! *sigh* It seems that the setting has no consequence at all! Thanks
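
For completeness, the directive and the conditions under which it is honoured in an R1-style (no crm) setup; a sketch:

    # /etc/ha.d/ha.cf, identical on *both* nodes
    auto_failback off

The change only takes effect after heartbeat has re-read ha.cf on both nodes; a restart of heartbeat is the safe way to make sure of that.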

[Linux-HA] Still problems with split brain

2008-05-09 Thread Stallmann, Andreas
Hi there, we're still in deep sh** with heartbeat and drbd in a split-brain scenario. We have the following set up: - A two-node active/passive cluster (heartbeat 2.1.3 without crm) - Dopd with drbd-peer-outdater (the newest ones, patched). - Ipfail Still, if we disconnect one host from the netw
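
The dopd pieces referred to here, roughly (paths and the timeout are the usual defaults from the drbd documentation; treat them as assumptions):

    # /etc/ha.d/ha.cf
    respawn hacluster /usr/lib/heartbeat/dopd
    apiauth dopd gid=haclient uid=hacluster

    # drbd.conf, per resource
    disk     { fencing resource-only; }
    handlers { fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5"; }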

RE: [Linux-HA] New questions relating to: Methods of dealing with network fail(ure/over)

2008-04-21 Thread Stallmann, Andreas
Hi there! Thank you, Dominik. dopd works just fine in heartbeat 2.1.3-21.1 together with drbd 8.2.5-3 and solved our problem. Kind regards, Andreas

[Linux-HA] New questions relating to: Methods of dealing with network fail(ure/over)

2008-04-11 Thread Stallmann, Andreas
Hi there! I have set up a two-node heartbeat cluster running apache and drbd. Everything went fine until we tested a "split brain" scenario. In this case, when we detach both network cables from one host, we get a two-primary situation. I read in the thread "methods of dealing with network failo