Re: [Pacemaker] HA across WDM Fibre link - Nodes won't rejoin after reboot

2012-04-02 Thread Darren.Mansell
> On 2012-04-02T14:53:53, darren.mans...@opengi.co.uk wrote: > > > I have 2 nodes running on ESX hosts in 2 geographically diverse data > > centres. The link between them is a DWDM fibre link which is the only > > thing I can think of as being the cause of this. > >

[Pacemaker] HA across WDM Fibre link - Nodes won't rejoin after reboot

2012-04-02 Thread Darren.Mansell
Hi everyone. I have 2 nodes running on ESX hosts in 2 geographically diverse data centres. The link between them is a DWDM fibre link which is the only thing I can think of as being the cause of this. SLES 11 SP1 with HAE. All latest updates. If Corosync is set to Multicast on the defau

Re: [Pacemaker] Dual-Primary DRBD with OCFS2 on SLES 11 SP1

2011-09-29 Thread Darren.Mansell
Sorry for top-posting, I'm Outlook-afflicted. This is also my problem; In the full production environment there will be low-level hardware fencing by means of IBM RSA/ASM but this is a VMware test environment. The vmware STONITH plugin is dated and doesn't seem to work correctly (I gave up quic

[Pacemaker] Dual-Primary DRBD with OCFS2 on SLES 11 SP1

2011-09-29 Thread Darren.Mansell
(Originally sent to DRBD-user, reposted here as it may be more relevant) Hello all. I'm implementing a 2-node cluster using Corosync/Pacemaker/DRBD/OCFS2 for dual-primary shared FS. I've followed the instructions on the DRBD applications site and it works really well. However, if I

Re: [Pacemaker] Help With Cluster Failure

2011-04-08 Thread Darren.Mansell
-Original Message- From: Andrew Beekhof [mailto:and...@beekhof.net] Sent: 08 April 2011 08:15 To: The Pacemaker cluster resource manager Cc: Darren Mansell Subject: Re: [Pacemaker] Help With Cluster Failure On Thu, Apr 7, 2011 at 12:12 PM, wrote: > Hi all. > > > > One of my clusters had

[Pacemaker] Help With Cluster Failure

2011-04-07 Thread Darren.Mansell
Hi all. One of my clusters had a STONITH shoot-out last night and then refused to do anything but sit there from 0400 until 0735 after I'd been woken up to fix it. In the end, just a resource cleanup fixed it, which I don't think should be the case. I have an 8MB hb_report file. Is tha
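If the full report is too large to post, hb_report can be limited to the incident window with its -f (from) and -t (to) options. This is a sketch only. The times are read from the post and assumed to fall on the early morning of 2011-04-07. The destination name is hypothetical. Check the accepted time format against hb_report(8) for the installed version.

```shell
# Collect logs and cluster state only for the window around the incident.
# Date/times are assumptions based on "from 0400 until 0735" above;
# /tmp/stonith-shootout is a hypothetical destination name.
hb_report -f "2011/04/07 03:45" -t "2011/04/07 07:45" /tmp/stonith-shootout
```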

Re: [Pacemaker] IPaddr2 Netmask Bug Fix Issue

2011-03-30 Thread Darren.Mansell
From: Pavel Levshin [mailto:pa...@levshin.spb.ru] Sent: 25 March 2011 19:50 To: pacemaker@oss.clusterlabs.org Subject: Re: [Pacemaker] IPaddr2 Netmask Bug Fix Issue 25.03.2011 18:47, darren.mans...@opengi.co.uk: We configure a virtual IP on the non-arping lo interface of both servers and

[Pacemaker] IPaddr2 Netmask Bug Fix Issue

2011-03-25 Thread Darren.Mansell
Hello all. Between SLE 11 HAE and SLE 11 SP1 HAE (pacemaker 1.0.3 - pacemaker 1.1.2) the following bit has changed in the IPaddr2 RA: Old: local iface=`$IP2UTIL -o -f inet addr show | grep "\ $BASEIP/" \ | cut -d ' ' -f2 | grep -v '^ipsec[0-9][0-9]*$'` New: local ifac
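To see what the old lookup actually selects, the same grep | cut | grep -v steps can be run against canned "ip -o -f inet addr show" output. This is a minimal POSIX sh sketch with several assumptions. The function name find_iface and the sample addresses are hypothetical. Input is read from stdin instead of calling $IP2UTIL. grep -F replaces the original regex grep, so the dots in the address are matched literally. When the VIP is also configured on lo, as in the LVS setup described in the reply above, the lookup returns lo.

```shell
#!/bin/sh
# find_iface BASEIP: print the interface(s) already holding BASEIP, using the
# same grep | cut | grep -v steps as the old IPaddr2 lookup quoted above.
# Sketch assumptions: "ip -o -f inet addr show" style text arrives on stdin
# (instead of running $IP2UTIL), and grep -F matches the address literally.
find_iface() {
    baseip=$1
    # The leading space and trailing "/" keep 10.0.0.1 from matching 10.0.0.10.
    # cut -f2 takes the second space-separated field: the name after "N:".
    grep -F " $baseip/" \
        | cut -d ' ' -f2 \
        | grep -v '^ipsec[0-9][0-9]*$'
}

# Hypothetical sample: VIP 10.0.0.100 on lo (LVS-DR style), node address on eth0.
sample='1: lo    inet 127.0.0.1/8 scope host lo
1: lo    inet 10.0.0.100/32 scope global lo
2: eth0    inet 10.0.0.11/24 brd 10.0.0.255 scope global eth0'

printf '%s\n' "$sample" | find_iface 10.0.0.100   # prints: lo
```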

Re: [Pacemaker] Lots of Issues with Live Pacemaker Cluster

2011-03-15 Thread Darren.Mansell
I'm sorry if it came through to you that way. It's the challenges I face as an Enterprise IT worker using Linux as my desktop. Either I use Outlook in a Windows VM and top post like this, or use Evolution, quote correctly but potentially cause encoding/unicode issues. Regards, Darren Mansell

Re: [Pacemaker] Lots of Issues with Live Pacemaker Cluster

2011-03-15 Thread Darren.Mansell
On Mon, 2011-03-14 at 20:08 +0300, Pavel Levshin wrote: > > 14.03.2011 13:57, darren.mans...@opengi.co.uk: > > I built and put into production without adequate testing a 2 node > > cluster running Ubuntu 10.04 LTS with Pacemaker and associated > > packages from the Ubuntu-HA-maintainers repo > >

Re: [Pacemaker] Lots of Issues with Live Pacemaker Cluster

2011-03-15 Thread Darren.Mansell
On Mon, 2011-03-14 at 17:35 +0100, Dejan Muhamedagic wrote: > Hi, > > On Mon, Mar 14, 2011 at 10:57:27AM -, darren.mans...@opengi.co.uk wrote: > > Hello everyone. > > > > > > > > I built and put into production without adequate testing a 2 node > > cluster running Ubuntu 10.04 LTS with Pac

Re: [Pacemaker] Lots of Issues with Live Pacemaker Cluster

2011-03-15 Thread Darren.Mansell
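On Mon, 2011-03-14 at 12:19 +0100, Andrew Beekhof wrote: > > 1. DRBD doesn't promote/demote correctly. Whenever I have a failover, the DRBD resource will just sit there on the wrong node, holding up all other operations. It's like the demote never happens. Nothing is logged when this happens, it just sits forever with half of the resources stopped and DRBD master on the wrong node. > > For this at least I'd encourage a bug report with a hb_report archive. > Without the logs, the configuration alone won't tell us much. Ah. I was looking for ha_report and assume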

Re: [Pacemaker] [Linux-HA] Solved: SLES 11 HAE SP1 Signon to CIB Failed

2011-02-09 Thread Darren.Mansell
> > So I compared the /etc/ais/openais.conf in non-sp1 with > /etc/corosync/corosync.conf from sp1 and found this bit missing which > could be quite useful... > > service { > # Load the Pacemaker Cluster Resource Manager > ver: 0 > name: pacemaker >

[Pacemaker] Solved: [Linux-HA] SLES 11 HAE SP1 Signon to CIB Failed

2011-02-03 Thread Darren.Mansell
On Fri, Jan 28, 2011 at 1:06 PM, wrote: > Hi all, this seems like it should be an easy one to fix, I'll raise a > support call with Novell if required. > > > > Base install of SLES 11 32 bit SP1 with HAE SP1 and crm_mon gives > 'signon to CIB failed'. Same thing with the CRM shell etc. Too ma

[Pacemaker] SLES 11 HAE SP1 Signon to CIB Failed

2011-01-28 Thread Darren.Mansell
Hi all, this seems like it should be an easy one to fix, I'll raise a support call with Novell if required. Base install of SLES 11 32 bit SP1 with HAE SP1 and crm_mon gives 'signon to CIB failed'. Same thing with the CRM shell etc. All the logs look fine and I'm root. It's using corosync /

Re: [Pacemaker] [lvs-users] is it possible to have ldirector and real cluster server on same physical machine?

2010-12-06 Thread Darren.Mansell
Check the /var/log/ldirectord.log file for errors and check you can manually start it yourself: rcldirectord restart I've had to compile a Perl module myself for ldirector in SLES 11 HAE: http://www.clusterlabs.org/wiki/Load_Balanced_MySQL_Replicated_Cluster#Missing_Perl_Socket6 You also ne

[Pacemaker] Help with understanding CIB scores

2010-07-05 Thread Darren.Mansell
Hi all. Could anyone give me any pointers on how to easily find out what is stopping resources moving to a preferred node as expected? I'm looking at the ptest -Ls output and can see there is a greater score for a resource on another node than the node I am specifically locating. I can't se
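One way to see which node is winning, and by how much, is to reduce the ptest -Ls output to the best-scoring node per resource. This is a sketch in POSIX sh and awk. The line format below is an assumption based on Pacemaker 1.0 output and may differ between versions. INFINITY is mapped to 1000000, the value Pacemaker uses for it internally. The function name best_nodes is hypothetical.

```shell
#!/bin/sh
# best_nodes: read "ptest -Ls" style lines on stdin and print, per primitive,
# the node with the highest allocation score. Assumed line format:
#   native_color: <resource> allocation score on <node>: <score>
# <score> is an integer, INFINITY or -INFINITY; other lines are ignored.
best_nodes() {
    awk '
    function num(s) {                 # (+/-)INFINITY -> +/-1000000
        if (s == "INFINITY" || s == "+INFINITY") return 1000000
        if (s == "-INFINITY") return -1000000
        return s + 0
    }
    $1 == "native_color:" && $3 == "allocation" && $5 == "on" {
        rsc = $2
        node = $6
        sub(/:$/, "", node)           # drop the colon after the node name
        score = num($7)
        if (!(rsc in best) || score > best[rsc]) {
            best[rsc] = score; where[rsc] = node; raw[rsc] = $7
        }
    }
    END { for (r in best) print r, where[r], raw[r] }
    '
}

# Hypothetical sample; on a live cluster try: ptest -Ls 2>&1 | best_nodes
sample='group_color: grp allocation score on node1: 0
native_color: VIP allocation score on node1: 100
native_color: VIP allocation score on node2: 200
native_color: MySQL allocation score on node1: -INFINITY
native_color: MySQL allocation score on node2: 0'

printf '%s\n' "$sample" | best_nodes | LC_ALL=C sort
# MySQL node2 0
# VIP node2 200
```

Comparing the winning score against the score from the location constraint shows how much stickiness, colocation or group membership is contributing on the other node.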

Re: [Pacemaker] Time/Date Based Expressions in the CRM Shell

2010-03-31 Thread Darren.Mansell
> -Original Message- > From: Dejan Muhamedagic [mailto:deja...@fastmail.fm] > Sent: 31 March 2010 11:09 > To: The Pacemaker cluster resource manager > Subject: Re: [Pacemaker] Time/Date Based Expressions in the CRM Shell > Hi, > > On Wed, Mar 31, 2010 at 10:56:29AM +0100, darren.mans...@op

[Pacemaker] Time/Date Based Expressions in the CRM Shell

2010-03-31 Thread Darren.Mansell
Apologies if this is in the documentation but I can't see how to use the time/date based expression resource constraints in the CRM shell. Can anyone provide an example config or point me to any documentation for how to use it? I'm trying to use these constraints to run scripts at certain t
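In the crm shell, a time-based constraint can be written as a location rule with a date expression. This sketch assumes a resource with the hypothetical ID nightly-job that should not run on any node between 08:00 and 19:59. Time rules are only re-evaluated when the policy engine runs, so cluster-recheck-interval sets an upper bound on how late a change takes effect.

```shell
# Hypothetical IDs: resource "nightly-job", constraint "loc-nightly-job-daytime".
# -inf while the date_spec matches (hours 8 through 19, i.e. 08:00-19:59),
# so the resource is stopped during the day and allowed to run otherwise.
crm configure location loc-nightly-job-daytime nightly-job \
    rule -inf: date date_spec hours=8-19
# Re-run the policy engine at least every 5 minutes so the rule is noticed
# soon after each boundary (the interval here is illustrative).
crm configure property cluster-recheck-interval=5min
```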

Re: [Pacemaker] DRBD Recovery Policies

2010-03-12 Thread Darren.Mansell
Sorry, as I thought, I'm being stupid :O Thanks for the prod. -Original Message- From: Menno Luiten [mailto:mlui...@artifix.net] Sent: 12 March 2010 10:40 To: pacemaker@oss.clusterlabs.org Subject: Re: [Pacemaker] DRBD Recovery Policies On 12-03-10 11:26, darren.mans...@opengi.co.uk wro

Re: [Pacemaker] DRBD Recovery Policies

2010-03-12 Thread Darren.Mansell
Fairly standard, but I don't really want it to be fenced, as I want to keep the data that has been updated on the single remaining nodeB while NodeA was being repaired: global { dialog-refresh 1; minor-count 5; } common { syncer { rate 10M; } } resource cluster_disk { protocol C;

Re: [Pacemaker] DRBD Recovery Policies

2010-03-12 Thread Darren.Mansell
The odd thing is - it didn't. From my test, it failed back, re-promoted NodeA to be the DRBD master and failed all grouped resources back too. Everything was working with the ~7GB of data I had put onto NodeB while NodeA was down, now available on NodeA... /proc/drbd on the slave said Secondary/P

[Pacemaker] DRBD Recovery Policies

2010-03-11 Thread Darren.Mansell
I've been reading the DRBD Pacemaker guide on the DRBD.org site and I'm not sure I can find the answer to my question. Imagine a scenario: (NodeA NodeB Order and group: M/S DRBD Promote/Demote FS Mount Other resource that depends on the F/S mount DRBD master location score of 10

Re: [Pacemaker] Help with OCFS2 / DLM Stability

2010-03-10 Thread Darren.Mansell
On Wed, 2010-03-10 at 13:28 +0100, Dejan Muhamedagic wrote: Hi, On Tue, Mar 09, 2010 at 11:37:02AM -, darren.mans...@opengi.co.uk wrote: > Hi everyone. > > > > Further to some discussions a couple of weeks ago with

Re: [Pacemaker] Help with OCFS2 / DLM Stability

2010-03-10 Thread Darren.Mansell
Sorry, please ignore this mail. Client issues! -Original Message- From: darren.mans...@opengi.co.uk [mailto:darren.mans...@opengi.co.uk] Sent: 10 March 2010 13:53 To: deja...@fastmail.fm Cc: pacemaker@oss.clusterlabs.org Subject: Re: Re: [Pacemaker] Help with OCFS2 / DLM Stability On We

Re: [Pacemaker] Help with OCFS2 / DLM Stability

2010-03-10 Thread Darren.Mansell
On Wed, 2010-03-10 at 13:28 +0100, Dejan Muhamedagic wrote: > Hi, > > On Tue, Mar 09, 2010 at 11:37:02AM -, darren.mans...@opengi.co.uk wrote: > > Hi everyone. > > > > Further to some discussions a couple of weeks ago with regard to OCFS2 > > on SLES 11 HAE I'm looking to

Re: [Pacemaker] DRBD and fencing

2010-03-10 Thread Darren.Mansell
On Wed, Mar 10, 2010 at 02:32:05PM +0800, Martin Aspeli wrote: > Florian Haas wrote: >> On 03/09/2010 06:07 AM, Martin Aspeli wrote: >>> Hi folks, >>> >>> Let's say have a two-node cluster with DRBD and OCFS2, with a database >>> server that's supposed to be active on one node at a time, using the

Re: [Pacemaker] cluster/load balancing in openvz containers

2010-02-25 Thread Darren.Mansell
That's about where I got to last time I looked at it. Openvz and Linux-HA should be great together, there's just lots of little configuration issues to get around. I went with the ldirector / ipvsadm route but ran into issues with ARP config and didn't really have enough time to look into it. If y

Re: [Pacemaker] OCFS2 fencing regulated by Pacemaker?

2010-02-11 Thread Darren.Mansell
Once again, I apologise for the top-posting. I wish I could use a real mail client but nothing apart from Outlook works properly with Exchange :(. Anyway - yes. We've had a really hard time with our 3-node SAN based cluster. We implemented OCFS2 on top of a shared disk using o2cb and dlm clones.

Re: [Pacemaker] OCFS2 fencing regulated by Pacemaker?

2010-02-11 Thread Darren.Mansell
Hello. Yes, we get the same kind of thing. SLES11 HAE 64-bit. Average uptime of the boxes is about a week at the moment. Also 3 nodes using OCFS2 / cLVMD / OCFS2: node OGG-NODE-01 node OGG-NODE-02 \ attributes standby="off" node OGG-NODE-0

Re: [Pacemaker] Announce: Hawk (HA Web Konsole) 0.2.0

2010-02-10 Thread Darren.Mansell
On Tue, 2010-02-09 at 16:38 -0700, Tim Serong wrote: > > So, by "fixed" I clearly meant "fixed in only one of the two places that require fixing". Please try the following change (the relevant file will be /srv/www/hawk/public/javascripts/application.js): This now works great, thank

Re: [Pacemaker] ocf:heartbeat:mysql RA: op monitor

2010-02-09 Thread Darren.Mansell
On Tue, 2010-02-09 at 17:01 +0100, Oscar Remírez de Ganuza Satrústegui wrote: > Hello! > > I have one question regarding the ocf:heartbeat:mysql RA. > > I supposed that the following params defined on the resource were to > being used by the monitor operation to check the

Re: [Pacemaker] Announce: Hawk (HA Web Konsole) 0.2.0

2010-02-09 Thread Darren.Mansell
On Tue, 2010-02-09 at 06:44 -0700, Tim Serong wrote: > On 2/9/2010 at 11:05 PM, wrote: > > It's pacemaker-1.0.3-4.1 > > > > No output for cluster-infrastructure. > > > > But the HTML source does contain information, just display: none hides > > it:

Re: [Pacemaker] Announce: Hawk (HA Web Konsole) 0.2.0

2010-02-09 Thread Darren.Mansell
On Tue, 2010-02-09 at 04:06 -0700, Tim Serong wrote: On 2/9/2010 at 09:15 PM, wrote: > > Hi Tim. Thanks for this project, it seems to be exactly what we're > > looking for. > > Well, I certainly hope so :) > > > I've installed it (it required spawn-fcgi too on SLES11 64) but I just > > get a

Re: [Pacemaker] problem: mysql service duplicated due to existing pidfile

2010-02-09 Thread Darren.Mansell
Hello Oscar. We use the OCF RA for MySQL and have not had any problems. Our config is: primitive MySQL ocf:heartbeat:mysql \ params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" datadir="/data/mysql" user="mysql" pid="/var/lib/mysql/mysql.pid" socket="/var/lib/mysql/mysql.sock" tes

Re: [Pacemaker] Announce: Hawk (HA Web Konsole) 0.2.0

2010-02-09 Thread Darren.Mansell
Hi Tim. Thanks for this project, it seems to be exactly what we're looking for. I've installed it (it required spawn-fcgi too on SLES11 64) but I just get a blank page. I've looked at the page source and the divs have style="display: none". Not sure why that's happening, can you think of anything?

Re: [Pacemaker] Discussion about a "Cluster knowledge test"

2010-01-14 Thread Darren.Mansell
Do you have a link to the online certification please? -Original Message- From: Florian Haas [mailto:florian.h...@linbit.com] Sent: 14 January 2010 09:34 To: pacemaker@oss.clusterlabs.org Subject: Re: [Pacemaker] Discussion about a "Cluster knowledge test" On 2010-01-14 10:06, Michael Sc

Re: [Pacemaker] [Linux-HA] Multiple Choice test for cluster knowledge

2010-01-13 Thread Darren.Mansell
Yes please in English for both! Have you (or anyone else) thought of doing a Linux-HA certification? Darren -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Michael Schwartzkopff Sent: 13 January 2010 13:17 To: The Pa

Re: [Pacemaker] [PATCH] Allow the user to insert a startup configuration

2009-12-15 Thread Darren.Mansell
Perhaps change: crm_info("Using initial configuration file : %s", static_config_file); To: crm_warn("Using initial configuration file : %s", static_config_file); ? Anyone who would know to put a static config file in there in the first place would be proficient enough to look in the log

Re: [Pacemaker] [PATCH] Allow the user to insert a startup configuration

2009-12-15 Thread Darren.Mansell
As far as I understand the patch, yes I think it would be useful. -Original Message- From: Andrew Beekhof [mailto:and...@beekhof.net] Sent: 15 December 2009 10:38 To: frank.di...@bigbandnet.com Cc: pacema...@clusterlabs.org Subject: Re: [Pacemaker] [PATCH] Allow the user to insert a start

Re: [Pacemaker] I Like This But...

2009-12-07 Thread Darren.Mansell
Depends on what you're doing with it to make it challenging or not. The old Linux-HA had a very steep learning curve that isn't there as much anymore. A decent level of networking knowledge is required but the documentation is now excellent and with the CRM shell, Pacemaker is a lot easier to work

Re: [Pacemaker] is ptest 1.06 working correctly?

2009-11-30 Thread Darren.Mansell
This sounds very interesting. I look forward to trying it :) (sorry for Outlook-affliction) -Original Message- From: Dejan Muhamedagic [mailto:deja...@fastmail.fm] Sent: 30 November 2009 17:28 To: pacemaker@oss.clusterlabs.org Subject: Re: [Pacemaker] is ptest 1.06 working correctly? Hi

Re: [Pacemaker] is ptest 1.06 working correctly?

2009-11-30 Thread Darren.Mansell
I've never really understood the correct time to do the ptest graphs. I initiated a failover once and did the graph very quickly while it was in a transitional state but I've always wondered if there is an easier way i.e. "show me a graph of the migration plan if such and such were to happen". ---

Re: [Pacemaker] no ais-keygen with ubuntu hardy and launchpad?

2009-11-04 Thread Darren.Mansell
Try corosync-keygen and work with /etc/corosync as if it were /etc/ais/ -Original Message- From: Dirk Taggesell [mailto:dirk.tagges...@proximic.com] Sent: 04 November 2009 11:39 To: pacemaker@oss.clusterlabs.org Subject: [Pacemaker] no ais-keygen with ubuntu hardy and launchpad? Hi all,
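On corosync-based installs the key is generated with corosync-keygen, which writes /etc/corosync/authkey. The same file must then be present on every node, and corosync only uses it when secauth is enabled in the totem section of corosync.conf. This sketch uses a hypothetical second node name.

```shell
# On one node: writes /etc/corosync/authkey. It reads /dev/random, so it may
# pause until enough entropy has been gathered.
corosync-keygen
# Copy the key to each other node ("node2" is hypothetical) and keep it root-only.
scp -p /etc/corosync/authkey root@node2:/etc/corosync/authkey
ssh root@node2 chmod 0400 /etc/corosync/authkey
```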

Re: [Pacemaker] why use ocf::linbit:drbd instead of ocf::heartbeat:drbd?

2009-10-12 Thread Darren.Mansell
On 2009-10-10 10:37, xin.li...@cs2c.com.cn wrote: > Hi all: > > As I know, drbd (8.3.2 and above) in pacemaker has 2 ocf scripts, > one is from linbit, the other one is from heartbeat. > > In Andrew's "Clusters from Scratch - Fedora 11", "Configure the > Cluster for DRBD", he uses ocf::li
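For reference, a master/slave DRBD resource using the LINBIT agent is typically declared along these lines in the crm shell. This is a sketch only. The DRBD resource name r0 and the resource IDs are hypothetical, and the monitor intervals are illustrative. The LINBIT agent depends on notifications, so notify=true is set on the ms resource.

```shell
# Hypothetical IDs: DRBD resource "r0" (as named in drbd.conf), primitive
# "drbd_r0", master/slave set "ms_drbd_r0". Intervals are illustrative; the
# Master and Slave monitors must use different intervals.
crm configure primitive drbd_r0 ocf:linbit:drbd \
    params drbd_resource=r0 \
    op monitor interval=29s role=Master \
    op monitor interval=31s role=Slave
# The LINBIT agent relies on notifications, hence notify=true.
crm configure ms ms_drbd_r0 drbd_r0 \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
```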

Re: [Pacemaker] Low cost stonith device

2009-09-25 Thread Darren.Mansell
I find the riloe plugin to be very good so if you can get cheap HP servers with iLO then that could constitute a low cost STONITH device. -Original Message- From: Mario Giammarco [mailto:mgiamma...@gmail.com] Sent: 24 September 2009 19:08 To: pacema...@clusterlabs.org Subject: [Pacemaker]

Re: [Pacemaker] Arp and configuration advice

2009-09-09 Thread Darren.Mansell
> Greetings, > > I have a two webserver/two database server clustered setup. I've got > ldirector and LVS managed by pacemaker and configured to be able to run > on either database machine. > > I know how to disable ARP for the machine not running ldirector, > unfortunately I'm not sure how to dy

[Pacemaker] Load-Balancing Confusion

2009-09-02 Thread Darren.Mansell
Can anyone help me clear up my confusion with load-balancing / load-sharing using Linux-HA please? I've always used ldirectord/LVS with an IPaddr2 resource (not cloned), collocated them and put the virtual IP address on the loopback of all nodes. When the IPaddr2 resource starts on any node it
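A VIP on the loopback of every node only works if the real servers neither answer nor announce ARP for it. The node currently running the IPaddr2 resource still answers, because there the VIP is also configured on a real interface. This is a minimal sketch, assuming LVS direct routing and the Linux sysctl names shown. The kernel applies the higher of the "all" and per-interface values, so setting "all" is enough.

```shell
# Run on every real server that carries the VIP on lo (LVS direct routing).
# arp_ignore=1: reply to ARP only for addresses configured on the interface
#   the request arrived on, so the VIP on lo is not answered via eth0.
# arp_announce=2: always pick the best local address as the ARP source,
#   so the VIP on lo is never advertised in outgoing ARP requests.
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
# To persist across reboots, add the same keys to /etc/sysctl.conf:
#   net.ipv4.conf.all.arp_ignore = 1
#   net.ipv4.conf.all.arp_announce = 2
```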

Re: [Pacemaker] Resource Failover in 2 Node Cluster

2009-08-19 Thread Darren.Mansell
I've now re-installed the SLES 11 HAE DRBD module, usertools and set the cluster to use the heartbeat RA and it now fails over as expected. Does the Linbit provided RA work differently? Is the following anything to do with it from the logs? Aug 19 11:08:37 gihub2 pengine: [4837]: notice: clone_

[Pacemaker] Temporarily Stop Cloned Resource on 1 Node

2009-08-07 Thread Darren.Mansell
Hello all. I have a cloned resource that I need to stop temporarily on 1 node. Am I missing something quite obvious because I can't figure out how to do it without reconfiguring the CIB. Pacemaker 1.0.3 on SLES 11. Thanks. Darren
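A common workaround is a temporary -inf location constraint on the clone for that one node, removed again afterwards. This does touch the CIB, but only by adding and deleting a single constraint. The sketch below uses hypothetical IDs.

```shell
# Hypothetical IDs: clone "cl-foo", node "node2", constraint "ban-cl-foo-node2".
# Ban the clone from node2 only; instances on the other nodes keep running.
crm configure location ban-cl-foo-node2 cl-foo -inf: node2
# ... work on node2 ...
# Remove the constraint so the clone instance can start on node2 again.
crm configure delete ban-cl-foo-node2
```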

Re: [Pacemaker] setup for flexlm lmgrd failover

2009-07-02 Thread Darren.Mansell
> > Hi > > I'm looking for a How-To on setting up Pacemaker for a failover pair > of suse 10.2 flexlm license managers servers. For both Portland Group > and Intel compilers float licenses. > > mac address take-over then start lmgrd etc. > > Many thanks > > Jonathan

Re: [Pacemaker] ESX guest having SLE HA

2009-06-10 Thread Darren.Mansell
> From: Priyanka Ranjan [mailto:priyanka3rd...@gmail.com] > Sent: 10 June 2009 04:56 > To: pacemaker@oss.clusterlabs.org > Subject: Re: [Pacemaker] ESX guest having SLE HA > > Thanks Lars and Andrew for this info. I'm finding SLES 11 HAE to work fine in ESX. The GUI install and X don't work to