Re: [Linux-HA] Antw: ocf:heartbeat:Raid1 starts wrong recovery

2011-09-22 Thread Ulrich Windl
Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote on 21.09.2011 at 19:24 in message 4e7a1ddc.5000...@bmrb.wisc.edu: On 09/21/2011 01:26 AM, Ulrich Windl wrote: [...] (I wonder how many times you can re-add a physically failing drive before you get your data corrupted -- there must be a clever

Re: [Linux-HA] crm crashes after: TOTEM failed to receive

2011-09-22 Thread Sascha Hagedorn
Hi Dejan, thank you for your quick response. I will post this on the corosync mailing list as well. But is this "TOTEM Failed to receive" message an indicator that the multicast communication between the two is somehow erroneous? Regards, Sascha -Original Message- From:

Re: [Linux-HA] crm crashes after: TOTEM failed to receive

2011-09-22 Thread Dejan Muhamedagic
Hi, On Thu, Sep 22, 2011 at 09:23:11AM +0200, Sascha Hagedorn wrote: Hi Dejan, thank you for your quick response. I will post this on the corosync mailing list as well. But is this TOTEM Failed to receive message an indicator that the multicast communication between the two is somehow

Re: [Linux-HA] crm crashes after: TOTEM failed to receive

2011-09-22 Thread Sascha Hagedorn
Hello Dejan, well, actually the nodes communicate over a virtual network device since they are virtual machines on a XEN host. No switches or hardware involved as far as I know. Regards, Sascha -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-

Re: [Linux-HA] crm crashes after: TOTEM failed to receive

2011-09-22 Thread Dejan Muhamedagic
On Thu, Sep 22, 2011 at 12:45:19PM +0200, Sascha Hagedorn wrote: Hello Dejan, well, actually the nodes communicate over a virtual network device since they are virtual machines on a XEN host. No switches or hardware involved as far as I know. Another possibility is that corosync on the

Re: [Linux-HA] [Pacemaker] Errors When Loading OCF

2011-09-22 Thread Dejan Muhamedagic
On Tue, Sep 20, 2011 at 06:05:43PM -0400, Nick Khamis wrote: Trying to get the linbit drbd ocf script going, I tried: root@mydrbd1:export OCF_ROOT=/usr/lib/ocf root@mydrbd1:export OCF_RESKEY_device=/dev/drbd/by-res/r0.res root@mydrbd1:export OCF_RESKEY_directory=/service root@mydrbd1:./drbd

Re: [Linux-HA] resource migrate is not working

2011-09-22 Thread Dejan Muhamedagic
Hi, On Tue, Sep 20, 2011 at 07:27:32PM +0200, Willi Fehler wrote: Hi, I've two nodes with Pacemaker, Corosync, OpenAIS, DRBD, MySQL running on CentOS6. [root@linsrv001 ~]# crm configure show node linsrv001.willi-net.local node linsrv002.willi-net.local \ attributes standby=off

Re: [Linux-HA] [Pacemaker] Errors When Loading OCF

2011-09-22 Thread Nick Khamis
Yeah... I created the link manually. I built the RA from latest source, all other ocf agents are fine from what I can see... Makefile.am MAINTAINERCLEANFILES= Makefile.in aclocal.m4 configure DRF/config-h.in \ DRF/stamp-h.in libtool.m4 ltdl.m4 libltdl.tar SUBDIRS =

[Linux-HA] Simple Architecture Questions

2011-09-22 Thread Nick Khamis
Hello Everyone, We have almost set up a working prototype of what will be our production cluster. A few simple questions I have are: i) We begin the installation by creating hacluster:haclient. How bad is it to proceed with the installation as user haclient of course sudo'ing the make install

Re: [Linux-HA] Simple Architecture Questions

2011-09-22 Thread mike
On 11-09-22 10:41 AM, Nick Khamis wrote: Hello Everyone, We have almost set up a working prototype of what will be our production cluster. A few simple questions I have are: i) We begin the installation by creating hacluster:haclient. How bad is it to proceed with the installation as user

Re: [Linux-HA] Simple Architecture Questions

2011-09-22 Thread Nick Khamis
Hello Mike, Thank you so much for your response. You do not need to install cluster stack on real or backend servers just the nodes that are actually part of the cluster. This is the part that I am trying to make sure I absolutely understand. For example, we would like to set up HA on mysql1, and

Re: [Linux-HA] Antw: ocf:heartbeat:Raid1 starts wrong recovery

2011-09-22 Thread Dimitri Maziuk
On 9/22/2011 1:10 AM, Ulrich Windl wrote: Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote on 21.09.2011 at 19:24 in (I wonder how many times you can re-add a physically failing drive before you get your data corrupted -- there must be a clever probabilistic formula for that ;) Would you please

[Linux-HA] ERROR: glib: Message too long

2011-09-22 Thread Claus Wimmer
Hello, I have tried to build up a four-node cluster with heartbeat and pacemaker. Everything is alright as long as the cluster consists of 2 nodes. With 3 or 4 nodes suddenly error messages come up during a configuration change: Sep 06 08:16:56 secomat4 heartbeat: [15956]: ERROR: glib: Unable

Re: [Linux-HA] Simple Architecture Questions

2011-09-22 Thread mike
On 11-09-22 11:45 AM, Nick Khamis wrote: Hello Mike, Thank you so much for your response. You do not need to install cluster stack on real or backend servers just the nodes that are actually part of the cluster. This is the part that I am trying to make sure I absolutely understand. For

Re: [Linux-HA] Simple Architecture Questions

2011-09-22 Thread Nick Khamis
Got it! On my way Thanks Mike! Nick. On Thu, Sep 22, 2011 at 1:05 PM, mike mgbut...@nbnet.nb.ca wrote: On 11-09-22 11:45 AM, Nick Khamis wrote: Hello Mike, Thank you so much for your response. You do not need to install cluster stack on real or backend servers just the nodes that are

Re: [Linux-HA] Simple Architecture Questions

2011-09-22 Thread mike
On 11-09-22 02:45 PM, Nick Khamis wrote: Got it! On my way Thanks Mike! Nick. On Thu, Sep 22, 2011 at 1:05 PM, mike mgbut...@nbnet.nb.ca wrote: On 11-09-22 11:45 AM, Nick Khamis wrote: Hello Mike, Thank you so much for your response. You do not need to install cluster stack on real

Re: [Linux-HA] crm crashes after: TOTEM failed to receive

2011-09-22 Thread Vladislav Bogdanov
22.09.2011 13:58, Dejan Muhamedagic wrote: On Thu, Sep 22, 2011 at 12:45:19PM +0200, Sascha Hagedorn wrote: Hello Dejan, well, actually the nodes communicate over a virtual network device since they are virtual machines on a XEN host. No switches or hardware involved as far as I know.

Re: [Linux-HA] ERROR: glib: Message too long

2011-09-22 Thread Alexander Bodnarashik
Hi. Please see http://www.gossamer-threads.com/lists/linuxha/users/68406?do=post_view_threaded#68406 2011/9/22 Claus Wimmer cw...@web.de Hello, I have tried to build up a four-node cluster with heartbeat and pacemaker. Everything is alright as long as the cluster consists of 2 nodes. With 3

Re: [Linux-HA] Resource fail and node fence

2011-09-22 Thread Andrew Beekhof
On Wed, Sep 21, 2011 at 4:59 PM, RaSca ra...@miamammausalinux.org wrote: On Tue 20 Sep 2011 17:54:58 CEST, Dejan Muhamedagic wrote: [...] And I completely agree with this, but in an environment like mine, where a single resource failure might involve all the others (with fence) is

[Linux-HA] Invalid recurring action when trying to op start...

2011-09-22 Thread Nick Khamis
Hello Everyone, When trying to load the following RA configuration: node mydrbd1 \ attributes standby=off node mydrbd2 \ attributes standby=off primitive ip1 ocf:heartbeat:IPaddr2 \ params ip=192.168.2.5 cidr_netmask=24 \ nic=eth1 primitive drbd_mysql ocf:linbit:drbd

Re: [Linux-HA] What's wrong in my configuration for GFS2 under Pacemaker ?

2011-09-22 Thread Andrew Beekhof
On Sat, Sep 17, 2011 at 12:19 AM, alain.mou...@bull.net wrote: Hi, no, nothing more with crm_mon -1r, but I traced in the Filesystem script; in fact I see that if we configure a clone for fsGS2 (Filesystem), it seems that when you ask to start a clone resource, Pacemaker at first call the

Re: [Linux-HA] What's wrong in my configuration for GFS2 under Pacemaker ?

2011-09-22 Thread Andrew Beekhof
On Mon, Sep 19, 2011 at 5:08 PM, alain.mou...@bull.net wrote: Hi, sorry to ask that, but is there a problem with my questions messages? No, just busy working through a large backlog because I can't see my questions anymore in the digest emails (bounce) ... for example this one ... So I

Re: [Linux-HA] What's wrong in my configuration for GFS2 under Pacemaker ?

2011-09-22 Thread Andrew Beekhof
On Tue, Sep 20, 2011 at 12:40 AM, alain.mou...@bull.net wrote: Hi, OK, it was due to the parameter globally-unique, which is true by default, and that led to the stop of the clone on both sides, because with globally-unique=true, X:0 on node1 is the same as X:0 on node2 (but different from

Re: [Linux-HA] Invalid recurring action when trying to op start...

2011-09-22 Thread Andrew Beekhof
On Fri, Sep 23, 2011 at 11:59 AM, Nick Khamis sym...@gmail.com wrote: Hello Everyone, When trying to load the following RA configuration: node mydrbd1 \ attributes standby=off node mydrbd2 \ attributes standby=off primitive ip1 ocf:heartbeat:IPaddr2 \ params

Re: [Linux-HA] Prob with VirtualDomain RA, Res is active on two nodes

2011-09-22 Thread Andrew Beekhof
On Tue, Sep 20, 2011 at 7:37 PM, Uwe Weiss u.we...@netz-objekte.de wrote: Hello list I am using libvirt 0.94 pacemaker 1.1.5 corosync 1.3.0 kvm 0.15.0 openSuse 11.4 on a two node Cluster. The VMs are stored on glusterfs shared and replicated device.  It seems that there is no

Re: [Linux-HA] Problem with creating constraints

2011-09-22 Thread Andrew Beekhof
On Wed, Sep 21, 2011 at 7:15 PM, Uwe Weiss u.we...@netz-objekte.de wrote: Hello List, I have a problem in creating a constraint. I hope that someone could help me and give me a hint. I have three resources (A,B,C) and two cluster nodes (node0,node1). Resource A can run only on node0 and

Re: [Linux-HA] [DRBD-user] Invalid recurring action when trying to op start...

2011-09-22 Thread Nick Khamis
Hello Andrew. Thank you so much for your response, I originally got the idea from: http://www.gossamer-threads.com/lists/linuxha/pacemaker/65195. Before adding the mysql primitive, everything works fine (starting, connecting and mounting DRBD). After adding the mysql stuff, I run into problems.

Re: [Linux-HA] [DRBD-user] Invalid recurring action when trying to op start...

2011-09-22 Thread Andrew Beekhof
On Fri, Sep 23, 2011 at 12:44 PM, Nick Khamis sym...@gmail.com wrote: Hello Andrew. Thank you so much for your response, I originally got the idea from: http://www.gossamer-threads.com/lists/linuxha/pacemaker/65195. Before adding the mysql primitive, everything works fine (starting,

Re: [Linux-HA] two node cluster: clvm depending resources restart/stuck when failing node joins cluster

2011-09-22 Thread Andrew Beekhof
On Mon, Sep 5, 2011 at 6:38 PM, Oualid Nouri o.no...@computer-lan.de wrote: Hi to all, I have set up a drbd-based dual primary two node cluster with Pacemaker on opensuse 11.4 for testing. I have also set up drbd=controld=clvm=lvm=ocfs2 resources (all clones) and a samba+IP resource

Re: [Linux-HA] Cluster corosync issues, crmd terminating

2011-09-22 Thread Andrew Beekhof
On Wed, Sep 21, 2011 at 3:43 AM, kevins7189 kevin.sm...@dtn.com wrote: Having an issue with my cluster testing. I have a simple 2 node drbd/nfs/mysql cluster. Working on configuring stonith (which is not working for me), but while testing failover scenarios, running into an issue where crmd

Re: [Linux-HA] Pacemaker : Pb on stop on a resource while the monitoring is performed

2011-09-22 Thread Andrew Beekhof
On Thu, Sep 1, 2011 at 10:00 PM, alain.mou...@bull.net wrote: Hi, my release is: pacemaker-1.1.2-7 (on RHEL6) and I have checked that the patch: High: PE: Bug lf#2433 - No services should be stopped until probes finish is effectively integrated in this release. Nevertheless, it seems

Re: [Linux-HA] remove resource WITHOUT moving the other resources

2011-09-22 Thread Andrew Beekhof
On Sun, Aug 14, 2011 at 11:04 PM, Julian D. Seifert ala...@julian-seifert.de wrote: Hi, Thank you for your response, I have some follow-up questions. Now what I am looking for is a way to completely delete/remove openvzve_itv without affecting the other resources. is-managed-default=false