[Pacemaker] Suggestions for managing HA of containers from within a Pacemaker container?

2015-02-07 Thread Steven Dake (stdake)
Hi, I am working on Containerizing OpenStack in the Kolla project (http://launchpad.net/kolla). One of the key things we want to do over the next few months is add H/A support to our container tech. David Vossel had suggested using systemctl to monitor the containers themselves by running he

Re: [Pacemaker] Need to relax corosync due to backup of VM through snapshot

2013-11-24 Thread Steven Dake
On 11/21/2013 06:26 AM, Gianluca Cecchi wrote: On Thu, Nov 21, 2013 at 9:09 AM, Lars Marowsky-Bree wrote: On 2013-11-20T16:58:01, Gianluca Cecchi wrote: Based on docs I thought that the timeout should be token x token_retransmits_before_loss_const No, the comments in the corosync.conf.exa

Re: [Pacemaker] Building corosync from source on Angstrom

2013-05-31 Thread Steven Dake
On 05/31/2013 12:57 PM, Simon Platten wrote: Hi, I have been struggling to build corosync in Angstrom Linux on a beaglebone black which runs an ARM Cortex A8. I have been using this page as a guide: http://clusterlabs.org/wiki/SourceInstall So far I've downloaded and built libqb, no problems t

[Pacemaker] Need HA for OpenStack instances? Check out heat V5!

2012-08-01 Thread Steven Dake
Hi folks, A few developers from the HA community have been hard at work on a project called heat which provides native HA for OpenStack virtual machines. Heat provides a template-based system with an API matching AWS CloudFormation semantics specifically for OpenStack. In v5, instance healthchecking has

Re: [Pacemaker] [corosync] Different Corosync Rings for Different Nodes in Same Cluster?

2012-07-08 Thread Steven Dake
loss, STONITH, etc)? > Apologies for delay - was on PTO. That is correct. Regards -steve > Thanks, > > Andrew > > ---- > *From: *"Steven Dake" > *To: *"The Pacemaker cluster resource

Re: [Pacemaker] Different Corosync Rings for Different Nodes in Same Cluster?

2012-06-29 Thread Steven Dake
On 06/29/2012 01:42 AM, Dan Frincu wrote: > Hi, > > On Thu, Jun 28, 2012 at 6:13 PM, Andrew Martin wrote: >> Hi Dan, >> >> Thanks for the help. If I configure the network as I described - ring 0 as >> the network all 3 nodes are on, ring 1 as the network only 2 of the nodes >> are on, and using "

[Pacemaker] If you want High Availability on OpenStack, check out Heat! (details inside)

2012-06-27 Thread Steven Dake
As some may know, Angus and I were working previously on a project called pacemaker-cloud, with the intention of adding high availability to guests in cloud environments. We stopped developing that project in March 2012 and took our experiences to a new project called Heat. For more details of why

Re: [Pacemaker] [corosync] Unable to join cluster from a newly-installed centos 6.2 node

2012-03-02 Thread Steven Dake
On 03/02/2012 05:29 PM, Diego Lima wrote: > Hello, > > I've recently installed Corosync on two CentOS 6.2 machines. One is > working fine but on the other machine I've been unable to connect to > the cluster. On the logs I can see this whenever I start > corosync+pacemaker: > > Mar 2 21:33:16 no

Re: [Pacemaker] OCFS2 in Pacemaker, post Corosync 2.0

2012-03-01 Thread Steven Dake
On 03/01/2012 07:19 AM, Lars Marowsky-Bree wrote: > On 2012-03-01T09:52:29, Florian Haas wrote: > >> Future situation (Pacemaker with Corosync 2.x): >> - OpenAIS goes away, no CKPT service, ocfs2_controld.pcmk stops working; >> - cman goes away, ocfs2_controld.cman stops working. >> >> Is that su

Re: [Pacemaker] need cluster-wide variables

2012-01-11 Thread Steven Dake
On 12/21/2011 12:01 AM, Nirmala S wrote: > Hi, > > > > This is a followup on earlier thread > (http://www.gossamer-threads.com/lists/linuxha/pacemaker/76705). > > > > My situation is somewhat similar. I need to a cluster which contains 3 > kinds of nodes – master, preferred slave, slave. Pr

[Pacemaker] corosync mailing list address change

2011-10-20 Thread Steven Dake
Sending one last reminder that the Corosync mailing list has changed homes from the Linux Foundation's servers. I have been unable to obtain the previous subscriber list, so please resubscribe. http://lists.corosync.org/mailman/listinfo The list is called "discuss". Regards -steve

Re: [Pacemaker] Questions about reasonable cluster size...

2011-10-20 Thread Steven Dake
On 10/20/2011 07:42 AM, Alan Robertson wrote: > On 10/20/2011 03:11 AM, Proskurin Kirill wrote: >> On 10/20/2011 03:15 AM, Steven Dake wrote: >>> On 10/19/2011 01:50 PM, Alan Robertson wrote: >>>> Hi, >>>> >>>> I have an application where

Re: [Pacemaker] Questions about reasonable cluster size...

2011-10-19 Thread Steven Dake
On 10/19/2011 01:50 PM, Alan Robertson wrote: > Hi, > > I have an application where having a 12-node cluster with about 250 > resources would be desirable. > > Is this reasonable? Can Pacemaker+Corosync be expected to reliably > handle a cluster of this size? > > If not, what is the current rec

[Pacemaker] reminder - new corosync mailng list location

2011-10-10 Thread Steven Dake
A few weeks ago I posted that we had moved the corosync mailing list off the Linux Foundation servers because they are down. Please join the corosync list if you're interested in the cluster stack or corosync and ask your questions there. To join: http://lists.corosync.org/mailman/listinfo The list is

[Pacemaker] New Corosync Mailing list - Please register for it!

2011-09-20 Thread Steven Dake
Hi, Over the past several years, we have been sharing a mailing list with the openais project. I have made a new mailing list specifically for corosync: This will be the permanent new list for corosync. Please register at: http://lists.corosync.org/mailman/listinfo The list is called "discuss"

Re: [Pacemaker] Building a Corosync 1.4.1 RPM package for SLES11 SP1

2011-09-01 Thread Steven Dake
On 08/31/2011 11:39 PM, Sebastian Kaps wrote: > Hi, > > I'm trying to compile Corosync v1.4.1 from source[1] and create an RPM > x86_64 package for SLES11 SP1. > When running "make rpm" the build process complains about a broken > dependency for the nss-devel package. > The package is not installe

Re: [Pacemaker] TOTEM: Process pause detected? Leading to STONITH...

2011-08-15 Thread Steven Dake
On 08/12/2011 03:19 AM, Vladislav Bogdanov wrote: > ... >>> I would really like someone that has these process pause problems to >>> test a patch I have posted to see if it rectifies the situation. Our >>> significant QE team at Red Hat doesn't see these problems and I can't >>> generate them in e

Re: [Pacemaker] TOTEM: Process pause detected? Leading to STONITH...

2011-08-11 Thread Steven Dake
On 08/11/2011 03:05 AM, Sebastian Kaps wrote: > Hi, > > On 04.08.2011, at 18:21, Steven Dake wrote: > >>> Jul 31 03:51:02 node01 corosync[5870]: [TOTEM ] Process pause detected >>> for 11149 ms, flushing membership messages. >> >> This process pau

Re: [Pacemaker] Backup ring is marked faulty

2011-08-07 Thread Steven Dake
On 08/04/2011 02:04 PM, Sebastian Kaps wrote: > Hi Steven, > > On 04.08.2011, at 20:59, Steven Dake wrote: > >> meaning the corosync community doesn't investigate redundant ring issues >> prior to corosync versions 1.4.1. > > Sadly, we need to use the SLES

Re: [Pacemaker] Backup ring is marked faulty

2011-08-04 Thread Steven Dake
On 08/04/2011 11:43 AM, Sebastian Kaps wrote: > Hi Steven, > > On 04.08.2011, at 18:27, Steven Dake wrote: > >> redundant ring is only supported upstream in corosync 1.4.1 or later. > > What does "supported" mean in this context, exactly? > meaning the c

Re: [Pacemaker] Backup ring is marked faulty

2011-08-04 Thread Steven Dake
On 08/02/2011 11:53 PM, Sebastian Kaps wrote: > Hi Steven! > > On Tue, 02 Aug 2011 17:45:46 -0700, Steven Dake wrote: >> Which version of corosync? > > # corosync -v > Corosync Cluster Engine, version '1.3.1' > Copyright (c) 2006-2009 Red Hat, Inc. > &

Re: [Pacemaker] Live demo of Pacemaker Cloud on Fedora: Friday August 5th at 8am PST

2011-08-04 Thread Steven Dake
sion. Regards -steve > > ---- > *From:* Steven Dake > *To:* pcmk-cl...@oss.clusterlabs.org > *Cc:* aeolus-de...@lists.fedorahosted.org; Fedora Cloud SIG > ; "open...@lists.linux-foundation.org" > ; The Pacemaker cluster resource > manager > *Sent:* Wedne

Re: [Pacemaker] Backup ring is marked faulty

2011-08-04 Thread Steven Dake
_gui and hitting "refresh" inside the > GUI 3-5 times. After that ring 1 (10.2.2.0) will be marked as "faulty" again. > > Thanks and best regards, > -Martin Tegtmeier > > > > > -Original Message- > From: Sebastian Kaps [mailto:seb

Re: [Pacemaker] TOTEM: Process pause detected? Leading to STONITH...

2011-08-04 Thread Steven Dake
On 08/04/2011 05:46 AM, Sebastian Kaps wrote: > Hello, > > here's another problem we're having: > > Jul 31 03:51:02 node01 corosync[5870]: [TOTEM ] Process pause detected > for 11149 ms, flushing membership messages. This process pause message indicates the scheduler doesn't schedule corosync f

[Pacemaker] Live demo of Pacemaker Cloud on Fedora: Friday August 5th at 8am PST

2011-08-03 Thread Steven Dake
Extending a general invitation to the high availability communities and other cloud community contributors to participate in a live demo I am giving on Friday August 5th 8am PST (GMT-7). Demo portion of session is 15 minutes and will be provided first followed by more details of our approach to hi

Re: [Pacemaker] Backup ring is marked faulty

2011-08-02 Thread Steven Dake
Which version of corosync? On 08/02/2011 07:35 AM, Sebastian Kaps wrote: > Hi, > > we're running a two-node cluster with redundant rings. > Ring 0 is a 10 GB direct connection; ring 1 consists of two 1GB > interfaces that are bonded in > active-backup mode and routed through two independent switc

[Pacemaker] Announcing Pacemaker Cloud 0.4.1 - Available now for download!

2011-07-27 Thread Steven Dake
Angus and I announced a project to apply high availability best known practice to the field of cloud computing in late March 2011. We reuse the policy engine of Pacemaker. Our first tarball is available today containing a functional prototype demonstrating these best known practices. Today the s

Re: [Pacemaker] Sending message via cpg FAILED: (rc=12) Doesn't exist

2011-07-22 Thread Steven Dake
On 07/22/2011 01:15 AM, Proskurin Kirill wrote: > Hello all. > > > pacemaker-1.1.5 > corosync-1.4.0 > > 4 nodes in cluster. 3 online 1 not. > In logs: > > Jul 22 11:50:23 my106.example.com crmd: [28030]: info: > pcmk_quorum_notification: Membership 0: quorum retained (0) > Jul 22 11:50:23 my106

Re: [Pacemaker] corosync-quorumtool configuration

2011-06-21 Thread Steven Dake
On 06/20/2011 07:35 PM, Andrew Beekhof wrote: > I don't think this is legal: > > service { > > name: corosync_quorum > > ver: 0 > > name: pacemaker > > use_mgmtd: yes > > use_logd: yes > > } > > > and even if it were, corosync

Re: [Pacemaker] [Openais] Linux HA on debian sparc

2011-06-07 Thread Steven Dake
d them to whoever you like ;) Regards -steve > 2011/6/3 Steven Dake : >> On 06/02/2011 08:16 PM, william felipe_welter wrote: >>> Well, >>> >>> Now with this patch, the pacemakerd process starts and up his other >>> process ( crmd, lrmd, pengine..

[Pacemaker] Updated pacemaker-cloud.org website

2011-06-06 Thread Steven Dake
Hi, I want to spend a moment to tell you about our new website at http://pacemaker-cloud.org. This website will serve as our information store and tarball repo location for the Pacemaker-Cloud project. The features page contains the feature set we plan to deliver. Please have a look and forward

Re: [Pacemaker] [Openais] Linux HA on debian sparc

2011-06-03 Thread Steven Dake
is'. > Jun 02 23:12:21 xx attrd: [7992]: info: crm_cluster_connect: > Connecting to cluster infrastructure: classic openais (with plugin) > Jun 02 23:12:21 xx attrd: [7992]: info: > init_ais_connection_classic: Creating connection to our Corosync > plugin > Jun 02 23:1

Re: [Pacemaker] [Openais] Linux HA on debian sparc

2011-06-02 Thread Steven Dake
Quicklists:17664 kB > NFS_Unstable: 0 kB > Bounce:0 kB > WritebackTmp: 0 kB > CommitLimit:22651424 kB > Committed_AS: 519368 kB > VmallocTotal: 1069547520 kB > VmallocUsed: 11064 kB > VmallocChunk:

Re: [Pacemaker] [Openais] Linux HA on debian sparc

2011-06-01 Thread Steven Dake
UGETLBFS=n at boot to > kernel ?) > > 2011/6/1 Steven Dake > > On 06/01/2011 01:05 AM, Steven Dake wrote: > > On 05/31/2011 09:44 PM, Angus Salkeld wrote: > >> On Tue, May 31, 2011 at 11:52:48PM -0300, william felipe_welter >

Re: [Pacemaker] [Openais] Linux HA on debian sparc

2011-06-01 Thread Steven Dake
On 06/01/2011 01:05 AM, Steven Dake wrote: > On 05/31/2011 09:44 PM, Angus Salkeld wrote: >> On Tue, May 31, 2011 at 11:52:48PM -0300, william felipe_welter wrote: >>> Angus, >>> >>> I make some test program (based on the code coreipcc.c) and i now i sure

Re: [Pacemaker] [Openais] Linux HA on debian sparc

2011-06-01 Thread Steven Dake
On 05/31/2011 09:44 PM, Angus Salkeld wrote: > On Tue, May 31, 2011 at 11:52:48PM -0300, william felipe_welter wrote: >> Angus, >> >> I make some test program (based on the code coreipcc.c) and i now i sure >> that are problems with the mmap systems call on sparc.. >> >> Source code of my test prog

Re: [Pacemaker] Linux HA on debian sparc

2011-05-31 Thread Steven Dake
Note: there are three signals you could possibly see that generate a core file: SIGABRT (assert() called in the codebase), SIGSEGV (segmentation violation), and SIGBUS (alignment error). Make sure you don't have a SIGBUS. Opening the core file with gdb will tell you which signal triggered the fault.

Re: [Pacemaker] [Openais] Linux HA on debian sparc

2011-05-31 Thread Steven Dake
Try running pacemaker using the MCP. The plugin mode of pacemaker never really worked very well because of the complexities of POSIX mmap and fork. Not having sparc hardware personally, YMMV. We have recently, with corosync 1.3.1, gone through an alignment fixing process for ARM arches - hope that so

Re: [Pacemaker] [Openais] Corosync goes into endless loop when same hostname is used on more than one node

2011-05-12 Thread Steven Dake
On 05/12/2011 07:04 AM, Dan Frincu wrote: > Hi, > > When using the same hostname on 2 nodes (debian squeeze, corosync > 1.3.0-3 from unstable) the following happens: > > May 12 08:36:27 debian cib: [3125]: info: cib_process_request: Operation > complete: op cib_sync for section 'all' (origin=loca

[Pacemaker] Pacemaker Cloud Policy Engine Red Hat Summit slides and Mailing List

2011-05-08 Thread Steven Dake
In February we announced our intentions to work on a cloud-specific high availability solution on this list. The code is coming along, and we have reached a point where we should have a mailing list dedicated to cloud specific topics of Pacemaker. The mailing list subscription page is: http://os

[Pacemaker] announcing the Pacemaker Cloud Policy Engine subproject

2011-03-01 Thread Steven Dake
Hi, I'd like to spend a moment to tell you about a new project Angus Salkeld and I are working on. The project, called the Pacemaker Cloud Policy Engine, is a cloud-specific policy engine and will act as a sub-project of the Pacemaker project. We are doing a ground-up implementation of a cl

Re: [Pacemaker] Cluster Communication fails after VMWare Migration

2011-03-01 Thread Steven Dake
On 02/25/2011 12:40 AM, Andrew Beekhof wrote: > On Wed, Feb 23, 2011 at 10:31 AM, wrote: >> >> Have build a 2 node apache cluster on VMWare virtual machines, which was >> running as expected. We had to migrate the machines to another computing >> center and after that the cluster communication

Re: [Pacemaker] corosync crash

2011-03-01 Thread Steven Dake
On 02/25/2011 12:38 AM, Andrew Beekhof wrote: > This is the same one you sent to the openais list right? > Andrew, This was root caused to a faulty network setup resulting in the failed to receive abort we are working on currently. One key detail missing from this thread is the implementation w

Re: [Pacemaker] Article on HA in the IBM cloud using Pacemaker and Heartbeat

2011-01-28 Thread Steven Dake
On 01/28/2011 08:02 AM, Alan Robertson wrote: > Hi, > > I recently co-authored an article on HA in the IBM cloud using Pacemaker > and Heartbeat. > > http://www.ibm.com/developerworks/cloud/library/cl-highavailabilitycloud/ > > The cool thing is that the IBM cloud supports virtual IPs. With mo

Re: [Pacemaker] pacemaker + corosync in the cloud

2010-12-15 Thread Steven Dake
On 12/14/2010 05:14 PM, ruslan usifov wrote: > Hi > > Is it possible to use pacemaker based on corosync in the cloud hosting > like amazon or soflayer? > > > yes with corosync 1.3.0 in udpu mode. The udpu mode avoids the use of multicast allowing operation in amazon's cloud. Regards -steve

Re: [Pacemaker] UDPU transport patch added, when will the RPMs be available

2010-11-22 Thread Steven Dake
On 11/22/2010 09:27 AM, Dan Frincu wrote: > Hi Steven, > > Steven Dake wrote: >> On 11/19/2010 11:42 AM, Andrew Beekhof wrote: >> >>> On Fri, Nov 19, 2010 at 11:38 AM, Dan Frincu wrote: >>> >>>> Hi, >>>> >>>>

Re: [Pacemaker] service corosync start failed

2010-11-22 Thread Steven Dake
On 11/22/2010 01:27 AM, jiaju liu wrote: > Hi all > If I use command like this > service corosync start > it shows > Starting Corosync Cluster Engine (corosync): [FAILED] > > and I do nothing just reboot my computer it will be OK what is the > reason

Re: [Pacemaker] UDPU transport patch added, when will the RPMs be available

2010-11-19 Thread Steven Dake
On 11/19/2010 11:42 AM, Andrew Beekhof wrote: > On Fri, Nov 19, 2010 at 11:38 AM, Dan Frincu wrote: >> Hi, >> >> The subject is pretty self-explanatory but I'll ask anyway, the patch for >> UDPU has been released, this adds the ability to set unicast peer addresses >> of nodes in a cluster, in net

Re: [Pacemaker] Corosync using unicast instead of multicast

2010-11-08 Thread Steven Dake
On 11/08/2010 05:50 AM, Dan Frincu wrote: Hi, Steven Dake wrote: On 11/05/2010 01:30 AM, Dan Frincu wrote: Hi, Alan Jones wrote: This question should be on the openais list, however, I happen to know the answer. To get up and running quickly you can configure broadcast with the version you

Re: [Pacemaker] Corosync using unicast instead of multicast

2010-11-05 Thread Steven Dake
ed as to what Steven Dake said on the openais mailing list about using broadcast "Broadcast and redundant ring probably don't work too well together.". I've also done some testing and saw that the broadcast address used is 255.255.255.255, regardless of what the bindnetaddr netw

Re: [Pacemaker] Fail over algorithm used by Pacemaker

2010-10-04 Thread Steven Dake
On 10/03/2010 07:01 AM, hudan studiawan wrote: Hi, I want to start to contribute to Pacemaker project. I start to read Documentation and try some basic configurations. I have a question: what kind of algorithm used by Pacemaker to choose another node when a node die in a cluster? Is there any ma

Re: [Pacemaker] Corosync node detection working too good

2010-10-04 Thread Steven Dake
On 10/04/2010 02:04 AM, Stephan-Frank Henry wrote: Hello all, still working on my nodes and although the last problem is not officially solved (I hard coded certain versions of the packages and that seems to be ok now) I have a different interesting feature I need to handle. I am setting up m

Re: [Pacemaker] Timeout after nodejoin

2010-09-22 Thread Steven Dake
On 09/22/2010 05:43 AM, Dan Frincu wrote: Hi all, I have the following packages: # rpm -qa | grep -i "(openais|cluster|heartbeat|pacemaker|resource)" openais-0.80.5-15.2 cluster-glue-1.0-12.2 pacemaker-1.0.5-4.2 cluster-glue-libs-1.0-12.2 resource-agents-1.0-31.5 pacemaker-libs-1.0.5-4.2 pacema

Re: [Pacemaker] Connection to our AIS plugin (9) failed: Library error

2010-09-22 Thread Steven Dake
On 09/22/2010 04:02 AM, Szymon Hersztek wrote: Message written on 2010-09-22, at 10:26, by Andrew Beekhof: 2010/9/21 Szymon Hersztek : Message written on 2010-09-21, at 09:08, by Andrew Beekhof: 2010/9/21 Szymon Hersztek : Message written on 2010-09-2

Re: [Pacemaker] MCP init script to 21/79?

2010-09-03 Thread Steven Dake
On 09/03/2010 09:56 AM, Vladislav Bogdanov wrote: 03.09.2010 19:34, Steven Dake wrote: Nope, they are in a natural order for both start and stop sequences. So lower number means 'do start or stop earlier'. grep '# chkconfig' /etc/init.d/* Ok, thanks. Changed to 10

[Pacemaker] MCP init script to 21/79?

2010-09-03 Thread Steven Dake
On 08/24/2010 11:06 PM, Andrew Beekhof wrote: On Wed, Aug 25, 2010 at 8:02 AM, Vladislav Bogdanov wrote: 25.08.2010 08:56, Andrew Beekhof wrote: On Wed, Aug 25, 2010 at 7:39 AM, Vladislav Bogdanov wrote: Hi all, pacemaker has # chkconfig - 90 90 in its MCP initscript. Shouldn't it be cor

Re: [Pacemaker] Corosync + Pacemaker New Install: Corosync Fails Without Error Message

2010-06-22 Thread Steven Dake
On 06/18/2010 09:42 AM, Eliot Gable wrote: I don’t have an “aisexec” section at all. I simply copied the sample file, which did not have one. I did figure out why it wasn’t logging. It was set to AMF mode and ‘mode’ was ‘disabled’ in the AMF configuration section. After changing that to ‘enabled

Re: [Pacemaker] use_logd or use_mgmtd kills corosync

2010-06-08 Thread Steven Dake
On 06/08/2010 11:20 PM, Andrew Beekhof wrote: On Wed, Jun 9, 2010 at 7:27 AM, Devin Reade wrote: I was following the instructions for a new installation of corosync and was wanting to make use of hb_gui so, following an installation via yum per the docs, built Pacemaker-Python-GUI-pacemaker-mgm

[Pacemaker] handle EINTR in sem_wait (pacemaker & corosync 1.2.2+ crash)

2010-06-01 Thread Steven Dake
Hello, I have found the cause of the crash that was occurring only on some deployments. The cause is that sem_wait is interrupted by a signal, and the wait operation is not retried (as is customary in POSIX). A patch is attached to fix this. A big thank you to Vladislav Bogdanov for running the test cas
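The retry-on-interrupt pattern described in this message can be sketched as follows. This is only an illustration in Python, not the attached patch (which is not shown here and would be C code around sem_wait); the helper name retry_on_eintr and the max_attempts cap are assumptions made for the example. Note that Python 3.5 and later already retry many system calls automatically, so the wrapper mainly serves to show the idea.

```python
import errno


def retry_on_eintr(call, *args, max_attempts=100):
    """Call call(*args), retrying while it fails with EINTR.

    Hypothetical helper for illustration: a blocking wait that is
    interrupted by a signal fails with EINTR and should simply be
    issued again instead of being treated as a real failure.
    """
    for _ in range(max_attempts):
        try:
            return call(*args)
        except OSError as exc:
            # Only an interrupted call is retried; any other error is real.
            if exc.errno != errno.EINTR:
                raise
    raise RuntimeError("still interrupted after %d attempts" % max_attempts)
```

The cap on attempts is a design choice for the sketch; a plain loop that retries for as long as the call keeps failing with EINTR is the more common form.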

Re: [Pacemaker] corosync/openais fails to start

2010-05-27 Thread Steven Dake
On 05/27/2010 10:20 AM, Gianluca Cecchi wrote: On Thu, May 27, 2010 at 5:50 PM, Steven Dake wrote: On 05/27/2010 08:40 AM, Diego Remolina wrote: Is there any workaround for this? Perhaps a slightly older version of the rpms? If so wh

Re: [Pacemaker] corosync/openais fails to start

2010-05-27 Thread Steven Dake
so I am stuck with a non-functioning cluster. Diego Steven Dake wrote: This is a known issue on some platforms, although the exact cause is unknown. I have tried RHEL 5.5 as well as CentOS 5.5 with clusterrepo rpms and been unable to reproduce. I'll keep looking. Regards -steve On 05/27/2010 06:07

Re: [Pacemaker] corosync/openais fails to start

2010-05-27 Thread Steven Dake
This is a known issue on some platforms, although the exact cause is unknown. I have tried RHEL 5.5 as well as CentOS 5.5 with clusterrepo rpms and been unable to reproduce. I'll keep looking. Regards -steve On 05/27/2010 06:07 AM, Diego Remolina wrote: Hi, I was running the old rpms from

Re: [Pacemaker] Redundant rings vs one bond based ring

2010-05-18 Thread Steven Dake
On Tue, 2010-05-18 at 23:16 +0200, Gianluca Cecchi wrote: > Hello, > based on pacemaker 1.0.8 + corosync 1.2.2, having two network > interfaces to dedicate to cluster communication, what is better/safer > at this moment: > bonding > > a) only one corosync ring on top of a bond interface > b) two

Re: [Pacemaker] Being fenced node is killed again and again even the connection is recovered!

2010-05-14 Thread Steven Dake
ifconfig eth0 down is not a valid test case. That will likely lead to bad things happening. I recommend using iptables to test the software. Also, Corosync 1.2.2 is out, which fixes bugs present in corosync 1.2.0. Regards -steve On Fri, 2010-05-14 at 18:02 +0800, Javen Wu wrote: > I forget mention the v

Re: [Pacemaker] Corosync crashes when cluster NIC disabled (Something strange happened)

2010-03-31 Thread Steven Dake
On Wed, 2010-03-31 at 16:07 -0400, Simpson, John R wrote: > Greetings all, > > I have a lab cluster using Pacemaker 1.0.8 and Corosync 1.2.0-1 > (see packages below) on CentOS 5.4 (32-bit) VM's running under > VMware ESXi 3.5. My location constraints and connectivity > tests were working well, so

Re: [Pacemaker] Dropping HeartBeat Stack?

2010-03-04 Thread Steven Dake
On Thu, 2010-03-04 at 21:29 +0100, Dennis J. wrote: > On 03/04/2010 03:37 PM, Andrew Beekhof wrote: > > On Thu, Mar 4, 2010 at 2:54 PM, Dennis J. wrote: > > > >> Pacemaker pulls in hearbeat and corosync as dependency. This is what > >> happens > >> on a freshly install centos 5.4 VM: > > > > Ah,

Re: [Pacemaker] High load issues

2010-02-04 Thread Steven Dake
On Thu, 2010-02-04 at 16:09 +0100, Dominik Klein wrote: > Hi people, > > I'll take the risk of annoying you, but I really think this should not > be forgotten. > > If there is high load on a node, the cluster seems to have problems > recovering from that. I'd expect the cluster to recognize that

[Pacemaker] thread safety problem with pacemaker and corosync integration

2010-02-03 Thread Steven Dake
For some time people have reported segfaults on startup when using pacemaker as a plugin to corosync related to tzset in the stack trace. I believe we had fixed this by removing the thread-unsafe usage of localtime and strftime calls in the code base of corosync in 1.2.0. Via further investigation

Re: [Pacemaker] mcast vs broadcast

2010-01-18 Thread Steven Dake
On Mon, 2010-01-18 at 11:25 -0500, Shravan Mishra wrote: > Hi all, > > > > Following is my corosync.conf. > > Even though broadcast is enabled I see "mcasted" messages like these > in corosync.log. > > Is it ok? even when the broadcast is on and not mcast. > Yes you are using broadcast and

Re: [Pacemaker] errors in corosync.log

2010-01-18 Thread Steven Dake
One possibility is you have a different cluster in your network on the same multicast address and port. Regards -steve On Sat, 2010-01-16 at 15:20 -0500, Shravan Mishra wrote: > Hi Guys, > > I'm running the following version of pacemaker and corosync > corosync=1.1.1-1-2 > pacemaker=1.0.9-2-1 >

Re: [Pacemaker] Pacemaker/OpenAIS Software for openSuSE 11.2

2010-01-12 Thread Steven Dake
> > d) If I would try to compile from source as described at > > http://www.clusterlabs.org/wiki/Install#First_Steps > > one step is to get openais. Why are all the relevant > > prebuild library packages called corosync? > > I don't understand the distinction between openais and corosync > read

Re: [Pacemaker] openais/corosync

2010-01-11 Thread Steven Dake
On Mon, 2010-01-11 at 21:00 +0100, Andreas Mock wrote: > > -Original Message- > > From: "Steven Dake" > > Sent: 11.01.10 20:13:39 > > To: pacema...@clusterlabs.org > > Subject: Re: [Pacemaker] openais/corosync > > > >

Re: [Pacemaker] openais/corosync

2010-01-11 Thread Steven Dake
On Mon, 2010-01-11 at 19:59 +0100, Andreas Mock wrote: > Hi all, > > I don't understand the distinction between > openais and corosync. The prebuild packages are > named after corosync while the documentation > always talk about openais. > See reasoning here: http://www.corosync.org/doku.php?id=

Re: [Pacemaker] coroync not able to exec services properly

2010-01-02 Thread Steven Dake
If you're using corosync 1.2.0, we enforced a constraint on consensus and token such that consensus must be at least 1.2 * token. Your consensus is 1/2 token, which will cause corosync to exit at start. Regards -steve On Mon, 2009-12-28 at 12:58 +0100, Dejan Muhamedagic wrote: > Hi, > > On Thu, Dec 24, 2009
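The constraint above is easy to check before starting corosync. A minimal sketch, assuming both values are given in milliseconds and that the comparison is "greater than or equal"; the function name is hypothetical and this is not corosync's own validation code.

```python
def consensus_is_valid(token_ms, consensus_ms):
    """Return True when consensus is at least 1.2 * token.

    Integer arithmetic (x10 and x12) avoids floating-point rounding.
    Whether corosync itself uses >= or > at the boundary is an
    assumption of this sketch.
    """
    return consensus_ms * 10 >= token_ms * 12


# A consensus of half the token, as in the report above, is rejected.
print(consensus_is_valid(1000, 500))
# A consensus of 1.2 * token satisfies the constraint as sketched here.
print(consensus_is_valid(1000, 1200))
```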

Re: [Pacemaker] corosync init script broken

2010-01-02 Thread Steven Dake
Hopefully all of these init script problems have been fixed in 1.2.0 by Fabio and Andrew and should be in a repo available for you soon. Regards -steve On Mon, 2009-12-28 at 13:22 +0100, Dominik Klein wrote: > Hi cluster people > > been a while, couldn't really follow things. Today I was tasked

Re: [Pacemaker] parse error in config: The consensus timeout parameter (1200 ms) must be atleast 1.2 * token (1200 ms)

2010-01-02 Thread Steven Dake
On Mon, 2009-12-28 at 19:05 -0500, Daniel Qian wrote: > I am using Corosync 1.2.0 that comes with Fedora 12 and have this error when > trying to start corosync service. If I set it to anything above 1200 the > error goes away. Is this a bug or something intended for? > > Thanks, > Daniel > We

Re: [Pacemaker] Fedora 12 repository

2009-12-20 Thread Steven Dake
Pacemaker is integrated directly in the fedora repo instead of externally. You can grab it using yum install pacemaker. Regards -steve On Sun, 2009-12-20 at 11:46 -0500, E-Blokos wrote: > Hi, > > is there any yum repository for Fedora 12 ? > I checked http://download.opensuse.org/repositories/s

Re: [Pacemaker] Openais: Corosync Executive couldn't openconfiguration component 'openaisserviceenable'

2009-12-03 Thread Steven Dake
Please note, we are aware of this bug upstream. A fix is in the upstream repo, and we will be making a new release which resolves the init script problem, hopefully this week. What Frank suggests is essentially what we have done in the repo. Regards -steve On Thu, 2009-12-03 at 12:31 -0500, Frank D

Re: [Pacemaker] Node crash when 'ifdown eth0'

2009-11-30 Thread Steven Dake
On Mon, 2009-11-30 at 17:05 -0700, hj lee wrote: > > > On Fri, Nov 27, 2009 at 3:05 PM, Steven Dake wrote: > On Fri, 2009-11-27 at 11:32 -0200, Mark Horton wrote: > > I'm using pacemaker 1.0.6 and corosync 1.1.2 (not using > openais) with &g

Re: [Pacemaker] Node crash when 'ifdown eth0'

2009-11-27 Thread Steven Dake
On Fri, 2009-11-27 at 11:32 -0200, Mark Horton wrote: > I'm using pacemaker 1.0.6 and corosync 1.1.2 (not using openais) with > centos 5.4. The packages are from here: > http://www.clusterlabs.org/rpm/epel-5/ > > Mark > > On Fri, Nov 27, 2009 at 9:01 AM, Oscar Remí­rez de Ganuza Satrústegui > w

Re: [Pacemaker] pacemaker-1.0.6 + corosync-1.1.2 crashing - SOLVED

2009-11-21 Thread Steven Dake
On Sat, 2009-11-21 at 20:00 +0100, Nikola Ciprich wrote: > Hi Guys, > Finally I've found where the problem was! On my testing machines, > the system was lacking separate /dev/shm tmpfs mount. While the /dev > directory is also mounted as tmpfs, so it seemingly doesn't make any > difference, there I

Re: [Pacemaker] **** SPAM **** Re: pacemaker-1.0.6 + corosync 1.1.2 crashing

2009-11-20 Thread Steven Dake
Nik, Any chance you have a backtrace of the core files? That might be helpful in pinpointing the issue. To do this, run "gdb binaryname corefilename" and then enter "bt" at the gdb prompt. Regards -steve On Thu, 2009-11-19 at 17:50 +0100, Nikola Ciprich wrote: > Hi Andrew, > sorry to bother again, do You have some idea what el

Re: [Pacemaker] Resource capacity limit

2009-11-12 Thread Steven Dake
On Thu, 2009-11-12 at 14:53 +0100, Andrew Beekhof wrote: > On Wed, Nov 11, 2009 at 1:36 PM, Lars Marowsky-Bree wrote: > > On 2009-11-05T14:45:36, Andrew Beekhof wrote: > > > >> Lastly, I would really like to defer this for 1.2 > >> I know I've bent the rules a bit for 1.0 in the past, but its rea

Re: [Pacemaker] pacemaker-1.0.6 + corosync 1.1.2 crashing

2009-11-10 Thread Steven Dake
Nikola, yet another possibility is your box doesn't have any/enough shared memory available. Usually this is in the directory /dev/shm. Unfortunately bad things happen and error handling around this condition needs some work. It's hard to tell because the signal delivered to the application on fa

Re: [Pacemaker] pacemaker-1.0.6 + corosync 1.1.2 crashing

2009-11-10 Thread Steven Dake
One possibility is that SELinux is enabled and your SELinux policies are outdated. Another possibility is you have improper coroipcc libraries (duplicates) installed on your system. Check your installed lib dir for coroipcc.so.4 and 4.0.0 and coroipcc.so. They should all link to the same file. Anot
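The library check suggested above (all names should link to the same file) can be sketched like this. The function name is hypothetical, and the library directory and exact file names on a given system are assumptions the caller has to supply; the message only names coroipcc.so.4, 4.0.0 and coroipcc.so.

```python
import os


def same_target(lib_dir, names):
    """Resolve each name under lib_dir and report whether all agree.

    Returns (ok, targets): ok is True when every name resolves, after
    following symlinks, to one and the same file; targets maps each
    name to its resolved path so a stray duplicate can be spotted.
    """
    targets = {
        name: os.path.realpath(os.path.join(lib_dir, name))
        for name in names
    }
    return len(set(targets.values())) == 1, targets
```

If ok comes back False, the targets mapping shows which name points at a different copy, which would be the duplicate to remove.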

Re: [Pacemaker] [ANNOUNCEMENT] Debian Packages for Pacemaker 1.0.6, completely revamped

2009-11-04 Thread Steven Dake
On Thu, 2009-11-05 at 00:06 +0100, Colin wrote: > On Wed, Nov 4, 2009 at 5:47 PM, Andrew Beekhof wrote: > > > > Hopelessly out of date? > > Corosync has been supported for all of 3 days now. > > Sorry, it seems that I jumped to a wrong conclusion (namely that with > Corosync being a part of OpenA

Re: [Pacemaker] [ANNOUNCEMENT] Debian Packages for Pacemaker 1.0.6, completely revamped

2009-11-03 Thread Steven Dake
On Wed, 2009-11-04 at 09:35 +0800, Romain CHANU wrote: > Hi Martin, > > Could you tell us what's the rationale to remove openais and include > corosync? > > Would it mean that people should use corosync from now on for any HA > development? > > Best Regards, > > Romain Chanu > Just a short no

Re: [Pacemaker] [ANNOUNCEMENT] Debian Packages for Pacemaker 1.0.6, completely revamped

2009-11-03 Thread Steven Dake
On Wed, 2009-11-04 at 11:41 +1000, Luke Bigum wrote: > The OpenAIS project has split into Corosync and OpenAIS. Someone else > might be able to explain it better, but Corosync now contains the core > clustering components the openais package used to have (aisexec, etc), > while the OpenAIS project

Re: [Pacemaker] How to configure the openais.conf

2009-11-03 Thread Steven Dake
We have found through testing the practical limit on cluster size with the protocol used in openais is ~30 nodes currently. The default parameters should work well for these sizes. Regards -steve On Wed, 2009-11-04 at 08:57 +0800, lepace wrote: > > Hi,all > I want to configure a HA cluster whic

Re: [Pacemaker] Pacemaker cluster: OpenAis communication channels

2009-10-22 Thread Steven Dake
Regards -steve > > On 10/22/2009 06:14 AM, Steven Dake wrote: > > You can run with one NIC (and switch) but then your NIC and switch > > become a SPOF (single point of failure). Vehicles have a spare tire for > > a reason :) If a NIC fails it may be ok to switch a ser

Re: [Pacemaker] Pacemaker cluster: OpenAis communication channels

2009-10-21 Thread Steven Dake
You can run with one NIC (and switch) but then your NIC and switch become a SPOF (single point of failure). Vehicles have a spare tire for a reason :) If a NIC fails it may be ok to switch a service to a different node. If a switch fails, the entire cluster becomes disabled until the switch retu

Re: [Pacemaker] pacemaker unable to start

2009-10-21 Thread Steven Dake
In place of openaisparser I also tried corosyncparse and > corosyncparser but to no avail. > > -sincerely > Shravan > > On Wed, Oct 21, 2009 at 11:49 AM, Steven Dake wrote: > > I recommend using corosync 1.1.1 - several bug fixes one critical for > > proper pacemak

Re: [Pacemaker] pacemaker unable to start

2009-10-21 Thread Steven Dake
I recommend using corosync 1.1.1 - it has several bug fixes, one of them critical for proper pacemaker operation. It won't fix this particular problem however. Corosync loads pacemaker by searching for a pacemaker lcrso file. These files are installed by default in /usr/libexec/lcrso but may be in a different loca
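
To see which lcrso files are actually present in the directory named above, a listing like the following may help. It is a sketch only: the default path is the one given in the message, the ".lcrso" file suffix is an assumption, and find_lcrso is a made-up helper name.

```python
import glob
import os


def find_lcrso(directory="/usr/libexec/lcrso"):
    """Return the sorted names of files ending in .lcrso inside directory."""
    pattern = os.path.join(directory, "*.lcrso")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))
```

An empty list for the default directory would suggest the files were installed somewhere else, in which case the function can be pointed at that directory instead.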

Re: [Pacemaker] Why are fatal warnings enabled by default?

2009-10-21 Thread Steven Dake
IMO enabling fatal warnings by default when having a bunch of dependent header files, then enabling most warnings in the warn list is problematic. Header files are consistently broken upstream wrt const correctness, typing, etc. With fatal warnings enabled it's really difficult to get a clean comp

Re: [Pacemaker] corosync doesn't stop all services

2009-10-21 Thread Steven Dake
We had to change both pacemaker and corosync for this problem. I suspect you don't have the updated pacemaker. Regards -steve On Wed, 2009-10-21 at 15:11 +0200, Michael Schwartzkopff wrote: > Hi, > > perhaps this is the wrong list but anyway: > > I have corosync-1.1.1 and pacemaker-1.0.5 on de

Re: [Pacemaker] Failed in restart of Corosync.

2009-10-18 Thread Steven Dake
This bug is reported and we are working on a solution. Regards -steve On Mon, 2009-10-19 at 11:05 +0900, renayama19661...@ybb.ne.jp wrote: > Hi, > > I understand that a combination is not official in Corosync and Pacemaker. > However, I contributed it because I thought that it was important that

Re: [Pacemaker] fedora11: openais fails to start

2009-10-09 Thread Steven Dake
You could try the f12 rpms - we have tested these. We are in the process of making these available in f11/f10, but there is a bit of a lag because of fedora process. The f12 rpms are at koji.fedoraproject.org. From looking at your logs, it appears iptables is enabled and not configured properly

Re: [Pacemaker] Openais log and cpu occupation

2009-10-04 Thread Steven Dake
The openais logging service will use the syslog LOG_NOTICE level. To filter in syslog, syslog must be appropriately configured. The to_file directive is really meant for debugging and not deployment. There is no way to set filter levels on to_file directives (except to filter debug). I recommend

Re: [Pacemaker] A problem to fail in a stop of Pacemaker.

2009-09-29 Thread Steven Dake
On Wed, 2009-09-30 at 09:51 +0900, renayama19661...@ybb.ne.jp wrote: > Hi Remi, > > > It appears that this is a similar problem to the one that I reported, > > yes. It appears to not be a bug in Corosync, but rather one in > > Pacemaker. This bug has been filed in Red Hat Bugzilla, see it at:

Re: [Pacemaker] Can OpenAIS's components work with Pacemaker? What's the relationship between Pacemaker and SA forum?

2009-09-27 Thread Steven Dake
On Sun, 2009-09-27 at 11:11 +0800, xin.li...@cs2c.com.cn wrote: > HI,everyone ;-) > > I'm pretty new in the HA world, and I'm from China. > > When I'd download the code of OpenAIS(Whitetank), I found some > components in it. Such as AMF,CKPT,CLM,MSG. > > Can these components work simultane
