Re: [Pacemaker] The main road of the cluster stack evolution

2013-06-10 Thread Lars Marowsky-Bree
On 2013-06-10T19:26:20, Халезов Иван wrote: > 1) The RedHat company is planning to drop corosync support and wants to > switch to CMAN. ( > http://www.gossamer-threads.com/lists/linuxha/pacemaker/84662 ) To the best of my understanding, this is not correct. Red Hat will continue to support coros

Re: [Pacemaker] What kind of cluster stack at opensuse-repositories

2013-06-10 Thread Lars Marowsky-Bree
On 2013-06-10T19:25:38, Andreas Mock wrote: > Am I right that these are packages for a RHEL 6.x system but in a > corosync-pacemaker-fashion like SuSE has used it for years now? Yes. Those packages are scheduled for an update to latest upstream versions as soon as we wrap up our current project, but

Re: [Pacemaker] pacemaker monitoring user permision denied

2013-06-10 Thread Lars Marowsky-Bree
On 2013-06-10T18:22:37, Wolfgang Routschka wrote: > After reading Documentation (http://clusterlabs.org/doc/acls.html) I found > "All user accounts must be in the haclient group." but all users in haclient > group have full access "Note that the root and hacluster users will always > have ful

Re: [Pacemaker] What kind of cluster stack at opensuse-repositories

2013-06-10 Thread Lars Marowsky-Bree
On 2013-06-10T22:53:54, Andreas Mock wrote: > Hi Lars, > > thank you for answering. Could you tell me whether the stack > is like Option1 or Option3 of this article > http://blog.clusterlabs.org/blog/2012/pacemaker-and-cluster-filesystems/ > > If it's Option1 when do you think SuSE switches to

Re: [Pacemaker] how do i remove a resource correct ?

2013-06-11 Thread Lars Marowsky-Bree
On 2013-06-11T13:39:42, andreas graeper wrote: > # removing > cibadmin --delete --xml-text '' > cibadmin --delete --xml-text '' Why not crm configure delete? > i.e. ocf:heartbeat:symlink > (after installing 'failed actions: .. not installed' on passive node) > when i remove the symlink-resourc

Re: [Pacemaker] how do i remove a resource correct ?

2013-06-11 Thread Lars Marowsky-Bree
On 2013-06-11T14:21:52, andreas graeper wrote: > hi and thanks. > > this is what i found : > crm_resource --name p_xxx --set-parameter target-role --meta > --parameter-value Stopped You can also use "crm resource stop p_xxx" here. > crm configure delete o_xxx## order > crm configure delete
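The order discussed in this thread (stop the resource, delete the constraints that reference it, then delete the resource itself) can be sketched as a dry-run script. `p_xxx` and `o_xxx` are the placeholder names from the message; `remove_resource` is a hypothetical helper, not part of crmsh, and it only prints the commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch: prints the crmsh commands instead of executing them.
# remove_resource is a hypothetical helper name used for illustration only.
remove_resource() {
    rsc="$1"; shift
    # 1. Stop the resource first so that nothing is left running.
    echo "crm resource stop $rsc"
    # 2. Delete every constraint that still references the resource.
    for constraint in "$@"; do
        echo "crm configure delete $constraint"
    done
    # 3. Finally delete the resource definition itself.
    echo "crm configure delete $rsc"
}

remove_resource p_xxx o_xxx
```

Pipe the output to `sh` on a cluster node only after checking that the listed constraints are the right ones.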

Re: [Pacemaker] Help with development

2013-06-12 Thread Lars Marowsky-Bree
On 2013-06-12T09:09:31, Michael Schwartzkopff wrote: > Especially I would like to find out how many nodes are in a cluster and how > many nodes are online. Perhaps somebody could post a code snippet here. I think you might find the crm_mon.c source code useful for this. Regards, Lars --

Re: [Pacemaker] Announce: Making Resource Utilization Dynamic

2013-06-12 Thread Lars Marowsky-Bree
On 2013-06-05T20:44:56, Michael Schwartzkopff wrote: Hi Michael, yes, the idea to make utilization more dynamic was something Andrew and I looked into ages ago. Especially, there's still the open issue that it somewhat sucks that one has to configure them at all. It'd be nice if monitor_0 would

Re: [Pacemaker] Help with development

2013-06-12 Thread Lars Marowsky-Bree
On 2013-06-12T12:03:46, Michael Schwartzkopff wrote: > > I think you might find the crm_mon.c source code useful for this. > Thanks. That helps a lot. It's where I stole the bits I needed for SBD from ;-) Regards, Lars -- Architect Storage/HA SUSE LINUX Products GmbH, GF: Jeff Hawn, Jenn

Re: [Pacemaker] Why Pacemaker 1.1.8 starts Apache service?

2013-06-12 Thread Lars Marowsky-Bree
On 2013-06-12T17:08:22, Michael Furman wrote: > pcs > resource create WebSite ocf:heartbeat:apache params > configfile=/etc/httpd/conf/httpd.conf op monitor interval=20s > > Then, to > test the failover we stop Apache using the following command: > > service httpd stop > > > We see that Apach

Re: [Pacemaker] Announce: Making Resource Utilization Dynamic

2013-06-12 Thread Lars Marowsky-Bree
On 2013-06-12T11:31:43, Michael Schwartzkopff wrote: > > Pacemaker is not necessarily the best tool to implement quick reaction > > to changing load, though. The utilization feature is concerned with > > *correctness* first - namely, don't overcommit resources severely, e.g., > > the case of Xen/

Re: [Pacemaker] Two resource nodes + one quorum node

2013-06-12 Thread Lars Marowsky-Bree
On 2013-06-12T11:57:05, Digimer wrote: > I build exclusively two-node clusters, and the biggest drawback is the > possibility of a "fence loop". That is, without quorum and with a network > error, a node can come up on its own, fail to contact its peer and fence > it. When the fenced node boot
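To make the two-node problem concrete, here is a small arithmetic sketch. It assumes a plain majority rule with one vote per node and no special two-node options; `votes_needed` and `has_quorum` are hypothetical helper names.

```shell
#!/bin/sh
# Majority quorum sketch, assuming one vote per node.
votes_needed() {          # votes_needed <total_nodes>
    echo $(( $1 / 2 + 1 ))
}

has_quorum() {            # has_quorum <total_nodes> <nodes_visible>
    [ "$2" -ge "$(votes_needed "$1")" ]
}

for total in 2 3; do
    survivors=$(( total - 1 ))   # one node has failed or is unreachable
    if has_quorum "$total" "$survivors"; then
        state="quorate"
    else
        state="not quorate"
    fi
    echo "$total nodes, $survivors left: $state"
done
```

With two nodes the lone survivor never reaches a majority, which is why two-node setups either ignore quorum (and risk the fence loop described above) or add a third, quorum-only node.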

Re: [Pacemaker] Two resource nodes + one quorum node

2013-06-13 Thread Lars Marowsky-Bree
On 2013-06-13T07:45:09, Andrew Beekhof wrote: > It's certainly possible to build a decent 2-node cluster, but there are > several non-obvious steps that are required - preventing fencing loops being > one. Given that 2-node clusters are probably still the 90%+ majority, I wonder if we shouldn't

Re: [Pacemaker] pacemaker monitoring user permision denied

2013-06-13 Thread Lars Marowsky-Bree
On 2013-06-13T07:48:46, Andrew Beekhof wrote: > > In my opinion the user doesn't have any rights although the user is in > > haclient group and having no role/user configuration. Is it right? > No. Users in the haclient group have full access. That's what it is for. Not with ACLs enabled: http

Re: [Pacemaker] Two resource nodes + one quorum node

2013-06-13 Thread Lars Marowsky-Bree
On 2013-06-13T16:24:25, Andrey Groshev wrote: > > It doesn't have to be able to run services, it only needs to contribute to > > quorum. > That is, there is no way to switch the node into standby mode from a > pacemaker script/config? Sure there is: # crm node standby nodename The node will
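A dry-run sketch of putting a quorum-only node into standby and bringing it back later. `crm node standby` is the command from the reply; `crm node online` is its crmsh counterpart. The node name `quorum-node` and the `run` helper are placeholders introduced here for illustration.

```shell
#!/bin/sh
# Prints the commands by default; set DRY_RUN=0 on a cluster node to execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

node="quorum-node"             # placeholder node name

run crm node standby "$node"   # node stays a cluster member but runs no resources
run crm node online "$node"    # undo: allow resources on this node again
```

A node in standby still counts towards quorum, which matches the requirement quoted above.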

Re: [Pacemaker] Full API description for Fence Agent

2013-06-13 Thread Lars Marowsky-Bree
On 2013-06-11T09:34:10, Digimer wrote: > If you have any trouble, please don't hesitate to ask here and we will do > our best to help. I wonder what the perspective is on standardizing on one fencing API. Both the "old" cluster-glue one and this one are remarkably similar (except for passing i

Re: [Pacemaker] Two resource nodes + one quorum node

2013-06-13 Thread Lars Marowsky-Bree
On 2013-06-13T22:12:23, Andrey Groshev wrote: > >>>  It doesn't have to be able to run services, it only needs to > >>> contribute to quorum. > >>  That is, there is no way to switch the node into standby mode from > >> a pacemaker script/config? > > Sure there is: > > > > # crm node standby node

Re: [Pacemaker] Full API description for Fence Agent

2013-06-14 Thread Lars Marowsky-Bree
On 2013-06-14T10:50:21, Andrew Beekhof wrote: > If I had my way, they'd > - have env variables the same as OCF > - be executable from the command line like the RH ones (I'm not sure I get the second point. Even the "old" cluster-glue scripts were executable from the commandline?) > - avoid the

Re: [Pacemaker] Two resource nodes + one quorum node

2013-06-14 Thread Lars Marowsky-Bree
On 2013-06-14T08:47:29, Andrey Groshev wrote: > ~ 200 VM hosts. > Suppose if a cloud die, I'll have half a day to work in the role of RC script. If you have 200 VM hosts, one would hope you don't need to configure 100 2-node pairs; that's a perfect scenario where you should easily be able to con

Re: [Pacemaker] Full API description for Fence Agent

2013-06-15 Thread Lars Marowsky-Bree
On 2013-06-15T08:26:33, Andrew Beekhof wrote: > > (I'm not sure I get the second point. Even the "old" cluster-glue > > scripts were executable from the commandline?) > The RH ones are executable directly, not relying on separate command line > utils. Well, you can call the cluster-glue ones di

Re: [Pacemaker] Where can I find descriptions of all Pacemaker 1.1.8 resource agents?

2013-06-16 Thread Lars Marowsky-Bree
On 2013-06-16T13:30:03, Digimer wrote: > All resource and fence agents are supposed to print out their usage details > (in an XML validation format) when you pass ' meta-data' or > ' -o metadata'. > > It's not ideal, but it is useful. > > Perhaps I should start documenting these... https://www
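A minimal sketch of querying that metadata from a shell. The agent path and fence agent name below are assumptions (they depend on the installation), and `show_metadata` is a hypothetical helper that skips agents which are not present.

```shell
#!/bin/sh
# Paths and agent names are assumptions; adjust them to the local installation.
ra="/usr/lib/ocf/resource.d/heartbeat/apache"
fa="fence_apc_snmp"

# OCF resource agents generally expect OCF_ROOT to be set.
export OCF_ROOT=/usr/lib/ocf

show_metadata() {   # show_metadata <label> <command> [args...]
    label="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "$label: $1 not found, skipping"
    fi
}

# Resource agents take the action as their first argument.
show_metadata "resource agent" "$ra" meta-data
# Fence agents take the action via -o.
show_metadata "fence agent" "$fa" -o metadata
```

On a machine without the agents the script only reports that they are missing, so it is safe to try anywhere.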

Re: [Pacemaker] Two resource nodes + one quorum node

2013-06-19 Thread Lars Marowsky-Bree
On 2013-06-19T10:13:19, Andrey Groshev wrote: > I started experimenting. > Received the first incomprehensible situation: > There are three nodes. One of the quorum exists only, i.e. without a specific > pacemaker. What do you mean by "without a specific pacemaker"? What does your cluster conf

Re: [Pacemaker] Errors compiling PM 1.1.10 RC3

2013-06-19 Thread Lars Marowsky-Bree
On 2013-06-18T13:47:57, Nikita Michalko wrote: > I tried to build/compile the latest version of pacemaker from sources > (http://blog.clusterlabs.org/blog/2013/release-candidate-1-dot-1-10-rc3/) > on SLES11/SP2 (kernel 3.0.58-0.6.2-default) with libqb-0.14.4 as follows: > ./configure --prefix=/usr --

Re: [Pacemaker] crm ptest does not show graphics

2013-06-20 Thread Lars Marowsky-Bree
On 2013-06-20T14:53:07, Florian Crouzat wrote: > >Am Donnerstag, 20. Juni 2013, 13:43:10 schrieb Florian Crouzat: > > >My original question was about crm. > > It was about an old tool that doesn't work in your environment. The crm "ptest" command calls crm_simulate in the background. The crm

Re: [Pacemaker] known problem with corosync 1.4.1 on centos64 ?

2013-06-21 Thread Lars Marowsky-Bree
On 2013-06-21T10:56:29, andreas graeper wrote: > hi, > when only i remove or add resources, corosync starts to eat up all cpu. > drbd 8.4.1 (build from source) > corosync 1.4.1 yes, corosync 1.4.1 had one such error, I recall. If you're building from source, why are you sticking to such an old v

Re: [Pacemaker] re-manage a resource

2013-06-21 Thread Lars Marowsky-Bree
On 2013-06-21T12:56:17, andreas graeper wrote: > maybe i asked this before, but i could not find message + answer. > > when a resource gets unmanaged and the problem has gone, i want the > resource to get managed by pacemaker again. what do i need to do ? > > situation: only one node left (other ill) >

Re: [Pacemaker] Two resource nodes + one quorum node

2013-06-22 Thread Lars Marowsky-Bree
On 2013-06-21T14:30:42, Andrey Groshev wrote: > I was wrong - the resource starts in 15 minutes. > I found a matching entry in the log at the same time: > grep '11:59.*900' /var/log/cluster/corosync.log > Jun 21 11:59:50 [23616] dev-cluster2-node4 crmd: info: crm_timer_popped: > PEngi

Re: [Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-25 Thread Lars Marowsky-Bree
On 2013-06-25T10:16:58, Andrey Groshev wrote: > Ok, I recently became engaged in the PCMK, so for me it is a surprise. > The more so in all the major linux distributions version 1.1.x. Pacemaker has very strong regression and system tests, and barring accidents, it is usually very safe to always

Re: [Pacemaker] ERROR: Wrong stack o2cb

2013-06-26 Thread Lars Marowsky-Bree
On 2013-06-25T17:08:36, Denis Witt wrote: > My cluster.conf (I added this later to be able to run tunefs.ocfs2 > --update-cluster-stack): This indicates you still have a 'wrong stack' on disk. You need to run mkfs.ocfs2/tunefs.ocfs2 while the o2cb cluster resource is running, or to set it to "pcmk

Re: [Pacemaker] GPU Processing

2013-06-26 Thread Lars Marowsky-Bree
On 2013-06-25T12:03:22, Colin Blair wrote: > Andrew, > > Does Pacemaker support GPU processes? Pacemaker is not very CPU intensive; what would it use a GPU for? Regards, Lars -- Architect Storage/HA SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (

Re: [Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-26 Thread Lars Marowsky-Bree
On 2013-06-25T20:28:29, Andrew Beekhof wrote: > > Perhaps a numbering scheme like the Linux kernel would fit better than a > > stable/unstable branch distinction. Changes that deserve the "unstable" > > term are really really rare (and I'm sure we've all learned from them), > > so it may be bette

Re: [Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-26 Thread Lars Marowsky-Bree
On 2013-06-26T21:31:14, Andrew Beekhof wrote: > > Distributions can take care of them when they integrate them; basically > > they'll trickle through until the whole stack the distributions ship > > builds again. > If we let 2.0.x be anything like 1.1.x, I suspect this would be rather > difficul

Re: [Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-27 Thread Lars Marowsky-Bree
On 2013-06-27T14:28:19, Andrew Beekhof wrote: > I wouldn't say the 6 months between 1.1.7 and 1.1.8 was a particularly > aggressive release cycle. For the amount of changes in there, I think yes. And the intrusive ones didn't show up all at the beginning of that cycle, either. That just made in

Re: [Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-27 Thread Lars Marowsky-Bree
On 2013-06-27T20:50:34, Andrew Beekhof wrote: > There was one :-) > I merged the best bits of three parallel CPG code paths. > The things that prompted the extra bits in one also applied to the others. Ah, that wasn't so obvious to me when I tried making sense of the commit. ;-) But that's clear

Re: [Pacemaker] Problem with dual-PDU fencing node with redundant PSUs

2013-06-27 Thread Lars Marowsky-Bree
On 2013-06-27T16:52:02, Dejan Muhamedagic wrote: > > I don't want the cluster stack to start on boot, so I disable > > pacemaker/corosync. However, I do want the node to power back on so that > > I can log into it when the alarms go off. Yes, I could log into the good > > node, manually unfence/b

Re: [Pacemaker] Problem with dual-PDU fencing node with redundant PSUs

2013-06-27 Thread Lars Marowsky-Bree
On 2013-06-27T10:56:40, Digimer wrote: > However, this feels like a really bad solution. It's not uncommon to > have two separate power rails feeding either side of the node's PSUs. > Particularly in HA environments. True. But gating them through the same power switch is *not* a SPoF from the cl

Re: [Pacemaker] Problem configuring a simple ping resource

2013-06-27 Thread Lars Marowsky-Bree
On 2013-06-27T17:01:37, Jacobo García wrote: > Enable asymmetric clustering > crm_attribute --attr-name symmetric-cluster --attr-value false > > Then I configure the resource: > crm configure primitive ping ocf:pacemaker:ping params > host_list="10.34.151.73" op monitor interval=15s timeout=5s >

Re: [Pacemaker] Problem with dual-PDU fencing node with redundant PSUs

2013-06-27 Thread Lars Marowsky-Bree
On 2013-06-27T11:32:36, Digimer wrote: > >> time and I expect many users will run into this problem as they try to > >> migrate to RHEL 7. I see no reason why this can't be properly handled in > >> pacemaker directly. > > Yes, why not, choice is a good thing ;-) > If an established configuration

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-06-28 Thread Lars Marowsky-Bree
On 2013-06-27T12:53:01, Digimer wrote: > primitive fence_n01_psu1_off stonith:fence_apc_snmp \ > params ipaddr="an-p01" pcmk_reboot_action="off" port="1" > pcmk_host_list="an-c03n01.alteeve.ca" > primitive fence_n01_psu1_on stonith:fence_apc_snmp \ > params ipaddr="an-p01" pcmk_re

Re: [Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-28 Thread Lars Marowsky-Bree
On 2013-06-28T11:11:00, Andrew Beekhof wrote: > >> Maybe you're right, maybe I should stop fighting it and go with the > >> firefox approach. > >> That certainly seemed to piss a lot of people off though... > > If there's one message I've learned in 13 years of work on Linux HA, > > then it is th

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-06-28 Thread Lars Marowsky-Bree
On 2013-06-28T20:21:22, Andrew Beekhof wrote: > > It looks correct, but not quite sane. ;-) That seems not to be > > something you can address, though. I'm thinking that fencing topology > > should be smart enough, if multiple fencing devices are specified, to > > know how to expand them to "f

Re: [Pacemaker] Release model

2013-06-28 Thread Lars Marowsky-Bree
On 2013-06-28T18:41:35, Andrew Beekhof wrote: > > There's an exception: dropping commonly used external interfaces (say, > > "ptest") needs to be announced a few releases in advance before enacted > > upstream. (And if Enterprise distributions want to keep something, they > > have time to prepare

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-06-28 Thread Lars Marowsky-Bree
On 2013-06-28T21:01:55, Andrew Beekhof wrote: > > I'd agree, but it's not multiple ports on the same device, it's multiple > > ports on *different* devices. I don't think a single fencing agent can > > handle that - it really looks like something only the higher level can > > cope with. > True, i

Re: [Pacemaker] Release model

2013-06-28 Thread Lars Marowsky-Bree
On 2013-06-28T14:49:06, Dejan Muhamedagic wrote: > > If cluster-glue's LRM had had such a suite, it'd certainly have > > helped tons.) > It did have a regression suite. Yes, well, but it didn't test for LRM_MAX_CHILDREN or the secret support, for example. So it didn't really document the interfa

Re: [Pacemaker] Release model

2013-06-28 Thread Lars Marowsky-Bree
On 2013-06-28T22:04:48, Andrew Beekhof wrote: > I think he did actually. Well, yes, but the hg history or reading the existing code would probably have been quite helpful. I'll take "not well documented", but it's hard to say the rewrite was handled very well. But I don't want to get drawn into

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-06-28 Thread Lars Marowsky-Bree
On 2013-06-28T10:20:56, Digimer wrote: > >> primitive fence_n01_psu1_off stonith:fence_apc_snmp \ > >> params ipaddr="an-p01" pcmk_reboot_action="off" port="1" > >> pcmk_host_list="an-c03n01.alteeve.ca" > >> primitive fence_n01_psu1_on stonith:fence_apc_snmp \ > >> params ipaddr="

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-06-28 Thread Lars Marowsky-Bree
On 2013-06-28T10:27:54, Digimer wrote: > > Basically, unless we can do this better, having multiple devices per > > fence topology level needs to be considered broken and might be better > > removed. > NO NO NO NO. > > Please do not remove this. I can not use pacemaker unless I can keep the > po

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-06-28 Thread Lars Marowsky-Bree
On 2013-06-28T11:29:35, Digimer wrote: > In rhcs, you can control the fence device's action using 'action="..."' > attribute in the element. So for us rhcs migrants, we > expect that action="..." in the fence primitive will have the same > effect. As of now, as you know, this is ignored in favou

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-06-28 Thread Lars Marowsky-Bree
On 2013-06-28T11:20:32, Digimer wrote: > Yes, a failed "on" action would then fail the method. This is > sub-optimal as FenceAgentAPI says that only the "off" portion of > "reboot" needs to succeed. However, I don't consider this a show stopper > because "on" action of PDUs simply means "re-energ

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-06-29 Thread Lars Marowsky-Bree
On 2013-06-29T09:22:20, Andrew Beekhof wrote: > > This doesn't help people who have dual power rails/PDUs for power > > redundancy. > I'm yet to be convinced that having two PDUs is helping those people in the > first place. > If it were actually useful, I suspect more than two/three people woul

[Pacemaker] Disconnected from CIB?

2013-06-30 Thread Lars Marowsky-Bree
Hi, sbd connects to the CIB and watches updates come in to see if pacemaker considers the node healthy still, and if the cluster partition is quorate according to the CIB. That's all working fine. But I've noticed that during start-up of a regular cluster, sbd would get disconnected briefly: Jun

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Lars Marowsky-Bree
On 2013-07-01T21:37:38, Andrew Beekhof wrote: > > And apparently, this is one of the scenarios for which fence topology > > was created and supports multiple devices per level. I'd venture the > > opinion that the current implementation of "multiple devices per level" > > is broken (since it requ

Re: [Pacemaker] Disconnected from CIB?

2013-07-01 Thread Lars Marowsky-Bree
On 2013-07-01T21:09:18, Andrew Beekhof wrote: > > Anything I should worry about? > > I would say so, because I can't think of a valid reason for it to happen. > You'll probably want to use the blackbox to diagnose this. > > Reproducible or random? Reproducible on the non-DC node during full st

Re: [Pacemaker] Disconnected from CIB?

2013-07-01 Thread Lars Marowsky-Bree
On 2013-07-01T14:15:01, Lars Marowsky-Bree wrote: > Reproducible on the non-DC node during full start-up of a cluster, yes. And it turns out to be a CIB problem after all. Or I'm doing something else wrong: I'm doing, basically straight from crm_mon.c: xmlNode *cib_co

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Lars Marowsky-Bree
On 2013-07-01T11:53:29, Digimer wrote: > You are right, of course. Imagine though that the IPMI BMC's network > port or cable could have silently failed some time before the node > failed. Pacemaker can monitor the fencing device if you configure a monitor action for it, for exactly this reason.

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Lars Marowsky-Bree
On 2013-07-01T12:58:25, Digimer wrote: > > Pacemaker can monitor the fencing device if you configure a monitor > > action for it, for exactly this reason. > My *very* initial testing of op monitor="30" didn't detect the failure > or recovery of the fence device. I may very well have screwed somet

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Lars Marowsky-Bree
On 2013-07-01T13:44:43, Digimer wrote: > > I only use fence_*, so the wrapper would need to be there for me to test it. > > > > Tell me about how sbd works, please. > nm, found the page for it. > > http://www.linux-ha.org/wiki/SBD_Fencing Yeah, smart me, forgot to add the URL. The above one i

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Lars Marowsky-Bree
On 2013-07-01T13:52:22, Digimer wrote: > 1. It won't (reliably) work with DRBD because. Not by itself, no. You need shared storage for it, not replicated storage. (Though the shared storage can be provided by other nodes via iSCSI too.) > 2. I never trust a fence method that requires the victim

Re: [Pacemaker] some pacemaker questions

2013-07-02 Thread Lars Marowsky-Bree
On 2013-07-02T10:46:09, Andrew Beekhof wrote: > > Our problem is that if i give "crm resource stop vm1" and immediately after > > "crm resource stop vm2" > > it happens that pacemaker begins to stop vm2 only after vm1 is stopped. > There's nothing in the config to suggest that this would happen.

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-02 Thread Lars Marowsky-Bree
On 2013-07-02T08:56:06, Andrew Beekhof wrote: > > My *very* initial testing of op monitor="30" didn't detect the failure > > or recovery of the fence device. > That might come down to the quality of the monitor action in the agent though. Would be my best guess - suggest to file a bugzilla for t

Re: [Pacemaker] Disconnected from CIB?

2013-07-02 Thread Lars Marowsky-Bree
On 2013-07-02T08:25:18, Andrew Beekhof wrote: > >if (cli_config_update(&cib_copy, NULL, FALSE) == FALSE) { > Also, change FALSE -> TRUE here so that you see the validation errors. OK. > > What could cause cli_config_update() to fail in this way? > Beats me. Can you log the xml before th

Re: [Pacemaker] Disconnected from CIB?

2013-07-02 Thread Lars Marowsky-Bree
On 2013-07-02T20:12:08, Andrew Beekhof wrote: > > It seems related to the number of times I poll the CIB, too; I seem to > > hit a transient window there, maybe. Since I dropped the number of polls > > (instead of requesting the full CIB once per second) it hasn't > > reproduced. But I'll reinsta

Re: [Pacemaker] Disconnected from CIB?

2013-07-02 Thread Lars Marowsky-Bree
On 2013-07-02T21:58:57, Andrew Beekhof wrote: > Ah, that's probably the issue. > Occasionally a diff doesn't apply correctly (think ordering changes) and your > copy of the error handling code results in cli_config_update() being called > with a NULL pointer. > > Fix the case statements and you

Re: [Pacemaker] Monitor and standby

2013-07-03 Thread Lars Marowsky-Bree
On 2013-07-03T12:21:37, Denis Witt wrote: > Hi List, > > we have a two node cluster (test1-node1, test1-node2) with an additional > quorum node (test1). On all nodes MySQL is running. test1-node1 and > test1-node2 sharing the MySQL-Database via DRBD, so only one Node > should run MySQL. On test1

Re: [Pacemaker] Using "avoids" location constraint

2013-07-08 Thread Lars Marowsky-Bree
On 2013-07-08T14:35:09, Andrew Morgan wrote: > Thanks Florian. > > The problem I have is that I'd like to define a HA configuration that isn't > dependent on a specific set of fencing hardware (or any fencing hardware at > all for that matter) and as the stack has the quorum capability included

Re: [Pacemaker] Using "avoids" location constraint

2013-07-08 Thread Lars Marowsky-Bree
On 2013-07-08T09:57:38, Digimer wrote: > Building a shared storage cluster without fencing is asking for heart-ache. > There is no case, quorum or not, where it is ok to skip fencing. If a node > locks up mid-write and the other node simply assumes it's dead, cleans up > and goes on using storage

Re: [Pacemaker] Using "avoids" location constraint

2013-07-08 Thread Lars Marowsky-Bree
On 2013-07-08T10:13:50, Digimer wrote: > >While in general I agree, the above failure case is not likely with > >DRBD. > > > It was one example. Yes, but the use case here happened to be drbd, and thus replicated (not shared) storage. > You are right though, the "good" node would disconnect, >

Re: [Pacemaker] crmsh dosn't respect the acl read permissions

2013-07-09 Thread Lars Marowsky-Bree
On 2013-07-09T09:14:30, emmanuel segura wrote: > I know it is easy, but to enable this option pacemaker needs to be > compiled with --with-acl, if I understand correctly Yes. If you want that feature, make sure your distributor builds packages with that enabled. After that, it becomes a runtime opt

Re: [Pacemaker] Using "avoids" location constraint

2013-07-10 Thread Lars Marowsky-Bree
On 2013-07-10T14:32:04, Andrew Morgan wrote: > First of all, setting the 3rd host to be a standby (this was done before > any of the resources were created) didn't stop Pacemaker attempting to > start the resources there (that fails as MySQL isn't installed on that > server) It didn't start

Re: [Pacemaker] VirtualDomain Monitor Script

2013-07-12 Thread Lars Marowsky-Bree
On 2013-07-11T23:20:57, Gregg Stock wrote: > I'm adding monitor_scripts to my VirtualDomain resources and had a couple of > questions. > > 1. Does the monitor script run on the domain controller? Right now, I have > shared storage nodes that are firewalled from accessing the virtual machines > b

Re: [Pacemaker] again trouble with quorum (now with cman)

2013-07-12 Thread Lars Marowsky-Bree
On 2013-07-12T08:59:16, Andrey Groshev wrote: > I understand that it may be correct to do so... > But why so difficult? > Assume, I make a small HA cluster in my garage. > And, I do not have a managed switch or managed UPS. > I can not corrupt the data. > I just need to return the node as soon as po

Re: [Pacemaker] again trouble with quorum (now with cman)

2013-07-12 Thread Lars Marowsky-Bree
On 2013-07-12T14:18:03, Digimer wrote: > In any case, if a split-brain is not a concern, then why use an HA stack at > all? That is not the same. DRBD behaves differently from "normal" shared storage, and can recover from concurrent activation differently; either automatically or by manually mer

Re: [Pacemaker] Cman & corosync use serial

2013-07-16 Thread Lars Marowsky-Bree
On 2013-07-15T06:27:47, Zvi wrote: > is there an option to configure corosync to use serial? No. corosync no longer supports serial. Regards, Lars

Re: [Pacemaker] Using "avoids" location constraint

2013-07-16 Thread Lars Marowsky-Bree
On 2013-07-12T08:32:53, Andrew Beekhof wrote: > Some parts of cman - dlm specifically, insist on fencing anyway. > If you're not planning on using gfs, you could investigate turning off the > dlm. I keep thinking that those parts should then fail cleanly instead of fuzzy errors. i.e., not start

Re: [Pacemaker] Multi-node resource dependency

2013-07-20 Thread Lars Marowsky-Bree
On 2013-07-19T16:49:21, "\"Tomcsányi, Domonkos\"" wrote: > Now the behaviour I would like to achieve: > If NODE 1 goes offline its services should get migrated to NODE 2 AND NODE > 3's services should get migrated to NODE 4. > If NODE 3 goes offline its services should get migrated to NODE4 AND N

Re: [Pacemaker] Multi-node resource dependency

2013-07-22 Thread Lars Marowsky-Bree
On 2013-07-19T23:18:29, Lars Ellenberg wrote: > You may use node attributes in colocation constraints. Ohhh, good thinking. I had forgotten about that too. But I wonder if that really is bi-directional; is the PE smart enough to figure out where resources need to go if one of them can't run on

Re: [Pacemaker] Multi-node resource dependency

2013-07-22 Thread Lars Marowsky-Bree
On 2013-07-22T12:09:22, "\"Tomcsányi, Domonkos\"" wrote: > crm(live)configure# colocation by_color inf: HTTPS_SERVICE_GROUP > OPENVPN_SERVICE_GROUP node-attribute=color > ERROR: 4: constraint by_color references a resource node-attribute=color > which doesn't exist Which crmsh version are you us

Re: [Pacemaker] Patch of fence agent fence_wti to support named port groups

2013-07-22 Thread Lars Marowsky-Bree
On 2013-07-22T14:00:34, Thibaut Pouzet wrote: > NPS-8HD20-2 > The fence agent fence_wti that is shipped with > "fence-agents-3.1.5-25.el6_4.2.x86_64" on CentOS 6.4 cannot work with named > port groups. It only works with single outlets. We have patched fence_wti in > order to make it also work wi

Re: [Pacemaker] Multi-node resource dependency

2013-07-22 Thread Lars Marowsky-Bree
On 2013-07-22T16:25:29, "\"Tomcsányi, Domonkos\"" wrote: > You were right, the version in the Ubuntu repository is outdated, so I > decided to install it from a PPA-repo. Now I have the latest version of > everything, but naturally I already have an error: > > ERROR: running cibadmin -Ql: Could

Re: [Pacemaker] Node recover causes resource to migrate

2013-07-24 Thread Lars Marowsky-Bree
On 2013-07-24T09:00:23, Andrew Beekhof wrote: > > 4. Node A is back with a different internal ip address. > > This is your basic problem. > > I don't believe there is any cluster software that is designed to support > this sort of scenario. > Even at the corosync level, it has no knowledge tha

Re: [Pacemaker] Node recover causes resource to migrate

2013-07-24 Thread Lars Marowsky-Bree
On 2013-07-24T21:40:40, Andrew Beekhof wrote: > > Statically assigned nodeids? > Wouldn't hurt, but you still need to bring down the still-active node to get > it to talk to the new node. > Which sucks Hm. But ... corosync/pacemaker ought to identify the node via the nodeid. If it comes back w

Re: [Pacemaker] Explanation of resource allocation scores issue?

2013-08-14 Thread Lars Marowsky-Bree
On 2013-08-14T16:02:26, "Howley, Tom" wrote: > property $id="cib-bootstrap-options" \ > dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \ Upgrade. There were bugs fixed in handling m/s resources and location constraints since. 1.1.10 is a good choice. Regards, L

Re: [Pacemaker] Problems with SBD fencing

2013-08-21 Thread Lars Marowsky-Bree
On 2013-08-20T08:52:00, "Angel L. Mateo" wrote: Sorry, I was on vacation for a few weeks, thus only chiming in now. Instead of the Linux-HA Wiki page, please look here for the documentation: https://github.com/l-mb/sbd/blob/master/man/sbd.8.pod (Or, on a system with sbd installed, simply type "

Re: [Pacemaker] Problems with SBD fencing

2013-08-21 Thread Lars Marowsky-Bree
On 2013-08-21T18:15:39, Jan Christian Kaldestad wrote: > In my case I should mention that stonithing works occasionally when the SBD > resource is defined on one node only, but not too often. Unfortunately I > can't seem to find a pattern when it's working or failing. What I'm curious > about is

Re: [Pacemaker] Problems with SBD fencing

2013-08-26 Thread Lars Marowsky-Bree
On 2013-08-23T15:16:17, Jan Christian Kaldestad wrote: > It looks like pacemaker 1.1.10 fixes a few things related to crmd and > fencing failures, so I'd like to test 1.1.10. Unfortunately it's not > available for SLES yet, so I guess I just have to wait... If you have SLE, please file a request

Re: [Pacemaker] staggered resource startup

2013-08-28 Thread Lars Marowsky-Bree
On 2013-08-28T08:18:48, Andrew Beekhof wrote: > > 4. once pacemaker has brought the node online, all VM resources are > > started one at a time (for X...; crm resource start $X; sleep 45s; done) > > Are you aware of this cluster option? > >batch-limit = integer [30] >The numb

Re: [Pacemaker] Howto recover from node state UNCLEAN (online)

2013-09-05 Thread Lars Marowsky-Bree
On 2013-09-05T12:23:23, Andreas Mock wrote: > - resource monitoring failed on node 1 > => stop of resource on node 1 failed > => stonith off node 1 worked > - more or less parallel as resource is clone resource > resource monitoring failed on node 2 > => stop of resource on node 2 failed

Re: [Pacemaker] heartbeat:anything resource not stop/monitoring after reboot

2013-09-06 Thread Lars Marowsky-Bree
On 2013-09-05T11:23:20, David Coulson wrote: > ocf-tester -n reload -o binfile="/usr/sbin/rndc" -o cmdline_options="reload" > /usr/lib/ocf/resource.d/heartbeat/anything > Beginning tests for /usr/lib/ocf/resource.d/heartbeat/anything... > * rc=1: Monitoring an active resource should return 0 > *

Re: [Pacemaker] (LRMD|PCMK)_MAX_CHILDREN?

2013-09-11 Thread Lars Marowsky-Bree
On 2013-04-02T17:02:01, David Vossel wrote: > I'm convinced this is useful. > > I'll add PCMK_MAX_CHILDREN to the sysconfig documentation. To be backwards > compatible I'll have the lrmd internally interpret your LRMD_MAX_CHILDREN > environment variable as well. > > Sound reasonable? Hi David,

Re: [Pacemaker] (LRMD|PCMK)_MAX_CHILDREN?

2013-09-11 Thread Lars Marowsky-Bree
On 2013-09-11T19:55:38, Andrew Beekhof wrote: > > sorry for being thick, but I can't find this in the code now. Did this > > slip through again in April? > Apparently. But before we add it, I'd like to see if we can do something > coherent. > Having 3 (or more) different variables (batch-limit,

Re: [Pacemaker] (LRMD|PCMK)_MAX_CHILDREN?

2013-09-11 Thread Lars Marowsky-Bree
On 2013-09-12T14:34:02, Andrew Beekhof wrote: > > Well, they're all doing something completely different. > No, they're all crude approximations designed to stop the cluster as a whole > from using up so much cpu/network/etc that recovery introduces more failures > than it resolves. OK. Though

Re: [Pacemaker] (LRMD|PCMK)_MAX_CHILDREN?

2013-09-12 Thread Lars Marowsky-Bree
On 2013-09-12T16:56:35, Andrew Beekhof wrote: > > The most directly equivalent solution would be to number the per-node > > in-flight operations similar to what migration-threshold does. (I think > > we can safely continue to treat all resources as equal to start with.) > Agreed. Perhaps even re

Re: [Pacemaker] meaning of Monitor Interval

2013-09-12 Thread Lars Marowsky-Bree
On 2013-09-13T12:20:54, Xiaomin Zhang wrote: > Hi, Gurus: > Here's a question about the service monitor interval: if this value is > configured as '15' seconds, does this mean corosync/pacemaker will take > on average 15 seconds to schedule a failed resource on a ready node? It'll take about a m

Re: [Pacemaker] (LRMD|PCMK)_MAX_CHILDREN?

2013-09-13 Thread Lars Marowsky-Bree
On 2013-09-13T11:15:16, Vladislav Bogdanov wrote: > > The most directly equivalent solution would be to number the per-node > > in-flight operations similar to what migration-threshold does. > You probably meant migration-limit here and then? > > migration-threshold is a different beast. Uhm, y

Re: [Pacemaker] meaning of Monitor Interval

2013-09-13 Thread Lars Marowsky-Bree
On 2013-09-14T00:29:50, Xiaomin Zhang wrote: > Hello Lars: > I'm still not quite clear about this monitor interval setting. What I > observed is that pacemaker always quickly (in less than 2 seconds) > schedules the failed resource when I just cut off the network (via DROP > INPUT, or free

Re: [Pacemaker] (LRMD|PCMK)_MAX_CHILDREN?

2013-09-17 Thread Lars Marowsky-Bree
On 2013-09-16T18:19:55, "Gao,Yan" wrote: > > 1. Global property: operations-limit, operations-limit-migrate (alias > > for migration-threshold) > How about keeping it "migration-limit", and naming the new property > "actions-limit" -- Other global properties use th

Re: [Pacemaker] (LRMD|PCMK)_MAX_CHILDREN?

2013-09-18 Thread Lars Marowsky-Bree
On 2013-09-18T17:53:16, "Gao,Yan" wrote: > > "actions-limit" is a good name. (One of the three hard problems in > > computer science! ;-) > > > > (If we need it more fine grained at any point in the future, we can > > always add class/provider/type/op elements to it.) > Been thinking how to expr

Re: [Pacemaker] monitor on disabled nodes

2013-09-18 Thread Lars Marowsky-Bree
On 2013-09-18T11:13:46, Radoslaw Garbacz wrote: > Hi, > > I have a question regarding the "monitor" operation on disabled nodes. > > I noticed that this operation is called even when an agent is disabled for > a node. Is it an intended behavior or is there something wrong with my > configurat

Re: [Pacemaker] monitor on disabled nodes

2013-09-19 Thread Lars Marowsky-Bree
On 2013-09-18T12:20:08, Radoslaw Garbacz wrote: > Sorry for not being specific. > > The agent is meant to run only on a specific node (the head), and by > constraints is disabled on all other nodes. > > 'pcs constraint' reports: > Location Constraints: > Resource: dbx_nfs_head > Enabled

Re: [Pacemaker] Howto test/simulate the reaction of the cluster to node up and down

2013-09-19 Thread Lars Marowsky-Bree
On 2013-09-17T13:37:54, Andreas Mock wrote: > I have the problem that after a node rejoins the cluster some > resources are moved back to that node. > Now I want to see the calculated scores to see where I > have to adjust the stickiness to get the behaviour I like. > > I'm not sure how to us
