On 2013-06-10T19:26:20, Халезов Иван wrote:
> 1) The RedHat company is planning to drop corosync support and wants to
> switch to CMAN. (
> http://www.gossamer-threads.com/lists/linuxha/pacemaker/84662 )
To the best of my understanding, this is not correct. Red Hat will
continue to support coros
On 2013-06-10T19:25:38, Andreas Mock wrote:
> Am I right that these a packages for a RHEL 6.x system but in a
> corosync-pacemaker-fashion like SuSE uses it over years now?
Yes.
Those packages are scheduled for an update to latest upstream versions
as soon as we wrap up our current project, but
On 2013-06-10T18:22:37, Wolfgang Routschka wrote:
> After reading Documentation (http://clusterlabs.org/doc/acls.html) I found
> "All user accounts must be in the haclient group." but all users in haclient
> group have full access "Note that the root and hacluster users will always
> have ful
On 2013-06-10T22:53:54, Andreas Mock wrote:
> Hi Lars,
>
> thank you for answering. Could you tell me whether the stack
> is like Option1 or Option3 of this article
> http://blog.clusterlabs.org/blog/2012/pacemaker-and-cluster-filesystems/
>
> If it's Option1 when do you think SuSE switches to
On 2013-06-11T13:39:42, andreas graeper wrote:
> # removing
> cibadmin --delete --xml-text ''
> cibadmin --delete --xml-text ''
Why not crm configure delete?
> i.e. ocf:heartbeat:symlink
> (after installing 'failed actions: .. not installed' on passive node)
> when i remove the symlink-resourc
On 2013-06-11T14:21:52, andreas graeper wrote:
> hi and thanks.
>
> this is what i found :
> crm_resource --name p_xxx --set-parameter target-role --meta
> --parameter-value Stopped
You can also use "crm resource stop p_xxx" here.
> crm configure delete o_xxx## order
> crm configure delete
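For anyone following along, the two stop commands quoted in this exchange do the same thing; a sketch, assuming pacemaker's crm_resource and the crm shell are installed ("p_xxx" is the placeholder resource name from the thread):

```shell
# Both commands stop a resource by setting its target-role meta
# attribute to Stopped ("p_xxx" is the placeholder name from the thread).
crm_resource --resource p_xxx --set-parameter target-role --meta \
    --parameter-value Stopped
# The crm shell shorthand for the same operation:
crm resource stop p_xxx
```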
On 2013-06-12T09:09:31, Michael Schwartzkopff wrote:
> Especially I would like to find out how many nodes are in a cluster and how
> many nodes are online. Perhaps somebody could post a code snippet here.
I think you might find the crm_mon.c source code useful for this.
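Before digging into crm_mon.c, a small shell sketch can already extract those two numbers from crm_mon's one-shot text output; the "Online:"/"OFFLINE:" line format is an assumption based on pacemaker 1.1.x and should be checked against your version:

```shell
# count_nodes: read "crm_mon -1"-style status text on stdin and print
# "<total> <online>".  The "Online: [ a b ]" / "OFFLINE: [ c ]" line
# format is assumed from pacemaker 1.1.x text output.
count_nodes() {
    awk '
        /^Online: \[/  { online += NF - 3; total += NF - 3 }
        /^OFFLINE: \[/ { total += NF - 3 }
        END { print total + 0, online + 0 }
    '
}
# usage: crm_mon -1 | count_nodes
```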
Regards,
Lars
--
On 2013-06-05T20:44:56, Michael Schwartzkopff wrote:
Hi Michael,
yes, the idea to make utilization more dynamic was something Andrew and
I looked into ages ago.
Especially, there's still the open issue that it somewhat sucks that one
has to configure them at all. It'd be nice if monitor_0 would
On 2013-06-12T12:03:46, Michael Schwartzkopff wrote:
> > I think you might find the crm_mon.c source code useful for this.
> Thanks. That helps a lot.
It's where I stole the bits I needed for SBD from ;-)
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
On 2013-06-12T17:08:22, Michael Furman wrote:
> pcs
> resource create WebSite ocf:heartbeat:apache params
> configfile=/etc/httpd/conf/httpd.conf op monitor interval=20s
>
> Then, to
> test the failover we stop Apache using the following command:
>
> service httpd stop
>
>
> We see that Apach
On 2013-06-12T11:31:43, Michael Schwartzkopff wrote:
> > Pacemaker is not necessarily the best tool to implement quick reaction
> > to changing load, though. The utilization feature is concerned with
> > *correctness* first - namely, don't overcommit resources severely, e.g.,
> > the case of Xen/
On 2013-06-12T11:57:05, Digimer wrote:
> I build exclusively two-node clusters, and the biggest draw-back is the
> possibility of a "fence loop". That is, without quorum and with a network
> error, a node can come up on its own, fail to contact its peer and fence
> it. When the fenced node boot
On 2013-06-13T07:45:09, Andrew Beekhof wrote:
> Its certainly possible to build a decent 2-node cluster, but there are
> several non-obvious steps that are required - preventing fencing loops being
> one.
Given that 2-node clusters are probably still the 90%+ majority, I
wonder if we shouldn't
On 2013-06-13T07:48:46, Andrew Beekhof wrote:
> > In my opinion the user doesn't have any rights although the user is in
> > the haclient group and there is no role/user configuration. Is that right?
> No. Users in the haclient group have full access. That's what it is for.
Not with ACLs enabled: http
On 2013-06-13T16:24:25, Andrey Groshev wrote:
> > It doesn't have to be able to run services, it only needs to contribute to
> > quorum.
> That is, there is no way to switch the node into standby mode from a
> pacemaker script/config?
Sure there is:
# crm node standby nodename
The node will
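As a minimal sketch ("nodename" is a placeholder), standby and its inverse:

```shell
# In standby the node still contributes to quorum but runs no resources.
crm node standby nodename
# Later, to allow it to host resources again:
crm node online nodename
```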
On 2013-06-11T09:34:10, Digimer wrote:
> If you have any trouble, please don't hesitate to ask here and we will do
> our best to help.
I wonder what the perspective is on standardizing on one fencing API.
Both the "old" cluster-glue one and this one are remarkably similar
(except for passing i
On 2013-06-13T22:12:23, Andrey Groshev wrote:
> >>> It doesn't have to be able to run services, it only needs to
> >>> contribute to quorum.
> >> That is, there is no way to switch the node into standby mode from
> >> a pacemaker script/config?
> > Sure there is:
> >
> > # crm node standby node
On 2013-06-14T10:50:21, Andrew Beekhof wrote:
> If I had my way, they'd
> - have env variables the same as OCF
> - be executable from the command line like the RH ones
(I'm not sure I get the second point. Even the "old" cluster-glue
scripts were executable from the commandline?)
> - avoid the
On 2013-06-14T08:47:29, Andrey Groshev wrote:
> ~ 200 VM hosts.
> Suppose if a cloud die, I'll have half a day to work in the role of RC script.
If you have 200 VM hosts, one would hope you don't need to configure 100
2-node pairs; that's a perfect scenario where you should easily be able
to con
On 2013-06-15T08:26:33, Andrew Beekhof wrote:
> > (I'm not sure I get the second point. Even the "old" cluster-glue
> > scripts were executable from the commandline?)
> The RH ones are executable directly, not relying on separate command line
> utils.
Well, you can call the cluster-glue ones di
On 2013-06-16T13:30:03, Digimer wrote:
> All resource and fence agents are supposed to print out their usage details
> (in an XML validation format) when you pass ' meta-data' or
> ' -o metadata'.
>
> It's not ideal, but it is useful.
>
> Perhaps I should start documenting these...
https://www
On 2013-06-19T10:13:19, Andrey Groshev wrote:
> I started experimenting.
> Received the first incomprehensible situation:
> There are three nodes. One of the quorum exists only, i.e. without a specific
> pacemaker.
What do you mean by "without a specific pacemaker"?
What does your cluster conf
On 2013-06-18T13:47:57, Nikita Michalko wrote:
> I tried build/compile the last version of pacemeker from sources
> (http://blog.clusterlabs.org/blog/2013/release-candidate-1-dot-1-10-rc3/)
> on SLES11/SP2 (kernel 3.0.58-0.6.2-default) with libqb-0.14.4 as follows:
> ./configure --prefix=/usr --
On 2013-06-20T14:53:07, Florian Crouzat wrote:
> >Am Donnerstag, 20. Juni 2013, 13:43:10 schrieb Florian Crouzat:
>
> >My original question was about crm.
>
> It was about an old tool that doesn't work in your environment.
The crm "ptest" command calls crm_simulate in the background.
The crm
On 2013-06-21T10:56:29, andreas graeper wrote:
> hi,
> when only i remove or add resources, corosync starts to eat up all cpu.
> drbd 8.4.1 (build from source)
> corosync 1.4.1
yes, corosync 1.4.1 had one such error, I recall. If you're building
from source, why are you sticking to such an old v
On 2013-06-21T12:56:17, andreas graeper wrote:
> maybe i asked this before, but i could not find message + answer.
>
> when a resource gets unmanaged and the problem has gone, i want the
> resource to be managed by pacemaker again. what needs to be done?
>
> situation: only on node left (other ill)
>
On 2013-06-21T14:30:42, Andrey Groshev wrote:
> I was wrong - the resource starts in 15 minutes.
> I found a matching entry in the log at the same time:
> grep '11:59.*900' /var/log/cluster/corosync.log
> Jun 21 11:59:50 [23616] dev-cluster2-node4 crmd: info: crm_timer_popped:
> PEngi
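One hedged observation: a start that happens exactly 15 minutes late, kicked off by crm_timer_popped, matches pacemaker's default cluster-recheck-interval of 15 minutes, i.e. the event that should have triggered the transition was apparently missed and only the periodic recheck picked it up. Lowering the interval shortens the worst-case delay but masks rather than fixes the missed trigger:

```shell
# cluster-recheck-interval defaults to 15min; lowering it is a
# workaround only, shortening the delay when a transition trigger
# is missed and only the periodic recheck picks it up.
crm configure property cluster-recheck-interval=2min
```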
On 2013-06-25T10:16:58, Andrey Groshev wrote:
> Ok, I recently became engaged in the PCMK, so for me it is a surprise.
> The more so in all the major linux distributions version 1.1.x.
Pacemaker has very strong regression and system tests, and barring
accidents, it is usually very safe to always
On 2013-06-25T17:08:36, Denis Witt wrote:
> My cluster.conf (I added this later to be able to run tunefs.ocfs2
> --update-cluster-stack):
This indicates you have a 'wrong stack' on disk still. You need to run
mkfs.ocfs2/tunefs.ocfs2 while the o2cb cluster resource is running, or to
set it to "pcmk
On 2013-06-25T12:03:22, Colin Blair wrote:
> Andrew,
>
> Does Pacemaker support GPU processes?
Pacemaker is not very CPU intensive; what would it use a GPU for?
Regards,
Lars
On 2013-06-25T20:28:29, Andrew Beekhof wrote:
> > Perhaps a numbering scheme like the Linux kernel would fit better than a
> > stable/unstable branch distinction. Changes that deserve the "unstable"
> > term are really really rare (and I'm sure we've all learned from them),
> > so it may be bette
On 2013-06-26T21:31:14, Andrew Beekhof wrote:
> > Distributions can take care of them when they integrate them; basically
> > they'll trickle through until the whole stack the distributions ship
> > builds again.
> If we let 2.0.x be anything like 1.1.x, I suspect this would be rather
> difficul
On 2013-06-27T14:28:19, Andrew Beekhof wrote:
> I wouldn't say the 6 months between 1.1.7 and 1.1.8 was a particularly
> aggressive release cycle.
For the amount of changes in there, I think yes. And the intrusive ones
didn't show up all at the beginning of that cycle, either. That just
made in
On 2013-06-27T20:50:34, Andrew Beekhof wrote:
> There was one :-)
> I merged the best bits of three parallel CPG code paths.
> The things that prompted the extra bits in one also applied to the others.
Ah, that wasn't so obvious to me when I tried making sense of the
commit. ;-) But that's clear
On 2013-06-27T16:52:02, Dejan Muhamedagic wrote:
> > I don't want the cluster stack to start on boot, so I disable
> > pacemaker/corosync. However, I do want the node to power back on so that
> > I can log into it when the alarms go off. Yes, I could log into the good
> > node, manually unfence/b
On 2013-06-27T10:56:40, Digimer wrote:
> However, this feels like a really bad solution. It's not uncommon to
> have two separate power rails feeding either side of the node's PSUs.
> Particularly in HA environments.
True. But gating them through the same power switch is *not* a SPoF from
the cl
On 2013-06-27T17:01:37, Jacobo García wrote:
> Enable asymmetric clustering
> crm_attribute --attr-name symmetric-cluster --attr-value false
>
> Then I configure the resource:
> crm configure primitive ping ocf:pacemaker:ping params
> host_list="10.34.151.73" op monitor interval=15s timeout=5s
>
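Worth noting for this configuration: with symmetric-cluster=false the cluster is opt-in, so no resource runs anywhere until a location constraint gives it a positive score on some node. A minimal sketch (the node name is hypothetical):

```shell
# In an opt-in cluster every resource needs an explicit location
# constraint; without one, the ping primitive above stays stopped.
crm configure location loc_ping_node1 ping 100: node1
```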
On 2013-06-27T11:32:36, Digimer wrote:
> >> time and I expect many users will run into this problem as they try to
> >> migrate to RHEL 7. I see no reason why this can't be properly handled in
> >> pacemaker directly.
> > Yes, why not, choice is a good thing ;-)
> If an established configuration
On 2013-06-27T12:53:01, Digimer wrote:
> primitive fence_n01_psu1_off stonith:fence_apc_snmp \
> params ipaddr="an-p01" pcmk_reboot_action="off" port="1"
> pcmk_host_list="an-c03n01.alteeve.ca"
> primitive fence_n01_psu1_on stonith:fence_apc_snmp \
> params ipaddr="an-p01" pcmk_re
On 2013-06-28T11:11:00, Andrew Beekhof wrote:
> >> Maybe you're right, maybe I should stop fighting it and go with the
> >> firefox approach.
> >> That certainly seemed to piss a lot of people off though...
> > If there's one message I've learned in 13 years of work on Linux HA,
> > then it is th
On 2013-06-28T20:21:22, Andrew Beekhof wrote:
> > It looks correct, but not quite sane. ;-) That seems not to be
> > something you can address, though. I'm thinking that fencing topology
> > should be smart enough to, if multiple fencing devices are specified, to
> > know how to expand them to "f
On 2013-06-28T18:41:35, Andrew Beekhof wrote:
> > There's an exception: dropping commonly used external interfaces (say,
> > "ptest") needs to be announced a few releases in advance before enacted
> > upstream. (And if Enterprise distributions want to keep something, they
> > have time to prepare
On 2013-06-28T21:01:55, Andrew Beekhof wrote:
> > I'd agree, but it's not multiple ports on the same device, it's multiple
> > ports on *different* devices. I don't think a single fencing agent can
> > handle that - it really looks like something only the higher level can
> > cope with.
> True, i
On 2013-06-28T14:49:06, Dejan Muhamedagic wrote:
> > If cluster-glue's LRM had had such a suite, it'd certainly have
> > helped tons.)
> It did have a regression suite.
Yes, well, but it didn't test for LRM_MAX_CHILDREN or the secret
support, for example. So it didn't really document the interfa
On 2013-06-28T22:04:48, Andrew Beekhof wrote:
> I think he did actually.
Well, yes, but the hg history or reading the existing code would
probably have been quite helpful. I'll take "not well documented", but
it's hard to say the rewrite was handled very well. But I don't want to
get drawn into
On 2013-06-28T10:20:56, Digimer wrote:
> >> primitive fence_n01_psu1_off stonith:fence_apc_snmp \
> >> params ipaddr="an-p01" pcmk_reboot_action="off" port="1"
> >> pcmk_host_list="an-c03n01.alteeve.ca"
> >> primitive fence_n01_psu1_on stonith:fence_apc_snmp \
> >> params ipaddr="
On 2013-06-28T10:27:54, Digimer wrote:
> > Basically, unless we can do this better, having multiple devices per
> > fence topology level needs to be considered broken and might be better
> > removed.
> NO NO NO NO.
>
> Please do not remove this. I can not use pacemaker unless I can keep the
> po
On 2013-06-28T11:29:35, Digimer wrote:
> In rhcs, you can control the fence device's action using 'action="..."'
> attribute in the element. So for us rhcs migrants, we
> expect that action="..." in the fence primitive will have the same
> effect. As of now, as you know, this is ignored in favou
On 2013-06-28T11:20:32, Digimer wrote:
> Yes, a failed "on" action would then fail the method. This is
> sub-optimal as FenceAgentAPI says that only the "off" portion of
> "reboot" needs to succeed. However, I don't consider this a show stopper
> because "on" action of PDUs simply means "re-energ
On 2013-06-29T09:22:20, Andrew Beekhof wrote:
> > This doesn't help people who have dual power rails/PDUs for power
> > redundancy.
> I'm yet to be convinced that having two PDUs is helping those people in the
> first place.
> If it were actually useful, I suspect more than two/three people woul
Hi,
sbd connects to the CIB and watches updates come in to see if pacemaker
considers the node healthy still, and if the cluster partition is
quorate according to the CIB. That's all working fine.
But I've noticed that during start-up of a regular cluster, sbd would
get disconnected briefly:
Jun
On 2013-07-01T21:37:38, Andrew Beekhof wrote:
> > And apparently, this is one of the scenarios for which fence topology
> > was created and supports multiple devices per level. I'd venture the
> > opinion that the current implementation of "multiple devices per level"
> > is broken (since it requ
On 2013-07-01T21:09:18, Andrew Beekhof wrote:
> > Anything I should worry about?
>
> I would say so, because I can't think of a valid reason for it to happen.
> You'll probably want to use the blackbox to diagnose this.
>
> Reproducible or random?
Reproducible on the non-DC node during full st
On 2013-07-01T14:15:01, Lars Marowsky-Bree wrote:
> Reproducible on the non-DC node during full start-up of a cluster, yes.
And it turns out to be a CIB problem afterall. Or I'm doing something
else wrong:
I'm doing, basically straight from crm_mon.c:
xmlNode *cib_co
On 2013-07-01T11:53:29, Digimer wrote:
> You are right, of course. Imagine though that the IPMI BMC's network
> port or cable could have silently failed some time before the node
> failed.
Pacemaker can monitor the fencing device if you configure a monitor
action for it, for exactly this reason.
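A sketch of what that looks like (the agent and its parameters are illustrative, not taken from this thread):

```shell
# Recurring monitor on the fencing device itself, so pacemaker notices
# if the device (e.g. an IPMI BMC) becomes unreachable.
crm configure primitive st_node1 stonith:external/ipmi \
    params hostname=node1 ipaddr=192.168.1.10 userid=admin \
    op monitor interval=60s timeout=30s
```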
On 2013-07-01T12:58:25, Digimer wrote:
> > Pacemaker can monitor the fencing device if you configure a monitor
> > action for it, for exactly this reason.
> My *very* initial testing of op monitor="30" didn't detect the failure
> or recovery of the fence device. I may very well have screwed somet
On 2013-07-01T13:44:43, Digimer wrote:
> > I only use fence_*, so the wrapper would need to be there for me to test it.
> >
> > Tell me about how sbd works, please.
> nm, found the page for it.
>
> http://www.linux-ha.org/wiki/SBD_Fencing
Yeah, smart me, forgot to add the URL.
The above one i
On 2013-07-01T13:52:22, Digimer wrote:
> 1. It won't (reliably) work with DRBD because.
Not by itself, no. You need shared storage for it, not replicated
storage. (Though the shared storage can be provided by other nodes via
iSCSI too.)
> 2. I never trust a fence method that requires the victim
On 2013-07-02T10:46:09, Andrew Beekhof wrote:
> > Our problem is that if i give "crm resource stop vm1" and immediatly after
> > "crm resource stop vm2"
> > it happens that pacemaker begins to stop vm2 only after vm1 is stopped.
> There's nothing in the config to suggest that this would happen.
On 2013-07-02T08:56:06, Andrew Beekhof wrote:
> > My *very* initial testing of op monitor="30" didn't detect the failure
> > or recovery of the fence device.
> That might come down to the quality of the monitor action in the agent though.
Would be my best guess - suggest to file a bugzilla for t
On 2013-07-02T08:25:18, Andrew Beekhof wrote:
> >if (cli_config_update(&cib_copy, NULL, FALSE) == FALSE) {
> Also, change FALSE -> TRUE here so that you see the validation errors.
OK.
> > What could cause cli_config_update() to fail in this way?
> Beats me. Can you log the xml before th
On 2013-07-02T20:12:08, Andrew Beekhof wrote:
> > It seems related to the number of times I poll the CIB, too; I seem to
> > hit a transient window there, maybe. Since I dropped the number of polls
> > (instead of requesting the full CIB once per second) it hasn't
> > reproduced. But I'll reinsta
On 2013-07-02T21:58:57, Andrew Beekhof wrote:
> Ah, thats probably the issue.
> Occasionally a diff doesn't apply correctly (think ordering changes) and your
> copy of the error handling code results in cli_config_update() being called
> with a NULL pointer.
>
> Fix the case statements and you
On 2013-07-03T12:21:37, Denis Witt wrote:
> Hi List,
>
> we have a two node cluster (test1-node1, test1-node2) with an additional
> quorum node (test1). On all nodes MySQL is running. test1-node1 and
> test1-node2 sharing the MySQL-Database via DRBD, so only one Node
> should run MySQL. On test1
On 2013-07-08T14:35:09, Andrew Morgan wrote:
> Thanks Florian.
>
> The problem I have is that I'd like to define a HA configuration that isn't
> dependent on a specific set of fencing hardware (or any fencing hardware at
> all for that matter) and as the stack has the quorum capability included
On 2013-07-08T09:57:38, Digimer wrote:
> Building a shared storage cluster without fencing is asking for heart-ache.
> There is no case, quorum or not, where it is ok to skip fencing. If a node
> locks up mid-write and the other node simply assumes it's dead, cleans up
> and goes on using storage
On 2013-07-08T10:13:50, Digimer wrote:
> >While in general I agree, the above failure case is not likely with
> >DRBD.
> >
> It was one example.
Yes, but the use case here happened to be drbd, and thus replicated (not
shared) storage.
> You are right though, the "good" node would disconnect,
>
On 2013-07-09T09:14:30, emmanuel segura wrote:
> I know it's easy, but to enable this option pacemaker needs to be
> compiled with --with-acl, if I understand well
Yes. If you want that feature, make sure your distributor builds
packages with that enabled. After that, it becomes a runtime opt
On 2013-07-10T14:32:04, Andrew Morgan wrote:
> First of all, setting the 3rd host to be a standby (this was done before
> any of the resources were created) didn't stop Pacemaker attempting to
> start the resources there (that fails as MySQL isn't installed on that
> server)
It didn't start
On 2013-07-11T23:20:57, Gregg Stock wrote:
> I'm adding monitor_scripts to my VirtualDomain resources and had a couple of
> questions.
>
> 1. Does the monitor script run on the domain controller? Right now, I have
> shared storage nodes that are firewalled from accessing the virtual machines
> b
On 2013-07-12T08:59:16, Andrey Groshev wrote:
> I understand that it may be correct to do so...
> But why so difficult?
> Assume I make a small HA cluster in my garage.
> And I don't have a managed switch or managed UPS.
> I can not corrupt the data.
> I just need to returning node as soon as po
On 2013-07-12T14:18:03, Digimer wrote:
> In any case, if a split-brain is not a concern, then why use an HA stack at
> all?
That is not the same. DRBD behaves differently from "normal" shared
storage, and can recover from concurrent activation differently; either
automatically or by manually mer
On 2013-07-15T06:27:47, Zvi wrote:
> is there an option to configure corosync to use serial?
No. corosync no longer supports serial.
Regards,
Lars
On 2013-07-12T08:32:53, Andrew Beekhof wrote:
> Some parts of cman - dlm specifically, insist on fencing anyway.
> If you're not planning on using gfs, you could investigate turning off the
> dlm.
I keep thinking that those parts should then fail cleanly instead of
fuzzy errors. i.e., not start
On 2013-07-19T16:49:21, "Tomcsányi, Domonkos" wrote:
> Now the behaviour I would like to achieve:
> If NODE 1 goes offline its services should get migrated to NODE 2 AND NODE
> 3's services should get migrated to NODE 4.
> If NODE 3 goes offline its services should get migrated to NODE4 AND N
On 2013-07-19T23:18:29, Lars Ellenberg wrote:
> You may use node attributes in colocation constraints.
Ohhh, good thinking. I had forgotten about that too.
But I wonder if that really is bi-directional; is the PE smart enough to
figure out where resources need to go if one of them can't run on
On 2013-07-22T12:09:22, "Tomcsányi, Domonkos" wrote:
> crm(live)configure# colocation by_color inf: HTTPS_SERVICE_GROUP
> OPENVPN_SERVICE_GROUP node-attribute=color
> ERROR: 4: constraint by_color references a resource node-attribute=color
> which doesn't exist
Which crmsh version are you us
On 2013-07-22T14:00:34, Thibaut Pouzet wrote:
> NPS-8HD20-2
> The fence agent fence_wti that is shipped with
> "fence-agents-3.1.5-25.el6_4.2.x86_64" on CentOS 6.4 cannot work with named
> port groups. It only works with single outlets. We have patched fence_wti in
> order to make it also work wi
On 2013-07-22T16:25:29, "Tomcsányi, Domonkos" wrote:
> You were right, the version in the Ubuntu repository is outdated, so I
> decided to install it from a PPA-repo. Now I have the latest version of
> everything, but naturally I already have an error:
>
> ERROR: running cibadmin -Ql: Could
On 2013-07-24T09:00:23, Andrew Beekhof wrote:
> > 4. Node A is back with a different internal ip address.
>
> This is your basic problem.
>
> I don't believe there is any cluster software that is designed to support
> this sort of scenario.
> Even at the corosync level, it has no knowledge tha
On 2013-07-24T21:40:40, Andrew Beekhof wrote:
> > Statically assigned nodeids?
> Wouldn't hurt, but you still need to bring down the still-active node to get
> it to talk to the new node.
> Which sucks
Hm. But ... corosync/pacemaker ought to identify the node via the
nodeid. If it comes back w
On 2013-08-14T16:02:26, "Howley, Tom" wrote:
> property $id="cib-bootstrap-options" \
> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
Upgrade. There were bugs fixed in handling m/s resources and location
constraints since. 1.1.10 is a good choice.
Regards,
L
On 2013-08-20T08:52:00, "Angel L. Mateo" wrote:
Sorry, I was on vacation for a few weeks, thus only chiming in now.
Instead of the Linux-HA Wiki page, please look here for the
documentation: https://github.com/l-mb/sbd/blob/master/man/sbd.8.pod
(Or, on a system with sbd installed, simply type "
On 2013-08-21T18:15:39, Jan Christian Kaldestad wrote:
> In my case I should mention that stonithing works occasionally when the SBD
> resource is defined on one node only, but not too often. Unfortunately I
> can't seem to find a pattern when it's working or failing. What I'm curious
> about is
On 2013-08-23T15:16:17, Jan Christian Kaldestad wrote:
> It looks like pacemaker 1.1.10 fixes a few things related to crmd and
> fencing failures, so I'd like to test 1.1.10. Unfortunately it's not
> available for SLES yet, so I guess I just have to wait...
If you have SLE, please file a request
On 2013-08-28T08:18:48, Andrew Beekhof wrote:
> > 4. once pacemaker has brought the node online, all vm resource are
> > started one at a time (for X...; crm resource start $X; sleep 45s; done)
>
> Are you aware of this cluster option?
>
>batch-limit = integer [30]
>The numb
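In other words, instead of hand-serializing the starts with a sleep loop, the cluster-wide throttle can be set directly; a sketch:

```shell
# batch-limit caps the number of actions the cluster executes in
# parallel (default 30); a low value serializes mass VM starts
# without per-resource sleep loops.
crm configure property batch-limit=5
```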
On 2013-09-05T12:23:23, Andreas Mock wrote:
> - resource monitoring failed on node 1
> => stop of resource on node 1 failed
> => stonith off node 1 worked
> - more or less parallel as resource is clone resource
> resource monitoring failed on node 2
> => stop of resource on node 2 failed
On 2013-09-05T11:23:20, David Coulson wrote:
> ocf-tester -n reload -o binfile="/usr/sbin/rndc" -o cmdline_options="reload"
> /usr/lib/ocf/resource.d/heartbeat/anything
> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/anything...
> * rc=1: Monitoring an active resource should return 0
> *
On 2013-04-02T17:02:01, David Vossel wrote:
> I'm convinced this is useful.
>
> I'll add PCMK_MAX_CHILDREN to the sysconfig documentation. To be backwards
> compatible I'll have the lrmd internally interpret your LRMD_MAX_CHILDREN
> environment variable as well.
>
> sound reasonable?
Hi David,
On 2013-09-11T19:55:38, Andrew Beekhof wrote:
> > sorry for being thick, but I can't find this in the code now. Did this
> > slip through again in April?
> Apparently. But before we add it, I'd like to see if we can do something
> coherent.
> Having 3 (or more) different variables (batch-limit,
On 2013-09-12T14:34:02, Andrew Beekhof wrote:
> > Well, they're all doing something completely different.
> No, they're all crude approximations designed to stop the cluster as a whole
> from using up so much cpu/network/etc that recovery introduces more failures
> than it resolves.
OK. Though
On 2013-09-12T16:56:35, Andrew Beekhof wrote:
> > The most directly equivalent solution would be to number the per-node
> > in-flight operations similar to what migration-threshold does. (I think
> > we can safely continue to treat all resources as equal to start with.)
> Agreed. Perhaps even re
On 2013-09-13T12:20:54, Xiaomin Zhang wrote:
> Hi, Gurus:
> Here's a question about service Monitor Interval: considering this value is
> configured as '15' seconds, does this mean corosync/pacemaker will take
> average 15 seconds to schedule failed resource on a ready node?
It'll take about a m
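To clarify the timing model (as I understand it): the monitor interval bounds failure *detection*, not the whole recovery; worst case, a resource failure is noticed roughly interval + timeout after it occurs, and recovery then begins at once. Sketch of an explicit monitor operation (resource name and agent are placeholders):

```shell
# A 15s monitor means a resource failure is detected within ~15s (plus
# the operation timeout in the worst case); recovery then starts at once.
crm configure primitive p_svc ocf:heartbeat:Dummy \
    op monitor interval=15s timeout=30s
```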
On 2013-09-13T11:15:16, Vladislav Bogdanov wrote:
> > The most directly equivalent solution would be to number the per-node
> > in-flight operations similar to what migration-threshold does.
> You probably meant migration-limit here and then?
>
> migration-threshold is a different beast.
Uhm, y
On 2013-09-14T00:29:50, Xiaomin Zhang wrote:
> Hello Lars:
> I'm still somewhat not clear about this monitor interval setting. What I
> observed is that the pacemaker always quickly (in less then 2 seconds)
> schedule the failed resource when I just cut down the network (via DROP
> INPUT, or free
On 2013-09-16T18:19:55, "Gao,Yan" wrote:
> > 1. Global property: operations-limit, operations-limit-migrate (alias
> > for migration-threshold)
> How about to keep it "migration-limit", and name the new property
> "actions-limit" -- Other global properties use th
On 2013-09-18T17:53:16, "Gao,Yan" wrote:
> > "actions-limit" is a good name. (One of the three hard problems in
> > computer science! ;-)
> >
> > (If we need it more fine grained at any point in the future, we can
> > always add class/provider/type/op elements to it.)
> Been thinking how to expr
On 2013-09-18T11:13:46, Radoslaw Garbacz wrote:
> Hi,
>
> I have a question regarding the "monitor" operation on disabled nodes.
>
> I noticed that this operation is called even, when an agent is disabled for
> a node. Is it an indented behavior or is there something wrong with my
> configurat
On 2013-09-18T12:20:08, Radoslaw Garbacz wrote:
> Sorry for not being specific.
>
> The agent is meant to run only on a specific node (the head), and by
> constraints is disabled on all other nodes.
>
> 'pcs constraint' reports:
> Location Constraints:
> Resource: dbx_nfs_head
> Enabled
On 2013-09-17T13:37:54, Andreas Mock wrote:
> I have the problem that after a node rejoins the cluster some
> resources are move back to that node.
> Now I want to see the calculated scores to see where I do
> have to adjust the stickyness to get the behaviour I like.
>
> I'm not sure how to us
1 - 100 of 560 matches