On 2012-11-14T15:11:05, Digimer wrote:
> any reason at all, to try new things. Sometimes it is superior, often it
> is not. In either case, users are free to go where they feel is best.
Ah, but can they? How likely is it that the large distributions will
offer both? Only those that don't supply
On 2012-11-15T09:20:44, Andrew Beekhof wrote:
> > LCMC and crmsh/hawk are at least conceptually very, very different;
> Conceptually LCMC and hawk are both web-based GUIs; it's the
> implementation that makes them so different.
Not quite. LCMC is pretty heavily different from a deployment
perspe
On 2012-11-15T10:00:21, Hill Fang wrote:
> Hi friend:
>
> I want to know: does heartbeat support Oracle ASM now?
No - and yes.
Oracle RAC (I assume that's the context for ASM?) does not tolerate any
cluster solution except itself. This is not supported together with
Pacemaker.
Pacemaker with t
On 2012-11-14T12:44:53, Digimer wrote:
> Not really, to be honest. The way I see it is that Pacemaker is in tech
> preview (on rhel, which is where I live). So almost by definition,
> anything can change at any time. This is what happened here, so I don't
> see a problem.
That is a pretty limite
On 2012-11-14T09:24:45, Rasto Levrinc wrote:
> What doesn't work? I think that at this point in time, it'd be easier to
> get crmsh going/fixed with pcmk 1.1.8. It's probably just some path
> somewhere. If really nothing works, you *must* use LCMC, Pacemaker GUI. :)
crmsh's latest release is sup
On 2012-11-14T09:33:22, Digimer wrote:
> As it was told to me, pcs was going to be what was used "officially",
> but that anyone and everyone was welcome to continue using and
> developing crm or any other existing or new management tool. My
> take-away was that the devs wanted pcs, for reasons
On 2012-11-14T08:46:25, Ulrich Windl wrote:
> This recommendation is against best practices: The FQHN is usually the first
> name in /etc/hosts, aliases (short names) following. Probably it's better to
> fix the application rather than fiddling with /etc/hosts.
Of course. But I was assuming Eric
On 2012-11-14T09:08:58, Ulrich Windl wrote:
> > The "official" management tool is/will be pcs. That said, crm has been
> > around for a while, so it might be more complete/stable.
> Is this wishful thinking? In SLES11 SP2 it's not available for installation,
> so it's either very new or not con
On 2012-11-13T17:06:31, "Robinson, Eric" wrote:
> I'm not sure how to correct this. Here are the results of my name resolution
> test on node ha09a...
I'd probably strip everything except the short names out of
/etc/HOSTNAME and /etc/hosts, though it may be sufficient to make sure
the short nam
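A minimal sketch of the suggested /etc/hosts layout, keeping only the short names as advised above (the addresses and second node name are hypothetical placeholders):

```
# /etc/hosts -- short names only, per the advice above
# (addresses are hypothetical placeholders)
10.0.0.1   ha09a
10.0.0.2   ha09b
```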
On 2012-11-13T16:34:23, "Robinson, Eric" wrote:
> bump.
>
> Could someone please review the logs in the links below and tell me what the
> heck is going on with this cluster? I've never encountered anything like this
> before. Basically, corosync thinks the cluster is healthy but Pacemaker won
On 2012-11-12T10:07:50, Andrew Beekhof wrote:
> Um, are you setting a nodeid in corosync.conf?
>
> Because I see this:
>
> Nov 09 09:07:25 [2609] ha09a.mycharts.md crmd: crit:
> crm_get_peer: Node ha09a.mycharts.md and ha09a share the same cluster
> node id '973777088'!
This
On 2012-11-12T15:01:47, alain.mou...@bull.net wrote:
> Thanks but no: in older releases, a failed monitor op led to
> "fence", as required by "on-fail=fence".
yes, that's what should happen.
You can file a crm_report with the PE inputs showing this for 1.1.7, or
directly retest with 1.1
On 2012-11-07T12:51:25, Ulrich Windl wrote:
> I agree that one shouldn't have to do it, but I've seen cases (two node
> cluster with quorum-policy=ignore) where one node was down while the
> "cluster" wanted to fence both nodes. So when the other node goes up, nodes
> will shoot each other.
>
On 2012-11-05T17:05:35, Dejan Muhamedagic wrote:
> > It's a debug instrumentation message. But it is only triggered when
> > someone runs crmadmin -S, -H to look up the DC or something, it isn't
> > triggered by the stack internally.
> If it's a debug message, why is it then at severity "info"?
On 2012-11-05T15:31:25, Ulrich Windl wrote:
> I just experienced that the syslog message "crmd: [12771]: info:
> handle_request: Current ping state: S_TRANSITION_ENGINE" is sent out several
> times per second for an extended period of time.
>
> So I wonder: Is it a left-over debug message, and
On 2012-10-31T15:59:05, "Robinson, Eric" wrote:
> Nobody has any thoughts on why my 2-node cluster has no DC? As I mentioned,
> corosync-cfgtool -s shows the ring active with no faults.
That probably means that someone (i.e., you ;-) needs to dig more into
the logs of corosync & pacemaker. Th
On 2012-10-25T11:30:32, Ulrich Windl wrote:
> I just wonder: If the reason is some kind of resource shortage in the Xen
> Host that causes Xen guests to fail booting, it would be nice if that
> situation could be detected. I was just asking for an already known effect,
> before digging deeper.
On 2012-10-25T08:28:29, Ulrich Windl wrote:
> The VM would not be able to boot due to lack of a boot disk. All three VMs on
> a specific node had the very same problem after being rebooted (through OS,
> not Xen RA).
The Xen RA, by default, only monitors the existence of the VM on the
hypervis
On 2012-10-24T13:23:09, Dimitri Maziuk wrote:
> PS. but for the most part, like you said: you *have* people stuck on
> 2.1.4 and you keep supporting them much as you hate it.
Yes, but on SLES10, that was an actually shipping version with full
support.
EPEL has different policies than RHEL. Thos
On 2012-10-24T13:17:57, Dimitri Maziuk wrote:
> I have e.g. mon script that greps 'lsof -i' to see if httpd is listening
> on * or cluster ip. Which IMO is a way saner check than wget'ting
> http://localhost/server-status -- and treating a [34]04 as a fail. Hence
> the "plus" quip. ;)
That the p
On 2012-10-22T14:12:17, Ulrich Windl wrote:
> Interesting formula: I'd use something like "number of CPUs" * 4, not divided
> by.
>
> Reason: Today's workload is usually limited by I/O, not by CPU power.
>
> However with something crazy like 32 CPUs, 32 tasks can easily be run, but
> most lik
On 2012-10-24T11:15:14, Dimitri Maziuk wrote:
> > I'm happy you have something that works for you.
> > Although even if you're using it in haresources mode, your resource
> > agents are still years out of date.
> It doesn't have resource agents (that's one of its pluses in my book).
It has them;
On 2012-10-17T09:18:06, Michael Schwartzkopff wrote:
> If you have errors in the network, you eventually lose packets.
> corosync/pacemaker doesn't like this and sometimes reacts badly to heavy
> packet loss.
It's not really pacemaker that is affected, but corosync's totem
protocol implementation. T
On 2012-10-16T23:04:59, RaSca wrote:
> Hi all,
> I hope that you can help me with this strange problem. I've got a nine
> node cluster which is configured with no-quorum-policy set to stop.
> Two days ago I came across this error on one of the nodes:
>
> Oct 14 00:00:38 kvm06 kernel: Uhhuh. NMI rece
On 2012-10-16T11:20:21, alain.mou...@bull.net wrote:
> OK thanks.
>
> And so what are the real consequences for Pacemaker on a HA cluster if we
> remove all cib-xx.raw and cib-xx.raw.sig ?
None. They are backup data for support.
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products
On 2012-10-16T08:20:56, alain.mou...@bull.net wrote:
> may I ask the formula or values of max number of children with regard to
> the number of processors ?
http://hg.linux-ha.org/glue/rev/1f36e9cdcc13 - number of CPUs divided by
two or four, whatever is lower.
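A literal reading of that rule can be sketched in Python (the authoritative logic is in the linked cluster-glue commit; this function is only an illustration of the quoted formula, clamped to a minimum of one child):

```python
import os

def default_max_children(cpus: int) -> int:
    # Literal reading of "number of CPUs divided by two or four,
    # whatever is lower" -- for any positive CPU count the lower
    # of the two quotients is cpus // 4; clamp to at least 1 child.
    return max(1, min(cpus // 2, cpus // 4))

print(default_max_children(os.cpu_count() or 1))
```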
Regards,
Lars
--
Architect
On 2012-09-27T17:32:58, Ulrich Windl wrote:
> Just a note: As it turned out, the Xen RA (SLES11 SP2,
> resource-agents-3.9.3-0.7.1) is broken, because migrate will never look at
> the node_ip_attribute you configured.
>
> It's line 369:
> target_attr="$OCF_RESKEY_CRM_node_ip_attribute"
>
> I
On 2012-09-27T16:36:08, Ulrich Windl wrote:
Hi Ulrich,
we always appreciate your friendly, constructive and non-condescending
feedback.
> However if you specify a duration like "P2", the duration is not added to the
> current time; instead the current time is used as lifetime (it seems):
"P2"
On 2012-09-24T08:45:39, Ulrich Windl wrote:
> So I select one unique attribute name for Xen migration, specify that
> in the Xen resource, and then define that attribute per node, using
> one of the node's own IP addresses?
Yes. The idea is that this allows you to override the IP address we'd
pic
On 2012-09-05T06:26:50, Stefan Schloesser wrote:
> Hi Lars,
>
> my problem with the rolling upgrade is the drbd partition. If you migrate the
> service its data will move too. If you then restart the cluster and migrate
> back the data will not be in an upgraded state and thus not match the bi
On 2012-09-05T07:54:46, Andrew Beekhof wrote:
> > (Or rather, obscure enough to configure that it might well be a
> > bug.) It'd be trivial to just append the role to the operation key
> > too. (It'd cause a few monitors to be recreated on update, but
> > that'd be harmless.)
> Not really that tr
On 2012-09-04T15:56:14, Stefan Schloesser wrote:
> Hi,
>
> I would like to know what the recommended way is to update a cluster. Every
> week or so, bug fixes and security patches are released for various parts of
> the software used.
I prefer rolling upgrades; migrate service, stop cluster,
On 2012-09-04T10:50:11, "EXTERNAL Konold Martin (erfrakon, RtP2/TEF72)"
wrote:
> I was reporting a serious bug in _your_ product and instead of
> thanking for the bugreport you simply closed it as invalid
The bug was reported without a support contract. A support contract
usually being the pre
On 2012-08-31T13:41:14, Ulrich Windl wrote:
> Hi!
>
> There are things I don't understand: Even after
> # /usr/lib64/heartbeat/send_arp -i 200 -r 5 br0 172.20.3.59 f1e991b1b951
> not_used not_used
>
> neither the local arp table (arp) nor the software bridge (brctl ...
> showmacs) know anythi
On 2012-08-30T12:53:45, Stefan Schloesser wrote:
> I would like to configure the resource-stickiness to "0" tuesdays between 2
> and 2:20 am local time.
>
> I could not find any examples on how to do this using crm configure ... but
> only the XML snippets to accomplish this.
I don't think th
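The XML snippets the poster refers to would look roughly like a rule attached to rsc_defaults; this is a sketch with placeholder ids (weekdays="2" is Tuesday, hours="2" with minutes="0-19" covers 02:00-02:19 local time):

```xml
<rsc_defaults>
  <meta_attributes id="stickiness-tue-window">
    <rule id="rule-tue-window" score="INFINITY">
      <date_expression id="expr-tue-window" operation="date_spec">
        <date_spec id="spec-tue-window" weekdays="2" hours="2" minutes="0-19"/>
      </date_expression>
    </rule>
    <nvpair id="stickiness-zero" name="resource-stickiness" value="0"/>
  </meta_attributes>
</rsc_defaults>
```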
On 2012-08-29T13:31:05, Ulrich Windl wrote:
> > Well, you should see the MAC/IP mapping in the arp table if the host
> > is on the same ethernet segment, yes. Otherwise the host doesn't
> > know where to send the packets to.
> I checked the arp table of the host that is hosting the cluster IP
> a
On 2012-08-20T11:31:07, Lars Marowsky-Bree wrote:
> Okay, so there's a bug in the NFS agent, point taken. I'll investigate
> why it took so long to release as a real maintenance update; you're
> right, that shouldn't happen. (I can already see it in the update queue
On 2012-08-29T10:15:50, Ulrich Windl wrote:
> The network guys say no. Should "arp" show the Cluster-IP? I cannot see it,
> so I wonder if something's wrong.
Well, you should see the MAC/IP mapping in the arp table if the host is
on the same ethernet segment, yes. Otherwise the host doesn't kno
On 2012-08-27T12:14:46, Ulrich Windl wrote:
> Hi!
>
> I set up a Clustered Samba Server with SLES11 SP2 according to the manual
> "Chapter 18. Samba Clustering". Everything seems to run now, but I cannot
> reach the configured clustered IP address from an outside host. Local pings
> on the IP
On 2012-08-23T09:35:51, Francis SOUYRI wrote:
> Hello Dejan,
>
> On FC 16, heartbeat is 3.0.4, not a v1.
>
> I do not use crm because I managed to get ipfail working.
Dejan was referring to the "v1 mode", namely the one that uses
haresources. haresources can't drive systemd scripts. You
On 2012-08-22T10:32:57, RaSca wrote:
> Thank you Lars,
> In fact, this is what I've done and now everything is ok. But I want to
> understand one last thing: if the ID is calculated with the value of
> interval then why I don't have errors even if I've got two slaves, which
> means that I've got
On 2012-08-22T10:08:14, RaSca wrote:
> Thank you Ulrich,
> As far as you know, Is there a way to override the ID for each cloned
> instance of the mysql resource? How can I resolve the problem?
Just make the intervals slightly different - 31s, 30s, 29s ...
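Since the operation ID is derived from the operation name plus its interval, the advice amounts to giving each recurring monitor a distinct interval; in crm shell terms, something like this (resource name and timeouts hypothetical):

```
primitive p-mysql ocf:heartbeat:mysql \
    op monitor role="Master" interval="30s" \
    op monitor role="Slave" interval="31s"
```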
Regards,
Lars
--
Architect Stor
On 2012-08-21T15:39:06, Carlos Pedro wrote:
> Dear Sirs,
>
> I'm working on a project
> and was asked to build three clusters using a common node, that
> is:
Nodes cannot be shared between clusters like this.
You can either build a >2 node cluster (with all nodes in one), or use
virtual i
On 2012-08-21T13:16:29, David Lang wrote:
> with ldirectord you have an extra network hop, and you have all your
> traffic going through one system. This is a scalability bottleneck as
> well as being a separate system to configure.
>
> CLUSTERIP isn't the solution to every problem, but it works
On 2012-08-21T14:32:53, Ulrich Windl wrote:
> Maybe I'm expecting too much, but isn't it possible to simply log "Telling
> other nodes that PV blabla is being created"?
The problem is the error case, in which we want more logs. There is
progress (libqb with the flight recorder/blackbox thingy w
On 2012-08-21T00:22:00, Dimitri Maziuk wrote:
> CLUSTERIP which you presumably mean by "fun with iptables" is basically
> "Jack gets all calls from even area codes and Jill: from odd area
> codes". Yeah, you could do that, I just can't imagine why.
>
> Because the commonly given rationale for a
On 2012-08-17T18:14:18, "EXTERNAL Konold Martin (erfrakon, RtP2/TEF72)"
wrote:
> On the other hand, you so far did not provide any case where SLES11 SP2 runs
> reliably unmodified in a mission-critical environment (e.g. a HA NFS server)
> without local bugfixes.
Okay, so there's a bug in the NF
On 2012-08-17T16:42:42, Ulrich Windl wrote:
> obviously not, because I have the latest updates installed. It happens
> frequently enough to care about it:
>
> # zgrep sscan /var/log/messages-201208*.bz2 |wc -l
> 76
> Here are some:
> /var/log/messages-20120816.bz2:Aug 16 13:55:21 so3 crmd: [270
On 2012-08-17T16:38:01, "EXTERNAL Konold Martin (erfrakon, RtP2/TEF72)"
wrote:
> > I don't see an open bug for something like this right now.
> Are you serious?
>
> It was you who resolved this bug as INVALID in bugzilla
> https://bugzilla.novell.com/show_bug.cgi?id=769292.
Uhm, yes, I was se
On 2012-08-17T11:43:13, Nikita Michalko wrote:
> - e.g. the problem with SLES 11 SP2 kernels crash - the same as described by
> Martin:
> >> SP2 kernels crash seriously (when a node rejoins the cluster) when using
> SCTP as
> >> recommended in the SLES HA documentation and offered via the wiza
On 2012-08-17T08:19:45, Ulrich Windl wrote:
> Likewise if you use resource utilization on primitives in a group, the group
> begins to start on one node, then stalls when the next primitive's
> utilization cannot be fulfilled. That's bad especially when there are enough
> resources for the wh
On 2012-08-17T08:41:15, Nikita Michalko wrote:
> I am also testing SP2 - and yes, it's true: not yet ready for production ;-(
What problems did you find?
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB
21284 (AG Nürnb
On 2012-08-16T17:54:06, "EXTERNAL Konold Martin (erfrakon, RtP2/TEF72)"
wrote:
Hi Martin,
> From my experience with SLES11 SP2 (with all current updates) I conclude that
> actually nobody is seriously running SP2 without local bugfixes.
That isn't quite true.
> E.g. Even the most simple exam
On 2012-08-16T09:51:52, Ulrich Windl wrote:
> Hi!
>
> Can somebody explain (found in resource-agents-3.9.2-0.25.5 of SLES11 SP2):
> # OCF_RESKEY_ip=172.20.3.99 OCF_RESKEY_cidr_netmask=26
> /usr/lib64/heartbeat/findif -C
> eth0 netmask 26 broadcast 172.20.3.127
>
> If I guess that 26 bi
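The numbers findif prints line up with ordinary CIDR arithmetic; a quick cross-check with Python's ipaddress module (not part of findif itself, just an independent verification of the quoted output):

```python
import ipaddress

# OCF_RESKEY_ip=172.20.3.99 with OCF_RESKEY_cidr_netmask=26:
# a /26 splits the last octet into blocks of 64, so .99 falls
# in 172.20.3.64-172.20.3.127 and the broadcast is .127.
iface = ipaddress.ip_interface("172.20.3.99/26")
print(iface.network.prefixlen)          # 26
print(iface.network.broadcast_address)  # 172.20.3.127
```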
On 2012-08-14T17:48:47, Ulrich Windl wrote:
FWIW, if you can try to reproduce in 1.1.7, that may be interesting. I'm
still not sure on the sequence of events to cause it, so I can't try
locally.
hb_report would be the minimum.
> Your message arrived, BTW. ;-)
It's not that we don't want to hel
On 2012-08-14T16:59:02, Ulrich Windl wrote:
> While starting a clone resource (mount OCFS2 filesystem), I see this message
> in syslog:
>
> crmd: [31942]: notice: do_lrm_invoke: Not creating resource for a delete
> event: (null)
> info: notify_deleted: Notifying 25438_crm_resource on rkdvmso1
On 2012-08-14T12:44:43, Ulrich Windl wrote:
> > The messages are coming from the stonith plugin (it's actually
> > in pacemaker). But I think that that got fixed in the meantime.
^
> > Do you have the latest maintenance update?
>
> Yes, "latest" on SLES is relative:
>
> # rpm -qf
On 2012-07-18T20:01:35, Arnold Krille wrote:
> That would mean that your system runs the same whether one or two links are
> present.
That's not what I said. What I said (or at least meant ;-) is that, even
in the degraded state, the performance must still be within acceptable
range.
Hence, th
On 2012-07-17T23:44:13, Arnold Krille wrote:
> Additionally: If it's two direct links dedicated to your storage network,
> there is no reason to go active/backup and discard half of the
> available bandwidth.
Since the system must be designed for one link to have adequate
bandwidth to provide
On 2012-07-16T11:53:55, Volker Poplawski wrote:
> Hello everyone.
>
> Could you please tell me the recommended mode for a bonded network
> interface, which is used as the direct link in a two machine cluster?
>
> There are 'balance-rr', 'active-backup', 'balance-xor' etc
> Which one to choose
On 2012-07-12T10:31:53, Caspar Smit wrote:
> Now the interesting part. I would like to create a software raid6 set
> (or multiple) with the disks in the JBOD and have the possibility to
> use
> the raid6 in an active/passive cluster.
Sure. md RAID in a fail-over configuration is managed by the R
On 2012-07-03T11:26:11, darren.mans...@opengi.co.uk wrote:
> I'd like to second Lars' comments here. I was strong-armed into doing a
> dual-primary DRBD + OCFS2 cluster and it's a nightmare to manage. There's no
> reason for us to do it other than 'we could'. It just needed something simple
> l
On 2012-07-02T12:37:52, Ulrich Windl wrote:
> > I've seen very few scenarios where OCFS2 was worth it over just using a
> > "regular" file system like XFS in a fail-over configuration in this kind
> > of environment.
> How would you fail over if your shared storage went toast? Or did you mean
>
On 2012-07-02T12:05:33, Ulrich Windl wrote:
> Unfortunately unless there's a real cluster filesystem that supports
> mirroring with shared devices also, DRBD on some locally mirrored device on
> each node seems to be the only alternative. (Talking about disasters)
I've seen very few scenarios
On 2012-07-02T10:42:33, "EXTERNAL Konold Martin (erfrakon, RtP2/TEF72)"
wrote:
> when a split brain (drbd) happens mount.ocfs2 remains hanging unkillable in
> D-state.
Unsurprising, since all IO is frozen during that time (depending on your
drbd setup, but I'm assuming that's what you are seei
On 2012-06-28T11:37:37, Heitor Lessa wrote:
> This issue happens because OCFS2 does not support modifying or deleting
> nodes in a running cluster; such tasks require taking the cluster down.
If driven by Pacemaker, OCFS2 does support adding/removing nodes at
runtime.
(Though if you run out of nod
On 2012-06-29T08:19:41, Ulrich Windl wrote:
> > For SLE HA 11 SP1, please report these issues to NTS and SUSE support.
> As I'm sure they won't fix it in SP1 (that PTF is one year old now),
SP1 is still supported by SUSE, and no one but our support folks know
what exactly is in that PTF. I mean,
On 2012-06-27T14:18:26, Ulrich Windl wrote:
> Hello,
>
> I see problems with applying configuration diffs so frequently that I suspect
> there's a bug in the code.
>
> This is for SLES11 SP1 on x86_64 with corosync-1.4.1-0.3.3.3518.1.PTF.712037
> and libcorosync4-1.4.1-0.3.3.3518.1.PTF.712037
On 2012-06-21T08:02:25, Ulrich Windl wrote:
> > See, it's simple. Any "partially" completed operation or state -> not
> > successful, ergo failure must be reported.
> Is it correct that the standard recovery procedure for this failure is node
> fencing then? If so it makes things worse IMHO.
Th
On 2012-06-20T17:46:19, Andreas Kurz wrote:
> > hb_report does not work.
> > How do I create a report tarball?
> It has been renamed to "crm_report"
Both are still around. It's just that different distributions ship
different implementations.
Because. Well. Because.
Regards,
Lars
--
Archite
On 2012-06-20T16:37:35, Ulrich Windl wrote:
> so what exit code is failed? Then: With the standard logic of "stop"
> only performing when the resource is up (i.e. monitor reports
> "stopped"), a partially started resource that the monitor considers
> "stopped" may fail to be cleanly stopped on "s
On 2012-06-20T08:44:33, Ulrich Windl wrote:
> > > The problem is: What to do if "1 out of n" exports fails: Is the resource
> > > "started" or "stopped" then. Likewise for unexporting and monitoring.
> > If the operation partially failed, it is failed.
>
> But to have a "clean stopped", the res
On 2012-06-19T14:13:06, Ulrich Windl wrote:
> The problem is: What to do if "1 out of n" exports fails: Is the resource
> "started" or "stopped" then. Likewise for unexporting and monitoring.
If the operation partially failed, it is failed.
Regards,
Lars
--
Architect Storage/HA
SUSE LIN
On 2012-06-19T08:38:11, alain.mou...@bull.net wrote:
> So that means that my modifications via crm configure edit, even if they
> are correct (I've re-checked them),
> have potentially corrupted the Pacemaker configuration?
No. The CIB automatically recovers from this by doing a full sync. The
m
On 2012-06-06T17:26:41, RaSca wrote:
> Thank you Florian, but how can one declare an anonymous clone? Is it
> implicit with the globally-unique=false?
You don't need to explicitly declare that. It is the default.
(But yes, the default is globally-unique=false.)
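So in crm shell terms an anonymous clone is simply this (names hypothetical; the meta attribute is shown only for clarity, since false is already the default):

```
clone cl-foo p-foo meta globally-unique="false"
```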
Regards,
Lars
--
Architec
On 2012-06-01T13:10:17, alain.mou...@bull.net wrote:
> - does that mean that it will be this Pacemaker/cman on RH and SLES?
> - or will RH and SLES require a different stack under Pacemaker?
Right now, SLE HA is on the plugin version of pacemaker, and SLE HA 11
will likely remain on it - that'
On 2012-05-15T13:17:11, William Seligman wrote:
> I can post details and logs and whatnot, but I don't think I need to do
> detailed
> debugging. My question is:
I don't think your rationale holds true, though. Like Andrew said, this
is only ever just written, not read.
> If I were to set up a
On 2012-04-04T11:28:31, Rainer Krienke wrote:
> There is one basic thing however I do not understand: My setup involves
> only a clustered filesystem. What I do not understand is why a stonith
> resource is needed at all in this case which after all causes freezes
> of the cl-filesystem dependin
On 2012-04-03T15:59:00, Rainer Krienke wrote:
> Hi Lars,
>
> this was something I detected already. And I changed the timeout in the
> cluster configuration to 200sec. So the log I posted was the result of
> the configuration below (200sec). Is this still too small?
>
> $ crm configure show
> ..
On 2012-04-03T15:50:29, Rainer Krienke wrote:
> rzinstal4:~ # sbd -d /dev/disk/by-id/scsi-259316a7265713551-part1 dump
> ==Dumping header on disk /dev/disk/by-id/scsi-259316a7265713551-part1
> Header version : 2
> Number of slots: 255
> Sector size: 512
> Timeout (watchdog) : 90
>
On 2012-04-03T14:06:44, Rainer Krienke wrote:
> thanks for the hint to enable the stonith resource. I did and checked
> that it is set to true now, but after all the behaviour of the cluster
> is still the same, if I do a halt -f on one node.
> Access on the clusterfilesystem on the still running
On 2012-04-03T10:32:48, Rainer Krienke wrote:
Hi Rainer,
> I am new to HA setup and my first try was to set up a HA cluster (using
> SLES 11 SP2 and the SLES11 SP2 HA extension) that simply offers an
> OCFS2 filesystem. I did the setup according to the SLES 11 SP2 HA
> manual, that describes th
On 2012-03-29T11:31:38, Ulrich Windl wrote:
> pengine: [17043]: WARN: pe_fence_node: Node h07 will be fenced because it is
> un-expectedly down
>
> The software being used is basically SLES11 SP1 with a newer corosync
> (corosync-1.4.1-0.3.3.3518.1.PTF.712037). Were there any improvements since
On 2012-03-15T15:59:21, William Seligman wrote:
> Could this be an issue? I've noticed that my fencing agent always seems to be
> called with "action=reboot" when a node is fenced. Why is it using 'reboot'
> and
> not 'off'? Is this the standard, or am I missing a definition somewhere?
Make sur
On 2012-03-14T18:22:42, William Seligman wrote:
> Now consider a primary-primary cluster. Both run the same resource.
> One fails. There's no failover here; the other box still runs the
> resource. In my case, the only thing that has to work is cloned
> cluster IP address, and that I've verified
On 2012-03-14T11:41:53, William Seligman wrote:
> I'm mindful of the issues involved, such as those Lars Ellenberg
> brought up in his response. I need something that will failover with
> a minimum of fuss. Although I'm encountering one problem after
> another, I think I'm closing in on my goal.
On 2012-03-14T09:02:59, William Seligman wrote:
To ask a slightly different question - why? Does your workload require /
benefit from a dual-primary architecture? Most don't.
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
On 2012-02-06T22:13:20, Mayank wrote:
> with-rsc="pgsql9" with-rsc-role="Master"/>
>
> The intention behind defining such constraints is to make sure that the
> PostgreSQL should always run in the master role on the node which is the DC.
>
> Is something wrong with this?
There's nothing wrong with
On 2012-02-06T09:05:13, Ulrich Windl wrote:
> but like with CPU affinity there should be no needless change of the DC. I
> also wondered why after each configuration change the DC is newly elected (it
> seems).
It isn't (or shouldn't be). Still, the DC election is an internal detail
that shoul
On 2011-12-08T12:08:06, Ulrich Windl wrote:
> >>> Dejan Muhamedagic wrote on 08.12.2011 at 11:28 in
> message <20111208102833.GA12338@walrus.homenet>:
> > Hi,
> >
> > On Wed, Dec 07, 2011 at 02:26:52PM +0100, Ulrich Windl wrote:
> > > Hi!
> > >
> > > While the openais cluster (SLES11 SP1)
On 2011-12-05T22:37:03, Andreas Kurz wrote:
> Did you clone the sbd resource? If yes, don't do that. Start it as a
> primitive, so in case of a split brain at least one node needs to start
> the stonith resource which should give the other node an advantage ...
> adding a start-delay should furth
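One plausible reading of that advice as a crm shell fragment, matching the primitive shown later in this thread (the delay and timeout values here are arbitrary examples, not recommendations):

```
primitive stonith-sbd stonith:external/sbd \
    op start interval="0" timeout="60s" start-delay="15s" \
    meta target-role="Started"
```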
On 2011-12-04T00:57:05, Andreas Kurz wrote:
> the concept of an arbitrator for split-site cluster is already
> implemented and should be available with Pacemaker 1.1.6, though it seems
> to be "not directly documented" ... besides the source code and this draft
> document:
Documentation is always a wor
On 2011-12-01T13:48:56, Ulrich Windl wrote:
> I wonder about the usefulness of that value, especially as any configuration
> change seems to increase the epoch anyway. I never saw that CRM cares about
> the cib-last-written string.
It is for easy inspection by admins and for display by UIs.
On 2011-11-29T12:36:39, Dimitri Maziuk wrote:
> If you repeatedly try to re-sync with a dying disk, with each resync
> interrupted by i/o error, you will get data corruption sooner or later.
No, you shouldn't. (Unless the drive returns faulty data on read, which
is actually a pretty rare failure
On 2011-11-29T22:10:10, Andreas Kurz wrote:
> IIRC stonith resources are always started first and stopped last anyways
> ... without extra constraints ... implicitly. Please someone correct me
> if I'm wrong.
Yes, but they are not mandatory. The configuration that was discussed
here would actual
On 2011-11-29T08:33:01, Ulrich Windl wrote:
> The state of an unmanaged resource is the state when it left the managed
> meta-state.
That is not correct. An unmanaged resource is not *managed*, but its
state is still relevant to other resources that possibly depend on it.
The original design g
On 2011-11-28T15:04:45, alain.mou...@bull.net wrote:
> sorry but I forgot if there is another way than "crm configure edit" to
> modify
> all the value of on-fail="" for all resources in the configuration ?
If they're explicitly set, you have to modify them all.
Otherwise, look at op_defau
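If none of the resources set on-fail explicitly, a single cluster-wide default can be set instead of editing every operation; a sketch in crm shell terms (whether op_defaults honours on-fail depends on your Pacemaker version, so treat this as an assumption to verify):

```
crm configure op_defaults on-fail="restart"
```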
On 2011-11-29T08:35:10, Ulrich Windl wrote:
> While we're at it: Does specifying a priority implicitly create a start
> order, or is that just a start preference? Maybe if sbd is not handled
> specially, it may be a good idea to give sbd a higher priority than anything
> else, at least if
On 2011-11-24T14:46:13, Andrew Beekhof wrote:
> >> Looks like you forgot to specify the sbd_device parameter.
> > That is no longer necessary. It'll inherit the settings from
> > /etc/sysconfig/sbd.
> Perhaps it's not set there then?
> Its the only difference I can see between the two types of cal
On 2011-11-24T11:14:05, Andrew Beekhof wrote:
> > Relevant portions of crm config:
> > primitive stonith-sbd stonith:external/sbd \
> > meta is-managed="true" target-role="Started"
>
> Looks like you forgot to specify the sbd_device parameter.
That is no longer necessary. It'll inherit t