On 2011-01-12T22:52:14, Bart Coninckx bart.conin...@telenet.be wrote:
Jan 12 22:20:34 xen2 pengine: [6633]: WARN: unpack_rsc_op: Processing failed
op intranet1_stop_0 on xen1: unknown exec error (-2)
My monitors are set to restart a resource. What makes the PE decide to fence
the node in
Hi all,
sorry for the delay in posting this.
Introduction: At LPC 2010, we discussed (once more) that a key feature
for pacemaker in 2011 would be improved support for multi-site clusters;
by multi-site, we mean two (or more) sites with a local cluster each,
and some higher level entity
On Thu, Jan 13, 2011 at 10:14:09AM +0100, Lars Marowsky-Bree wrote:
Introduction: At LPC 2010, we discussed (once more) that a key feature
for pacemaker in 2011 would be improved support for multi-site clusters;
by multi-site, we mean two (or more) sites with a local cluster each,
and some
On Thursday 13 January 2011 09:51:16 Lars Marowsky-Bree wrote:
On 2011-01-12T22:52:14, Bart Coninckx bart.conin...@telenet.be wrote:
Jan 12 22:20:34 xen2 pengine: [6633]: WARN: unpack_rsc_op: Processing
failed op intranet1_stop_0 on xen1: unknown exec error (-2)
My monitors are set to
On 2011-01-13T11:08:49, Bart Coninckx bart.conin...@telenet.be wrote:
thx for your answer.
So do I get this straight:
- resource undergoes monitor operation
- monitor reports failure
- a restart of the resource is issued (stop and start)
- stop fails
- PE decides to fence the node because
On Thursday 13 January 2011 11:13:42 Lars Marowsky-Bree wrote:
On 2011-01-13T11:08:49, Bart Coninckx bart.conin...@telenet.be wrote:
thx for your answer.
So do I get this straight:
- resource undergoes monitor operation
- monitor reports failure
- a restart of the resource is issued
On Thursday 13 January 2011 11:58:03 Lars Marowsky-Bree wrote:
On 2011-01-13T11:48:41, Bart Coninckx bart.conin...@telenet.be wrote:
I notice that you work at Novell, this is a SLES11SP1 installation so if the
resource agent for Xen is faulty I guess you know about it?
Yes, I think I'd know
On 2011-01-13 13:16, Bart Coninckx wrote:
On Thursday 13 January 2011 11:58:03 Lars Marowsky-Bree wrote:
On 2011-01-13T11:48:41, Bart Coninckx bart.conin...@telenet.be wrote:
I notice that you work at Novell, this is a SLES11SP1 installation so if the
resource agent for Xen is faulty I guess you
Bart Coninckx wrote:
By the way: things seem better when I change the monitor time out to 30
seconds instead of 10 seconds. Very strange though, because the resource
agent basically does an xm list --long while monitoring, which takes less
than half a second in a console.
I think sometimes
On 2011-01-13T09:30:48, Michael Smith msm...@cbnco.com wrote:
the resource agent basically does an xm list --long while
monitoring, which takes less than half a second in a console.
I think sometimes xend hangs for a while. 30 seconds should be good.
There's a pending fix for this, which
I read the thread related to this startup problem (dlm segfaults when
server comes up with corosync auto starting up). I just have one
follow-up question:
The 3.0.7 package in Ubuntu-HA has not been patched for Lucid yet and there
is not a backport of 3.0.12 for Lucid to fix this problem. So
Tom, others,
Please, what was the solution to this issue?
Thanks,
Bob Haxo
On Mon, 2010-09-06 at 09:50 +0200, Tom Tux wrote:
Yes, corosync is running after the reboot. It comes up with the
regular init-procedure (runlevel 3 in my case).
2010/9/6 Andrew Beekhof and...@beekhof.net:
On
I don't know. I still have this issue (and it seems that I'm not the
only one...). I'll check whether there are pacemaker updates available
through the zypper update channel (sles11-sp1).
Regards,
Tom
2011/1/13 Bob Haxo bh...@sgi.com:
Tom, others,
Please, what was the solution to this
So, Tom... how do you get the failed node online?
I've re-installed with the same image that is running on three other
nodes, but it still fails. This node was quite happy for the past 3
months. As I'm testing installs, this and other nodes have been
installed a significant number of times
Hi,
I have some brand new HP Blades with ILO Boards (iLO 2 Standard Blade Edition
1.81 ...)
But I'm not able to connect with them via the external/riloe agent.
When I try:
stonith -t external/riloe -p hostlist=node1 ilo_hostname=ilo1
ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1
Hi Tom (and Andrew),
I figured out an easy fix for the problem that I encountered. However,
there would seem to be a problem lurking in the code.
Here is what I found. On one of the servers that was online and hosting
resources:
r2lead1:~ # netstat -a | grep crm
Proto RefCnt Flags Type
Hi Christoph,
Have you taken a look in /usr/lib64/stonith/plugins/external?
The ipmi plugin might serve as a coding example/template. Or maybe the
drac5 plugin. At first glance, drac5 appears to be using ssh.
Bob Haxo
On Thu, 2011-01-13 at 21:09 +0100, Christoph Herrmann wrote:
Hi,
I have
Hi guys,
I'm having a hard time finding the info I need to configure pacemaker
from an input file. I've been using Zookeeper a lot in our application
tier, so I'm familiar with clusters, however I'm struggling to adapt
that knowledge to the pacemaker configuration.
Here is an overview of our
When I use the command
crm configure property start-failure-is-fatal=FALSE
it shows:
WARNING: status: operation not recognized
WARNING: status: operation not recognized
WARNING: status: operation not recognized
WARNING: status: operation not recognized
WARNING: status: operation not recognized