On Mon, Jun 14, 2010 at 9:26 PM, Maros Timko tim...@gmail.com wrote:
Date: Mon, 14 Jun 2010 08:13:59 +0200
From: Andrew Beekhof and...@beekhof.net
To: The Pacemaker cluster resource manager
pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] How to really deal with gateway restarts?
On Mon, Jun 14, 2010 at 4:22 PM, Vadym Chepkov vchep...@gmail.com wrote:
On Jun 7, 2010, at 8:04 AM, Vadym Chepkov wrote:
I filed bug 2435, glad to hear it's not me
Andrew closed this bug
(http://developerbugs.linux-foundation.org/show_bug.cgi?id=2435) as resolved,
but I respectfully
On Tuesday 15 June 2010 08:40:58 Andrew Beekhof wrote:
On Mon, Jun 14, 2010 at 4:22 PM, Vadym Chepkov vchep...@gmail.com wrote:
On Jun 7, 2010, at 8:04 AM, Vadym Chepkov wrote:
I filed bug 2435, glad to hear it's not me
Andrew closed this bug
Good day!
I have 2 servers: Main and Backup. Both servers have 2 Ethernet controllers.
One controller is used for LAN, the second for Internet.
IP addresses:
Main: 192.168.104.101/27 89.151.191.133/29
Backup: 192.168.104.102/27 89.151.191.134/29
shared IP for heartbeat: 192.168.104.100 and
On Tue, Jun 15, 2010 at 10:23 AM, Andreas Kurz andreas.k...@linbit.com wrote:
On Tuesday 15 June 2010 08:40:58 Andrew Beekhof wrote:
On Mon, Jun 14, 2010 at 4:22 PM, Vadym Chepkov vchep...@gmail.com wrote:
On Jun 7, 2010, at 8:04 AM, Vadym Chepkov wrote:
I filed bug 2435, glad to hear it's
2010/6/15 Michail Bogatyrev bogat...@mfisoft.ru:
Good day!
I have 2 servers: Main and Backup. Both servers have 2 Ethernet
controllers. One controller is used for LAN, the second for Internet.
IP addresses:
Main: 192.168.104.101/27 89.151.191.133/29
Backup: 192.168.104.102/27
Hi,
On Tue, Jun 15, 2010 at 10:57:47AM +0200, Andrew Beekhof wrote:
On Tue, Jun 15, 2010 at 10:23 AM, Andreas Kurz andreas.k...@linbit.com
wrote:
On Tuesday 15 June 2010 08:40:58 Andrew Beekhof wrote:
On Mon, Jun 14, 2010 at 4:22 PM, Vadym Chepkov vchep...@gmail.com wrote:
On Jun 7,
Hi,
On Tue, Jun 15, 2010 at 11:09:15AM +0200, patrik.rappo...@knapp.com wrote:
Hi guys,
my colleague gave me a tip that the stonith resource on node 1, when node
2 is offline, won't work because of a false state (it can't reach the asm module
of node 2), and so the other resources (vg, lv)
On Tue, Jun 15, 2010 at 12:14 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:
Hi,
On Tue, Jun 15, 2010 at 10:57:47AM +0200, Andrew Beekhof wrote:
On Tue, Jun 15, 2010 at 10:23 AM, Andreas Kurz andreas.k...@linbit.com
wrote:
On Tuesday 15 June 2010 08:40:58 Andrew Beekhof wrote:
On Mon,
On Tue, Jun 15, 2010 at 12:30:45PM +0200, Andrew Beekhof wrote:
On Tue, Jun 15, 2010 at 12:14 PM, Dejan Muhamedagic deja...@fastmail.fm
wrote:
Hi,
On Tue, Jun 15, 2010 at 10:57:47AM +0200, Andrew Beekhof wrote:
On Tue, Jun 15, 2010 at 10:23 AM, Andreas Kurz andreas.k...@linbit.com
On Tue, Jun 15, 2010 at 12:39 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:
On Tue, Jun 15, 2010 at 12:30:45PM +0200, Andrew Beekhof wrote:
On Tue, Jun 15, 2010 at 12:14 PM, Dejan Muhamedagic deja...@fastmail.fm
wrote:
Hi,
On Tue, Jun 15, 2010 at 10:57:47AM +0200, Andrew Beekhof
On Jun 15, 2010, at 4:57 AM, Andrew Beekhof wrote:
On Tue, Jun 15, 2010 at 10:23 AM, Andreas Kurz andreas.k...@linbit.com
wrote:
On Tuesday 15 June 2010 08:40:58 Andrew Beekhof wrote:
On Mon, Jun 14, 2010 at 4:22 PM, Vadym Chepkov vchep...@gmail.com wrote:
On Jun 7, 2010, at 8:04 AM, Vadym
On Tue, Jun 15, 2010 at 1:38 PM, Vadym Chepkov vchep...@gmail.com wrote:
On Jun 15, 2010, at 4:57 AM, Andrew Beekhof wrote:
On Tue, Jun 15, 2010 at 10:23 AM, Andreas Kurz andreas.k...@linbit.com
wrote:
On Tuesday 15 June 2010 08:40:58 Andrew Beekhof wrote:
On Mon, Jun 14, 2010 at 4:22 PM,
On Tue, Jun 15, 2010 at 1:50 PM, Andrew Beekhof and...@beekhof.net wrote:
[snip]
Score = -inf, plus the patch, plus sequential = true (or unset).
Not sure how that looks in shell syntax though.
Which patch?
___
Pacemaker mailing list:
Hi,
I have just made an update from heartbeat 2.x to the latest pacemaker with
heartbeat and corosync from the clusterlabs repo on a 2 node CentOS-Cluster.
(uninstall the heartbeat rpm, yum install from the new repo)
The Cluster is holding one IP-resource. When I start the first node with
Original Message
Date: Tue, 15 Jun 2010 14:13:19 +0200
From: Testuser SST fatcha...@gmx.de
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] after update one node in crm is getting offline CentOS
Hi,
I have just made an update from heartbeat 2.x to the latest pacemaker
Hi,
I'm sorry, but the problem was caused by some kind of watchdog script which
stopped the heartbeat service.
Kind Regards
f_c
Original Message
Date: Tue, 15 Jun 2010 14:13:19 +0200
From: Testuser SST fatcha...@gmx.de
To: pacemaker@oss.clusterlabs.org
Subject:
On Jun 15, 2010, at 6:14 AM, Dejan Muhamedagic wrote:
Hi,
On Tue, Jun 15, 2010 at 10:57:47AM +0200, Andrew Beekhof wrote:
On Tue, Jun 15, 2010 at 10:23 AM, Andreas Kurz andreas.k...@linbit.com
wrote:
On Tuesday 15 June 2010 08:40:58 Andrew Beekhof wrote:
On Mon, Jun 14, 2010 at 4:22 PM,
On Jun 15, 2010, at 8:11 AM, Gianluca Cecchi wrote:
On Tue, Jun 15, 2010 at 1:50 PM, Andrew Beekhof and...@beekhof.net wrote:
[snip]
Score = -inf, plus the patch, plus sequential = true (or unset).
Not sure how that looks in shell syntax though.
Which patch?
On 2010-06-14T17:24:16, Aleksey Zholdak alek...@zholdak.com wrote:
Hi Aleksey,
Can anybody explain the following more clearly than the official and (IMHO)
outdated page http://www.linux-ha.org/wiki/SBD_Fencing does?
Which timeouts must I specify if my multipath needs 90 to 160
secs to be switched
On Tue, Jun 15, 2010 at 2:57 PM, Vadym Chepkov vchep...@gmail.com wrote:
On Jun 15, 2010, at 7:50 AM, Andrew Beekhof wrote:
On Tue, Jun 15, 2010 at 1:38 PM, Vadym Chepkov vchep...@gmail.com wrote:
On Jun 15, 2010, at 4:57 AM, Andrew Beekhof wrote:
On Tue, Jun 15, 2010 at 10:23 AM, Andreas
On Tue, Jun 15, 2010 at 9:14 AM, Andrew Beekhof and...@beekhof.net wrote:
On Tue, Jun 15, 2010 at 2:57 PM, Vadym Chepkov vchep...@gmail.com wrote:
On Jun 15, 2010, at 7:50 AM, Andrew Beekhof wrote:
On Tue, Jun 15, 2010 at 1:38 PM, Vadym Chepkov vchep...@gmail.com wrote:
On Jun 15, 2010, at
Can anybody explain the following more clearly than the official and (IMHO)
outdated page http://www.linux-ha.org/wiki/SBD_Fencing does?
Which timeouts must I specify if my multipath needs 90 to 160
secs to switch off the dead path... The timeouts below may be
wrong, because sometimes node1 kills node2
On 2010-06-15T16:32:12, Aleksey Zholdak alek...@zholdak.com wrote:
Timeout (watchdog) : 180
Timeout (allocate) : 2
Timeout (loop) : 10
Timeout (msgwait) : 200
But I see that node1 resets node2 (or vice versa, or each other)
when it does not update its slot for 10 seconds...
sbd
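For anyone following the timeout question, here is a minimal sketch (not from the thread) of a sanity check on the four values posted above. It assumes two rules of thumb: the watchdog timeout should exceed the worst-case multipath failover time, and msgwait should be longer than the watchdog timeout, commonly about twice as long. Both rules, and the names SbdTimeouts, check_timeouts and mpio_failover_max, are assumptions made for illustration; the sketch does not call sbd and does not explain why the nodes reset each other.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SbdTimeouts:
    """Timeouts in seconds, named after the fields in the dump above."""
    watchdog: int
    allocate: int
    loop: int
    msgwait: int


def check_timeouts(t: SbdTimeouts, mpio_failover_max: int) -> List[str]:
    """Return warnings for values that break the assumed rules of thumb.

    mpio_failover_max is a hypothetical parameter: the longest time (in
    seconds) multipath may need to switch off a dead path.
    """
    warnings = []
    # Assumed rule 1: the watchdog must outlast the slowest path failover.
    if t.watchdog <= mpio_failover_max:
        warnings.append("watchdog timeout does not exceed multipath failover time")
    # Assumed rule 2: msgwait must be longer than watchdog, ideally about 2x.
    if t.msgwait <= t.watchdog:
        warnings.append("msgwait timeout is not longer than watchdog timeout")
    elif t.msgwait < 2 * t.watchdog:
        warnings.append("msgwait timeout is less than twice the watchdog timeout")
    return warnings


if __name__ == "__main__":
    # Values as posted in the thread; 160 s is the upper bound quoted for multipath.
    posted = SbdTimeouts(watchdog=180, allocate=2, loop=10, msgwait=200)
    for warning in check_timeouts(posted, mpio_failover_max=160):
        print("warning:", warning)
```

With the posted values the only thing this flags is the msgwait/watchdog ratio, which is a guideline rather than a hard requirement.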
On Tue, Jun 15, 2010 at 3:36 PM, Lars Marowsky-Bree l...@novell.com wrote:
On 2010-06-15T16:32:12, Aleksey Zholdak alek...@zholdak.com wrote:
[snip]
Why is the MPIO scenario so slow?
These questions need to be put to the mptsas developers (Novell + HP)
You should really file a service
Can you elaborate on "does not update its slot for 10 seconds" more
clearly?
Unfortunately, how sbd works is not described in detail anywhere, so many
things can only be guessed ... So I may have misunderstood its
logic ... And anyway, I've ended up confused.
If I set 180 secs to
On Tue, Jun 15, 2010 at 2:48 PM, Vadym Chepkov vchep...@gmail.com wrote:
On Jun 15, 2010, at 8:11 AM, Gianluca Cecchi wrote:
On Tue, Jun 15, 2010 at 1:50 PM, Andrew Beekhof and...@beekhof.net wrote:
[snip]
Score = -inf, plus the patch, plus sequential = true (or unset).
Not sure how that
On 2010-06-15T17:04:39, Aleksey Zholdak alek...@zholdak.com wrote:
Can you elaborate on "does not update its slot for 10 seconds" more
clearly?
Unfortunately, how sbd works is not described in detail anywhere,
Uhm, what is unclear about http://www.linux-ha.org/wiki/SBD_Fencing ? It
does explain how
Lars,
Uhm, what is unclear about http://www.linux-ha.org/wiki/SBD_Fencing ? It
does explain how sbd works (not all of the timeouts though).
Exactly! It does not explain the loop timeout, for example...
Depending on the watchdog device you are using, it is conceivable that
it refuses to accept a
On 2010-06-15T17:32:51, Aleksey Zholdak alek...@zholdak.com wrote:
Uhm, what is unclear about http://www.linux-ha.org/wiki/SBD_Fencing ? It
does explain how sbd works (not all of the timeouts though).
Exactly! It does not explain the loop timeout, for example...
The loop timeout is just the time
On Tue, Jun 15, 2010 at 4:08 PM, Gianluca Cecchi
gianluca.cec...@gmail.com wrote:
On Tue, Jun 15, 2010 at 2:48 PM, Vadym Chepkov vchep...@gmail.com wrote:
On Jun 15, 2010, at 8:11 AM, Gianluca Cecchi wrote:
On Tue, Jun 15, 2010 at 1:50 PM, Andrew Beekhof and...@beekhof.net
wrote:
[snip]
On Tue, Jun 15, 2010 at 4:43 PM, Andrew Beekhof and...@beekhof.net wrote:
But that is for the 1.1 branch, which is not considered stable...
No, the existing functionality is very stable.
It's just the new features that might have some extra corner cases
we've not seen exercised yet.
Put it
I thought the dampen attribute could help with some of the options, but
actually it does not.
It should do. Hard to say without any logs from the two machines.
Unfortunately I don't have the log files here; I can provide them if that would help.
Are you sure dampen should help here? From my testing it
On Fri, Jun 11, 2010 at 03:45:19PM +0100, Maros Timko wrote:
Hi all,
using heartbeat stack. I have a system with one node offline:
Last updated: Fri Jun 11 13:52:40 2010
Stack: Heartbeat
Current DC: vsp7.example.com (ba6d6332-71dd-465b-a030-227bcd31a25f) -
partition with
On Jun 15, 2010, at 9:26 AM, Vadym Chepkov wrote:
what about this part? what do I need to do to prevent them from running on
different nodes for sure?
You can't have it both ways.
Either they have to run on the same node or they can remain active
when one or more die.
Although you
On Tuesday 15 June 2010, Schaefer, Diane E wrote:
Hi,
We are having trouble with our two node cluster after one node
experiences an abrupt power failure. The resources do not seem to start
on the remaining node (i.e. DRBD resources do not promote to master). In
the log we notice:
Jan
Thanks for the idea. Is there any way to automatically recover resources
without manual intervention?
Diane
Hello Diane,
the problem is that pacemaker is not allowed to take over resources until
stonith succeeds, as it simply does not know the state of the other
server. Let's assume the other node were still up and running, had
mounted a shared storage device and were writing to it,
On Tue, Jun 15, 2010 at 5:13 PM, Gianluca Cecchi
gianluca.cec...@gmail.com wrote:
On Tue, Jun 15, 2010 at 4:43 PM, Andrew Beekhof and...@beekhof.net wrote:
But that is for the 1.1 branch, which is not considered stable...
No, the existing functionality is very stable.
It's just the new features
On 06/14/2010 11:01 PM, Vadym Chepkov wrote:
On Mon, Jun 14, 2010 at 4:37 PM, Erich Weiler wei...@soe.ucsc.edu wrote:
Hi All,
We have this interesting problem I was hoping someone could shed some light
on. Basically, we have 2 servers acting as a pacemaker cluster for DRBD and
VirtualDomain
On Tuesday, 15 June 2010 20:25:09, Dennis J. wrote:
(...)
Has anybody played with this yet:
http://www.linux-kvm.com/content/qemu-kvm-012-adds-block-migration-feature
Technically something like this should make it possible to do a live
migration even when not using shared storage. I
On Jun 15, 2010, at 3:36 PM, Dejan Muhamedagic wrote:
Hi,
On Tue, Jun 15, 2010 at 08:45:37AM -0400, Vadym Chepkov wrote:
On Jun 15, 2010, at 6:14 AM, Dejan Muhamedagic wrote:
Hi,
On Tue, Jun 15, 2010 at 10:57:47AM +0200, Andrew Beekhof wrote:
On Tue, Jun 15, 2010 at 10:23 AM,
On Tue, Jun 15, 2010 at 01:50:06PM +0200, Andrew Beekhof wrote:
On Tue, Jun 15, 2010 at 1:38 PM, Vadym Chepkov vchep...@gmail.com wrote:
On Jun 15, 2010, at 4:57 AM, Andrew Beekhof wrote:
On Tue, Jun 15, 2010 at 10:23 AM, Andreas Kurz andreas.k...@linbit.com
wrote:
On Tuesday 15 June
On Tue, Jun 15, 2010 at 12:53:07PM -0400, Vadym Chepkov wrote:
On Jun 15, 2010, at 9:26 AM, Vadym Chepkov wrote:
what about this part? what do I need to do to prevent them from running
on different nodes for sure?
You can't have it both ways.
Either they have to run on the same
On Tue, Jun 15, 2010 at 03:41:17PM -0400, Vadym Chepkov wrote:
On Jun 15, 2010, at 3:36 PM, Dejan Muhamedagic wrote:
Hi,
On Tue, Jun 15, 2010 at 08:45:37AM -0400, Vadym Chepkov wrote:
On Jun 15, 2010, at 6:14 AM, Dejan Muhamedagic wrote:
Hi,
On Tue, Jun 15, 2010 at
Hi,
On Tue, Jun 15, 2010 at 05:09:14PM +0100, Maros Timko wrote:
On Fri, Jun 11, 2010 at 03:45:19PM +0100, Maros Timko wrote:
Hi all,
using heartbeat stack. I have a system with one node offline:
Last updated: Fri Jun 11 13:52:40 2010
Stack: Heartbeat
Current DC:
Hi,
On Tue, Jun 15, 2010 at 01:15:08PM -0600, Dan Urist wrote:
I've recently had exactly the same thing happen. One (highly kludgey!)
solution I've considered is hacking a custom version of the stonith IPMI
agent that would check whether the node was at all reachable following a
stonith
On Tue, 15 Jun 2010 22:08:37 +0200
Dejan Muhamedagic deja...@fastmail.fm wrote:
Hi,
On Tue, Jun 15, 2010 at 01:15:08PM -0600, Dan Urist wrote:
I've recently had exactly the same thing happen. One (highly
kludgey!) solution I've considered is hacking a custom version of
the stonith IPMI
Hi,
On Tue, Jun 15, 2010 at 02:25:51PM -0600, Dan Urist wrote:
On Tue, 15 Jun 2010 22:08:37 +0200
Dejan Muhamedagic deja...@fastmail.fm wrote:
Hi,
On Tue, Jun 15, 2010 at 01:15:08PM -0600, Dan Urist wrote:
I've recently had exactly the same thing happen. One (highly
kludgey!)
On Tue, Jun 15, 2010 at 04:44:31PM -0400, Vadym Chepkov wrote:
On Jun 15, 2010, at 3:55 PM, Dejan Muhamedagic wrote:
On Tue, Jun 15, 2010 at 03:41:17PM -0400, Vadym Chepkov wrote:
On Jun 15, 2010, at 3:36 PM, Dejan Muhamedagic wrote:
Hi,
On Tue, Jun 15, 2010 at 08:45:37AM
On Jun 15, 2010, at 5:26 PM, Dejan Muhamedagic wrote:
On Tue, Jun 15, 2010 at 04:44:31PM -0400, Vadym Chepkov wrote:
On Jun 15, 2010, at 3:55 PM, Dejan Muhamedagic wrote:
On Tue, Jun 15, 2010 at 03:41:17PM -0400, Vadym Chepkov wrote:
On Jun 15, 2010, at 3:36 PM, Dejan Muhamedagic
On Tuesday 15 June 2010, Dejan Muhamedagic wrote:
Hi,
On Tue, Jun 15, 2010 at 02:25:51PM -0600, Dan Urist wrote:
On Tue, 15 Jun 2010 22:08:37 +0200
Dejan Muhamedagic deja...@fastmail.fm wrote:
Hi,
On Tue, Jun 15, 2010 at 01:15:08PM -0600, Dan Urist wrote:
I've recently had
You'd have a message in the logs about the driver rejecting the timeout,
I think.
That's what I see in the logs:
sles2 sbd: [5059]: notice: Using watchdog device: /dev/watchdog
sles2 sbd: [5059]: info: Set watchdog timeout to 180 seconds.
sles2 kernel: [ 68.552201] hpwdt: New timer passed in
53 matches