On 2016-04-15 07:46, Klaus Wenninger wrote:
Which IP-address did you use to ssh to that box? One controlled
by pacemaker and possibly being migrated or a fixed one assigned
to that box?
Good try but no: the "sunken" (as opposed to floating ;) address of course.
If what digimer says is true, i
On 2016-04-24 16:20, Ken Gaillot wrote:
Correct, you would need to customize the RA.
Well, you wouldn't because your custom RA will be overwritten by the
next RPM update.
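For what it's worth, one way around that (a sketch, assuming the stock OCF layout; IPaddr2 is just a stand-in for whatever agent you're customizing) is to copy the agent into your own provider directory instead of editing the packaged one:

  mkdir -p /usr/lib/ocf/resource.d/local
  cp /usr/lib/ocf/resource.d/heartbeat/IPaddr2 /usr/lib/ocf/resource.d/local/
  # edit the copy, then configure the resource as ocf:local:IPaddr2

The resource-agents RPM only owns resource.d/heartbeat, so updates leave the copy alone.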
Dimitri
On 2016-04-26 00:58, Klaus Wenninger wrote:
But what you are attempting doesn't sound entirely proprietary.
So once you have something that looks like it might be useful
for others as well let the community participate and free yourself
from having to always take care of your private copy ;-)
On 2016-05-05 23:50, Moiz Arif wrote:
Hi Dimitri,
Try a cleanup of the fail count for the resource with any of the commands
below:
via pcs : pcs resource cleanup rsyncd
Tried it, didn't work. Tried pcs resource debug-start rsyncd -- got no
errors, resource didn't start. Tried disable/enabl
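For the record, the related checks look roughly like this (assuming the resource is still named rsyncd):

  pcs resource failcount show rsyncd   # did the cleanup actually zero the failcount?
  pcs constraint --full                # any leftover location constraints or bans?
  pcs resource clear rsyncd            # drop constraints left behind by move/ban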
On 2016-05-17 09:21, Ken Gaillot wrote:
What happens after "pcs resource cleanup"? "pcs status" reports the
time associated with each failure, so you can check whether you are
seeing the same failure or a new one.
The system log is usually the best starting point, as it will have
messages from
On 2016-06-04 01:10, Digimer wrote:
We're running postfix/dovecot/postgres for our mail on an HA cluster,
but we put it all in a set of VMs and made the VMs HA on DRBD.
Hmm. I deliver to ~/Maildir and /home is NFS-mounted all over the place,
so my primary goal is HA NFS server. I'd hesitate t
On 2016-06-08 09:11, Ken Gaillot wrote:
On 06/08/2016 03:26 AM, Jan Pokorný wrote:
Pacemaker can drive systemd-managed services for quite some time.
This is as easy as changing lsb:dovecot to systemd:dovecot.
Great! Any chance that could be mentioned on
http://www.linux-ha.org/wiki/Resour
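For anyone finding this in the archives, the pcs side of that change looks roughly like this (resource name and monitor interval are just placeholders):

  pcs resource delete dovecot
  pcs resource create dovecot systemd:dovecot op monitor interval=60s

i.e. recreate the resource with the systemd: class instead of pointing it at the LSB init script.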
On 2016-06-18 05:15, Ferenc Wágner wrote:
...
On the other hand, one could argue that restarting failed services
should be the default behavior of systemd (or any init system). Still,
it is not.
As an off-topic snide comment, I never understood the thinking behind
that: restarting without rem
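It is opt-in per unit; roughly, in the unit's [Service] section:

  [Service]
  Restart=on-failure
  RestartSec=5

which is also exactly the sort of thing you want turned *off* for services pacemaker manages, so the cluster stays the only thing deciding when and where to restart them.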
On 2016-06-20 09:13, Jehan-Guillaume de Rorthais wrote:
I've heard this kind of argument multiple times in the field, but sooner or later
these clusters actually had a split brain scenario with clients connected on
both sides, some very bad corruption, data loss, etc.
I'm sure it's a very helpfu
On 2016-06-20 17:19, Digimer wrote:
Nikhil indicated that they could switch where traffic went up-stream
without issue, if I understood properly.
They have an interesting setup, but that notwithstanding: if split
brain happens, some clients will connect to the "old master" and some to
the "new mas
On 2016-08-10 10:04, Jason A Ramsey wrote:
Traceback (most recent call last):
File "eps/fence_eps", line 14, in
if sys.version_info.major > 2:
AttributeError: 'tuple' object has no attribute 'major'
Replace with sys.version_info[0]
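Something along these lines should do it (untested; the path assumes the packaged agent location):

  sed -i 's/sys\.version_info\.major/sys.version_info[0]/' /usr/sbin/fence_eps

sys.version_info only grew the .major attribute in python 2.7; indexing works on 2.6 and later alike.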
Dima
On 2016-08-26 08:56, Ken Gaillot wrote:
On 08/26/2016 08:11 AM, Gabriele Bulfon wrote:
I tried adding some debug in ocf-shellfuncs, showing env and ps -ef into
the corosync.log
I suspect it's always using ksh, because in the env output I produced I
find this: KSH_VERSION=.sh.version
This is norm
On 2016-08-29 04:06, Gabriele Bulfon wrote:
Thanks, though this does not work :)
Uhm... right. Too many languages, sorry: perl's system() will call the
login shell, libc's system() uses /bin/sh, and exec()s will run whatever
the programmer tells them to. The point is none of them cares what
On 2016-08-30 03:44, Dejan Muhamedagic wrote:
The kernel reads the shebang line and it is what defines the
interpreter which is to be invoked to run the script.
Yes, and does the kernel read it when the script is source'd or executed
via any of the mechanisms that have the executable specified i
On 2016-08-31 03:59, Dejan Muhamedagic wrote:
On Tue, Aug 30, 2016 at 12:32:36PM -0500, Dimitri Maziuk wrote:
I expect you're being deliberately obtuse.
Not sure why you think that
Because the point I was trying to make was that having the shebang line say
#!/opt/swf/bin/bash
does not guara
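A throwaway demo of the point (not from the thread):

  cat > /tmp/demo <<'EOF'
  #!/opt/swf/bin/bash
  echo "bash=$BASH_VERSION ksh=$KSH_VERSION"
  EOF
  chmod +x /tmp/demo
  /tmp/demo      # kernel honours the shebang -- if that bash actually exists
  sh /tmp/demo   # shebang is just a comment here; /bin/sh runs the script
  . /tmp/demo    # sourced: runs in whatever shell you are already in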
On 2016-09-06 14:04, Devin Ortner wrote:
I have a 2-node cluster running CentOS 6.8 and Pacemaker with DRBD.
I have been using the "Clusters from Scratch" documentation to create my
cluster and I am running into a problem where DRBD is not failing over
to the other node when one goes down.
I f
On 2016-09-08 02:03, Digimer wrote:
You need to solve the problem with fencing in DRBD. Leaving it off WILL
result in a split-brain eventually, full stop. With working fencing, you
will NOT get a split-brain, full stop.
"Split brain is a situation where, due to temporary failure of all
networ
On 2016-09-14 09:30, NetLink wrote:
1. Put node 2 in standby
2. Change and configure the new bigger disk on node 2
3. Put node 2 back online and wait for syncing.
4. Put node 1 in standby and repeat the procedure
Would this approach work?
I wonde
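The pcs side of that sequence would be roughly (node names are placeholders, and this assumes DRBD is what does the resyncing):

  pcs cluster standby node2      # step 1: move everything off node 2
  # ... swap in the bigger disk, recreate the DRBD backing device ...
  pcs cluster unstandby node2    # step 3: let it rejoin
  cat /proc/drbd                 # wait for the resync to finish
  pcs cluster standby node1      # step 4: repeat on the other node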
On 2016-09-20 09:53, Ken Gaillot wrote:
I do think ifdown is not quite the best failure simulation, since there
aren't that many real-world situations that merely take an interface
down. To simulate network loss (without pulling the cable), I think
maybe using the firewall to block all traffic to
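Something along these lines, if you do want to simulate losing the peer without touching the link (the peer address is a placeholder; remember to flush the rules afterwards):

  iptables -A INPUT  -s 192.168.122.12 -j DROP
  iptables -A OUTPUT -d 192.168.122.12 -j DROP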
On 2016-10-07 01:18, Ulrich Windl wrote:
Any hardware may fail at any time. We even had an onboard NIC that
stopped operating correctly one day; we had CPU cache errors, RAM
parity errors, PCI bus errors, and everything you can imagine.
:) http://dilbert.com/strip/1995-06-24
Our vendor's b
On 2016-10-15 01:56, Jay Scott wrote:
So, what's wrong? (I'm a newbie, of course.)
Here's what worked for me on centos 7:
http://octopus.bmrb.wisc.edu/dokuwiki/doku.php?id=sysadmin:pacemaker
YMMV and all that.
cheers,
Dima
On 2016-10-17 02:12, Ulrich Windl wrote:
Have you tried a proper variant of "lsof" before? So maybe you know
which process might block the device. I also think if you have LVM on
top of DRBD, you must deactivate the VG before trying to unmount.
No LVM here: AFAIMC these days it's another solut
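For the archives, the checks Ulrich means are along the lines of (device and mount point are guesses):

  lsof /dev/drbd0        # who still has the device or its mount open
  fuser -vm /mnt/drbd    # same question, asked of the mounted filesystem
  vgchange -an vg_drbd   # with LVM on top, deactivate the VG before drbdadm down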
On 2016-10-18 01:18, Ulrich Windl wrote:
Ah, that rings a bell: sometimes when kernel modules are updated,
some scripts think they must unload modules, then reload them. With the new
kernel not booted yet, the modules on disk don't fit the running
kernel. Maybe your problem is like this?
I t
On 2016-11-22 10:35, Jason A Ramsey wrote:
Can anyone recommend a bulletproof process for OS patching a pacemaker
cluster that manages a drbd mirror (with LVM on top of the drbd and luns
defined for an iscsi target cluster if that matters)? Any time I’ve
tried to mess with the cluster, it seems l
On 2016-11-23 02:23, Ulrich Windl wrote:
I'd recommend making a backup of the DRBD data (you always should, anyway),
then shut down the cluster, upgrade all the needed components, then start the
cluster again. Do your basic tests. If you corrupted your data, re-create DRBD
from scratch. Then test
On 2016-11-24 10:41, Toni Tschampke wrote:
We recently did an upgrade for our cluster nodes from Wheezy to Jessie.
IIRC it's the MIT CS joke that they have clusters whose uptime goes way
back past the manufacturing date of any/every piece of hardware they're
running on. They aren't linux-ha c
On 2017-04-13 01:39, Jan Pokorný wrote:
After a bit of a search, the best practice at the list server seems to
be:
[...] if you change the message (eg, by adding a list signature or
by adding the list name to the Subject field), you *should* DKIM
sign.
This is of course going entirely off-to
On 2017-04-16 15:04, Eric Robinson wrote:
On 16/04/17 01:53 PM, Eric Robinson wrote:
I was reading in "Clusters from Scratch" where Beekhof states, "Some
would argue that two-node clusters are always pointless, but that is an
argument for another time."
What you want to know is whether the c
On 4/22/2017 12:02 PM, Digimer wrote:
Having SBD properly configured is *massively* safer than no fencing at
all. So for people where other fence methods are not available for
whatever reason, SBD is the way to go.
Now you're talking. IMO in a 2-node cluster, a node that kills itself in
respo
On 4/22/2017 11:51 PM, Andrei Borzenkov wrote:
As a real life example (not Linux/pacemaker): a panicking node flushed
disk buffers, so it was not safe to access the shared filesystem until
this was complete. This could take quite a lot of time, so without an agent
on *surviving* node(s) that monitors an
On 2017-05-17 06:24, Lentes, Bernd wrote:
...
I'd like to know what the software in use is doing. Am I the only one having
that opinion?
No.
How do you solve the problem of a deathmatch or killing the wrong node ?
*I* live dangerously with fencing disabled. But then my clusters only
r
On 2017-06-16 02:21, Eric Robinson wrote:
Someone talk me off the ledge here.
Step over to the *bsd side. They have cookies. Also zfs.
And no lennartware, that alone's worth $700/year.
Dima
On 2017-06-16 10:16, Digimer wrote:
On 16/06/17 11:07 AM, Eric Robinson wrote:
Step over to the *bsd side. They have cookies. Also zfs.
And no lennartware, that alone's worth $700/year.
Dima
I left BSD for Linux back in 2000 or so. I have often been wistful for those
days. ;-)
--Eric
Jok
On 7/12/2017 4:33 AM, ArekW wrote:
Hi,
Can it be fixed so that drbd does not enter split brain after cluster
node recovery?
I always configure "after-sb*" handlers and drbd-level fence but I never
ran it with allow-two-primaries. You'll have to read the fine manual on how
that works in a dual-prim
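For reference, the sort of thing I mean (a fragment in drbd 8.4-ish syntax; where 'fencing' lives differs between 8.4 and 9, see the follow-up below):

  resource r0 {
    net {
      after-sb-0pri discard-zero-changes;
      after-sb-1pri discard-secondary;
      after-sb-2pri disconnect;
    }
    disk {
      fencing resource-and-stonith;
    }
    handlers {
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
  }

With allow-two-primaries in the picture, the automatic after-split-brain recovery options are much more limited, hence the "read the fine manual" above.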
On 7/14/2017 3:57 AM, ArekW wrote:
Hi, I have stonith running and tested. The problem was that there is a
mistake in the drbd documentation: the 'fencing' option belongs in the net
section (not disk).
If you are running NFS on top of a dual-primary DRBD with some sort of a
cluster filesystem, I'd think *that* is your prob
On 7/17/2017 4:51 AM, Lentes, Bernd wrote:
I'm asking myself if a DRBD configuration wouldn't be more redundant and highly
available.
...
Is DRBD in conjunction with a database (MySQL or Postgres) possible?
Have you seen https://github.com/ewwhite/zfs-ha/wiki ? -- I recently
deployed one and
On 7/17/2017 2:07 PM, Chris Adams wrote:
However, just like RAID is not a replacement for backups, DRBD is IMHO
not a replacement for database replication. DRBD just replicates the
database files, so, for example, file corruption would be copied from
host to host. When something provides a n
On 7/19/2017 1:29 AM, Ulrich Windl wrote:
Maybe it's like with the cluster: Once you have set it up correctly,
it runs quite well, but the way to get there may be painful. I quit my
experiments with dual-primary DRBD in some early SLES11 (SP1), because
it fenced a lot and refused to come up auto
So yesterday I ran yum update that pulled in the new pacemaker and tried
to restart it. The node went into its usual "can't unmount drbd because
kernel is using it" and got stonith'ed in the middle of yum transaction.
The end result: DRBD reports split brain, HA daemons don't start on
boot, RPM
On 2017-07-23 07:40, Valentin Vidic wrote:
It seems you did not put the node into standby before the upgrade as it
still had resources running. What was the old/new pacemaker version there?
Versions: whatever's in centos repos.
Any attempts to migrate the services: standby, reboot, etc. resu
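For the record, the suggested sequence is roughly this, one node at a time (node name is a placeholder):

  pcs cluster standby node1      # drain resources off this node first
  yum update -y                  # kernel, drbd, pacemaker, whatever is pending
  reboot                         # pick up the new kernel and matching modules
  pcs cluster unstandby node1    # rejoin and take resources back

It's the standby step that wedges here: anything that tries to stop the DRBD resources runs into the "can't unmount, kernel is using it" failure.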
On 2017-07-24 07:51, Tomer Azran wrote:
We don't have the ability to use it.
Is that the only solution?
No, but I'd recommend thinking about it first. Are you sure you will
care about your cluster working when your server room is on fire? 'Cause
unless you have halon suppression, your server
On 2017-08-01 03:05, Stephen Carville (HA List) wrote:
Can
clustering even be done reliably on CentOS 6? I have no objection to
moving to 7 but I was hoping I could get this up quicker than building
out a bunch of new balancers.
I have a number of centos 6 active/passive pairs running heartb