routing), but option 1 is way simpler.
Dima
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http
James R. Leu wrote:
For locally originated connections that do not bind to an interface you
can use the SNAT target of iptables.
iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to-source 192.168.1.3
There's another problem with using cluster ip for outgoing address: if
it fails over in the
Bernie Wu wrote:
The company I work for wants us to start investigating HA.
My first POC setup was a 2 node cluster with a floating IP and that worked
out quite well.
Now the second POC was to work with an application, in this case, apache2 in
an active/passive configuration.
My question is
apps, those will break during
failover.
Dima
blue_hmq wrote:
Hi, I am sorry to disturb you, but I have some questions about using HA.
I configured HA on two computers (HA01 as master, HA02), using the apache2
service.
If I stop the apache2 service on HA01, but the Ethernet and heartbeat are
OK, in this situation will HA02 take over
Ehlers, Kolja wrote:
Could somebody explain how the APC Smart shutdown command works? Does it
actually allow to take away the power from only one of the
connected servers or does it just take away the power from everything that is
connected?
IIRC we have the previous model of that network
Ehlers, Kolja wrote:
Hello Dima,
can you explain how you get that card to shutdown the whole UPS or how to
send the powerfail command to connected machines (are you
talking about the PowerChute agent?). Is any of that possible through a
stonith plugin?
I never got a 'round tuit: we have
cluster should give you enough time to reinstall the OS if
one fails. I would keep a spare disk handy.
Dimitri
like it's using a restart action, usually coded as stop followed
by start.
If the daemon isn't running to begin with, stop will fail.
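If you control the init script, one way around this is to follow the LSB convention that "stop" on a service that is already stopped counts as success. A minimal sketch, assuming a pidfile-based daemon; the pidfile path and the function name are made up for illustration:

```sh
#!/bin/sh
# Hypothetical pidfile location; adjust for the real daemon.
PIDFILE="${PIDFILE:-/var/run/mydaemon.pid}"

# stop_daemon: return 0 if the daemon is stopped afterwards, including
# the case where it was not running to begin with.
stop_daemon() {
    # No pidfile: treat as already stopped, which counts as success.
    [ -f "$PIDFILE" ] || return 0
    pid=$(cat "$PIDFILE")
    # Stale pidfile (process is gone): clean up and report success.
    if ! kill -0 "$pid" 2>/dev/null; then
        rm -f "$PIDFILE"
        return 0
    fi
    # Process is alive: ask it to terminate, then remove the pidfile.
    kill "$pid" && rm -f "$PIDFILE"
}
```

With a stop like this, a restart coded as stop followed by start no longer fails just because the daemon was down.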
Dima
that stop will wait for the daemons to shut down gracefully. Which
means it could potentially block the failover for a while (unlikely worst
case: deadlock).
Dima
ideas?
You have anything nfs-mounted (automounted?) on the servers? That tends to do
it.
Dimitri
.
Especially if it's nfs-mounted from its own ipaddr as you'd do with /home.
While we're at it, you did move the rpc-pipefs mountpoint out of /var/lib/nfs,
right?
Dima
need to manually change the mount point somewhere as to not reference
the sym linked path?
Yes. See www.linux-ha.org/HaNFS -- you need to reconfigure idmapd and the
kernel module.
Dima
Cameron Smith wrote:
Yeah!
Since I already have that in place for http and mysql I just wanted to know
if there was anything unique I need to do for postfix config for when it is
running on primary (managed by heartbeat) and how do I handle the sending of
system emails on the secondary
Michael Schwartzkopff wrote:
The only reason to do a postfix cluster is to deliver locally queued mail
after
a failover.
Ah! That's what I didn't think of.
In theory you could restart postfix w/ different config files: send
only on the passive node and full setup on the active.
Dima
and raid-0,1,10 is fast and very safe.
I'd set it up as 1 system disk + 2 raid-1 vm/data space + 1 spare for the
raid. And buy a seagate, wd, and a hitachi for those last 3.
Dima
: if it's a simple /etc/init.d/service restart then you can of
course do (and modify haresources accordingly) it without restarting
heartbeat. I doubt you can easily do that with IPAddr or, say, DRBD resource.
Dimitri
give you better advice, but what I'd do at this point
is boot both systems with drbd and no heartbeat (I think it should come
up as secondary/secondary), boot both systems with heartbeat but no
drbd (e.g. with ipaddr only) and see which one works.
Dima
in particular).
When I need a 3-node cluster I'll think about those. Until then, 2.1.4 is not
perfect but it works well enough.
Dima
the hardware is just fine.
Everything just became slower.
Is your filesystem 75-80% full? That would do it on most unix filesystems,
esp. when served over nfs.
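A quick way to check is to look at the Capacity column of df. A rough sketch; the function names are made up, and the default threshold of 75 is just the lower end of the range mentioned above:

```sh
#!/bin/sh
# usage_pct: read `df -P` output on stdin and print the Capacity value
# of the first filesystem listed, without the percent sign.
usage_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# fs_too_full PATH [LIMIT]: succeed if the filesystem holding PATH is at
# or above LIMIT percent used (default 75).
fs_too_full() {
    pct=$(df -P "$1" | usage_pct)
    [ -n "$pct" ] && [ "$pct" -ge "${2:-75}" ]
}
```

For example, `fs_too_full /export && echo "time to clean up"` (the /export path is only an illustration).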
Dima
, upgrade to the latest every time.
Dima
On Wednesday 11 August 2010 18:26, Greg Woods wrote:
On Wed, 2010-08-11 at 17:13 -0500, Dimitri Maziuk wrote:
So is it not practical to run RHEL or CentOS 5.x where you'd get this
version and several more years of distro maintenance?
It's not practical if you want to have both distro
On Wednesday 11 August 2010 23:29, Greg Woods wrote:
On Wed, 2010-08-11 at 20:01 -0500, Dimitri Maziuk wrote:
That aside, the real problem for me is I haven't seen V2-style docs that
actually made sense yet.
I found the clusterlabs documents useful, but I too had to learn much
through
you configure
active/passive nfs. (Note that pacemaker does resource health monitoring, so
you don't need mon anymore -- here's how you tell it to check the status of
your rpc.statd and initiate failover if it's sick.)
That's all I need.
Dima
On Friday 13 August 2010 04:11, Dejan Muhamedagic wrote:
On Thu, Aug 12, 2010 at 03:56:09PM -0500, Dimitri Maziuk wrote:
On Thursday 12 August 2010 15:08, Dejan Muhamedagic wrote:
On the plus side, there are more people available creating the mess.
Right. So forking off another heartbeat
... google for tunnel udp over ssh to
see how to make a splint out of netcat and a fifo. (What's the smiley
for straight face.)
Dima
I'm doing and why.
Dima
On Monday 16 August 2010 14:35, Andrew Beekhof wrote:
On Mon, Aug 16, 2010 at 7:30 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu
wrote:
...
On the other hand, back in v1 days resource startup was done by a shell
script, and resource monitoring was done by a perl script.
v1 had no resource
will lose pings because it takes time to fail over to the other
server. Expect it to be longer than ping's timeout.
You will also lose stateful connections, e.g. servlet - applet, imap,
etc., unless your services know how to replicate state to the other server.
Dima
On 9/21/2010 10:06 AM, Steve Davies wrote:
- Kill the master (A).
- The slave (B) is coming up
- Some transient issue prevents the RC scripts running on (B).
- (B) backs down and requests to become slave again
- (A) is down, so (B) never gets confirmation of its slave request.
Nothing more
;
}
(in the common section)
HTH
Dima
On 10/4/2010 9:28 PM, Karl Kloppenborg wrote:
Hi Dimitri!
Thank you so much for your input, you've set me on the track to fixing the
problem.
It was indeed the killproc nfsd -9 :)
Weird: I wouldn't expect the system to crash because of it. NFS daemons
not restarting possibly, but
to care about split brain and everything that
comes with that.
Dima
Serge Dubrouski wrote:
On Tue, Oct 19, 2010 at 1:49 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
The easiest fix was to create /drbdfs/pgsql with proper ownership and
symlink /var/lib/pgsql to it. Now that he's recompiled everything, who
knows.
Or manually mount /var/lib/pgsql/data
are not updated to suit --configure options).
Dima
the
new one running now that I know how it's done, and the vendor replies
I dispute that.
This is the kind of thing that makes me tell people that compared to
linux/drbd/nfs, a $40K netapp is cheap at the price.
Dima
which part of all this says 'run ifup eth0:0, then start
httpd, then samba on linuxha1' in one simple easy to read sentence?
It's not about syntax. It's not about advanced features. It's about a
new user setting up a simple stupid active/passive failover pair.
Dima
operation WebSite_start_0 (call=9, rc=1, cib-update=34,
confirmed=true) unknown error
What the hell does it mean and how do I fix it?
Dima
Dimitri Maziuk wrote:
LRM operation WebSite_start_0 (call=9, rc=1, cib-update=34,
confirmed=true) unknown error
OTOH, changing to "crm no" and putting
"nodename ip/mask httpd" in haresources
gives me running apache.
Tell me about advantages of heartbeat v2 again.
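For reference, the R1-style setup described here comes down to something like the fragment below. This is a sketch, not a copy of the poster's files: the node name is borrowed from another message in this archive, the address and netmask are placeholders, and "smb" as the samba init script name is an assumption.

```
# /etc/ha.d/ha.cf (fragment)
crm no

# /etc/ha.d/haresources: node name, then resources in start order
linuxha1 192.168.1.3/24 httpd smb
```

Heartbeat starts the resources on the line left to right (IP alias first, then httpd, then samba) and stops them in the reverse order.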
Dima
primitive website lsb:httpd. And there is no such thing
as unknown error -- there's messed up config file, socket already in
use, apache binary is wrong elfclass or not executable at all, and
that's about it.
Dima
not found
or 403 forbidden would've been nice. It's RA: waiting for apache to
come up followed by the unknown error and then lrmd: WebSite stop.
Dima
Dejan Muhamedagic wrote:
Hi,
On Tue, Nov 02, 2010 at 12:01:37PM -0500, Dimitri Maziuk wrote:
And no, it doesn't log anything useful -- 404 /server-status not found
or 403 forbidden would've been nice.
Yes, obviously it could do better.
So can 50% of my code (the rest should be taken out
On 11/4/2010 4:50 AM, Lars Ellenberg wrote:
On Thu, Nov 04, 2010 at 10:33:27AM +0100, Dejan Muhamedagic wrote:
...
Did you take a look at the RA meta-data (crm ra info apache)?
Or http://www.linux-ha.org/doc/
specifically http://www.linux-ha.org/doc/re-ra-apache.html
Of course not: neither
On 11/12/2010 7:43 PM, Syn, Joonho wrote:
I think you have to
- remove the journal of the ext3 partition: tune2fs -O ^has_journal [my
device]
- fsck at this point
- delete and recreate the partition using fdisk
- resize2fs at this point
- check the newly expanded partition for errors: fsck -n [my
On 11/29/2010 8:24 AM, Mia Lueng wrote:
How can i monitor the nic link status to protect the virtual ip address?
Run heartbeat pings over eth0?
I've been wondering about that: if I have ucast other node's IP and
one of the nodes loses its eth0, it should cause the failover. What
happens if
realize that since the node is not connected to any network, you
don't actually care if it has a split brain or not?
With an even number of nodes you may not have quorum. With two nodes you
can't have quorum.
Dima
are on the net, only one runs ipaddr/apache, and 2) if one of them is
off the net, the one still on runs ipaddr/apache. The rest is featuritis
and bloat.
(Obviously, drbd, 2+-node clusters, etc. are a whole different story.)
Dima
off grepping for link detected in the output of ethtool --
that's not portable and may change in the next version of ethtool. And
so on.
Dima
update on the fileservers and find out it helpfully fixed your broken
(as in pointing to the drbd filesystem that isn't mounted) symlinks
and you have to spend the next 8 hours unfscking the resulting mess.
(That is aside from the stuff you mentioned earlier.)
Dima
(logged in as root you don't depend on shared
homedirs, ldap, etc.) so you don't notice the problem right away. Now I
simply check a few things in /var and /etc if any clustered services got
updated. Pity you can't chattr +i a symlink. :(
Dima
a few minutes between the
reboots.
Dima
(that netapp is looking cheaper every day)
cluster ip, if you get no response, bring up eth0:0 and fire up
services -- a 20-line script?
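Something along these lines, run on the passive node. The start of the sentence above is cut off in the archive, so "ping" as the probe is an assumption, as are the address, the alias name, the service list and all function names. Note that it does nothing about split brain: if only the link between the nodes dies, both end up with the cluster IP.

```sh
#!/bin/sh
# All three values are placeholders for illustration.
CLUSTER_IP="${CLUSTER_IP:-192.0.2.10}"
ALIAS_IF="${ALIAS_IF:-eth0:0}"
SERVICES="${SERVICES:-httpd}"

# Succeeds if anything answers on the cluster IP.
cluster_ip_alive() {
    ping -c 3 -W 2 "$CLUSTER_IP" >/dev/null 2>&1
}

# Bring up the IP alias, then start the services in order.
take_over() {
    ifup "$ALIAS_IF" || return 1
    for svc in $SERVICES; do
        service "$svc" start || return 1
    done
}

# Stay passive while the cluster IP answers, otherwise take over.
check_and_take_over() {
    if cluster_ip_alive; then
        echo "cluster ip answers, staying passive"
    else
        echo "no response from cluster ip, taking over"
        take_over
    fi
}
```

Call check_and_take_over from cron or from a loop with a sleep in it.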
Dimitri
Igor Chudov wrote:
What next? Do I restart heartbeat only, or should I reboot both servers?
You don't need to do anything on the passive node. Restart heartbeat on
the active node.
Dima
for version of heartbeat that ships
with RHEL 5 (or Suse 10, as I understand).
Dima
Serge Dubrouski wrote:
On Fri, Dec 10, 2010 at 12:54 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu
wrote:
Les Mikesell wrote:
...
What I
wanted was advice on the best platform that had a packaged, re-usable
setup available that was likely to be maintained in updates for a long
time.
There's
- doesn't seem to do anything other than print
hb_standby[6096]: Going standby [all].
Neither does hb_takeover on the other node. They aren't even logging
anything.
Is this the expected behaviour (this is r1-style setup w/ haresources)?
Dima
Dimitri Maziuk wrote:
- doesn't seem to do anything other than print
hb_standby[6096]: Going standby [all].
Neither does hb_takeover on the other node. They aren't even logging
anything.
Is this the expected behaviour (this is r1-style setup w/ haresources)?
The winning answer
vaguely recall some deprecated do not use R1 webpage that explained
auto_failback, however, I can't find it anymore to check if it
mentioned the relationship between hb_standby and auto_failback. My
recollection is, it didn't.
Dima
On 1/6/2011 9:23 AM, Max wrote:
Alain,
IMHO the current s/w does not do a great job of 'monitoring'
the link - you can pull an ethernet plug and this will not
be noticed (other than by remote ping's failing...
The problem with failing pings is they could be failing on the switch or
on the
Dimitri Maziuk wrote:
The best I could come up with grep on the output of ethtool.
^ was
And by that I mean I have a mon script working in my R1 clusters.
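The actual mon script is not shown in the thread; a minimal sketch of the idea might look like this. Function names are made up, and the "Link detected: yes" string is what current ethtool prints, which could change between versions:

```sh
#!/bin/sh
# link_detected: read `ethtool IFACE` output on stdin; succeed only if
# it reports "Link detected: yes".
link_detected() {
    grep -q 'Link detected: yes'
}

# check_link IFACE: print a one-line summary and return non-zero when
# the link is down, which is how I understand mon expects a monitor to
# report failure.
check_link() {
    if ethtool "$1" 2>/dev/null | link_detected; then
        return 0
    fi
    echo "$1: no link"
    return 1
}
```

Usage would be something like `check_link eth0 || exit 1` at the end of the monitor script.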
Dima
On 1/10/2011 9:16 AM, Rasto Levrinc wrote:
I am sorry to inform you, that you have a Mac :) Are you saying that there
is ~/.ssh/known_hosts on Mac? In that case I'll enable it.
Perhaps you should know: there is another one, usually in /etc/ssh.
There is also an openldap patch for storing them
(as opposed to just heartbeat)
correctly. If it works, you can use procfix. (I've been meaning to try
that myself but haven't got a round tuit yet.)
Or roll your own check/restart cron job.
Dima
crm with 2.1.4, I'd either upgrade or use R1 setup. For
the latter, see /usr/share/doc/heartbeat-2.1.4/GettingStarted.html. And
/usr/share/doc/heartbeat-2.1.4/faqntips.html, Q5 how to monitor services.
Dima
a bad choice, Utilization may have been more
appropriate.
But try to define cpu health...
Of course, with 4+-core CPUs, you'd very rarely see all of them at 100%
busy. Especially when it only takes one to saturate your i/o bus.
Dima
doing.)
Dima
be set for an interface.
Or you could e-mail net-snmp-cod...@list.sourceforge.net and ask where
snmpd is getting the IF-MIB::ifOperStatus.X from (or at least where to
look for it in the code).
Or you could (should) run snmpd and simply snmpget ifOperStatus.X from
localhost.
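A sketch of that last option, assuming net-snmp's snmpget with its default output format, SNMP v2c and a read community of "public" (both assumptions; use whatever your snmpd.conf allows). Function names are made up:

```sh
#!/bin/sh
# oper_status_up: read snmpget output on stdin, e.g.
#   IF-MIB::ifOperStatus.2 = INTEGER: up(1)
# and succeed only if the interface is reported as up.
oper_status_up() {
    grep -q 'ifOperStatus\.[0-9][0-9]* = INTEGER: up(1)'
}

# if_is_up INDEX: ask the local snmpd about interface number INDEX.
if_is_up() {
    snmpget -v2c -c "${SNMP_COMMUNITY:-public}" localhost \
        "IF-MIB::ifOperStatus.$1" | oper_status_up
}
```

X in ifOperStatus.X is the interface index; `snmpwalk` over IF-MIB::ifDescr shows which index belongs to which interface.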
Dima
On 2/22/2011 8:53 AM, Dejan Muhamedagic wrote:
Hi Bernd,
On Tue, Feb 22, 2011 at 12:49:00AM +0100, Bernd Schubert wrote:
Hello Dejan,
...
And of course, no filesystem is free of bugs. Which is why until now
extX suggests frequent fscks.
Hmpf. OK, must say that I expected it to be more
the active node, fail over, update the other active
node -- that will avoid broken symlinks problem, but cause some downtime
on ldap service.
Either way make a backup copy of everything first.
Dima
Serge Dubrouski wrote:
On Wed, Feb 23, 2011 at 2:56 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
Serge Dubrouski wrote:
Why not to use ldap syncrepl feature instead of DRBD?
The problem with syncrepl is not the replication, it's the timeouts in
the failover. As in you type ls -l, your
server-2 in /etc/ldap.conf. That is
the setup that's not so great when things actually fail.
Dima
aren't falling out of their
slots, and are half-decent quality hardware, and the drivers aren't
alpha prototype code, and so on, the chances of it being the link down
case should be fairly low.
Dima
Dejan Muhamedagic wrote:
On Tue, Mar 08, 2011 at 02:27:52PM -0600, Dimitri Maziuk wrote:
Well, realistically, if the link is a foot of x/over cable and gremlins
have not been pulling on it, and the NICs aren't falling out of their
slots, and are half-decent quality hardware, and the drivers
cabinet are actually far from it.
Dima
.
Dima (hoping ceph/btrfs will reach production quality before I have to
upgrade our R1 nfs cluster)
not be
detected either. Is there a way to get heartbeat to check if mysql is
running as well and switch over in case of software crash?
Mysql monitoring script comes standard with mon.
Dima
not seem to happen in R1 config where drbd is started
at boot and heartbeat only handles promotion to primary and fs mount. So
this may be a genuine problem in the resource agent.
Dima
split brain
recovery, if possible, and presumably call some pri-lost handler if
not. Presumably it has to connect first, so degr-timeout of 0 would
presumably be incompatible with those, but TFM does not say.
Dima
representative sample while you're at it.
Dima
On 4/2/2011 12:40 AM, Vadym Chepkov wrote:
Ok, let's see how this might work.
You would need a separate monitor for the cluster and since this
monitor also can potentially crash, you would need another monitor to
observe the first one, then we would want the first one to monitor
the second one,
/hb_standby. Not sure how you'd do the
locked up bit.
Dima
On 4/22/2011 4:25 AM, SEILLIER Mathieu wrote:
Result of /usr/bin/cl_status nodestatus servappli01 command on servappli01 :
active
Result of /usr/bin/cl_status nodestatus servappli02 command on servappli01 :
dead
Result of /usr/bin/cl_status nodestatus servappli01 command on servappli02 :
On 5/6/2011 7:34 AM, Lacoco, Joshua wrote:
It's unlikely that heartbeat itself is causing the rebooting unless
you enabled/configured stonith.
Drbd can be configured to halt the machine, though.
I've seen linux miss packets and time out on sockets under high load --
are you monitoring load
On 5/17/2011 8:25 AM, Sascha Hagedorn wrote:
Hi everyone,
...
- Pulled the HA network cable
- Put it back after a couple of seconds
Result:
- Node 2 is being restarted
- Load average on Node 1 increases until the system becomes
unreachable
-
can get some
configuration parameters.
Upgrade to scientific linux 6 and read RHEL's Cluster Administration docs.
Dima
with.
Sometimes we call them gremlins.
HTH
Dima
node(s) due to, indeed, a problem with
comms channel. However, I can think of only one way to make that happen
over unicast but not broadcast: unicasting to a wrong host.
Dima
?
Did you try /usr/share/doc/heartbeat-XYZ/GettingStarted.html?
Dima
-- and gets to keep the VIP. Which in
general can't be done from the nodes themselves.
Dima
On 8/5/2011 7:18 AM, Dejan Muhamedagic wrote:
Hi,
On Fri, Aug 05, 2011 at 01:55:25PM +0200, Ulrich Windl wrote:
...
When I tried a vgs manually, it could not be suspended or killed, and
it took more than 30 seconds to complete.
Thus the LVM monitoring is quite useless as it is now (SLES 11
is started on node1 while the statd
lockd for it are started on node2? Despite inf: colocation constraint?
(SL6 w/ stock rpms plus drbd from atrpms)
Dima
errors on the clients after
failover. There's also lock persistence, but since file locking never
actually worked with nfs, I really have no idea how to test that.
Dima
don't already know how to use that, you're SOL.
Dima
aliases created with ip
command don't show up in ifconfig output and vice versa. So if you're
using an IPx resource agent that uses ip, which was the default in the
last pacemaker howto I looked at, ifconfig -a won't show your alias. Try
ip addr show.
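If you want to script the check, something like this should do. As far as I know `ip addr show` lists every configured address regardless of which tool added it, so it is the safer one to grep. The function names are made up, and the address in the usage line is only an example:

```sh
#!/bin/sh
# has_addr ADDR: read `ip -o addr show` output on stdin; succeed if ADDR
# is configured, whether as a primary or a secondary address.
has_addr() {
    grep -qF "inet $1/"
}

# vip_present ADDR: check the live system for the given IPv4 address.
vip_present() {
    ip -o -4 addr show | has_addr "$1"
}
```

For example: `vip_present 192.168.1.3 && echo "alias is up"`.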
Dima
is for the distro vendors who want your money for
letting you know how to set it up. Surprisingly, if you don't pay,
things won't work quite so well.
Plain old heartbeat is fairly simple stupid -- if you understand what
you're doing, it's not confusing at all.
Dima
to spend on that.
Dima