Re: [Linux-HA] Outgoing IP address

2009-02-04 Thread Dimitri Maziuk
routing), but option 1 is way simpler. Dima -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http

Re: [Linux-HA] Outgoing IP address

2009-02-05 Thread Dimitri Maziuk
James R. Leu wrote: For locally originated connections that do not bind to an interface you can use the SNAT target of iptables. iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to-source 192.168.1.3 There's another problem with using cluster ip for outgoing address: if it fails over in the
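The SNAT target is only valid in iptables' nat table, so the rule needs `-t nat` (without it iptables looks for POSTROUTING in the default filter table, which has no such chain). A minimal Python sketch that only assembles the argument list for such a rule, using eth0 and 192.168.1.3 from the quoted example; it does not apply anything:

```python
import ipaddress
from typing import List


def snat_rule(out_iface: str, source_ip: str) -> List[str]:
    """Build argv for an SNAT rule that rewrites the source of outgoing packets.

    SNAT only exists in the nat table's POSTROUTING chain, hence "-t nat".
    """
    # Validate the address up front; raises ValueError on garbage input.
    ip = ipaddress.ip_address(source_ip)
    return [
        "iptables", "-t", "nat", "-A", "POSTROUTING",
        "-o", out_iface,
        "-j", "SNAT", "--to-source", str(ip),
    ]


if __name__ == "__main__":
    # Print the command; run it as root (e.g. subprocess.run(argv, check=True)) to apply.
    print(" ".join(snat_rule("eth0", "192.168.1.3")))
```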

Re: [Linux-HA] How to setup a 2 node active/passive apache2 cluster for Proof of Concept

2009-05-29 Thread Dimitri Maziuk
Bernie Wu wrote: The company I work for wants us to start investigating HA. My first POC setup was a 2 node cluster with a floating IP and that worked out quite well. Now the second POC was to work with an application, in this case, apache2 in a active/passive configuration. My question is

Re: [Linux-HA] How to setup a 2 node active/passive apache2 cluster for Proof of Concept

2009-05-29 Thread Dimitri Maziuk
apps, those will break during failover. Dima

Re: [Linux-HA] questions about using HA

2009-06-02 Thread Dimitri Maziuk
blue_hmq wrote: hi, i am sorry to disturb you because i have some questions about using HA i had configured the HA on two computer(HA01 as master,HA02),using apache2 service. if i stop apache2 service on HA01,but the Ethernet and heartbeat is Ok. as this situation ,will the ha02 take over

Re: [Linux-HA] Stonith with APC Smart UPS1000 + Network Management Card

2009-07-11 Thread Dimitri Maziuk
Ehlers, Kolja wrote: Could somebody explain how the APC Smart shutdown command works? Does it actually allow to take away the power from only one of the connected servers or does it just take away the power from everything that is connected? IIRC we have the previous model of that network

Re: [Linux-HA] Stonith with APC Smart UPS1000 + Network Management Card

2009-07-13 Thread Dimitri Maziuk
Ehlers, Kolja wrote: Hello Dima, can you explain how you get that card to shutdown the whole UPS or how to send the powerfail command to connected machines (are you talking about the PowerChute agent?). Is any of that possible through a stonith plugin? I never got a 'round tuit: we have

Re: [Linux-HA] Raid and drdb

2009-07-27 Thread Dimitri Maziuk
cluster should give you enough time to reinstall the OS if one fails. I would keep a spare disk handy. Dimitri

Re: [Linux-HA] postfix recover failed.

2009-11-09 Thread Dimitri Maziuk
like it's using a restart action, usually coded as stop followed by start. If the daemon isn't running to begin with, stop will fail. Dima
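A minimal Python sketch of a restart that tolerates this case: it only calls stop when the daemon is actually running, so a dead daemon does not make the whole restart fail. LSB init scripts are expected to behave the same way and report success when asked to stop a service that is already stopped. The `Service` interface and `FakeDaemon` below are hypothetical stand-ins, not a heartbeat or init API:

```python
from typing import Protocol


class Service(Protocol):
    # Hypothetical interface standing in for whatever actually manages the daemon.
    def is_running(self) -> bool: ...
    def stop(self) -> bool: ...
    def start(self) -> bool: ...


def restart(svc: Service) -> bool:
    """Stop (only if running), then start. True if the service ends up started."""
    if svc.is_running() and not svc.stop():
        return False  # a genuine stop failure: don't pile a start on top of it
    return svc.start()


class FakeDaemon:
    """Toy service with a naive stop() that fails when nothing is running."""

    def __init__(self, running: bool) -> None:
        self.running = running

    def is_running(self) -> bool:
        return self.running

    def stop(self) -> bool:
        if not self.running:
            return False
        self.running = False
        return True

    def start(self) -> bool:
        self.running = True
        return True


if __name__ == "__main__":
    d = FakeDaemon(running=False)
    print(restart(d), d.running)  # True True
```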

Re: [Linux-HA] postfix recover failed.

2009-11-10 Thread Dimitri Maziuk
that stop will wait for the daemons to shut down gracefully. Which means it could potentially block the failover for a while (unlikely worst case: deadlock). Dima
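One common way to keep a graceful stop from blocking failover indefinitely is to bound it: send SIGTERM, wait a fixed grace period, then SIGKILL. A minimal Python sketch for a child process started with `subprocess`; the 10-second default is an arbitrary assumption, and a SIGKILLed daemon gets no chance to flush or clean up after itself:

```python
import subprocess
import sys


def bounded_stop(proc: subprocess.Popen, grace: float = 10.0) -> str:
    """Ask proc to exit; escalate to SIGKILL if it is still alive after `grace` seconds."""
    if proc.poll() is not None:
        return "already stopped"
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        proc.wait()
        return "killed"


if __name__ == "__main__":
    # A child that tries to ignore SIGTERM, to exercise the escalation path.
    child = subprocess.Popen([
        sys.executable, "-c",
        "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); time.sleep(60)",
    ])
    print(bounded_stop(child, grace=1.0))
```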

Re: [Linux-HA] NFS and DRBD

2010-03-23 Thread Dimitri Maziuk
ideas? You have anything nfs-mounted (automounted?) on the servers? That tends to do it. Dimitri

Re: [Linux-HA] NFS and DRBD

2010-03-23 Thread Dimitri Maziuk
. Especially if it's nfs-mounted from its own ipaddr as you'd do with /home. While we're at it, you did move the rpc_pipefs mountpoint out of /var/lib/nfs, right? Dima

Re: [Linux-HA] NFS and DRBD

2010-03-24 Thread Dimitri Maziuk
need to manually change the mount point somewhere as to not reference the sym linked path? Yes. See www.linux-ha.org/HaNFS -- you need to reconfigure idmapd and the kernel module. Dima

Re: [Linux-HA] Heartbeat and Postfix

2010-05-05 Thread Dimitri Maziuk
Cameron Smith wrote: Yeah! Since I already have that in place for http and mysql I just wanted to know if there was anything unique I need to do for postfix config for when it is running on primary (managed by heartbeat) and how do I handle the sending of system emails on the secondary

Re: [Linux-HA] Heartbeat and Postfix

2010-05-10 Thread Dimitri Maziuk
Michael Schwartzkopff wrote: The only reason to do a postfix cluster is to deliver locally queued mail after a failover. Ah! That's what I didn't think of. In theory you could restart postfix w/ different config files: send only on the passive node and full setup on the active. Dima

Re: [Linux-HA] very odd iowait problem

2010-06-21 Thread Dimitri Maziuk
and raid-0,1,10 is fast and very safe. I'd set it up as 1 system disk + 2 raid-1 vm/data space + 1 spare for the raid. And buy a seagate, wd, and a hitachi for those last 3. Dima

Re: [Linux-HA] How to restart HA with new haresources without causing failover

2010-08-06 Thread Dimitri Maziuk
: if it's a simple /etc/init.d/service restart then you can of course do it (and modify haresources accordingly) without restarting heartbeat. I doubt you can easily do that with IPaddr or, say, a DRBD resource. Dimitri

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-09 Thread Dimitri Maziuk

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Dimitri Maziuk

Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

2010-08-10 Thread Dimitri Maziuk
give you better advice, but what I'd do at this point is boot both systems with drbd and no heartbeat (I think it should come up as secondary/secondary), then boot both systems with heartbeat but no drbd (e.g. with ipaddr only), and see which one works. Dima

Re: [Linux-HA] Am I even on the right track here with Heartbeat?

2010-08-11 Thread Dimitri Maziuk
in particular). When I need a 3-node cluster I'll think about those. Until then, 2.1.4 is not perfect but it works well enough. Dima

Re: [Linux-HA] Am I even on the right track here with Heartbeat?

2010-08-11 Thread Dimitri Maziuk
the hardware is just fine. Everything just became slower. Is your filesystem 75-80% full? That would do it on most unix filesystems, esp. when served over nfs. Dima
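A quick way to check a filesystem against the 75-80% rule of thumb above, sketched in Python with the standard library. It divides used space by total size, so on ext filesystems with reserved blocks it reads somewhat lower than `df`'s Use% (which divides by used plus available). The 75% default threshold is an assumption taken from the low end of that range:

```python
import shutil


def percent_used(path: str = "/") -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def too_full(path: str = "/", threshold: float = 75.0) -> bool:
    """True once usage reaches the threshold; tune for your workload."""
    return percent_used(path) >= threshold


if __name__ == "__main__":
    print(f"/ is {percent_used('/'):.1f}% used; over threshold: {too_full('/')}")
```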

Re: [Linux-HA] Am I even on the right track here with Heartbeat?

2010-08-11 Thread Dimitri Maziuk

Re: [Linux-HA] Am I even on the right track here with Heartbeat?

2010-08-11 Thread Dimitri Maziuk
, upgrade to the latest every time. Dima

Re: [Linux-HA] Am I even on the right track here with Heartbeat?

2010-08-11 Thread Dimitri Maziuk
On Wednesday 11 August 2010 18:26, Greg Woods wrote: On Wed, 2010-08-11 at 17:13 -0500, Dimitri Maziuk wrote: So is it not practical to run RHEL or CentOS 5.x where you'd get this version and several more years of distro maintenance? It's not practical if you want to have both distro

Re: [Linux-HA] Am I even on the right track here with Heartbeat?

2010-08-12 Thread Dimitri Maziuk
On Wednesday 11 August 2010 23:29, Greg Woods wrote: On Wed, 2010-08-11 at 20:01 -0500, Dimitri Maziuk wrote: That aside, the real problem for me is I haven't seen V2-style docs that actually made sense yet. I found the clusterlabs documents useful, but I too had to learn much through

Re: [Linux-HA] Am I even on the right track here with Heartbeat?

2010-08-12 Thread Dimitri Maziuk
you configure active/passive nfs. (Note that pacemaker does resource health monitoring, so you don't need mon anymore -- here's how you tell it to check the status of your rpc.statd and initiate failover if it's sick.) That's all I need. Dima

Re: [Linux-HA] Am I even on the right track here with Heartbeat?

2010-08-13 Thread Dimitri Maziuk
On Friday 13 August 2010 04:11, Dejan Muhamedagic wrote: On Thu, Aug 12, 2010 at 03:56:09PM -0500, Dimitri Maziuk wrote: On Thursday 12 August 2010 15:08, Dejan Muhamedagic wrote: On the plus side, there are more people available creating the mess. Right. So forking off another heartbeat

Re: [Linux-HA] Using TCP instead of UDP for heartbeat packets

2010-08-16 Thread Dimitri Maziuk
... google for tunnel udp over ssh to see how to make a splint out of netcat and a fifo. (What's the smiley for straight face?) Dima

Re: [Linux-HA] time to fork heartbeat?

2010-08-16 Thread Dimitri Maziuk
I'm doing and why. Dima

Re: [Linux-HA] time to fork heartbeat?

2010-08-16 Thread Dimitri Maziuk
On Monday 16 August 2010 14:35, Andrew Beekhof wrote: On Mon, Aug 16, 2010 at 7:30 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: ... On the other hand, back in v1 days resource startup was done by a shell script, and resource monitoring was done by a perl script. v1 had no resource

Re: [Linux-HA] 2 Server HA Setup

2010-09-03 Thread Dimitri Maziuk
will lose pings because it takes time to fail over to the other server. Expect it to be longer than ping's timeout. You will also lose stateful connections, e.g. servlet - applet, imap, etc., unless your services know how to replicate state to the other server. Dima

Re: [Linux-HA] Interesting scenario, is there a solution?

2010-09-22 Thread Dimitri Maziuk
On 9/21/2010 10:06 AM, Steve Davies wrote: - Kill the master (A). - The slave (B) is coming up - Some transient issue prevents the RC scripts running on (B). - (B) backs down and requests to become slave again - (A) is down, so (B) never gets confirmation of its slave request. Nothing more

Re: [Linux-HA] Linux HA - DRBD - NFS

2010-10-04 Thread Dimitri Maziuk
; } (in the common section) HTH Dima

Re: [Linux-HA] Linux HA - DRBD - NFS

2010-10-05 Thread Dimitri Maziuk
On 10/4/2010 9:28 PM, Karl Kloppenborg wrote: Hi Dimitri! Thank you so much for your input, you've set me on the track to fixing the problem. It was indeed the killproc nfsd -9 :) Weird: I wouldn't expect the system to crash because of it. NFS daemons not restarting possibly, but

Re: [Linux-HA] heartbeat with postgresql

2010-10-19 Thread Dimitri Maziuk
to care about split brain and everything that comes with that. Dima

Re: [Linux-HA] heartbeat with postgresql

2010-10-19 Thread Dimitri Maziuk
Serge Dubrouski wrote: On Tue, Oct 19, 2010 at 1:49 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: The easiest fix was to create /drbdfs/pgsql with proper ownership and symlink /var/lib/pgsql to it. Now that he's recompiled everything, who knows. Or manually mount /var/lib/pgsql/data

Re: [Linux-HA] heartbeat with postgresql

2010-10-19 Thread Dimitri Maziuk
are not updated to suit --configure options). Dima

Re: [Linux-HA] heartbeat with postgresql

2010-10-20 Thread Dimitri Maziuk
the new one running now that I know how it's done, and the vendor replies I dispute that. This is the kind of thing that makes me tell people that compared to linux/drbd/nfs, a $40K netapp is cheap at the price. Dima

Re: [Linux-HA] heartbeat with postgresql

2010-10-22 Thread Dimitri Maziuk
which part of all this says 'run ifup eth0:0, then start httpd, then samba on linuxha1' in one simple easy to read sentence? It's not about syntax. It's not about advanced features. It's about a new user setting up a simple stupid active/passive failover pair. Dima

[Linux-HA] LRM operation WebSite_start_0 unknown error

2010-11-01 Thread Dimitri Maziuk
operation WebSite_start_0 (call=9, rc=1, cib-update=34, confirmed=true) unknown error What the hell does it mean and how do I fix it? Dima

Re: [Linux-HA] LRM operation WebSite_start_0 unknown error

2010-11-01 Thread Dimitri Maziuk
Dimitri Maziuk wrote: LRM operation WebSite_start_0 (call=9, rc=1, cib-update=34, confirmed=true) unknown error OTOH, changing to crm no and cat nodename ip/mask httpd haresources gives me running apache. Tell me about advantages of heartbeat v2 again. Dima

Re: [Linux-HA] LRM operation WebSite_start_0 unknown error

2010-11-01 Thread Dimitri Maziuk
primitive website lsb:httpd. And there is no such thing as unknown error -- there's messed up config file, socket already in use, apache binary is wrong elfclass or not executable at all, and that's about it. Dima

Re: [Linux-HA] LRM operation WebSite_start_0 unknown error

2010-11-02 Thread Dimitri Maziuk
not found or 403 forbidden would've been nice. It's RA: waiting for apache to come up followed by the unknown error and then lrmd: WebSite stop. Dima

Re: [Linux-HA] LRM operation WebSite_start_0 unknown error

2010-11-03 Thread Dimitri Maziuk
Dejan Muhamedagic wrote: Hi, On Tue, Nov 02, 2010 at 12:01:37PM -0500, Dimitri Maziuk wrote: And no, it doesn't log anything useful -- 404 /server-status not found or 403 forbidden would've been nice. Yes, obviously it could do better. So can 50% of my code (the rest should be taken out
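The kind of message asked for here (404 for a missing /server-status, 403 for a forbidden one) is easy to get from a probe that reports the HTTP code instead of a bare "unknown error". A minimal Python sketch; the URL `http://localhost/server-status` is an assumption (mod_status enabled on the default port), and this is an illustration, not part of the apache resource agent:

```python
import urllib.error
import urllib.request
from typing import Callable


def probe_status(url: str = "http://localhost/server-status",
                 opener: Callable = urllib.request.urlopen,
                 timeout: float = 5.0) -> str:
    """Fetch url and return a human-readable verdict instead of a bare error."""
    hints = {404: "is mod_status enabled?", 403: "check access rules for this URL"}
    try:
        with opener(url, timeout=timeout) as resp:
            return f"OK: HTTP {resp.status}"
    except urllib.error.HTTPError as e:
        # The server answered but refused the request.
        return f"FAIL: HTTP {e.code} {hints.get(e.code, '')}".rstrip()
    except urllib.error.URLError as e:
        # No HTTP answer at all: connection refused, DNS failure, ...
        return f"FAIL: {e.reason}"
    except OSError as e:
        # Read timeouts and other socket errors not wrapped in URLError.
        return f"FAIL: {e}"
```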

Re: [Linux-HA] LRM operation WebSite_start_0 unknown error

2010-11-04 Thread Dimitri Maziuk
On 11/4/2010 4:50 AM, Lars Ellenberg wrote: On Thu, Nov 04, 2010 at 10:33:27AM +0100, Dejan Muhamedagic wrote: ... Did you take a look at the RA meta-data (crm ra info apache)? Or http://www.linux-ha.org/doc/ specifically http://www.linux-ha.org/doc/re-ra-apache.html Of course not: neither

Re: [Linux-HA] unclean mount of shared partition?

2010-11-13 Thread Dimitri Maziuk
On 11/12/2010 7:43 PM, Syn, Joonho wrote: I think you have to -remove the journal of the ext3 partition tune2fs -O ^has_journal [my device] - fsck at this point -delete and recreate the partition using fdisk - resize2fs at this point -check the newly expanded partition for errors fsck -n [my

Re: [Linux-HA] How to monitor the nic link status

2010-11-29 Thread Dimitri Maziuk
On 11/29/2010 8:24 AM, Mia Lueng wrote: How can i monitor the nic link status to protect the virtual ip address? Run heartbeat pings over eth0? I've been wondering about that: if I have ucast other node's IP and one of the nodes loses its eth0, it should cause the failover. What happens if

Re: [Linux-HA] confused in two node heartbeat cluster

2010-11-30 Thread Dimitri Maziuk
realize that since the node is not connected to any network, you don't actually care if it has a split brain or not? With an even number of nodes you may not have quorum. With two nodes you can't have quorum. Dima
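The quorum arithmetic behind that, as a small Python sketch: quorum means a strict majority of the configured nodes. With two nodes the only quorate state is "both up", so a lone survivor never has quorum; with four nodes a 2/2 split leaves neither side quorate:

```python
def has_quorum(alive: int, total: int) -> bool:
    """Strict majority: more than half of the configured nodes must be up."""
    return 2 * alive > total


if __name__ == "__main__":
    # For each cluster size, list the numbers of surviving nodes that still hold quorum.
    for total in (2, 3, 4, 5):
        print(total, [alive for alive in range(total + 1) if has_quorum(alive, total)])
```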

Re: [Linux-HA] confused in two node heartbeat cluster

2010-11-30 Thread Dimitri Maziuk
are on the net, only one runs ipaddr/apache, and 2) if one of them is off the net, the one still on runs ipaddr/apache. The rest is featuritis and bloat. (Obviously, drbd, 2+-node clusters, etc. are a whole different story.) Dima

Re: [Linux-HA] How to monitor the nic link status

2010-12-03 Thread Dimitri Maziuk
off grepping for link detected in the output of ethtool -- that's not portable and may change in the next version of ethtool. And so on. Dima

Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-08 Thread Dimitri Maziuk
update on the fileservers and find out it helpfully fixed your broken (as in pointing to the drbd filesystem that isn't mounted) symlinks and you have to spend the next 8 hours unfscking the resulting mess. (That is aside from the stuff you mentioned earlier.) Dima

Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-08 Thread Dimitri Maziuk
(logged in as root you don't depend on shared homedirs, ldap, etc.) so you don't notice the problem right away. Now I simply check a few things in /var and /etc if any clustered services got updated. Pity you can't chattr +i a symlink. :( Dima
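The "check a few things after an update" step can be scripted. A minimal Python sketch, assuming you keep a hand-written map of paths that should be symlinks into the DRBD filesystem; the `/var/lib/pgsql -> /drbdfs/pgsql` entry is a hypothetical example. It deliberately does not require the target to exist, since on the passive node the DRBD filesystem is not mounted and the links are expected to dangle:

```python
import os
from typing import Dict, List


def check_links(expected: Dict[str, str]) -> List[str]:
    """Report paths that are no longer symlinks to their expected targets."""
    problems = []
    for link, target in expected.items():
        if not os.path.islink(link):
            # A package update may have replaced the symlink with a real directory.
            kind = "not a symlink" if os.path.lexists(link) else "missing"
            problems.append(f"{link}: {kind} (expected -> {target})")
        elif os.readlink(link) != target:
            problems.append(f"{link}: points to {os.readlink(link)}, expected {target}")
    return problems


if __name__ == "__main__":
    # Hypothetical example mapping; list your own clustered paths here.
    for p in check_links({"/var/lib/pgsql": "/drbdfs/pgsql"}):
        print(p)
```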

Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-09 Thread Dimitri Maziuk
a few minutes between the reboots. Dima

Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-09 Thread Dimitri Maziuk
(that netapp is looking cheaper every day)

Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-09 Thread Dimitri Maziuk
cluster ip, if you get no response, bring up eth0:0 and fire up services -- a 20-line script? Dimitri
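The naive version of that loop, sketched in Python to show roughly how little it takes. `take_over` is a hypothetical callback (bring up the alias, start the services), the ping flags are Linux iputils ones, and the three-miss threshold and 2 s interval are arbitrary. Note what it does not do: if the link between the nodes fails rather than the active node, both nodes end up holding the IP, which is the split-brain problem:

```python
import subprocess
import time
from typing import Callable


def peer_alive(ip: str) -> bool:
    # One ICMP echo with a 1 s wait (Linux iputils ping flags).
    return subprocess.run(["ping", "-c", "1", "-W", "1", ip],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode == 0


def should_take_over(misses: int, max_misses: int = 3) -> bool:
    # Require several consecutive misses so one dropped packet doesn't trigger failover.
    return misses >= max_misses


def watch(cluster_ip: str, take_over: Callable[[], None],
          probe: Callable[[str], bool] = peer_alive, interval: float = 2.0) -> None:
    """Passive-node loop: ping the cluster IP, take over after repeated silence."""
    misses = 0
    while True:
        misses = 0 if probe(cluster_ip) else misses + 1
        if should_take_over(misses):
            take_over()  # hypothetical: ifup eth0:0, then start services
            return
        time.sleep(interval)
```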

Re: [Linux-HA] [drbd + heartbeat] What should be restarted upon a change to haresources?

2010-12-09 Thread Dimitri Maziuk
Igor Chudov wrote: What next? Do I restart heartbeat only, or should I reboot both servers? You don't need to do anything on the passive node. Restart heartbeat on the active node. Dima

Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-10 Thread Dimitri Maziuk
for version of heartbeat that ships with RHEL 5 (or Suse 10, as I understand). Dima

Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-10 Thread Dimitri Maziuk
Serge Dubrouski wrote: On Fri, Dec 10, 2010 at 12:54 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: Les Mikesell wrote: ... What I wanted was advice on the best platform that had a packaged, re-usable setup available that was likely to be maintained in updates for a long time. There's

[Linux-HA] hb_standby in 3.0.3

2010-12-17 Thread Dimitri Maziuk
- doesn't seem to do anything other than print hb_standby[6096]: Going standby [all]. Neither does hb_takeover on the other node. They aren't even logging anything. Is this the expected behaviour (this is r1-style setup w/ haresources)? Dima

Re: [Linux-HA] hb_standby in 3.0.3

2010-12-21 Thread Dimitri Maziuk
Dimitri Maziuk wrote: - doesn't seem to do anything other than print hb_standby[6096]: Going standby [all]. Neither does hb_takeover on the other node. They aren't even logging anything. Is this the expected behaviour (this is r1-style setup w/ haresources)? The winning answer

Re: [Linux-HA] hb_standby in 3.0.3

2010-12-21 Thread Dimitri Maziuk
vaguely recall some deprecated do not use R1 webpage that explained auto_failback, however, I can't find it anymore to check if it mentioned the relationship between hb_standby and auto_failback. My recollection is, it didn't. Dima

Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Dimitri Maziuk

Re: [Linux-HA] Question about IPaddr

2011-01-06 Thread Dimitri Maziuk
On 1/6/2011 9:23 AM, Max wrote: Alain, IMHO the current s/w does not do a great job of 'monitoring' the link - you can pull an ethernet plug and this will not be noticed (other than by remote pings failing... The problem with failing pings is they could be failing on the switch or on the

Re: [Linux-HA] Question about IPaddr

2011-01-06 Thread Dimitri Maziuk
Dimitri Maziuk wrote: The best I could come up with grep on the output of ethtool. ^ was And by that I mean I have a mon script working in my R1 clusters. Dima
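The ethtool check, sketched in Python: parse the `Link detected: yes/no` line and return a non-zero status when the link is down, as a monitor script would. As noted in the thread, this depends on ethtool's output format, which is not guaranteed to stay the same; `main()` and the default `eth0` are assumptions for illustration:

```python
import subprocess
from typing import Optional


def link_detected(ethtool_output: str) -> Optional[bool]:
    """Parse the 'Link detected: yes/no' line; None if the line is absent."""
    for line in ethtool_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Link detected":
            return value.strip() == "yes"
    return None


def main(iface: str = "eth0") -> int:
    """Exit status for a monitor: 0 if the link is up, 1 if down or unparseable."""
    out = subprocess.run(["ethtool", iface], capture_output=True, text=True).stdout
    return 0 if link_detected(out) else 1
```

Wrapped as `sys.exit(main("eth0"))` it can be called from a monitoring wrapper; it needs ethtool installed and usually root.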

Re: [Linux-HA] DRBD MC 0.8.8 / Pacemaker GUI

2011-01-10 Thread Dimitri Maziuk
On 1/10/2011 9:16 AM, Rasto Levrinc wrote: I am sorry to inform you, that you have a Mac :) Are you saying that there is ~/.ssh/known_hosts on Mac? In that case I'll enable it. Perhaps you should know: there is another one, usually in /etc/ssh. There is also an openldap patch for storing them

Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-10 Thread Dimitri Maziuk
(as opposed to just heartbeat) correctly. If it works, you can use procfix. (I've been meaning to try that myself but haven't got a round tuit yet.) Or roll your own check/restart cron job. Dima
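A minimal sketch of the roll-your-own check/restart job in Python, assuming the daemon writes its PID to a pidfile. The pidfile path and restart command are hypothetical and depend on your distribution. A stale pidfile whose PID has been reused by an unrelated process would fool this check:

```python
import os
import subprocess
from typing import List


def pid_alive(pid: int) -> bool:
    """True if a process with this PID exists; signal 0 checks without sending anything."""
    if pid <= 0:
        return False  # 0 and negatives address process groups, not a single PID
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def daemon_alive(pidfile: str) -> bool:
    """Read the first token of pidfile as a PID; a missing or garbled file counts as dead."""
    try:
        with open(pidfile) as f:
            return pid_alive(int(f.read().split()[0]))
    except (OSError, ValueError, IndexError):
        return False


def check_and_restart(pidfile: str, restart_cmd: List[str]) -> bool:
    """Run restart_cmd if the daemon looks dead. True if a restart was attempted."""
    if daemon_alive(pidfile):
        return False
    subprocess.run(restart_cmd, check=False)
    return True
```

From cron this would be something like `check_and_restart("/var/run/heartbeat.pid", ["/etc/init.d/heartbeat", "restart"])`, with both paths adjusted to your system.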

Re: [Linux-HA] how to configure heartbeat for polling MySql server?

2011-02-02 Thread Dimitri Maziuk
crm with 2.1.4, I'd either upgrade or use R1 setup. For the latter, see /usr/share/doc/heartbeat-2.1.4/GettingStarted.html. And /usr/share/doc/heartbeat-2.1.4/faqntips.html, Q5 how to monitor services. Dima

Re: [Linux-HA] Antw: Re: pacemaker/HealthCPU

2011-02-04 Thread Dimitri Maziuk
a bad choice, Utilization may have been more appropriate. But try to define cpu health... Of course, with 4+-core CPUs, you'd very rarely see all of them at 100% busy. Especially when it only takes one to saturate your i/o bus. Dima

Re: [Linux-HA] NFSv4 with Heartbeat and DRBD

2011-02-07 Thread Dimitri Maziuk
doing.) Dima

Re: [Linux-HA] Question about IPaddr

2011-02-11 Thread Dimitri Maziuk
be set for an interface. Or you could e-mail net-snmp-cod...@list.sourceforge.net and ask where snmpd is getting the IF-MIB::ifOperStatus.X from (or at least where to look for it in the code). Or you could (should) run snmpd and simply snmpget ifOperStatus.X from localhost. Dima
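If you take the snmpget route, the value still has to be decoded. A minimal Python sketch, assuming net-snmp's usual output shape (e.g. `IF-MIB::ifOperStatus.2 = INTEGER: up(1)`) or a bare numeric value; the code-to-name table is the ifOperStatus enumeration from IF-MIB (RFC 2863). Running snmpget itself (community string, interface index) is left to the caller:

```python
import re
from typing import Optional

# ifOperStatus enumeration from IF-MIB (RFC 2863).
IF_OPER_STATUS = {
    1: "up", 2: "down", 3: "testing", 4: "unknown",
    5: "dormant", 6: "notPresent", 7: "lowerLayerDown",
}


def oper_status(snmpget_line: str) -> Optional[str]:
    """Extract ifOperStatus from a line like 'IF-MIB::ifOperStatus.2 = INTEGER: up(1)'."""
    value = snmpget_line.rsplit(":", 1)[-1].strip()
    # Accept "up(1)" or a plain "1"; anything else (e.g. "No Such Instance") gives None.
    m = re.search(r"\((\d+)\)$", value) or re.fullmatch(r"(\d+)", value)
    return IF_OPER_STATUS.get(int(m.group(1))) if m else None


def link_up(snmpget_line: str) -> bool:
    return oper_status(snmpget_line) == "up"
```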

Re: [Linux-HA] fsck filesystem?

2011-02-22 Thread Dimitri Maziuk
On 2/22/2011 8:53 AM, Dejan Muhamedagic wrote: Hi Bernd, On Tue, Feb 22, 2011 at 12:49:00AM +0100, Bernd Schubert wrote: Hello Dejan, ... And of course, no filesystem is free of bugs. Which is why until now extX suggests frequent fscks. Hmpf. OK, must say that I expected it to be more

Re: [Linux-HA] Updating LDAP in Heartbeat/DRDB Cluster

2011-02-23 Thread Dimitri Maziuk
the active node, fail over, update the other active node -- that will avoid broken symlinks problem, but cause some downtime on ldap service. Either way make a backup copy of everything first. Dima

Re: [Linux-HA] Updating LDAP in Heartbeat/DRDB Cluster

2011-02-23 Thread Dimitri Maziuk
Serge Dubrouski wrote: On Wed, Feb 23, 2011 at 2:56 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: Serge Dubrouski wrote: Why not to use ldap syncrepl feature instead of DRBD? The problem with syncrepl is not the replication, it's the timeouts in the failover. As in you type ls -l, your

Re: [Linux-HA] Updating LDAP in Heartbeat/DRDB Cluster

2011-02-23 Thread Dimitri Maziuk
server-2 in /etc/ldap.conf. That is the setup that's not so great when things actually fail. Dima

Re: [Linux-HA] Server becomes unresponsive after node failure

2011-03-08 Thread Dimitri Maziuk
aren't falling out of their slots, and are half-decent quality hardware, and the drivers aren't alpha prototype code, and so on, the chances of it being the link down case should be fairly low. Dima

Re: [Linux-HA] Server becomes unresponsive after node failure

2011-03-09 Thread Dimitri Maziuk
Dejan Muhamedagic wrote: On Tue, Mar 08, 2011 at 02:27:52PM -0600, Dimitri Maziuk wrote: Well, realistically, if the link is a foot of x/over cable and gremlins have not been pulling on it, and the NICs aren't falling out of their slots, and are half-decent quality hardware, and the drivers

Re: [Linux-HA] Server becomes unresponsive after node failure

2011-03-11 Thread Dimitri Maziuk
cabinet are actually far from it. Dima

Re: [Linux-HA] Antw: Re: DRBD and pacemaker interaction

2011-03-28 Thread Dimitri Maziuk
. Dima (hoping ceph/btrfs will reach production quality before I have to upgrade our R1 nfs cluster)

Re: [Linux-HA] DRBD+Heartbeat

2011-03-29 Thread Dimitri Maziuk
not be detected either. Is there a way to get heartbeat to check if mysql is running as well and switch over in case of software crash? Mysql monitoring script comes standard with mon. Dima

Re: [Linux-HA] DRBD problems are not reported

2011-03-31 Thread Dimitri Maziuk
not seem to happen in R1 config where drbd is started at boot and heartbeat only handles promotion to primary and fs mount. So this may be a genuine problem in the resource agent. Dima

Re: [Linux-HA] DRBD problems are not reported

2011-03-31 Thread Dimitri Maziuk
split brain recovery, if possible, and presumably call some pri-lost handler if not. Presumably it has to connect first, so degr-timeout of 0 would presumably be incompatible with those, but TFM does not say. Dima

Re: [Linux-HA] DRBD and pacemaker interaction

2011-04-01 Thread Dimitri Maziuk
representative sample while you're at it. Dima

Re: [Linux-HA] DRBD and pacemaker interaction

2011-04-02 Thread Dimitri Maziuk
On 4/2/2011 12:40 AM, Vadym Chepkov wrote: Ok, lets see how this might work. You would need a separate monitor for the cluster and since this monitor also can potentially crash, you would need another monitor to observer the first one, then we would want the first one to monitor second one,

Re: [Linux-HA] Does heartbeat only use ping to check health of other server?

2011-04-04 Thread Dimitri Maziuk
/hb_standby. Not sure how you'd do the locked up bit. Dima

Re: [Linux-HA] [Heartbeat] my VIP doesn't work :(

2011-04-26 Thread Dimitri Maziuk
On 4/22/2011 4:25 AM, SEILLIER Mathieu wrote: Result of /usr/bin/cl_status nodestatus servappli01 command on servappli01 : active Result of /usr/bin/cl_status nodestatus servappli02 command on servappli01 : dead Result of /usr/bin/cl_status nodestatus servappli01 command on servappli02 :

Re: [Linux-HA] EXTERNAL: HA servers rebooting

2011-05-06 Thread Dimitri Maziuk
On 5/6/2011 7:34 AM, Lacoco, Joshua wrote: It's unlikely that heartbeat itself is causing the rebooting unless you enabled/configured stonith. Drbd can be configured to halt the machine, though. I've seen linux miss packets and time out on sockets under high load -- are you monitoring load

Re: [Linux-HA] Massive amount of log messages after node failure

2011-05-17 Thread Dimitri Maziuk
On 5/17/2011 8:25 AM, Sascha Hagedorn wrote: Hi everyone, ... - Pulled the HA network cable - Put it back after a couple of seconds Result: - Node 2 is being restarted - Load average on Node 1 increases until the system becomes unreachable -

Re: [Linux-HA] managing resource httpd in heartbeat

2011-05-19 Thread Dimitri Maziuk
can get some configuration parameters. Upgrade to Scientific Linux 6 and read RHEL's Cluster Administration docs. Dima

Re: [Linux-HA] heartbeat sends udp to whole network

2011-05-24 Thread Dimitri Maziuk
with. Sometimes we call them gremlins. HTH Dima

Re: [Linux-HA] heartbeat sends udp to whole network

2011-05-24 Thread Dimitri Maziuk
node(s) due to, indeed, a problem with the comms channel. However, I can think of only one way to make that happen over unicast but not broadcast: unicasting to the wrong host. Dima
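Unicast heartbeats are configured in ha.cf with `ucast <interface> <peer-address>` lines, so a typo in the address sends heartbeats to the wrong host. A minimal Python sketch for catching that, assuming you feed it the ha.cf text and the set of all cluster node addresses (a shared ha.cf commonly lists every node, including the local one); the function name is hypothetical.

```python
import ipaddress


def bad_ucast_lines(ha_cf_text: str, cluster_addrs: set) -> list:
    """Return (line_no, dev, target) for each `ucast` directive whose target
    is not one of the expected cluster node addresses."""
    expected = {ipaddress.ip_address(a) for a in cluster_addrs}
    problems = []
    for n, raw in enumerate(ha_cf_text.splitlines(), start=1):
        tokens = raw.split("#", 1)[0].split()  # drop comments, tokenize
        if len(tokens) >= 3 and tokens[0] == "ucast":
            dev, target = tokens[1], tokens[2]
            try:
                ok = ipaddress.ip_address(target) in expected
            except ValueError:  # hostname or mangled address: worth a look
                ok = False
            if not ok:
                problems.append((n, dev, target))
    return problems
```

For instance, with hypothetical addresses `10.0.0.1` and `10.0.0.2`, a line `ucast eth1 10.0.0.21` would be reported with its line number.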

Re: [Linux-HA] how to configure ipfail

2011-05-25 Thread Dimitri Maziuk
? Did you try /usr/share/doc/heartbeat-XYZ/GettingStarted.html? Dima

Re: [Linux-HA] heartbeat step down after split brain scenario

2011-06-16 Thread Dimitri Maziuk
-- and gets to keep the VIP. Which in general can't be done from the nodes themselves. Dima

Re: [Linux-HA] ocf::LVM monitor needs excessive time to complete

2011-08-05 Thread Dimitri Maziuk
On 8/5/2011 7:18 AM, Dejan Muhamedagic wrote: Hi, On Fri, Aug 05, 2011 at 01:55:25PM +0200, Ulrich Windl wrote: ... When I tried a vgs manually, it could not be suspended or killed, and it took more than 30 seconds to complete. Thus the LVM monitoring is quite useless as it is now (SLES 11

[Linux-HA] Forcing primitive_nfslock away from node

2011-08-18 Thread Dimitri Maziuk
is started on node1 while the statd and lockd for it are started on node2? Despite an inf: colocation constraint? (SL6 w/ stock rpms plus drbd from atrpms) Dima

Re: [Linux-HA] NFS with two IPs / various NFS questions / NFSv4

2011-08-24 Thread Dimitri Maziuk
errors on the clients after failover. There's also lock persistence, but since file locking never actually worked with NFS, I really have no idea how to test that. Dima
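One rough way to exercise lock persistence is to hold a lock from one client across a failover and probe the same file from a second client. A minimal Python sketch using POSIX record locks (`fcntl.lockf`), which is what NFSv3 clients hand to lockd; the test file on the NFS mount and the function names are hypothetical, and a negative result only tells you the lock was not seen, not why.

```python
import fcntl
import os
import time


def hold_lock(path: str, seconds: float) -> None:
    """Client A: take an exclusive POSIX lock on `path` and keep it for
    `seconds`; fail the NFS server over while this is running."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until granted
        time.sleep(seconds)
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def lock_is_free(path: str) -> bool:
    """Client B: True if an exclusive lock could be taken right now."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:  # EACCES/EAGAIN: someone else holds it
        return False
    else:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

If locks survive the failover, `lock_is_free()` on client B should keep returning False until `hold_lock()` on client A finishes.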

Re: [Linux-HA] A little confused

2011-09-15 Thread Dimitri Maziuk
don't already know how to use that, you're SOL. Dima

Re: [Linux-HA] A little confused

2011-09-15 Thread Dimitri Maziuk
aliases created with the ip command don't show up in ifconfig output and vice versa. So if you're using an IPx resource agent that uses ip, which was the default in the last pacemaker howto I looked at, ifconfig -a won't show your alias. Try ip addr show. Dima
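To check for the alias from a script rather than by eye, `ip -o addr show` prints one record per address, including secondary addresses that ifconfig won't list without a label. A minimal Python sketch that parses that output; the function names are hypothetical and the VIP you pass in is whatever your resource agent manages.

```python
import subprocess


def addresses(ip_o_output: str) -> dict:
    """Parse `ip -o addr show` output into {interface: [cidr, ...]}.
    Records look like '2: eth0    inet 192.168.1.10/24 brd ... eth0'."""
    result = {}
    for line in ip_o_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] in ("inet", "inet6"):
            result.setdefault(fields[1], []).append(fields[3])
    return result


def has_vip(vip: str) -> bool:
    """True if `vip` (plain address, no prefix) is on any local interface."""
    out = subprocess.run(["ip", "-o", "addr", "show"],
                         capture_output=True, text=True, check=True).stdout
    return any(cidr.split("/")[0] == vip
               for cidrs in addresses(out).values() for cidr in cidrs)
```

Parsing is kept separate from running `ip` so the parser can be checked against saved output from a node.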

Re: [Linux-HA] A little confused

2011-09-15 Thread Dimitri Maziuk
is for the distro vendors who want your money for letting you know how to set it up. Surprisingly, if you don't pay, things won't work quite so well. Plain old heartbeat is fairly simple stupid -- if you understand what you're doing, it's not confusing at all. Dima

Re: [Linux-HA] A little confused

2011-09-16 Thread Dimitri Maziuk
to spend on that. Dima
