Re: NFS server fail-over - how do you do it?
** Reply to note from "adp" <[EMAIL PROTECTED]> Mon, 31 May 2004 12:33:24 -0500

> I was thinking that
> since NFS is udp-based, that if the primary NFS server failed, and the
> secondary assumed the primary NFS server's IP address, that things would at
> least return to normal (of course, any writes that had been in progress
> would fail horribly). That doesn't seem to be the case. During a test we
> killed the main NFS server and brought up the NFS IP as an alias on the
> backup. Didn't work. Has anyone tried anything like this?

The idea makes me shiver, as I'm quite sure there would be data losses. However, if you are so brave... have you tried freevrrpd? The problem might be that clients still have that IP associated with the old MAC address in their ARP tables. VRRP is a protocol designed to handle failovers, and it should also deal with this by changing the IP *and* the MAC address of the card.

bye
av.

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
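To make the "IP *and* MAC" point concrete: the VRRP specification reserves a well-known virtual MAC address of the form 00-00-5E-00-01-{VRID} for IPv4, so the address clients cache does not change when a different machine becomes master. The sketch below only computes that address. The function name is made up for illustration, and the thread does not confirm how freevrrpd itself programs the interface.

```python
def vrrp_virtual_mac(vrid: int) -> str:
    """Return the IPv4 VRRP virtual router MAC for a virtual router ID.

    The VRRP specification defines this address as 00-00-5E-00-01-{VRID},
    where VRID is the virtual router identifier in the range 1..255.
    """
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    # The last octet is the VRID, written as two hex digits.
    return "00:00:5e:00:01:{:02x}".format(vrid)
```

Because every router in the group answers for the same virtual MAC, clients do not need to relearn an ARP entry after a failover, which is the failure mode suspected above.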
Re: NFS server fail-over - how do you do it?
On Sun, 30 May 2004 02:43:37 -0500 "adp" <[EMAIL PROTECTED]> wrote:

> I am running a FreeBSD 4.9-REL NFS server. Once every several hours our main
> NFS server replicates everything to a backup FreeBSD NFS server. We are okay
> with the gap in time between replication. What we aren't sure about is how
> to automate the fail-over between the primary to the secondary NFS server.
> This is for a web cluster. Each client mounts several directories from the
> NFS server.
>
> Let's say that our primary NFS server dies and just goes away. What then?
> Are you periodically doing a mount or a file look-up of a mounted filesystem
> to check if your NFS server died? If so are you just unmounting and
> remounting everything using the backup NFS server?
>
> Just curious how this problem is being solved.

Have you looked into amd (or am-utils)? I haven't used its failover feature, but it certainly does have it.

horio shoichi
Re: NFS server fail-over - how do you do it?
Couple of issues regarding failover.

1) If system B is going to take over system A's IP, it also needs to take its MAC address. Otherwise you have to wait for an ARP timeout. Some systems (all?) send a gratuitous ARP reply when an interface comes up. But some other systems ignore this if they already have an ARP entry, or if they weren't asking for the ARP in the first place.

2) The failed system must be made to stay failed, else there is hell to pay when it comes back and finds another system in the bed, er, server room! In a main/standby scenario, this is doable with some simple scripting. Any more than that and you will need some dynamic voting algorithm support. A nice thing about *real* computers is that they have an RS-232 console port and can be made to stay down with a BRK. I believe the PC Weasel will allow that, as well. A remote power controller can also serve this need.

3) One argument for run-levels in init was to keep a system at rl 2 monitoring the primary, then go to rl 3 if the primary fails. This, of course, can be done with flat rc.d, and entirely without it, as well. But it made the primary/hot-standby scheme trivial to set up. Regardless of where you put it and what all it calls, make a single script that can be run from your monitor app once it decides the master is gone. It ensures the primary is dead, starts the server processes, and screams like the dickens for help.

4) NFS may be stateless, but NFS over TCP is common nowadays, and the TCP connection isn't. Though, I believe the automounter can help with that.

5) NAS serving SAN is nice if you can afford all that fiber term gear. But you can do the same with a SCSI RAID array that has two host ports. You don't even need the second host port if you can change the SCSI initiator ID of one of the hosts. Just keep your cable lengths as short as you can.

6) It is generally cheaper to buy than build, unless you have done it before. The devil is in the details. I've done it before, and I'll buy every time.

Given that, a plug for some friends of mine that have made this work in the pri/hs mode: www.nssolutions.com

Cheers!
-sam
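The single takeover script from point 3, combined with the "stay failed" requirement from point 2, can be sketched roughly as follows. This is only an illustration of the ordering, not anyone's production script. All four callables are hypothetical hooks that you would back with your own checks, your remote power controller, and your rc scripts.

```python
from typing import Callable


def take_over(
    primary_is_dead: Callable[[], bool],
    fence_primary: Callable[[], None],
    start_services: Callable[[], None],
    alert: Callable[[str], None],
) -> bool:
    """Run the takeover steps in order; return True if the standby took over."""
    if not primary_is_dead():
        # The monitor thinks the primary is gone but it still answers:
        # force it down (for example via a power controller) and re-check.
        fence_primary()
        if not primary_is_dead():
            alert("takeover aborted: primary could not be confirmed dead")
            return False
    # Only start serving once the primary is known to be down.
    start_services()
    alert("standby has taken over from the primary")
    return True
```

The point of the ordering is that the server processes are never started while the primary might still be alive, and that a human is told about the outcome either way.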
Re: NFS server fail-over - how do you do it?
In the last episode (May 31), adp said:

> Very useful information, thanks. We have a very stable NFS server,
> but I am still working hard to put some redundancy into place. I was
> thinking that since NFS is udp-based, that if the primary NFS server
> failed, and the secondary assumed the primary NFS server's IP
> address, that things would at least return to normal (of course, any
> writes that had been in progress would fail horribly). That doesn't
> seem to be the case. During a test we killed the main NFS server and
> brought up the NFS IP as an alias on the backup. Didn't work. Has
> anyone tried anything like this?

That should work, I believe. NFS is stateless so as long as "a" server starts responding to the client, it should wake up. You may get "stale NFS handle" errors on open files or ones not synched to the slave when the master failed, but apart from that you should be okay. Does a tcpdump show any NFS traffic at all?

I have a port of the heartbeat program (from the badly-named www.linux-ha.org site) that automates the IP failover part that I will be submitting soon. 1.2.1 actually works out of the box on FreeBSD, but 1.2.2 has problems releasing the IP when you try to move an active server to standby.

--
Dan Nelson
[EMAIL PROTECTED]
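For readers unfamiliar with the general idea behind a heartbeat-style monitor, here is a minimal sketch of the detection half: the standby records when it last heard from its peer and declares the peer dead once a configured dead time has passed. This is not code or configuration from the heartbeat program mentioned above. The class and parameter names are assumptions made for illustration.

```python
from typing import Optional


class HeartbeatMonitor:
    """Track the last heartbeat from a peer and decide when it is dead."""

    def __init__(self, deadtime: float) -> None:
        self.deadtime = deadtime  # seconds of silence tolerated
        self.last_seen: Optional[float] = None

    def beat(self, now: float) -> None:
        """Record that a heartbeat arrived at time `now` (in seconds)."""
        self.last_seen = now

    def peer_is_dead(self, now: float) -> bool:
        """Return True once the peer has been silent for longer than deadtime."""
        if self.last_seen is None:
            # Never heard from the peer: stay passive rather than grab the IP.
            return False
        return now - self.last_seen > self.deadtime
```

Passing the clock in as an argument keeps the logic easy to test. Treating a never-seen peer as alive is a deliberately cautious choice, so that a standby booting on its own does not grab the address by mistake.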
Re: NFS server fail-over - how do you do it?
adp wrote:

> We can live with the chance that a file write might fail as long as we can
> switch over to another NFS server if the primary fails.

Sorry, NFS simply won't work with the model of operation you've described. There is no way to do fallback to a secondary NFS server if the primary goes down when using read/write shares, nor does there exist any way to push the changes made to a secondary fileserver back to the primary, even if you could convince the clients to fail-over in the first place.

Maybe Samba/CIFS would come closer to what you want, or else WebDAV over HTTP?

--
-Chuck
Re: NFS server fail-over - how do you do it?
Very useful information, thanks. We have a very stable NFS server, but I am still working hard to put some redundancy into place. I was thinking that since NFS is UDP-based, that if the primary NFS server failed, and the secondary assumed the primary NFS server's IP address, that things would at least return to normal (of course, any writes that had been in progress would fail horribly). That doesn't seem to be the case. During a test we killed the main NFS server and brought up the NFS IP as an alias on the backup. Didn't work. Has anyone tried anything like this?
Re: NFS server fail-over - how do you do it?
We can live with the chance that a file write might fail as long as we can switch over to another NFS server if the primary fails.

So amd will help us avoid the "client hung" issue? I will have to take a look. That is the worst thing of all when it comes to a failed NFS server. You can't even remotely reboot the NFS client! Someone has to power reset the damn thing. That's bad.
Re: NFS server fail-over - how do you do it?
adp wrote:

> One of my big problems right now is that if our primary NFS server goes down
> then everything using that NFS mount locks up. If I change to the mounted
> filesystem on the client then it stalls:
>
> # pwd
> /root
> # cd /nfs-mount-dir
> [locks]
>
> If I try to reboot the reboot fails as well since FreeBSD can't unmount the
> filesystem!?

Solaris provides mechanisms for NFS-failover for read-only NFS shares, but FreeBSD doesn't seem to support that. Besides, most people seem to want to use read/write filesystems, which makes the former solution not very useful to most people's requirements.

The solution to the problem is to make very certain that your primary NFS server does not go down, ever, period. Reasonable people who identify a mission-critical system such as a primary NFS server ought to be willing to spend money to get really good hardware, have a UPS, and so forth to facilitate the goal of 100% uptime. A Sun E450 still makes a nice primary fileserver, although NAS solutions like a NetApp or an Auspex (not cheap!) should also be considered.

The other choice would be to switch from using NFS to using a distributed filesystem which implements fileserver redundancy, such as AFS and its successor, DFS.

--
-Chuck
Re: NFS server fail-over - how do you do it?
On Sun, May 30, 2004 at 02:43:37AM -0500, adp wrote:

> I am running a FreeBSD 4.9-REL NFS server. Once every several hours our main
> NFS server replicates everything to a backup FreeBSD NFS server. We are okay
> with the gap in time between replication. What we aren't sure about is how
> to automate the fail-over between the primary to the secondary NFS server.
> This is for a web cluster. Each client mounts several directories from the
> NFS server.
>
> Let's say that our primary NFS server dies and just goes away. What then?
> Are you periodically doing a mount or a file look-up of a mounted filesystem
> to check if your NFS server died? If so are you just unmounting and
> remounting everything using the backup NFS server?
>
> Just curious how this problem is being solved.

If you're mounting those NFS partitions read/write, then there really isn't a good solution for this problem[1] -- you need your NFS server up and running 24x7.

If you are NFS mounting those partitions read-only, then you can in principle construct a fail-over system between those servers. Some Unix OSes let you specify a list of servers in fstab(5) (e.g. Solaris) and clients will mount from one or other of them. Unfortunately you can't do that with standard NFS mounts under FreeBSD. You could try using VRRP -- see the net/freevrrpd port for example -- but I'm not sure how well that would work if the system failed over in the middle of an IO transaction.

In any case -- certainly if your NFS partitions are read/write, but also for read-only -- perhaps the best compromise is to use the automounter amd(8). This certainly does help with the 'nightmare filesystem' scenario, where loss of a server prevents the clients doing anything, even rebooting cleanly. You can create a limited and rudimentary form of failover by using role-based hostnames in your internal DNS -- e.g. nfsserv.example.com as a CNAME pointing at your main server -- and then modifying the DNS when you need the failover to occur. It's a bit clunky and needs manual intervention, but it beats having nothing at all.

Cheers,

Matthew

[1] Well, I assume you haven't got the resources to set up a storage array with multiple servers accessing the same disk sets.

--
Dr Matthew J Seaman MA, D.Phil.
26 The Paddocks, Savill Way, Marlow, Bucks., SL7 1TH UK
PGP: http://www.infracaninophile.co.uk/pgpkey
Tel: +44 1628 476614
Re: NFS server fail-over - how do you do it?
One of my big problems right now is that if our primary NFS server goes down then everything using that NFS mount locks up. If I change to the mounted filesystem on the client then it stalls:

    # pwd
    /root
    # cd /nfs-mount-dir
    [locks]

If I try to reboot, the reboot fails as well, since FreeBSD can't unmount the filesystem!?

How do I stop this from happening? I am using this to mount NFS filesystems:

    # mount -o bg,intr,soft ...
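One practical consequence of the hang described above is that any health check which touches the mount point can itself lock up. A rough sketch of a check that avoids this is to do the file lookup in a child process and give up after a deadline, without waiting on a child that may be stuck in the kernel. The function name and the timeout value are assumptions for illustration. This only protects the checker and does nothing to un-hang the mount itself.

```python
import subprocess
import sys
import time


def path_responds(path: str, timeout: float = 5.0) -> bool:
    """Return True if a stat() of `path` completes within `timeout` seconds."""
    # Do the lookup in a child process so a hung NFS mount blocks the child,
    # not this monitoring process.
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = child.poll()
        if status is not None:
            return status == 0  # non-zero means the stat itself failed
        time.sleep(0.05)
    # Best effort only: a child blocked in the kernel may ignore the signal,
    # so do not wait for it to exit.
    child.kill()
    return False
```

A monitor could call `path_responds("/nfs-mount-dir")` from cron and treat a False result as "server unreachable or mount broken", leaving the decision about what to do next to a separate step.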
Re: NFS server fail-over - how do you do it?
On Sun, 30 May 2004 02:43:37 -0500 "adp" <[EMAIL PROTECTED]> wrote:

> Just curious how this problem is being solved.

I can't say I've ever looked into it myself, but I'd suggest an easy solution would be to have a cron script run every now and again to ping the servers and change the mounts depending on what the response is. Also, if your backup system is bespoke and can be modified, you could use amd and have the script read stored data on NFS server availability so it can decide where to back up the data.

--
Mike Woods
IT Technician
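The probing half of such a cron script might look something like the sketch below. It tries each candidate server in order and returns the first one that accepts a TCP connection. It assumes the servers accept NFS over TCP on port 2049, which a UDP-only setup would not, and it leaves out the unmount/remount step entirely. The function name and the host names in the usage note are hypothetical.

```python
import socket
from typing import Iterable, Optional, Tuple


def first_reachable(
    servers: Iterable[Tuple[str, int]], timeout: float = 2.0
) -> Optional[Tuple[str, int]]:
    """Return the first (host, port) that accepts a TCP connection, else None."""
    for host, port in servers:
        try:
            # A completed connect means something is listening on that port.
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            # Refused, timed out, or unresolvable: try the next candidate.
            continue
    return None
```

A cron job could call it with a preference-ordered list such as `[("nfs-primary", 2049), ("nfs-backup", 2049)]` and only change mounts when the answer differs from the server currently in use. Note that a successful connect shows the port is open, not that the exported filesystem is healthy.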