Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Frank Cusack wrote:
> On September 13, 2006 7:07:40 PM -0700 Richard Elling [EMAIL PROTECTED] wrote:
>> Dale Ghent wrote:
>>> James C. McPherson wrote:
>>>> As I understand things, SunCluster 3.2 is expected to have support
>>>> for HA-ZFS, and until that version is released you will not be
>>>> running in a supported configuration, so any errors you encounter
>>>> are *your fault alone*.
>>> Still, after reading Mathias's description, it seems that the former
>>> node is doing an implicit forced import when it boots back up. This
>>> seems wrong to me.
>> Repeat the experiment with UFS, or most other file systems, on a raw
>> device and you would get the same behaviour as ZFS: corruption.
> Again, the difference is that with UFS your filesystems won't auto-mount
> at boot. If you repeated this with UFS, you wouldn't try to mount until
> you decided you should own the disk.

Normally on Solaris, UFS filesystems are mounted via /etc/vfstab, so yes,
they will probably automatically mount at boot time. If you are removing
them from vfstab, not putting them there in the first place, or setting
the 'mount at boot' field in /etc/vfstab to no, then whatever is doing
that *is* your cluster framework. You need to rewrite/extend it to deal
with the fact that ZFS doesn't use vfstab, and instead express the same
logic in terms of ZFS import/export.

-- Darren J Moffat
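[For illustration, here is the vfstab approach being described, for a
failover UFS filesystem; the device names and mount point are
hypothetical, not taken from this thread. The "mount at boot" field is
set to no so that neither node mounts the filesystem on its own, and the
cluster framework mounts it only on the node that owns the storage:

    #device to mount    device to fsck      mount point  FS type  fsck  mount at boot  options
    /dev/dsk/c2t0d0s6   /dev/rdsk/c2t0d0s6  /shared      ufs      2     no             -

With ZFS the equivalent step is not a mount entry at all but a zpool
import/export performed by the framework, as described above.]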
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Darren Dunham wrote:
> Exactly. What method could such a framework use to ask ZFS to import a
> pool *now*, but not also automatically at next boot? (How does the
> upcoming SC do it?)

I don't know how Sun Cluster does it, and I don't know where the source
is. As others have pointed out, you could use the fully supported
alternate root support for this:

   The zpool create -R and zpool import -R commands allow users to create
   and import a pool with a different root path. By default, whenever a
   pool is created or imported on a system, it is permanently added so
   that it is available whenever the system boots. For removable media,
   or when in recovery situations, this may not always be desirable. An
   alternate root pool does not persist on the system. Instead, it exists
   only until exported or the system is rebooted, at which point it will
   have to be imported again.

Sounds exactly like what is needed. As I said, I don't know if this is
what Sun Cluster does, but it is a possible way to build an HA-ZFS
solution; just remember not to have the scripts blindly do
zpool import -Rf :-)

-- Darren J Moffat
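[To make that concrete, a failover agent's import/export steps might look
roughly like the sketch below (a hedged illustration only, not how Sun
Cluster implements it; the pool name 'tank' and the alternate root
'/failover' are placeholders):

    # Takeover, on the node that has decided it now owns the storage.
    # -R gives the pool an alternate root, so the import is not recorded
    # for automatic re-import at the next boot.
    zpool import -R /failover tank

    # Add -f only after the framework has independently verified that
    # the other node really is down:
    #   zpool import -R /failover -f tank

    # Graceful giveback: export the pool before the peer re-imports it.
    zpool export tank

The export step matters: an exported pool can be imported on the peer
without needing -f at all.]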
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
> As others have pointed out, you could use the fully supported alternate
> root support for this. The zpool create -R and zpool import -R commands
> allow [...]

Yes. I tried that. It should work well. In addition, I'm happy to note
that '-R /' appears to be valid, allowing all the filenames to remain
unchanged while still giving the no-automatic-reimport behaviour.

> Sounds exactly like what is needed. As I said, I don't know if this is
> what Sun Cluster does, but it is a possible way to build an HA-ZFS
> solution; just remember not to have the scripts blindly do
> zpool import -Rf :-)

Sure. Anything that does had better be making its own determination of
which host owns the pool independently.

-- 
Darren Dunham  [EMAIL PROTECTED]
Senior Technical Consultant       TAOS            http://www.taos.com/
Got some Dr Pepper?               San Francisco, CA bay area
This line left intentionally blank to confuse you.
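[In other words, something like this, with 'tank' as a placeholder pool
name:

    # Alternate root of '/': dataset mount points stay exactly where
    # they normally are, but the pool is still treated as an
    # alternate-root pool, so it is not re-imported automatically at
    # the next boot.
    zpool import -R / tank
]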
[zfs-discuss] Re: ZFS imported simultanously on 2 systems...
Well, we are using the -f parameter to test failover functionality. If one system with mounted ZFS is down, we have to use the force to mount it on the failover system. But when the failed system comes online again, it remounts the ZFS without errors, so it is mounted simultanously on both nodes That's the real problem we have :[ mfg Mathias I think this is user error: the man page explicitly says: -f Forces import, even if the pool appears to be potentially active. y what you did. If the behaviour had been the same without the -f option, I guess this would be a bug. HTH This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
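[For reference, the sequence being described looks roughly like this;
'tank' is a placeholder pool name and the exact wording of the error
message may differ between builds:

    # On the standby node, while the pool is still marked active by the
    # crashed node:
    zpool import tank
    #   cannot import 'tank': pool may be in use from other system
    #   use '-f' to import anyway

    # Forcing the import works, but nothing then stops the original node
    # from importing the same pool again when it boots back up:
    zpool import -f tank
]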
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Mathias F wrote:
> Well, we are using the -f parameter to test failover functionality. If
> one system with the ZFS mounted is down, we have to use force to mount
> it on the failover system. But when the failed system comes online
> again, it remounts the ZFS without errors, so it is mounted
> simultaneously on both nodes.

ZFS currently doesn't support this, I'm sorry to say. *You* have to make
sure that a zpool is not imported on more than one node at a time.

regards
-- 
Michael Schuster                       +49 89 46008-2974 / x62974
visit the online support center:       http://www.sun.com/osc/
Recursion, n.: see 'Recursion'
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
On Wed, Sep 13, 2006 at 12:28:23PM +0200, Michael Schuster wrote:
> Mathias F wrote:
>> Well, we are using the -f parameter to test failover functionality.
>> If one system with the ZFS mounted is down, we have to use force to
>> mount it on the failover system. But when the failed system comes
>> online again, it remounts the ZFS without errors, so it is mounted
>> simultaneously on both nodes. This is used on a regular basis within
>> cluster frameworks...
> ZFS currently doesn't support this, I'm sorry to say. *You* have to
> make sure that a zpool is not imported on more than one node at a time.

Why not use real cluster software as the *you* that takes care of using
resources like a filesystem (UFS, ZFS, others...) in a consistent way? I
think ZFS does enough to make sure you don't accidentally use
filesystems/pools from more than one host at a time. If you want more,
please consider using a cluster framework with heartbeats and all that
great stuff...

Regards,
Thomas
[zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Without the -f option, the ZFS pool can't be imported while it is
reserved for the other host, even if that host is down.

As I said, we are testing ZFS as a *replacement for VxVM*, which we are
using at the moment. So as a result our tests have failed and we have to
keep on using Veritas.

Thanks for all your answers.
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Mathias F wrote:
> Without the -f option, the ZFS pool can't be imported while it is
> reserved for the other host, even if that host is down.
>
> As I said, we are testing ZFS as a *replacement for VxVM*, which we are
> using at the moment. So as a result our tests have failed and we have
> to keep on using Veritas. Thanks for all your answers.

I think I get the whole picture; let me summarise:

- you create a pool P and an FS on host A
- host A crashes
- you import P on host B; this only works with -f, as zpool import
  otherwise refuses to do so
- now P is imported on B
- host A comes back up and re-accesses P, thereby leading to (potential)
  corruption
- your hope was that when host A comes back, there exists a mechanism
  for telling it that it needs to re-import
- VxVM, as you currently use it, has this functionality

Is that correct?

regards
-- 
Michael Schuster                       +49 89 46008-2974 / x62974
visit the online support center:       http://www.sun.com/osc/
Recursion, n.: see 'Recursion'
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Mathias F wrote:
> Without the -f option, the ZFS pool can't be imported while it is
> reserved for the other host, even if that host is down.

This is the correct behaviour. What do you want it to do, cause data
corruption?

> As I said, we are testing ZFS as a *replacement for VxVM*, which we are
> using at the moment. So as a result our tests have failed and we have
> to keep on using Veritas.

As I understand things, SunCluster 3.2 is expected to have support for
HA-ZFS, and until that version is released you will not be running in a
supported configuration, so any errors you encounter are *your fault
alone*.

Didn't we have the PMC (poor man's cluster) talk last week as well?

James C. McPherson
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Hi Mathias,

Mathias F wrote:
> Without the -f option, the ZFS pool can't be imported while it is
> reserved for the other host, even if that host is down.
>
> As I said, we are testing ZFS as a *replacement for VxVM*, which we are
> using at the moment. So as a result our tests have failed and we have
> to keep on using Veritas.

Sun Cluster 3.2, which is in beta at the moment, will allow you to do
this automatically. I don't think what you are trying to do here will be
supportable unless it's managed by SC3.2.

Let me know if you'd like to try out the SC3.2 beta.

Thanks,
Zoram
-- 
Zoram Thanga :: Sun Cluster Development :: http://blogs.sun.com/zoram
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
James C. McPherson wrote:
> As I understand things, SunCluster 3.2 is expected to have support for
> HA-ZFS, and until that version is released you will not be running in a
> supported configuration, so any errors you encounter are *your fault
> alone*.

Still, after reading Mathias's description, it seems that the former node
is doing an implicit forced import when it boots back up. This seems
wrong to me. zpools should be imported only if the zpool itself says it's
not already taken, which of course would be overridden by a manual -f
import:

  zpool:  sorry, i already have a boyfriend, host b
  host a: darn, ok, maybe next time

rather than the current scenario:

  zpool:  host a, I'm over you now. host b is now the man in my life!
  host a: I don't care! you're coming with me anyways. you'll always be
          mine!
  * host a stuffs zpool into the car and drives off

...and we know those situations never turn out particularly well.

/dale
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
On September 13, 2006 6:09:50 AM -0700 Mathias F [EMAIL PROTECTED] wrote:
>> [...] a product which is *not* currently multi-host-aware to behave in
>> the same safe manner as one which is.
>
> That's the point we figured out while testing it ;) I just wanted to
> have our thoughts reviewed by other ZFS users.
>
> Our next step, IF the failover had succeeded, would have been to create
> a little ZFS agent for a VCS testing cluster. We haven't used Sun
> Cluster and won't use it in the future.

/etc/zfs/zpool.cache is used at boot time to find which pools to import.
Remove it when the system boots, and after it goes down and comes back up
it won't import any pools. Not quite the same as not importing pools that
are imported elsewhere, but perhaps close enough for you.

On September 13, 2006 10:15:28 PM +1000 James C. McPherson [EMAIL PROTECTED] wrote:
> As I understand things, SunCluster 3.2 is expected to have support for
> HA-ZFS, and until that version is released you will not be running in a
> supported configuration, so any errors you encounter are *your fault
> alone*.
>
> Didn't we have the PMC (poor man's cluster) talk last week as well?

I understand the objection to Mickey Mouse configurations, but I don't
understand the objection to (what I consider) simply improving safety.
Why again shouldn't zfs have a hostid written into the pool, to prevent
import if the hostid doesn't match?

And why should failover be limited to SC? Why shouldn't VCS be able to
play? Why should SC have secrets on how to do failover? After all, this
is OPENsolaris. And anyway, many homegrown solutions (the kind I'm
familiar with, anyway) are of high quality compared to commercial ones.

-frank
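[A rough sketch of that idea for a homegrown or VCS-style agent; the pool
name is a placeholder, the steps are illustrative only, and note that ZFS
may rewrite the cache file on later pool configuration changes (the
-R / alternate-root import discussed earlier avoids the cache file
entirely):

    # Agent online step, on the node taking ownership of the storage:
    zpool import -f tank          # only after independently confirming
                                  # that the peer node is really down
    rm -f /etc/zfs/zpool.cache    # so a crash and reboot of this node
                                  # will not auto-import any pools

    # Agent offline step, on graceful shutdown or switchover:
    zpool export tank
]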
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
On Wed, Sep 13, 2006 at 09:14:36AM -0700, Frank Cusack wrote:
> Why again shouldn't zfs have a hostid written into the pool, to prevent
> import if the hostid doesn't match?

See:

    6282725 hostname/hostid should be stored in the label

Keep in mind that this is not a complete clustering solution, only a
mechanism to prevent administrator misconfiguration. In particular, it's
possible for one host to be doing a failover, and for the other host to
open the pool before the hostid has been written to the disk.

> And why should failover be limited to SC? Why shouldn't VCS be able to
> play? Why should SC have secrets on how to do failover? After all, this
> is OPENsolaris. And anyway, many homegrown solutions (the kind I'm
> familiar with, anyway) are of high quality compared to commercial ones.

I'm not sure I understand this. There is no built-in clustering support
for UFS either: simultaneously mounting the same UFS filesystem on
different hosts will corrupt your data as well. You need some sort of
higher-level logic to correctly implement clustering. This is not an SC
secret; it's how you manage non-clustered filesystems in a failover
situation.

Storing the hostid as a last-ditch check for administrative error is a
reasonable RFE, just one that we haven't yet gotten around to. Claiming
that it will solve the clustering problem oversimplifies the problem and
will lead people to think they have a 'safe' homegrown failover when in
reality the right sequence of actions will irrevocably corrupt their
data.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
On September 13, 2006 9:32:50 AM -0700 Eric Schrock [EMAIL PROTECTED] wrote:
> On Wed, Sep 13, 2006 at 09:14:36AM -0700, Frank Cusack wrote:
>> Why again shouldn't zfs have a hostid written into the pool, to
>> prevent import if the hostid doesn't match?
>
> See:
>
>     6282725 hostname/hostid should be stored in the label
>
> Keep in mind that this is not a complete clustering solution, only a
> mechanism to prevent administrator misconfiguration. In particular,
> it's possible for one host to be doing a failover, and for the other
> host to open the pool before the hostid has been written to the disk.
>
>> And why should failover be limited to SC? Why shouldn't VCS be able to
>> play? Why should SC have secrets on how to do failover? After all,
>> this is OPENsolaris. And anyway, many homegrown solutions (the kind
>> I'm familiar with, anyway) are of high quality compared to commercial
>> ones.
>
> I'm not sure I understand this. There is no built-in clustering support
> for UFS either: simultaneously mounting the same UFS filesystem on
> different hosts will corrupt your data as well. You need some sort of
> higher-level logic to correctly implement clustering. This is not an SC
> secret; it's how you manage non-clustered filesystems in a failover
> situation.

But UFS filesystems don't automatically get mounted (well, we know how to
keep them from automatically mounting via /etc/vfstab). The SC "secret"
is in how importing of pools is prevented at boot time. Of course you
need more than that, but my complaint was against the idea that you
cannot build a reliable solution yourself, instead of just sharing the
info about zpool.cache, albeit with a warning.

> Storing the hostid as a last-ditch check for administrative error is a
> reasonable RFE, just one that we haven't yet gotten around to. Claiming
> that it will solve the clustering problem oversimplifies the problem
> and will lead people to think they have a 'safe' homegrown failover
> when in reality the right sequence of actions will irrevocably corrupt
> their data.

Thanks for that clarification, very important info.

-frank
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
On Sep 13, 2006, at 12:32 PM, Eric Schrock wrote:
> Storing the hostid as a last-ditch check for administrative error is a
> reasonable RFE, just one that we haven't yet gotten around to. Claiming
> that it will solve the clustering problem oversimplifies the problem
> and will lead people to think they have a 'safe' homegrown failover
> when in reality the right sequence of actions will irrevocably corrupt
> their data.

The hostid is handy, but it'll only tell you who MIGHT or MIGHT NOT have
control of the pool. Such an RFE would be even more worthwhile if it also
included something such as a timestamp. This timestamp (or similar
time-oriented signature) would be updated regularly, based on some
internal ZFS event. If the stamp goes for an arbitrary length of time
without being updated, another host in the cluster could force-import the
pool on the assumption that the original host is no longer able to
communicate with it.

This is only a simple description of the idea, but perhaps worthwhile if
you're already going to change the label structure to add the hostid.

/dale
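[A sketch of the takeover logic being described. This is entirely
hypothetical: ZFS has no such label timestamp today, so the
'zpool_label_age' helper and the pool name 'tank' are made up for
illustration:

    # Watchdog on the standby node (pseudo-shell sketch).
    STALE_SECS=60
    while :; do
        # made-up helper: seconds since the active node last updated
        # the pool label's stamp
        age=$(zpool_label_age tank)
        if [ "$age" -gt "$STALE_SECS" ]; then
            # The active node has gone quiet; assume it is dead and
            # take the pool over.
            zpool import -f tank
            break
        fi
        sleep 10
    done

As Eric notes above, a check like this narrows the window for a double
import but cannot eliminate it on its own.]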
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
On September 13, 2006 1:28:47 PM -0400 Dale Ghent [EMAIL PROTECTED] wrote:
> On Sep 13, 2006, at 12:32 PM, Eric Schrock wrote:
>> Storing the hostid as a last-ditch check for administrative error is a
>> reasonable RFE, just one that we haven't yet gotten around to.
>> Claiming that it will solve the clustering problem oversimplifies the
>> problem and will lead people to think they have a 'safe' homegrown
>> failover when in reality the right sequence of actions will
>> irrevocably corrupt their data.
>
> The hostid is handy, but it'll only tell you who MIGHT or MIGHT NOT
> have control of the pool. Such an RFE would be even more worthwhile if
> it also included something such as a timestamp. This timestamp (or
> similar time-oriented signature) would be updated regularly, based on
> some internal ZFS event. If the stamp goes for an arbitrary length of
> time without being updated, another host in the cluster could
> force-import the pool on the assumption that the original host is no
> longer able to communicate with it.
>
> This is only a simple description of the idea, but perhaps worthwhile
> if you're already going to change the label structure to add the
> hostid.

Sounds cool! Better than depending on an out-of-band heartbeat.

-frank
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Frank Cusack wrote:
> Sounds cool! Better than depending on an out-of-band heartbeat.

I disagree; it sounds really, really bad. If you want a high-availability
cluster you really need a faster interconnect than spinning rust, which
is probably the slowest interface we have now!

-- Darren J Moffat
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
On Sep 13, 2006, at 1:37 PM, Darren J Moffat wrote:
> That might be acceptable in some environments, but it is going to cause
> disks to spin up. That will be very unacceptable in a laptop and maybe
> even in some energy-conscious data centres.

Introduce an option to 'zpool create'? Come to think of it, a way of
describing attributes for a pool seems to be lacking (unlike zfs
volumes).

> What you are proposing sounds a lot like a cluster heartbeat, which IMO
> really should not be implemented by writing to disks.

That would be an extreme example of a use for this. While it /could/ be
used as a heartbeat mechanism, it would be useful administratively:

    # zpool status foopool
    Pool foopool is currently imported by host.blah.com
      Import time:   4 April 2007 16:20:00
      Last activity: 23 June 2007 18:42:53
    ...

/dale
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
On Wed, Sep 13, 2006 at 06:37:25PM +0100, Darren J Moffat wrote:
> Dale Ghent wrote:
>> On Sep 13, 2006, at 12:32 PM, Eric Schrock wrote:
>>> Storing the hostid as a last-ditch check for administrative error is
>>> a reasonable RFE, just one that we haven't yet gotten around to.
>>> Claiming that it will solve the clustering problem oversimplifies the
>>> problem and will lead people to think they have a 'safe' homegrown
>>> failover when in reality the right sequence of actions will
>>> irrevocably corrupt their data.
>> The hostid is handy, but it'll only tell you who MIGHT or MIGHT NOT
>> have control of the pool. Such an RFE would be even more worthwhile if
>> it also included something such as a timestamp. This timestamp (or
>> similar time-oriented signature) would be updated regularly, based on
>> some internal ZFS event. If the stamp goes for an arbitrary length of
>> time without being updated, another host in the cluster could
>> force-import the pool on the assumption that the original host is no
>> longer able to communicate with it.
> That might be acceptable in some environments, but it is going to cause
> disks to spin up. That will be very unacceptable in a laptop and maybe
> even in some energy-conscious data centres.
>
> What you are proposing sounds a lot like a cluster heartbeat, which IMO
> really should not be implemented by writing to disks.

Wouldn't it be possible to implement this via SCSI reservations (where
available), a la quorum devices?

Ceri
-- 
That must be wonderful!  I don't understand it at all.  -- Moliere
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Frank Cusack wrote:
> ...[snip James McPherson's objections to PMC]
> I understand the objection to Mickey Mouse configurations, but I don't
> understand the objection to (what I consider) simply improving safety.
> ...
> And why should failover be limited to SC? Why shouldn't VCS be able to
> play? Why should SC have secrets on how to do failover? After all, this
> is OPENsolaris. And anyway, many homegrown solutions (the kind I'm
> familiar with, anyway) are of high quality compared to commercial ones.

Frank, this isn't a SunCluster vs VCS argument. It's an argument about

  * doing cluster-y stuff with the protection that a cluster framework
    provides

versus

  * doing cluster-y stuff without the protection that a cluster framework
    provides.

If you want to use VCS, be my guest, and let us know how it goes. If you
want to use a homegrown solution, then please let us know what you did to
get it working, how well it copes, and how you are addressing any data
corruption that might occur.

I tend to refer to SunCluster more than VCS simply because I've got more
in-depth experience with Sun's offering.

James C. McPherson
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
On September 13, 2006 4:33:31 PM -0700 Frank Cusack [EMAIL PROTECTED] wrote:
> You'd typically have a dedicated link for heartbeat; what if that cable
> gets yanked or that NIC port dies? The backup system could avoid
> mounting the pool if zfs had its own heartbeat. What if the cluster
> software has a bug and tells the other system to take over? zfs could
> protect itself.

Hmm, actually probably not, considering heartbeat intervals and failover
time vs. the probable zpool update frequency.

-frank
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
Dale Ghent wrote:
> James C. McPherson wrote:
>> As I understand things, SunCluster 3.2 is expected to have support for
>> HA-ZFS, and until that version is released you will not be running in
>> a supported configuration, so any errors you encounter are *your fault
>> alone*.
> Still, after reading Mathias's description, it seems that the former
> node is doing an implicit forced import when it boots back up. This
> seems wrong to me.

Repeat the experiment with UFS, or most other file systems, on a raw
device and you would get the same behaviour as ZFS: corruption. The
question on the table is "why doesn't ZFS behave like a cluster-aware
volume manager?", not "why does ZFS behave like UFS when two nodes mount
the same file system simultaneously?"

-- richard
Re: [zfs-discuss] Re: ZFS imported simultaneously on 2 systems...
On September 13, 2006 7:07:40 PM -0700 Richard Elling [EMAIL PROTECTED] wrote:
> Dale Ghent wrote:
>> James C. McPherson wrote:
>>> As I understand things, SunCluster 3.2 is expected to have support
>>> for HA-ZFS, and until that version is released you will not be
>>> running in a supported configuration, so any errors you encounter are
>>> *your fault alone*.
>> Still, after reading Mathias's description, it seems that the former
>> node is doing an implicit forced import when it boots back up. This
>> seems wrong to me.
> Repeat the experiment with UFS, or most other file systems, on a raw
> device and you would get the same behaviour as ZFS: corruption.

Again, the difference is that with UFS your filesystems won't auto-mount
at boot. If you repeated this with UFS, you wouldn't try to mount until
you decided you should own the disk.

-frank