Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

Ross Smith Sat, 30 Aug 2008 12:33:42 -0700

Triple mirroring you say?  That'd be me then :D

The reason I really want to get ZFS timeouts sorted is that our long-term goal 
is to mirror that over two servers too, giving us a pool mirrored across two 
servers, each side of which is actually a ZFS iSCSI volume hosted on triply 
mirrored disks.

Oh, and we'll have two sets of online off-site backups running raid-z2, plus a 
set of off-line backups too.

All in all I'm pretty happy with the integrity of the data, and wouldn't want to 
use anything other than ZFS for that now.  I'd just like to get the 
availability working a bit better, without having to go back to buying RAID 
controllers.  We have big plans for that too; once we get the iSCSI / iSER 
timeout issue sorted, our long-term availability goal is to have the setup I 
mentioned above hosted out from a pair of clustered Solaris NFS / CIFS servers.

Failover time on the cluster is currently in the order of 5-10 seconds.  If I 
can get the detection of a bad iSCSI link down under 2 seconds, we'll 
essentially have a worst-case scenario of < 15 seconds downtime.  Downtime that 
low is effectively transparent for our users, as all of our applications 
can cope with it seamlessly, and I'd really love to be able to do that this 
calendar year.
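
To make the arithmetic behind that figure explicit, here is a rough sketch. 
The numbers are the ones quoted above; the purely additive model (detection 
delay plus failover time, nothing else) and all of the names are assumptions 
made for illustration, not measurements of the real cluster.

```python
# Figures quoted in the paragraph above. Treating total downtime as
# "detection delay + failover time" is a simplifying assumption.
FAILOVER_SECONDS = (5, 10)       # cluster failover: best case, worst case
DETECTION_TARGET_SECONDS = 2     # target upper bound for spotting a bad iSCSI link
DOWNTIME_BUDGET_SECONDS = 15     # the "< 15 seconds" worst-case figure


def worst_case_downtime(failover_range, detection_seconds):
    """Worst case = slowest failover plus the full detection delay."""
    return max(failover_range) + detection_seconds


worst = worst_case_downtime(FAILOVER_SECONDS, DETECTION_TARGET_SECONDS)
headroom = DOWNTIME_BUDGET_SECONDS - worst
print(f"worst case: {worst}s, headroom: {headroom}s")
```

Under those assumptions the worst case comes to 12 seconds, which leaves a 
few seconds of slack inside the 15 second figure.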

Anyway, getting back on topic, it's a good point about moving forward while 
redundancy exists.  I think the flag for specifying the write behavior should 
default to exactly that: keep accepting writes as long as some redundancy 
remains.  The optional setting would then allow the pool to continue accepting 
writes even while it is in a non-redundant state.
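
As a rough illustration of that default versus the optional setting, the 
decision could be sketched as below. This is only a model of the proposal 
for a single mirror; the names WritePolicy and accept_writes are made up 
for the example and do not correspond to any existing ZFS property or API.

```python
from enum import Enum


class WritePolicy(Enum):
    """Hypothetical flag values; not a real ZFS setting."""
    REQUIRE_REDUNDANCY = "require-redundancy"    # proposed default
    ALLOW_NON_REDUNDANT = "allow-non-redundant"  # proposed opt-in


def accept_writes(healthy_copies, policy=WritePolicy.REQUIRE_REDUNDANCY):
    """Decide whether a mirror keeps accepting writes.

    healthy_copies is the number of mirror sides still responding.
    """
    if healthy_copies < 1:
        return False   # nothing left to write to
    if healthy_copies >= 2:
        return True    # some redundancy remains, so keep moving forward
    # Exactly one copy left: only continue if explicitly allowed.
    return policy is WritePolicy.ALLOW_NON_REDUNDANT


# A triple mirror that has lost one side still accepts writes by default.
print(accept_writes(2))
# Down to a single side, writes continue only with the optional setting.
print(accept_writes(1), accept_writes(1, WritePolicy.ALLOW_NON_REDUNDANT))
```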

Ross

> Date: Sat, 30 Aug 2008 10:59:19 -0500
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> CC: zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / 
> driver failure better
> 
> On Sat, 30 Aug 2008, Ross wrote:
> > while the problem is diagnosed. - With that said, could the write 
> > timeout default to on when you have a slog device?  After all, the 
> > data is safely committed to the slog, and should remain there until 
> > it's written to all devices.  Bob, you seemed the most concerned 
> > about writes, would that be enough redundancy for you to be happy to 
> > have this on by default?  If not, I'd still be ok having it off by 
> > default, we could maybe just include it in the evil tuning guide 
> > suggesting that this could be turned on by anybody who has a 
> > separate slog device.
> 
> It is my impression that the slog device is only used for synchronous 
> writes.  Depending on the system, this could be just a small fraction 
> of the writes.
> 
> In my opinion, ZFS's primary goal is to avoid data loss, or 
> consumption of wrong data.  Availability is a lesser goal.
> 
> If someone really needs maximum availability then they can go to 
> triple mirroring or some other maximally redundant scheme.  ZFS should 
> do its best to continue moving forward as long as some level of 
> redundancy exists.  There could be an option to allow moving forward 
> with no redundancy at all.
> 
> Bob
> ======================================
> Bob Friesenhahn
> [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> 
