> We are running Solaris 10u4; is the log option in
> there?

Someone more familiar with the specifics of the ZFS releases will have to 
answer that.
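
One thing you can check yourself, though, is what the installed bits claim to 
support.  A minimal sketch (this just lists capabilities; it changes nothing):

  # 'zpool upgrade -v' lists every ZFS pool version the running release
  # understands; if separate intent log devices are supported, they show
  # up as one of the listed features.
  zpool upgrade -v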

> 
> If this ZIL disk also goes dead, what is the failure
> mode and recovery option then?

The separate ZIL device should at a minimum be mirrored.  But since that won't 
give you as much redundancy as your main pool has, perhaps you should create a 
small 5-disk RAID-0 LUN on the disks of each RAID-5 LUN and mirror the log 
across all four of them:  even if one entire array box is lost, the other will 
still have a mirrored ZIL, and all the RAID-5 LUNs will be the same size (not 
that I'd expect a small variation in size between the two pairs of LUNs to be a 
problem for ZFS:  it can handle different disk sizes in a mirrored pool as long 
as each individual *pair* of disks matches, can't it?).
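
A minimal sketch of what I mean, with made-up pool and device names standing in 
for yours (and assuming your release accepts separate log devices at all):

  # Attach a 4-way mirrored intent log to an existing pool.  'tank' and
  # the c*t*d* names are placeholders for your pool and for the four
  # small RAID-0 LUNs carved out of the two array boxes.
  zpool add tank log mirror c2t0d0 c2t1d0 c3t0d0 c3t1d0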

Having 4 copies of the ZIL on disks shared with the RAID-5 activity will 
compromise the log's performance, since each log write won't complete until the 
slowest copy finishes (i.e., congestion in either of the RAID-5 pairs could 
delay it).  It still should usually be faster than just throwing the log in 
with the rest of the RAID-5 data, though.

Then again, I see from your later comment that you have the same questions that 
I had about whether the results reported in 
http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on suggest that having 
a separate ZIL device may not help much anyway (at least for your specific 
workload:  I can 
imagine circumstances in which performance of small, synchronous writes might 
be more critical than other performance, in which case separating them out 
could be useful).

> 
> We did get the 2540 fully populated with 15K 146-gig
> drives.  With 12 disks, and wanting to have at least
> ONE hot global spare in each array, and needing to
> keep LUNs the same size, you end up doing 2 5-disk
> RAID-5 LUNs and 2 hot spares in each array.  Not that
> I really need 2 spares; I just didn't see any way to
> make good use of an extra disk in each array.  If we
> wanted to dedicate them instead to this ZIL need,
> what is best way to go about that?

As I noted above, you might not want to have less redundancy in the ZIL than 
you have in the main pool:  while the data in the ZIL is only temporary (until 
it gets written back to the main pool), there's a good chance that there will 
*always* be *some* data in it, so if you lost one array box entirely, at least 
that small amount of data would be at the mercy of any failure on the log disk 
that made any portion of the log unreadable.

Now, if you could dedicate all four spare disks to the log (mirroring it 4 
ways) and make each box understand that it was OK to steal one of them to use 
as a hot spare should the need arise, that might give you reasonable protection 
(since then any increased exposure would only exist until the failed disk was 
manually replaced - and normally the other box would still hold two copies as 
well).  But I have no idea whether the box provides anything like that level of 
configurability.

...

> Hundreds of POP and IMAP user processes coming and
> going from users reading their mail.  Hundreds more
> LMTP processes from mail being delivered to the Cyrus
> mail-store.

And with 10K or more users, a *lot* of parallelism in the workload - which is 
what I assumed given that you had over 1 TB of net email storage space (but I 
probably should have made that assumption more explicit, just in case it was 
incorrect).

> Sometimes writes predominate over reads,
> depends on time of day, whether backups are running,
> etc.  The servers are T2000s with 16 gigs RAM so no
> shortage of room for ARC cache.  I have also turned off
> cache flush in pursuit of performance.

From Neil's comment in the blog entry that you referenced, that sounds *very* 
dicey (at least by comparison with the level of redundancy that you've built 
into the rest of your system) - even if you have rock-solid UPSs (which have 
still been known to fail).  Allowing a disk to lie to higher levels of the 
system (if indeed that's what you did by 'turning off cache flush') by saying 
that it's completed a write when it really hasn't is usually a very bad idea, 
because those higher levels really *do* make important assumptions based on 
that information.
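
(For what it's worth, I'm assuming that by 'turning off cache flush' you mean 
the /etc/system tunable below, which tells ZFS to stop issuing cache-flush 
commands to its devices - exactly the kind of lie I'm worried about:)

  * In /etc/system:  stops ZFS from flushing the write caches on its
  * devices.  Only safe if those caches are truly nonvolatile.
  set zfs:zfs_nocacheflush = 1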

- bill
 
 