> We are running Solaris 10u4 is the log option in
> there?

Someone more familiar with the specifics of the ZFS releases will have to answer that.
> If this ZIL disk also goes dead, what is the failure
> mode and recovery option then?

The ZIL should at a minimum be mirrored. But since that won't give you as much redundancy as your main pool has, perhaps you should create a small 5-disk RAID-0 LUN sharing the disks of each RAID-5 LUN and mirror the log to all four of them. That way, even if one entire array box is lost, the other will still have a mirrored ZIL, and all the RAID-5 LUNs will be the same size. (Not that I'd expect a small variation in size between the two pairs of LUNs to be a problem that ZFS couldn't handle: can't it handle multiple disk sizes in a mirrored pool as long as each individual *pair* of disks matches?)

Having 4 copies of the ZIL on disks shared with the RAID-5 activity will compromise the log's performance, since each log write won't complete until the slowest copy finishes (i.e., congestion in either of the RAID-5 pairs could delay it). It should still usually be faster than just throwing the log in with the rest of the RAID-5 data, though.

Then again, I see from your later comment that you have the same questions that I had about whether the results reported in http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on suggest that a separate ZIL device may not help much anyway, at least for your specific workload. (I can imagine circumstances in which the performance of small, synchronous writes matters more than other aspects of performance, in which case separating them out could be useful.)

> We did get the 2540 fully populated with 15K 146-gig
> drives. With 12 disks, and wanting to have at least
> ONE hot global spare in each array, and needing to
> keep LUNs the same size, you end up doing 2 5-disk
> RAID-5 LUNs and 2 hot spares in each array. Not that
> I really need 2 spares I just didn't see any way to
> make good use of an extra disk in each array. If we
> wanted to dedicate them instead to this ZIL need,
> what is best way to go about that?
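The disk budget in the quoted layout, along with the earlier point that a mirrored log write waits for its slowest copy, can be checked with a few lines of Python. This is only a sketch: the drive count and sizes come from the thread, but mirroring each RAID-5 LUN against a LUN in the other array, and the latency figures in the example, are assumptions for illustration, and all the helper names are hypothetical.

```python
# Disk budget and log-latency arithmetic for the layout quoted above.
# Drive count/size and LUN layout come from the thread; mirroring each
# LUN against one in the other array and the latency numbers below are
# illustrative assumptions. All names here are hypothetical.

DRIVE_GB = 146          # nominal size of each 15K drive
DISKS_PER_ARRAY = 12
RAID5_WIDTH = 5         # disks per RAID-5 LUN
LUNS_PER_ARRAY = 2
ARRAYS = 2


def raid5_usable_gb(width: int, drive_gb: int) -> int:
    """RAID-5 gives up one disk's worth of space to parity."""
    return (width - 1) * drive_gb


def spares_per_array() -> int:
    """Disks left over once the RAID-5 LUNs are carved out."""
    return DISKS_PER_ARRAY - LUNS_PER_ARRAY * RAID5_WIDTH


def net_pool_gb(arrays: int = ARRAYS) -> int:
    """Assumes each LUN is mirrored against a same-size LUN in the other
    array, so only half of the LUNs' capacity is net pool space."""
    mirror_pairs = (arrays * LUNS_PER_ARRAY) // 2
    return mirror_pairs * raid5_usable_gb(RAID5_WIDTH, DRIVE_GB)


def surviving_log_copies(lost_arrays: int, arrays: int = ARRAYS) -> int:
    """With one log copy per RAID-5 LUN (the 4-way mirror suggested
    above), losing a whole array still leaves the other array's copies."""
    return (arrays - lost_arrays) * LUNS_PER_ARRAY


def mirrored_write_latency_ms(copy_latencies_ms: list[float]) -> float:
    """A mirrored log write is acknowledged only once every copy is
    written, so its latency is that of the slowest copy."""
    return max(copy_latencies_ms)


if __name__ == "__main__":
    print(spares_per_array())          # 2
    print(net_pool_gb())               # 1168 (GB), i.e. "over 1 TB"
    print(surviving_log_copies(1))     # 2
    # Hypothetical per-copy latencies: one copy stuck behind RAID-5 traffic.
    print(mirrored_write_latency_ms([0.4, 0.5, 0.4, 3.2]))  # 3.2
```

The last line illustrates the congestion point: one busy copy sets the pace for the whole mirrored log write, however quickly the other three finish.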
As I noted above, you might not want less redundancy in the ZIL than you have in the main pool. While the data in the ZIL is only temporary (until it gets written back to the main pool), there's a good chance that there will *always* be *some* data in it, so if you lost one array box entirely, at least that small amount of data would be at the mercy of any failure on the log disk that made any portion of the log unreadable.

Now, if you could dedicate all four spare disks to the log (mirroring it 4 ways) and make each box understand that it was OK to steal one of them to use as a hot spare should the need arise, that might give you reasonable protection: any increased exposure would then exist only until the failed disk was manually replaced, and normally the other box would still hold two copies as well. But I have no idea whether the box provides anything like that level of configurability.

...

> Hundreds of POP and IMAP user processes coming and
> going from users reading their mail. Hundreds more
> LMTP processes from mail being delivered to the Cyrus
> mail-store.

And with 10K or more users, that means a *lot* of parallelism in the workload, which is what I assumed given that you had over 1 TB of net email storage space (but I probably should have made that assumption more explicit, just in case it was incorrect).

> Sometimes writes predominate over reads,
> depends on time of day whether backups are running,
> etc. The servers are T2000 with 16 gigs RAM so no
> shortage of room for ARC cache. I have turned off
> cache flush also pursuing performance.

From Neil's comment in the blog entry that you referenced, that sounds *very* dicey (at least by comparison with the level of redundancy that you've built into the rest of your system), even if you have rock-solid UPSs (which have still been known to fail).
Allowing a disk to lie to higher levels of the system (if indeed that's what you did by 'turning off cache flush') by saying that it's completed a write when it really hasn't is usually a very bad idea, because those higher levels really *do* make important assumptions based on that information.

- bill

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss