Re: [zfs-discuss] # devices in raidz.
Hello ozan,

Friday, November 3, 2006, 3:57:00 PM, you wrote:

osy> for s10u2, documentation recommends 3 to 9 devices in raidz. what is the
osy> basis for this recommendation? i assume it is performance and not failure
osy> resilience, but i am just guessing... [i know, recommendation was intended
osy> for people who know their raid cold, so it needed no further explanation]

The performance reason is random reads.

P.S. However, the bigger the raid-z group, the riskier it can be - but this is obvious.

--
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
                                 http://milek.blogspot.com
Re: [zfs-discuss] Default zpool on Thumpers
Richard Elling - PAE wrote:
> Robert Milkowski wrote:
>> I almost completely agree with your points 1-5, except that I think that
>> having at least one hot spare by default would be better than having none
>> at all - especially with SATA drives.
>
> Yes, I pushed for it, but didn't win.

In a perfect world one could simply pull one of the raidz1 groups out of the
pool and allocate hot spares out of it. That way you're one or two commands
away from going from the most-space config to a lots-of-redundancy config.
Not that recreating the pool is a lot of work, but I can see some folks just
using the box for a while and then thinking, "Hey... no hot spares?"
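As a rough sketch of the kind of command being alluded to here (pool and
device names are hypothetical, and hot spares require a ZFS release that
supports the 'spare' vdev type):

  # dedicate two free disks as hot spares for an existing pool
  zpool add tank spare c5t6d0 c5t7d0

  # the spares then show up in a "spares" section of the status output
  zpool status tank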
Re: [zfs-discuss] Re: ZFS Performance Question
Jay Grogan wrote:
> The V120 has 4GB of RAM. On the HDS side we are in a RAID 5 on the LUN and
> not sharing any ports on the McData, but with so much cache we aren't close
> to taxing the disks.

Are you sure? At some point the data has to get flushed from the cache to the
drives themselves. In most of the arrays I looked at - granted, this was a
while ago - the cache could only stay dirty for so long before it was flushed,
no matter how much cache was in use. Also, if your workload looks sequential
to the HDS box, it will just push your data right past the cache to the drives
themselves.
[zfs-discuss] Re: zfs sharenfs inheritance
> An alternate way will be to use NFSv4. When an NFSv4 client crosses a
> mountpoint on the server, it can detect this and mount the filesystem. It
> can feel like a lite version of the automounter in practice, as you just
> have to mount the root and discover the filesystems as needed. The Solaris
> NFSv4 client can't do this yet.

Any news on when the Solaris NFSv4 client will be able to do this?
Re: [zfs-discuss] Re: zfs sharenfs inheritance
Chris Gerhard wrote:
>> An alternate way will be to use NFSv4. When an NFSv4 client crosses a
>> mountpoint on the server, it can detect this and mount the filesystem. It
>> can feel like a lite version of the automounter in practice, as you just
>> have to mount the root and discover the filesystems as needed. The Solaris
>> NFSv4 client can't do this yet.
>
> Any news on when the Solaris NFSv4 client will be able to do this?

We have someone actively working on it, so sooner than later.

eric
Re: [zfs-discuss] # devices in raidz.
ozan s. yigit wrote:
> for s10u2, documentation recommends 3 to 9 devices in raidz. what is the
> basis for this recommendation? i assume it is performance and not failure
> resilience, but i am just guessing... [i know, recommendation was intended
> for people who know their raid cold, so it needed no further explanation]

Both, actually. The small, random read performance will approximate that of a
single disk. The probability of data loss increases as you add disks to a
RAID-5/6/Z/Z2 volume.

For example, suppose you have 12 disks and insist on RAID-Z. Given:
  1. small, random read iops for a single disk is 141 (eg. 2.5" SAS 10k rpm drive)
  2. MTBF = 1.4M hours (0.63% AFR) (so says the disk vendor)
  3. no spares
  4. service time = 24 hours, resync rate 100 GBytes/hr, 50% space utilization
  5. infinite service life

Scenario 1: 12-way RAID-Z
  performance = 141 iops
  MTTDL[1]    = 68,530 years
  space       = 11 * disk size

Scenario 2: 2x 6-way RAID-Z+0
  performance = 282 iops
  MTTDL[1]    = 150,767 years
  space       = 10 * disk size

[1] Using MTTDL = MTBF^2 / (N * (N-1) * MTTR)
 -- richard
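As a rough illustration of the arithmetic behind those figures (not from the
original post; the 146 GByte disk size is an assumption, since the post does
not state one), the two scenarios can be approximately reproduced like this:

  awk 'BEGIN {
    mtbf   = 1.4e6                      # hours, per disk (vendor MTBF)
    mttr   = 24 + (146 * 0.50) / 100    # service time + resync time, hours
    hrs_yr = 8760                       # hours per year

    # single 12-way RAID-Z group: MTTDL = MTBF^2 / (N * (N-1) * MTTR)
    n = 12
    printf "12-way RAID-Z  : ~%d years\n", mtbf^2 / (n * (n-1) * mttr) / hrs_yr

    # two 6-way RAID-Z groups striped: per-group MTTDL divided by 2 groups
    n = 6
    printf "2x 6-way RAID-Z: ~%d years\n", mtbf^2 / (n * (n-1) * mttr) / 2 / hrs_yr
  }'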
[zfs-discuss] Overview (rollup) of recent activity on zfs-discuss
For background on what this is, see:
http://www.opensolaris.org/jive/message.jspa?messageID=24416#24416
http://www.opensolaris.org/jive/message.jspa?messageID=25200#25200

= zfs-discuss 10/16 - 10/31 =

Size of all threads during period:

Thread size   Topic
-----------   -----
 27   Mirrored Raidz
 17   ZFS Performance Question
 17   Current status of a ZFS root
 15   Self-tuning recordsize
 14   ZFS hangs systems during copy
 13   ZFS Inexpensive SATA Whitebox
 13   Snapshots impact on performance
 12   ZFS and dual-pathed A5200
 12   Configuring a 3510 for ZFS
 12   Changing number of disks in a RAID-Z?
 10   What is touching my filesystems?
 10   Cloning a disk w/ ZFS in it
  9   ZFS RAID-10
  9   Recommended Minimum Hardware for ZFS Fileserver?
  8   chmod A= on ZFS != chmod A=... on UFS
  7   zone with lofs zfs - why legacy
  7   thousands of ZFS file systems
  7   copying a large file..
  7   ZFS, home and Linux
  6   zpool snapshot fails on unmounted filesystem
  6   recover zfs data from a crashed system?
  6   ZFS and IBM sdd (vpath)
  5   zpool import takes to long with large numbers of file systems
  5   adding to a raidz pool and its discontents
  5   ZFS Automatic Device Error Notification?
  5   ZFS ACLs and Samba
  5   Panic while scrubbing
  4   zpool question.
  4   zfs: zvols minor #'s changing and causing probs w/ volumes
  4   Very high system loads with ZFS
  4   ENOSPC : No space on file deletion
  3   zpool status - very slow during heavy IOs
  3   zfs sharenfs inheritance
  3   zfs set sharenfs=on
  3   zfs on removable drive
  3   experiences with zpool errors and glm flipouts
  3   determining raidz pool configuration
  3   ZFS import Soft-RAID
  3   Thumper and ZFS
  3   Porting ZFS file system to FreeBSD.
  3   Migrating vdevs to new pools
  3   Best version of Solaris 10 fro ZFS ?
  3   ?: ZFS and POSIX
  2   zpool history integrated
  2   s10u3 query
  2   ZFS thinks my 7-disk pool has imaginary disks
  2   ZFS panics with I/O failure
  2   ZFS Usability issue : improve means of finding ZFS-physdevice(s) mapping
  2   Planing to use ZFS in production.. some queries...
  2   Oracle raw volumes
  2   ?: cyclical kernel/system processing (approx every 4 minutes)
  1   zpool iostat - 0 read operations?
  1   zfs and zones
  1   panic during recv
  1   legato support
  1   Zfs Performance with millions of small files in Sendmail messaging environment]
  1   Oracle 11g Performace
  1   Help message translation quiete strange (nv 50)
  1   Disregard: determining raidz pool configuration

Posting activity by person for period:

# of posts   By
----------   --
 28   rmilkowski at task.gda.pl (robert milkowski)
 27   richard.elling at sun.com (richard elling - pae)
 16   matthew.ahrens at sun.com (matthew ahrens)
 16   fcusack at fcusack.com (frank cusack)
 14   torrey.mcmahon at sun.com (torrey mcmahon)
  9   eric.schrock at sun.com (eric schrock)
  7   white.wristband at gmail.com (jeremy teo)
  7   wes at classiarius.com (wes williams)
  6   mark.shellenbaum at sun.com (mark shellenbaum)
  6   jonathan.edwards at sun.com (jonathan edwards)
  6   eric.kustarz at sun.com (eric kustarz)
  6   daleg at elemental.org (dale ghent)
  5   ndellofano at apple.com (noël dellofano)
  5   jk at tools.de (jürgen keil)
  5   erblichs at earthlink.net (erblichs)
  5   ddunham at taos.com (darren dunham)
  5   darren.moffat at sun.com (darren j moffat)
  4   solaris at deadcafe.de (daniel rock)
  4   rasputnik at gmail.com (dick davies)
  4   oz at somanetworks.com (ozan s. yigit)
  4   milek at task.gda.pl (robert milkowski)
  4   llonergan at greenplum.com (luke lonergan)
  4   exitware at gmail.com (siegfried nikolaivich)
  4   darren.reed at sun.com (darren reed)
  4   chris.gerhard at sun.com (chris gerhard)
  3   wonko at 4amlunch.net (brian hechinger)
  3   vadud3 at gmail.com (asif iqbal)
  3   roch.bourbonnais at sun.com (roch)
  3   roch.bourbonnais at sun.com (roch - pae)
  3   opensolaris at posix.brte.com.br (msl)
  3   ocalagan at verizon.net (edmundo
[zfs-discuss] zfs receive into zone?
If I add a ZFS dataset to a zone, and then want to zfs send from another
computer into a file system that the zone has created in that dataset, can I
zfs send to the zone, or can I send to that zone's global zone, or will either
of those work?

--
Jeff VICTOR              Sun Microsystems            jeff.victor @ sun.com
OS Ambassador            Sr. Technical Specialist
Solaris 10 Zones FAQ: http://www.opensolaris.org/os/community/zones/faq
Re: [zfs-discuss] Re: reproducible zfs panic on Solaris 10 06/06
> Matthew Flanagan wrote:
>> Matt,
>>
>>> Matthew Flanagan wrote:
>>>> mkfile 100m /data
>>>> zpool create tank /data
>>>> ...
>>>> rm /data
>>>> ...
>>>> panic[cpu0]/thread=2a1011d3cc0: ZFS: I/O failure (write on unknown off 0:
>>>> zio 60007432bc0 [L0 unallocated] 4000L/400P DVA[0]=0:b000:400
>>>> DVA[1]=0:120a000:400 fletcher4 lzjb BE contiguous birth=6 fill=0
>>>> cksum=672165b9e7:328e78ae25fd:ed007c9008f5f:34c05b10900b668): error 6
>>>> ...
>>>> is there a fix for this?
>>>
>>> Um, don't do that? This is a known bug that we're working on.
>>
>> What is the bugid for this, and an ETA for the fix?
>
> 6417779 ZFS: I/O failure (write on ...) -- need to reallocate writes
> and
> 6322646 ZFS should gracefully handle all devices failing (when writing)
>
> These bugs are actively being worked on, but it will probably be a while
> before fixes appear.
>
> -Mark

I'm extremely surprised that this kind of bug can make it into a Solaris
release. This is the second zfs-related panic that I've found while testing it
in our labs. The first caused the system to panic when the ZFS volume got
close to 100% full (Sun case id #10914593).

I've just replicated this panic with a USB flash drive as well, by creating
the zpool and then yanking the drive out. This is probably a common situation
for desktop/laptop users, who would not be impressed that their otherwise
robust Solaris system crashed.

regards

matthew
[zfs-discuss] ZFS/NFS issue...
I actually think this is an NFSv4 issue, but I'm going to ask here anyway...

Server: Solaris 10 Update 2 (SPARC), with several ZFS file systems shared via
the legacy method (/etc/dfs/dfstab and share(1M), not via the ZFS property).
Default settings in /etc/default/nfs.

bigbox# share
-               /data/archive   rw,anon=0
bigbox# ls -ld /data/archive
drwxrwxrwx   9 root     other         10 Nov  3 14:15 /data/archived

Client A: Solaris 10 (various patchlevels, both x86 and SPARC)

user1% cd /net/bigbox/data/archived
user1% ls -ld .
drwxrwxrwx   9 nobody   nobody        10 Nov  3 14:49 ./
user1% touch me
user1% mkdir foo
mkdir: Failed to make directory foo; Permission denied

Client B: Solaris 8/9, various Linuxes, both x86/SPARC

user1% cd /net/bigbox/data/archived
user1% ls -ld .
drwxrwxrwx   9 root     other         11 Nov  3 14:49 ./
user1% touch me
user1% mkdir foo

It looks like the Solaris 10 machines aren't mapping the user IDs correctly.
All machines belong to the same NIS domain. I suspect NFSv4, but can't be
sure. Am I doing something wrong here?

--
Erik Trimble
Java System Support
Mailstop: usca14-102
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Re: [zfs-discuss] ZFS/NFS issue...
Erik Trimble wrote:
> I actually think this is an NFSv4 issue, but I'm going to ask here anyway...
>
> [server and client details trimmed - see the original post above]
>
> It looks like the Solaris 10 machines aren't mapping the user IDs correctly.
> All machines belong to the same NIS domain. I suspect NFSv4, but can't be
> sure. Am I doing something wrong here?

Make sure your NFSv4 mapid domain matches (client and server).

http://blogs.sun.com/erickustarz/entry/nfsmapid_domain

You can override the default in /etc/default/nfs, and you can check what your
current one is in /var/run/nfs4_domain.

eric
Re: [zfs-discuss] # devices in raidz.
On Fri, 3 Nov 2006, Richard Elling - PAE wrote:

> ozan s. yigit wrote:
>> for s10u2, documentation recommends 3 to 9 devices in raidz. what is the
>> basis for this recommendation? [...]
>
> Both, actually. The small, random read performance will approximate that of
> a single disk. The probability of data loss increases as you add disks to a
> RAID-5/6/Z/Z2 volume.
>
> [scenario details trimmed - see the earlier message]
>
> [1] Using MTTDL = MTBF^2 / (N * (N-1) * MTTR)

But ... I'm not sure I buy into your numbers, given the probability that more
than one disk will fail inside the service window - given that the disks are
identical. Or ... a disk failure occurs at 5:01 PM (quitting time) on a Friday
and won't be replaced until 8:00 AM on Monday morning. Does the failure data
you have access to support my hypothesis that failures of identical mechanical
systems tend to occur in small clusters within a relatively small window of
time?

Call me paranoid, but I'd prefer to see a product like Thumper configured with
50% of the disks manufactured by vendor A and the other 50% manufactured by
someone else. This paranoia is based on a personal experience, many years ago
(before we had smart fans etc.), where we had a rack full of expensive custom
equipment cooled by (what we thought was) a highly redundant group of 5 fans.
One fan suffered infant mortality and its failure went unnoticed, leaving 4
fans running. Two of the fans died on the same extended weekend (public
holiday). It was an expensive and embarrassing disaster.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
Re: [zfs-discuss] ZFS/NFS issue...
Don't forget to restart mapid after modifying the default domain in
/etc/default/nfs. As root, run:

  svcadm restart svc:/network/nfs/mapid

I've run into this in the past.

Karen

eric kustarz wrote:
> Erik Trimble wrote:
>> [original problem report trimmed]
>
> Make sure your NFSv4 mapid domain matches (client and server).
>
> http://blogs.sun.com/erickustarz/entry/nfsmapid_domain
>
> You can override the default in /etc/default/nfs, and you can check what
> your current one is in /var/run/nfs4_domain.
>
> eric
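In command form, the check-and-fix being described looks roughly like this
(the domain value is a placeholder; adjust it for your site):

  # on both client and server, see which mapid domain is currently in effect
  cat /var/run/nfs4_domain

  # if they differ, set it explicitly in /etc/default/nfs, e.g.
  #   NFSMAPID_DOMAIN=example.com
  grep NFSMAPID_DOMAIN /etc/default/nfs

  # then restart the mapid service so the change takes effect
  svcadm restart svc:/network/nfs/mapid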
[zfs-discuss] Filebench, X4200 and Sun Storagetek 6140
Hi there

I'm busy with some tests on the above hardware and will post some scores soon.

For those that do _not_ have the above available for tests, I'm open to
suggestions on potential configs that I could run for you. Pop me a mail if
you want something specific _or_ you have suggestions concerning the filebench
(varmail) config setup.

Cheers
Re: [zfs-discuss] # devices in raidz.
Al Hopper wrote:
>> [1] Using MTTDL = MTBF^2 / (N * (N-1) * MTTR)
>
> But ... I'm not sure I buy into your numbers given the probability that
> more than one disk will fail inside the service window - given that the
> disks are identical? Or ... a disk failure occurs at 5:01 PM (quitting
> time) on a Friday and won't be replaced until 8:00 AM on Monday morning.
> Does the failure data you have access to support my hypothesis that
> failures of identical mechanical systems tend to occur in small clusters
> within a relatively small window of time?

Separating the right-hand side:

  MTTDL = MTBF/N * MTBF/((N-1) * MTTR)

the right-most factor reflects the probability that one of the remaining N-1
disks fails during the recovery window for the first disk's failure. As the
MTTR increases, the probability of the 2nd disk failure also increases.
RAIDoptimizer calculates the MTTR as:

  MTTR = service response time + resync time

where

  resync time = size * space used (%) / resync rate

Incidentally, since ZFS schedules the resync iops itself, it can really move
along on a mostly idle system. You should be able to resync at near the media
speed on an idle system. By contrast, a hardware RAID array has no knowledge
of the context of the data or of the I/O scheduling, so it will perform
resyncs using a throttle. Not only do such arrays end up resyncing unused
space, but they also take a long time (4-18 GBytes/hr for some arrays) and
thus expose you to a higher probability of a second disk failure.

> Call me paranoid, but I'd prefer to see a product like thumper configured
> with 50% of the disks manufactured by vendor A and the other 50%
> manufactured by someone else.

Diversity is usually a good thing. Unfortunately, this is often impractical
for a manufacturer.

> This paranoia is based on a personal experience, many years ago (before we
> had smart fans etc), where we had a rack full of expensive custom equipment
> cooled by (what we thought was) a highly redundant group of 5 fans. One fan
> suffered infant mortality and its failure went unnoticed, leaving 4 fans
> running. Two of the fans died on the same extended weekend (public
> holiday). It was an expensive and embarrassing disaster.

Modelling such as this assumes independence of failures. Common cause or bad
lots are not that hard to model, but you may never find any failure rate data
for them. You can look at the MTBF sensitivities, though that is an opening
to another set of results. I prefer to ignore the absolute values and judge
competing designs by their relative results. To wit, I fully expect to be
beyond dust in 150,767 years, and the expected lifetime of most disks is 5
years. But given two competing designs using the same model, a design
predicting an MTTDL of 150,767 years will very likely demonstrate a better
MTTDL than a design predicting 68,530 years.
 -- richard
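To put the resync-time formula in concrete terms (an illustrative calculation,
not from the original post; the 500 GByte disk is hypothetical, 50% space
utilization and the resync rates are the ones quoted in this thread):

  awk 'BEGIN {
    # ZFS resilvers only allocated space, at (roughly) media speed
    printf "ZFS resilver  @ 100 GB/hr : %5.1f hours\n", (500 * 0.50) / 100
    # a throttled array resyncs the whole disk, used or not
    printf "HW array sync @   4 GB/hr : %5.1f hours\n", 500 / 4
    printf "HW array sync @  18 GB/hr : %5.1f hours\n", 500 / 18
  }'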
Re: [zfs-discuss] Filebench, X4200 and Sun Storagetek 6140
Hi Louwtjie,

Are you running FC or SATA-II disks in the 6140? How many spindles, too?

Best Regards,
Jason

On 11/3/06, Louwtjie Burger <[EMAIL PROTECTED]> wrote:
> Hi there
>
> I'm busy with some tests on the above hardware and will post some scores
> soon. For those that do _not_ have the above available for tests, I'm open
> to suggestions on potential configs that I could run for you. Pop me a mail
> if you want something specific _or_ you have suggestions concerning
> filebench (varmail) config setup.
>
> Cheers
Re: [zfs-discuss] zfs receive into zone?
Jeff Victor wrote:
> If I add a ZFS dataset to a zone, and then want to zfs send from another
> computer into a file system that the zone has created in that dataset, can
> I zfs send to the zone, or can I send to that zone's global zone, or will
> either of those work?

I believe that the 'zfs send' can be done from either the global or the local
zone just fine. You can certainly do it from the local zone.

FYI, if you are doing a 'zfs recv' into a filesystem that's been delegated to
a zone, you should do the 'zfs recv' inside the zone. (I think it's possible
to do the 'zfs recv' in the global zone, but I think you'll first have to make
sure that it isn't mounted in the local zone. This is because the global zone
doesn't know how to go into the local zone and unmount it.)

--matt
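A rough sketch of what that workflow can look like (pool, dataset, snapshot,
and host names are invented; it assumes the receiving dataset is delegated to
the zone and the zone itself is reachable over ssh, so the receive runs inside
the zone as suggested above):

  # on the sending computer
  zfs snapshot sourcepool/data@monday
  zfs send sourcepool/data@monday | ssh root@myzone zfs recv tank/zonefs/data/copy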