[zfs-discuss] ZFS Dataset lost structure
After a crash, some datasets in my zpool tree report this when I do an ls -la:

brwxrwxrwx 2 777 root 0, 0 Oct 18  2009 mail-cts

The same happens if I set zfs set mountpoint=legacy on the dataset and then mount it at another location. Before the crash, the directory tree was only:

dataset
 - vdisk.raw

The file was the backing device of a Xen VM, but I cannot access the directory structure of this dataset. I can send a snapshot of this dataset to another system, but the same behaviour occurs.

If I do zdb - dataset, at the end of the output I can see the references to my file:

    Object  lvl  iblk  dblk  dsize  lsize  %full  type
         7    5   16K  128K   149G   256G  58.26  ZFS plain file
                                     264   bonus  ZFS znode
	dnode flags: USED_BYTES USERUSED_ACCOUNTED
	dnode maxblkid: 2097152
	path	/vdisk.raw
	uid	777
	gid	60001
	atime	Sun Oct 18 00:49:05 2009
	mtime	Thu Sep  9 16:22:14 2010
	ctime	Thu Sep  9 16:22:14 2010
	crtime	Sun Oct 18 00:49:05 2009
	gen	53
	mode	100777
	size	274877906945
	parent	3
	links	1
	pflags	4080104
	xattr	0
	rdev	0x

If I investigate further with zdb -d dataset 7:

Dataset store/nfs/ICLOS/prod/mail-cts [ZPL], ID 4525, cr_txg 91826, 149G, 5 objects, rootbp DVA[0]=0:6654f24000:200 DVA[1]=1:1a1e3c3600:200 [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=182119L/182119P fill=5 cksum=177e7dd4cd:81ae6d143ee:1782c972431a0:2f927ca7a1de2c

    Object  lvl  iblk  dblk  dsize  lsize  %full  type
         7    5   16K  128K   149G   256G  58.26  ZFS plain file
                                     264   bonus  ZFS znode
	dnode flags: USED_BYTES USERUSED_ACCOUNTED
	dnode maxblkid: 2097152
	path	/vdisk.raw
	uid	777
	gid	60001
	atime	Sun Oct 18 00:49:05 2009
	mtime	Thu Sep  9 16:22:14 2010
	ctime	Thu Sep  9 16:22:14 2010
	crtime	Sun Oct 18 00:49:05 2009
	gen	53
	mode	100777
	size	274877906945
	parent	3
	links	1
	pflags	4080104
	xattr	0
	rdev	0x
Indirect blocks:
       0  L4  1:6543e22800:400   4000L/400P   F=1221767  B=177453/177453
       0  L3  1:65022f8a00:2000  4000L/2000P  F=1221766  B=177453/177453
       0  L2  1:65325a0400:1c00  4000L/1c00P  F=16229    B=177453/177453
       0  L1  1:6530718400:1600  4000L/1600P  F=128      B=177453/177453
       0  L0  0:433c473a00:2     2L/2P  F=1  B=177453/177453
       2  L0  1:205c471600:2     2L/2P  F=1  B=91830/91830
       4  L0  0:3c418ac600:2     2L/2P  F=1  B=91830/91830
       6  L0  0:3c418cc600:2     2L/2P  F=1  B=91830/91830
       8  L0  0:3c418ec600:2     2L/2P  F=1  B=91830/91830
       a  L0  0:3c4190c600:2     2L/2P  F=1  B=91830/91830
       c  L0  0:3c4192c600:2     2L/2P  F=1  B=91830/91830
       e  L0  0:3c4194c600:2     2L/2P  F=1  B=91830/91830
      10  L0  0:3c4198c600:2     2L/2P  F=1  B=91830/91830
      12  L0  0:3c4196c600:2     2L/2P  F=1  B=91830/91830
      14  L0  1:205c491600:2     2L/2P  F=1  B=91830/91830
      16  L0  1:205c4b1600:2     2L/2P  F=1  B=91830/91830
      18  L0  1:205c4d1600:2     2L/2P  F=1  B=91830/91830
      1a  L0  1:205c4f1600:2     2L/2P  F=1  B=91830/91830
      1c  L0  1:205c511600:2     2L/2P  F=1  B=91830/91830
      1e  L0  1:205c531600:2     2L/2P  F=1  B=91830/91830
      20  L0  1:205c551600:2     2L/2P  F=1  B=91830/91830
      22  L0  1:205c571600:2     2L/2P  F=1  B=91830/91830
      24  L0  0:3c419ac600:2     2L/2P  F=1  B=91830/91830
      26  L0  0:3c419cc600:2     2L/2P  F=1  B=91830/91830
      28  L0  0:3c419ec600:2     2L/2P  F=1  B=91830/91830
      2a  L0  0:3c41a0c600:2     2L/2P  F=1  B=91830/91830
      .. many more lines till 149G

It seems all the data blocks are still there. Any ideas on how to recover from this situation?

Valerio Piancastelli
piancaste...@iclos.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
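Since zdb still shows the file object intact, one avenue is to inspect and copy the data out before attempting anything destructive. This is only a sketch: the dataset name and object number are taken from the zdb output above, the mountpoint is hypothetical, and zdb's raw-read syntax varies between builds.

```shell
# Dump full metadata for the file object (object 7 per the output above).
zdb -ddddd store/nfs/ICLOS/prod/mail-cts 7

# If the dataset can still be mounted, make it read-only first so
# nothing else gets written while copying data off:
zfs set readonly=on store/nfs/ICLOS/prod/mail-cts
mount -F zfs store/nfs/ICLOS/prod/mail-cts /mnt/recover

# As a last resort, zdb -R can read raw blocks by DVA, e.g. the first
# L0 block listed above (pool "store", vdev 0):
zdb -R store 0:433c473a00:200
```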
Re: [zfs-discuss] resilver = defrag?
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of David Dyer-Bennet

>> For example, if you start with an empty drive, and you write a large
>> amount of data to it, you will have no fragmentation. (At least, no
>> significant fragmentation; you may get a little bit based on random
>> factors.) As life goes on, as long as you keep plenty of empty space
>> on the drive, there's never any reason for anything to become
>> significantly fragmented.
>
> Sure, if only a single thread is ever writing to the disk store at a
> time.

This has already been discussed in this thread. The threading model doesn't affect whether files end up fragmented or unfragmented on disk. The OS is smart enough to know that the blocks written by process A are all sequential, and the blocks written by process B are also sequential, but separate.
Re: [zfs-discuss] resilver = defrag?
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Marty Scholes

> What appears to be missing from this discussion is any shred of
> scientific evidence that fragmentation is good or bad, and by how
> much. We also lack any detail on how much fragmentation actually
> takes place.

Agreed. I've been rather lazily asserting a few things here and there that I expected to be challenged, and I've been thinking up tests to verify or dispute my claims, but then nobody challenged them. Specifically: the blocks on disk are not interleaved just because multiple threads were writing at the same time. So there's at least one thing which is testable, if anyone cares. But there's also no way that I know of to measure fragmentation in a real system that's been in production for a year.
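For what it's worth, the interleaving claim is testable along these lines. A sketch only: it assumes a scratch filesystem named tank/test, and the file names are hypothetical.

```shell
# Write two large files from two concurrent writers.
dd if=/dev/urandom of=/tank/test/a.dat bs=128k count=8192 &
dd if=/dev/urandom of=/tank/test/b.dat bs=128k count=8192 &
wait
sync

# Dump each file's indirect block tree. If the L0 DVA offsets within
# each file are largely contiguous, the two concurrent write streams
# were not interleaved on disk.
zdb -ddddd tank/test
```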
Re: [zfs-discuss] Best practice for Sol10U9 ZIL -- mirrored or not?
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Bryan Horstmann-Allen

> The ability to remove the slogs isn't really the win here, it's
> import -F. The

Disagree. Although I agree that -F is important and good, I think log device removal is the main win.

Prior to log device removal, if you lost your slog, you lost your whole pool, and your system probably halted (or did something equally bad, if not strictly halting). Therefore you wanted your slog to be as redundant as the rest of your pool. With log device removal, if you lose a slog while the system is up, the worst case is performance degradation.

With log device removal, there's only one scenario left to worry about: your slog goes bad, undetected. The system keeps writing to it, unaware that it will never be able to read it back, so when you get a system crash and your system tries to read that device for the first time, you lose information. Not your whole pool: you lose up to 30 seconds of writes that the system thought it wrote but never did, and you require -F to import.

Historically, people have always recommended mirroring the log device, even with log device removal, to protect against the above situation. But in a recent conversation including Neil, it appeared there might be a bug which causes the log device mirror to be ignored during import, rendering the mirror useless in exactly that situation. Neil, or anyone: is there any confirmation or development on that bug?

Given all of this, I would say the recommendation for now is to forget about mirroring log devices. In the past, the recommendation was "yes, mirror." Right now it's "no, don't mirror," and after the bug is fixed the recommendation will again become "yes, mirror."
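For reference, the mechanics under discussion look roughly like this. A sketch: the device names are hypothetical, and log device removal requires a pool version that supports it (19 or later).

```shell
# Attach a mirrored pair of log devices to an existing pool.
zpool add tank log mirror c4t0d0 c4t1d0

# With log device removal support, a slog can later be taken out of
# the pool while it is running:
zpool remove tank c4t0d0

# After a crash with a dead, undetected-bad slog, import the pool
# while discarding the lost log transactions:
zpool import -F tank
```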
[zfs-discuss] resilver that never finishes
Morning,

c7t5000CCA221F4EC54d0 is a 2T disk; how can it resilver 5.63T of it?

This is actually an old capture of the status output; it got to nearly 10T before deciding that there was an error and not completing. I reseated the disk and it's doing it all again. It's happened on another pool as well. The box is currently looking at a load average of around 40, just sitting there churning disk I/O. OS is snv_134 on x86.

# zpool status -x
  pool: content4
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist
        for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver in progress for 147h39m, 100.00% done, 0h0m to go
config:

        NAME                         STATE     READ WRITE CKSUM
        content4                     DEGRADED     0     0     0
          raidz2-0                   DEGRADED     0     0     0
            c7t5000CCA221DE1E1Dd0    ONLINE       0     0     0
            c7t5000CCA221DE17BFd0    ONLINE       0     0     0
            c7t5000CCA221DE2229d0    ONLINE       0     0     0
            replacing-3              DEGRADED     0     0     0
              c7t5000CCA221DE0CC7d0  UNAVAIL      0     0     0  cannot open
              c7t5000CCA221F4EC54d0  ONLINE       0     0     0  5.63T resilvered
            c7t5000CCA221DE200Ad0    ONLINE       0     0     0
            c7t5000CCA221DDFE6Ed0    ONLINE       0     0     0
            c7t5000CCA221DE0103d0    ONLINE       0     0     0
            c7t5000CCA221DE00C9d0    ONLINE       0     0     0
            c7t5000CCA221DE0D2Dd0    ONLINE       0     0     0
            c7t5000CCA221DE189Cd0    ONLINE       0     0     0
            c7t5000CCA221DE18A7d0    ONLINE       0     0     0
            c7t5000CCA221DE2A47d0    ONLINE       0     0     0
            c7t5000CCA221DE1E48d0    ONLINE       0     0     0
            c7t5000CCA221DE18A1d0    ONLINE       0     0     0
            c7t5000CCA221DE18A2d0    ONLINE       0     0     0
            c7t5000CCA221DE2A3Ed0    ONLINE       0     0     0
            c7t5000CCA221DE2A42d0    ONLINE       0     0     0
            c7t5000CCA221DE2225d0    UNAVAIL      0     0     0  cannot open
            c7t5000CCA221DE28A3d0    ONLINE       0     0     0
            c7t5000CCA221DE2A46d0    ONLINE       0     0     0
            c7t5000CCA221DE0789d0    ONLINE       0     0     0
            c7t5000CCA221DE221Dd0    ONLINE       0     0     0
            c7t5000CCA221DE054Fd0    ONLINE       0     0     0
            c7t5000CCA221DE2EBEd0    ONLINE       0     0     0

errors: No known data errors

--
Tom  //  www.portfast.co.uk  //  hosted services, domains, virtual machines, consultancy
Re: [zfs-discuss] ZFS Dataset lost structure
What OpenSolaris build are you running?

victor

On 17.09.10 13:53, Valerio Piancastelli wrote:
> After a crash, some datasets in my zpool tree report this when I do an ls -la:
> [...]
> It seems all the data blocks are still there. Any ideas on how to
> recover from this situation?

--
Victor Latushkin        phone: x11467 / +74959370467
TSC-Kernel EMEA         mobile: +78957693012
Sun Services, Moscow    blog: http://blogs.sun.com/vlatushkin
Sun Microsystems
Re: [zfs-discuss] ZFS Dataset lost structure
With uname -a:

SunOS disk-01 5.11 snv_111b i86pc i386 i86pc Solaris

It is OpenSolaris 2009.06. Other useful info, zfs list sas/mail-cts:

NAME          USED  AVAIL  REFER  MOUNTPOINT
sas/mail-cts  149G   250G   149G  /sas/mail-cts

and with df:

Filesystem     1K-blocks       Used  Available  Use%  Mounted on
sas/mail-cts   418174037  156501827  261672210   38%  /sas/mail-cts

Do you need any other info?

Valerio Piancastelli
piancaste...@iclos.com

- Original message -
From: Victor Latushkin victor.latush...@sun.com
To: Valerio Piancastelli piancaste...@iclos.com
Cc: zfs-discuss@opensolaris.org
Sent: Friday, 17 September 2010 16:46:31
Subject: Re: [zfs-discuss] ZFS Dataset lost structure

> What OpenSolaris build are you running?
>
> victor
>
> On 17.09.10 13:53, Valerio Piancastelli wrote:
>> After a crash, some datasets in my zpool tree report this when I do
>> an ls -la:
>> [...]
Re: [zfs-discuss] resilver that never finishes
On Fri, 17 Sep 2010, Tom Bird wrote:

> Morning,
>
> c7t5000CCA221F4EC54d0 is a 2T disk; how can it resilver 5.63T of it?
>
> This is actually an old capture of the status output; it got to nearly
> 10T before deciding that there was an error and not completing. I
> reseated the disk and it's doing it all again.

You have twice as many big slow drives in a raidz2 as any sane person would recommend. It looks like you either have drives which are too weak to sustain resilvering a failed disk, or a chassis which is not strong enough.

Your only option seems to be to also replace c7t5000CCA221DE2225d0 and hope for the best. Expect the replacement to take a very long time. It would be wise to restart the pool from scratch with multiple vdevs comprised of fewer devices.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
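The replacement suggested above would look something like this; a sketch only, with <new-disk> standing in for whatever spare device is available.

```shell
# Replace the second failed disk with a spare, then watch the resilver.
zpool replace content4 c7t5000CCA221DE2225d0 <new-disk>
zpool status content4
```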
[zfs-discuss] can ufs zones and zfs zones coexist on a single global zone
We are looking at migrating zones built on an M8000 and an M5000 to a new M9000. On the M9000 we started building new deployments using ZFS; the environments on the M8/M5 are UFS. These are whole-root zones, and they will use global zone resources. Can this be done, or would a migration to ZFS be needed?

thank you,
--
This message posted from opensolaris.org
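In case it helps, the usual whole-root zone migration path is detach/attach. A sketch, with a hypothetical zone name, and assuming the zonepath is archived and copied to the M9000 out of band:

```shell
# On the M8000/M5000: stop and detach the zone.
zoneadm -z myzone halt
zoneadm -z myzone detach

# On the M9000: recreate the configuration from the detached zonepath
# (which may live on a ZFS dataset even though the zone was built on
# UFS), then attach with update-on-attach to sync packages/patches.
zonecfg -z myzone create -a /zones/myzone
zoneadm -z myzone attach -u
zoneadm -z myzone boot
```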
Re: [zfs-discuss] resilver = defrag?
On Sep 16, 2010, at 12:33 PM, Marty Scholes wrote:

> David Dyer-Bennet wrote:
>> Sure, if only a single thread is ever writing to the disk store at a
>> time. This situation doesn't exist with any kind of enterprise disk
>> appliance, though; there are always multiple users doing stuff.
>
> Ok, I'll bite. Your assertion seems to be that any kind of enterprise
> disk appliance will always have enough simultaneous I/O requests
> queued that any sequential read from any application will be
> sufficiently broken up by requests from other applications,
> effectively rendering all read requests random. If I follow your
> logic, since all requests are essentially random anyway, then where
> they fall on the disk is irrelevant.

Allan and Neel did a study of this for MySQL.
http://www.youtube.com/watch?v=a31NhwzlAxs
-- richard

--
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
Richard Elling
rich...@nexenta.com  +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com
Re: [zfs-discuss] resilver that never finishes
On 09/18/10 04:28 AM, Tom Bird wrote:
> Bob Friesenhahn wrote:
>> On Fri, 17 Sep 2010, Tom Bird wrote:
>>> Morning,
>>>
>>> c7t5000CCA221F4EC54d0 is a 2T disk; how can it resilver 5.63T of it?
>>> This is actually an old capture of the status output; it got to
>>> nearly 10T before deciding that there was an error and not
>>> completing. I reseated the disk and it's doing it all again.
>>
>> You have twice as many big slow drives in a raidz2 as any sane person
>> would recommend. It looks like you either have drives which are too
>> weak to sustain resilvering a failed disk, or a chassis which is not
>> strong enough.
>
> The drives and the chassis are fine. What I am questioning is how it
> can be resilvering more data to a device than the capacity of the
> device.

Is the pool in use? If so, data will be changing while the resilver is running. With such a ridiculously wide vdev and large drives, the resilver will take a very, very long time to complete; if the pool is sufficiently busy, it may never complete.

>> Your only option seems to be to also replace c7t5000CCA221DE2225d0
>> and hope for the best. Expect the replacement to take a very long
>> time. It would be wise to restart the pool from scratch with multiple
>> vdevs comprised of fewer devices.
>
> This stuff should just work; if it only rewrote the 2T that was meant
> to be on the drive, the rebuild would take a day or so.

Bob's comments about the pool design are correct: you have a disaster waiting to happen.

--
Ian.
Re: [zfs-discuss] resilver that never finishes
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Tom Bird

We recently had a long discussion on this list about resilver times versus raid types. In the end, the conclusion was: the resilver code is very inefficient for raidzN. Someday it may be better optimized, but until that day comes, you really need to break your giant raidzN into smaller vdevs. Three vdevs of 7-disk raidz are preferable to a 21-disk raidz3.

If you want this resilver to complete, you should do anything you can to (a) stop taking snapshots, (b) not scrub, and (c) stop all I/O possible. And be patient.

Most people in your situation find it faster to zfs send to some other storage, and then destroy and recreate the pool. I know it stinks, but that's what you're facing.
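The evacuate-and-rebuild route in the last paragraph would look roughly like this. A sketch: the snapshot name, target host, and target pool are hypothetical, and the final layout is just one example of smaller vdevs.

```shell
# Replicate everything to other storage.
zfs snapshot -r content4@evacuate
zfs send -R content4@evacuate | ssh otherhost zfs receive -Fdu backup/content4

# Then rebuild as several smaller vdevs, e.g. 3 x 7-disk raidz,
# and send the data back afterwards.
zpool destroy content4
zpool create content4 \
    raidz disk1  disk2  disk3  disk4  disk5  disk6  disk7  \
    raidz disk8  disk9  disk10 disk11 disk12 disk13 disk14 \
    raidz disk15 disk16 disk17 disk18 disk19 disk20 disk21
```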
Re: [zfs-discuss] Best practice for Sol10U9 ZIL -- mirrored or not?
On Sep 17, 2010, at 20:32, Edward Ned Harvey wrote:

> When did that become the default? Should I *ever* say 30 sec anymore?

June 8, 2010, revision 12586:b118bbd65be9:
http://src.opensolaris.org/source/history/onnv/onnv-gate/usr/src/uts/common/fs/zfs/txg.c
Re: [zfs-discuss] Best practice for Sol10U9 ZIL -- mirrored or not?
On 09/17/10 18:32, Edward Ned Harvey wrote:
>> From: Neil Perrin [mailto:neil.per...@oracle.com]
>>
>>> you lose information. Not your whole pool. You lose up to 30 sec of
>>> writes
>>
>> The default is now 5 seconds (zfs_txg_timeout).
>
> When did that become the default?

It was changed more recently than I remembered, in snv_143, as part of a set of bug fixes: 6494473, 6743992, 6936821, 6956464. They were integrated on 6/8/10.

> Should I *ever* say 30 sec anymore?

Well, for versions before snv_143, 30 seconds is correct. I was just giving a heads-up that it has changed.

> In my world, the oldest machine is 10u6. (Except one machine named
> dinosaur that is sol8.)

I believe George responded on that thread that we do handle log mirrors correctly. That is, if one side fails to checksum a block, we do indeed check the other side.

I should have been more cautious with my concern. I think I said I didn't know whether we handle it correctly, and George confirmed we do. Sorry for the false alarm.

Great. ;-) Thank you.

So the recommendation is still to mirror log devices, because the recommendation will naturally be ultra-conservative. ;-) The risk is far smaller now than it was before, so make up your own mind. If you are willing to risk 5 or 30 seconds of data in the combined situation of (a) an undetected failed log device *and* (b) an ungraceful system crash, then you are willing to run with unmirrored log devices. In no situation does the filesystem become inconsistent or corrupt. In the worst case, you have a filesystem which is consistent with a valid filesystem state from a few seconds before the system crash. (Assuming you have a zpool recent enough to support log device removal.)
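For anyone wanting to check or tune this on a live system, a sketch (Solaris/OpenSolaris; the value written is the hypothetical part):

```shell
# Read the current txg timeout (seconds) from the running kernel:
echo zfs_txg_timeout/D | mdb -k

# Change it on the fly, here to 5 seconds:
echo zfs_txg_timeout/W0t5 | mdb -kw

# Or persistently, add this line to /etc/system and reboot:
#   set zfs:zfs_txg_timeout = 5
```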
Re: [zfs-discuss] Best practice for Sol10U9 ZIL -- mirrored or not?
On 09/18/10 04:46 PM, Neil Perrin wrote:
> On 09/17/10 18:32, Edward Ned Harvey wrote:
>>> From: Neil Perrin [mailto:neil.per...@oracle.com]
>>>
>>>> you lose information. Not your whole pool. You lose up to 30 sec of
>>>> writes
>>>
>>> The default is now 5 seconds (zfs_txg_timeout).
>>
>> When did that become the default?
>
> It was changed more recently than I remembered, in snv_143, as part of
> a set of bug fixes: 6494473, 6743992, 6936821, 6956464. They were
> integrated on 6/8/10.
>
>> Should I *ever* say 30 sec anymore?
>
> Well, for versions before snv_143, 30 seconds is correct. I was just
> giving a heads-up that it has changed.

In the context of this thread, was the change integrated in update 9?

--
Ian.