[zfs-discuss] Lost zpool after reboot
Hello, I have a dual boot with Windows 7 64-bit Enterprise Edition and OpenSolaris build 134, on a Sun Ultra 40 M1 workstation. Three hard drives: two in a ZFS mirror, one shared with Windows. For the last two days I was working in Windows. I didn't touch the hard drives in any way, except that I once opened Disk Management to figure out why an external USB hard drive was not being listed. That's it. That is the only disk-related thing I can recall doing in the last several days. Today I booted into OpenSolaris and my mirrored pool is gone. I ran 'zpool status' and it gave me a ZFS-8000-3C error, saying my pool is unavailable. Since I am still able to boot and get to a browser, I tried 'zpool import' without arguments, tried exporting my pool, and did more fiddling. Now I can't get 'zpool status' to show my pool at all. Help me. How do I recover my old pool? I know it's there somewhere. Thanks in advance.

This is the result of fmdump -eV:

Jul 16 2010 15:17:43.657125275 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x14c954e68900801
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x4406b127a905c5be
                vdev = 0xe7dce33be87eeca7
        (end detector)
        pool = rsgis
        pool_guid = 0x4406b127a905c5be
        pool_context = 1
        pool_failmode = wait
        vdev_guid = 0xe7dce33be87eeca7
        vdev_type = disk
        vdev_path = /dev/dsk/c9t0d0s0
        vdev_devid = id1,s...@ahitachi_hds7225scsun250g_0719bn9e3k=vfa100r1dn9e3k/a
        parent_guid = 0xb89f3c5a72a22939
        parent_type = mirror
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x4c40be67 0x272aef9b

Jul 16 2010 15:17:43.657125080 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x14c954e68900801
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x4406b127a905c5be
                vdev = 0x6f08aad645681b14
        (end detector)
        pool = rsgis
        pool_guid = 0x4406b127a905c5be
        pool_context = 1
        pool_failmode = wait
        vdev_guid = 0x6f08aad645681b14
        vdev_type = disk
        vdev_path = /dev/dsk/c8t0d0s0
        vdev_devid = id1,s...@ahitachi_hds7225sbsun250g_0615ne18bj=vds41dt4ee18bj/a
        parent_guid = 0xb89f3c5a72a22939
        parent_type = mirror
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x4c40be67 0x272aeed8

Jul 16 2010 15:17:43.657125769 ereport.fs.zfs.vdev.no_replicas
nvlist version: 0
        class = ereport.fs.zfs.vdev.no_replicas
        ena = 0x14c954e68900801
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x4406b127a905c5be
                vdev = 0xb89f3c5a72a22939
        (end detector)
        pool = rsgis
        pool_guid = 0x4406b127a905c5be
        pool_context = 1
        pool_failmode = wait
        vdev_guid = 0xb89f3c5a72a22939
        vdev_type = mirror
        parent_guid = 0x4406b127a905c5be
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x4c40be67 0x272af189

Jul 16 2010 15:17:43.657125226 ereport.fs.zfs.zpool
nvlist version: 0
        class = ereport.fs.zfs.zpool
        ena = 0x14c954e68900801
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x4406b127a905c5be
        (end detector)
        pool = rsgis
        pool_guid = 0x4406b127a905c5be
        pool_context = 1
        pool_failmode = wait
        __ttl = 0x1
        __tod = 0x4c40be67 0x272aef6a

Jul 16 2010 15:25:55.572108990 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x1588f5aa2b00801
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x4406b127a905c5be
                vdev = 0x6f08aad645681b14
        (end detector)
        pool = rsgis
        pool_guid = 0x4406b127a905c5be
        pool_context = 1
        pool_failmode = wait
        vdev_guid = 0x6f08aad645681b14
        vdev_type = disk
        vdev_path = /dev/dsk/c8t0d0s0
        vdev_devid = id1,s...@ahitachi_hds7225sbsun250g_0615ne18bj=vds41dt4ee18bj/a
        parent_guid = 0xb89f3c5a72a22939
        parent_type = mirror
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x4c40c053 0x2219b0be

Jul 16 2010 15:25:55.572108617 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x1588f5aa2b00801
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x4406b127a905c5be
                vdev = 0xe7dce33be87eeca7
        (end detector)
        pool = rsgis
        pool_guid = 0x4406b127a905c5be
        pool_context = 1
        pool_failmode = wait
        vdev_guid = 0xe7dce33be87eeca7
        vdev_type = disk
        vdev_path = /dev/dsk/c9t0d0s0
        vdev_devid = id1,s...@ahitachi_hds7225scsun250g_0719bn9e3k=vfa100r1dn9e3k/a
        parent_guid = 0xb89f3c5a72a22939
        parent_type = mirror
        prev_state = 0x1
        __ttl = 0x1
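For reference, the standard way to hunt for a pool in this state is 'zpool import' and its variants. A sketch only (the pool name comes from the fmdump output above; the numeric ID would be whatever 'zpool import' actually prints):

    # list any pools ZFS can find on attached devices, with their numeric IDs
    zpool import

    # point the search at an explicit device directory if nothing shows up
    zpool import -d /dev/dsk

    # if the pool is listed, import it by name (or by the printed numeric ID)
    zpool import rsgis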
Re: [zfs-discuss] raidz capacity osol vs freebsd
When viewing a raidz|raidz1|raidz2 pool, 'zpool list' and 'zpool status' will report the total device space; i.e. three 1TB drives in a raidz will show approx. 3TB of space. 'zfs list' will show available FILESYSTEM space; i.e. three 1TB raidz disks, approx. 2TB of space.

Logic wrote:
> Ian Collins (i...@ianshome.com) wrote:
>> On 07/18/10 11:19 AM, marco wrote:
>> *snip*
>> Yes, that is correct. zfs list reports usable space, which is two out
>> of the three drives (parity isn't confined to one device).
>> *snip*
>> Are you sure? That result looks odd. It is what I'd expect to see from
>> a stripe, rather than a raidz. What does zpool iostat -v pool2 report?
>
> Hi Ian,
>
> I'm the friend with the osol release (snv_117) installed. The output you
> asked for is:
>
> % zpool iostat -v pool2
>                 capacity     operations    bandwidth
> pool          used  avail   read  write   read  write
> -----------  -----  -----  -----  -----  -----  -----
> pool2        4.26T  1.20T    208     78  22.1M   409K
>   raidz1     4.26T  1.20T    208     78  22.1M   409K
>     c2d1        -      -      81     37  7.97M   208K
>     c1d0        -      -      82     38  7.85M   209K
>     c2d0        -      -      79     37  7.79M   209K
> -----------  -----  -----  -----  -----  -----  -----
>
> It really is a raidz, created a long time ago with build 27a, and I have
> been replacing the disks ever since, by removing one disk at a time and
> waiting for the resilvering to be done.
>
> greets
> Leon
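To make the arithmetic above concrete, here is a sketch for a hypothetical 3 x 1TB raidz1 (numbers are illustrative, not taken from this thread):

    # zpool list counts raw space across all three disks:
    #   SIZE is roughly 3 x 1TB
    # zfs list counts usable space after one disk's worth of parity:
    #   AVAIL is roughly (3 - 1) x 1TB = 2TB
    # so for an n-disk raidz1, usable space is about (n - 1) / n
    # of what zpool list reports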
[zfs-discuss] zpool access hangs
Hello, I have a very strange issue with a root zpool. I have a test machine which is running OpenSolaris 2009.06 and has a mirrored root pool and, recently, one of the drives failed. I replaced the drive and resilvered the mirror successfully. Shortly afterwards, the machine crashed and hung at the GRUB menu; selecting any BE to boot from caused the menu to vanish and the OpenSolaris splash screen to stay in place. I have tried booting from a LiveCD with the idea of importing and repairing the zpool manually, but any command which accesses the zpool (such as zdb or zpool) now hangs, with no I/O going to the disks in the zpool. Does anyone have any ideas? I'm not sure what to try next. Thanks, Nick
Re: [zfs-discuss] [?] - What is the recommended number of disks for a consumer PC with ZFS
> I'm building my new storage server, all the parts should come in this week...

How did it turn out? Did 8 x 1TB drives seem to be the correct number, or a couple too many? (Based on the assumption that you did not run out of space; I mean solely from a performance / 'ZFS usability' standpoint, as opposed to over three dozen tiny drives.) Thanks for your reply, Rob
Re: [zfs-discuss] [?] - What is the recommended number of disks for a consumer PC with ZFS
----- Original Message -----
> I'm building my new storage server, all the parts should come in this
> week... How did it turn out? Did 8 x 1TB drives seem to be the correct
> number, or a couple too many (based on the assumption that you did not
> run out of space; I mean solely from a performance / 'ZFS usability'
> standpoint, as opposed to over three dozen tiny drives)?

It's quite possible to stack up hundreds of drives for ZFS, just don't put them all in the same vdev. Say you have 32 2TB drives: split that into 4 RAIDz2 vdevs in the same pool, and both speed and safety will be good. Also, add a couple of fast SSDs for the SLOG if you expect lots of sync writes (NFS, iSCSI etc.) and some other (cheaper?) SSDs for L2ARC to help out reads. This is particularly important if you want your server to work well during scrub (osol does NOT perform well during scrub unless you use a SLOG and perhaps L2ARC).

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
[In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for every pedagogue to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.]
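For the 32-drive layout Roy describes, the pool creation would look roughly like this. A sketch only: controller/target numbers are made up, and the device names would be whatever 'format' shows on the real box:

    # four 8-disk raidz2 vdevs in one pool (hypothetical device names)
    zpool create tank \
      raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 \
      raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0 \
      raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0 c3t7d0 \
      raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t7d0

    # mirrored SSD slog for sync writes, plus an L2ARC cache device
    zpool add tank log mirror c5t0d0 c5t1d0
    zpool add tank cache c5t2d0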
Re: [zfs-discuss] zpool access hangs
> Hello, I have a very strange issue with a root zpool. I have a test
> machine which is running OpenSolaris 2009.06 and has a mirrored root
> pool and, recently, one of the drives failed. I replaced the drive and
> resilvered the mirror successfully. Shortly afterwards, the machine
> crashed and hung at the GRUB menu; selecting any BE to boot from caused
> the menu to vanish and the OpenSolaris splash screen to stay in place.
> I have tried booting from a LiveCD with the idea of importing and
> repairing the zpool manually, but any command which accesses the zpool
> (such as zdb or zpool) now hangs, with no I/O going to the disks in the
> zpool. Does anyone have any ideas? I'm not sure what to try next.

IIRC GRUB lives outside the pool, so after resilvering you'll need to install GRUB again on the replaced disk. Do NOT try to import the rpool under a different name; it'll break a lot of stuff, since rpools are quite touchy. Just mount it somewhere else. If you can't mount it, I don't know; perhaps someone else knows.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
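Reinstalling GRUB on the replaced half of a mirrored root pool is done with installgrub; something like the following, where the slice name is a placeholder for whatever the new disk actually is:

    # put GRUB stage1/stage2 on the new mirror half (x86; device is hypothetical)
    installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0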
[zfs-discuss] Help identify failed drive
This is the situation: I've got an error on one of the drives in 'zpool status' output:

# zpool status tank
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t3d0  ONLINE       1     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c2t7d0  ONLINE       0     0     0

So I would like to replace 'c2t3d0'. I know for a fact the pool has 7 physical drives: 5 Seagate and 2 WD. I want to know if 'c2t3d0' is Seagate or WD. If I run 'iostat -En' it shows that all c*t*d0 drives are Seagate, and sd11/sd12 are WD. This totally confuses me... Why are there two different kinds of device names in the iostat output, c*t*d0 and sd*? How come all c*t*d0 appear as Seagate, when I know for sure two of the drives are WD? Why do the WD drives appear as sd* and not as c*t*d0? Please help.

# iostat -En
c1t1d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15  Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 54 Predictive Failure Analysis: 0
c2t0d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15  Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c2t1d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15  Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c2t2d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15  Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c2t3d0  Soft Errors: 0 Hard Errors: 9 Transport Errors: 9
Vendor: ATA  Product: ST3500320AS  Revision: SD15  Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 7 Device Not Ready: 0 No Device: 2 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c2t4d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15  Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c2t5d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15  Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c2t6d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15  Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c2t7d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15  Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd11    Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD5001AALS-0  Revision: 1D05  Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd12    Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD5001AALS-0  Revision: 0K05  Serial No:
Size: 500.11GB <500107862016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

Thanks a lot.
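One way to cross-check which physical disk a c*t*d* name maps to is to resolve it to its device path and query it directly. A sketch (the example output path is invented for illustration):

    # resolve the logical name to its physical device path
    ls -l /dev/dsk/c2t3d0s0
    # lrwxrwxrwx ... -> ../../devices/pci@0,0/.../disk@3,0:a   (hypothetical)

    # or query just the suspect drive for vendor/model details
    iostat -En c2t3d0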
[zfs-discuss] carrying on [was: Legality and the future of zfs...]
On Jul 15, 2010, at 4:48 AM, BM wrote:
> On Thu, Jul 15, 2010 at 10:53 AM, Garrett D'Amore garr...@nexenta.com wrote:
>> The *code* is probably not going away (even updates to the kernel).
>> Even if the community dies, is killed, or commits OGB induced suicide.
>
> 1. You used correct word: probably.

The sun will probably rise tomorrow :-)

> 2. No community = stale outdated code.

But there is a community. What is lacking is that Oracle, in their infinite wisdom, has stopped producing OpenSolaris developer binary releases. Not to be outdone, they've stopped other OS releases as well. Surely, this is a temporary situation. Of the remaining distro builders who offer updated builds based on OpenSolaris code, I'm proud to be a part of the Nexenta team.

>> There is another piece I'll add: even if Oracle were to stop releasing
>> ZFS or OpenSolaris source code, there are enough of us with a vested
>> interest (commercial!) in its future that we would continue to develop
>> it outside of Oracle. It won't just go stagnant and die.
>
> So you're saying let's fork it.

No. What he is saying is that distro builders need to step up to the challenge and release distros. For some reason (good marketing) people seem to think that Linux == Red Hat. Clearly, that is not the case. Please do not confuse distribution of binaries with distribution of source.

>> I believe I can safely say that Nexenta is committed to the continued
>> development and enhancement of this code base -- and to doing so in
>> the open.
>
> Yeah, and Nexenta is also committed to backport newest updates from 140
> and younger builds just back to snv_134. So I can imagine that soon new
> OS from Nexenta will be called Super Nexenta Version 134. :-)

Please. The NexentaStor OS 3.0.3 release is b134f; b134g will be next. We do not expect the OpenSolaris community to replace b135 with Nexenta Core 3.0.3. Rather, we would very much like to see Oracle continue to produce developer distributions which more closely track the source changes. NexentaStor has a very focused market. The losers in the Oracle deaf-mute game are the people who want to use OpenSolaris for applications other than a NAS server.

> Currently from what I see, I think Nexenta will also die eventually.

Indeed. We will all die. And the good news is that someone will pick up the knowledge and evolve. Darwin was right. This is the circle of life.

> Because of BTRFS for Linux, Linux's popularity itself and also thanks to
> Oracle's help.

BTRFS does not matter until it is a primary file system for a dominant distribution. From what I can tell, the dominant Linux distribution file system is ext. That will change some day, but we heard the same story you are replaying about BTRFS from the Reiser file system aficionados and the XFS evangelists. There is absolutely no doubt that Solaris will use ZFS as its primary file system. But there is no internal or external force causing Red Hat to change their primary file system from ext.
 -- richard
[zfs-discuss] Move Fedora or Windows disk image to ZFS (iScsi Boot)
I've found plenty of documentation on how to create a ZFS volume, share it over iSCSI, and then do a fresh install of Fedora or Windows on the volume. What I cannot find is how to take an existing Fedora image and copy its contents into a ZFS volume, so that I can migrate the image from my existing Fedora iSCSI target to a Solaris iSCSI target (and of course get the advantages of having that disk image hosted on ZFS). Do I just 'zfs create -V' and then somehow dd the Fedora .img file on top of the newly created volume? I've spent hours and have not been able to find any example of how to do this.
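That 'zfs create -V' plus dd combination is one way that should work; a minimal sketch, with made-up names (the volume must be at least as large as the source image, and the COMSTAR step assumes a COMSTAR setup rather than the older iscsitadm path):

    # create a 20GB zvol and copy the raw image onto its character device
    zfs create -V 20g tank/fedora-img
    dd if=fedora.img of=/dev/zvol/rdsk/tank/fedora-img bs=1048576

    # then export the zvol as an iSCSI LUN, e.g. via COMSTAR
    sbdadm create-lu /dev/zvol/rdsk/tank/fedora-img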
[zfs-discuss] Trouble Moving opensolaris to new HD
I'm running OpenSolaris 2009.06 in a triple-boot environment with Linux and Windows. I just slapped a new hard drive into my machine and moved everything over with Acronis Migrate Easy. Unfortunately, this failed to set up GRUB correctly, so I resorted to the brute force solution of just reinstalling OpenSolaris to get things booting again. However, I had underestimated the difficulty of getting my nice lived-in install of OpenSolaris off of the old drive and onto the new one. (Probably should have tried a little harder to fix the booting issue, but it's too late for that now.) The main issue is that OpenSolaris doesn't want to let me mount the old root file system after I stick it in there as a second hard drive. I figure that there's probably a fairly simple solution here, but the learning curve on ZFS is a bit on the steep side for all of its alleged ease of use. Any suggestions?
Re: [zfs-discuss] Recommended RAM for ZFS on various platforms
On Jul 18, 2010, at 3:40 PM, Peter Jeremy wrote:
> On 2010-Jul-17 01:24:57 +0800, Michael Johnson mjjohnson@yahoo.com wrote:
>> I'm currently planning on running FreeBSD with ZFS, but I wanted to
>> double-check how much memory I'd need for it to be stable. The ZFS wiki
>> currently says you can go as low as 1 GB, but recommends 2 GB; however,
>> elsewhere I've seen someone claim that you need at least 4 GB. Does
>> anyone here know how much RAM FreeBSD would need in this case?
>
> I am running FreeBSD 8.x with ZFS on several systems. From my
> experience, 2GB is a bare minimum and it seems a lot happier with 3.5GB.
> Note that in any case, patching ARC to work around the
> out-of-free-memory bug is fairly important.

Do you have a CR for this bug?
 -- richard

--
ZFS and performance consulting
http://www.RichardElling.com
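For anyone tuning around this on FreeBSD 8.x, the usual knobs are loader tunables that cap kernel memory and the ARC. The values below are illustrative only (sized for a machine with around 4GB of RAM) and should be adjusted to the actual workload:

    # /boot/loader.conf -- cap kernel memory and the ZFS ARC
    vm.kmem_size="3G"
    vfs.zfs.arc_max="1536M"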
Re: [zfs-discuss] Debunking the dedup memory myth
> Even the most expensive decompression algorithms generally run
> significantly faster than I/O to disk -- at least when real disks are
> involved. So, as long as you don't run out of CPU and have to wait for
> CPU to be available for decompression, the decompression will win. The
> same concept is true for dedup, although I don't necessarily think of
> dedup as a form of compression (others might reasonably do so though.)

Effectively, dedup is a form of compression of the filesystem rather than of any single file, but one oriented toward not interfering with access to any of the files that may be sharing blocks. I would imagine that if the data is read-mostly, it's a win, but otherwise it costs more than it saves. Even conventional compression tends to be more resource-intensive when compressing than when decompressing...

What I'm wondering is when dedup is a better value than compression. Most obviously, when there are a lot of identical blocks across different files; but I'm not sure how often that happens, aside from maybe blocks of zeros (which may well be sparse anyway).
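For comparing the two on real data, ZFS exposes both knobs and both ratios; a sketch with placeholder dataset names:

    # enable compression on one dataset and check what it achieves
    zfs set compression=on tank/data
    zfs get compressratio tank/data

    # enable dedup on another and check the pool-wide dedup ratio
    zfs set dedup=on tank/vmimages
    zpool get dedupratio tank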
Re: [zfs-discuss] Debunking the dedup memory myth
On 7/18/2010 4:18 PM, Richard L. Hamilton wrote:
> I would imagine that if it's read-mostly, it's a win, but otherwise it
> costs more than it saves. [...]
>
> What I'm wondering is when dedup is a better value than compression.
> Most obviously, when there are a lot of identical blocks across
> different files; but I'm not sure how often that happens, aside from
> maybe blocks of zeros (which may well be sparse anyway).

From my own experience, a dedup win is much more data-usage-dependent than compression.

Compression seems to be of general use across the vast majority of data I've encountered, with the sole big exception of media file servers (where the data is already-compressed pictures, audio, or video). It seems to be of general utility, since I've always got spare CPU cycles, and it's really not very expensive in terms of CPU in most cases. Of course, the *value* of compression varies according to the data (i.e. how much it will compress), but that doesn't matter much for its *utility*.

Dedup, on the other hand, currently has a very steep price in terms of needed ARC/L2ARC/RAM, so it's much harder to justify in those cases where it only provides modest benefits. Additionally, we're still on the development side of dedup (IMHO), so I can't really make a full evaluation of the dedup concept, as many of its issues today are implementation-related, not concept-related.

All that said, dedup has a showcase use case where it is of *massive* benefit: hosting virtual machines. For a machine hosting only 100 VM data stores, I can see 99% space savings. And I see a significant performance boost, since I can cache that one VM image in RAM easily.

There are other places where dedup seems modestly useful these days (one is the afore-mentioned media file server, where you'd be surprised how much duplication there is), but it's *much* harder to pre-determine dedup's utility for a given dataset, unless you have highly detailed knowledge of that dataset's composition.

I'll admit to not being a big fan of the dedup concept originally (go back a couple of years here on this list), but, given that the world is marching straight to virtualization as fast as we can go, I'm a convert now.

From my perspective, here's a couple of things that I think would help improve dedup's utility for me:

(a) Fix the outstanding issues in the current implementation (duh!).

(b) Add the ability to store the entire DDT in the backing store, rather than having to construct it in ARC from disk-resident info (this would be of great help where the backing store is SSD- or RAM-based).

(c) Be able to test-dedup a given filesystem. I'd like ZFS to be able to look at a filesystem and tell me how much dedup I'd get out of it, WITHOUT having to actually create a dedup-enabled filesystem and copy the data to it.
While it would be nice to be able to simply turn on dedup for a filesystem and have ZFS dedup the existing data there (in-place, without copying), I realize the implementation is hard given how things currently work, and frankly, that's of much lower priority for me than being able to test-dedup a dataset.

(d) Increase the slab (record) size significantly, to at least 1MB or more. I daresay the primary way VM images are stored these days is as single, large files (though iSCSI volumes are coming up fast), and as such, I've got 20G files which would really, really benefit from having a much larger slab size.

(e) And, of course, seeing if there's some way we can cut down on dedup's piggy DDT size. :-)

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
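On (c), recent builds do ship a rough version of this in zdb: it can simulate dedup over an existing pool and print the projected DDT histogram and ratio without changing anything. A sketch (pool name is a placeholder, and the block walk can take a long time on a big pool):

    # simulate dedup on existing data; prints a DDT histogram and
    # an estimated dedup ratio, without enabling dedup anywhere
    zdb -S tank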
Re: [zfs-discuss] Trouble Moving opensolaris to new HD
On 7/18/2010 3:45 PM, splazo doberman wrote:
> I'm running OpenSolaris 2009.06 in a triple-boot environment with Linux
> and Windows. I just slapped a new hard drive into my machine and moved
> everything over with Acronis Migrate Easy. [...] The main issue is that
> OpenSolaris doesn't want to let me mount the old root file system after
> I stick it in there as a second hard drive. [...] Any suggestions?

Most likely the problem is that both the old and new disks have a pool named 'rpool', so you can't simply 'zpool import rpool'. I'm assuming that you can at least see the old disk's pool via a plain 'zpool import', correct? Have you tried importing via the numeric pool ID (GUID) rather than via name? Also, try importing with a different mountpoint option. Last resort: boot from the LiveCD, import the old disk's rpool by its numeric ID, and then rename the whole pool to something else (maybe 'oldrpool').

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
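Concretely, the import-by-ID route looks something like this (the numeric ID below is invented; use whatever 'zpool import' actually prints for the old disk's pool):

    # list importable pools; the conflicting 'rpool' shows up with a numeric ID
    zpool import

    # import that pool by ID, renaming it and mounting it under an altroot
    zpool import -R /mnt 6930223987577883347 oldrpool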
Re: [zfs-discuss] Debunking the dedup memory myth
On Sun, 2010-07-18 at 16:18 -0700, Richard L. Hamilton wrote:
> I would imagine that if it's read-mostly, it's a win, but otherwise it
> costs more than it saves. [...]
>
> What I'm wondering is when dedup is a better value than compression.
> Most obviously, when there are a lot of identical blocks across
> different files; but I'm not sure how often that happens, aside from
> maybe blocks of zeros (which may well be sparse anyway).

Shared/identical blocks come into play in several specific scenarios:

1) Multiple VMs, cloud. If you have multiple guest OS' installed, they're going to benefit heavily from dedup. Even Zones can benefit here. (A sketch of confining dedup to just such data follows below.)

2) Situations with lots of copies of large amounts of data where only some of the data is different between each copy. The classic example is a Solaris build server hosting dozens, or even hundreds, of copies of the Solaris tree, each being worked on by a different developer. Typically a developer is working on something less than 1% of the total source code, so the other 99% can be shared via dedup.

For general purpose usage, e.g. hosting your music or movie collection, I doubt that dedup offers any real advantage. If I were deploying dedup, I'd only use it in situations like the two I mentioned, and not for a general purpose storage server. For general purpose applications I think compression is better. (Though I think dedup will have higher savings -- significantly so -- in the particular situation where you know you have lots and lots of duplicate/redundant data.)

Note also that dedup can actually give your duplicated data an effective increase in redundancy/security, because it makes sure that deduped data has higher redundancy than non-deduped data. (This sounds counterintuitive, but as long as you have at least 3 copies of the duplicated data, it's a net win.)

Btw, compression on top of dedup may actually kill your benefit of dedup. My hypothesis (unproven, admittedly) is that because many compression algos cause small permutations of the data to significantly change the bit values in the overall compressed object (even just by changing their offset in the binary), compression can seriously defeat dedup's efficacy.

 - Garrett
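The sketch referred to in scenario 1: since dedup is a per-dataset property, it is easy to confine it to just the VM images (dataset names here are illustrative):

    # dedup only where the duplication actually is
    zfs create tank/vmimages
    zfs set dedup=on tank/vmimages

    # everything else stays dedup-free; watch the pool-wide ratio
    zpool get dedupratio tank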
Re: [zfs-discuss] carrying on
>>>>> "re" == Richard Elling rich...@nexenta.com writes:

    re> we would very much like to see Oracle continue to produce
    re> developer distributions which more closely track the source
    re> changes.

I'd rather someone other than Oracle did it. Until someone else is doing the ``building'', whatever that entails all the way from Mercurial to DVD, we will never know if the source we have is complete enough to do a fork if we need to.

I realize everyone has in their heads, FORK == BAD. Yes, forks are usually bad, but the *ability to make forks* is good, because it ``decouples the investments our businesses make in OpenSolaris/ZFS from the volatility of Sun and Oracle's business cycle,'' to paraphrase some blog comment. Particularly when you are dealing with datasets so large it might cost tens of thousands to copy them into another format than ZFS, it's important to have a 2-year plan for this instead of being subject to ``I am altering the deal. Pray I don't alter it any further.''

Nexenta being stuck at b134, and the secret CVE fixes, do not look good. Though yeah, it looks better than it would if Nexenta didn't exist. IMHO it's important we don't get stuck running Nexenta in the same spot we're now stuck with OpenSolaris: with a bunch of CDDL-protected source that few people know how to use in practice, because the build procedure is magical and secret. This is why the GPL demands you release ``all build scripts''!

One good way to help make sure you have the ability to make a fork is to get the source from one organization and the binary distribution from another. As long as they're not too collusive, you can relax and rely on one of them to complain about the other. Another way is to use a source-based distribution like Gentoo or BSD, where the distributor includes a deliverable tool that produces bootable DVDs from the revision control system, and ordinary contributors can introspect these tools and find any binary blobs that may exist.