[zfs-discuss] Lost zpool after reboot

2010-07-18 Thread Amit Kulkarni
Hello,

I have a dual-boot setup with Windows 7 64-bit Enterprise Edition and OpenSolaris
build 134 on a Sun Ultra 40 M1 workstation. There are three hard drives: two in a
ZFS mirror, and one shared with Windows.

For the last two days I was working in Windows. I didn't touch the hard drives in
any way, except that I once opened Disk Management to figure out why an external
USB hard drive was not being listed. That's it; that is the only disk-related thing
I can recall doing in the last several days.

Today I booted into Opensolaris and my mirrored pool is gone.

I ran zpool status and it gave me the ZFS-8000-3C error, saying my pool is
unavailable. Since I can still boot and access a browser, I tried a zpool import
without arguments, tried exporting the pool, and did some more fiddling. Now I
can't get zpool status to show my pool at all.

Help me. How do I get my old pool back? I know it's there somewhere.

Thanks in advance
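
(For reference, a minimal set of non-destructive checks for this situation, using
the pool name and device names that appear in the fmdump output below; the numeric
pool ID is a placeholder to be read from the import listing:)

  zpool import                     # scan /dev/dsk for pools that can be imported
  zpool import -d /dev/dsk         # same scan with an explicit search directory
  zdb -l /dev/dsk/c8t0d0s0         # dump the four ZFS labels on one mirror half
  zdb -l /dev/dsk/c9t0d0s0         # ... and on the other half
  # If the pool shows up in the listing, import it by name or by its numeric ID:
  zpool import -f rsgis
  zpool import -f <numeric-pool-id>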



This is the result of fmdump -eV

Jul 16 2010 15:17:43.657125275 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
    class = ereport.fs.zfs.vdev.open_failed
    ena = 0x14c954e68900801
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = zfs
        pool = 0x4406b127a905c5be
        vdev = 0xe7dce33be87eeca7
    (end detector)

    pool = rsgis
    pool_guid = 0x4406b127a905c5be
    pool_context = 1
    pool_failmode = wait
    vdev_guid = 0xe7dce33be87eeca7
    vdev_type = disk
    vdev_path = /dev/dsk/c9t0d0s0
    vdev_devid = id1,s...@ahitachi_hds7225scsun250g_0719bn9e3k=vfa100r1dn9e3k/a
    parent_guid = 0xb89f3c5a72a22939
    parent_type = mirror
    prev_state = 0x1
    __ttl = 0x1
    __tod = 0x4c40be67 0x272aef9b

Jul 16 2010 15:17:43.657125080 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
    class = ereport.fs.zfs.vdev.open_failed
    ena = 0x14c954e68900801
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = zfs
        pool = 0x4406b127a905c5be
        vdev = 0x6f08aad645681b14
    (end detector)

    pool = rsgis
    pool_guid = 0x4406b127a905c5be
    pool_context = 1
    pool_failmode = wait
    vdev_guid = 0x6f08aad645681b14
    vdev_type = disk
    vdev_path = /dev/dsk/c8t0d0s0
    vdev_devid = id1,s...@ahitachi_hds7225sbsun250g_0615ne18bj=vds41dt4ee18bj/a
    parent_guid = 0xb89f3c5a72a22939
    parent_type = mirror
    prev_state = 0x1
    __ttl = 0x1
    __tod = 0x4c40be67 0x272aeed8

Jul 16 2010 15:17:43.657125769 ereport.fs.zfs.vdev.no_replicas
nvlist version: 0
    class = ereport.fs.zfs.vdev.no_replicas
    ena = 0x14c954e68900801
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = zfs
        pool = 0x4406b127a905c5be
        vdev = 0xb89f3c5a72a22939
    (end detector)

    pool = rsgis
    pool_guid = 0x4406b127a905c5be
    pool_context = 1
    pool_failmode = wait
    vdev_guid = 0xb89f3c5a72a22939
    vdev_type = mirror
    parent_guid = 0x4406b127a905c5be
    parent_type = root
    prev_state = 0x1
    __ttl = 0x1
    __tod = 0x4c40be67 0x272af189

Jul 16 2010 15:17:43.657125226 ereport.fs.zfs.zpool
nvlist version: 0
    class = ereport.fs.zfs.zpool
    ena = 0x14c954e68900801
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = zfs
        pool = 0x4406b127a905c5be
    (end detector)

    pool = rsgis
    pool_guid = 0x4406b127a905c5be
    pool_context = 1
    pool_failmode = wait
    __ttl = 0x1
    __tod = 0x4c40be67 0x272aef6a

Jul 16 2010 15:25:55.572108990 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
    class = ereport.fs.zfs.vdev.open_failed
    ena = 0x1588f5aa2b00801
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = zfs
        pool = 0x4406b127a905c5be
        vdev = 0x6f08aad645681b14
    (end detector)

    pool = rsgis
    pool_guid = 0x4406b127a905c5be
    pool_context = 1
    pool_failmode = wait
    vdev_guid = 0x6f08aad645681b14
    vdev_type = disk
    vdev_path = /dev/dsk/c8t0d0s0
    vdev_devid = id1,s...@ahitachi_hds7225sbsun250g_0615ne18bj=vds41dt4ee18bj/a
    parent_guid = 0xb89f3c5a72a22939
    parent_type = mirror
    prev_state = 0x1
    __ttl = 0x1
    __tod = 0x4c40c053 0x2219b0be

Jul 16 2010 15:25:55.572108617 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
    class = ereport.fs.zfs.vdev.open_failed
    ena = 0x1588f5aa2b00801
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = zfs
        pool = 0x4406b127a905c5be
        vdev = 0xe7dce33be87eeca7
    (end detector)

    pool = rsgis
    pool_guid = 0x4406b127a905c5be
    pool_context = 1
    pool_failmode = wait
    vdev_guid = 0xe7dce33be87eeca7
    vdev_type = disk
    vdev_path = /dev/dsk/c9t0d0s0
    vdev_devid = id1,s...@ahitachi_hds7225scsun250g_0719bn9e3k=vfa100r1dn9e3k/a
    parent_guid = 0xb89f3c5a72a22939
    parent_type = mirror
    prev_state = 0x1
    __ttl = 0x1
 

Re: [zfs-discuss] raidz capacity osol vs freebsd

2010-07-18 Thread Craig Cory
When viewing a raidz/raidz1/raidz2 pool, 'zpool list' and 'zpool status' report the
total device space; e.g. three 1TB drives in a raidz will show approximately 3TB of
space. 'zfs list' shows the available FILESYSTEM space; for the same three 1TB
raidz disks, approximately 2TB.
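
A quick way to see both views, assuming a hypothetical pool named tank made of
three 1TB disks in a single raidz1:

  zpool list tank   # SIZE/ALLOC/FREE count raw device space across all 3 disks (~3TB)
  zfs list tank     # USED+AVAIL is usable filesystem space after parity (~2TB)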


Logic wrote:
 Ian Collins (i...@ianshome.com) wrote:
 On 07/18/10 11:19 AM, marco wrote:
 *snip*


 Yes, that is correct. zfs list reports usable space, which is 2 out of
 the three drives (parity isn't confined to one device).

 *snip*


 Are you sure?  That result looks odd.  It is what I'd expect to see from
 a stripe, rather than a raidz.

 What does zpool iostat -v pool2 report?

 Hi Ian,

 I'm the friend with the osol release(snv_117) installed.

 The output you asked for is:
 % zpool iostat -v pool2
                 capacity     operations    bandwidth
  pool          used  avail   read  write   read  write
  -----------  -----  -----  -----  -----  -----  -----
  pool2        4.26T  1.20T    208     78  22.1M   409K
    raidz1     4.26T  1.20T    208     78  22.1M   409K
      c2d1         -      -     81     37  7.97M   208K
      c1d0         -      -     82     38  7.85M   209K
      c2d0         -      -     79     37  7.79M   209K
  -----------  -----  -----  -----  -----  -----  -----

 It really is a raidz, created a long time ago with build 27a, and I have been
 replacing the disks ever since, by removing one disk at a time and waiting for
 the resilvering to be done.

 greets Leon


[zfs-discuss] zpool access hangs

2010-07-18 Thread ixcg

Hello, 

I have a very strange issue with a root zpool. I have a test machine
which is running OpenSolaris 2009.06 and has a mirrored root pool;
recently, one of the drives failed. I replaced the drive and resilvered
the mirror successfully.

Shortly afterwards, the machine crashed and hung at the GRUB menu;
selecting any BE to boot from caused the menu to vanish and the
OpenSolaris splash screen to stay in place. I have tried booting from
a LiveCD with the idea of importing and repairing the zpool manually,
but any command which accesses the zpool (such as zdb or zpool) now
hangs, with no I/O activity on the disks in the pool.

Does anyone have any ideas? I'm not sure what to try next. 

Thanks, 

Nick


Re: [zfs-discuss] [?] - What is the recommended number of disks for a consumer PC with ZFS

2010-07-18 Thread Rob Clark
 I'm building my new storage server, all the parts should come in this week...

How did it turn out? Did 8x 1TB drives seem to be the right number, or a couple
too many? (This assumes you did not run out of space; I mean solely from a
performance / 'ZFS usability' standpoint, as opposed to, say, over three dozen
tiny drives.)

Thanks for your reply,
Rob


Re: [zfs-discuss] [?] - What is the recommended number of disks for a consumer PC with ZFS

2010-07-18 Thread Roy Sigurd Karlsbakk
- Original Message -
  I'm building my new storage server, all the parts should come in
  this week...
 
 How did it turn out ? Did 8x1TB Drives seem to be the correct number
 or a couple too many (based on
 the assumption that you did not run out of space; I mean solely from a
 performance / 'ZFS usability'
 standpoint - as opposed to over three dozen tiny Drives).

It's quite possible to stack up hundreds of drives for ZFS; just don't put them
all in the same vdev. Say you have 32 2TB drives: split them into four RAIDZ2
vdevs in the same pool, and both speed and safety will be good. Also, add a
couple of fast SSDs for the SLOG if you expect lots of sync writes (NFS, iSCSI,
etc.), and some other (cheaper?) SSDs for L2ARC to help out reads. This is
particularly important if you want your server to work well during a scrub
(OpenSolaris does NOT perform well during a scrub unless you use a SLOG and
perhaps L2ARC).
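
For illustration, a layout along those lines could be created in one step (all
device names below are hypothetical placeholders):

  zpool create tank \
    raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 \
    raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0 \
    raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0 c3t7d0 \
    raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t7d0 \
    log mirror c5t0d0 c5t1d0 \
    cache c5t2d0 c5t3d0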

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It
is an elementary imperative for all pedagogues to avoid excessive use of idioms of
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.


Re: [zfs-discuss] zpool access hangs

2010-07-18 Thread Roy Sigurd Karlsbakk

Hello, 

I have a very strange issue with a root zpool. I have a test machine which is
running OpenSolaris 2009.06 and has a mirrored root pool and, recently, one of
the drives failed. I replaced the drive and resilvered the mirror successfully.

Shortly afterwards, the machine crashed and hung at the GRUB menu; selecting
any BE to boot from caused the menu to vanish and the OpenSolaris splash screen
to stay in place. I have tried booting from a LiveCD with the idea of importing
and repairing the zpool manually, but any command which accesses the zpool
(such as zdb or zpool) now hangs, with no I/O activity on the disks in the
pool.

Does anyone have any ideas? I'm not sure what to try next.

IIRC GRUB lives outside the pool, so after resilvering you'll need to install
GRUB again on the replaced disk. Do NOT try to import the rpool with a different
name; it'll break a lot of stuff, as rpools are quite touchy. Just mount it
somewhere else. If you can't mount it, I don't know; perhaps someone else does.
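
If the pool can be imported from the LiveCD at all, a minimal sketch of putting
GRUB back on the replaced disk (the BE dataset and the device name are guesses;
check 'zfs list' and 'format' for the real ones):

  zpool import -f -R /a rpool
  zfs mount rpool/ROOT/opensolaris    # mount the boot environment under /a
  installgrub /a/boot/grub/stage1 /a/boot/grub/stage2 /dev/rdsk/c0t1d0s0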

Vennlige hilsener / Best regards 

roy 
-- 
Roy Sigurd Karlsbakk 
(+47) 97542685 
r...@karlsbakk.net 
http://blogg.karlsbakk.net/ 
-- 
In all pedagogy it is essential that the curriculum be presented intelligibly. It
is an elementary imperative for all pedagogues to avoid excessive use of idioms of
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.


[zfs-discuss] Help identify failed drive

2010-07-18 Thread Alxen4
This is a situation:

I've got an error on one of the drives in 'zpool status' output:

 zpool status tank

  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz2ONLINE   0 0 0
c1t1d0  ONLINE   0 0 0
c2t0d0  ONLINE   0 0 0
c2t2d0  ONLINE   0 0 0
c2t3d0  ONLINE   1 0 0
c2t4d0  ONLINE   0 0 0
c2t5d0  ONLINE   0 0 0
c2t7d0  ONLINE   0 0 0

So I would like to replace 'c2t3d0'.

I know for a fact the pool has 7 physical drives: 5 Seagate and 2 WD.

I want to know whether 'c2t3d0' is a Seagate or a WD.

If I run 'iostat -En' it shows that all c*t*d0 drives are Seagate and that
sd11/sd12 are WD.

This totally confuses me...
Why are there two different kinds of device names in the iostat output, c*t*d0 and sd*?
How come all of the c*t*d0 devices appear as Seagate? I know for sure two of them are WD.
Why do the WD drives appear as sd* and not as c*t*d0?

Please help.
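
A couple of ways to pin down which physical drive c2t3d0 is, plus the usual
replacement step (this assumes a stock OpenSolaris install; adjust names as
needed):

  format                          # the disk-selection menu lists each cXtXdX with vendor/product
  ls -l /dev/dsk/c2t3d0s0         # shows the physical /devices path behind the cXtXdX name
  grep '"sd"' /etc/path_to_inst   # maps those physical paths to sd instance numbers (sd11, ...)

  # Once the right drive has been pulled and a new one inserted in the same slot:
  zpool replace tank c2t3d0
  zpool status tank               # watch the resilver
  # If the errors were transient and the disk is healthy, reset the counters instead:
  zpool clear tank c2t3d0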


--

# iostat -En


c1t1d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15 Serial No:
Size: 500.11GB 500107862016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 54 Predictive Failure Analysis: 0

c2t0d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15 Serial No:
Size: 500.11GB 500107862016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c2t1d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15 Serial No:
Size: 500.11GB 500107862016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c2t2d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15 Serial No:
Size: 500.11GB 500107862016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c2t3d0   Soft Errors: 0 Hard Errors: 9 Transport Errors: 9
Vendor: ATA  Product: ST3500320AS  Revision: SD15 Serial No:
Size: 500.11GB 500107862016 bytes
Media Error: 7 Device Not Ready: 0 No Device: 2 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c2t4d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15 Serial No:
Size: 500.11GB 500107862016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c2t5d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15 Serial No:
Size: 500.11GB 500107862016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c2t6d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15 Serial No:
Size: 500.11GB 500107862016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c2t7d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3500320AS  Revision: SD15 Serial No:
Size: 500.11GB 500107862016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

sd11 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD5001AALS-0 Revision: 1D05 Serial No:
Size: 500.11GB 500107862016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

sd12 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD5001AALS-0 Revision: 0K05 Serial No:
Size: 500.11GB 500107862016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

Thanks a lot.


[zfs-discuss] carrying on [was: Legality and the future of zfs...]

2010-07-18 Thread Richard Elling
On Jul 15, 2010, at 4:48 AM, BM wrote:

 On Thu, Jul 15, 2010 at 10:53 AM, Garrett D'Amore garr...@nexenta.com wrote:
 The *code* is probably not going away (even updates to the kernel).
 Even if the community dies, is killed, or commits OGB induced suicide.
 
 1. You used correct word: probably.

The sun will probably rise tomorrow :-)

 2. No community = stale outdated code.

But there is a community.  What is lacking is that Oracle, in their infinite
wisdom, has stopped producing OpenSolaris developer binary releases.
Not to be outdone, they've stopped other OS releases as well.  Surely,
this is a temporary situation.

Of the remaining distro builders who offer updated builds based on 
OpenSolaris code, I'm proud to be a part of the Nexenta team.

 There is another piece I'll add: even if Oracle were to stop releasing
 ZFS or OpenSolaris source code, there are enough of us with a vested
 interest (commercial!) in its future that we would continue to develop
 it outside of Oracle.  It won't just go stagnant and die.
 
 So you're saying let's fork it.

No.  What he is saying is that distro builders need to step up to the 
challenge and release distros.  For some reason (good marketing)
people seem to think that Linux == Red Hat.  Clearly, that is not the
case.  Please, do not confuse distribution of binaries with distribution
of source.

  I believe I can safely say that Nexenta is committed to the continued 
 development and enhancement of this code base -- and to doing so in the open.
 Yeah, and Nexenta is also committed to backport newest updates from
 140 and younger builds just back to snv_134. So I can imagine that
 soon new OS from Nexenta will be called Super Nexenta Version 134.
 :-)

Please.  The NexentaStor OS 3.0.3 release is b134f.  b134g will be next.
We do not expect the OpenSolaris community to replace b135 with 
Nexenta Core 3.0.3. Rather, we would very much like to see Oracle 
continue to produce developer distributions which more closely track
the source changes. NexentaStor has a very focused market. The losers
in the Oracle deaf-mute game are the people who want to use OpenSolaris 
for applications other than a NAS server.

 Currently from what I see, I think Nexenta will also die eventually.

Indeed. We will all die. And the good news is that someone will pick up
the knowledge and evolve.  Darwin was right. This is the circle of life.

 Because of BTRFS for Linux, Linux's popularity itself and also thanks
 to the Oracle's help.

BTRFS does not matter until it is a primary file system for a dominant
distribution. From what I can tell, the dominant Linux distribution file system
is ext. That will change some day, but we heard the same story you are replaying
about BTRFS from the Reiser file system aficionados and the XFS evangelists.
There is absolutely no doubt that Solaris will use ZFS as its primary file
system. But there is no internal or external force causing Red Hat to change
their primary file system from ext.
 -- richard



[zfs-discuss] Move Fedora or Windows disk image to ZFS (iScsi Boot)

2010-07-18 Thread Packet Boy
I've found plenty of documentation on how to create a ZFS volume, iscsi share 
it, and then do a fresh install of Fedora or Windows on the volume.

What I can not find is how to take an existing Fedora image and copy its contents
into a ZFS volume, so that I can migrate the image from my existing Fedora iSCSI
target to a Solaris iSCSI target (and of course get the advantages of having that
disk image hosted on ZFS).

Do I just zfs create -V and then somehow dd the Fedora .img file on top of the 
newly created volume?

I've spent hours and have not been able to find any example on how to do this.
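
For what it's worth, the zvol-plus-dd approach is the usual one; a minimal sketch,
assuming a pool named tank and a 20GB source image (names and sizes are
placeholders):

  zfs create -V 20g tank/fedora-img
  dd if=/path/to/fedora.img of=/dev/zvol/rdsk/tank/fedora-img bs=1024k
  # The zvol can then be exported as an iSCSI LUN, e.g. via COMSTAR
  # (sbdadm create-lu /dev/zvol/rdsk/tank/fedora-img, then stmfadm to map a view).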


[zfs-discuss] Trouble Moving opensolaris to new HD

2010-07-18 Thread splazo doberman
I'm running OpenSolaris 2009.06 in a triple-boot environment with Linux and
Windows.

I just slapped a new hard drive into my machine and moved everything over with
Acronis Migrate Easy.

Unfortunately, this failed to set up GRUB correctly, so I resorted to the
brute-force solution of just reinstalling OpenSolaris to get things booting
again.

However, I had underestimated the difficulty of getting my nice, lived-in
install of OpenSolaris off the old drive and onto the new one. (I probably
should have tried a little harder to fix the booting issue, but it's too late
for that now.)

The main issue is that OpenSolaris doesn't want to let me mount the old root
file system after I put the old drive back in as a second hard drive.

I figure there's probably a fairly simple solution here, but the learning curve
on ZFS is a bit on the steep side for all of its alleged ease of use.

Any suggestions?


Re: [zfs-discuss] Recommended RAM for ZFS on various platforms

2010-07-18 Thread Richard Elling
On Jul 18, 2010, at 3:40 PM, Peter Jeremy wrote:

 On 2010-Jul-17 01:24:57 +0800, Michael Johnson mjjohnson@yahoo.com 
 wrote:
 I'm currently planning on running FreeBSD with ZFS, but I wanted to 
 double-check 
 how much memory I'd need for it to be stable.  The ZFS wiki currently says 
 you 
 can go as low as 1 GB, but recommends 2 GB; however, elsewhere I've seen 
 someone 
 claim that you need at least 4 GB.  Does anyone here know how much RAM 
 FreeBSD 
 would need in this case?
 
 I am running FreeBSD 8.x with ZFS on several systems.  From my
 experiences, 2GB is a bare minimum and it seems a lot happier with
 3.5GB.  Note that in any case, patching ARC to work around the out-
 of-free-memory bug is fairly important.

Do you have a CR for this bug?
 -- richard

-- 
ZFS and performance consulting
http://www.RichardElling.com


Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-18 Thread Richard L. Hamilton
 Even the most expensive decompression algorithms generally run
 significantly faster than I/O to disk -- at least when real disks are
 involved.  So, as long as you don't run out of CPU and have to wait for
 CPU to be available for decompression, the decompression will win.  The
 same concept is true for dedup, although I don't necessarily think of
 dedup as a form of compression (others might reasonably do so though.)

Effectively, dedup is a form of compression of the
filesystem rather than any single file, but one
oriented to not interfering with access to any of what
may be sharing blocks.

I would imagine that if it's read-mostly, it's a win, but
otherwise it costs more than it saves.  Even more conventional
compression tends to be more resource intensive than decompression...

What I'm wondering is when dedup is a better value than compression.
Most obviously, when there are a lot of identical blocks across different
files; but I'm not sure how often that happens, aside from maybe
blocks of zeros (which may well be sparse anyway).


Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-18 Thread Erik Trimble

On 7/18/2010 4:18 PM, Richard L. Hamilton wrote:

Even the most expensive decompression algorithms generally run
significantly faster than I/O to disk -- at least when real disks are
involved.  So, as long as you don't run out of CPU and have to wait for
CPU to be available for decompression, the decompression will win.  The
same concept is true for dedup, although I don't necessarily think of
dedup as a form of compression (others might reasonably do so though.)

Effectively, dedup is a form of compression of the filesystem rather than
any single file, but one oriented to not interfering with access to any of
what may be sharing blocks.

I would imagine that if it's read-mostly, it's a win, but otherwise it
costs more than it saves.  Even more conventional compression tends to be
more resource intensive than decompression...

What I'm wondering is when dedup is a better value than compression.
Most obviously, when there are a lot of identical blocks across different
files; but I'm not sure how often that happens, aside from maybe
blocks of zeros (which may well be sparse anyway).


From my own experience, a dedup win is much more data-usage-dependent 
than compression.


Compression seems to be of general use across the vast majority of data 
I've encountered - with the sole big exception of media file servers 
(where the data is already compressed pictures, audio, or video).  It 
seems to be of general utility, since I've always got spare CPU cycles, 
and it's really not very expensive in terms of CPU in most cases. Of 
course, the *value* of compression varies according to the data (i.e. 
how much it will compress), but that doesn't matter for *utility* for 
the most part.


Dedup, on the other hand, currently has a very steep price in terms of needed
ARC/L2ARC/RAM, so it's much harder to justify in those cases where it only
provides modest benefits. Additionally, we're still on the development side of
dedup (IMHO), so I can't really make a full evaluation of the dedup concept, as
many of its issues today are implementation-related, not concept-related. All
that said, dedup has a showcase use case where it is of *massive* benefit:
hosting virtual machines. For a machine hosting only 100 VM data stores, I can
see 99% space savings, and I see a significant performance boost as well, since
I can cache that one VM image in RAM easily. There are other places where dedup
seems modestly useful these days (one is the afore-mentioned media-file server,
where you'd be surprised how much duplication there is), but it's *much* harder
to pre-determine dedup's utility for a given dataset unless you have highly
detailed knowledge of that dataset's composition.


I'll admit to not being a big fan of the Dedup concept originally (go 
back a couple of years here on this list), but, given that the world is 
marching straight to Virtualization as fast as we can go, I'm a convert 
now.



From my perspective, here are a few things that I think would help improve
dedup's utility for me:


(a) fix the outstanding issues in the current implementation (duh!).

(b) add the ability to store the entire DDT in the backing store, and 
not have to construct it in ARC from disk-resident info (this would be 
of great help where backing store = SSD or RAM-based things)


(c) be able to test-dedup a given filesystem.  I'd like ZFS to be able 
to look at a filesystem and tell me how much dedup I'd get out of it, 
WITHOUT having to actually create a dedup-enabled filesystem and copy 
the data to it.  While it would be nice to be able to simply turn on 
dedup for a filesystem, and have ZFS dedup the existing data there 
(in-place, without copying), I realize the implementation is hard given 
how things currently work, and frankly, that's of much lower priority 
for me than being able to test-dedup a dataset.


(d) increase the slab (record) size significantly, to at least 1MB or
more. I daresay the primary way VM images are stored these days is as
single, large files (though iSCSI volumes are coming up fast), and as
such, I've got 20G files which would really, really benefit from having
a much larger slab size.


(e) and, of course, seeing if there's some way we can cut down on 
dedup's piggy DDT size.  :-)
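
On point (c), a partial approximation already exists in builds that have dedup:
zdb can simulate dedup over a pool's existing data without enabling it, although
it works pool-wide rather than per-filesystem (the pool name is a placeholder):

  zdb -S tank   # build and print the DDT histogram that dedup would produce
  zdb -D tank   # for a pool with dedup already on, print the actual DDT statistics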



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] Trouble Moving opensolaris to new HD

2010-07-18 Thread Erik Trimble

On 7/18/2010 3:45 PM, splazo doberman wrote:

I'm running opensolaris 0906 in a triple boot environment with linux and 
windows.

I just slapped a new hard drive into my machine and moved everything over with 
acronis migrate easy.

Unfortunately, this failed to set up grub correctly so I resorted to the brute 
force solution of just reinstalling opensolaris to get things booting again.

However I had underestimated the difficulty of getting my nice lived in install 
of opensolaris off of the old drive and onto the new one. (Probably should have 
tried a little harder to fix the booting issue, but it's too late for that now).

The main issue is that opensolaris doesn't want to let me mount the old root 
file system after I stick it in there as a second hard drive.

I figure that there's probably a fairly simple solution here, but the learning 
curve on zfs is a bit on the steep side for all of its alleged ease of use.

Any suggestions?
   



Most likely, the problem is that both the old and the new disk have a pool named
'rpool', so you can't simply 'zpool import rpool'.

I'm assuming that you can at least see the old disk's pool via a plain
'zpool import', correct? Have you tried importing via the pool's numeric ID
(GUID) rather than via its name? Also, try importing with an alternate root
(the -R option) so its mountpoints don't collide with the running system.

Last resort: boot from the LiveCD, import the old disk's rpool by its numeric
ID, and rename it on import to something else (maybe 'oldrpool').
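
A sketch of that last-resort path (the numeric pool ID is a placeholder; read the
real one from the 'zpool import' listing):

  zpool import -d /dev/dsk                         # lists visible pools with their numeric IDs
  zpool import -f -R /mnt 1234567890123 oldrpool   # import by ID, renaming it on import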



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-18 Thread Garrett D'Amore
On Sun, 2010-07-18 at 16:18 -0700, Richard L. Hamilton wrote:

 
 I would imagine that if it's read-mostly, it's a win, but
 otherwise it costs more than it saves.  Even more conventional
 compression tends to be more resource intensive than decompression...
 
 What I'm wondering is when dedup is a better value than compression.
 Most obviously, when there are a lot of identical blocks across different
 files; but I'm not sure how often that happens, aside from maybe
 blocks of zeros (which may well be sparse anyway).

Shared/identical blocks come into play in several specific scenarios:

1) Multiple VMs, cloud.  If you have multiple guest OSes installed,
they're going to benefit heavily from dedup.  Even Zones can benefit
here.

2) Situations with lots of copies of large amounts of data where only
some of the data is different between each copy.  The classic example is
a Solaris build server, hosting dozens or even hundreds, of copies of
the Solaris tree, each being worked on by different developers.
Typically the developer is working on something less than 1% of the
total source code, so the other 99% can be shared via dedup.

For general purpose usage, e.g. hosting your music or movie collection,
I doubt that dedup offers any real advantage.  If I were talking about
deploying dedup, I'd only use it in situations like the two I mentioned,
and not for just a general purpose storage server.  For general purpose
applications I think compression is better.  (Though I think dedup will
have higher savings -- significantly so -- in the particular situation
where you know you have lots and lots of duplicate/redundant data.)

Note also that dedup actually does some things where your duplicated
data may gain an effective increase in redundancy/security, because it
does make sure that the data that is deduped has higher redundancy than
non-deduped data.  (This sounds counterintuitive, but as long as you
have at least 3 copies of the duplicated data, it's a net win.)

Btw, compression on top of dedup may actually kill your benefit of
dedup.   My hypothesis (unproven, admittedly) is that because many
compression algos actually cause small permutations of data to
significantly change the bit values (even just by changing their offset
in the binary) in the overall compressed object, it can seriously defeat
dedup's efficacy.
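
For experimenting with either combination, both knobs are per-dataset properties,
and the extra-redundancy behaviour mentioned above is governed by a pool property
(pool and dataset names here are placeholders):

  zfs create -o compression=on -o dedup=on tank/vmstore   # both enabled on one dataset
  zfs get compression,dedup,compressratio tank/vmstore    # see what is in effect
  zpool get dedupratio tank                               # pool-wide dedup ratio
  zpool set dedupditto=100 tank   # keep an extra copy of blocks referenced 100+ times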

- Garrett



Re: [zfs-discuss] carrying on

2010-07-18 Thread Miles Nordin
 re == Richard Elling rich...@nexenta.com writes:

re we would very much like to see Oracle continue to produce
re developer distributions which more closely track the source
re changes.

I'd rather someone else than Oracle did it.  Until someone else is
doing the ``building'', whatever that entails all the way from
Mercurial to DVD, we will never know if the source we have is complete
enough to do a fork if we need to.

I realize everyone has in their heads, FORK == BAD.  Yes, forks are
usually bad, but the *ability to make forks* is good, because it
``decouples the investments our businesses make in OpenSolaris/ZFS
from the volatility of Sun and Oracle's business cycle,'' to
paraphrase some blog comment.  

Particularly when you are dealing with datasets so large it might cost
tens of thousands to copy them into another format than ZFS, it's
important to have a 2 year plan for this instead of being subject to
``I am altering the deal.  Pray I don't alter it any further.''
Nexenta being stuck at b134, and secret CVE fixes, does not look good.
Though yeah, it looks better than it would if Nexenta didn't exist.

IMHO it's important we don't get stuck running Nexenta in the same
spot we're now stuck with OpenSolaris: with a bunch of CDDL-protected
source that few people know how to use in practice because the build
procedure is magical and secret.  This is why GPL demands you release
``all build scripts''!

One good way to help make sure you have the ability to make a fork is
to get the source from one organization and the binary distribution
from another.  As long as they're not too collusive, you can relax and
rely on one of them to complain to the other.

Another way is to use a source-based distribution like Gentoo or BSD,
where the distributor includes a deliverable tool that produces
bootable DVD's from the revision control system, and ordinary
contributors can introspect these tools and find any binary blobs that
may exist.

