Re: [zfs-discuss] how to know available disk space
This is one of the greatest annoyances of ZFS. I don't really understand how a zvol's space cannot be accurately enumerated from top to bottom of the tree in 'df' output etc. Why does a zvol divorce the space used from the root of the volume?

Gregg Wonderly

On Feb 6, 2013, at 5:26 PM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) opensolarisisdeadlongliveopensola...@nedharvey.com wrote:

I have a bunch of VMs, and some samba shares, etc., on a pool. I created the VMs using zvols, specifically so they would have an appropriate refreservation and never run out of disk space, even with snapshots. Today, I ran out of disk space, and all the VMs died. So obviously it didn't work.

When I used zpool list after the system crashed, I saw this:

    NAME      SIZE  ALLOC   FREE  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
    storage   928G   568G   360G         -  61%  1.00x  ONLINE  -

I did some cleanup, so I could turn things back on... Freed up about 4G. Now, when I use zpool list I see this:

    NAME      SIZE  ALLOC   FREE  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
    storage   928G   564G   364G         -  60%  1.00x  ONLINE  -

When I use zfs list storage I see this:

    NAME      USED  AVAIL  REFER  MOUNTPOINT
    storage   909G  4.01G  32.5K  /storage

So I guess the lesson is (a) refreservation and zvol alone aren't enough to ensure your VMs will stay up, and (b) if you want to know how much room is *actually* available (as in usable: how much can I write before I run out of space?) you should use zfs list and not zpool list.
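For anyone comparing the two views, the short version is that zpool list reports raw pool capacity (before RAID-Z parity, reservations, and refreservations are charged), while zfs list reports what is actually writable. A minimal check, using the pool name from the post (output omitted; values will differ):

    # zpool list -o name,size,alloc,free storage              # raw capacity; ignores reservations
    # zfs list -o name,used,avail,refer storage               # writable space after reservations
    # zfs get -r refreservation,usedbyrefreservation storage  # see where the space went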
Re: [zfs-discuss] pool metadata has duplicate children
Have you tried importing the pool with that drive completely unplugged? Which HBA are you using? How many of these disks are on the same or separate HBAs?

Gregg Wonderly

On Jan 8, 2013, at 12:05 PM, John Giannandrea j...@meer.net wrote:

I seem to have managed to end up with a pool that is confused about its children disks. The pool is faulted with corrupt metadata:

      pool: d
     state: FAULTED
    status: The pool metadata is corrupted and the pool cannot be opened.
    action: Destroy and re-create the pool from a backup source.
       see: http://illumos.org/msg/ZFS-8000-72
      scan: none requested
    config:

        NAME                     STATE     READ WRITE CKSUM
        d                        FAULTED       0     0     1
          raidz1-0               FAULTED       0     0     6
            da1                  ONLINE        0     0     0
            3419704811362497180  OFFLINE       0     0     0  was /dev/da2
            da3                  ONLINE        0     0     0
            da4                  ONLINE        0     0     0
            da5                  ONLINE        0     0     0

But if I look at the labels on all the online disks I see this:

    # zdb -ul /dev/da1 | egrep '(children|path)'
        children[0]:
                path: '/dev/da1'
        children[1]:
                path: '/dev/da2'
        children[2]:
                path: '/dev/da2'
        children[3]:
                path: '/dev/da3'
        children[4]:
                path: '/dev/da4'
    ...

But the offline disk (da2) shows the older correct label:

        children[0]:
                path: '/dev/da1'
        children[1]:
                path: '/dev/da2'
        children[2]:
                path: '/dev/da3'
        children[3]:
                path: '/dev/da4'
        children[4]:
                path: '/dev/da5'

zpool import -F doesn't help because none of the labels on the unfaulted disks seem to have the right label. And unless I can import the pool I can't replace the bad drive. Also, zpool seems to really not want to import a raidz1 pool with one faulted drive, even though that should be readable. I have read about the undocumented -V option but don't know if that would help.

I got into this state when I noticed the pool was DEGRADED and was trying to replace the bad disk. I am debugging it under FreeBSD 9.1.

Suggestions of things to try welcome. I'm more interested in learning what went wrong than restoring the pool. I don't think I should have been able to go from one offline drive to an unrecoverable pool this easily.

-jg
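A quick way to see how far apart the disks' views of the pool are is to compare the uberblock transaction groups recorded in each label. A sketch using the device names from the post (the egrep pattern is only illustrative):

    # for d in da1 da2 da3 da4 da5; do
    >   echo "== $d =="
    >   zdb -ul /dev/$d | egrep 'txg|timestamp' | sort -u | tail -4
    > done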
Re: [zfs-discuss] suggestions for e-SATA HBA card on x86/x64
I have seen some drives not be recognized on a hot plug, but cfgadm seemed to always fix that. I don't recall a cold boot not recognizing the drives. Does the BIOS boot of the card show all of the drives connected? I did not update the firmware in the cards that I bought.

Gregg

On Nov 19, 2012, at 10:45 AM, Jerry Kemp sun.mail.lis...@oryx.cc wrote:

Hello Gregg,

I acquired one of these Intel RAID Controller Card SATA/SAS PCI-E x8 8-internal-port (SASUC8I) cards from your newegg link below, and then acquired the necessary cables to get everything hooked up. After multiple executions of devfsadm and reconfigure boots, the OS sees one of my 4 drives. The drives are 2 TB Seagate drives.

Did you need to do anything special to get your card to work correctly? Did you need to do a firmware upgrade or anything? I am running an up-to-date version of OpenIndiana b151a7.

Thank you,

Jerry

On 10/26/12 10:02 AM, Gregg Wonderly wrote:

I've been using this card http://www.newegg.com/Product/Product.aspx?Item=N82E16816117157 for my Solaris/OpenIndiana installations because it has 8 ports. One of the issues that this card seems to have is that certain failures can cause secondary problems in other drives on the same SAS connector. I use mirrors for my storage machines with 4 pairs, and just put half of each mirror on one side and the other drive on the other side. This, in general, has solved my problems. When a drive fails, I might see more than one drive not functioning. I can remove a drive (I use hot swap bays such as http://www.newegg.com/Product/Product.aspx?Item=N82E16817994097) and restore the other to the pool to find which of the failed drives is actually the problem. What had happened before was that my case was not moving enough air, and the hot drives had caused odd problems with failure.

For the money, and the experience I have with these controllers, I'd still use them; they are 3Gb/s controllers. If you want 6Gb/s controllers, then some of the other suggestions might be a better choice for you.

Gregg
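For the hot-plug case, the usual recovery sequence on Solaris/OpenIndiana is to re-probe the attachment point and rebuild the /dev links. A sketch (the sata0/3 attachment point is hypothetical; take the real one from the cfgadm listing):

    # cfgadm -al                   # list attachment points and their occupant state
    # cfgadm -c configure sata0/3  # bring an unconfigured disk online
    # devfsadm -Cv                 # rebuild /dev links and prune stale ones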
Re: [zfs-discuss] Forcing ZFS options
Do you move the pools between machines, or just on the same physical machine? Could you just use symlinks from the new root to the old root so that the names work until you can reboot? It might be more practical to always use symlinks if you do a lot of moving things around; then you wouldn't have to figure out how to do the reboot shuffle. Instead, you could just shuffle the symlinks.

Gregg Wonderly

On Nov 9, 2012, at 10:47 AM, Jim Klimov jimkli...@cos.ru wrote:

There are times when ZFS options cannot be applied at the moment, i.e. changing desired mountpoints of active filesystems (or setting a mountpoint over a filesystem location that is currently not empty). Such attempts now bail out with messages like:

    cannot unmount '/var/adm': Device busy
    cannot mount '/export': directory is not empty

and such. Is it possible to force the new values to be saved into ZFS dataset properties, so they do take effect upon next pool import?

I currently work around the harder of such situations with a reboot into a different boot environment, or even into a livecd/failsafe, just so that the needed datasets or paths won't be busy and so I can set, verify and apply these mountpoint values. This is not a convenient way to do things :)

Thanks,
//Jim Klimov
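A minimal sketch of the symlink shuffle being suggested, with hypothetical dataset and path names: the dataset keeps its old (busy) mountpoint until a convenient reboot, while the new name resolves through a symlink in the meantime.

    # zfs get -H -o value mountpoint tank/proj    # currently /export/proj, busy
    # ln -s /export/proj /newhome/proj            # new name points at the old location
    (later, from a reboot or another boot environment)
    # rm /newhome/proj
    # zfs set mountpoint=/newhome/proj tank/proj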
Re: [zfs-discuss] suggestions for e-SATA HBA card on x86/x64
I've been using this card http://www.newegg.com/Product/Product.aspx?Item=N82E16816117157 for my Solaris/OpenIndiana installations because it has 8 ports. One of the issues that this card seems to have is that certain failures can cause secondary problems in other drives on the same SAS connector. I use mirrors for my storage machines with 4 pairs, and just put half of each mirror on one connector and the other drive on the other. This, in general, has solved my problems. When a drive fails, I might see more than one drive not functioning. I can remove a drive (I use hot swap bays such as http://www.newegg.com/Product/Product.aspx?Item=N82E16817994097) and restore the other to the pool to find which of the failed drives is actually the problem. What had happened before was that my case was not moving enough air, and the hot drives had caused odd problems with failure.

For the money, and the experience I have with these controllers, I'd still use them; they are 3Gb/s controllers. If you want 6Gb/s controllers, then some of the other suggestions might be a better choice for you.

Gregg

On Oct 24, 2012, at 10:59 PM, Jerry Kemp sun.mail.lis...@oryx.cc wrote:

I have just acquired a new JBOD box that will be used as a media center/storage for home use only, on my x86/x64 box currently running OpenIndiana b151a7. It's strictly a JBOD, no HW RAID options, with an eSATA port to each drive.

I am looking for suggestions for an HBA card with at least (2), but (4) external eSATA ports would be nice. I know enough to stay away from the port expander things. I do not need the HBA to support any internal drives.

In reviewing the archives/past posts, it seems that LSI is the way to go. I would like to spend USD $200-$300, but would spend more if necessary for a good, trouble-free HBA. I made this comment as I went to look at some of the LSI cards previously mentioned, and found they were priced $500-$600 and up.

TIA for any pointers,

Jerry
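A sketch of the layout being described, with hypothetical device names: each mirror vdev pairs one disk from each SAS connector, so a cascading failure on one connector degrades every mirror rather than breaking any of them.

    # zpool create tank \
        mirror c5t0d0 c6t0d0 \
        mirror c5t1d0 c6t1d0 \
        mirror c5t2d0 c6t2d0 \
        mirror c5t3d0 c6t3d0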
Re: [zfs-discuss] cannot replace X with Y: devices have different sector alignment
What is the error message you are seeing on the replace? This sounds like a slice size/placement problem, but clearly prtvtoc seems to think that everything is the same. Are you certain that you ran prtvtoc on the correct drive, and not one of the active disks by mistake?

Gregg Wonderly

As does fdisk -G:

    root@nas:~# fdisk -G /dev/rdsk/c16t5000C5002AA08E4Dd0
    * Physical geometry for device /dev/rdsk/c16t5000C5002AA08E4Dd0
    * PCYL   NCYL   ACYL  BCYL  NHEAD  NSECT  SECSIZ
      60800  60800  0     0     255    252    512
    root@nas:~# fdisk -G /dev/rdsk/c16t5000C5005295F727d0
    * Physical geometry for device /dev/rdsk/c16t5000C5005295F727d0
    * PCYL   NCYL   ACYL  BCYL  NHEAD  NSECT  SECSIZ
      60800  60800  0     0     255    252    512

On Mon, Sep 24, 2012 at 9:01 AM, LIC mesh licm...@gmail.com wrote:

Yet another weird thing: prtvtoc shows both drives as having the same sector size, etc.:

    root@nas:~# prtvtoc /dev/rdsk/c16t5000C5002AA08E4Dd0
    * /dev/rdsk/c16t5000C5002AA08E4Dd0 partition map
    *
    * Dimensions:
    *     512 bytes/sector
    *     3907029168 sectors
    *     3907029101 accessible sectors
    *
    * Flags:
    *   1: unmountable
    *  10: read-only
    *
    * Unallocated space:
    *       First     Sector    Last
    *       Sector    Count     Sector
    *           34       222       255
    *
    *                          First       Sector      Last
    * Partition  Tag  Flags    Sector      Count       Sector      Mount Directory
           0      4    00           256  3907012495  3907012750
           8     11    00    3907012751       16384  3907029134

    root@nas:~# prtvtoc /dev/rdsk/c16t5000C5005295F727d0
    * /dev/rdsk/c16t5000C5005295F727d0 partition map
    *
    * Dimensions:
    *     512 bytes/sector
    *     3907029168 sectors
    *     3907029101 accessible sectors
    *
    * Flags:
    *   1: unmountable
    *  10: read-only
    *
    * Unallocated space:
    *       First     Sector    Last
    *       Sector    Count     Sector
    *           34       222       255
    *
    *                          First       Sector      Last
    * Partition  Tag  Flags    Sector      Count       Sector      Mount Directory
           0      4    00           256  3907012495  3907012750
           8     11    00    3907012751       16384  3907029134

On Mon, Sep 24, 2012 at 12:20 AM, Timothy Coalson tsc...@mst.edu wrote:

I think you can fool a recent Illumos kernel into thinking a 4k disk is 512 (incurring a performance hit for that disk, and therefore the vdev and pool, but to save a raidz1 it might be worth it): http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks , see "Overriding the Physical Sector Size". I don't know what you might have to do to coax it to do the replace with a hot spare (zpool replace? export/import?).

Perhaps there should be a feature in ZFS that notifies you when a pool is created or imported with a hot spare that can't be automatically used in one or more vdevs? The whole point of hot spares is to have them automatically swap in when you aren't there to fiddle with things, which is a bad time to find out it won't work.

Tim

On Sun, Sep 23, 2012 at 10:52 PM, LIC mesh licm...@gmail.com wrote:

Well, this is a new one. Illumos/OpenIndiana let me add a device as a hot spare that evidently has a different sector alignment than all of the other drives in the array. So now I'm at the point that I *need* a hot spare, and it doesn't look like I have it. And, worse, the other spares I have are all the same model as said hot spare.

Is there anything I can do with this, or am I just going to be up the creek when any one of the other drives in the raidz1 fails?
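For reference, the override Timothy mentions is configured through sd-config-list in /kernel/drv/sd.conf, per the linked illumos wiki page. A sketch; the vendor/model string here is hypothetical and must match the drive's inquiry data exactly (vendor padded to 8 characters):

    # Fragment of /kernel/drv/sd.conf: make sd report 512-byte physical sectors
    sd-config-list = "ATA     ST2000DM001", "physical-block-size:512";

    # then reload the sd driver configuration:
    # update_drv -vf sd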
Re: [zfs-discuss] deleting a link in ZFS
On Aug 28, 2012, at 6:01 AM, Murray Cullen themurma...@gmail.com wrote:

I've copied an old home directory from an install of OS 134 to the data pool on my OI install. OpenSolaris apparently had wine installed, as I now have a link to / in my data pool. I've tried everything I can think of to remove this link, with one exception: I have not tried mounting the pool on a different OS yet; I'm trying to avoid that. Does anyone have any advice or suggestions? Unlink and rm error out as root.

What is the error? Is it "permission denied", "I/O error", or what?

Gregg
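When rm fails for no obvious reason, tracing the failing system call usually names the errno directly. A sketch with a hypothetical path (wine normally keeps a 'z:' symlink pointing at / under .wine/dosdevices, which may be the link in question):

    # ls -li /tank/oldhome/.wine/dosdevices               # inspect the suspect entries
    # truss rm /tank/oldhome/.wine/dosdevices/z: 2>&1 | grep -i unlink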
Re: [zfs-discuss] unable to import the zpool
My experience has always been that ZFS tries hard to keep you from doing something wrong when devices are failing or otherwise unavailable. With mirrors, it will import with a device missing from a mirror vdev. I don't use cache or log devices in my mainly-storage pools, so I've not seen a failure with a required device like that missing. But I've seen problems with a raidz device missing and the pool not coming online. As Richard says, it would seem there is a cache or log vdev missing, since it is showing 1 of 2 mirrored devices in that vdev missing but still complaining about a missing device. The older OS and ZFS version may in fact have a misbehavior due to some error condition not being correctly managed.

Gregg Wonderly

On Aug 2, 2012, at 4:49 PM, Richard Elling richard.ell...@gmail.com wrote:

On Aug 1, 2012, at 12:21 AM, Suresh Kumar wrote:

Dear ZFS users,

I am using Solaris x86 10u10. All the devices which belong to my zpool are in an available state, but I am unable to import the zpool.

    #zpool import tXstpool
    cannot import 'tXstpool': one or more devices is currently unavailable

    bash-3.2# zpool import
      pool: tXstpool
        id: 13623426894836622462
     state: UNAVAIL
    status: One or more devices are missing from the system.
    action: The pool cannot be imported. Attach the missing devices and try again.
       see: http://www.sun.com/msg/ZFS-8000-6X
    config:

        tXstpool                     UNAVAIL   missing device
          mirror-0                   DEGRADED
            c2t210100E08BB2FC85d0s0  FAULTED   corrupted data
            c2t21E08B92FC85d2        ONLINE

    Additional devices are known to be part of this pool, though their exact configuration cannot be determined.

This message is your clue. The pool is missing a device. In most of the cases where I've seen this, it occurs on older ZFS implementations and the missing device is an auxiliary device: cache or spare.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422
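One way to find out what the missing device actually was is to dump a surviving member's ZFS label and look at the vdev tree recorded there, which lists every child (including spares and cache devices) by path and GUID. A sketch using the device name from the post:

    # zdb -l /dev/rdsk/c2t210100E08BB2FC85d0s0 | egrep 'type|path|guid'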
Re: [zfs-discuss] Can the ZFS copies attribute substitute HW disk redundancy?
On Jul 29, 2012, at 3:12 PM, opensolarisisdeadlongliveopensolaris opensolarisisdeadlongliveopensola...@nedharvey.com wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Jim Klimov

I wondered if the copies attribute can be considered sort of equivalent to the number of physical disks - limited to seek times though. Namely, for the same amount of storage on a 4-HDD box I could use raidz1 and 4*1tb@copies=1 or 4*2tb@copies=2 or even 4*3tb@copies=3, for example.

The first question - reliability... copies might be on the same disk. So it's not guaranteed to help if you have a disk failure.

I thought I understood that copies would not be on the same disk. I guess I need to go read up on this again.

Gregg Wonderly
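For what it's worth, as I understand the ditto-block placement policy, ZFS puts the extra copies on different vdevs when the pool has several, but on a single-vdev pool they land on the same disk (merely spread apart on the platter). So copies=N mitigates localized corruption well but is not a guaranteed substitute for disk redundancy. Setting and checking it (dataset name hypothetical):

    # zfs create -o copies=2 tank/important
    # zfs get copies tank/important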
Re: [zfs-discuss] stop sparing process
Well, I hate to do it, but sometimes I've just unplugged the power on my SATA drives, or ejected them if hot-plug, to stop nonsense that I could not stop. As a matter of fact, I recently received a replacement drive for one I RMA'd. I attached it to the mate drive, and it floundered for more than 5 minutes, pretty much disabling 'format' and 'zpool status'.

If there was anything I'd change about zpool activities, it's that I'd change user ioctl operations into async activities on kernel threads, and have a complete view of the pools stored in RAM that could be read (as updates occurred) via a unix domain socket by a status-reporting tool. That tool, in a GUI desktop environment, would post a read and, when satisfied, report the details of the error, transition, etc. as a popup on the desktop. Whether a GUI was present or not, it should log the data to syslog. That would make ZFS much nicer to use, because admins could always take action on multiple pools and devices without being burdened by the constant problem of failing devices locking you out of system administration activities.

Gregg Wonderly

On Jul 28, 2012, at 6:45 AM, Antonio S. Cofiño antonio.cof...@unican.es wrote:

Hello everyone,

Does anybody know how to stop a sparing process? I have tried:

    ad...@seal.macc.unican.es:~$ pfexec zpool detach oceano c8t24d0
    cannot detach c8t24d0: no valid replicas

without success. I know that the drive being spared is OK and I want to stop the process. Here you can see my zpool status (the failing disk which originated a general failure is being replaced):

    admin@seal:~$ zpool status
      pool: oceano
     state: DEGRADED
    status: One or more devices is currently being resilvered. The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
     scrub: resilver in progress for 0h3m, 0.01% done, 883h56m to go
    config:

        NAME                          STATE     READ WRITE CKSUM
        oceano                        DEGRADED     0     0     0
          raidz2-0                    ONLINE       0     0     0
            c5t5000CCA369C5A416d0     ONLINE       0     0     0
            c5t5000CCA369C5A420d0     ONLINE       0     0     0
            c5t5000CCA369C5A432d0     ONLINE       0     0     0
            c10t5000CCA369C505D5d0    ONLINE       0     0     0
            spare-4                   ONLINE       0     0     0
              c10t5000CCA369C506AFd0  ONLINE       0     0     0
              c8t24d0                 ONLINE       0     0     0  131M resilvered
            c10t5000CCA369C506BBd0    ONLINE       0     0     0
            c5t5000CCA369C5C19Ad0     ONLINE       0     0     0
            c10t5000CCA369C508C9d0    ONLINE       0     0     0
            c5t5000CCA369C52E05d0     ONLINE       0     0     0
            c10t5000CCA369C508E0d0    ONLINE       0     0     0
            c10t5000CCA369C50609d0    ONLINE       0     0     0
          raidz2-1                    ONLINE       0     0     0
            c4t5d0                    ONLINE       0     0     0
            c4t6d0                    ONLINE       0     0     0
            c4t7d0                    ONLINE       0     0     0
            c8t10d0                   ONLINE       0     0     0
            c8t11d0                   ONLINE       0     0     0
            c8t12d0                   ONLINE       0     0     0
            c8t13d0                   ONLINE       0     0     0
            c8t14d0                   ONLINE       0     0     0
            c8t15d0                   ONLINE       0     0     0
            c8t16d0                   ONLINE       0     0     0
            c8t17d0                   ONLINE       0     0     0
          raidz2-2                    ONLINE       0     0     0
            c4t8d0                    ONLINE       0     0     0
            c4t9d0                    ONLINE       0     0     0
            c4t10d0                   ONLINE       0     0     0
            c4t11d0                   ONLINE       0     0     0
            c8t6d0                    ONLINE       0     0     0
            c8t18d0                   ONLINE       0     0     0
            c8t19d0                   ONLINE       0     0     0
            c8t20d0                   ONLINE       0     0     0
            c8t21d0                   ONLINE       0     0     0
            c8t22d0                   ONLINE       0     0     0
            c8t23d0                   ONLINE       0     0     0
          raidz2-3                    ONLINE       0     0     0
            c5t5000CCA369C5A41Dd0     ONLINE       0     0     0
            c10t5000CCA369C4E90Bd0    ONLINE       0     0     0
            c5t5000CCA369C5A42Dd0     ONLINE       0     0     0
            c10t5000CCA369C4F888d0    ONLINE       0     0     0
            c5t5000CCA369C5A374d0     ONLINE       0     0     0
            c10t5000CCA369C50F1Fd0    ONLINE       0     0     0
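For the question itself: detaching the spare is the supported way to cancel a sparing operation, so the "no valid replicas" error above looks like state confusion rather than a real constraint. One commonly suggested (but by no means guaranteed) sequence is to retry the detach after an export/import cycle:

    # zpool detach oceano c8t24d0                  # the normal way to cancel a spare-in
    # zpool export oceano && zpool import oceano   # sometimes clears stuck spare state
    # zpool detach oceano c8t24d0                  # retry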
Re: [zfs-discuss] New fast hash algorithm - is it needed?
Since there is a finite number of bit patterns per block, have you tried to just calculate the SHA-256 or SHA-512 for every possible bit pattern to see if there is ever a collision? If you found an algorithm that produced no collisions for any possible block bit pattern, wouldn't that be the win?

Gregg Wonderly

On Jul 11, 2012, at 5:56 AM, Sašo Kiselkov wrote:

On 07/11/2012 12:24 PM, Justin Stringfellow wrote:

Suppose you find a weakness in a specific hash algorithm; you use this to create hash collisions, and now imagine you store the hash collisions in a zfs dataset with dedup enabled using the same hash algorithm.

Sorry, but isn't this what dedup=verify solves? I don't see the problem here. Maybe all that's needed is a comment in the manpage saying hash algorithms aren't perfect.

It does solve it, but at a cost to normal operation. Every write gets turned into a read. Assuming a big enough and reasonably busy dataset, this leads to tremendous write amplification.

Cheers,
--
Saso
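To put numbers on why this enumeration is out of reach even for small blocks, compare the count of distinct 4K inputs against the SHA-256 output space:

\[
\underbrace{2^{8 \times 4096}}_{\text{distinct 4K blocks}} = 2^{32768} \approx 10^{9864}
\qquad \text{vs.} \qquad
2^{256} \approx 1.2 \times 10^{77}\ \text{digests.}
\]

By pigeonhole, collisions certainly exist; the rest of the thread is an argument about how likely anyone is to ever hit one.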
Re: [zfs-discuss] New fast hash algorithm - is it needed?
But this is precisely the kind of observation that some people seem to miss the importance of. As Tomas suggested in his post, if this were true, then we could have a huge compression ratio as well. And even if 10% of the bit patterns created non-unique hashes, you could use the fact that a block hashed to a known bit pattern that didn't have collisions to compress the other 90% of your data.

I'm serious about this from a number of perspectives. We worry about the time it would take to reverse SHA or RSA hashes to passwords, not even thinking: what if someone has been quietly computing all possible hashes for the past 10-20 years into a database somewhere, with every 5-16 character password, and now has an instantly searchable hash-to-password database? Sometimes we ignore the scale of time, thinking that only the immediately visible details are what we have to work with.

If no one has computed the hashes for every single 4K and 8K block, then fine. But if that was done, and we had that data, we'd know for sure which algorithm was going to work the best for the number of bits we are considering. Speculating based on the theory of the algorithms for a random number of bits is just silly. Where's the real data that tells us what we need to know?

Gregg Wonderly

On Jul 11, 2012, at 9:02 AM, Sašo Kiselkov wrote:

On 07/11/2012 03:57 PM, Gregg Wonderly wrote:

Since there is a finite number of bit patterns per block, have you tried to just calculate the SHA-256 or SHA-512 for every possible bit pattern to see if there is ever a collision? If you found an algorithm that produced no collisions for any possible block bit pattern, wouldn't that be the win?

Don't you think that, if you can think of this procedure, the crypto security guys at universities have thought about it as well? Of course they have. No, simply generating a sequence of random patterns and hoping to hit a match won't do the trick.

P.S. I really don't mean to sound smug or anything, but I know one thing for sure: the crypto researchers who propose these algorithms are some of the brightest minds on this topic on the planet, so I would hardly think they didn't consider trivial problems.

Cheers,
--
Saso
Re: [zfs-discuss] New fast hash algorithm - is it needed?
Unfortunately, the government imagines that people are using their home computers to compute hashes and try to decrypt stuff. Look at what is happening with GPUs these days. People are hooking up 4 GPUs in their computers and getting huge performance gains. A 5-6 character password space is covered in a few days; 12 or so characters would take one machine a couple of years, if I recall. So if we had 20 people with that class of machine, we'd be down to a few months. I'm just suggesting that while the compute space is still huge, it's not actually undoable; it just requires some thought into how to approach the problem, and then some time to do the computations. Huge space, but still finite…

Gregg Wonderly

On Jul 11, 2012, at 9:13 AM, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Gregg Wonderly

Since there is a finite number of bit patterns per block, have you tried to just calculate the SHA-256 or SHA-512 for every possible bit pattern to see if there is ever a collision? If you found an algorithm that produced no collisions for any possible block bit pattern, wouldn't that be the win?

Maybe I misunderstand what you're saying, but if I got it right, what you're saying is physically impossible to do in the time of the universe... and guaranteed to fail even if you had all the computational power of God.

I think you're saying: in a block of 128k, sequentially step through all the possible values (starting with 0, 1, 2, ..., 2^128k), compute the hash of each value, and see if you ever find a hash collision.

If this is indeed what you're saying, recall that the above operation will require on the order of 2^128k operations to complete. But present national security standards accept 2^256 operations as satisfactory to protect data from brute force attacks over the next 30 years. Furthermore, in a 128k block there exist 2^128k possible values, while in a 512-bit hash there exist only 2^512 possible values (which is still a really huge number). This means there will exist at least 2^127.5k collisions. However, these numbers are so astronomically, universally, magnanimously huge that it will still take more than a lifetime to find any one of those collisions. So it's impossible to perform such a computation, and if you could, you would be guaranteed to find a LOT of collisions.
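Making the pigeonhole step explicit (a 128 KiB block is \(2^{20}\) bits, which tightens the loose "2^128k" shorthand above), the average number of distinct blocks mapping to each 512-bit digest is

\[
\frac{2^{2^{20}}}{2^{512}} = 2^{1048576-512} = 2^{1048064},
\]

so collisions exist in absurd abundance, yet the chance that two *particular* random blocks share a digest is still only \(2^{-512}\): the abundance of collisions and the odds of ever encountering one are very different things.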
Re: [zfs-discuss] New fast hash algorithm - is it needed?
This is exactly the issue for me. It's vital to always have verify on. If you don't have the data to prove that every possible block combination hashes uniquely for the small bit space we are talking about, then how in the world can you say that verify is not necessary? That just seems ridiculous to propose.

Gregg Wonderly

On Jul 11, 2012, at 9:22 AM, Bob Friesenhahn wrote:

On Wed, 11 Jul 2012, Sašo Kiselkov wrote:

the hash isn't used for security purposes. We only need something that's fast and has a good pseudo-random output distribution. That's why I looked toward Edon-R. Even though it might have security problems in itself, it's by far the fastest algorithm in the entire competition.

If an algorithm is not 'secure' and zfs is not set to verify, doesn't that mean that a knowledgeable user will be able to cause intentional data corruption if deduplication is enabled? A user with very little privilege might be able to cause intentional harm by writing the magic data block before some other known block (which produces the same hash) is written. This allows one block to substitute for another. It does seem that security is important because, with a human element, data is not necessarily random.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
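The setting being argued about is per-dataset. With verify enabled, a hash match is only a hint: ZFS still does a byte-for-byte comparison before sharing blocks, so even a colliding (or attacker-constructed) hash cannot silently merge different data. A sketch with a hypothetical dataset name:

    # zfs set dedup=sha256,verify tank/vmimages   # dedup only after byte comparison passes
    # zfs get dedup tank/vmimages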
Re: [zfs-discuss] New fast hash algorithm - is it needed?
Yes, but from the other angle: the number of unique 128K blocks that you can actually store on your ZFS pool is finitely small compared to the total space of patterns. So the patterns you need to actually consider are not more than the physical limits of the universe.

Gregg Wonderly

On Jul 11, 2012, at 9:39 AM, Sašo Kiselkov wrote:

On 07/11/2012 04:27 PM, Gregg Wonderly wrote:

Unfortunately, the government imagines that people are using their home computers to compute hashes and try to decrypt stuff. [...] Huge space, but still finite…

There are certain physical limits which one cannot exceed. For instance, you cannot store 2^256 units of 32-byte quantities on Earth. Even if you used proton spin (or some other quantum property) to store a bit, there simply aren't enough protons in the entire visible universe to do it. You will never ever be able to search a 256-bit memory space using a simple exhaustive search. The reason why our security hashes are so long (256 bits, 512 bits, more...) is because attackers *don't* do an exhaustive search.
--
Saso
Re: [zfs-discuss] New fast hash algorithm - is it needed?
So, if I had a block collision on my ZFS pool that used dedup, and it had my bank balance of $3,212.20 on it, and you tried to write your bank balance of $3,292,218.84 and got the same hash, no verify, and thus you got my block/balance, and now your bank balance was reduced by three orders of magnitude, would you be okay with that? What assurances would you be content with using my ZFS pool?

Gregg Wonderly

On Jul 11, 2012, at 9:43 AM, Sašo Kiselkov wrote:

On 07/11/2012 04:30 PM, Gregg Wonderly wrote:

This is exactly the issue for me. It's vital to always have verify on. If you don't have the data to prove that every possible block combination hashes uniquely for the small bit space we are talking about, then how in the world can you say that verify is not necessary? That just seems ridiculous to propose.

Do you need assurances that in the next 5 seconds a meteorite won't fall to Earth and crush you? No. And yet the Earth puts on thousands of tons of weight each year from meteoric bombardment, and people have been hit and killed by them (not to speak of mass extinction events). Nobody has ever demonstrated being able to produce a hash collision in any suitably long hash (128 bits plus) using a random search. All hash collisions have been found by attacking weaknesses in the mathematical definition of these functions (i.e. some part of the input didn't get obfuscated well in the hash function machinery and spilled over into the result, resulting in a slight, but usable, non-randomness).

Cheers,
--
Saso
Re: [zfs-discuss] New fast hash algorithm - is it needed?
I'm just suggesting that the time frame for 256-bit or 512-bit hashes becoming less safe is closing faster than one might actually think, because social elements of the internet allow a lot more effort to be focused on a single problem than one might consider.

Gregg Wonderly

On Jul 11, 2012, at 9:50 AM, Edward Ned Harvey wrote:

From: Gregg Wonderly [mailto:gr...@wonderly.org]
Sent: Wednesday, July 11, 2012 10:28 AM

Unfortunately, the government imagines that people are using their home computers to compute hashes and try to decrypt stuff. Look at what is happening with GPUs these days.

heheheh. I guess the NSA didn't think of that. ;-)

(That's sarcasm, in case anyone didn't get it.)
Re: [zfs-discuss] New fast hash algorithm - is it needed?
You're entirely sure that there could never be two different blocks that hash to the same value? Wow, can you just send me the cash now and we'll call it even?

Gregg

On Jul 11, 2012, at 9:59 AM, Sašo Kiselkov wrote:

On 07/11/2012 04:56 PM, Gregg Wonderly wrote:

So, if I had a block collision on my ZFS pool that used dedup, and it had my bank balance of $3,212.20 on it, and you tried to write your bank balance of $3,292,218.84 and got the same hash, no verify, and thus you got my block/balance, and now your bank balance was reduced by three orders of magnitude, would you be okay with that? What assurances would you be content with using my ZFS pool?

I'd feel entirely safe. There, I said it.
--
Saso
Re: [zfs-discuss] New fast hash algorithm - is it needed?
What I'm saying is that I am getting conflicting information from your rebuttals here.

I (and others) say there will be collisions that will cause data loss if verify is off. You say it would be so rare as to be impossible, from your perspective. Tomas says: well, then let's just use the hash value for a 4096x compression. You fluff around his argument, calling him names. I say: well, then compute all the possible hashes for all possible bit patterns and demonstrate no dupes. You say it's not possible to do that. I illustrate a way that loss of data could cost you money. You say it's impossible for there to be a chance of me constructing a block that has the same hash but different content. Several people have illustrated that 128K to 32 bytes is a huge and lossy ratio of compression, yet you still say it's viable to leave verify off. I say, in fact, that the total number of unique patterns that can exist on any pool is small compared to the total, illustrating that I understand how the key space for the algorithm is small when looking at a ZFS pool, and thus could have a non-collision opportunity.

So I can see what perspective you are drawing your confidence from, but I, and others, are not confident that the risk has zero probability. I'm pushing you to find a way to demonstrate that there is zero risk, because if you do that, then you've in fact created the ultimate compression factor to date for random bit patterns (but enlarged the keys that could collide, because the pool is now virtually larger), and you've also demonstrated that the particular algorithm is very good for dedup. That would indicate to me that you could then take that algorithm and run it inside of ZFS dedup to automatically manage when verify is necessary, by detecting when a collision occurs.

I appreciate the push back. I'm trying to drive thinking about this in the direction of what is known and finite, away from what is infinitely complex and thus impossible to explore. Maybe all the work has already been done…

Gregg

On Jul 11, 2012, at 11:02 AM, Sašo Kiselkov wrote:

On 07/11/2012 05:58 PM, Gregg Wonderly wrote:

You're entirely sure that there could never be two different blocks that hash to the same value? Wow, can you just send me the cash now and we'll call it even?

You're the one making the positive claim and I'm calling bullshit. So the onus is on you to demonstrate the collision (and that you arrived at it via your brute force method as described). Until then, my money stays safely in my bank account. Put up or shut up, as the old saying goes.
--
Saso
Re: [zfs-discuss] New fast hash algorithm - is it needed?
On Jul 11, 2012, at 12:06 PM, Sašo Kiselkov wrote:

I say, in fact, that the total number of unique patterns that can exist on any pool is small compared to the total, illustrating that I understand how the key space for the algorithm is small when looking at a ZFS pool, and thus could have a non-collision opportunity.

This is so profoundly wrong that it leads me to suspect you never took courses on cryptography and/or information theory. The size of your storage pool DOESN'T MATTER ONE BIT to the size of the key space. Even if your pool were the size of a single block, we're talking here about the *mathematical* possibility of hitting on a random block that hashes to the same value. Given a stream of random data blocks (thus simulating an exhaustive brute-force search) and a secure pseudo-random hash function (which has a roughly equal chance of producing any output value for a given input block), you've got only a 10^-77 chance of getting a hash collision. If you don't understand how this works, read a book on digital coding theory.

The size of the pool does absolutely matter, because it represents the total number of possible bit patterns you can involve in the mapping (through the math). If the size of the ZFS pool is limited, the total number of unique blocks is in fact limited by the size of the pool. This affects how many collisions are possible, and thus how effective dedup can be. Over time, if the bit patterns can change on each block, at some point you can arrive at one of the collisions. Yes, it's rare; I'm not disputing that. I am disputing that the risk is discardable in computer applications where data integrity matters: for example, losing money, as in the example I used.

I'm pushing you to find a way to demonstrate that there is zero risk, because if you do that, then you've in fact created the ultimate compression factor to date for random bit patterns (but enlarged the keys that could collide, because the pool is now virtually larger), and you've also demonstrated that the particular algorithm is very good for dedup. That would indicate to me that you could then take that algorithm and run it inside of ZFS dedup to automatically manage when verify is necessary, by detecting when a collision occurs.

Do you know what a dictionary is in compression algorithms?

Yes, I am familiar with this kind of compression.

Do you even know how things like Huffman coding or LZW work, at least in principle?

Yes.

If not, then I can see why you didn't understand my earlier explanations of why hashes aren't usable for compression.

With zero collisions in a well-defined key space, they would work perfectly for compression. To wit, you are saying that you are comfortable enough using them for dedup, which is exactly a form of compression. I'm agreeing that the keyspace is huge, but the collision possibilities mean I'm not comfortable with verify=no. If there wasn't a sufficiently small keyspace in a ZFS pool, then dedup would never succeed. There are some block contents that are recurring: usually blocks filled with 00, FF, or some pattern from a power-up memory state, etc. So those few common patterns are easily dedup'd out.

I appreciate the push back. I'm trying to drive thinking about this in the direction of what is known and finite, away from what is infinitely complex and thus impossible to explore.

If you don't understand the mathematics behind my arguments, just say so.

I understand the math.
I'm not convinced it's nothing to worry about, because my data is valuable enough to me that I am using ZFS. If I were using dedup, I'd for sure turn verify on…

Gregg
Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)
Use 'dd' to replicate as much of lofi/2 as you can onto another device, and then cable that into place? It looks like you just need to put a functioning, working, but not correct device in that slot so that it will import, and then you can 'zpool replace' the new disk into the pool, perhaps?

Gregg Wonderly

On 6/16/2012 2:02 AM, Scott Aitken wrote:

On Sat, Jun 16, 2012 at 08:54:05AM +0200, Stefan Ring wrote:

when you say remove the device, I assume you mean simply make it unavailable for import (I can't remove it from the vdev).

Yes, that's what I meant.

    root@openindiana-01:/mnt# zpool import -d /dev/lofi
      pool: ZP-8T-RZ1-01
        id: 9952605666247778346
     state: FAULTED
    status: One or more devices are missing from the system.
    action: The pool cannot be imported. Attach the missing devices and try again.
       see: http://www.sun.com/msg/ZFS-8000-3C
    config:

        ZP-8T-RZ1-01              FAULTED  corrupted data
          raidz1-0                DEGRADED
            12339070507640025002  UNAVAIL  cannot open
            /dev/lofi/5           ONLINE
            /dev/lofi/4           ONLINE
            /dev/lofi/3           ONLINE
            /dev/lofi/1           ONLINE

It's interesting that even though 4 of the 5 disks are available, it still can't import it as DEGRADED.

I agree that it's interesting. Now someone really knowledgeable will need to have a look at this. I can only imagine that somehow the devices contain data from different points in time, and that it's too far apart for the aggressive txg rollback that was added in PSARC 2009/479. Btw, did you try that? Try: zpool import -d /dev/lofi -FVX ZP-8T-RZ1-01.

Hi again,

that got slightly further, but still no dice:

    root@openindiana-01:/mnt# zpool import -d /dev/lofi -FVX ZP-8T-RZ1-01
    root@openindiana-01:/mnt# zpool list
    NAME           SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH   ALTROOT
    ZP-8T-RZ1-01     -      -      -    -      -   FAULTED  -
    rpool         15.9G  2.17G  13.7G  13%  1.00x  ONLINE   -
    root@openindiana-01:/mnt# zpool status
      pool: ZP-8T-RZ1-01
     state: FAULTED
    status: One or more devices could not be used because the label is missing or
            invalid. There are insufficient replicas for the pool to continue
            functioning.
    action: Destroy and re-create the pool from a backup source.
       see: http://www.sun.com/msg/ZFS-8000-5E
      scan: none requested
    config:

        NAME                      STATE     READ WRITE CKSUM
        ZP-8T-RZ1-01              FAULTED      0     0     1  corrupted data
          raidz1-0                ONLINE       0     0     6
            12339070507640025002  UNAVAIL      0     0     0  was /dev/lofi/2
            /dev/lofi/5           ONLINE       0     0     0
            /dev/lofi/4           ONLINE       0     0     0
            /dev/lofi/3           ONLINE       0     0     0
            /dev/lofi/1           ONLINE       0     0     0

    root@openindiana-01:/mnt# zpool scrub ZP-8T-RZ1-01
    cannot scrub 'ZP-8T-RZ1-01': pool is currently unavailable

Thanks for your tenacity Stefan.

Scott
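A sketch of the dd salvage being suggested: copy everything readable from the failing image onto a fresh file, padding unreadable sectors with zeros so offsets stay aligned, then attach the copy in place of lofi/2 (paths and block size are illustrative):

    # dd if=/dev/lofi/2 of=/spare/disk2.img bs=512 conv=noerror,sync
    # lofiadm -a /spare/disk2.img            # attach the copy as a new lofi device
    # zpool import -d /dev/lofi ZP-8T-RZ1-01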
Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)
On Jun 16, 2012, at 9:49 AM, Scott Aitken wrote:

On Sat, Jun 16, 2012 at 09:09:53AM -0500, Gregg Wonderly wrote:

[...]

Hi Greg,

lofi/2 is a dd of a real disk. I am using disk images because I can roll back, clone, etc. without using the original drives (which are long gone anyway). I have tried making /2 unavailable for import, and zfs just moans that it can't be opened. It fails to import even though I have only one disk missing of a RAIDZ array.

My experience is that ZFS will not import a pool with a missing disk. There has to be something in that slot before the import will occur. Even if the disk is corrupt, it needs to be there. I think this is a failsafe mechanism that tries to keep a pool from going live when you have mistakenly not connected all the drives. That keeps the disks from becoming chronologically/txg-misaligned, which can result in data loss in the right combinations, I believe.

Gregg Wonderly
Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)
On Jun 16, 2012, at 10:13 AM, Scott Aitken wrote:

On Sat, Jun 16, 2012 at 09:58:40AM -0500, Gregg Wonderly wrote:

[...]

My experience is that ZFS will not import a pool with a missing disk. There has to be something in that slot before the import will occur. Even if the disk is corrupt, it needs to be there. I think this is a failsafe mechanism that tries to keep a pool from going live when you have mistakenly not connected all the drives. That keeps the disks from becoming chronologically/txg-misaligned, which can result in data loss in the right combinations, I believe.

Gregg Wonderly

Hi again Gregg,

Not sure if I should be top posting this... Given I am working with images, it's hard to put just anything in place of lofi/2. ZFS scans all of the files in the directory for ZFS labels, so just replacing lofi/2 with an empty file (for example) just means ZFS skips it, which is the same result as deleting lofi/2 altogether. I did this, but to no avail; ZFS complains about having insufficient replicas.

I don't really know much about the total space layout of a ZFS disk surface, because I
Re: [zfs-discuss] What is your data error rate?
What I've noticed is that when I have my drives in a situation of small airflow, and hence hotter operating temperatures, my disks will drop quite quickly. I've now moved my systems into large cases with large amounts of airflow, using the Icy Dock brand of removable drive enclosures:

    http://www.newegg.com/Product/Product.aspx?Item=N82E16817994097
    http://www.newegg.com/Product/Product.aspx?Item=N82E16817994113

I use the SASUC8I SATA/SAS controller to access 8 drives:

    http://www.newegg.com/Product/Product.aspx?Item=N82E16816117157

I put it in PCI-e x16 slots on graphics-heavy motherboards, which might have as many as 4x PCI-e x16 slots. I am replacing an old motherboard with this one:

    http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=1124780

The case that I found to be a good match for my needs is the Raven:

    http://www.newegg.com/Product/Product.aspx?Item=N82E16811163180

It has enough slots (7) to put 2x 3-in-2 and 1x 4-in-3 Icy Dock bays in, to provide 10 drives in hot swap bays. I really think that the big issue is that you must move the air. The drives really need to stay cool, or else you will see degraded performance and/or data loss much more often.

Gregg Wonderly

On 1/24/2012 9:50 AM, Stefan Ring wrote:

After having read this mailing list for a little while, I get the impression that there are at least some people who regularly experience on-disk corruption that ZFS should be able to report and handle. I've been running a raidz1 on three 1TB consumer disks for approx. 2 years now (about 90% full), and I scrub the pool every 3-4 weeks and have never had a single error. From the oft-quoted 10^14 error rate that consumer disks are rated at, I should have seen an error by now; the scrubbing process is not the only activity on the disks, after all, and the data transfer volume from that alone clocks in at almost exactly 10^14 by now. Not that I'm worried, of course, but it comes as a slight surprise to me. Or does the 10^14 rating just reflect the strength of the on-disk ECC algorithm?
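On the monitoring side, Solaris keeps per-device soft/hard/transport error counters that tend to climb as an overheating drive degrades, and they are cheap to check:

    # iostat -En | egrep 'Errors|Vendor'     # cumulative error counters per device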
Re: [zfs-discuss] Can I create a mirror for a root rpool?
On 12/19/2011 8:51 PM, Frank Cusack wrote:

If you don't detach the smaller drive, the pool size won't increase. Even if the remaining smaller drive fails, that doesn't mean you have to detach it. So yes, the pool size might increase, but it won't be unexpectedly. It will be because you detached all smaller drives. Also, even if a smaller drive is failed, it can still be attached.

If you don't have a controller slot to connect the replacement drive through, then you have to physically remove the smaller drive. You can then attach the replacement drive, but will 'replace' work then, or must you remove and then add it because it is the same disk?

It doesn't make sense for attach to do anything with partition tables, IMHO.

I understand that in some cases it might be more problematic for attach to assume some things about partitioning. I don't know that I have the answer, but I know from experience that there is nothing I hate more than having to figure out how to partition disks on Solaris. It's just too painful, with so many steps and conditions of use.

I *always* order the spare when I order the original drives, to have it on hand, even for my home system. Drive sizes change more frequently than they fail, for me. Sure, when I use the spare I may not be able to order a new spare of the same size, but at least at that time I have time to prepare and am not scrambling.

Most of the time, I have spares ready too. I have returned 4 drives of one manufacturer, and 2 of another, with 2 more disks showing signs of failure. These are all SATA disks on my home server. At this point, with drive prices so high, it's not simple to pick up a couple more spares to have on hand. For my root pool, I had no remaining 250GB disks of the kind I've been using for root. So I put in one of my 1.5TB spares for the moment, until I decide whether or not to order a new small drive.

On Mon, Dec 19, 2011 at 3:55 PM, Gregg Wonderly gregg...@gmail.com wrote:

That's why I'm asking. I think it should always mirror the partition table and allocate exactly the same amount of space, so that the pool doesn't suddenly change sizes unexpectedly and require a disk size that I don't have at hand to put the mirror back up.
Re: [zfs-discuss] Can I create a mirror for a root rpool?
That's why I'm asking. I think it should always mirror the partition table and allocate exactly the same amount of space, so that the pool doesn't suddenly change sizes unexpectedly and require a disk size that I don't have at hand to put the mirror back up.

Gregg

On 12/18/2011 4:08 PM, Nathan Kroenert wrote:

Do note that, though Frank is correct, you have to be a little careful around what might happen should you drop your original disk and only the large mirror half is left... ;)

On 12/16/11 07:09 PM, Frank Cusack wrote:

You can just do fdisk to create a single large partition. The attached mirror doesn't have to be the same size as the first component.

On Thu, Dec 15, 2011 at 11:27 PM, Gregg Wonderly gregg...@gmail.com wrote:

Cindy, will it ever be possible to just have attach mirror the surfaces, including the partition tables? I spent an hour today trying to get a new mirror on my root pool. There was a 250GB disk that failed. I only had a 1.5TB handy as a replacement. prtvtoc ... | fmthard does not work in this case, and so you have to do the partitioning by hand, which is just silly to fight with anyway.

Gregg

Sent from my iPhone

On Dec 15, 2011, at 6:13 PM, Tim Cook t...@cook.ms wrote:

Do you still need to do the grub install?

On Dec 15, 2011 5:40 PM, Cindy Swearingen cindy.swearin...@oracle.com wrote:

Hi Anon,

The disk that you attach to the root pool will need an SMI label and a slice 0. The syntax to attach a disk to create a mirrored root pool is like this, for example:

    # zpool attach rpool c1t0d0s0 c1t1d0s0

Thanks,
Cindy

On 12/15/11 16:20, Anonymous Remailer (austria) wrote:

On Solaris 10, if I install using ZFS root on only one drive, is there a way to add another drive as a mirror later? Sorry if this was discussed already. I searched the archives and couldn't find the answer. Thank you.
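On Tim's grub question: attaching a root-pool mirror does not by itself make the second disk bootable; on Solaris 10/OpenSolaris x86 the boot blocks must be installed on the new half explicitly (SPARC uses installboot instead). Reusing Cindy's example device:

    # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0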
Re: [zfs-discuss] Very poor pool performance - no zfs/controller errors?!
On 12/18/2011 4:23 PM, Jan-Aage Frydenbø-Bruvoll wrote: Hi, On Sun, Dec 18, 2011 at 22:14, Nathan Kroenert nat...@tuneunix.com wrote: I know some others may already have pointed this out - but I can't see it and not say something... Do you realise that losing a single disk in that pool could pretty much render the whole thing busted? At least for me - at the rate at which _I_ seem to lose disks, it would be worth considering something different ;) Yeah, I have thought that thought myself. I am pretty sure I have a broken disk, however I cannot for the life of me find out which one. zpool status gives me nothing to work on, MegaCli reports that all virtual and physical drives are fine, and iostat gives me nothing either. What other tools are there out there that could help me pinpoint what's going on? One choice would be to take a single drive that you believe is in good working condition, and attach it as a mirror to each single disk in turn. If there is a bad disk, the resilver will reveal it through read errors on the source drive. Scrub, though, should really be telling you everything you need to know about disk failures, once the surface becomes corrupted enough that it can't be corrected by re-reading enough times. It looks like you've started mirroring some of the drives. That's really what you should be doing for the other non-mirrored drives. Gregg Wonderly ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
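A sketch of that triage, with a hypothetical pool named tank, suspect disk c1t2d0, and known-good spare c1t9d0 (all names illustrative):

  # zpool scrub tank                  (forces every allocated block to be read and checksummed)
  # zpool status -v tank              (per-device read/write/cksum counters after the scrub)
  # iostat -En                        (driver-level soft/hard/transport error counts per disk)
  # fmdump -e                         (the FMA error log often flags a dying disk before zpool status does)
  # zpool attach tank c1t2d0 c1t9d0   (temporary mirror; read errors during resilver implicate the source disk)
  # zpool detach tank c1t9d0          (drop the temporary mirror before testing the next disk)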
Re: [zfs-discuss] Can I create a mirror for a root rpool?
The issue is really quite simple. The Solaris install, on x86 at least, chooses to use slice 0 for the root partition. That slice is not created by a default format/fdisk, and so we have the web strewn with prtvtoc path/to/old/slice2 | fmthard -s - path/to/new/slice2 as a way to cause the two commands to access the entire disk. If you have to use dissimilar-sized disks because 1) that's the only media you have, or 2) you want to increase the size of your root pool, then all we end up with is an error message about overlapping partitions and no ability to make progress. If I then use dd if=/dev/zero to erase the front of the disk, and then fire up format, select fdisk, say yes to create Solaris2 partitioning, and then use partition to add a slice 0, I will still have problems getting the whole disk in play. So the end result is that I have to jump through hoops, when in the end, I'd really like to just add the whole disk, every time. If I say zpool attach rpool c8t0d0s0 c12d1, I really do mean the whole disk, and I'm not sure why it can't just happen. Failing to type a slice reference is no worse a 'typo' than typing 's2' by accident, because that's what I've been typing with all the other commands to try and get the disk partitioned. I just really think there's not a lot of value in all of this, especially with ZFS, where we can, in fact, add more disks/vdevs to keep expanding space, and extremely rarely is that going to be done, for the root pool, with fractions of disks. The use of SMI labels, and the absolute refusal to use EFI partitioning, plus all of this, just stacks up to a pretty large barrier to simple and/or easy administration. I'm very nervous when I have a simplex filesystem sitting there, and when a disk has died, I'm doubly nervous that the other half is going to fall over. I'm not trying to be hard-nosed about this, I'm just trying to share my angst and frustration with the details that drove me in that direction. Gregg Wonderly On 12/16/2011 2:56 AM, Andrew Gabriel wrote: On 12/16/11 07:27 AM, Gregg Wonderly wrote: Cindy, will it ever be possible to just have attach mirror the surfaces, including the partition tables? I spent an hour today trying to get a new mirror on my root pool. There was a 250GB disk that failed. I only had a 1.5TB handy as a replacement. prtvtoc ... | fmthard does not work in this case Can you be more specific about why it fails? I have seen a couple of cases, and I'm wondering if you're hitting the same thing. Can you post the prtvtoc output of your original disk please? and so you have to do the partitioning by hand, which is just silly to fight with anyway. Gregg ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
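Written out, the hoop-jumping above looks something like this; the device names come from the message, the rest is an illustrative sketch, and the dd step destroys anything on the target disk:

  # dd if=/dev/zero of=/dev/rdsk/c12d1p0 bs=1024k count=10   (wipe stale labels at the front of the disk)
  # fdisk -B /dev/rdsk/c12d1p0                               (write a default Solaris2 partition spanning the whole disk)
  # format -d c12d1                                          (then create slice 0 by hand in the partition menu)
  # zpool attach rpool c8t0d0s0 c12d1s0                      (and finally attach the slice, not the whole disk)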
Re: [zfs-discuss] Can I create a mirror for a root rpool?
Cindy, will it ever be possible to just have attach mirror the surfaces, including the partition tables? I spent an hour today trying to get a new mirror on my root pool. There was a 250GB disk that failed. I only had a 1.5TB handy as a replacement. prtvtoc ... | fmthard does not work in this case, and so you have to do the partitioning by hand, which is just silly to fight with anyway. Gregg Sent from my iPhone On Dec 15, 2011, at 6:13 PM, Tim Cook t...@cook.ms wrote: Do you still need to do the grub install? On Dec 15, 2011 5:40 PM, Cindy Swearingen cindy.swearin...@oracle.com wrote: Hi Anon, The disk that you attach to the root pool will need an SMI label and a slice 0. The syntax to attach a disk to create a mirrored root pool is like this, for example: # zpool attach rpool c1t0d0s0 c1t1d0s0 Thanks, Cindy On 12/15/11 16:20, Anonymous Remailer (austria) wrote: On Solaris 10, if I install using ZFS root on only one drive, is there a way to add another drive as a mirror later? Sorry if this was discussed already. I searched the archives and couldn't find the answer. Thank you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] grrr, How to get rid of mis-touched file named `-c'
On 11/26/2011 5:30 AM, Brandon High wrote: On Wed, Nov 23, 2011 at 11:43 AM, Harry Putnam rea...@newsguy.com wrote: OK, I'm out of escapes. or other tricks... other than using emacs, but I haven't installed emacs as yet. I can just ignore them of course, until such time as I do get emacs installed, but by now I just want to know how it might be done from a shell prompt. rm ./-c ./-O ./-k And many versions of getopt support the use of -- as the end-of-options indicator, so you can do rm -- -c -O -k to remove those as well. Gregg Wonderly ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
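Spelled out, with a third variant for cases where rm predates -- support (the find example assumes an implementation with -maxdepth, such as GNU find; the stock Solaris find may lack it):

  $ rm ./-c ./-O ./-k                                (a leading ./ keeps rm from parsing the names as options)
  $ rm -- -c -O -k                                   (-- marks the end of options for POSIX-style utilities)
  $ find . -maxdepth 1 -name '-c' -exec rm {} \;     (sidesteps rm's option parsing entirely)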
Re: [zfs-discuss] how to set up solaris os and cache within one SSD
On 11/10/2011 7:42 AM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of darkblue 1 * XEON 5606 1 * Supermicro X8DT3-LN4F 6 * 4G RECC RAM 22 * WD RE3 1T harddisk 4 * Intel 320 (160G) SSD 1 * Supermicro 846E1-900B chassis I just want to say, this isn't supported hardware, and although many people will say they do this without problem, I've heard just as many people (including myself) saying it's unstable that way. I recommend buying either the Oracle hardware, or Nexenta on whatever hardware they recommend. Definitely DO NOT run the free version of Solaris without updates and expect it to be reliable. But that's a separate issue. I'm also emphasizing that even if you pay for Solaris support on non-Oracle hardware, don't expect it to be great. But maybe it will be. I think the key issue here is whether this hardware will corrupt a pool or not. Ultimately, the promise of ZFS, for me anyway, is that I can take disks to new hardware if/when needed. I am not dependent on a controller or motherboard which provides some feature key to access the data on the disks. Companies which sell key software that you depend on working have generally proven that software to work reliably on hardware which they might sell to make use of said software. Apple's business model and success, for example, is based on this fact, because they have a much smaller bug pool to consider. Oracle hardware works out the same way. I think supporting the development of ZFS is key to the next generation of storage solutions... But I don't need the class of hardware that Oracle wants me to pay for. I need disks with 24/7 reliability. I can wait till tomorrow to store something onto my server from my laptop/desktop. Consumer/non-enterprise needs are quite different, and I don't think Oracle understands how to deal in the 1,000,000,000-potential-customer marketplace. They've had a hard enough time just working in the 100,000-customer marketplace. Gregg ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Data distribution not even between vdevs
On 11/9/2011 8:05 AM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Ding Honghui But now, as shown below, the first 2 raidz1 vdevs' usage is about 78% and the last 2 raidz1 vdevs' usage is about 93%. In this case, when you write, it should be writing to the first two vdevs, not the last two. So the fact that the last two are over 93% full should be irrelevant in terms of write performance. All my files are small files of about 150KB. That's too bad. Raidz performs well with large sequential data, and performs poorly with small random files. Now the questions are: 1. Should I balance the data between the vdevs by copying the data and removing the data located in the last 2 vdevs? If you want to. But most people wouldn't bother. Especially since you're talking about 75% versus 90%. It's difficult to balance it so *precisely* as to get them both around 85%. 2. Is there any method to automatically re-balance the data? There is no automatic way to do it. For me, this is a key issue. If there were an automatic rebalancing mechanism, that same mechanism would work perfectly to allow pools to have disk sets removed. It would provide the needed basic mechanism of just moving stuff around to eliminate the use of a particular part of the pool that you wanted to remove. Gregg Wonderly ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
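The closest thing to a manual rebalance is rewriting the data, since new allocations favor the emptier vdevs. A hand-rolled sketch, assuming a hypothetical dataset tank/data with enough free space in the pool for a second copy (local snapshots, clones, and non-default properties need extra care):

  # zfs snapshot tank/data@rebalance
  # zfs send tank/data@rebalance | zfs receive tank/data.new   (receive rewrites every block, spread by current free space)
  # zfs destroy -r tank/data                                   (removes the old, unbalanced copy and its snapshot)
  # zfs rename tank/data.new tank/data
  # zfs destroy tank/data@rebalance                            (clean up the snapshot carried over by the send)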
[zfs-discuss] Lycom has lots of hardware that looks interesting, is it supported
I've been building a few 6-disk boxes for VirtualBox servers, and I am also surveying how I will add more disks as these boxes need it. Looking around on the HCL, I see the Lycom PE-103 is supported. That's just 2 more disks; I'm typically going to want to add a raid-z w/spare to my zpools, so I need at least 4 disks, and I'd prefer to build boxes with multi-lane eSATA expansion and put either 5 or 10 disks in them for expansion. There are lots of devices on the Lycom web site at http://www.lycom.com.tw. The device at http://www.lycom.com.tw/st126rm.htm looks very attractive for bolting onto computer cases that are housing additional drives. That page says that the PE-102 can be used for multi-lane connectivity. Is multi-lane working in Solaris, and since the PE-102 seems to have the same chipset as the PE-103, would it work on OpenSolaris? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss