Re: [zfs-discuss] Re: Expanding raidz2

2006-07-13 Thread Luke Scharf

David Abrahams wrote:

I've seen people wondering if ZFS was a scam because the claims just
seemed too good to be true.  Given that ZFS *is* really great, I don't
think it would hurt to prominently advertise limitations like this one --
it would probably benefit credibility considerably, and it's a real
consideration for anyone who's doing RAID-Z.

  
Very true.  I recently had someone imply to me that ZFS was a network 
protocol and everything else related to disks and file sharing -- 
instead of a volume manager integrated with a filesystem and an 
automounter.  There is a lot of hype and misinformation out there.


As for the claims, I don't buy that it's impossible to corrupt a ZFS 
volume.  I've replicated the demo where the guy dd's /dev/urandom over 
part of the disk, and I believe that works -- but there are a lot of 
other ways to corrupt a filesystem in the real world.  I'm spending this 
morning setting up a server to try ZFS in our environment -- which will 
put it under a heavy load with a lot of large files and heavy churn.   
We'll see what happens!
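
(For the record, the demo goes roughly like this -- pool and device 
names are just placeholders, and it should only ever be run against a 
scratch pool:

zpool create testpool raidz c1t1d0 c1t2d0 c1t3d0
dd if=/dev/urandom of=/dev/dsk/c1t2d0s0 bs=1024k count=100
zpool scrub testpool
zpool status -v testpool

The scrub should find and repair the garbage written over c1t2d0 using 
the redundancy on the other two disks.)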


-Luke





Re: [zfs-discuss] Re: Expanding raidz2

2006-07-13 Thread David Dyer-Bennet
Luke Scharf [EMAIL PROTECTED] writes:

 As for the claims, I don't buy that it's impossible to corrupt a ZFS
 volume.  I've replicated the demo where the guy dd's /dev/urandom
 over part of the disk, and I believe that works -- but there are a
 lot of other ways to corrupt a filesystem in the real world.  I'm
 spending this morning setting up a server to try ZFS in our
 environment -- which will put it under a heavy load with a lot of
 large files and heavy churn.  We'll see what happens!

I've done that one too.  It's fun -- and caused me to learn the
difference between /dev/random and /dev/urandom :-).

It's easy to corrupt the volume, though -- just copy random data over
*two* disks of a RAIDZ volume.  Okay, you have to either do the whole
volume, or get a little lucky to hit both copies of some piece of
information before you get corruption.  Or pull two disks out of the
rack at once.  

With the transactional nature and rotating pool of top-level blocks, I
think it will be pretty darned hard to corrupt a structure *short of*
deliberate damage exceeding the redundancy of the vdev.  If you
succeed, you've found a bug, don't forget to report it!
-- 
David Dyer-Bennet, mailto:[EMAIL PROTECTED], http://www.dd-b.net/dd-b/
RKBA: http://www.dd-b.net/carry/
Pics: http://dd-b.lighthunters.net/ http://www.dd-b.net/dd-b/SnapshotAlbum/
Dragaera/Steven Brust: http://dragaera.info/


Re: [zfs-discuss] Re: Expanding raidz2

2006-07-13 Thread Luke Scharf

David Dyer-Bennet wrote:

It's easy to corrupt the volume, though -- just copy random data over
*two* disks of a RAIDZ volume.  Okay, you have to either do the whole
volume, or get a little lucky to hit both copies of some piece of
information before you get corruption.  Or pull two disks out of the
rack at once.  
  
I tried that too - some of the files were borked, but I was impressed 
that other files on the volume were still recoverable.  Also, ZFS 
automatically started the scrub - which was handy.  Unfortunately, my 
test system only had one HDD (with 3 partitions simulating a RAID-Z), so 
the timing wasn't realistic.
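
(For reference, a similar throwaway setup can also be faked with 
file-backed vdevs instead of real partitions -- the paths and sizes 
here are just illustrative:

mkfile 256m /var/tmp/d1 /var/tmp/d2 /var/tmp/d3
zpool create test raidz /var/tmp/d1 /var/tmp/d2 /var/tmp/d3

The same corruption-and-scrub experiments work against it, with the 
same caveat that the timing isn't realistic.)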

With the transactional nature and rotating pool of top-level blocks, I
think it will be pretty darned hard to corrupt a structure *short of*
deliberate damage exceeding the redundancy of the vdev.  If you
succeed, you've found a bug, don't forget to report it!
  
I buy "very good, backed by good theory and good coding."  After a 
few months of testing, I might even buy "better than any other general 
purpose filesystem and volume manager."


But infallible?  If so, I shall name my storage server Titanic.

-Luke





Re: [zfs-discuss] Re: Expanding raidz2

2006-07-13 Thread Rob Logan



comfortable with having 2 parity drives for 12 disks,


the thread-starting config of 4 disks per controller(?):
zpool create tank raidz2 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c2t1d0 c2t2d0

then later
zpool add tank raidz2 c2t3d0 c2t4d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0

as described, doubles one's IOPS and usable space in tank, with the loss
of another two disks, splitting each cluster into four data (plus two
parity) writes, one per disk.  perhaps use an 8-disk controller instead,
and start with

zpool create tank raidz c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

then do a
zpool add tank raidz c1t6d0 c1t7d0 c1t8d0 c2t1d0 c2t2d0
zpool add tank raidz c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0
zpool add tank spare c2t8d0

gives one the same largish cluster size divided by 4 per raidz disk, 3x
the IOPS, less parity math per write, and a hot spare, for the same
usable space and the same loss of 4 disks.
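
(if a disk in one of those raidz vdevs later dies, the hot spare can be
swapped in by hand with something like the following -- the failed disk
named here is only an example:

zpool replace tank c2t5d0 c2t8d0

zpool status should then show the spare resilvering in place of the
failed disk.)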

splitting the max 128k cluster across 12 disks (10 data chunks + 2
parity) makes good MTTR sense but not much performance sense.  if someone
wants to do the MTTR math between all three configs, I'd love to read it.
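
(as a starting point, the usual back-of-the-envelope approximations for
an N-disk group with per-disk MTTF and rebuild time MTTR -- ignoring
unrecoverable read errors -- are roughly:

single parity:  MTTDL ~= MTTF^2 / (N * (N-1) * MTTR)
double parity:  MTTDL ~= MTTF^3 / (N * (N-1) * (N-2) * MTTR^2)

plugging the three layouts above into those is left to whoever takes up
the challenge.)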

Rob

http://storageadvisors.adaptec.com/2005/11/02/actual-reliability-calculations-for-raid/
http://www.barringer1.com/ar.htm


Re: [zfs-discuss] Re: Expanding raidz2

2006-07-13 Thread Richard Elling



David Abrahams wrote:

David Dyer-Bennet [EMAIL PROTECTED] writes:


Adam Leventhal [EMAIL PROTECTED] writes:


I'm not sure I even agree with the notion that this is a real
problem (and if it is, I don't think is easily solved). Stripe
widths are a function of the expected failure rate and fault domains
of the system which tend to be static in nature. A coarser solution
would be to create a new pool where you zfs send/zfs recv the
filesystems of the old pool.

RAIDZ expansion is a big enough deal that I may end up buying an
Infrant NAS box and using their X-RAID instead.  The ZFS should be
more secure, and I *really* like the block checksumming -- but the
ability to expand my existing pool by just adding a new disk is REALLY
REALLY USEFUL in a small office or home configuration.  


Yes, and while it's not an immediate showstopper for me, I'll want to
know that expansion is coming imminently before I adopt RAID-Z.


[in brainstorming mode, sans coffee so far this morning]

Better yet, buy two disks, say 500 GByte.  Need more space, replace
them with 750 GByte, because by then the price of the 750 GByte disks
will be as low as the 250 GByte disks today, and the 1.5 TByte disks
will be $400.  Over time, the cost of disks remains the same, but the
density increases.  This will continue to occur faster than the
development and qualification of complex software.  ZFS will already 
expand a mirror as you replace disks :-)  KISS
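
For reference, a minimal sketch of that grow-by-replacement path on a
two-way mirror (device names are illustrative):

zpool replace tank c1t2d0 c1t4d0   # swap in the first larger disk, let it resilver
zpool replace tank c1t3d0 c1t5d0   # then the second
zpool list tank                    # the extra capacity should appear once both sides are larger

On some builds the new space may only show up after an export/import.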

 -- richard


Re: [zfs-discuss] Re: Expanding raidz2

2006-07-13 Thread Erik Trimble
On Thu, 2006-07-13 at 11:42 -0700, Richard Elling wrote:
 [in brainstorming mode, sans coffee so far this morning]
 
 Better yet, buy two disks, say 500 GByte.  Need more space, replace
 them with 750 GByte, because by then the price of the 750 GByte disks
 will be as low as the 250 GByte disks today, and the 1.5 TByte disks
 will be $400.  Over time, the cost of disks remains the same, but the
 density increases.  This will continue to occur faster than the
 development and qualification of complex software.  ZFS will already 
 expand a mirror as you replace disks :-)  KISS
   -- richard

Looking at our (Sun's) product line now, we're not just going after the
Enterprise market anymore. Specifically, the Medium Business market is a
target (a few hundred people, with a half-dozen IT staff, total). 

RAIDZ expansion is essentially a must-have for selling to these folks.
Being able to expand a 2-drive array into a 5-drive RAIDZ by simply
pushing in new disks and typing a single command is a HUGE win.  Most
hardware RAID controllers (even the low-end ones, both SCSI & SATA) can
do this online nowadays.  It's something that is simply expected, and
not having it is a big black mark. 

A typical instance here is a small business server (2-4 CPUs) hooked to
a small JBOD. We're not going to sell them a fully populated JBOD to
start with, but selling them one 50% full is much more likely (look at
the price differential between a fully and a half-populated 3510FC).  In
the Small Business market, expandability is key, as their limited
budgets tend to make for Just-In-Time purchasing.  They are _much_ more
likely to buy from us something that can be had in a minimal
configuration at low cost but offers considerable future expansion, even
if that expansion costs them considerably more overall than buying the
entire thing up front. 

Also, mixing and matching disk sizes inside a disk server is unlikely
until you get to places that have highly trained staff. Yes, adding 4
250GB drives is more expensive than adding 2 750GB ones, but the
difference is nominal compared to the extra effort of configuration and
maintenance.  At the Medium Business level, less stress on the admin
staff is usually the driving factor after raw cost, since admin staff
tend to be extremely overworked.



-- 
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)



Re: [zfs-discuss] Re: Expanding raidz2

2006-07-13 Thread grant beattie
On Thu, Jul 13, 2006 at 11:42:21AM -0700, Richard Elling wrote:

 Yes, and while it's not an immediate showstopper for me, I'll want to
 know that expansion is coming imminently before I adopt RAID-Z.
 
 [in brainstorming mode, sans coffee so far this morning]
 
 Better yet, buy two disks, say 500 GByte.  Need more space, replace
 them with 750 GByte, because by then the price of the 750 GByte disks
 will be as low as the 250 GByte disks today, and the 1.5 TByte disks
 will be $400.  Over time, the cost of disks remains the same, but the
 density increases.  This will continue to occur faster than the
 development and qualification of complex software.  ZFS will already 
 expand a mirror as you replace disks :-)  KISS

indeed, but this is not the same as expansion of an existing vdev
because you still have the same number of spindles, with potentially
more data on each, so it may in fact be a net performance loss.

I don't think the only driver for wanting to expand a raidz vdev is
to gain more space...

grant.
