Re: [zfs-discuss] one more time: pool size changes

2010-06-16 Thread Mertol Özyöney
In addition to all the comments below, the 7000 series, which competes with
NetApp boxes, can add more storage to the pool in a couple of seconds,
online, and load-balances automatically. Also, we don't have the 16 TB limit
NetApp has. Nearly all customers did this without any PS involvement. 



Mertol Ozyoney 
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email mertol.ozyo...@sun.com



-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Richard Elling
Sent: Thursday, June 03, 2010 3:51 AM
To: Roman Naumenko
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] one more time: pool size changes

On Jun 2, 2010, at 3:54 PM, Roman Naumenko wrote:
 Recently I talked to a co-worker who manages NetApp storage. We discussed
size changes for pools in ZFS and aggregates in NetApp.
 
 Some time before that, I had suggested ZFS to a buddy of mine for his new home
storage server, but he turned it down since there is no expansion available
for a pool. 

Heck, let him buy a NetApp :-)

 And he really wants to be able to add a drive or two to an existing
pool. Yes, there are ways to expand storage to some extent without
rebuilding it, like replacing disks with larger ones. Not enough for a
typical home user, I would say. 

Why not? I do this quite often. Growing is easy, shrinking is more
challenging.

 And this might be important for corporate use too. Frankly speaking, I
doubt many administrators use it in a DC environment. 
 
 Nevertheless, NetApp appears to have such a feature, as I learned from my
co-worker. It works with some restrictions (you have to zero disks before
adding, and rebalance the aggregate afterwards, still without perfect
distribution) - but OnTap is able to do aggregate expansion nevertheless. 
 
 So, my question is: what prevents introducing the same for ZFS at
present? Is this because of the design of ZFS, or is there simply no
demand for it in the community?

It's been there since 2005: the zpool subcommand add.
 -- richard

 
 My understanding is that at present there are no plans to introduce
it.
 
 --Regards,
 Roman Naumenko
 ro...@naumenko.com

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-04 Thread Marty Scholes
On Jun 3, 2010 7:35 PM, David Magda wrote:

 On Jun 3, 2010, at 13:36, Garrett D'Amore wrote:
 
  Perhaps you have been unlucky.  Certainly, there is a window with N+1
  redundancy where a single failure leaves the system exposed in the face
  of a 2nd fault.  This is a statistics game...
  
 It doesn't even have to be a drive failure, but an unrecoverable read
 error.

Well said.

Also include a controller burp, a bit flip somewhere, a drive going offline 
briefly, fibre cable momentary interruption, etc.  The list goes on.

My experience is that these weirdo once-in-a-lifetime issues tend to present 
in clumps which are not as evenly distributed as statistics would lead you to 
believe.  Rather, like my kids, they save up their fun into coordinated bursts.

When these bursts happen, you end up having conversations with stakeholders 
about how all of this redundancy you tricked them into purchasing has left 
them exposed.  Not good times.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Juergen Nickelsen
Richard Elling rich...@nexenta.com writes:

 And some time before I had suggested to a my buddy zfs for his new
 home storage server, but he turned it down since there is no
 expansion available for a pool.

 Heck, let him buy a NetApp :-)

Definitely a possibility, given the availability and pricing of
oldish NetApp hardware on eBay. Although for home use, it is easier
to put together something adequately power-saving and silent with
OpenSolaris and PC hardware than with NetApp gear.

-- 
I wasn't so desperate yet that I actually looked into documentation.
 -- Juergen Nickelsen
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Roman Naumenko

Richard Elling said the following, on 06/02/2010 08:50 PM:

On Jun 2, 2010, at 3:54 PM, Roman Naumenko wrote:


Recently I talked to a co-worker who manages NetApp storage. We discussed size 
changes for pools in ZFS and aggregates in NetApp.
Some time before that, I had suggested ZFS to a buddy of mine for his new home 
storage server, but he turned it down since there is no expansion available for 
a pool.


Heck, let him buy a NetApp :-)


No chance, he likes to build everything himself.


And he really wants to be able to add a drive or two to an existing pool. 
Yes, there are ways to expand storage to some extent without rebuilding it, 
like replacing disks with larger ones. Not enough for a typical home user, I 
would say.


Why not? I do this quite often. Growing is easy, shrinking is more challenging.


And this might be important for corporate use too. Frankly speaking, I doubt 
many administrators use it in a DC environment.

Nevertheless, NetApp appears to have such a feature, as I learned from my 
co-worker. It works with some restrictions (you have to zero disks before 
adding, and rebalance the aggregate afterwards, still without perfect 
distribution) - but OnTap is able to do aggregate expansion nevertheless.

So, my question is: what prevents introducing the same for ZFS at present? 
Is this because of the design of ZFS, or is there simply no demand for it in 
the community?


It's been there since 2005: the zpool subcommand add.
  -- richard


Well, I didn't explain it very clearly. I meant that the size of a raidz 
array can't be changed.
For sure, zpool add can do the job for a pool, but not for a raidz 
configuration.
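
As a rough illustration (pool and device names are made up here), adding a 
whole new vdev to an existing pool works today, but the existing raidz vdev 
itself stays the same width:

# zpool add tank raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0
# zpool status tank

After the add, the pool has two raidz vdevs and new writes are striped across 
both; there is no command that turns a 4-disk raidz1 into a 5-disk raidz1.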


Roman Naumenko
ro...@naumenko.ca

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Roman Naumenko

Brandon High said the following, on 06/02/2010 11:47 PM:

On Wed, Jun 2, 2010 at 3:54 PM, Roman Naumenkoro...@naumenko.ca  wrote:
   

Some time before that, I had suggested ZFS to a buddy of mine for his new home 
storage server, but he turned it down since there is no expansion available for 
a pool.
 

There's no expansion for aggregates in OnTap, either. You can add more
disks (as a raid-dp or mirror set) to an existing aggr, but you can
also add more vdevs (as raidz or mirrors) to a zpool too.
   


I think there is a difference. I just quickly checked the NetApp site:

Adding new disks to a RAID group: If a volume has more than one RAID 
group, you can specify the RAID group to which you are adding disks.


To add new disks to a specific RAID group of a volume, complete the 
following step.


Example
The following command adds two disks to RAID group 0 of the vol0 volume:
vol add vol0 -g rg0 2

You can obviously add disks just to a raid group as well.

And he really wants to be able to add a drive or two to an existing pool. 
Yes, there are ways to expand storage to some extent without rebuilding it, 
like replacing disks with larger ones. Not enough for a typical home user, I 
would say.
 

You can do this. 'zpool add'
   

Nevertheless, NetApp appears to have such a feature, as I learned from my 
co-worker. It works with some restrictions (you have to zero disks before 
adding, and rebalance the aggregate afterwards, still without perfect 
distribution) - but OnTap is able to do aggregate expansion nevertheless.
 

Yeah, you can add to an aggr, but you can't add to a raid-dp set. It's
the same as ZFS.

ZFS doesn't require that you zero disks, and there is no rebalancing.
As more data is written to the pool, however, it will become more
balanced.
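
If you want to watch that happen, one way (the pool name here is hypothetical) 
is to look at the per-vdev allocation that the pool reports:

# zpool iostat -v tank

which lists capacity used/available and I/O operations for each vdev and each 
disk, so you can see how new writes favour the emptier vdev.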

So, my question is: what prevents introducing the same for ZFS at present? 
Is this because of the design of ZFS, or is there simply no demand for it in 
the community?

My understanding is that at present there are no plans to introduce it.
 

Rebalancing depends on bp_rewrite, which is vaporware still. There has
been discussion of it for a while but no implementation that I know
of.

Once the feature is added, it will be possible to add or remove
devices from a zpool or vdev, something that OnTap can't do.

   

But are there any plans to implement it?

--Roman
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Roman Naumenko

Erik Trimble said the following, on 06/02/2010 07:16 PM:

Roman Naumenko wrote:

Recently I talked to a co-worker who manages NetApp storage. We
discussed size changes for pools in ZFS and aggregates in NetApp.

Some time before that, I had suggested ZFS to a buddy of mine for his new
home storage server, but he turned it down since there is no
expansion available for a pool.
And he really wants to be able to add a drive or two to an
existing pool. Yes, there are ways to expand storage to some extent
without rebuilding it, like replacing disks with larger ones. Not
enough for a typical home user, I would say.
And this might be important for corporate use too. Frankly speaking,
I doubt many administrators use it in a DC environment.
Nevertheless, NetApp appears to have such a feature, as I learned from
my co-worker. It works with some restrictions (you have to zero disks
before adding, and rebalance the aggregate afterwards, still without
perfect distribution) - but OnTap is able to do aggregate expansion
nevertheless.
So, my question is: what prevents introducing the same for ZFS
at present? Is this because of the design of ZFS, or is there
simply no demand for it in the community?

My understanding is that at present there are no plans to
introduce it.

--Regards,
Roman Naumenko
ro...@naumenko.com


Expanding a RAIDZ (which, really, is the only thing that can't be done
right now, w/r/t adding disks) requires the Block Pointer (BP) Rewrite
functionality before it can be implemented.

We've been promised BP rewrite for a while, but I have no visibility as
to where development on it is in the schedule.


I thought it was about the hard-defined vdev configuration set when a raidz is 
created. But anyway, it's just not there...

--Roman
ro...@naumenko.ca
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread David Dyer-Bennet

On Wed, June 2, 2010 17:54, Roman Naumenko wrote:
 Recently I talked to a co-worker who manages NetApp storage. We discussed
 size changes for pools in ZFS and aggregates in NetApp.

 Some time before that, I had suggested ZFS to a buddy of mine for his new home
 storage server, but he turned it down since there is no expansion
 available for a pool.

I set up my home fileserver with ZFS (in 2006) BECAUSE zfs could expand
the pool for me, and nothing else I had access to could do that (home
fileserver, little budget).

My server is currently running with one data pool, three vdevs.  Each of
the data vdevs is a two-way mirror.  I started with one, expanded to two,
then expanded to three.  Rather than expanding to four when this fills up,
I'm going to attach a larger drive to the first mirror vdev, and then a
second one, and then remove the two current drives, thus expanding the
vdev without ever compromising the redundancy.
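
Roughly, the sequence I have in mind looks like this (pool and device names 
are invented for the example); on a recent build you can also just set the 
pool's autoexpand property instead of expanding each device by hand:

# zpool attach tank c0t1d0 c0t5d0      (new, larger disk joins the mirror)
# zpool attach tank c0t2d0 c0t6d0      (second new disk; now a 4-way mirror)
   ... wait for the resilvers to finish ...
# zpool detach tank c0t1d0
# zpool detach tank c0t2d0
# zpool online -e tank c0t5d0          (let the vdev grow to the new size)

At no point is the mirror running with fewer than two good copies.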

My choice of mirrors rather than RAIDZ is based on the fact that I have
only 8 hot-swap bays (I still think of this as LARGE for a home server;
the competition, things like the Drobo, tends to have 4 or 5), that I
don't need really large amounts of storage (after my latest upgrade I'm
running with 1.2TB of available data space), and that I expected to need
to expand storage over the life of the system.  With mirror vdevs, I can
expand them without compromising redundancy even temporarily, by attaching
the new drives before I detach the old drives; I couldn't do that with
RAIDZ.  Also, the fact that disk is now so cheap means that 100%
redundancy is affordable, I don't have to compromise on RAIDZ.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Richard Bruce
 Expanding a RAIDZ (which, really, is the only thing that can't be done right
 now, w/r/t adding disks) requires the Block Pointer (BP) Rewrite
 functionality before it can be implemented.
 
 We've been promised BP rewrite for a while, but I have no visibility as
 to where development on it is in the schedule.
 
 Fortunately, several other things also depend on BP rewrite (e.g.
 shrinking a pool (removing vdevs), efficient defragmentation/compaction,
 etc.).
 
 So, while resizing a raidZ device isn't really high on the list of
 things to do, the fundamental building block which would allow for it to
 occur is very much important for Oracle. And, once BP rewrite is
 available, I suspect that there might be a raidZ resize contribution
 from one of the non-Oracle folks.  Or, maybe even someone like me (who's
 not a ZFS developer inside Oracle, but I play one on TV...)
 
 Dev guys - where are we on BP rewrite?

I was thinking about asking the same thing recently as I would really like to 
see BP rewrite implemented.  This seems to pop up here every several months.  
There were rumblings last fall that the BP rewrite stuff would potentially be 
finished by now.  The bug (CR 4852783) has been around for 7 years though.  

The functionality it would enable would be quite attractive for many more 
things than just expanding a raidz vdev, although that is my primary interest 
in it.  The basics of expanding a raidz vdev once BP rewrite is done have 
already been outlined by Adam Leventhal.  See the URL below.

http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z

Richard Bruce
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Garrett D'Amore
Using a stripe of mirrors (RAID0) you can get the benefits of multiple
spindle performance, easy expansion support (just add new mirrors to the
end of the raid0 stripe), and 100% data redundancy.   If you can afford
to pay double for your storage (the cost of mirroring), this is IMO the
best solution.

Note that this solution is not quite as resilient against hardware
failure as raidz2 or raidz3.  While the RAID1+0 solution can tolerate
multiple drive failures, if both drives in a mirror fail, you lose
data.

If you're clever, you'll also try to make sure each side of the mirror
is on a different controller, and if you have enough controllers
available, you'll also try to balance the controllers across stripes.

One way to help with that is to leave a drive or two available as a hot
spare.
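
And a spare can be added to a live pool at any time; as a sketch with a 
made-up device name:

# zpool add tank spare c4t0d0

ZFS will pull the spare in automatically when a device in the pool faults.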

Btw, the above recommendation mirrors what Jeff Bonwick himself (the
creator of ZFS) has advised on his blog.

-- Garrett

On Thu, 2010-06-03 at 09:06 -0500, David Dyer-Bennet wrote:
 On Wed, June 2, 2010 17:54, Roman Naumenko wrote:
  Recently I talked to a co-worker who manages NetApp storage. We discussed
  size changes for pools in ZFS and aggregates in NetApp.
 
  Some time before that, I had suggested ZFS to a buddy of mine for his new home
  storage server, but he turned it down since there is no expansion
  available for a pool.
 
 I set up my home fileserver with ZFS (in 2006) BECAUSE zfs could expand
 the pool for me, and nothing else I had access to could do that (home
 fileserver, little budget).
 
 My server is currently running with one data pool, three vdevs.  Each of
 the data vdev is a two-way mirror.  I started with one, expanded to two,
 then expanded to three.  Rather than expanding to four when this fills up,
 I'm going to attach a larger drive to the first mirror vdev, and then a
 second one, and then remove the two current drives, thus expanding the
 vdev without ever compromising the redundancy.
 
 My choice of mirrors rather than RAIDZ is based on the fact that I have
 only 8 hot-swap bays (I still think of this as LARGE for a home server;
 the competition, things like the Drobo, tends to have 4 or 5), that I
 don't need really large amounts of storage (after my latest upgrade I'm
 running with 1.2TB of available data space), and that I expected to need
 to expand storage over the life of the system.  With mirror vdevs, I can
 expand them without compromising redundancy even temporarily, by attaching
 the new drives before I detach the old drives; I couldn't do that with
 RAIDZ.  Also, the fact that disk is now so cheap means that 100%
 redundancy is affordable, I don't have to compromise on RAIDZ.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread David Dyer-Bennet

On Thu, June 3, 2010 10:15, Garrett D'Amore wrote:
 Using a stripe of mirrors (RAID0) you can get the benefits of multiple
 spindle performance, easy expansion support (just add new mirrors to the
 end of the raid0 stripe), and 100% data redundancy.   If you can afford
 to pay double for your storage (the cost of mirroring), this is IMO the
 best solution.

Referencing RAID0 here in the context of ZFS is confusing, though.  Are
you suggesting using underlying RAID hardware to create virtual volumes to
then present to ZFS, or what?

 Note that this solution is not quite as resilient against hardware
 failure as raidz2 or raidz3.  While the RAID1+0 solution can tolerate
 multiple drive failures, if both both drives in a mirror fail, you lose
 data.

In a RAIDZ solution, two or more drive failures lose your data.  In a
mirrored solution, losing the WRONG two drives will still lose your data,
but you have some chance of surviving losing a random two drives.  So I
would describe the mirror solution as more resilient.

So going to RAIDZ2 or even RAIDZ3 would be better, I agree.

In an 8-bay chassis, there are other concerns, too.  Do I keep space open
for a hot spare?  There's no real point in a hot spare if you have only
one vdev; that is, 8-drive RAIDZ3 is clearly better than 7-drive RAIDZ2
plus a hot spare.  And putting everything into one vdev means that for any
upgrade I have to replace all 8 drives at once, a financial problem for a
home server.

 If you're clever, you'll also try to make sure each side of the mirror
 is on a different controller, and if you have enough controllers
 available, you'll also try to balance the controllers across stripes.

I did manage to split the mirrors across controllers (I have 6 SATA ports on
the motherboard and I added an 8-port SAS card with SAS-SATA cabling).

 One way to help with that is to leave a drive or two available as a hot
 spare.

 Btw, the above recommendation mirrors what Jeff Bonwick himself (the
 creator of ZFS) has advised on his blog.

I believe that article directly influenced my choice, in fact.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Freddie Cash
On Wed, Jun 2, 2010 at 8:10 PM, Roman Naumenko ro...@naumenko.ca wrote:

 Well, I explained it not very clearly. I meant the size of a raidz array
 can't be changed.
 For sure zpool add can do the job with a pool. Not with a raidz
 configuration.


You can't increase the number of drives in a raidz vdev, no.  Going from a
4-drive raidz1 to a 5-drive raidz1 is currently impossible.  And going from
a raidz1 to a raidz2 vdev is currently impossible.  On the flip side, it's
rare to find a hardware RAID controller that allows this.

But you can increase the storage space available in a  raidz vdev, by
replacing each drive in the raidz vdev with a larger drive.  We just did
this, going from 8x 500 GB drives in a raidz2 vdev, to 8x 1.5 TB drives in a
raidz2 vdev.
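
In outline (pool and device names are hypothetical; one disk at a time, 
letting each resilver complete before starting the next):

# zpool replace tank c1t0d0 c2t0d0
   ... repeat for each of the eight disks ...
# zpool set autoexpand=on tank        (on builds that have the property)
# zpool online -e tank c2t0d0         (or expand each device explicitly)

Once the last small disk has been replaced, the extra capacity of the raidz2 
vdev becomes available.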

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Garrett D'Amore
On Thu, 2010-06-03 at 10:35 -0500, David Dyer-Bennet wrote:
 On Thu, June 3, 2010 10:15, Garrett D'Amore wrote:
  Using a stripe of mirrors (RAID0) you can get the benefits of multiple
  spindle performance, easy expansion support (just add new mirrors to the
  end of the raid0 stripe), and 100% data redundancy.   If you can afford
  to pay double for your storage (the cost of mirroring), this is IMO the
  best solution.
 
 Referencing RAID0 here in the context of ZFS is confusing, though.  Are
 you suggesting using underlying RAID hardware to create virtual volumes to
 then present to ZFS, or what?

RAID0 is basically the default configuration of a ZFS pool -- it's a
concatenation of the underlying vdevs.  In this case the vdevs should
themselves be two-drive mirrors. 

This of course has to be done in the ZFS layer, and ZFS doesn't call it
RAID0, any more than it calls a mirror RAID1, but effectively that's
what they are.

 
  Note that this solution is not quite as resilient against hardware
  failure as raidz2 or raidz3.  While the RAID1+0 solution can tolerate
  multiple drive failures, if both drives in a mirror fail, you lose
  data.
 
 In a RAIDZ solution, two or more drive failures lose your data.  In a
 mirrored solution, losing the WRONG two drives will still lose your data,
 but you have some chance of surviving losing a random two drives.  So I
 would describe the mirror solution as more resilient.
 
 So going to RAIDZ2 or even RAIDZ3 would be better, I agree.

From a data resiliency point, yes, raidz2 or raidz3 offers better
protection.  At a significant performance cost.

Given enough drives, one could probably imagine using raidz3 underlying
vdevs, with RAID0 striping to spread I/O across multiple spindles.  I'm
not sure how well this would perform, but I suspect it would perform
better than straight raidz2/raidz3, but at a significant expense (you'd
need a lot of drives).

 
 In an 8-bay chassis, there are other concerns, too.  Do I keep space open
 for a hot spare?  There's no real point in a hot spare if you have only
 one vdev; that is, 8-drive RAIDZ3 is clearly better than 7-drive RAIDZ2
 plus a hot spare.  And putting everything into one vdev means that for any
 upgrade I have to replace all 8 drives at once, a financial problem for a
 home server.

This is one of the reasons I don't advocate using raidz (any version)
for home use, unless you can't afford the cost in space represented by
mirroring and a hot spare or two.  (The other reason ... for my use at
least... is the performance cost.  I want to use my array to host
compilation workspaces, and for that I would prefer to get the most
performance out of my solution.  I suppose I could add some SSDs... but
I still think multiple spindles are a good option when you can do it.)

In an 8-drive chassis, without any SSDs involved, I'd configure 6 of the
drives as a 3-vdev stripe consisting of mirrors of 2 drives, and I'd
leave the remaining two bays as hot spares.  Btw, using the hot spares
in this way potentially means you can use those bays later to upgrade to
larger drives in the future, without offlining anything and without
taking too much of a performance penalty when you do so.
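
As a sketch with invented device names, that layout would be created with 
something like:

# zpool create tank mirror c1t0d0 c2t0d0 mirror c1t1d0 c2t1d0 \
mirror c1t2d0 c2t2d0 spare c1t3d0 c2t3d0

Each mirror pairs a disk from one controller with a disk from the other, and 
the last two drives become shared hot spares for the pool.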

 
  If you're clever, you'll also try to make sure each side of the mirror
  is on a different controller, and if you have enough controllers
  available, you'll also try to balance the controllers across stripes.
 
 I did manage to split the mirrors accross controllers (I have 6 SATA on
 the motherboard and I added an 8-port SAS card with SAS-SATA cabling).
 
  One way to help with that is to leave a drive or two available as a hot
  spare.
 
  Btw, the above recommendation mirrors what Jeff Bonwick himself (the
  creator of ZFS) has advised on his blog.
 
 I believe that article directly influenced my choice, in fact.

Okay, good. :-)

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Marty Scholes
David Dyer-Bennet wrote:
 My choice of mirrors rather than RAIDZ is based on the fact that I have
 only 8 hot-swap bays (I still think of this as LARGE for a home server;
 the competition, things like the Drobo, tends to have 4 or 5), that I
 don't need really large amounts of storage (after my latest upgrade I'm
 running with 1.2TB of available data space), and that I expected to need
 to expand storage over the life of the system.  With mirror vdevs, I can
 expand them without compromising redundancy even temporarily, by attaching
 the new drives before I detach the old drives; I couldn't do that with
 RAIDZ.  Also, the fact that disk is now so cheap means that 100%
 redundancy is affordable, I don't have to compromise on RAIDZ.

Maybe I have been unlucky too many times doing storage admin in the 90s, but 
simple mirroring still scares me.  Even with a hot spare (you do have one, 
right?) the rebuild window leaves the entire pool exposed to a single failure.

One of the nice things about zfs is that it allows "to each his own."  My home 
server's main pool is 22x 73GB disks in a Sun A5000 configured as RAIDZ3.  Even 
without a hot spare, it takes several failures to get the pool into trouble.

At the same time, there are several downsides to a wide stripe like that, 
including relatively poor iops and longer rebuild windows.  As noted above, 
until bp_rewrite arrives, I cannot change the geometry of a vdev, which kind of 
limits the flexibility.

As a side rant, I still find myself baffled that Oracle/Sun correctly touts the 
benefits of zfs in the enterprise, including tremendous flexibility and 
simplicity of filesystem provisioning and nondisruptive changes to filesystems 
via properties.

These forums are filled with people stating that the enterprise demands simple, 
flexible and nondisruptive filesystem changes, but that no enterprise cares about 
simple, flexible and nondisruptive pool/vdev changes, e.g. changing a vdev 
geometry or evacuating a vdev.  I can't accept that zfs flexibility is critical 
and zpool flexibility is unwanted.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Dennis Clarke

 If you're clever, you'll also try to make sure each side of the mirror
 is on a different controller, and if you have enough controllers
 available, you'll also try to balance the controllers across stripes.

Something like this ?

# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
fibre0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c2t16d0  ONLINE   0 0 0
c5t0d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c5t1d0   ONLINE   0 0 0
c2t17d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c5t2d0   ONLINE   0 0 0
c2t18d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c2t20d0  ONLINE   0 0 0
c5t4d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c2t21d0  ONLINE   0 0 0
c5t6d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c2t19d0  ONLINE   0 0 0
c5t5d0   ONLINE   0 0 0
spares
  c2t22d0AVAIL

errors: No known data errors

However, unlike the bad old days of SVM ( DiskSuite or Solstice Disksuite
or Online Disk Suite etc ) I have no idea what algorithm is used to pick
the hot spare in the event of a failure. I mean, if I had more than one
hotspare there of course. Also, I think the weird order of controllers is
a user mistake on my part. Some of them have c5 listed first and others
have c2 listed first. I don't know if that matters at all however.

I can add mirrors on the fly but I can not ( yet ) remove them. I would
imagine that the algorithm to remove data from vdevs would be fairly
gnarly.
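
For example (the device names here are just made up as unused disks), adding 
another mirror to fibre0 on the fly is a one-liner:

# zpool add fibre0 mirror c2t23d0 c5t7d0

but there is no corresponding command yet to remove a data mirror from the 
pool; only spares, cache devices, and (on newer builds) log devices can be 
removed.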

The item that I find somewhat confusing is how to apply multi-path fibre
devices to a stripe of mirrors. Consider these :

# mpathadm list lu
/dev/rdsk/c4t2004CF9B63D0d0s2
Total Path Count: 2
Operational Path Count: 2
/dev/rdsk/c4t2004CFA4D655d0s2
Total Path Count: 2
Operational Path Count: 2
/dev/rdsk/c4t2004CFA4D2D9d0s2
Total Path Count: 2
Operational Path Count: 2
/dev/rdsk/c4t2004CFBFD4BDd0s2
Total Path Count: 2
Operational Path Count: 2
/dev/rdsk/c4t2004CFA4D3A1d0s2
Total Path Count: 2
Operational Path Count: 2
/dev/rdsk/c4t2004CFA4D2C7d0s2
Total Path Count: 2
Operational Path Count: 2
/scsi_vhci/s...@g5080021ad5d8
Total Path Count: 2
Operational Path Count: 2


Here we have each disk device sitting on two fibre loops :

# mpathadm show lu /dev/rdsk/c4t2004CF9B63D0d0s2
Logical Unit:  /dev/rdsk/c4t2004CF9B63D0d0s2
mpath-support:  libmpscsi_vhci.so
Vendor:  SEAGATE
Product:  ST373405FSUN72G
Revision:  0438
Name Type:  unknown type
Name:  2004cf9b63d0
Asymmetric:  no
Current Load Balance:  round-robin
Logical Unit Group ID:  NA
Auto Failback:  on
Auto Probing:  NA

Paths:
Initiator Port Name:  2103ba2cabc6
Target Port Name:  2104cf9b63d0
Override Path:  NA
Path State:  OK
Disabled:  no

Initiator Port Name:  210100e08b24f056
Target Port Name:  2204cf9b63d0
Override Path:  NA
Path State:  OK
Disabled:  no

Target Ports:
Name:  2104cf9b63d0
Relative ID:  0

Name:  2204cf9b63d0
Relative ID:  0

This is not disk redundency but rather fibre path redundency. When I drop
these guys into a ZPool it looks like this :

NAME STATE READ WRITE CKSUM
fp0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c4t2004CFBFD4BDd0s0  ONLINE   0 0 0
c4t2004CFA4D3A1d0s0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c4t2004CFA4D2D9d0s0  ONLINE   0 0 0
c4t2004CFA4D2C7d0s0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
 

Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread David Dyer-Bennet

On Thu, June 3, 2010 10:50, Marty Scholes wrote:
 David Dyer-Bennet wrote:
 My choice of mirrors rather than RAIDZ is based on the fact that I have
 only 8 hot-swap bays (I still think of this as LARGE for a home server;
 the competition, things like the Drobo, tends to have 4 or 5), that I
 don't need really large amounts of storage (after my latest upgrade I'm
 running with 1.2TB of available data space), and that I expected to need
 to expand storage over the life of the system.  With mirror vdevs, I can
 expand them without compromising redundancy even temporarily, by attaching
 the new drives before I detach the old drives; I couldn't do that with
 RAIDZ.  Also, the fact that disk is now so cheap means that 100%
 redundancy is affordable, I don't have to compromise on RAIDZ.

 Maybe I have been unlucky too many times doing storage admin in the 90s,
 but simple mirroring still scares me.  Even with a hot spare (you do have
 one, right?) the rebuild window leaves the entire pool exposed to a single
 failure.

No hot spare currently.  And now running on 4-year-old disks, too.

For me, mirroring is a big step UP from bare single drives.  That's my
default state.

Of course, I'm a big fan of multiple levels of backup.

 One of the nice things about zfs is that allows, to each his own.  My
 home server's main pool is 22x 73GB disks in a Sun A5000 configured as
 RAIDZ3.  Even without a hot spare, it takes several failures to get the
 pool into trouble.

Yes, it's very flexible, and while there are no doubt useless degenerate
cases here and there, lots of the cases are useful for some environment or
other.

That does seem like rather an extreme configuration.

 At the same time, there are several downsides to a wide stripe like that,
 including relatively poor iops and longer rebuild windows.  As noted
 above, until bp_rewrite arrives, I cannot change the geometry of a vdev,
 which kind of limits the flexibility.

There are a LOT of reasons to want bp_rewrite, certainly.

 As a side rant, I still find myself baffled that Oracle/Sun correctly
 touts the benefits of zfs in the enterprise, including tremendous
 flexibility and simplicity of filesystem provisioning and nondisruptive
 changes to filesystems via properties.

 These forums are filled with people stating that the enterprise demands
 simple, flexible and nondisruptive filesystem changes, but that no enterprise
 cares about simple, flexible and nondisruptive pool/vdev changes, e.g.
 changing a vdev geometry or evacuating a vdev.  I can't accept that zfs
 flexibility is critical and zpool flexibility is unwanted.

We could certainly use that level of pool-equivalent flexibility at work;
we don't currently have it (not ZFS, not high-end enterprise storage
units).

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread David Dyer-Bennet

On Thu, June 3, 2010 10:50, Garrett D'Amore wrote:
 On Thu, 2010-06-03 at 10:35 -0500, David Dyer-Bennet wrote:
 On Thu, June 3, 2010 10:15, Garrett D'Amore wrote:
  Using a stripe of mirrors (RAID0) you can get the benefits of multiple
  spindle performance, easy expansion support (just add new mirrors to the
  end of the raid0 stripe), and 100% data redundancy.   If you can afford
  to pay double for your storage (the cost of mirroring), this is IMO the
  best solution.

 Referencing RAID0 here in the context of ZFS is confusing, though.  Are
 you suggesting using underlying RAID hardware to create virtual volumes to
 then present to ZFS, or what?

 RAID0 is basically the default configuration of a ZFS pool -- its a
 concatenation of the underlying vdevs.  In this case the vdevs should
 themselves be two-drive mirrors.

 This of course has to be done in the ZFS layer, and ZFS doesn't call it
 RAID0, any more than it calls a mirror RAID1, but effectively that's
 what they are.

Kinda mostly, anyway.  I thought we recently had this discussion, and
people were pointing out things like the striping wasn't physically the
same on each drive and such.

  Note that this solution is not quite as resilient against hardware
  failure as raidz2 or raidz3.  While the RAID1+0 solution can tolerate
  multiple drive failures, if both drives in a mirror fail, you lose
  data.

 In a RAIDZ solution, two or more drive failures lose your data.  In a
 mirrored solution, losing the WRONG two drives will still lose your data,
 but you have some chance of surviving losing a random two drives.  So I
 would describe the mirror solution as more resilient.

 So going to RAIDZ2 or even RAIDZ3 would be better, I agree.

From a data resiliency point, yes, raidz2 or raidz3 offers better
 protection.  At a significant performance cost.

The place I care about performance is almost entirely sequential
read/write -- loading programs, and loading and saving large image files. 
I don't know a lot of home users that actually need high IOPS.

 Given enough drives, one could probably imagine using raidz3 underlying
 vdevs, with RAID0 striping to spread I/O across multiple spindles.  I'm
 not sure how well this would perform, but I suspect it would perform
 better than straight raidz2/raidz3, but at a significant expense (you'd
 need a lot of drives).

Might well work that way; it does sound about right.

  In an 8-bay chassis, there are other concerns, too.  Do I keep space open
  for a hot spare?  There's no real point in a hot spare if you have only
  one vdev; that is, 8-drive RAIDZ3 is clearly better than 7-drive RAIDZ2
  plus a hot spare.  And putting everything into one vdev means that for any
  upgrade I have to replace all 8 drives at once, a financial problem for a
  home server.

 This is one of the reasons I don't advocate using raidz (any version)
 for home use, unless you can't afford the cost in space represented by
 mirroring and a hot spare or two.  (The other reason ... for my use at
 least... is the performance cost.  I want to use my array to host
 compilation workspaces, and for that I would prefer to get the most
 performance out of my solution.  I suppose I could add some SSDs... but
 I still think multiple spindles are a good option when you can do it.)

 In an 8 drive chassis, without any SSDs involved,I'd configure 6 of the
 drives as a 3 vdev stripe consisting of mirrors of 2 drives, and I'd
 leave the remaining two bays as hot spares.  Btw, using the hot spares
 in this way potentially means you can use those bays later to upgrade to
 larger drives in the future, without offlining anything and without
 taking too much of a performance penalty when you do so.

And the three 2-way mirrors is exactly where I am right now.  I don't have
hot spares in place, but I have the bays reserved for that use.

In the latest upgrade, I added 4 2.5" hot-swap bays (which got the system
disks out of the 3.5" hot-swap bays).  I have two free, and that's the
form-factor SSDs come in these days, so if I thought it would help I could
add an SSD there.  Have to do quite a bit of research to see which uses
would actually benefit me, and how much.  It's not obvious that either
l2arc or zil on SSD would help my program loading, image file loading, or
image file saving cases that much.  There may be more other stuff than I
really think of though.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Richard Elling
On Jun 3, 2010, at 8:36 AM, Freddie Cash wrote:

 On Wed, Jun 2, 2010 at 8:10 PM, Roman Naumenko ro...@naumenko.ca wrote:
 Well, I explained it not very clearly. I meant the size of a raidz array 
 can't be changed.
 For sure zpool add can do the job with a pool. Not with a raidz configuration.
 
 You can't increase the number of drives in a raidz vdev, no.  Going from a 
 4-drive raidz1 to a 5-drive raidz1 is currently impossible.  And going from a 
 raidz1 to a raidz2 vdev is currently impossible.  On the flip side, it's rare 
 to find a hardware RAID controller that allows this.

AFAIK, and someone please correct me, the only DIY/FOSS RAID 
implementation that allows incremental growing of RAID-5 is LVM.
Of course, that means you're stuck with the RAID-5 write hole.  TANSTAAFL.
 -- richard

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Bob Friesenhahn

On Thu, 3 Jun 2010, David Dyer-Bennet wrote:


In an 8-bay chassis, there are other concerns, too.  Do I keep space open
for a hot spare?  There's no real point in a hot spare if you have only
one vdev; that is, 8-drive RAIDZ3 is clearly better than 7-drive RAIDZ2
plus a hot spare.  And putting everything into one vdev means that for any
upgrade I have to replace all 8 drives at once, a financial problem for a
home server.


It is not so clear to me that an 8-drive raidz3 is clearly better than 
7-drive raidz2 plus a hot spare.  From a maintenance standpoint, I 
think that it is useful to have a spare drive or even an empty spare 
slot so that it is easy to replace a drive without needing to 
physically remove it from the system.  A true hot spare allows 
replacement to start automatically right away if a failure is 
detected.


With only 8 drives, the reliability improvement from raidz3 is 
unlikely to be borne out in practice.  Other potential failure modes 
will completely drown out the on-paper reliability improvement 
provided by raidz3.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Garrett D'Amore
On Thu, 2010-06-03 at 12:03 -0500, Bob Friesenhahn wrote:
 On Thu, 3 Jun 2010, David Dyer-Bennet wrote:
 
  In an 8-bay chassis, there are other concerns, too.  Do I keep space open
  for a hot spare?  There's no real point in a hot spare if you have only
  one vdev; that is, 8-drive RAIDZ3 is clearly better than 7-drive RAIDZ2
  plus a hot spare.  And putting everything into one vdev means that for any
  upgrade I have to replace all 8 drives at once, a financial problem for a
  home server.
 
 It is not so clear to me that an 8-drive raidz3 is clearly better than 
 7-drive raidz2 plus a hot spare.  From a maintenance standpoint, I 
 think that it is useful to have a spare drive or even an empty spare 
 slot so that it is easy to replace a drive without needing to 
 physically remove it from the system.  A true hot spare allows 
 replacement to start automatically right away if a failure is 
 detected.
 
 With only 8-drives, the reliability improvement from raidz3 is 
 unlikely to be borne out in practice.  Other potential failures modes 
 will completely drown out the on-paper reliability improvement 
 provided by raidz3.

I tend to concur.  I think that raidz3 is primarily useful in situations
with either an extremely large number of drives (very large arrays), or
in situations calling for extremely high fault tolerance (think
loss-of-life kinds of applications, or wall-street trading house
applications where downtime is measured in millions of dollars per
minute.)

And in those situations where raidz3 is called for, I think you still
want some pool of hot spares.  (I'm thinking of the kinds of deployments
where the failure rate of drives approaches the ability of the site to
replace them quickly enough -- think very very large data centers with
hundreds or even thousands of drives.)

raidz3 is not, I think, for the typical home user, or even the typical
workgroup server application.  I think I'd prefer raidz with hot
spare(s) over raidz2, even, for a typical situation.  But I view raidz
in all its forms as a kind of compromise between redundancy,
performance, and capacity -- sort of a jack of all trades and master of
none.  With $/Gb as low as they are today, I would be hard pressed to
recommend any of the raidz configurations except in applications calling
for huge amounts of data with no real performance requirements (nearline
backup kinds of applications) and no requirements for expandability.
(Situations where expansion is resolved by purchasing new arrays, rather
than growing storage within an array.)

-- Garrett
 
 Bob


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Garrett D'Amore
On Thu, 2010-06-03 at 08:50 -0700, Marty Scholes wrote:

 Maybe I have been unlucky too many times doing storage admin in the 90s, but 
 simple mirroring still scares me.  Even with a hot spare (you do have one, 
 right?) the rebuild window leaves the entire pool exposed to a single failure.
 
 One of the nice things about zfs is that allows, to each his own.  My home 
 server's main pool is 22x 73GB disks in a Sun A5000 configured as RAIDZ3.  
 Even without a hot spare, it takes several failures to get the pool into 
 trouble.

Perhaps you have been unlucky.  Certainly, there is a window with N+1
redundancy where a single failure leaves the system exposed in the face
of a 2nd fault.  This is a statistics game...   Mirrors made up of
multiple drives are of course substantially more risky than mirrors made
of just drive pairs.  I would strongly discourage multiple drive mirrors
unless the devices underneath the mirror are somehow configured in a way
that provides additional tolerance.  Such as a mirror of raidz devices.
Although, such a configuration would be a poor choice, since you'd take
a big performance penalty.

Of course, you can have more than a two-way mirror, at substantial
increased cost.

So you balance your needs.

RAIDZ2 and RAIDZ3 give N+2 and N+3 fault tolerance, and represent a
compromise weighted to fault tolerance and capacity, at a significant
penalty to performance (and as noted, the ability to increase capacity).

There certainly are applications where this is appropriate.  I doubt
most home users fall into that category.

Given a relatively small number of spindles (the 8 that was quoted), I
prefer RAID 1+0 with hot spares.  If I can invest in 8 drives, with 1TB
drives I can balance I/O across 3 spindles, get 3TB of storage, have
N+1.x tolerance (N+1, plus the ability to take up to two more faults as
long as they do not occur in the same pair of mirrored drives), and I
can easily grow to larger drives (for example the forthcoming 3TB
drives) when need and cost make that move appropriate.

-- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Garrett D'Amore
On Thu, 2010-06-03 at 12:22 -0400, Dennis Clarke wrote:
  If you're clever, you'll also try to make sure each side of the mirror
  is on a different controller, and if you have enough controllers
  available, you'll also try to balance the controllers across stripes.
 
 Something like this ?
 
 # zpool status fibre0
   pool: fibre0
  state: ONLINE
 status: The pool is formatted using an older on-disk format.  The pool can
 still be used, but some features are unavailable.
 action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
 pool will no longer be accessible on older software versions.
  scrub: none requested
 config:
 
 NAME STATE READ WRITE CKSUM
 fibre0   ONLINE   0 0 0
   mirror ONLINE   0 0 0
 c2t16d0  ONLINE   0 0 0
 c5t0d0   ONLINE   0 0 0
   mirror ONLINE   0 0 0
 c5t1d0   ONLINE   0 0 0
 c2t17d0  ONLINE   0 0 0
   mirror ONLINE   0 0 0
 c5t2d0   ONLINE   0 0 0
 c2t18d0  ONLINE   0 0 0
   mirror ONLINE   0 0 0
 c2t20d0  ONLINE   0 0 0
 c5t4d0   ONLINE   0 0 0
   mirror ONLINE   0 0 0
 c2t21d0  ONLINE   0 0 0
 c5t6d0   ONLINE   0 0 0
   mirror ONLINE   0 0 0
 c2t19d0  ONLINE   0 0 0
 c5t5d0   ONLINE   0 0 0
 spares
   c2t22d0AVAIL
 
 errors: No known data errors

That looks like a good configuration to me!

 
 However, unlike the bad old days of SVM ( DiskSuite or Solstice Disksuite
 or Online Disk Suite etc ) I have no idea what algorithm is used to pick
 the hot spare in the event of a failure. I mean, if I had more than one
 hotspare there of course. Also, I think the weird order of controllers is
 a user mistake on my part. Some of them have c5 listed first and others
 have c2 listed first. I don't know if that matters at all however.

I don't think the order matters.  It certainly won't make a difference
for write, since you have to use both sides of the mirror.  It *could*
make a difference for reading... but I suspect that zfs will try to
sufficiently balance things out that any difference in ordering will be
lost in the noise.

The hotspare replacement shouldn't matter all that much... except when
you're using them to upgrade to bigger drives.  (Then just use a single
hot spare to force the selection.)  

 
 I can add mirrors on the fly but I can not ( yet ) remove them. I would
 imagine that the algorithm to remove data from vdevs would be fairly
 gnarly.

Indeed.  However, you shouldn't need to remove vdevs.  With redundancy,
you can resilver to a hot spare, and then remove the drive, but
ultimately, you can't condense the data onto fewer drives. You have to
accept when you configure your array that, modulo hot spares, your pool
will always consume the same number of spindles.

 
 The item that I find somewhat confusing is how to apply multi-path fibre
 devices to a stripe of mirrors. Consider these :
 


 
 This is not disk redundency but rather fibre path redundency. When I drop
 these guys into a ZPool it looks like this :
 
 NAME STATE READ WRITE CKSUM
 fp0  ONLINE   0 0 0
   mirror ONLINE   0 0 0
 c4t2004CFBFD4BDd0s0  ONLINE   0 0 0
 c4t2004CFA4D3A1d0s0  ONLINE   0 0 0
   mirror ONLINE   0 0 0
 c4t2004CFA4D2D9d0s0  ONLINE   0 0 0
 c4t2004CFA4D2C7d0s0  ONLINE   0 0 0
   mirror ONLINE   0 0 0
 c4t2004CFA4D655d0s0  ONLINE   0 0 0
 c4t2004CF9B63D0d0s0  ONLINE   0 0 0

The above configuration looks good to me.

 
 So the manner in which any given IO transaction gets to the zfs filesystem
 just gets ever more complicated and convoluted and it makes me wonder if I
 am tossing away performance to get higher levels of safety.

If you're using multipathing, then you get path load balancing
automatically, and you can pretty much ignore the controller balancing
issue, as long as you use the mpxio (scsi_vhci) path.  mpxio should take
care of ensuring that I/O is balanced across ports for you; you just
need to make sure that you are balancing *spindles* properly.

-- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Garrett D'Amore
On Thu, 2010-06-03 at 11:49 -0500, David Dyer-Bennet wrote:
 hot spares in place, but I have the bays reserved for that use.
 
 In the latest upgrade, I added 4 2.5 hot-swap bays (which got the system
 disks out of the 3.5 hot-swap bays).  I have two free, and that's the
 form-factor SSDs come in these days, so if I thought it would help I could
 add an SSD there.  Have to do quite a bit of research to see which uses
 would actually benefit me, and how much.  It's not obvious that either
 l2arc or zil on SSD would help my program loading, image file loading, or
 image file saving cases that much.  There may be more other stuff than I
 really think of though.

It really depends on the working sets these programs deal with.

zil is useful primarily when doing lots of writes, especially lots of
writes to small files or to data scattered throughout a file.  I view it
as a great solution for database acceleration, and for accelerating the
filesystems I use for hosting compilation workspaces.  (In retrospect,
since by definition the results of compilation are reproducible, maybe I
should just turn off synchronous writes for build workspaces... provided
that they do not contain any modifications to the sources themselves.
I'm going to have to play with this.)

l2arc is useful for data that is read back frequently but is too large
to fit in buffer cache.  I can imagine that it would be useful for
hosting storage associated with lots of  programs that are called
frequently. You can think of it as a logical extension of the buffer
cache in this regard... if your working set doesn't fit in RAM, then
l2arc can prevent going back to rotating media.

All other things being equal, I'd increase RAM before I'd worry too much
about l2arc.  The exception to that would be if I knew I had working
sets that couldn't possibly fit in RAM... 160GB of SSD is a *lot*
cheaper than 160GB of RAM. :-)
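
Mechanically, attaching either kind of SSD to an existing pool is a one-liner 
(pool and device names invented for the example):

# zpool add tank log c3t0d0       (dedicated log device for the ZIL)
# zpool add tank cache c3t1d0     (L2ARC device)

Both can be added to a live pool, and the log device can be mirrored if you 
want it to survive a device failure.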

- Garrett


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread David Dyer-Bennet

On Thu, June 3, 2010 13:04, Garrett D'Amore wrote:
 On Thu, 2010-06-03 at 11:49 -0500, David Dyer-Bennet wrote:
 hot spares in place, but I have the bays reserved for that use.

 In the latest upgrade, I added 4 2.5" hot-swap bays (which got the system
 disks out of the 3.5" hot-swap bays).  I have two free, and that's the
 form-factor SSDs come in these days, so if I thought it would help I could
 add an SSD there.  Have to do quite a bit of research to see which uses
 would actually benefit me, and how much.  It's not obvious that either
 l2arc or zil on SSD would help my program loading, image file loading, or
 image file saving cases that much.  There may be more other stuff than I
 really think of though.

 It really depends on the working sets these programs deal with.

 zil is useful primarily when doing lots of writes, especially lots of
 writes to small files or to data scattered throughout a file.  I view it
 as a great solution for database acceleration, and for accelerating the
 filesystems I use for hosting compilation workspaces.  (In retrospect,
 since by definition the results of compilation are reproducible, maybe I
 should just turn off synchronous writes for build workspaces... provided
 that they do not contain any modifications to the sources themselves.
 I'm going to have to play with this.)

I suspect there are more cases here than I immediately think of.  For
example, sitting here thinking, I wonder if the web cache would benefit a
lot?  And all those email files?

RAW files from my camera are 12-15MB, and the resulting Photoshop files
are around 50MB (depending on compression, and they get bigger fast if I
add layers).  Those aren't small, and I don't read the same thing over and
over lots.

For build spaces, definitely should be reproducible from source.  A
classic production build starts with checking out a tagged version from
source control, and builds from there.

 l2arc is useful for data that is read back frequently but is too large
 to fit in buffer cache.  I can imagine that it would be useful for
 hosting storage associated with lots of  programs that are called
 frequently. You can think of it as a logical extension of the buffer
 cache in this regard... if your working set doesn't fit in RAM, then
 l2arc can prevent going back to rotating media.

I don't think I'm going to benefit much from this.

 All other things being equal, I'd increase RAM before I'd worry too much
 about l2arc.  The exception to that would be if I knew I had working
 sets that couldn't possibly fit in RAM... 160GB of SSD is a *lot*
 cheaper than 160GB of RAM. :-)

I just did increase RAM, same upgrade as the 2.5 bays and the additional
controller and the third mirrored vdev.  I increased it all the way to
4GB!  And I can't increase it further feasibly (4GB sticks of ECC RAM
being hard to find and extremely pricey; plus I'd have to displace some of
my existing memory).

Since this is a 2006 system, in another couple of years it'll be time to
replace MB and processor and memory, and I'm sure it'll have a lot more
memory next time.

I'm desperately waiting for Solaris 2010.$Q2 (Q2 since it was pointed
out last time that Spring was wrong on half the Earth), since I hope it
will resolve my backup problems so I can get incremental backups happening
nightly (intention is to use zfs send/receive with incremental replication
streams, to keep external drives up-to-date with data and all snapshots). 
The oldness of the system and especially the drives makes this more
urgent, though of course it's important in general.  I do manage a full
backup that completes now and then, anyway, and they'll complete overnight
if they don't hang. Problem is, if they hang, have to reboot the Solaris
box and every Windows box using it.
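
(For concreteness, a minimal sketch of such a nightly incremental run, with
hypothetical pool, snapshot, and backup-pool names:

  zfs snapshot -r tank@2010-06-04
  zfs send -R -I tank@2010-06-03 tank@2010-06-04 | zfs receive -Fdu backup
  zpool export backup    # so the external drive can be detached cleanly

where tank@2010-06-03 is the most recent snapshot already on the backup
pool.)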

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread David Dyer-Bennet

On Thu, June 3, 2010 12:03, Bob Friesenhahn wrote:
 On Thu, 3 Jun 2010, David Dyer-Bennet wrote:

 In an 8-bay chassis, there are other concerns, too.  Do I keep space
 open
 for a hot spare?  There's no real point in a hot spare if you have only
 one vdev; that is, 8-drive RAIDZ3 is clearly better than 7-drive RAIDZ2
 plus a hot spare.  And putting everything into one vdev means that for
 any
 upgrade I have to replace all 8 drives at once, a financial problem for
 a
 home server.

 It is not so clear to me that an 8-drive raidz3 is clearly better than
 7-drive raidz2 plus a hot spare.  From a maintenance standpoint, I
 think that it is useful to have a spare drive or even an empty spare
 slot so that it is easy to replace a drive without needing to
 physically remove it from the system.  A true hot spare allows
 replacement to start automatically right away if a failure is
 detected.

But is having a RAIDZ2 drop to single redundancy, with replacement
starting instantly, actually as good or better than having a RAIDZ3 drop
to double redundancy, with actual replacement happening later?  The
degraded state of the RAIDZ3 has the same redundancy as the healthy
state of the RAIDZ2.

Certainly having a spare drive bay to play with is often helpful; though
the scenarios that most immediately spring to mind are all mirror-related
and hence don't apply here.

 With only 8-drives, the reliability improvement from raidz3 is
 unlikely to be borne out in practice.  Other potential failures modes
 will completely drown out the on-paper reliability improvement
 provided by raidz3.

I wouldn't give up much of anything to add Z3 on 8 drives, no.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Frank Cusack

On 6/2/10 11:10 PM -0400 Roman Naumenko wrote:

Well, I didn't explain it very clearly. I meant that the size of a raidz
array can't be changed.
For sure, zpool add can do the job with a pool, but not with a raidz
configuration.


Well, in that case it's invalid to compare against Netapp, since they
can't do it either (seems to be the consensus on this list).  Neither
zfs nor Netapp (nor any product) is really designed to handle adding
one drive at a time.  Normally you have to add an entire shelf, and
if you're doing that it's better to add a new vdev to your pool.


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Frank Cusack

On 6/3/10 8:45 AM +0200 Juergen Nickelsen wrote:

Richard Elling rich...@nexenta.com writes:


And some time before I had suggested to a my buddy zfs for his new
home storage server, but he turned it down since there is no
expansion available for a pool.


Heck, let him buy a NetApp :-)


Definitely a possibility, given the availability and pricing of
oldish NetApp hardware on eBay.


Not really.  The software license is invalid on resale, and you can't replace
a failed drive with a generic drive, so at some point you must buy an
Ontap license = $$$.


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread A Darren Dunham
On Thu, Jun 03, 2010 at 12:40:34PM -0700, Frank Cusack wrote:
 On 6/3/10 12:06 AM -0400 Roman Naumenko wrote:
 I think there is a difference. Just quickly checked netapp site:
 
 Adding new disks to a RAID group: If a volume has more than one RAID
 group, you can specify the RAID group to which you are adding disks.
 
 hmm that's a surprising feature to me.

It's always been possible with Netapp.  Back in the pre-5.0 (maybe it
was pre-4.0) days, an OnTAP device only had one raid group and one
filesystem/volume.  All you could do was expand it, not add additional
raid groups or additional volumes.  When the other features were added,
the ability to expand a raid group was not removed.

 I remember, and this was a few years back but I don't see why it would
 be any different now, we were trying to add drives 1-2 at a time to
 medium-sized arrays (don't buy the disks until we need them, to hold
 onto cash), and the Netapp performance kept going down down down.  We
 eventually had to borrow an array from Netapp to copy our data onto
 to rebalance.  Netapp told us explicitly, make sure to add an entire
 shelf at a time (and a new raid group, obviously, don't extend any
 existing group).

Yup, that's absolutely the best way to do it.  Otherwise, all your
writes will land on the one or two new disks, creating hot spots until you
can rebalance your data, and that could take a long time.  I'm pretty
sure that in the distant past they had no online rebalancer.  Nowadays
there is one, but it's not particularly speedy.

Think of adding a mirror pair to a large, nearly-full zpool.  The same
thing will happen.

-- 
Darren


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Marion Hakanson
frank+lists/z...@linetwo.net said:
 I remember, and this was a few years back but I don't see why it would be any
 different now, we were trying to add drives 1-2 at a time to medium-sized
 arrays (don't buy the disks until we need them, to hold onto cash), and the
 Netapp performance kept going down down down.  We eventually had to borrow an
 array from Netapp to copy our data onto to rebalance.  Netapp told us
 explicitly, make sure to add an entire shelf at a time (and a new raid group,
 obviously, don't extend any existing group). 

The advent of aggregates fixed that problem.  Used to be that a raid-group
belonged to only one volume.  Now multiple flex-vols (even tiny ones) share
all the spindles (and parity drives) on their aggregate, and you can rebalance
after adding drives without having to manually move/copy existing data.  Pretty
slick, if you can afford the price.

Regards,

Marion




Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Victor Latushkin
On Jun 3, 2010, at 3:16 AM, Erik Trimble wrote:
 Expanding a RAIDZ (which, really, is the only thing that zfs can't do right now,
 w/r/t adding disks) requires the Block Pointer (BP) Rewrite functionality
 before it can get implemented.

Strictly speaking, BP rewrite is not required to expand a RAID-Z, though it is
required in order to rewrite all existing blocks onto the expanded VDEV so that
the newly attached space becomes usable.

regards
victor


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread Bob Friesenhahn

On Thu, 3 Jun 2010, David Dyer-Bennet wrote:


But is having a RAIDZ2 drop to single redundancy, with replacement
starting instantly, actually as good or better than having a RAIDZ3 drop
to double redundancy, with actual replacement happening later?  The
degraded state of the RAIDZ3 has the same redundancy as the healthy
state of the RAIDZ2.


Mathematically, I am sure that raidz3 is better.  Redundancy 
statistics are not the only consideration though.  Raidz3 will write 
slower and resilver slower.  If the power supply produces a surge and 
fries all the drives, then raidz3 will not help more than raidz2. 
Once the probability of failure due to unrelated drive failures 
becomes small enough, other factors related to the system become the 
dominant ones.  The power supply could surge, memory can return wrong 
data (even with ECC), the OS kernel can have a bug, or a tree can fall 
on the computer during a storm.
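
To put a rough number on the first point (a back-of-envelope only, assuming
independent drive failures, each with probability q of occurring inside a
common vulnerability window, and ignoring resilver dynamics and the hot
spare entirely), the leading-order loss probabilities are

\[
P_{\text{loss}}(\text{raidz2, 7 disks}) \approx \binom{7}{3} q^{3} = 35\,q^{3},
\qquad
P_{\text{loss}}(\text{raidz3, 8 disks}) \approx \binom{8}{4} q^{4} = 70\,q^{4},
\]

a ratio of about 2q, so on paper raidz3 wins by orders of magnitude for small
q, which is exactly why the other failure modes end up dominating.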


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread David Magda

On Jun 3, 2010, at 13:36, Garrett D'Amore wrote:

Perhaps you have been unlucky.  Certainly, there is a window with N+1
redundancy where a single failure leaves the system exposed in the face
of a 2nd fault.  This is a statistics game...


It doesn't even have to be a drive failure, but an unrecoverable read
error.




[zfs-discuss] one more time: pool size changes

2010-06-02 Thread Roman Naumenko
Recently I talked to a co-worker who manages NetApp storage.  We discussed size
changes for pools in zfs and aggregates in NetApp.

And some time before that I had suggested zfs to a buddy for his new home
storage server, but he turned it down since there is no expansion available
for a pool.

And he really wants to be able to add a drive or two to an existing pool.
Yes, there are ways to expand storage to some extent without rebuilding it,
like replacing disks with larger ones.  Not enough for a typical home user, I
would say.

And this might be important for corporate use too, though frankly speaking I
doubt many administrators use it in a DC environment.

Nevertheless, NetApp appears to have such a feature, as I learned from my
co-worker.  It works with some restrictions (you have to zero disks before
adding, and rebalance the aggregate afterwards, still without perfect
distribution) - but Ontap is able to do aggregate expansion nevertheless.

So, my question is: what prevents introducing the same for zfs at present?
Is this because of the design of zfs, or is there simply no demand for it in
the community?

My understanding is that at present there are no plans to introduce it.

--Regards,
Roman Naumenko
ro...@naumenko.com
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] one more time: pool size changes

2010-06-02 Thread Frank Cusack

On 6/2/10 3:54 PM -0700 Roman Naumenko wrote:

And some time before I had suggested to a my buddy zfs for his new home
storage server, but he turned it down since there is no expansion
available for a pool.


That's incorrect.  zfs pools can be expanded at any time.  AFAIK zfs has
always had this capability.


Nevertheless, NetApp appears to have such feature as I learned from my
co-worker. It works with some restrictions (you have to zero disks before
adding, and rebalance the aggregate after and still without perfect
distribution) - but Ontap is able to do aggregates expansion
nevertheless.


I wasn't aware that Netapp could rebalance.  Is that a true Netapp
feature, or is it a matter of copying the data manually?  zfs doesn't
have a cleaner process that rebalances, so for zfs you would have to
copy the data to rebalance the pool.  I certainly wouldn't make my
Netapp/zfs decision based on that (alone).
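
For example (hypothetical dataset names), rewriting a dataset is the crude
way to spread existing data across newly added vdevs:

  zfs snapshot tank/data@rebalance
  zfs send tank/data@rebalance | zfs receive tank/data.new
  # verify the copy, then swap the names and destroy the old dataset
  zfs rename tank/data tank/data.old
  zfs rename tank/data.new tank/data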

-frank


Re: [zfs-discuss] one more time: pool size changes

2010-06-02 Thread Freddie Cash
On Wed, Jun 2, 2010 at 3:54 PM, Roman Naumenko ro...@naumenko.ca wrote:

 Recently I talked to a co-worker who manages NetApp storages. We discussed
 size changes for pools in zfs and aggregates in NetApp.

 And some time before I had suggested to a my buddy zfs for his new home
 storage server, but he turned it down since there is no expansion available
 for a pool.


There are two ways to increase the storage space available to a ZFS pool:
  1.  add more vdevs to the pool
  2.  replace each drive in a vdev with a larger drive

The first option expands the width of the pool, adds redundancy to the
pool, and (should) increase the performance of the pool.  This is very
simple to do, but requires having the drive bays and/or drive connectors
available.  (In fact, any time you add a vdev to a pool, including when you
first create it, you go through this process.)

The second option increases the total storage of the pool, without
changing any of the redundancy of the pool.  Performance may or may not
increase.  Once all the drives in a vdev are replaced, the storage space
becomes available to the pool (depending on the ZFS version, you may need to
export/import the pool for the space to become available).
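
In command terms (hypothetical pool and device names), the two options look
roughly like this:

  # option 1: widen the pool with another mirror vdev
  zpool add tank mirror c2t0d0 c2t1d0

  # option 2: replace each drive in an existing vdev with a bigger one,
  # one at a time, letting each resilver finish before starting the next
  zpool replace tank c1t0d0 c3t0d0
  zpool replace tank c1t1d0 c3t1d0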

We've used both of the above quite successfully, both at home and at work.

Not sure what your buddy was talking about.  :)

-- 
Freddie Cash
fjwc...@gmail.com


Re: [zfs-discuss] one more time: pool size changes

2010-06-02 Thread Erik Trimble

Roman Naumenko wrote:

Recently I talked to a co-worker who manages NetApp storages. We discussed size 
changes for pools in zfs and aggregates in NetApp.

And some time before I had suggested to a my buddy zfs for his new home storage server, but he turned it down since there is no expansion available for a pool. 

And he really wants to be able to add a drive or couple to an existing pool. Yes, there are ways to expand storage to some extent without rebuilding it. Like replacing disk with larger ones. Not enough for a typical home user I would say. 

And this is might be an important for corporate too. Frankly speaking I doubt there are many administrators use it in DC environment. 

Nevertheless, NetApp appears to have such feature as I learned from my co-worker. It works with some restrictions (you have to zero disks before adding, and rebalance the aggregate after and still without perfect distribution) - but Ontap is able to do aggregates expansion nevertheless. 


So, my question is: what does prevent to introduce the same for zfs at present 
time? Is this because of the design of zfs, or there is simply no demand for it 
in community?

My understanding is that at present time there are no plans to introduce it.

--Regards,
Roman Naumenko
ro...@naumenko.com
  


Expanding a RAIDZ (which, really, is the only thing that zfs can't do right
now, w/r/t adding disks) requires the Block Pointer (BP) Rewrite
functionality before it can get implemented.


We've been promised BP rewrite for a while, but I have no visibility as
to where development on it is in the schedule.


Fortunately, several other things also depend on BP rewrite (e.g.  
shrinking a pool (removing vdevs), efficient defragmentation/compaction, 
etc.).


So, while resizing a raidZ device isn't really high on the list of
things to do, the fundamental building block which would allow it to
happen is very much important for Oracle.  And, once BP rewrite is
available, I suspect that there might be a raidZ resize contribution
from one of the non-Oracle folks.  Or maybe even someone like me (who's
not a ZFS developer inside Oracle, but I play one on TV...)



Dev guys - where are we on BP rewrite?


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)



Re: [zfs-discuss] one more time: pool size changes

2010-06-02 Thread Richard Elling
On Jun 2, 2010, at 4:08 PM, Freddie Cash wrote:

 On Wed, Jun 2, 2010 at 3:54 PM, Roman Naumenko ro...@naumenko.ca wrote:
 Recently I talked to a co-worker who manages NetApp storages. We discussed 
 size changes for pools in zfs and aggregates in NetApp.
 
 And some time before I had suggested to a my buddy zfs for his new home 
 storage server, but he turned it down since there is no expansion available 
 for a pool.
 
 There are two ways to increase the storage space available to a ZFS pool:
   1.  add more vdevs to the pool
   2.  replace each drive in a vdev with a larger drive

  3. grow a LUN and export/import (old releases) or toggle autoexpand=on (later
releases)
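
For example, on a hypothetical pool after growing the LUN under c0t0d0:

  zpool set autoexpand=on tank     # later releases
  zpool online -e tank c0t0d0      # tell ZFS to use the new size
  # on older releases: zpool export tank ; zpool import tank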

 -- richard

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/






Re: [zfs-discuss] one more time: pool size changes

2010-06-02 Thread Richard Elling
On Jun 2, 2010, at 3:54 PM, Roman Naumenko wrote:
 Recently I talked to a co-worker who manages NetApp storages. We discussed 
 size changes for pools in zfs and aggregates in NetApp.
 
 And some time before I had suggested to a my buddy zfs for his new home 
 storage server, but he turned it down since there is no expansion available 
 for a pool. 

Heck, let him buy a NetApp :-)

 And he really wants to be able to add a drive or couple to an existing pool. 
 Yes, there are ways to expand storage to some extent without rebuilding it. 
 Like replacing disk with larger ones. Not enough for a typical home user I 
 would say. 

Why not? I do this quite often. Growing is easy, shrinking is more challenging.

 And this is might be an important for corporate too. Frankly speaking I doubt 
 there are many administrators use it in DC environment. 
 
 Nevertheless, NetApp appears to have such feature as I learned from my 
 co-worker. It works with some restrictions (you have to zero disks before 
 adding, and rebalance the aggregate after and still without perfect 
 distribution) - but Ontap is able to do aggregates expansion nevertheless. 
 
 So, my question is: what does prevent to introduce the same for zfs at 
 present time? Is this because of the design of zfs, or there is simply no 
 demand for it in community?

It's been there since 2005: the zpool subcommand add.
 -- richard

 
 My understanding is that at present time there are no plans to introduce it.
 
 --Regards,
 Roman Naumenko
 ro...@naumenko.com

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/






Re: [zfs-discuss] one more time: pool size changes

2010-06-02 Thread Brandon High
On Wed, Jun 2, 2010 at 3:54 PM, Roman Naumenko ro...@naumenko.ca wrote:
 And some time before I had suggested to a my buddy zfs for his new home 
 storage server, but he turned it down since there is no expansion available 
 for a pool.

There's no expansion for aggregates in OnTap, either. You can add more
disks (as a raid-dp or mirror set) to an existing aggr, but you can
also add more vdevs (as raidz or mirrors) to a zpool too.

 And he really wants to be able to add a drive or couple to an existing pool. 
 Yes, there are ways to expand storage to some extent without rebuilding it. 
 Like replacing disk with larger ones. Not enough for a typical home user I 
 would say.

You can do this. 'zpool add'

 Nevertheless, NetApp appears to have such feature as I learned from my 
 co-worker. It works with some restrictions (you have to zero disks before 
 adding, and rebalance the aggregate after and still without perfect 
 distribution) - but Ontap is able to do aggregates expansion nevertheless.

Yeah, you can add to an aggr, but you can't add to a raid-dp set. It's
the same as ZFS.

ZFS doesn't require that you zero disks, and there is no rebalancing.
As more data is written to the pool, however, it will become more
balanced.

 So, my question is: what does prevent to introduce the same for zfs at 
 present time? Is this because of the design of zfs, or there is simply no 
 demand for it in community?

 My understanding is that at present time there are no plans to introduce it.

Rebalancing depends on bp_rewrite, which is vaporware still. There has
been discussion of it for a while but no implementation that I know
of.

Once the feature is added, it will be possible to add or remove
devices from a zpool or vdev, something that OnTap can't do.

-B

-- 
Brandon High : bh...@freaks.com