Re: [zfs-discuss] best migration path from Solaris 10

2011-03-20 Thread Fajar A. Nugraha
On Sun, Mar 20, 2011 at 4:05 AM, Pawel Jakub Dawidek p...@freebsd.org wrote:
 On Fri, Mar 18, 2011 at 06:22:01PM -0700, Garrett D'Amore wrote:
 Newer versions of FreeBSD have newer ZFS code.

 Yes, we are at v28 at this point (the latest open-source version).

 That said, ZFS on FreeBSD is kind of a 2nd class citizen still. [...]

 That's actually not true. There are more FreeBSD committers working on
 ZFS than on UFS.

How is the performance of ZFS under FreeBSD? Is it comparable to that
in Solaris, or still slower due to some needed compatibility layer?

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] A resilver record?

2011-03-20 Thread Ian Collins
 Has anyone seen a resilver longer than this for a 500G drive in a 
raidz2 vdev?


scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20 
19:57:37 2011

  c0t0d0  ONLINE   0 0 0  769G resilvered

and I told the client it would take 3 to 4 days!

:)

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A resilver record?

2011-03-20 Thread Roy Sigurd Karlsbakk
 Has anyone seen a resilver longer than this for a 500G drive in a
 raidz2 vdev?
 
 scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20
 19:57:37 2011
 c0t0d0 ONLINE 0 0 0 769G resilvered
 
 and I told the client it would take 3 to 4 days!

It all depends on the number of drives in the VDEV(s), traffic patterns during 
resilver, VDEV fill, speed of the drives, etc. Still, close to 6 days is a lot. Can 
you detail your configuration?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum is presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases, adequate and relevant synonyms exist 
in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A resilver record?

2011-03-20 Thread taemun
769G resilvered on a 500G drive? I'm guessing there was a whole bunch of
activity (and probably snapshot creation) happening alongside the resilver.

On 20 March 2011 18:57, Ian Collins i...@ianshome.com wrote:

  Has anyone seen a resilver longer than this for a 500G drive in a raidz2
 vdev?

 scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20
 19:57:37 2011
  c0t0d0  ONLINE   0 0 0  769G resilvered

 and I told the client it would take 3 to 4 days!

 :)

 --
 Ian.

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] best migration path from Solaris 10

2011-03-20 Thread Fred Liu
Probably we will need to place a tag before ZFS -- Opensource-ZFS or Oracle-ZFS -- 
after the Solaris 11 release.
If that happens, these two ZFSes will definitely evolve in different directions.
BTW, did Oracle unveil the actual release date? We are also at the 
crossroads... 

Thanks.

Fred

 -Original Message-
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Fajar A. Nugraha
 Sent: Sunday, March 20, 2011 14:55
 To: Pawel Jakub Dawidek
 Cc: openindiana-disc...@openindiana.org; zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] best migration path from Solaris 10
 
 On Sun, Mar 20, 2011 at 4:05 AM, Pawel Jakub Dawidek p...@freebsd.org
 wrote:
  On Fri, Mar 18, 2011 at 06:22:01PM -0700, Garrett D'Amore wrote:
  Newer versions of FreeBSD have newer ZFS code.
 
  Yes, we are at v28 at this point (the latest open-source version).
 
  That said, ZFS on FreeBSD is kind of a 2nd class citizen still. [...]
 
  That's actually not true. There are more FreeBSD committers working
 on
  ZFS than on UFS.
 
 How is the performance of ZFS under FreeBSD? Is it comparable to that
 in Solaris, or still slower due to some needed compatibility layer?
 
 --
 Fajar
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] best migration path from Solaris 10

2011-03-20 Thread Joerg Schilling
Fred Liu fred_...@issi.com wrote:

 Probably we will need to place a tag before ZFS -- Opensource-ZFS or Oracle-ZFS -- 
 after the Solaris 11 release.
 If that happens, these two ZFSes will definitely evolve in different 
 directions.
 BTW, did Oracle unveil the actual release date? We are also at the 
 crossroads... 

The long-term acceptance of ZFS depends on how Oracle behaves once the 
announced Solaris 11 is released. If they don't open-source the related ZFS code, 
they will harm the future of ZFS. If they do open-source it again, there is still a 
problem with syncing the ZFS versions from the OSS OpenSolaris continuation 
projects.

The revision number introduced by Sun is only useful if there is no more than 
a single entity that introduces new features.

For a reliable future for distributed ZFS development, we would need 
something like the POSIX method for introducing tar extensions:

a combination of a textual name for the entity that introduced the
feature and a textual name for the feature.

e.g. SCHILY-zfs-encryption
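
As a purely hypothetical sketch of how such entity-prefixed feature tags could be 
checked (the tag names and the check below are illustrative only, not an existing 
mechanism), in Python:

# Hypothetical sketch: features identified by "<ENTITY>-<subsystem>-<feature>"
# tags instead of a single global pool version number.

def parse_feature_tag(tag):
    # e.g. "SCHILY-zfs-encryption" -> ("SCHILY", "zfs", "encryption")
    entity, subsystem, feature = tag.split("-", 2)
    return entity, subsystem, feature

# Tags a (hypothetical) pool claims to use, and tags this implementation
# understands; independent vendors can add features without coordinating
# a shared version number.
pool_features = {"SCHILY-zfs-encryption", "EXAMPLE-zfs-fastresilver"}
supported = {"SCHILY-zfs-encryption"}

unknown = pool_features - supported
if unknown:
    print("pool uses unsupported features: " + ", ".join(sorted(unknown)))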

Jörg

-- 
 EMail:jo...@schily.net  (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] best migration path from Solaris 10

2011-03-20 Thread David Magda
On Mar 20, 2011, at 09:26, Joerg Schilling wrote:

 The long-term acceptance of ZFS depends on how Oracle behaves once the 
 announced Solaris 11 is released. If they don't open-source the related ZFS code, 
 they will harm the future of ZFS. If they do open-source it again, there is still a 
 problem with syncing the ZFS versions from the OSS OpenSolaris continuation 
 projects.

For a while Apple was considering it, and if Ellison and Jobs could come to an 
agreement, it would certainly become very popular very quickly.

Apple probably ships more UNIX(tm) devices than any other  vendor (often over 
3M units in a quarter). Using revenue as a metric gives similar results. And 
who says the Unix workstation market is dead? :)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A resilver record?

2011-03-20 Thread Roy Sigurd Karlsbakk
  It all depends on the number of drives in the VDEV(s), traffic
  patterns during resilver, VDEV fill, speed of the drives, etc. Still,
  close to 6 days is a lot. Can you detail your configuration?
 
 How many times do we have to rehash this? The speed of resilver is
 dependent on the amount of data, the distribution of data on the
 resilvering
 device, speed of the resilvering device, and the throttle. It is NOT
 dependent
 on the number of drives in the vdev.

Thanks for clearing this up - I've been told large VDEVs lead to long resilver 
times, but then, I guess that was wrong.

BTW, after replacing some 2TB drives with 3TB ones in three VDEVs that were 95% 
full at the time, resilver times dropped by 30%, so I guess very full VDEVs 
aren't much fun even on the resilver side.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum is presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases, adequate and relevant synonyms exist 
in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] best migration path from Solaris 10

2011-03-20 Thread David Magda
On Mar 20, 2011, at 14:33, Garrett D'Amore wrote:

 I hear from reliable sources that Apple is not doing anything with ZFS,
 so I would not look there for leadership.

Given that one of the prominent (?) file system guys at Apple left to form his 
own ZFS company, I figured that was the case even before you stated the above:

http://tinyurl.com/4jznw48
http://arstechnica.com/apple/news/2011/03/how-zfs-is-slowly-making-its-way-to-mac-os-x.ars

The ZFS Working Group is awesome news. I hope to hear of a bright future for 
ZFS on all operating systems.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A resilver record?

2011-03-20 Thread David Magda
On Mar 20, 2011, at 14:24, Roy Sigurd Karlsbakk wrote:

 It all depends on the number of drives in the VDEV(s), traffic
 patterns during resilver, VDEV fill, speed of the drives, etc. Still,
 close to 6 days is a lot. Can you detail your configuration?
 
 How many times do we have to rehash this? The speed of resilver is
 dependent on the amount of data, the distribution of data on the
 resilvering device, speed of the resilvering device, and the throttle. It is 
 NOT
 dependent on the number of drives in the vdev.
 
 Thanks for clearing this up - I've been told large VDEVs lead to long 
 resilver times, but then, I guess that was wrong.

There was a thread (Suggested RaidZ configuration...) a little while back 
where the topic of IOps and resilver time came up:

http://mail.opensolaris.org/pipermail/zfs-discuss/2010-September/thread.html#44633

I think this message by Erik Trimble is a good summary:

 Scenario 1:I have 5 1TB disks in a raidz1, and I assume I have 128k slab 
 sizes.  Thus, I have 32k of data for each slab written to each disk. (4x32k 
 data + 32k parity for a 128k slab size).  So, each IOPS gets to reconstruct 
 32k of data on the failed drive.   It thus takes about 1TB/32k = 31e6 IOPS to 
 reconstruct the full 1TB drive.
 
 Scenario 2:I have 10 1TB drives in a raidz1, with the same 128k slab 
 sizes.  In this case, there's only about 14k of data on each drive for a 
 slab. This means each IOPS to the failed drive only writes 14k.  So, it takes 
 1TB/14k = 71e6 IOPS to complete.
 
 From this, it can be pretty easy to see that the number of required IOPS to 
 the resilvered disk goes up linearly with the number of data drives in a 
 vdev.  Since you're always going to be IOPS bound by the single disk 
 resilvering, you have a fixed limit.


http://mail.opensolaris.org/pipermail/zfs-discuss/2010-September/044660.html
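
As a quick sanity check of those numbers, here is a throwaway Python sketch that 
just assumes a uniform 128k slab size and 1 TB of data on the failed disk, as in 
the quoted scenarios:

# Rough sketch of the slab arithmetic quoted above; assumes every slab is a
# full 128k record and the failed disk holds 1 TB of data.

TB = 10**12
SLAB = 128 * 1024

def writes_needed(disks_in_raidz1):
    data_disks = disks_in_raidz1 - 1   # raidz1: one disk's worth of parity
    chunk = SLAB / data_disks          # bytes of each slab landing on one disk
    return TB / chunk                  # I/Os needed to rebuild the whole disk

print("%.0e" % writes_needed(5))       # ~3e+07 (scenario 1: 32k chunks)
print("%.0e" % writes_needed(10))      # ~7e+07 (scenario 2: ~14k chunks)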

Also, a post by Jeff Bonwick on resilvering:

http://blogs.sun.com/bonwick/entry/smokin_mirrors

Between Richard's and Erik's statements, I would say that while resilver time 
is not dependent on the number of drives in the vdev, the pool configuration can 
affect the IOps rate, and /that/ can affect the time it takes to finish a 
resilver. Is that a decent summary?

I think the number of drives in the vdev perhaps comes into play because 
when people have a lot of disks, they often put them into RAIDZ[123] 
configurations. So it's just a matter of confusing the (IOps-limiting) 
configuration with the fact that one may have many disks.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A resilver record?

2011-03-20 Thread Roy Sigurd Karlsbakk
 I think the number of drives in the vdev perhaps comes into
 play because when people have a lot of disks, they often put them
 into RAIDZ[123] configurations. So it's just a matter of confusing the
 (IOps-limiting) configuration with the fact that one may have many
 disks.

My answer was not meant to be generic, but was based on the original 
question, which was about a raidz2 VDEV. But then, thanks for the info.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum is presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases, adequate and relevant synonyms exist 
in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A resilver record?

2011-03-20 Thread Richard Elling
On Mar 20, 2011, at 12:48 PM, David Magda wrote:
 On Mar 20, 2011, at 14:24, Roy Sigurd Karlsbakk wrote:
 
 It all depends on the number of drives in the VDEV(s), traffic
 patterns during resilver, VDEV fill, speed of the drives, etc. Still,
 close to 6 days is a lot. Can you detail your configuration?
 
 How many times do we have to rehash this? The speed of resilver is
 dependent on the amount of data, the distribution of data on the
 resilvering device, speed of the resilvering device, and the throttle. It 
 is NOT
 dependent on the number of drives in the vdev.
 
 Thanks for clearing this up - I've been told large VDEVs lead to long 
 resilver times, but then, I guess that was wrong.
 
 There was a thread (Suggested RaidZ configuration...) a little while back 
 where the topic of IOps and resilver time came up:
 
 http://mail.opensolaris.org/pipermail/zfs-discuss/2010-September/thread.html#44633
 
 I think this message by Erik Trimble is a good summary:

hmmm... I must've missed that one, otherwise I would have said...

 
 Scenario 1:I have 5 1TB disks in a raidz1, and I assume I have 128k slab 
 sizes.  Thus, I have 32k of data for each slab written to each disk. (4x32k 
 data + 32k parity for a 128k slab size).  So, each IOPS gets to reconstruct 
 32k of data on the failed drive.   It thus takes about 1TB/32k = 31e6 IOPS 
 to reconstruct the full 1TB drive.

Here, the IOPS doesn't matter because the limit will be the media write
speed of the resilvering disk -- bandwidth.

 
 Scenario 2:I have 10 1TB drives in a raidz1, with the same 128k slab 
 sizes.  In this case, there's only about 14k of data on each drive for a 
 slab. This means each IOPS to the failed drive only writes 14k.  So, it 
 takes 1TB/14k = 71e6 IOPS to complete.

Here, IOPS might matter, but I doubt it.  Where we see IOPS matter is when the 
block
sizes are small (eg. metadata). In some cases you can see widely varying 
resilver times when 
the data is large versus small. These changes follow the temporal distribution 
of the original
data. For example, if a pool's life begins with someone loading their MP3 
collection (large
blocks, mostly sequential) and then working on source code (small blocks, more 
random, lots
of creates/unlinks) then the resilver will be bandwidth bound as it resilvers 
the MP3s and then 
IOPS bound as it resilvers the source. Hence, the prediction of when resilver 
will finish is not
very accurate.

 
 From this, it can be pretty easy to see that the number of required IOPS to 
 the resilvered disk goes up linearly with the number of data drives in a 
 vdev.  Since you're always going to be IOPS bound by the single disk 
 resilvering, you have a fixed limit.

You will not always be IOPS bound by the resilvering disk. You will be speed 
bound
by the resilvering disk, where speed is either write bandwidth or random write 
IOPS.
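
To put rough numbers on the two regimes (all assumed figures: ~80 MB/s sequential 
writes and ~100 random writes/s for a 7200 RPM disk, 500 GB to rebuild), a small 
Python sketch:

# Illustrative only: wall-clock bounds for a bandwidth-bound versus an
# IOPS-bound resilver of the same amount of data. All figures are assumptions.

DATA_BYTES  = 500e9   # data to rebuild on the replacement disk
SEQ_BW      = 80e6    # sustained sequential write bandwidth, bytes/s
RANDOM_IOPS = 100     # random writes per second
SMALL_BLOCK = 4096    # e.g. metadata-sized blocks

def hours(seconds):
    return seconds / 3600.0

# Large, mostly sequential blocks (the "MP3 collection" phase):
print(hours(DATA_BYTES / SEQ_BW))                     # ~1.7 hours
# Small random blocks (the "source code" phase):
print(hours(DATA_BYTES / SMALL_BLOCK / RANDOM_IOPS))  # ~340 hours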
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A resilver record?

2011-03-20 Thread Ian Collins

 On 03/20/11 08:57 PM, Ian Collins wrote:
 Has anyone seen a resilver longer than this for a 500G drive in a 
raidz2 vdev?


scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20 
19:57:37 2011

  c0t0d0  ONLINE   0 0 0  769G resilvered

I didn't intend to start an argument, I was just very surprised the 
resilver took so long.


This box is a backup staging server (Solaris 10u8), so it does receive a 
lot of data.  However, it has lost a number of drives in the past and the 
resilver took around 100 hours, hence my surprise.


The drive is part of an 8-drive raidz2 vdev, not overly full:

 raidz2  3.40T   227G

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A resilver record?

2011-03-20 Thread David Magda
On Mar 20, 2011, at 18:02, Ian Collins wrote:

 I didn't intend to start an argument, I was just very surprised the resilver 
 took so long.

ZFS is a relatively young file system, and it does a lot of things differently 
than what has been done in the past. Personally I think arguments / debates / 
discussions like this thread assist people in understanding how things work and 
help bring out any misconceptions that they may have, which can then be 
corrected.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A resilver record?

2011-03-20 Thread Richard Elling
On Mar 20, 2011, at 3:02 PM, Ian Collins wrote:

 On 03/20/11 08:57 PM, Ian Collins wrote:
 Has anyone seen a resilver longer than this for a 500G drive in a raidz2 
 vdev?
 
 scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20 19:57:37 
 2011
  c0t0d0  ONLINE   0 0 0  769G resilvered
 
 I didn't intend to start an argument, I was just very surprised the resilver 
 took so long.

I'd describe the thread as critical analysis, not argument. There are many 
facets of ZFS
resilver and scrub that many people have never experienced, so it makes sense to
explore the issue.

Expect ZFS resilvers to take longer in the future for HDDs.  
Expect ZFS resilvers to remain quite fast for SSDs. 
Why? Because HDDs are getting bigger, but not faster, while SSDs are getting 
bigger and faster. 

I've done a number of studies of this and have a lot of data to describe what 
happens. I also 
work through performance analysis of resilver cases for my ZFS tutorials.

  This box is a backup staging server (Solaris 10u8), so it does receive a lot of 
  data.  However, it has lost a number of drives in the past and the resilver 
  took around 100 hours, hence my surprise.

We've thought about how to provide some sort of feedback on the progress of 
resilvers.
It is relatively simple to know what has already been resilvered and how much
throttling is currently active. But that info does not make future predictions 
more accurate.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A resilver record?

2011-03-20 Thread Ian Collins

 On 03/21/11 12:20 PM, Richard Elling wrote:

On Mar 20, 2011, at 3:02 PM, Ian Collins wrote:


On 03/20/11 08:57 PM, Ian Collins wrote:

Has anyone seen a resilver longer than this for a 500G drive in a raidz2 vdev?

scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20 19:57:37 
2011
  c0t0d0  ONLINE   0 0 0  769G resilvered


I didn't intend to start an argument, I was just very surprised the resilver 
took so long.

I'd describe the thread as critical analysis, not argument. There are many 
facets of ZFS
resilver and scrub that many people have never experienced, so it makes sense to
explore the issue.

Expect ZFS resilvers to take longer in the future for HDDs.
Expect ZFS resilvers to remain quite fast for SSDs.
Why? Because HDDs are getting bigger, but not faster, while SSDs are getting 
bigger and faster.

I've done a number of studies of this and have a lot of data to describe what 
happens. I also
work through performance analysis of resilver cases for my ZFS tutorials.


Does the throttling improve receive latency?

The 30+ second latency I see on this system during a resilver renders it 
pretty useless as a staging server (lots of small snapshots).


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A resilver record?

2011-03-20 Thread Erik Trimble

On 3/20/2011 2:23 PM, Richard Elling wrote:

On Mar 20, 2011, at 12:48 PM, David Magda wrote:

On Mar 20, 2011, at 14:24, Roy Sigurd Karlsbakk wrote:


It all depends on the number of drives in the VDEV(s), traffic
patterns during resilver, VDEV fill, speed of the drives, etc. Still,
close to 6 days is a lot. Can you detail your configuration?

How many times do we have to rehash this? The speed of resilver is
dependent on the amount of data, the distribution of data on the
resilvering device, speed of the resilvering device, and the throttle. It is NOT
dependent on the number of drives in the vdev.

Thanks for clearing this up - I've been told large VDEVs lead to long resilver 
times, but then, I guess that was wrong.

There was a thread (Suggested RaidZ configuration...) a little while back 
where the topic of IOps and resilver time came up:

http://mail.opensolaris.org/pipermail/zfs-discuss/2010-September/thread.html#44633

I think this message by Erik Trimble is a good summary:

hmmm... I must've missed that one, otherwise I would have said...


Scenario 1:I have 5 1TB disks in a raidz1, and I assume I have 128k slab 
sizes.  Thus, I have 32k of data for each slab written to each disk. (4x32k 
data + 32k parity for a 128k slab size).  So, each IOPS gets to reconstruct 32k 
of data on the failed drive.   It thus takes about 1TB/32k = 31e6 IOPS to 
reconstruct the full 1TB drive.

Here, the IOPS doesn't matter because the limit will be the media write
speed of the resilvering disk -- bandwidth.


Scenario 2:I have 10 1TB drives in a raidz1, with the same 128k slab sizes. 
 In this case, there's only about 14k of data on each drive for a slab. This 
means each IOPS to the failed drive only writes 14k.  So, it takes 1TB/14k = 
71e6 IOPS to complete.

Here, IOPS might matter, but I doubt it.  Where we see IOPS matter is when the 
block
sizes are small (eg. metadata). In some cases you can see widely varying 
resilver times when
the data is large versus small. These changes follow the temporal distribution 
of the original
data. For example, if a pool's life begins with someone loading their MP3 
collection (large
blocks, mostly sequential) and then working on source code (small blocks, more 
random, lots
of creates/unlinks) then the resilver will be bandwidth bound as it resilvers 
the MP3s and then
IOPS bound as it resilvers the source. Hence, the prediction of when resilver 
will finish is not
very accurate.


 From this, it can be pretty easy to see that the number of required IOPS to 
the resilvered disk goes up linearly with the number of data drives in a vdev.  
Since you're always going to be IOPS bound by the single disk resilvering, you 
have a fixed limit.

You will not always be IOPS bound by the resilvering disk. You will be speed 
bound
by the resilvering disk, where speed is either write bandwidth or random write 
IOPS.
  -- richard



Really? Can you really be bandwidth limited on a (typical) RAIDZ resilver?

I can see where you might be on a mirror, with large slabs and 
essentially sequential read/write - that is, since the drivers can queue 
up several read/write requests at a time, you have the potential to be 
reading/writing several (let's say 4) 128k slabs per single IOPS.  That 
means you read/write at 512k per IOPS for a mirror (best case 
scenario).  For a 7200RPM drive, that's 100 IOPS x .5MB/IOPS = 50MB/s, 
which is lower than the maximum throughput of a modern SATA drive.   For 
one of the 15k SAS drives able to do 300IOPS, you get 150MB/s, which 
indeed exceeds a SAS drive's write bandwidth.


For RAIDZn configs, however, you're going to be limited on the size of 
an individual read/write.  As Roy pointed out before, the max size of 
an individual portion of a slab is 128k/X, where X=number of data drives 
in RAIDZn.   So, for a typical 4-data-drive RAIDZn, even in the best 
case scenario where I can queue multiple slab requests (say 4) into a 
single IOPS, that means I'm likely to top out at about 128k of data to 
write to the resilvered drive per IOPS.  Which leads to 12MB/s for the 
7200RPM drive, and 36MB/s for the 15k drive, both well under their 
respective bandwidth capability.
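
Spelled out as a small Python sketch (same rough assumptions: 4 slabs coalesced 
per write, 100 IOPS for the 7200RPM drive and 300 IOPS for the 15k drive):

# Illustrative sketch of the per-I/O throughput ceilings described above,
# using the same rounded figures (~0.5 MB coalesced per write on a mirror,
# one quarter of that landing on the rebuilt disk in a 4-data-drive raidz).

MB_PER_IO_MIRROR = 0.5       # 4 x 128k slabs per write
MB_PER_IO_RAIDZ  = 0.5 / 4   # only 1/4 of each slab lands on the rebuilt disk

def ceiling_mb_per_s(iops, mb_per_io):
    return iops * mb_per_io

print(ceiling_mb_per_s(100, MB_PER_IO_MIRROR))  # 7200RPM mirror: 50 MB/s
print(ceiling_mb_per_s(300, MB_PER_IO_MIRROR))  # 15k SAS mirror: 150 MB/s
print(ceiling_mb_per_s(100, MB_PER_IO_RAIDZ))   # 7200RPM raidz:  12.5 MB/s
print(ceiling_mb_per_s(300, MB_PER_IO_RAIDZ))   # 15k SAS raidz:  37.5 MB/s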


Even with large slab sizes, I really can't see any place where a RAIDZ 
resilver isn't going to be IOPS bound when using HDs as backing store.  
Mirrors are more likely, but still, even in that case, I think you're 
going to hit the IOPS barrier far more often than the bandwidth barrier.



Now, with SSDs as backing store, yes, you become bandwidth limited, 
because the IOPS values of SSDs are at least an order of magnitude 
greater than HDs, though both have the same max bandwidth characteristics.



Now, the *total* time it takes to resilver either a mirror or RAIDZ is 
indeed primarily dependent on the number of allocated slabs in the vdev, 
and the level of fragmentation of slabs.  That essentially defines the 
total amount of work that needs to be done.  The above discussion 
compares