On Dec 20, 2010, at 2:42 AM, Phil Harman <phil.har...@gmail.com> wrote:

>> Why does resilvering take so long in raidz anyway?
> 
> Because it's broken. There were some changes a while back that made it more 
> broken.

"broken" is the wrong term here. It functions as designed and correctly 
resilvers devices. Disagreeing with the design is quite different than
proving a defect.

> There has been a lot of discussion, plus anecdotes and some data, on this list. 

"slow because I use devices with poor random write(!) performance"
is very different than "broken."

> The resilver doesn't do a single pass of the drives, but uses a "smarter" 
> temporal algorithm based on metadata.

A design that only does a single pass does not handle temporal
changes. Many RAID implementations use a mix of spatial and temporal
resilvering and suffer from that design decision.
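To make the spatial/temporal distinction concrete, here is a toy model. It is not ZFS code. It loosely assumes that each block records the transaction group (txg) it was born in, and that we know the txg range during which the device missed writes. `Block`, `spatial_resilver` and `temporal_resilver` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Block:
    offset: int     # hypothetical on-disk location of the block
    birth_txg: int  # transaction group in which the block was written


def spatial_resilver(blocks):
    """Single spatial pass: rewrite every allocated block in offset order."""
    return sorted(blocks, key=lambda b: b.offset)


def temporal_resilver(blocks, missed_from, missed_to):
    """Temporal pass: rewrite only blocks born while the device was missing
    writes (missed_from <= birth_txg <= missed_to). Blocks are kept in
    traversal (metadata) order, so their offsets come out scattered."""
    return [b for b in blocks if missed_from <= b.birth_txg <= missed_to]


# Toy pool: the device was offline during txgs 40..45.
pool = [Block(offset=o, birth_txg=t)
        for o, t in [(0, 10), (7, 42), (3, 44), (9, 30), (5, 45)]]

print(len(spatial_resilver(pool)), "blocks copied by a spatial pass")
print(len(temporal_resilver(pool, 40, 45)), "blocks copied by a temporal pass")
print([b.offset for b in temporal_resilver(pool, 40, 45)], "offsets written")
```

In this toy the temporal pass copies 3 of the 5 blocks, but it writes them at offsets 7, 3 and 5 rather than sequentially, which is why the random write performance of the target device matters.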

> However, the current implementation has difficulty finishing the job if there's 
> a steady flow of updates to the pool.

Please define "current". There are many releases of ZFS, and
many improvements have been made over time. What has not
improved is the random write performance of consumer-grade
HDDs.

> As far as I'm aware, the only way to get bounded resilver times is to stop 
> the workload until resilvering is completed.

I know of no RAID implementation that bounds resilver times
for HDDs. I believe it is not possible. OTOH, whether a resilver
takes 10 seconds or 10 hours makes little difference in data
availability. Indeed, this is why we often throttle resilvering
activity. See previous discussions on this forum regarding the
dueling RFEs.
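A deliberately simplified model shows why a busy pool makes resilver time unbounded. Assume (my assumption, not how ZFS actually schedules I/O) that the resilvering device can sustain `device_iops` random writes per second, the foreground workload takes `workload_iops` of them, and the resilver gets whatever is left. `resilver_hours` is a hypothetical helper, and the numbers are illustrative, not measurements.

```python
def resilver_hours(blocks: int, device_iops: float, workload_iops: float) -> float:
    """Hours to rewrite `blocks` blocks when the foreground workload consumes
    `workload_iops` of the device's `device_iops` random-write budget."""
    spare = device_iops - workload_iops  # IOPS left over for the resilver
    if spare <= 0:
        return float("inf")  # in this model the resilver never finishes
    return blocks / spare / 3600.0


# Illustrative inputs: 1,000,000 blocks on a ~100 IOPS consumer HDD.
for load in (0, 50, 90, 99, 100):
    hours = resilver_hours(1_000_000, 100, load)
    print(f"workload {load:>3} IOPS -> {hours:.1f} h")
```

As the workload approaches the device's capacity, the completion time grows without limit. Throttling the resilver moves the trade-off the other way: it lengthens the resilver to protect the workload.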

> The problem exists for mirrors too, but is not as marked because mirror 
> reconstruction is inherently simpler.

Resilver time is bounded by the random write performance of
the resilvering device. Whether you use mirroring or raidz makes no difference.
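A back-of-envelope estimate follows from that claim. The sketch assumes every block costs one random write on the resilvering device and that the device's random-write IOPS is the only limit; the pool size, block sizes and IOPS figure are made up for illustration, and `resilver_estimate_hours` is a hypothetical helper. Note that the redundancy layout does not appear anywhere in the formula.

```python
import math


def resilver_estimate_hours(allocated_bytes: int, avg_block_bytes: int,
                            write_iops: float) -> float:
    """Rough resilver time: one random write per allocated block, limited
    only by the resilvering device's random-write IOPS."""
    blocks = math.ceil(allocated_bytes / avg_block_bytes)  # writes needed
    return blocks / write_iops / 3600.0


# Hypothetical pool: 1 TB allocated, consumer HDD at ~100 random writes/s.
print(f"{resilver_estimate_hours(10**12, 128 * 1024, 100):.1f} h")  # 128 KiB blocks
print(f"{resilver_estimate_hours(10**12, 8 * 1024, 100):.1f} h")    # 8 KiB blocks
```

With these inputs it prints about 21.2 h for 128 KiB blocks and 339.1 h for 8 KiB blocks: the same data takes roughly sixteen times longer when it is stored in blocks one sixteenth the size, because the device's IOPS, not its bandwidth, sets the pace.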

> I believe Oracle is aware of the problem, but most of the core ZFS team has 
> left. And of course, a fix for Oracle Solaris no longer means a fix for the 
> rest of us.

Some "improvements" were made post-b134 and pre-b148.
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss