Re: [zfs-discuss] 350TB+ storage solution

Karl Wagner Mon, 16 May 2011 05:35:55 -0700

I have to agree. ZFS needs a more intelligent scrub/resilver algorithm, which 
can 'sequentialise' the process. 
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.


Giovanni Tirloni <[email protected]> wrote:

On Mon, May 16, 2011 at 9:02 AM, Sandon Van Ness <[email protected]> wrote:


Actually, I have seen resilvers take a very long time (weeks) on Solaris/raidz2, 
whereas I almost never see a hardware RAID controller take more than a day or two. 
In one case I thrashed the disks as hard as I possibly could (hardware 
controller) and finally got the rebuild to take almost a week. 
Here is an example of one right now:

  pool: raid3060
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 224h54m, 52.38% done, 204h30m to go
config:
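
The "to go" figure in the status output above is consistent with a plain linear extrapolation from the elapsed time and the percentage done. The sketch below only reproduces that arithmetic; the function name is hypothetical, and whether zpool computes its estimate exactly this way is an assumption.

```python
def remaining_hours(elapsed_hours: float, percent_done: float) -> float:
    """Estimate remaining time by linear extrapolation.

    Assumes (hypothetically) that the rest of the resilver proceeds at the
    same average rate as the part already completed.
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    total_hours = elapsed_hours / (percent_done / 100.0)
    return total_hours - elapsed_hours


# Figures taken from the status output: 224h54m elapsed, 52.38% done.
elapsed = 224 + 54 / 60
left = remaining_hours(elapsed, 52.38)
print(f"about {left:.1f} hours to go")
```

This gives roughly 204.5 hours, within a few minutes of the reported 204h30m, so the whole resilver is on course to take about 430 hours if the rate does not change.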


Resilver has been a problem with RAIDZ volumes for a while. I've routinely seen 
it take >300 hours and sometimes >600 hours with 13TB pools that are 80% full. All 
disks are maxed out on IOPS while still reading only 1-2MB/s, and there are rarely 
any writes. I've written about it before here (and provided data). 
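
To put those durations in perspective, here is a back-of-the-envelope calculation of the aggregate rate they imply. It is a rough sketch under stated assumptions: the function name is hypothetical, TB is taken as 10^12 bytes, "at 80%" is read as 80% of 13TB allocated, and the resilver is assumed to traverse all allocated data.

```python
def aggregate_mb_per_s(pool_tb: float, fill_fraction: float, hours: float) -> float:
    """Average rate (MB/s) needed to traverse the allocated data in `hours`.

    Assumptions: decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes) and a
    resilver that walks every allocated byte exactly once.
    """
    allocated_bytes = pool_tb * 1e12 * fill_fraction
    return allocated_bytes / (hours * 3600) / 1e6


rate_300h = aggregate_mb_per_s(13, 0.8, 300)
rate_600h = aggregate_mb_per_s(13, 0.8, 600)
print(f"300 hours: {rate_300h:.2f} MB/s across the pool")
print(f"600 hours: {rate_600h:.2f} MB/s across the pool")
```

Under these assumptions the pool as a whole moves somewhere between about 5 and 10 MB/s, which fits the observation of each disk reading only 1-2MB/s while saturated on IOPS.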

My only guess is that fragmentation is a real problem in a scrub/resilver 
situation, but whenever the conversation turns to pointing out weaknesses in ZFS 
we start seeing "that is not a problem" comments. With the 7000s appliance I've 
heard that the 900hr estimated resilver time was "normal" and "everything is 
working as expected". Can't help but think there is some walled garden syndrome 
floating around.

-- 
Giovanni Tirloni

_______________________________________________
zfs-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
