Re: [OmniOS-discuss] Slow scrub performance

Richard Elling Thu, 31 Jul 2014 09:11:14 -0700

correction below...

On Jul 30, 2014, at 10:37 PM, Richard Elling <[email protected]> 
wrote:


> apologies for the long post, data for big systems tends to do that, comments 
> below...
> 
> On Jul 30, 2014, at 9:10 PM, wuffers <[email protected]> wrote:
> 
>> So as I suspected, I lost 2 weeks of scrub time after the resilver. I 
>> started a scrub again, and it's going extremely slow (~13x slower than 
>> before):
>> 
>>   pool: tank
>>  state: ONLINE
>>   scan: scrub in progress since Tue Jul 29 15:41:27 2014
>>     45.4G scanned out of 24.5T at 413K/s, (scan is slow, no estimated time)
>>     0 repaired, 0.18% done
>> 
>> # iostat -zxCn 60 2 (2nd batch output)
>> 
>>                     extended device statistics
>>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>>   143.7 1321.5 5149.0 46223.4  0.0  1.5    0.0    1.0   0 120 c1
>>     2.4   33.3   72.0  897.5  0.0  0.0    0.0    0.6   0   2 
>> c1t5000C50055F8723Bd0
>>     2.7   22.8   82.9 1005.4  0.0  0.0    0.0    0.9   0   2 
>> c1t5000C50055E66B63d0
>>     2.2   24.4   73.1  917.7  0.0  0.0    0.0    0.7   0   2 
>> c1t5000C50055F87E73d0
>>     3.1   26.2  120.9  899.8  0.0  0.0    0.0    0.8   0   2 
>> c1t5000C50055F8BFA3d0
>>     2.8   16.5  105.9  941.6  0.0  0.0    0.0    1.0   0   2 
>> c1t5000C50055F9E123d0
>>     2.5   25.6   86.6  897.9  0.0  0.0    0.0    0.7   0   2 
>> c1t5000C50055F9F0B3d0
>>     2.3   19.9   85.3  967.8  0.0  0.0    0.0    1.2   0   2 
>> c1t5000C50055F9D3B3d0
>>     3.1   38.3  120.7 1053.1  0.0  0.0    0.0    0.8   0   3 
>> c1t5000C50055E4FDE7d0
>>     2.6   12.7   81.8  854.3  0.0  0.0    0.0    1.6   0   2 
>> c1t5000C50055F9A607d0
>>     3.2   25.0  121.7  871.7  0.0  0.0    0.0    0.8   0   2 
>> c1t5000C50055F8CDA7d0
>>     2.5   30.6   93.0  941.2  0.0  0.0    0.0    0.9   0   2 
>> c1t5000C50055E65877d0
>>     3.1   43.7  101.4 1004.2  0.0  0.0    0.0    1.0   0   4 
>> c1t5000C50055F9E7D7d0
>>     2.3   24.0   92.2  965.8  0.0  0.0    0.0    0.9   0   2 
>> c1t5000C50055FA0AF7d0
>>     2.5   25.3   99.2  872.9  0.0  0.0    0.0    0.8   0   2 
>> c1t5000C50055F9FE87d0
>>     2.9   19.0  116.1  894.8  0.0  0.0    0.0    1.2   0   2 
>> c1t5000C50055F9F91Bd0
>>     2.6   38.9   96.1  915.4  0.0  0.1    0.0    1.2   0   4 
>> c1t5000C50055F9FEABd0
>>     3.2   45.6  135.7  973.5  0.0  0.1    0.0    1.5   0   5 
>> c1t5000C50055F9F63Bd0
>>     3.1   21.2  105.9  966.6  0.0  0.0    0.0    1.0   0   2 
>> c1t5000C50055F9F3EBd0
>>     2.8   26.7  122.0  781.6  0.0  0.0    0.0    0.7   0   2 
>> c1t5000C50055F9F80Bd0
>>     3.1   31.6  119.9  932.5  0.0  0.0    0.0    1.1   0   3 
>> c1t5000C50055F9FB8Bd0
>>     3.1   32.5  123.3  924.1  0.0  0.0    0.0    0.9   0   3 
>> c1t5000C50055F9F92Bd0
>>     2.9   17.0  113.8  952.0  0.0  0.0    0.0    1.2   0   2 
>> c1t5000C50055F8905Fd0
>>     3.0   23.4  111.0  871.1  0.0  0.0    0.0    1.5   0   2 
>> c1t5000C50055F8D48Fd0
>>     2.8   21.4  105.5  858.0  0.0  0.0    0.0    1.0   0   2 
>> c1t5000C50055F9F89Fd0
>>     3.5   16.4   87.1  941.3  0.0  0.0    0.0    1.4   0   2 
>> c1t5000C50055F9EF2Fd0
>>     2.1   33.8   64.5  897.5  0.0  0.0    0.0    0.5   0   2 
>> c1t5000C50055F8C3ABd0
>>     3.0   21.8   72.3 1005.4  0.0  0.0    0.0    1.0   0   2 
>> c1t5000C50055E66053d0
>>     3.0   37.8  106.9 1053.5  0.0  0.0    0.0    0.9   0   3 
>> c1t5000C50055E66503d0
>>     2.7   26.0  107.7  897.9  0.0  0.0    0.0    0.7   0   2 
>> c1t5000C50055F9D3E3d0
>>     2.2   38.9   96.4  918.7  0.0  0.0    0.0    0.9   0   4 
>> c1t5000C50055F84FB7d0
>>     2.8   21.4  111.1  953.6  0.0  0.0    0.0    0.7   0   1 
>> c1t5000C50055F8E017d0
>>     3.0   30.6  104.3  940.9  0.0  0.1    0.0    1.5   0   3 
>> c1t5000C50055E579F7d0
>>     2.8   26.4   90.9  901.1  0.0  0.0    0.0    0.9   0   2 
>> c1t5000C50055E65807d0
>>     2.4   24.0   96.7  965.8  0.0  0.0    0.0    0.9   0   2 
>> c1t5000C50055F84A97d0
>>     2.9   19.8  109.4  967.8  0.0  0.0    0.0    1.1   0   2 
>> c1t5000C50055F87D97d0
>>     3.8   16.1  106.4  943.1  0.0  0.0    0.0    1.3   0   2 
>> c1t5000C50055F9F637d0
>>     2.2   17.1   72.7  966.6  0.0  0.0    0.0    1.4   0   2 
>> c1t5000C50055E65ABBd0
>>     2.7   12.7   86.0  863.3  0.0  0.0    0.0    1.5   0   2 
>> c1t5000C50055F8BF9Bd0
>>     2.7   23.2  101.8  871.1  0.0  0.0    0.0    1.0   0   2 
>> c1t5000C50055F8A22Bd0
>>     4.5   43.6  134.7 1004.2  0.0  0.0    0.0    1.0   0   4 
>> c1t5000C50055F9379Bd0
>>     2.8   24.0   87.9  917.7  0.0  0.0    0.0    0.8   0   2 
>> c1t5000C50055E57A5Fd0
>>     2.9   18.8  119.0  894.3  0.0  0.0    0.0    1.1   0   2 
>> c1t5000C50055F8CCAFd0
>>     3.4   45.7  128.1  976.8  0.0  0.1    0.0    1.2   0   5 
>> c1t5000C50055F8B80Fd0
>>     2.7   24.9  100.2  871.7  0.0  0.0    0.0    0.8   0   2 
>> c1t5000C50055F9FA1Fd0
>>     4.8   26.8  128.6  781.6  0.0  0.0    0.0    0.7   0   2 
>> c1t5000C50055E65F0Fd0
>>     2.7   16.3  109.5  941.6  0.0  0.0    0.0    1.1   0   2 
>> c1t5000C50055F8BE3Fd0
>>     3.1   21.1  119.9  858.0  0.0  0.0    0.0    1.1   0   2 
>> c1t5000C50055F8B21Fd0
>>     2.8   31.8  108.5  932.5  0.0  0.0    0.0    1.0   0   3 
>> c1t5000C50055F8A46Fd0
>>     2.4   25.3   87.4  872.9  0.0  0.0    0.0    0.8   0   2 
>> c1t5000C50055F856CFd0
>>     3.3   32.0  125.2  924.1  0.0  0.0    0.0    1.2   0   3 
>> c1t5000C50055E6606Fd0
>>   289.9  169.0 3905.0 12754.1  0.0  0.2    0.0    0.4   0  10 c2
>>   146.6   14.1 1987.9  305.2  0.0  0.0    0.0    0.2   0   4 
>> c2t500117310015D579d0
>>   143.4   10.6 1917.1  205.2  0.0  0.0    0.0    0.2   0   3 
>> c2t50011731001631FDd0
>>     0.0  144.3    0.0 12243.7  0.0  0.1    0.0    0.9   0   3 
>> c2t5000A72A3007811Dd0
>>     0.0   14.6    0.0   75.8  0.0  0.0    0.0    0.1   0   0 c4
>>     0.0    7.3    0.0   37.9  0.0  0.0    0.0    0.1   0   0 c4t0d0
>>     0.0    7.3    0.0   37.9  0.0  0.0    0.0    0.1   0   0 c4t1d0
>>   284.8  171.5 3792.8 12786.2  0.0  0.2    0.0    0.4   0  10 c12
>>     0.0  144.3    0.0 12243.7  0.0  0.1    0.0    0.9   0   3 
>> c12t5000A72B300780FFd0
>>   152.3   13.3 2004.6  255.9  0.0  0.0    0.0    0.2   0   4 
>> c12t500117310015D59Ed0
>>   132.5   13.9 1788.2  286.6  0.0  0.0    0.0    0.2   0   3 
>> c12t500117310015D54Ed0
>>     0.0   13.5    0.0   75.8  0.0  0.0    0.8    0.1   0   0 rpool
>>   718.4 1653.5 12846.8 71761.5 34.0  2.0   14.3    0.8   7  51 tank
>> 
>> This doesn't seem any busier than my earlier output (6% wait, 68% busy, 
>> asvc_t 1.1ms) and the dev team confirms that their workload hasn't changed 
>> for the past few days. If my math is right.. this will take ~719 days to 
>> complete.
> 
> The %busy for controllers is a sum of the %busy for all disks on the
> controller, so it can be large, but overall isn't interesting. With HDDs,
> there is no way you can saturate the controller, so we don't really care
> what the %busy shows.
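
The claim that a controller's %b is just the sum of its disks' %b can be checked against the c2 rows of the first iostat sample. The following is a minimal sketch, not a general iostat parser: the function name `busy_by_controller` and the `SAMPLE` string are made up for illustration, the rows are copied from the output above with the wrapped device names rejoined, and the device-name pattern assumes the cNtXdN naming seen in this thread.

```python
import re
from collections import defaultdict

# Rows copied from the first iostat sample (controller c2 and its three
# devices), with each wrapped device name rejoined onto its data line.
SAMPLE = """\
  289.9  169.0 3905.0 12754.1  0.0  0.2    0.0    0.4   0  10 c2
  146.6   14.1 1987.9  305.2  0.0  0.0    0.0    0.2   0   4 c2t500117310015D579d0
  143.4   10.6 1917.1  205.2  0.0  0.0    0.0    0.2   0   3 c2t50011731001631FDd0
    0.0  144.3    0.0 12243.7  0.0  0.1    0.0    0.9   0   3 c2t5000A72A3007811Dd0
"""

# Assumed naming: "c2" is a controller summary row (printed because of -C),
# "c2t...d0" is a disk behind that controller. Pool rows do not match.
DEVICE = re.compile(r"(c\d+)(t\S+d\d+)?$")

def busy_by_controller(text):
    """Return (reported, summed) dicts of %b keyed by controller name."""
    reported = {}
    summed = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue  # blank lines and wrapped fragments
        match = DEVICE.match(fields[10])
        if match is None:
            continue  # header row, rpool, tank
        pct_busy = int(fields[9])  # %b is the tenth column
        controller, disk_part = match.groups()
        if disk_part is None:
            reported[controller] = pct_busy
        else:
            summed[controller] += pct_busy
    return reported, dict(summed)

reported, summed = busy_by_controller(SAMPLE)
print("reported:", reported, "summed from disks:", summed)
```

For c2 both figures come out to 10 (4 + 3 + 3). Because iostat rounds each row, the two numbers can differ by a few points on a controller with many disks, such as c1.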
> 
> The more important item is that the number of read ops is fairly low for
> all but 4 disks. Since you didn't post the pool configuration, we can only
> guess that they might be a source of the bottleneck.

the above paragraph missed the editor's cut. You did post the pool config, 
thanks!
 -- richard

> 
> You're seeing a lot of reads from the cache devices. How much RAM does
> this system have?
> 
>> 
>> Anything I can tune to help speed this up?
> 
> methinks the scrub I/Os are getting starved and since they are low
> priority, they could get very starved. In general, I wouldn't worry about
> it, but I understand why you might be nervous. Keep in mind that in ZFS
> scrubs are intended to find errors on idle data, not frequently accessed
> data.
> 
> more far below...
> 
>> 
>> On Tue, Jul 29, 2014 at 3:29 PM, wuffers <[email protected]> wrote:
>> Going to try to answer both responses in one message..
>> 
>> Short answer, yes. … Keep in mind that
>> 
>> 1. a scrub runs in the background (so as not to impact production I/O, this 
>> was not always the case and caused serious issues in the past with a pool 
>> being unresponsive due to a scrub)
>> 
>> 2. a scrub essentially walks the zpool examining every transaction in order 
>> (as does a resilver)
>> 
>> So the time to complete a scrub depends on how many write transactions since 
>> the pool was created (which is generally related to the amount of data but 
>> not always). You are limited by the random I/O capability of the disks 
>> involved. With VMs I assume this is a file server, so the I/O size will also 
>> affect performance.
>> 
>> I haven't noticed any slowdowns in our virtual environments, so I guess 
>> that's a good thing it's so low priority that it doesn't impact workloads. 
>> 
>> Run the numbers… you are scanning 24.2TB at about 5.5MB/sec … 4,613,734 
>> seconds or 54 days. And that assumes the same rate for all of the scan. The 
>> rate will change as other I/O competes for resources.
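
The "run the numbers" estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the function name `scrub_eta` is made up for illustration, it assumes binary units (1 TB = 1024 * 1024 MB, which is what makes the quoted seconds figure come out), and it assumes a constant scan rate, which the post itself says will not hold on a busy pool.

```python
def scrub_eta(total_tb, rate_mb_per_s):
    """Return (seconds, days) needed to scan total_tb at a constant rate."""
    total_mb = total_tb * 1024 * 1024  # assumption: binary units
    seconds = total_mb / rate_mb_per_s
    return seconds, seconds / 86400    # 86400 seconds per day

# Figures quoted above: 24.2TB scanned at about 5.5MB/sec.
seconds, days = scrub_eta(24.2, 5.5)
print(f"{seconds:,.0f} seconds, about {days:.1f} days")
```

With the quoted inputs this gives the same 4,613,734 seconds as the post, which is a little over 53 days and is rounded up to 54 there.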
>> 
>> The number was fluctuating when I started the scrub, and I had seen it go as 
>> high as 35MB/s at one point. I am certain that our Hyper-V workload has 
>> increased since the last scrub, so this does make sense.
>>  
>> Looks like you have a fair bit of activity going on (almost 1MB/sec of 
>> writes per spindle).
>> 
>> As Richard correctly states below, this is the aggregate since boot (uptime 
>> ~56 days). I have another output from iostat as per his instructions below. 
>>  
>> Since this is storage for VMs, I assume this is the storage server for 
>> separate compute servers? Have you tuned the block size for the file share 
>> you are using? That can make a huge difference in performance.
>> 
>> Both the Hyper-V and VMware LUNs are created with 64K block sizes. From what 
>> I've read of other performance and tuning articles, that is the optimal 
>> block size (I did some limited testing when first configuring the SAN, but 
>> results were somewhat inconclusive). Hyper-V hosts our testing environment 
>> (we integrate with TFS, a MS product, so we have no choice here) and 
>> probably make up the bulk of the workload (~300+ test VMs with various 
>> OSes). VMware hosts our production servers (Exchange, file servers, SQL, AD, 
>> etc - ~50+ VMs).
>> 
>> I also noted that you only have a single LOG device. Best Practice is to 
>> mirror log devices so you do not lose any data in flight if hit by a power 
>> outage (of course, if this server has more UPS runtime than all the clients 
>> that may not matter).
>> 
>> Actually, I do have a mirror ZIL device, it's just disabled at this time (my 
>> ZIL devices are ZeusRAMs). At some point, I was troubleshooting some kernel 
>> panics (turned out to be a faulty SSD on the rpool), and hadn't re-enabled 
>> it yet. Thanks for the reminder (and yes, we do have a UPS as well). 
>> 
>> And oops.. re-attaching the ZIL as a mirror triggered a resilver now, 
>> suspending or canceling the scrub? Will monitor this and restart the scrub 
>> if it doesn't by itself.
>> 
>>   pool: tank
>>  state: ONLINE
>> status: One or more devices is currently being resilvered.  The pool will
>>         continue to function, possibly in a degraded state.
>> action: Wait for the resilver to complete.
>>   scan: resilver in progress since Tue Jul 29 14:48:48 2014
>>     3.89T scanned out of 24.5T at 3.06G/s, 1h55m to go
>>     0 resilvered, 15.84% done
>> 
>> At least it's going very fast. EDIT: Now about 67% done as I finish writing 
>> this, speed dropping to ~1.3G/s. 
>> 
>> maybe, maybe not
>> 
>> this is slower than most, surely slower than desired
>> 
>> Unfortunately reattaching the mirror to my log device triggered a resilver. 
>> Not sure if this is desired behavior, but yes, 5.5MB/s seems quite slow. 
>> Hopefully after the resilver the scrub will progress where it left off. 
>>  
>> The estimate is often very wrong, especially for busy systems.
>> If this is an older ZFS implementation, this pool is likely getting
>> pounded by the ZFS write throttle. There are some tunings that can be
>> applied, but the old write throttle is not a stable control system, so it
>> will always be a little bit unpredictable.
>> 
>> The system is on r151008 (my BE states that I upgraded back in February, 
>> putting me in r151008j or so), with all the pools upgraded for the new 
>> enhancements as well as activating the new L2ARC compression feature. 
>> Reading the release notes, the ZFS write throttle enhancements were in since 
>> r151008e so I should be good there.
>>  
>>> # iostat -xnze
>> 
>> Unfortunately, this is the performance since boot and is not suitable for
>> performance analysis unless the system has been rebooted in the past 10
>> minutes or so. You'll need to post the second batch from
>> "iostat -zxCn 60 2"
>> 
>> Ah yes, that was my mistake. Output from second count (before re-attaching 
>> log mirror):
>> 
>> # iostat -zxCn 60 2
>> 
>>                     extended device statistics
>>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>>   255.7 1077.7 6294.0 41335.1  0.0  1.9    0.0    1.4   0 153 c1
>>     5.3   23.9  118.5  811.9  0.0  0.0    0.0    1.1   0   3 
>> c1t5000C50055F8723Bd0
>>     5.9   14.5  110.0  834.3  0.0  0.0    0.0    1.3   0   2 
>> c1t5000C50055E66B63d0
>>     5.6   16.6  123.8  822.7  0.0  0.0    0.0    1.3   0   2 
>> c1t5000C50055F87E73d0
>>     4.7   27.8  118.6  796.6  0.0  0.0    0.0    1.3   0   3 
>> c1t5000C50055F8BFA3d0
>>     5.6   14.5  139.7  833.8  0.0  0.0    0.0    1.6   0   3 
>> c1t5000C50055F9E123d0
>>     4.4   27.1  112.3  825.2  0.0  0.0    0.0    0.8   0   2 
>> c1t5000C50055F9F0B3d0
>>     5.0   20.2  121.7  803.4  0.0  0.0    0.0    1.2   0   3 
>> c1t5000C50055F9D3B3d0
>>     5.4   26.4  137.0  857.3  0.0  0.0    0.0    1.4   0   4 
>> c1t5000C50055E4FDE7d0
>>     4.7   12.3  123.7  832.7  0.0  0.0    0.0    2.0   0   3 
>> c1t5000C50055F9A607d0
>>     5.0   23.9  125.9  830.9  0.0  0.0    0.0    1.3   0   3 
>> c1t5000C50055F8CDA7d0
>>     4.5   31.4  112.2  814.6  0.0  0.0    0.0    1.1   0   3 
>> c1t5000C50055E65877d0
>>     5.2   24.4  130.6  872.5  0.0  0.0    0.0    1.2   0   3 
>> c1t5000C50055F9E7D7d0
>>     4.1   21.8  103.7  797.2  0.0  0.0    0.0    1.1   0   3 
>> c1t5000C50055FA0AF7d0
>>     5.5   24.8  129.8  802.8  0.0  0.0    0.0    1.5   0   4 
>> c1t5000C50055F9FE87d0
>>     5.7   17.7  137.2  797.6  0.0  0.0    0.0    1.4   0   3 
>> c1t5000C50055F9F91Bd0
>>     6.0   30.6  139.1  852.0  0.0  0.1    0.0    1.5   0   4 
>> c1t5000C50055F9FEABd0
>>     6.1   34.1  137.8  929.2  0.0  0.1    0.0    1.9   0   6 
>> c1t5000C50055F9F63Bd0
>>     4.1   15.9  101.8  791.4  0.0  0.0    0.0    1.6   0   3 
>> c1t5000C50055F9F3EBd0
>>     6.4   23.2  155.2  878.6  0.0  0.0    0.0    1.1   0   3 
>> c1t5000C50055F9F80Bd0
>>     4.5   23.5  106.2  825.4  0.0  0.0    0.0    1.1   0   3 
>> c1t5000C50055F9FB8Bd0
>>     4.0   23.2  101.1  788.9  0.0  0.0    0.0    1.3   0   3 
>> c1t5000C50055F9F92Bd0
>>     4.4   11.3  125.7  782.3  0.0  0.0    0.0    1.9   0   3 
>> c1t5000C50055F8905Fd0
>>     4.6   20.4  129.2  823.0  0.0  0.0    0.0    1.5   0   3 
>> c1t5000C50055F8D48Fd0
>>     5.1   19.7  142.9  887.2  0.0  0.0    0.0    1.7   0   3 
>> c1t5000C50055F9F89Fd0
>>     5.6   11.4  129.1  776.0  0.0  0.0    0.0    1.9   0   3 
>> c1t5000C50055F9EF2Fd0
>>     5.6   23.7  137.4  811.9  0.0  0.0    0.0    1.2   0   3 
>> c1t5000C50055F8C3ABd0
>>     6.8   13.9  132.4  834.3  0.0  0.0    0.0    1.8   0   3 
>> c1t5000C50055E66053d0
>>     5.2   26.7  126.9  857.3  0.0  0.0    0.0    1.2   0   3 
>> c1t5000C50055E66503d0
>>     4.2   27.1  104.6  825.2  0.0  0.0    0.0    1.0   0   3 
>> c1t5000C50055F9D3E3d0
>>     5.2   30.7  140.9  852.0  0.0  0.1    0.0    1.5   0   4 
>> c1t5000C50055F84FB7d0
>>     5.4   16.1  124.3  791.4  0.0  0.0    0.0    1.7   0   3 
>> c1t5000C50055F8E017d0
>>     3.8   31.4   89.7  814.6  0.0  0.0    0.0    1.1   0   4 
>> c1t5000C50055E579F7d0
>>     4.6   27.5  116.0  796.6  0.0  0.1    0.0    1.6   0   4 
>> c1t5000C50055E65807d0
>>     4.0   21.5   99.7  797.2  0.0  0.0    0.0    1.1   0   3 
>> c1t5000C50055F84A97d0
>>     4.7   20.2  116.3  803.4  0.0  0.0    0.0    1.4   0   3 
>> c1t5000C50055F87D97d0
>>     5.0   11.5  121.5  776.0  0.0  0.0    0.0    2.0   0   3 
>> c1t5000C50055F9F637d0
>>     4.9   11.3  112.4  782.3  0.0  0.0    0.0    2.3   0   3 
>> c1t5000C50055E65ABBd0
>>     5.3   11.8  142.5  832.7  0.0  0.0    0.0    2.4   0   3 
>> c1t5000C50055F8BF9Bd0
>>     5.0   20.3  121.4  823.0  0.0  0.0    0.0    1.7   0   3 
>> c1t5000C50055F8A22Bd0
>>     6.6   24.3  170.3  872.5  0.0  0.0    0.0    1.3   0   3 
>> c1t5000C50055F9379Bd0
>>     5.8   16.3  121.7  822.7  0.0  0.0    0.0    1.3   0   2 
>> c1t5000C50055E57A5Fd0
>>     5.3   17.7  146.5  797.6  0.0  0.0    0.0    1.4   0   3 
>> c1t5000C50055F8CCAFd0
>>     5.7   34.1  141.5  929.2  0.0  0.1    0.0    1.7   0   5 
>> c1t5000C50055F8B80Fd0
>>     5.5   23.8  125.7  830.9  0.0  0.0    0.0    1.2   0   3 
>> c1t5000C50055F9FA1Fd0
>>     5.0   23.2  127.9  878.6  0.0  0.0    0.0    1.1   0   3 
>> c1t5000C50055E65F0Fd0
>>     5.2   14.0  163.7  833.8  0.0  0.0    0.0    2.0   0   3 
>> c1t5000C50055F8BE3Fd0
>>     4.6   18.9  122.8  887.2  0.0  0.0    0.0    1.6   0   3 
>> c1t5000C50055F8B21Fd0
>>     5.5   23.6  137.4  825.4  0.0  0.0    0.0    1.5   0   3 
>> c1t5000C50055F8A46Fd0
>>     4.9   24.6  116.7  802.8  0.0  0.0    0.0    1.4   0   4 
>> c1t5000C50055F856CFd0
>>     4.9   23.4  120.8  788.9  0.0  0.0    0.0    1.4   0   3 
>> c1t5000C50055E6606Fd0
>>   234.9  170.1 4079.9 11127.8  0.0  0.2    0.0    0.5   0   9 c2
>>   119.0   28.9 2083.8  670.8  0.0  0.0    0.0    0.3   0   3 
>> c2t500117310015D579d0
>>   115.9   27.4 1996.1  634.2  0.0  0.0    0.0    0.3   0   3 
>> c2t50011731001631FDd0
>>     0.0  113.8    0.0 9822.8  0.0  0.1    0.0    1.0   0   2 
>> c2t5000A72A3007811Dd0
>>     0.1   18.5    0.0   64.8  0.0  0.0    0.0    0.0   0   0 c4
>>     0.1    9.2    0.0   32.4  0.0  0.0    0.0    0.0   0   0 c4t0d0
>>     0.0    9.2    0.0   32.4  0.0  0.0    0.0    0.0   0   0 c4t1d0
>>   229.8   58.1 3987.4 1308.0  0.0  0.1    0.0    0.3   0   6 c12
>>   114.2   27.7 1994.8  626.0  0.0  0.0    0.0    0.3   0   3 
>> c12t500117310015D59Ed0
>>   115.5   30.4 1992.6  682.0  0.0  0.0    0.0    0.3   0   3 
>> c12t500117310015D54Ed0
>>     0.1   17.1    0.0   64.8  0.0  0.0    0.6    0.1   0   0 rpool
>>   720.3 1298.4 14361.2 53770.8 18.7  2.3    9.3    1.1   6  68 tank
> 
> ok, so the pool is issuing 720 read iops, including resilver workload, vs
> 1298 write iops. There is plenty of I/O capacity left on the table here,
> as you can see by the %busy being so low.
> 
> So I think the pool is not scheduling scrub I/Os very well. You can
> increase the number of scrub I/Os in the scheduler by adjusting the
> zfs_vdev_scrub_max_active tunable. The default is 2, but you'll have to
> consider that a share (in the stock market sense) where the active sync
> reads and writes are getting 10 each. You can try bumping up the value and
> see what happens over some time, perhaps 10 minutes or so -- too short of
> a time and you won't get a good feeling for the impact (try this in
> off-peak time).
>       echo zfs_vdev_scrub_max_active/W0t5 | mdb -kw
> will change the value from 2 to 5, increasing its share of the total I/O 
> workload.
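
The "share" analogy above can be put into rough numbers. This is a toy model of the analogy only, not the actual ZFS I/O scheduler: the function name `scrub_share` is made up for illustration, the values of 10 for sync reads and sync writes are the ones quoted in the paragraph above, and any other I/O classes the scheduler may have are ignored.

```python
def scrub_share(scrub_max_active, sync_read=10, sync_write=10):
    """Fraction of the active I/O 'shares' held by scrub in this toy model."""
    total = scrub_max_active + sync_read + sync_write
    return scrub_max_active / total

# Compare the default (2) with the suggested value (5).
for value in (2, 5):
    share = scrub_share(value)
    print(f"zfs_vdev_scrub_max_active={value}: {share:.0%} of the shares")
```

Under these assumptions, going from 2 to 5 moves scrub from about 9% to 20% of the shares, which is why even a small bump may be noticeable. The real effect depends on the workload, so it is still worth watching over a 10 minute window as suggested.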
> 
> You can see the progress of scan (scrubs do scan) workload by looking at
> the ZFS debug messages.
>       echo ::zfs_dbgmsg | mdb -k
> These will look mysterious... they are. But the interesting bits are about
> how many blocks are visited in some amount of time (txg sync interval).
> Ideally, this will change as you adjust zfs_vdev_scrub_max_active.
>  -- richard
> 
>>  
>> Is 153% busy correct on c1? Seems to me that disks are quite "busy", but are 
>> handling the workload just fine (wait at 6% and asvc_t at 1.1ms)
>> 
>> Interestingly, this is the same output now that the resilver is running:
>> 
>>                     extended device statistics
>>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>>  2876.9 1041.1 25400.7 38189.1  0.0 37.9    0.0    9.7   0 2011 c1
>>    60.8   26.1  540.1  845.2  0.0  0.7    0.0    8.3   0  39 
>> c1t5000C50055F8723Bd0
>>    58.4   14.2  511.6  740.7  0.0  0.7    0.0   10.1   0  39 
>> c1t5000C50055E66B63d0
>>    60.2   16.3  529.3  756.1  0.0  0.8    0.0   10.1   0  41 
>> c1t5000C50055F87E73d0
>>    57.5   24.9  527.6  841.7  0.0  0.7    0.0    9.0   0  40 
>> c1t5000C50055F8BFA3d0
>>    57.9   14.5  543.5  765.1  0.0  0.7    0.0    9.8   0  38 
>> c1t5000C50055F9E123d0
>>    57.9   23.9  516.6  806.9  0.0  0.8    0.0    9.3   0  40 
>> c1t5000C50055F9F0B3d0
>>    59.8   24.6  554.1  857.5  0.0  0.8    0.0    9.6   0  42 
>> c1t5000C50055F9D3B3d0
>>    56.5   21.0  480.4  715.7  0.0  0.7    0.0    8.9   0  37 
>> c1t5000C50055E4FDE7d0
>>    54.8    9.7  473.5  737.9  0.0  0.7    0.0   11.2   0  39 
>> c1t5000C50055F9A607d0
>>    55.8   20.2  457.3  708.7  0.0  0.7    0.0    9.9   0  40 
>> c1t5000C50055F8CDA7d0
>>    57.8   28.6  487.0  796.1  0.0  0.9    0.0    9.9   0  45 
>> c1t5000C50055E65877d0
>>    60.8   27.1  572.6  823.7  0.0  0.8    0.0    8.8   0  41 
>> c1t5000C50055F9E7D7d0
>>    55.8   21.1  478.2  766.6  0.0  0.7    0.0    9.7   0  40 
>> c1t5000C50055FA0AF7d0
>>    57.0   22.8  528.3  724.5  0.0  0.8    0.0    9.6   0  41 
>> c1t5000C50055F9FE87d0
>>    56.2   10.8  465.2  715.6  0.0  0.7    0.0   10.4   0  38 
>> c1t5000C50055F9F91Bd0
>>    59.2   29.4  524.6  740.9  0.0  0.8    0.0    8.9   0  41 
>> c1t5000C50055F9FEABd0
>>    57.3   30.7  496.7  788.3  0.0  0.8    0.0    9.1   0  42 
>> c1t5000C50055F9F63Bd0
>>    55.5   16.3  461.9  652.9  0.0  0.7    0.0   10.1   0  39 
>> c1t5000C50055F9F3EBd0
>>    57.2   22.1  495.1  701.1  0.0  0.8    0.0    9.8   0  41 
>> c1t5000C50055F9F80Bd0
>>    59.5   30.2  543.1  741.8  0.0  0.9    0.0    9.6   0  45 
>> c1t5000C50055F9FB8Bd0
>>    56.5   25.1  515.4  786.9  0.0  0.7    0.0    8.6   0  38 
>> c1t5000C50055F9F92Bd0
>>    61.8   12.5  540.6  790.9  0.0  0.8    0.0   10.3   0  41 
>> c1t5000C50055F8905Fd0
>>    57.0   19.8  521.0  774.3  0.0  0.7    0.0    9.6   0  39 
>> c1t5000C50055F8D48Fd0
>>    56.3   16.3  517.7  724.7  0.0  0.7    0.0    9.9   0  38 
>> c1t5000C50055F9F89Fd0
>>    57.0   13.4  504.5  790.5  0.0  0.8    0.0   10.7   0  40 
>> c1t5000C50055F9EF2Fd0
>>    55.0   26.1  477.6  845.2  0.0  0.7    0.0    8.3   0  36 
>> c1t5000C50055F8C3ABd0
>>    57.8   14.1  518.7  740.7  0.0  0.8    0.0   10.8   0  41 
>> c1t5000C50055E66053d0
>>    55.9   20.8  490.2  715.7  0.0  0.7    0.0    9.0   0  37 
>> c1t5000C50055E66503d0
>>    57.0   24.1  509.7  806.9  0.0  0.8    0.0   10.0   0  41 
>> c1t5000C50055F9D3E3d0
>>    59.1   29.2  504.1  740.9  0.0  0.8    0.0    9.3   0  44 
>> c1t5000C50055F84FB7d0
>>    54.4   16.3  449.5  652.9  0.0  0.7    0.0   10.4   0  39 
>> c1t5000C50055F8E017d0
>>    57.8   28.4  503.3  796.1  0.0  0.9    0.0   10.1   0  45 
>> c1t5000C50055E579F7d0
>>    58.2   24.9  502.0  841.7  0.0  0.8    0.0    9.2   0  40 
>> c1t5000C50055E65807d0
>>    58.2   20.7  513.4  766.6  0.0  0.8    0.0    9.8   0  41 
>> c1t5000C50055F84A97d0
>>    56.5   24.9  508.0  857.5  0.0  0.8    0.0    9.2   0  40 
>> c1t5000C50055F87D97d0
>>    53.4   13.5  449.9  790.5  0.0  0.7    0.0   10.7   0  38 
>> c1t5000C50055F9F637d0
>>    57.0   11.8  503.0  790.9  0.0  0.7    0.0   10.6   0  39 
>> c1t5000C50055E65ABBd0
>>    55.4    9.6  461.1  737.9  0.0  0.8    0.0   11.6   0  40 
>> c1t5000C50055F8BF9Bd0
>>    55.7   19.7  484.6  774.3  0.0  0.7    0.0    9.9   0  40 
>> c1t5000C50055F8A22Bd0
>>    57.6   27.1  518.2  823.7  0.0  0.8    0.0    8.9   0  40 
>> c1t5000C50055F9379Bd0
>>    59.6   17.0  528.0  756.1  0.0  0.8    0.0   10.1   0  41 
>> c1t5000C50055E57A5Fd0
>>    61.2   10.8  530.0  715.6  0.0  0.8    0.0   10.7   0  40 
>> c1t5000C50055F8CCAFd0
>>    58.0   30.8  493.3  788.3  0.0  0.8    0.0    9.4   0  43 
>> c1t5000C50055F8B80Fd0
>>    56.5   19.9  490.7  708.7  0.0  0.8    0.0   10.0   0  40 
>> c1t5000C50055F9FA1Fd0
>>    56.1   22.4  484.2  701.1  0.0  0.7    0.0    9.5   0  39 
>> c1t5000C50055E65F0Fd0
>>    59.2   14.6  560.9  765.1  0.0  0.7    0.0    9.8   0  39 
>> c1t5000C50055F8BE3Fd0
>>    57.9   16.2  546.0  724.7  0.0  0.7    0.0   10.1   0  40 
>> c1t5000C50055F8B21Fd0
>>    59.5   30.0  553.2  741.8  0.0  0.9    0.0    9.8   0  45 
>> c1t5000C50055F8A46Fd0
>>    57.4   22.5  504.0  724.5  0.0  0.8    0.0    9.6   0  41 
>> c1t5000C50055F856CFd0
>>    58.4   24.6  531.4  786.9  0.0  0.7    0.0    8.4   0  38 
>> c1t5000C50055E6606Fd0
>>   511.0  161.4 7572.1 11260.1  0.0  0.3    0.0    0.4   0  14 c2
>>   252.3   20.1 3776.3  458.9  0.0  0.1    0.0    0.2   0   6 
>> c2t500117310015D579d0
>>   258.8   18.0 3795.7  350.0  0.0  0.1    0.0    0.2   0   6 
>> c2t50011731001631FDd0
>>     0.0  123.4    0.0 10451.1  0.0  0.1    0.0    1.0   0   3 
>> c2t5000A72A3007811Dd0
>>     0.2   16.1    1.9   56.7  0.0  0.0    0.0    0.0   0   0 c4
>>     0.2    8.1    1.6   28.3  0.0  0.0    0.0    0.0   0   0 c4t0d0
>>     0.0    8.1    0.3   28.3  0.0  0.0    0.0    0.0   0   0 c4t1d0
>>   495.6  163.6 7168.9 11290.3  0.0  0.2    0.0    0.4   0  14 c12
>>     0.0  123.4    0.0 10451.1  0.0  0.1    0.0    1.0   0   3 
>> c12t5000A72B300780FFd0
>>   248.2   18.1 3645.8  323.0  0.0  0.1    0.0    0.2   0   5 
>> c12t500117310015D59Ed0
>>   247.4   22.1 3523.1  516.2  0.0  0.1    0.0    0.2   0   6 
>> c12t500117310015D54Ed0
>>     0.2   14.8    1.9   56.7  0.0  0.0    0.6    0.1   0   0 rpool
>>  3883.5 1357.7 40141.6 60739.5 22.8 38.6    4.4    7.4  54 100 tank
>> 
>> It is very busy with a lot of wait % and higher asvc_t (2011% busy on c1?!).
>> I'm assuming resilvers are a lot more aggressive than scrubs.
>> 
>> There are many variables here, the biggest of which is the current non-scrub 
>> load.
>> 
>> I might have lost 2 weeks of scrub time, depending on whether the scrub will 
>> resume where it left off. I'll update when I can. 
>>  
>> 
>

_______________________________________________
OmniOS-discuss mailing list
[email protected]
http://lists.omniti.com/mailman/listinfo/omnios-discuss
