On Aug 3, 2010, at 8:55 PM, Eduardo Bragatto wrote:

> On Aug 3, 2010, at 10:57 PM, Richard Elling wrote:
>
>> Unfortunately, zpool iostat is completely useless at describing performance.
>> The only thing it can do is show device bandwidth, and everyone here knows
>> that bandwidth is not performance, right? Nod along, thank you.
>
> I totally understand that; I only used the output to show the space
> utilization per raidz1 volume.
>
>> Yes, and you also notice that the writes are biased towards the raidz1 sets
>> that are less full. This is exactly what you want :-) Eventually, when the
>> less empty sets become more empty, the writes will rebalance.
>
> Actually, if we are going to consider the values from zpool iostat, they are
> just slightly biased towards the volumes I would want -- for example, in the
> first post I made, the volume with the least free space had 845GB free. That
> same volume now has 833GB. I really would like to just stop writing to that
> volume at this point, as I've experienced very bad performance in the past
> when a volume gets nearly full.
The tipping point for the change in the first-fit/best-fit allocation algorithm
is now 96%. Previously, it was 70%. Since you don't specify which OS, build, or
zpool version, I'll assume you are on something modern. NB, "zdb -m" will show
the pool's metaslab allocations. If there are no 100% free metaslabs, then that
is a clue that the allocator might be working extra hard.

> As a reference, here's the information I posted less than 12 hours ago:
>
> # zpool iostat -v | grep -v c4
>                 capacity     operations    bandwidth
> pool          used  avail   read  write   read  write
> ------------  -----  -----  -----  -----  -----  -----
> backup       35.2T  15.3T    602    272  15.3M  11.1M
>   raidz1     11.6T  1.06T    138     49  2.99M  2.33M
>   raidz1     11.8T   845G    163     54  3.82M  2.57M
>   raidz1     6.00T  6.62T    161     84  4.50M  3.16M
>   raidz1     5.88T  6.75T    139     83  4.01M  3.09M
> ------------  -----  -----  -----  -----  -----  -----
>
> And here's the info from the same system, as I write now:
>
> # zpool iostat -v | grep -v c4
>                 capacity     operations    bandwidth
> pool          used  avail   read  write   read  write
> ------------  -----  -----  -----  -----  -----  -----
> backup       35.3T  15.2T    541    208  9.90M  6.45M
>   raidz1     11.6T  1.06T    116     38  2.16M  1.41M
>   raidz1     11.8T   833G    122     39  2.28M  1.49M
>   raidz1     6.02T  6.61T    152     64  2.72M  1.78M
>   raidz1     5.89T  6.73T    149     66  2.73M  1.77M
> ------------  -----  -----  -----  -----  -----  -----
>
> As you can see, the second raidz1 volume is not being spared and has been
> providing almost as much space as the others (and even more compared to the
> first volume).

Yes, perhaps 1.5-2x the data is being written to the less full raidz1 sets.
The exact amount of data is not shown, because zpool iostat doesn't show how
much data is written; it shows the bandwidth.

>>> I have the impression I'm getting degradation in performance due to the
>>> limited space in the first two volumes, especially the second, which has
>>> only 845GB free.
>>
>> Impressions work well for dating, but not so well for performance.
>> Does your application run faster or slower?
>
> You're a funny guy. :)
>
> Let me re-phrase it: I'm sure I'm getting degradation in performance, as my
> applications are waiting more on I/O now than they used to (based on CPU
> utilization graphs I have). The impression part is that the reason is the
> limited space in those two volumes -- as I said, I have already experienced
> bad performance on ZFS systems running nearly out of space before.

OK, so how long are they waiting? Try "iostat -zxCn" and look at the asvc_t
column. This will show how the disks are performing, though it won't show the
performance delivered by the file system to the application. To measure the
latter, try "fsstat zfs" (assuming you are on a Solaris distro).

Also, if these are HDDs, the media bandwidth decreases and seeks increase as
they fill. ZFS tries to favor the outer cylinders (lower numbered metaslabs)
to take this into account.

>>> Is there any way to re-stripe the pool, so I can take advantage of all
>>> spindles across the raidz1 volumes? Right now it looks like the newer
>>> volumes are doing the heavy lifting while the other two just hold old data.
>>
>> Yes, of course. But it requires copying the data, which probably isn't
>> feasible.
>
> I'm willing to copy data around to get this accomplished, I'm really just
> looking for the best method -- I have more than 10TB free, so I have some
> space to play with if I have to duplicate some data and erase the old copy,
> for example.

zfs send/receive is usually the best method.
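For example, within the same pool you could snapshot a dataset, send it to a
new dataset, and then drop the original, so the received copy's blocks get
spread across all four raidz1 sets under the current allocation policy. A
rough sketch only -- "backup/somedata" and the snapshot name are placeholders,
and you need enough free space to hold the second copy until the destroy
completes:

  # zfs snapshot -r backup/somedata@migrate
  # zfs send -R backup/somedata@migrate | zfs receive -u backup/somedata-new
  # zfs destroy -r backup/somedata
  # zfs rename backup/somedata-new backup/somedata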
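And for the latency measurements above, typical invocations would look
something like this (assuming a Solaris-family OS; the 10-second interval is
just an example):

  # iostat -zxCn 10      (watch the asvc_t column for per-disk service times)
  # fsstat zfs 10
  # zdb -m backup        (look for metaslabs that are still 100% free)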
 -- richard

--
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss