On Aug 3, 2010, at 8:55 PM, Eduardo Bragatto wrote:

> On Aug 3, 2010, at 10:57 PM, Richard Elling wrote:
>
>> Unfortunately, zpool iostat is completely useless at describing performance.
>> The only thing it can do is show device bandwidth, and everyone here knows
>> that bandwidth is not performance, right? Nod along, thank you.
>
> I totally understand that; I only used the output to show the space
> utilization per raidz1 volume.
>
>> Yes, and you also notice that the writes are biased towards the raidz1 sets
>> that are less full. This is exactly what you want :-) Eventually, when the
>> less empty sets become more empty, the writes will rebalance.
>
> Actually, if we are going to consider the values from zpool iostat, they are
> just slightly biased towards the volumes I would want -- for example, in the
> first post I made, the volume with the least free space had 845GB free. That
> same volume now has 833GB. I really would like to just stop writing to that
> volume at this point, as I've experienced very bad performance in the past
> when a volume gets nearly full.
The tipping point for the change in the first-fit/best-fit allocation algorithm
is now 96%. Previously, it was 70%. Since you don't specify which OS, build, or
zpool version, I'll assume you are on something modern. NB, "zdb -m" will show
the pool's metaslab allocations. If there are no 100% free metaslabs, then that
is a clue that the allocator might be working extra hard.

> As a reference, here's the information I posted less than 12 hours ago:
>
> # zpool iostat -v | grep -v c4
>                 capacity     operations    bandwidth
> pool          used  avail   read  write   read  write
> ------------  -----  -----  -----  -----  -----  -----
> backup       35.2T  15.3T    602    272  15.3M  11.1M
>   raidz1     11.6T  1.06T    138     49  2.99M  2.33M
>   raidz1     11.8T   845G    163     54  3.82M  2.57M
>   raidz1     6.00T  6.62T    161     84  4.50M  3.16M
>   raidz1     5.88T  6.75T    139     83  4.01M  3.09M
> ------------  -----  -----  -----  -----  -----  -----
>
> And here's the info from the same system, as I write now:
>
> # zpool iostat -v | grep -v c4
>                 capacity     operations    bandwidth
> pool          used  avail   read  write   read  write
> ------------  -----  -----  -----  -----  -----  -----
> backup       35.3T  15.2T    541    208  9.90M  6.45M
>   raidz1     11.6T  1.06T    116     38  2.16M  1.41M
>   raidz1     11.8T   833G    122     39  2.28M  1.49M
>   raidz1     6.02T  6.61T    152     64  2.72M  1.78M
>   raidz1     5.89T  6.73T    149     66  2.73M  1.77M
> ------------  -----  -----  -----  -----  -----  -----
>
> As you can see, the second raidz1 volume is not being spared and has been
> providing almost as much space as the others (and even more compared to the
> first volume).

Yes, perhaps 1.5-2x the data is being written to the less full raidz1 sets.
The exact amount of data is not shown, because zpool iostat doesn't show how
much data is written; it shows the bandwidth.

>>> I have the impression I'm getting degradation in performance due to the
>>> limited space in the first two volumes, especially the second, which has
>>> only 845GB free.
>>
>> Impressions work well for dating, but not so well for performance.
>> Does your application run faster or slower?
>
> You're a funny guy. :)
>
> Let me re-phrase it: I'm sure I'm getting degradation in performance, as my
> applications are waiting more on I/O now than they used to (based on CPU
> utilization graphs I have). The impression part is that the reason is the
> limited space in those two volumes -- as I said, I have already experienced
> bad performance on ZFS systems running nearly out of space before.

OK, so how long are they waiting? Try "iostat -zxCn" and look at the asvc_t
column. This will show how the disks are performing, though it won't show the
performance delivered by the file system to the application. To measure the
latter, try "fsstat zfs" (assuming you are on a Solaris distro).

Also, if these are HDDs, the media bandwidth decreases and seeks increase as
they fill. ZFS tries to favor the outer cylinders (lower numbered metaslabs)
to take this into account.

>>> Is there any way to re-stripe the pool, so I can take advantage of all
>>> spindles across the raidz1 volumes? Right now it looks like the newer
>>> volumes are doing the heavy lifting while the other two just hold old data.
>>
>> Yes, of course. But it requires copying the data, which probably isn't
>> feasible.
>
> I'm willing to copy data around to get this accomplished, I'm really just
> looking for the best method -- I have more than 10TB free, so I have some
> space to play with if I have to duplicate some data and erase the old copy,
> for example.

zfs send/receive is usually the best method.
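For example, within the same pool you could snapshot a dataset, send it to a
new dataset, and then drop the original, so the received copy's blocks get
spread across all four raidz1 sets under the current allocation policy. A
rough sketch only -- "backup/somedata" and the snapshot name are placeholders,
and you need enough free space to hold the second copy until the destroy
completes:

  # zfs snapshot -r backup/somedata@migrate
  # zfs send -R backup/somedata@migrate | zfs receive -u backup/somedata-new
  # zfs destroy -r backup/somedata
  # zfs rename backup/somedata-new backup/somedata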
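And for the latency measurements above, typical invocations would look
something like this (assuming a Solaris-family OS; the 10-second interval is
just an example):

  # iostat -zxCn 10      (watch the asvc_t column for per-disk service times)
  # fsstat zfs 10
  # zdb -m backup        (look for metaslabs that are still 100% free)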
 -- richard

--
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss