On 2013-02-12 10:32, Ian Collins wrote:
Ram Chander wrote:
You are right. So it looks like a re-distribution issue. Initially there
were two vdevs with 24 disks (disks 0-23) for close to a year. After
which we added 24 more disks and created additional vdevs. The
initial vdevs are filled up and so write speed declined. Now how to
find files that are present in a vdev or a disk? That way I can remove
and re-copy back to distribute data. Any other way to solve this?
The only way is to avoid the problem in the first place by not mixing
vdev sizes in a pool.
Well, that imbalance is there - in the zpool status printout we see
raidz1 top-level vdevs of size 5, 5, 12, 7, 7, 7 disks and some 5 spares
- which seems to sum up to 48 ;)
Depending on disk size, it might be possible that tlvdev sizes in
gigabytes were kept the same (i.e. a raidz set with twice as many
disks of half size), but we have no info on this detail and it is
unlikely. With the disk sets being in one pool, this would still
considerably unbalance the load among spindles and IO buses.
Besides all that - with the "older" tlvdevs being fuller than
the "newer" ones, there is an imbalance which wouldn't be avoided
by not mixing vdev sizes - writes into newer ones are more likely
to quickly find available "holes", while writes into older ones are more
fragmented and a longer search is needed to find a hole -
if they don't end up fragmented into gang blocks. These two are, I believe,
the basis for the performance drop on "full" pools, with the measure
being rather the mix of IO patterns and fragmentation of data.
I think there were developments in illumos ZFS to direct more
writes onto devices with more available space; I am not sure if
the average write latency to a tlvdev was monitored and taken
into account during write-targeting decisions (which would also
cover the case of failing devices which take longer to respond).
I am not sure which portions have been completed and integrated
into common illumos-gate.
As was suggested, you can use "zpool iostat -v 5" to monitor IOs
to the pool with a fanout per TLVDEV and per disk, and witness
possible patterns there. Do keep in mind, however, that for a
non-failed raidz set you should see reads from only the data
disks for a particular stripe, while parity disks are not used
unless a checksum mismatch occurs. On average, data and parity
are spread over all disks so that there is no "dedicated"
parity disk, but with small IOs you are likely to notice that
some disks of a set are not read at all.
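To put numbers on how unevenly the tlvdevs are filled, the capacity columns of the same printout can be compared. Below is a minimal Python sketch under stated assumptions: the sample text is hypothetical (not from the poster's system), the column layout and the two-space indent of top-level vdevs are assumed to match what your "zpool iostat -v" prints, and `to_bytes` and `tlvdev_fill` are illustrative helper names.

```python
import re

# Hypothetical sample shaped like "zpool iostat -v" output (assumed layout).
SAMPLE = """\
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank        5.50T  4.50T    120    340  1.20M  4.50M
  raidz1    4.50T   500G     20     40   200K   500K
    c1t0d0      -      -      4      8  40.0K   100K
    c1t1d0      -      -      4      8  40.0K   100K
  raidz1    1.00T  4.00T     30    200   300K  3.00M
    c2t0d0      -      -      6     40  60.0K   600K
"""

UNITS = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40, "P": 2**50}


def to_bytes(text):
    """Convert a size such as '4.50T' or '500G' into a number of bytes."""
    m = re.fullmatch(r"([0-9.]+)([KMGTP]?)", text)
    if m is None:
        raise ValueError("unrecognized size: %r" % text)
    return float(m.group(1)) * UNITS[m.group(2)]


def tlvdev_fill(output):
    """Return [(name, fill_fraction)] for every top-level vdev line."""
    result = []
    for line in output.splitlines():
        indent = len(line) - len(line.lstrip(" "))
        fields = line.split()
        # Assumption: top-level vdevs are indented by exactly two spaces,
        # leaf disks by four, and the pool line by none.
        if indent != 2 or len(fields) < 3 or fields[1] == "-":
            continue
        alloc, free = to_bytes(fields[1]), to_bytes(fields[2])
        result.append((fields[0], alloc / (alloc + free)))
    return result


for name, fill in tlvdev_fill(SAMPLE):
    print("%-8s %5.1f%% full" % (name, 100 * fill))
```

A large gap between the fill fractions of the old and new tlvdevs would be consistent with the imbalance discussed above; it does not by itself prove that fragmentation is the cause of the slowdown.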
If the budget permits, I'd suggest building (or leasing) another
system with balanced disk sets and replicating all data onto it,
then repurposing the older system - for example, to be a backup
of the newer box (also after remaking the disk layout).
As for the question of "which files are on the older disks" -
you can as a rule of thumb use the file creation/modification
time in comparison with the date when you expanded the pool ;)
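This rule of thumb can be sketched in a few lines of Python. The function name `files_older_than`, the path and the cutoff date in the usage note are all hypothetical, and the approach assumes that mtimes were not reset by tools that preserve timestamps on copy.

```python
import os
import stat
from datetime import datetime


def files_older_than(root, cutoff):
    """Yield (path, size) for regular files under root whose last
    modification predates cutoff (a naive local-time datetime)."""
    cutoff_ts = cutoff.timestamp()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff_ts:
                yield path, st.st_size
```

For example, `files_older_than("/tank/data", datetime(2012, 6, 1))` would list candidates written before an expansion on that (made-up) date. Files modified later may still have some of their blocks on the older tlvdevs, so this only gives a lower bound.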
Closer inspection could be done with a ZDB walk to print out
the DVA block addresses for blocks of a file (the DVA includes
the number of the top-level vdev), but that would take some
time - to determine which files you want to inspect (likely
some band of sizes) and then to do these zdb walks.
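As a rough sketch of what such a walk could feed into, the Python below tallies allocated bytes per top-level vdev from zdb text. It assumes the output was produced by something like "zdb -ddddd pool/dataset object-number" (the object number being the file's inode number as shown by "ls -i"), and that level-0 block lines carry DVAs in the compact vdev:offset:asize form with hex offset and size. The sample text is hypothetical, and the exact layout may differ between ZFS versions, so check it against your own zdb output first.

```python
import re
from collections import Counter

# Hypothetical sample shaped like the "Indirect blocks" part of zdb output.
SAMPLE = """\
Indirect blocks:
               0 L1  0:4c0e00:400 1:4c0e00:400 4000L/400P F=3 B=12/12
               0  L0 0:480000:20000 20000L/20000P F=1 B=12/12
           20000  L0 0:4a0000:20000 20000L/20000P F=1 B=12/12
           40000  L0 3:1c0000:20000 20000L/20000P F=1 B=900/900
"""

# File offset (hex), level L0, then the first DVA as vdev:offset:asize.
L0_LINE = re.compile(r"^\s*[0-9a-f]+\s+L0\s+(\d+):([0-9a-f]+):([0-9a-f]+)\s")


def bytes_per_tlvdev(zdb_text):
    """Sum the allocated size of level-0 (data) blocks per top-level vdev."""
    totals = Counter()
    for line in zdb_text.splitlines():
        m = L0_LINE.match(line)
        if m:
            vdev = int(m.group(1))       # top-level vdev number (decimal)
            asize = int(m.group(3), 16)  # allocated size, hex in the output
            totals[vdev] += asize
    return totals


for vdev, nbytes in sorted(bytes_per_tlvdev(SAMPLE).items()):
    print("tlvdev %d: %d bytes" % (vdev, nbytes))
```

Only the first DVA of each data block is counted here; indirect (L1 and higher) blocks and extra copies are ignored for simplicity.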
zfs-discuss mailing list