> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Jim Klimov
> I understand that relatively high fragmentation is inherent
> to ZFS due to its COW and possible intermixing of metadata
> and data blocks (of which metadata path blocks are likely
> to expire and get freed relatively quickly).
> I believe it was sometimes implied on this list that such
> fragmentation for "static" data can be currently combatted
> only by zfs send-ing existing pools data to other pools at
> some reserved hardware, and then clearing the original pools
> and sending the data back. This is time-consuming, disruptive
> and requires lots of extra storage idling for this task (or
> at best - for backup purposes).
Can be combated by sending and receiving, but that's not the only way. You
can defrag (or apply/remove dedup and/or compression, or do any of the
other things that depend on BP rewrite) with any technique that
sequentially reads the existing data and writes it back to disk. For
example, if you "cp -p file1 file2 && mv file2 file1", you have
effectively defragged file1 (or added/removed dedup or compression). Of
course, this requires that file1 is sufficiently "not being used" at the time.
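A minimal sketch of that copy-and-rename trick; the filename is hypothetical, and on a COW filesystem the rewrite only helps if the new copy lands in a less fragmented region:

```shell
#!/bin/sh
# Rewrite a file in place by copying it and renaming the copy over the
# original. The sequential read+write lets ZFS allocate fresh (ideally
# contiguous) blocks, and current pool settings (compression, dedup)
# apply to the rewritten blocks.
set -e

f="file1"                           # hypothetical target file
printf 'some static data\n' > "$f"  # stand-in for existing data

cp -p "$f" "$f.defrag"    # -p preserves mode/ownership/timestamps
mv "$f.defrag" "$f"       # replace the original with the fresh copy
```

Note that mv is atomic within a filesystem, so readers see either the old or the new file, never a partial one.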
> I wonder how resilvering works, namely - does it write
> blocks "as they were" or in an optimized (defragmented)
> fashion, in two usecases:
Resilver proceeds in temporal order. While this might occasionally yield
slightly better organization (if a whole bunch of small writes were
previously spread out over a long period on a largely idle system, they
will now be write-aggregated into sequential blocks), resilvering usually
recreates fragmentation similar to what already existed.
In fact, even if you zfs send | zfs receive while preserving snapshots,
you're still recreating the data in loosely temporal order: it writes all
the blocks of the oldest snapshot, then all the blocks of the second-oldest
snapshot, and so on. So by preserving the old snapshots, you may be
recreating a significant amount of the original fragmentation.
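The send/receive round trip can be done either with or without the snapshot history; pool, dataset, and snapshot names here are hypothetical:

```
# Recursive replicated send keeps all snapshots (received in loosely
# temporal order, so old fragmentation patterns partly come back):
zfs snapshot -r tank/data@migrate
zfs send -R tank/data@migrate | zfs receive -F spare/data

# Sending only the latest snapshot drops the history, so the receive
# lays data out without replaying the historical write order:
zfs send tank/data@migrate | zfs receive spare/data
```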
> 1) Resilvering from a healthy array (vdev) onto a spare drive
> in order to replace one of the healthy drives in the vdev;
> 2) Resilvering a degraded array from existing drives onto a
> new drive in order to repair the array and make it redundant
Same behavior either way. Unless... if your old disks are small and very
full, and your new disks are bigger, then in the past you may have
suffered fragmentation due to a lack of sequential runs of free blocks.
So resilvering onto new *larger* disks might make a difference.
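Both use cases boil down to the same operation; the pool and device names below are hypothetical:

```
# Case 1: copy data onto a spare before pulling a healthy disk
zpool replace tank c0t3d0 c0t9d0

# Case 2: replace a failed disk with a new one in the same slot
zpool replace tank c0t3d0

# Watch resilver progress either way:
zpool status tank
```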
> Finally, what would the gurus say - does fragmentation
> pose a heavy problem on nearly-filled-up pools made of
> spinning HDDs
Yes. But that's not unique to ZFS or COW. No matter what filesystem you
use, if your disk is nearly full, you will suffer from fragmentation.
> and can fragmentation be effectively combatted
> on ZFS at all (with or without BP rewrite)?
With BP rewrite, yes you can effectively combat fragmentation.
Unfortunately it doesn't exist. :-/
Without BP rewrite... define "effectively." ;-) I have successfully
defragged, compressed, and enabled/disabled dedup on pools before by using
zfs send | zfs receive... Or by asking users: "OK, we're all in agreement,
this weekend nobody will be using the 'a' directory. Right?" Then I sudo
rm -rf a and restore it from the latest snapshot, or something along those
lines. Next weekend, we do the 'b' directory...
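A runnable sketch of that weekend trick. A real pool exposes snapshots under .zfs/snapshot/<name>, which needs a live pool, so a plain read-only copy stands in for the snapshot here; the directory names are hypothetical:

```shell
#!/bin/sh
# "Quiet weekend" defrag: remove a directory nobody is using, then
# restore it from the latest snapshot. The restore is a sequential
# read+write, so the directory's blocks are rewritten fresh.
set -e

mkdir -p a snap/latest
printf 'payload\n' > a/report.txt
cp -a a/. snap/latest/      # stand-in for an existing snapshot

rm -rf a                    # users agreed: nobody touches "a"
cp -a snap/latest a         # restore; fresh allocation defrags "a"
```

On a real pool the restore source would be something like .zfs/snapshot/<latest>/a instead of the stand-in directory.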