On 08/25/2010 05:14 PM, Anthony Liguori wrote:
At a high level, I don't think online compaction requires any
specific support from an image format.
You need to know that the block is free and can be reallocated.
Semantically, TRIM/DISCARD means that "I don't care about the contents
of the block anymore until I do another write." Behind the scenes, we
can keep track of which blocks have been discarded in an in-memory
list; the first write to a block then evicts it from
the discarded list.
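
A minimal sketch of the in-memory discarded list described above. The class and method names are hypothetical and are not taken from any existing block driver; blocks are identified by plain integers for illustration.

```python
class DiscardTracker:
    """In-memory record of discarded blocks (hypothetical helper)."""

    def __init__(self):
        # Nothing here is persisted; the set is lost on crash or restart.
        self.discarded = set()

    def discard(self, block):
        # TRIM/DISCARD: the contents no longer matter until the next write.
        self.discarded.add(block)

    def write(self, block):
        # The first write after a discard evicts the block from the list.
        self.discarded.discard(block)

    def is_discarded(self, block):
        return block in self.discarded


tracker = DiscardTracker()
tracker.discard(7)
tracker.discard(9)
tracker.write(7)  # block 7 is live again, block 9 is still reclaimable
```

Because the list lives only in memory, losing it costs nothing but a missed compaction opportunity, which is why no sync is needed on the discard path.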
A background task would attempt to detect idle I/O and copy a block
from the end of the file to a location on the discarded list. When
the copy has completed, you can then remove the L2 entry for the
discarded block (effectively punching a hole in the image), sync, and
then update the L2 entry for the block at the end-of-file location to
point to the new block location. You can then ftruncate to reduce
overall file size.
That should work.
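
The ordering of those steps can be sketched on a toy model. Everything below is an assumption made for illustration: `ToyImage` stands in for the image file, the L2 table is a flat dict from guest cluster to host cluster, every host cluster is mapped by exactly one guest cluster, and nothing else touches the image while a step runs.

```python
class ToyImage:
    """Toy stand-in for an image file (hypothetical, not a real format)."""

    def __init__(self, l2, clusters):
        self.l2 = dict(l2)              # guest cluster -> host cluster index
        self.clusters = list(clusters)  # host file, one item per cluster
        self.log = []                   # records when a sync would happen

    def sync(self):
        self.log.append("sync")


def compact_step(image, discarded_guest):
    """Reclaim one discarded guest cluster by moving the tail cluster."""
    hole = image.l2[discarded_guest]
    tail = len(image.clusters) - 1

    if hole == tail:
        # The discarded block is already last: unmap it and truncate.
        del image.l2[discarded_guest]
        image.sync()
        image.clusters.pop()
        return

    # Find the guest cluster currently stored at the end of the file.
    tail_guest = next(g for g, h in image.l2.items() if h == tail)

    image.clusters[hole] = image.clusters[tail]  # copy the data first
    del image.l2[discarded_guest]                # punch the hole
    image.sync()                                 # make copy + unmap durable
    image.l2[tail_guest] = hole                  # repoint to the new location
    image.clusters.pop()                         # ftruncate equivalent


image = ToyImage({0: 0, 1: 1, 2: 2}, ["A", "B", "C"])
compact_step(image, discarded_guest=1)
```

After the call, guest cluster 2 reads its data from host cluster 1 and the file is one cluster shorter. A real implementation would also have to cope with guest writes racing against the copy, which this sketch ignores.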
If you tried to maintain a free list, then you would need to sync on
TRIM/DISCARD, which is potentially a fast path. While a background
task may be less efficient in the short term, it's just as efficient
in the long term and it has the advantage of keeping any fast path fast.
You only need to sync when the free list size grows beyond the amount of
space you're prepared to lose on power failure. And you may be able to
defer the background task indefinitely by satisfying new allocations
from the free list.
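
A rough sketch of that policy, under stated assumptions: the class name, the `max_unsynced` threshold (counted in clusters), and the sync counter are all hypothetical, and the actual write of the free list to disk is only counted, not performed.

```python
class LazyFreeList:
    """Free list that syncs only past a loss threshold (hypothetical)."""

    def __init__(self, max_unsynced, file_clusters):
        self.max_unsynced = max_unsynced  # clusters we accept losing
        self.persisted = []               # entries already made durable
        self.unsynced = []                # entries known only in memory
        self.end = file_clusters          # next cluster past end of file
        self.syncs = 0                    # how many syncs were needed

    def discard(self, cluster):
        # Fast path: remember the cluster without touching the disk.
        self.unsynced.append(cluster)
        if len(self.unsynced) > self.max_unsynced:
            # Too much at risk on power failure: persist and sync once.
            self.persisted.extend(self.unsynced)
            self.unsynced.clear()
            self.syncs += 1

    def allocate(self):
        # Reuse freed clusters before growing the file, so the
        # background compaction task can be deferred.
        if self.unsynced:
            return self.unsynced.pop()
        if self.persisted:
            return self.persisted.pop()
        cluster = self.end
        self.end += 1
        return cluster


fl = LazyFreeList(max_unsynced=2, file_clusters=10)
for c in (3, 5, 8):
    fl.discard(c)          # the third discard crosses the threshold
syncs_after_discards = fl.syncs
allocated = [fl.allocate() for _ in range(4)]
```

Here three discards cost a single sync, and the file grows only once the free list has been used up.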
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.