On 08/25/2010 02:14 AM, Avi Kivity wrote:
>> If (c) happens before (b), then we've created an extent that's
>> attached to a table with a zero reference count. This is a corrupt
>> image.
>
> If the only issue is new block allocation, it can be easily solved.
Technically, I believe there are similar issues around creating
snapshots, but I don't think we care.
> Instead of allocating exactly the needed amount of blocks, allocate
> a large extent and hold them in memory.
So you're suggesting that we allocate a bunch of blocks and update the
ref count table so that they are seen as allocated even though they
aren't attached to an L1 table?
> The next allocation can then be filled from memory, so the
> allocation sync is amortized over many blocks. A power fail will leak
> the preallocated blocks, losing some megabytes of address space, but
> not real disk space.
It's a clever idea, but it would lose real disk space, which is
probably not a huge issue.
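
To make the amortization concrete, here is a minimal sketch of the
batching idea. Everything in it is hypothetical (the BatchAllocator
class, the batch size of 64, the sync counter); it models the ref count
update and flush as a single counted step and does not touch a real
image.

```python
class BatchAllocator:
    """Hands out block numbers from a pool reserved on disk in one step.

    Hypothetical model: "reserving" stands in for bumping the reference
    counts of a whole batch and flushing that metadata once.
    """

    def __init__(self, batch_size=64):
        self.batch_size = batch_size
        self.next_free = 0    # first block number never reserved
        self.pool = []        # reserved on disk, not yet handed out
        self.sync_count = 0   # number of metadata syncs performed

    def _reserve_batch(self):
        start = self.next_free
        self.next_free += self.batch_size
        self.pool = list(range(start, self.next_free))
        self.sync_count += 1  # one sync covers the whole batch

    def allocate(self):
        # Only pay for a sync when the in-memory pool is empty.
        if not self.pool:
            self._reserve_batch()
        return self.pool.pop(0)

    def leaked_after_crash(self):
        # Blocks marked allocated on disk but never attached to anything.
        return len(self.pool)


alloc = BatchAllocator(batch_size=64)
blocks = [alloc.allocate() for _ in range(100)]
print(alloc.sync_count)            # syncs needed for 100 allocations
print(alloc.leaked_after_crash())  # blocks leaked if power fails now
```

With these numbers, 100 allocations cost 2 syncs instead of 100, and a
power failure at that point would leave 28 reserved blocks unattached.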
Let's consider what happens if we eliminate the reference count table,
which means eliminating internal snapshots.
1) guest submits write request
2) allocate extent
3) write data to disk (a)
4) write (a) completes
5) write extent table (c)
6) write (c) completes
7) complete guest write request
If this all happens in order and we lose power, the worst case is that
we leak a block. It means we need a periodic fsck.
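
As a rough illustration of what that fsck would have to do without a
reference count table, here is a minimal sketch. The find_leaked_blocks
helper and the dict-based extent table are assumptions made up for this
example, not an existing tool or on-disk layout.

```python
def find_leaked_blocks(total_blocks, extent_table, metadata_blocks=()):
    """Return image blocks that exist in the file but that nothing references.

    extent_table maps a guest extent index to an image block number, or to
    None when the extent is unallocated (hypothetical in-memory layout).
    """
    referenced = set(metadata_blocks)
    referenced.update(b for b in extent_table.values() if b is not None)
    return sorted(set(range(total_blocks)) - referenced)


# Hypothetical 6-block image: block 0 holds the header and extent table.
extent_table = {0: 1, 1: 2, 2: None, 3: 4}
leaked = find_leaked_blocks(6, extent_table, metadata_blocks=[0])
print(leaked)  # [3, 5]
```

Anything the scan reports can be reclaimed, since by construction no
extent table entry points at it.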
If (c) completes before (a) and we lose power in between, the image is
not corrupted but the data is lost. This is okay based on the guest
contract, because the guest's write request was never completed.
And that's it. There is no scenario where the disk is corrupted.
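
The two cases above can be enumerated mechanically. This is only a
sketch under the assumptions of this mail: two writes, (a) for data and
(c) for the extent table, and the guest request completing only after
both. The function names and outcome labels are made up for the
example, and it says nothing about failure modes not listed here.

```python
def outcome(data_durable, table_durable):
    """Classify the on-disk state after a crash."""
    if data_durable and table_durable:
        return "consistent"
    if data_durable:
        return "leaked block"              # data written, nothing points to it
    if table_durable:
        return "unacknowledged data lost"  # extent attached, data never landed
    return "nothing written"


def crash_outcomes(completion_order):
    """Crash after each prefix of the order in which the writes complete.

    The guest request is acknowledged only once both (a) and (c) are done.
    """
    results = []
    for n in range(len(completion_order) + 1):
        done = set(completion_order[:n])
        acked = done == {"a", "c"}
        results.append((outcome("a" in done, "c" in done), acked))
    return results


in_order = crash_outcomes(["a", "c"])
reordered = crash_outcomes(["c", "a"])
for state, acked in in_order + reordered:
    print(state, "acked" if acked else "not acked")
```

In this model every state in which the guest has been acknowledged is
"consistent"; the in-between states are a leaked block or the loss of
data that was never acknowledged.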
_if_ that's the only failure mode.
If we had another disk format that only supported growth and metadata
for a backing file, can you think of another failure scenario?
Regards,
Anthony Liguori