Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification

Avi Kivity Mon, 11 Oct 2010 09:12:13 -0700

 On 10/11/2010 05:49 PM, Anthony Liguori wrote:

On 10/11/2010 09:58 AM, Avi Kivity wrote:
A leak is unacceptable. It means an image can grow to an unbounded size. If you are a server provider offering multitenancy, then a malicious guest can potentially grow the image beyond its allotted size, causing a Denial of Service attack against another tenant.
This particular leak cannot grow, and is not controlled by the guest.
As the image gets moved from hypervisor to hypervisor, it can keep growing if given a chance to fill up the disk, then trim it all away.
In a mixed hypervisor environment, it just becomes a numbers game.

I don't see how it can grow. Both the freelist and the clusters it points to consume space, which becomes a leak once you move it to a hypervisor that doesn't understand the freelist. The older hypervisor then allocates new blocks. As soon as it performs a metadata scan (if ever), the freelist is reclaimed.

You could only get a growing leak if you moved it to a hypervisor that doesn't perform metadata scans, but then that is independent of the freelist.
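A minimal sketch of the metadata scan mentioned above, over a toy in-memory model rather than QED's real on-disk layout: every cluster not reachable from the header, the L1 table, or an L2 table counts as leaked. Clusters parked on a freelist that an older implementation does not understand fall into exactly this set, so a scan recovers them. The cluster size, the assumption that the header and L1 table occupy the first `header_clusters` clusters, the one-cluster-per-L2-table simplification, and the function name are all hypothetical.

```python
CLUSTER_SIZE = 64 * 1024  # assumed cluster size for this sketch


def find_leaked_clusters(file_size, header_clusters, l1_table, l2_tables):
    """Return offsets of clusters that no metadata references.

    header_clusters: clusters at the start of the file holding the header
                     and L1 table (a simplifying assumption).
    l1_table: L2 table offsets, 0 meaning unallocated.
    l2_tables: maps an L2 table offset to its data cluster offsets,
               0 meaning unallocated.
    """
    referenced = {i * CLUSTER_SIZE for i in range(header_clusters)}
    for l2_offset in l1_table:
        if l2_offset == 0:
            continue
        referenced.add(l2_offset)  # the L2 table itself
        for data_offset in l2_tables[l2_offset]:
            if data_offset != 0:
                referenced.add(data_offset)
    return [off for off in range(0, file_size, CLUSTER_SIZE)
            if off not in referenced]
```

For example, in a six-cluster file where the cluster at `4 * CLUSTER_SIZE` sits only on a freelist, the scan reports it as leaked and it can be reused.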

A freelist has to be a non-optional feature. When the freelist bit is set, an older QEMU cannot read the image. If the freelist is completely used up, the freelist bit can be cleared and the image is then usable by older QEMUs.
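A sketch of the compatibility rule being described, assuming the freelist is advertised through an incompatible feature bit in the image header. The bit values and function names are hypothetical and not taken from the QED specification. The point is only that an implementation refuses any incompatible bit it does not know, and that the writer can drop the bit once the freelist is empty.

```python
# Hypothetical incompatible feature bits; values are illustrative only.
FEATURE_BACKING_FILE = 1 << 0
FEATURE_FREELIST = 1 << 2


def can_open(image_features, supported_features):
    """Refuse the image if it sets any incompatible bit we do not understand."""
    return (image_features & ~supported_features) == 0


def update_features(image_features, freelist):
    """Clear the freelist bit once every cluster on the freelist has been reused."""
    if not freelist:
        return image_features & ~FEATURE_FREELIST
    return image_features


OLD_QEMU = FEATURE_BACKING_FILE                     # predates the freelist
NEW_QEMU = FEATURE_BACKING_FILE | FEATURE_FREELIST
```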
Once we support TRIM (or detect zeros) we'll never have a clean freelist.
Zero detection doesn't add to the free list.

Why not? If a cluster is zero-filled, you may drop it (assuming no backing image).

A potential solution here is to treat TRIM a little differently than we've been discussing.
When TRIM happens, don't immediately write an unallocated cluster entry to the L2. Leave the L2 entry intact. Don't actually write a UCE to the L2 until you actually allocate the block.
This implies a cost because you'll need to do metadata syncs to make this work. However, it eliminates leakage.
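A toy model of the deferred-TRIM idea. The L2 table is a dict, the pending set lives only in memory (the weakness raised in the reply that follows), the encoding of a UCE as offset 0 is assumed, and all names are hypothetical. The ordering in `_allocate` shows where the extra metadata sync comes in: the old entry is cleared and flushed before the cluster is mapped at its new location.

```python
CLUSTER_SIZE = 64 * 1024
UNALLOCATED = 0  # assumed encoding of an unallocated cluster entry (UCE)


class DeferredTrimImage:
    """Toy model: TRIM leaves the L2 entry intact; a UCE is written only
    when the cluster is handed to another guest cluster."""

    def __init__(self, l2):
        self.l2 = dict(l2)   # guest cluster index -> host offset
        self.pending = {}    # host offset -> guest index; memory only
        self.next_free = max(self.l2.values(), default=0) + CLUSTER_SIZE
        self.log = []        # order of on-disk metadata updates

    def trim(self, guest_idx):
        host = self.l2.get(guest_idx, UNALLOCATED)
        if host != UNALLOCATED:
            self.pending[host] = guest_idx  # nothing written to disk yet

    def write(self, guest_idx):
        """Return the host offset that a guest write to guest_idx lands on."""
        host = self.l2.get(guest_idx, UNALLOCATED)
        if host in self.pending:
            del self.pending[host]  # rewritten after TRIM: live again
            return host
        if host != UNALLOCATED:
            return host
        return self._allocate(guest_idx)

    def _allocate(self, guest_idx):
        if self.pending:
            host, old_idx = self.pending.popitem()
            self.l2[old_idx] = UNALLOCATED
            self.log.append(("uce", old_idx))
            self.log.append(("sync",))  # old entry must be durable first
        else:
            host = self.next_free
            self.next_free += CLUSTER_SIZE
        self.l2[guest_idx] = host
        self.log.append(("map", guest_idx, host))
        return host
```

Clearing the old entry before writing the new one means a crash in between leaks the cluster rather than leaving two L2 entries pointing at it; a leak can be recovered by a metadata scan, a double mapping cannot.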

The information is lost on shutdown, and you can have a large number of unallocated-in-waiting clusters (like a TRIM issued by mkfs, or a user expecting a visit from RIAA).

A slight twist on your proposal is to have an allocated-but-may-drop bit in an L2 entry. TRIM or zero detection sets the bit (leaving the cluster number intact). A following write to the cluster needs to clear the bit; if we reallocate the cluster, we need to replace it with a ZCE.

This makes the freelist all L2 entries with the bit set; it may be less efficient than a custom data structure, though.
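A sketch of the allocated-but-may-drop bit. It assumes a hypothetical flag bit (bit 63, assumed free because image offsets never reach 2**63) and a hypothetical ZCE encoding of `1`; neither value comes from the QED specification. Because the flag lives in the L2 entry itself, it survives shutdown. The freelist is simply every flagged entry, which also shows the cost mentioned above: finding a reusable cluster means scanning L2 tables. Replacing a reclaimed entry with a ZCE rather than a UCE keeps the guest reading zeroes instead of falling through to a backing image.

```python
UNALLOCATED = 0
ZCE = 1              # hypothetical zero-cluster-entry encoding
MAY_DROP = 1 << 63   # hypothetical flag; image offsets never reach 2**63
OFFSET_MASK = ~MAY_DROP


def is_mapped(entry):
    return (entry & OFFSET_MASK) not in (UNALLOCATED, ZCE)


def trim(l2, idx):
    """TRIM or zero detection: keep the cluster number, set the flag."""
    if is_mapped(l2[idx]):
        l2[idx] |= MAY_DROP


def rewrite(l2, idx):
    """A later guest write to a mapped cluster clears the flag."""
    l2[idx] &= OFFSET_MASK
    return l2[idx]


def droppable(l2):
    """The implicit freelist: every (index, host offset) with the flag set."""
    return [(i, e & OFFSET_MASK) for i, e in enumerate(l2) if e & MAY_DROP]


def reclaim(l2, idx):
    """Reuse a flagged cluster elsewhere; the old entry becomes a ZCE."""
    if not l2[idx] & MAY_DROP:
        raise ValueError("cluster is not marked may-drop")
    host = l2[idx] & OFFSET_MASK
    l2[idx] = ZCE
    return host
```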


--
error compiling committee.c: too many arguments to function
