On Tue, Oct 12, 2010 at 11:16:17AM -0500, Anthony Liguori wrote: > On 10/12/2010 10:59 AM, Stefan Hajnoczi wrote: > >On Tue, Oct 12, 2010 at 05:39:48PM +0200, Kevin Wolf wrote: > >>Am 12.10.2010 17:22, schrieb Anthony Liguori: > >>>On 10/12/2010 10:08 AM, Kevin Wolf wrote: > >>>> Otherwise we might destroy data that isn't > >>>>even touched by the guest request in case of a crash. > >>>> > >>>The failure scenarios are either that the cluster is leaked in which > >>>case, the old version of the data is still present or the cluster is > >>>orphaned because the L2 entry is written, in which case the old version > >>>of the data is present. > >>Hm, how does the latter case work? Or rather, what do mean by "orphaned"? > >> > >>>Are you referring to a scenario where the cluster is partially written > >>>because the data is present in the write cache and the write cache isn't > >>>flushed on power failure? > >>The case I'm referring to is a COW. So let's assume a partial write to > >>an unallocated cluster, we then need to do a COW in pre/postfill. Then > >>we do a normal write and link the new cluster in the L2 table. > >> > >>Assume that the write to the L2 table is already on the disk, but the > >>pre/postfill data isn't yet. At this point we have a bad state because > >>if we crash now we have lost the data that should have been copied from > >>the backing file. > >In this case QED_F_NEED_CHECK is set and the invalid cluster offset > >should be reset to zero on open. > > > >However, I think we can get into a state where the pre/postfill data > >isn't on the disk yet but another allocation has increased the file > >size, making the unwritten cluster "valid". This fools consistency > >check into thinking the data cluster (which was never written to on > >disk) is valid. > > > >Will think about this more tonight. > > It's fairly simple to add a sync to this path. It's probably worth > checking the prefill/postfill for zeros and avoiding the write/sync > if that's the case. That should optimize the common cases of > allocating new space within a file. > > My intuition is that we can avoid the sync entirely but we'll need > to think about it further.
We can avoid it when a backing image is not used. Your idea to check for zeroes in the backing image is neat too, it may well reduce the common case even for backing images. Stefan