Am 23.04.2015 um 11:47 hat Denis V. Lunev geschrieben: > On 23/04/15 12:32, Kevin Wolf wrote: > >Am 23.04.2015 um 11:20 hat Stefan Hajnoczi geschrieben: > >>On Wed, Apr 22, 2015 at 04:16:38PM +0300, Denis V. Lunev wrote: > >>>On 22/04/15 16:08, Stefan Hajnoczi wrote: > >>>>On Wed, Mar 11, 2015 at 01:28:02PM +0300, Denis V. Lunev wrote: > >>>>>+static int64_t allocate_cluster(BlockDriverState *bs, int64_t > >>>>>sector_num) > >>>>>+{ > >>>>>+ BDRVParallelsState *s = bs->opaque; > >>>>>+ uint32_t idx, offset, tmp; > >>>>>+ int64_t pos; > >>>>>+ int ret; > >>>>>+ > >>>>>+ idx = sector_num / s->tracks; > >>>>>+ offset = sector_num % s->tracks; > >>>>>+ > >>>>>+ if (idx >= s->catalog_size) { > >>>>>+ return -EINVAL; > >>>>>+ } > >>>>>+ if (s->catalog_bitmap[idx] != 0) { > >>>>>+ return (uint64_t)s->catalog_bitmap[idx] * s->off_multiplier + > >>>>>offset; > >>>>>+ } > >>>>>+ > >>>>>+ pos = bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS; > >>>>>+ bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS); > >>>>>+ s->catalog_bitmap[idx] = pos / s->off_multiplier; > >>>>>+ > >>>>>+ tmp = cpu_to_le32(s->catalog_bitmap[idx]); > >>>>>+ > >>>>>+ ret = bdrv_pwrite_sync(bs->file, > >>>>>+ sizeof(ParallelsHeader) + idx * sizeof(tmp), &tmp, > >>>>>sizeof(tmp)); > >>>>What is the purpose of the sync? > >>>This is necessary to preserve image consistency on crash from > >>>my point of view. There is no check consistency at the moment. > >>>The sync will be removed later when proper crash detection > >>>code will be added (patches 19, 20, 21) > >>Let's look at possible orderings in case of failure: > >> > >>A. BAT update > >>B. Data write > >> > >>This sync enforces A, B ordering. If we can see B, then A must also > >>have happened thanks to the sync. > >> > >>But A, B ordering is too conservative. Imagine B, A ordering and the > >>failure where we crash before A. It means we wrote the data but never > >>linked it into the BAT. > >> > >>What happens in that case? We've leaked a cluster in the underlying > >>image file but it doesn't corrupt the visible disk from the guest > >>point-of-view. > >> > >>Because your implementation uses truncate to extend the file size before > >>A, even the A, B failure case results in a leaked cluster. So the B, A > >>case is not worse in any way. > >> > >>Why do other image formats sync cluster allocation updates? Because > >>they support backing files and in that case an A, B ordering results in > >>data corruption so they enforce B, A ordering (the opposite of what > >>you're trying to do!). > >> > >>The reason why A, B ordering results in data corruption when backing > >>files are in use is because the guest's write request might touch only a > >>subset of the cluster (a couple of sectors out of the whole cluster). > >>So the guest needs to copy the remaining sectors from the backing file. > >>If there is a dangling BAT entry like in the A, B failure case, then the > >>guest will see a zeroed cluster instead of the contents of the backing > >>file. This is a data corruption, but only if a backing file is being > >>used! > >> > >>So the sync is not necessary, both A, B and B, A ordering work for > >>block/parallels.c. > >Actually, I suspect this means that the parallels driver is restricted > >to protocols with bdrv_has_zero_init() == true, otherwise zeros can turn > >into random data (which means that it can't work e.g. directly on host > >block devices). > > > >Do we enforce this? > > > >Kevin > this is fixed in the patch 26 when the code is replaced with > > + if (s->data_end + s->tracks > pos) { > + int ret; > + if (s->prealloc_mode == PRL_PREALLOC_MODE_FALLOCATE) { > + ret = bdrv_write_zeroes(bs->file, s->data_end, > + s->prealloc_size, 0); > + } else { > + ret = bdrv_truncate(bs->file, > + (s->data_end + s->prealloc_size) << BDRV_SECTOR_BITS); > + } > + if (ret < 0) { > + return ret; > + } > + } > > on a default path, but you are correct. Some checking is > necessary to be on a safe side.
Ah, yes, you're right. That should fix it for the default settings. The check is only needed when using the bdrv_truncate() path. Kevin