On Mon, Jan 24, 2011 at 2:54 PM, Kevin Wolf <kw...@redhat.com> wrote: > [ Re-adding qemu-devel to CC ] > > Am 24.01.2011 15:34, schrieb Stefan Hajnoczi: >> On Thu, Jan 20, 2011 at 5:10 PM, Kevin Wolf <kw...@redhat.com> wrote: >>> @@ -702,17 +622,30 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, >>> QCowL2Meta *m) >>> >>> if (m->nb_available & (s->cluster_sectors - 1)) { >>> uint64_t end = m->nb_available & ~(uint64_t)(s->cluster_sectors - >>> 1); >>> + cow = true; >>> ret = copy_sectors(bs, start_sect + end, cluster_offset + (end << >>> 9), >>> m->nb_available - end, s->cluster_sectors); >>> if (ret < 0) >>> goto err; >>> } >>> >>> - /* update L2 table */ >>> + /* >>> + * Update L2 table. >>> + * >>> + * Before we update the L2 table to actually point to the new cluster, >>> we >>> + * need to be sure that the refcounts have been increased and COW was >>> + * handled. >>> + */ >>> + if (cow) { >>> + bdrv_flush(bs->file); >> >> Just bdrv_flush(bs->file) and not a refcounts cache flush? > > Refcounts and data need not to be ordered against each other. They only > must both be on disk when we write the L2 table.
Have I missed where refcounts actually get flushed from the cache out to the disk because bdrv_flush(bs->file) only syncs the file but doesn't write out dirty data held in cache. >>> + } >>> + >>> + qcow2_cache_set_dependency(bs, s->l2_table_cache, >>> s->refcount_block_cache); >>> ret = get_cluster_table(bs, m->offset, &l2_table, &l2_offset, >>> &l2_index); >>> if (ret < 0) { >>> goto err; >>> } >>> + qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table); >>> >>> for (i = 0; i < m->nb_clusters; i++) { >>> /* if two concurrent writes happen to the same unallocated cluster >>> @@ -728,16 +661,9 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, >>> QCowL2Meta *m) >>> (i << s->cluster_bits)) | QCOW_OFLAG_COPIED); >>> } >>> >>> - /* >>> - * Before we update the L2 table to actually point to the new cluster, >>> we >>> - * need to be sure that the refcounts have been increased and COW was >>> - * handled. >>> - */ >>> - bdrv_flush(bs->file); >>> >>> - ret = write_l2_entries(bs, l2_table, l2_offset, l2_index, >>> m->nb_clusters); >>> + ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table); >>> if (ret < 0) { >>> - qcow2_l2_cache_reset(bs); >>> goto err; >>> } >>> >> >> The function continues like this: >> >> /* >> * If this was a COW, we need to decrease the refcount of the old cluster. >> * Also flush bs->file to get the right order for L2 and refcount update. >> */ >> if (j != 0) { >> bdrv_flush(bs->file); >> for (i = 0; i < j; i++) { >> qcow2_free_any_clusters(bs, >> be64_to_cpu(old_cluster[i]) & ~QCOW_OFLAG_COPIED, 1); >> } >> } >> >> Does bdrv_flush(bs->file) "get the right order for L2 and refcount >> update"? We've just put an L2 table, should this be an L2 table >> flush? > > I agree, this looks wrong. What we need is a dependency to ensure that > first L2 is flushed and then the refcount block. > qcow2_free_any_clusters() calls update_refcount() indirectly, which > takes care of setting this dependency. > > So in the end I think it's just an unnecessary bdrv_flush. Makes sense? I don't understand this fully. I've noticed that the cache isn't the only mechanism for making changes to tables - there are functions like write_l2_entries() that directly write out parts of tables without honoring dependencies or using the dirty bit on the cache entry. I probably need to look at this more carefully to fully understand whether or not it is correct. Stefan