Am 17.02.2017 um 14:22 hat Denis V. Lunev geschrieben: > On 02/17/2017 03:48 PM, Kevin Wolf wrote: > > Am 17.02.2017 um 13:40 hat Vladimir Sementsov-Ogievskiy geschrieben: > >> 17.02.2017 15:09, Kevin Wolf wrote: > >>> Am 17.02.2017 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben: > >>>> 16.02.2017 14:49, Kevin Wolf wrote: > >>>>> Am 16.02.2017 um 12:25 hat Kevin Wolf geschrieben: > >>>>>> Am 15.02.2017 um 11:10 hat Vladimir Sementsov-Ogievskiy geschrieben: > >>>>>>> Auto loading bitmaps are bitmaps stored in the disk image, which > >>>>>>> should > >>>>>>> be loaded when the image is opened and become BdrvDirtyBitmaps for the > >>>>>>> corresponding drive. > >>>>>>> > >>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> > >>>>>>> Reviewed-by: John Snow <js...@redhat.com> > >>>>>>> Reviewed-by: Max Reitz <mre...@redhat.com> > >>>>>> Why do we need a new BlockDriver callback and special code for it in > >>>>>> bdrv_open_common()? The callback is only ever called immediately after > >>>>>> .bdrv_open/.bdrv_file_open, so can't the drivers just do this > >>>>>> internally > >>>>>> in their .bdrv_open implementation? Even more so because qcow2 is the > >>>>>> only driver that supports this callback. > >>>>> Actually, don't we have to call this in qcow2_invalidate_cache()? > >>>>> Currently, I think, after a migration, the autoload bitmaps aren't > >>>>> loaded. > >>>>> > >>>>> By moving the qcow2_load_autoloading_dirty_bitmaps() call to > >>>>> qcow2_open(), this would be fixed. > >>>>> > >>>>> Kevin > >>>> Bitmap should not be reloaded on any intermediate qcow2-open's, > >>>> reopens, etc. It should be loaded once, on bdrv_open, to not create > >>>> extra collisions (between in-memory bitmap and it's stored version). > >>>> That was the idea. > >>>> > >>>> For bitmaps migration there are separate series, we shouldn't load > >>>> bitmap from file on migration, as it's version in the file is > >>>> outdated. > >>> That's not what your series is doing, though. It loads the bitmaps when > >> Actually, they will not be loaded as they will have IN_USE flag. > >> > >>> migration starts and doesn't reload then when migration completes, even > >>> though they are stale. Migration with shared storage would just work > >>> without an extra series if you did these things in the correct places. > >>> > >>> As a reminder, this is how migration with shared storage works (or > >>> should work with your series): > >>> > >>> 1. Start destination qemu instance. This calls bdrv_open() with > >>> BDRV_O_INACTIVE. We can read in some metadata, though we don't need > >>> much more than the image size at this point. Writing to the image is > >>> still impossible. > >>> > >>> 2. Start migration on the source, while the VM is still writing to the > >>> image, rendering the cached metadata from step 1 stale. > >>> > >>> 3. Migration completes: > >>> > >>> a. Stop the VM > >>> > >>> b. Inactivate all images in the source qemu. This is where all > >>> metadata needs to be written back to the image file, including > >>> bitmaps. No writes to the image are possible after this point > >>> because BDRV_O_INACTIVE is set. > >>> > >>> c. Invalidate the caches in the destination qemu, i.e. reload > >>> everything from the file that could have changed since step 1, > >>> including bitmaps. BDRV_O_INACTIVE is cleared, making the image > >>> ready for writes. > >>> > >>> d. Resume the VM on the destination > >>> > >>> 4. Exit the source qemu process, which involves bdrv_close(). Note that > >>> at this point, no writing to the image file is possible any more, > >>> it's the destination qemu process that own the image file now. > >>> > >>> Your series loads and stores bitmaps in steps 1 and 4. This means that > >> Actually - not. in 1 bitmaps are "in use", in 4 INACTIVE is set (and > >> it is checked), nothing is stored. > >> > >>> they are stale on the destination when migration completes, and that > >>> bdrv_close() wants to write to an image file that it doesn't own any > >>> more, which will cause an assertion failure. If you instead move things > >>> to steps 3b and 3c, it will just work. > >> Hmm, I understand the idea.. But this will interfere with postcopy > >> bitmap migration. So if we really need this, there should be some > >> additional control flags or capabilities.. The problem of your > >> approach is that bitmap actually migrated in the short state when > >> source and destination are stopped, it may take time, as bitmaps may > >> be large. > > You can always add optimisations, but this is the basic lifecycle > > process of block devices in qemu, so it would be good to adhere to it. > > So far there are no other pieces of information that are ignored in > > bdrv_invalidate()/bdrv_inactivate() and instead only handled in > > bdrv_open()/bdrv_close(). It's a matter of consistency, too. > > > > And not having to add special cases for specific features in the generic > > bdrv_open()/close() paths is a big plus for me anyway. > > > > Kevin > But for sure this is bad from the downtime point of view. > On migrate you will have to write to the image and re-read > it again on the target. This would be very slow. This will > not help for the migration with non-shared disk too. > > That is why we have specifically worked in a migration, > which for a good does not influence downtime at all now. > > With a write we are issuing several write requests + sync. > Our measurements shows that bdrv_drain could take around > a second on an averagely loaded conventional system, which > seems unacceptable addition to me.
I'm not arguing against optimising migration, I fully agree with you. I just think that we should start with a correct if slow base version and then add optimisation to that, instead of starting with a broken base version and adding to that. Look, whether you do the expensive I/O on open/close and make that a slow operation or whether you do it on invalidate_cache/inactivate doesn't really make a difference in term of slowness because in general both operations are called exactly once. But it does make a difference in terms of correctness. Once you do the optimisation, of course, you'll skip writing those bitmaps that you transfer using a different channel, no matter whether you skip it in bdrv_close() or in bdrv_inactivate(). Kevin