26.06.2018 11:44, Vladimir Sementsov-Ogievskiy wrote:
> 25.06.2018 20:50, Dr. David Alan Gilbert wrote:
>> * Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:
>>> * Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:
>>>> 15.06.2018 15:06, Dr. David Alan Gilbert wrote:
>>>>> * Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:
>>>>>> Invalidate cache before source start in case of failed migration.
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
>>>>> Why doesn't the code at the bottom of migration_completion,
>>>>> fail_invalidate: and the code in migrate_fd_cancel handle this?
>>>>>
>>>>> What case did you see it in that those didn't handle?
>>>>> (Also I guess it probably should set s->block_inactive = false)
>>>> on source I see:
>>>>
>>>> 81392@1529065750.766289:migrate_set_state new state 7
>>>> 81392@1529065750.766330:migration_thread_file_err
>>>> 81392@1529065750.766332:migration_thread_after_loop
>>>>
>>>> so, we are leaving the loop on
>>>>
>>>>     if (qemu_file_get_error(s->to_dst_file)) {
>>>>         migrate_set_state(&s->state, current_active_state,
>>>>                           MIGRATION_STATUS_FAILED);
>>>>         trace_migration_thread_file_err();
>>>>         break;
>>>>     }
>>>>
>>>> and skip migration_completion()
>
> John is right, this is an unrelated log; here we fail before
> inactivation and there are no problems.
>
> Actual problem is when we fail in postcopy_start, at the end.
> And the source log looks like:
>
> 84297@1530001796.287344:migrate_set_state new state 1
> 84297@1530001796.287374:migration_fd_outgoing fd=101
> 84297@1530001796.287383:migration_set_outgoing_channel ioc=0x56363454d630 ioctype=qio-channel-socket hostname=(null)
> 84297@1530001796.294032:migration_bitmap_sync_start
> 84297@1530001796.300483:migration_bitmap_sync_end dirty_pages 932
> 84297@1530001796.300561:migrate_set_state new state 4
> 84297@1530001796.300588:migration_thread_setup_complete
> 84297@1530001796.300593:migrate_pending pending size 1107976192 max 0 (pre = 0 compat=1107976192 post=0)
> 84297@1530001796.300595:migrate_set_state new state 5
> Tap fd 33 disable, ret 0
> 84297@1530001796.426477:migration_bitmap_sync_start
> 84297@1530001796.433227:migration_bitmap_sync_end dirty_pages 1091
> 84297@1530001796.439077:migrate_global_state_pre_save saved state: running
> 2018-06-26T08:29:56.439134Z qemu-kvm: postcopy_start: Migration stream errored -5
> 84297@1530001796.439141:migrate_set_state new state 7
> 84297@1530001796.439181:migration_thread_after_loop
> Tap fd 33 enable, ret 0
> 84297@1530001796.453639:migrate_fd_cleanup
> qemu-kvm: block/io.c:1655: bdrv_co_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed.
> 2018-06-26 08:29:56.605+0000: shutting down, reason=crashed
>
>
>>> Yeh, OK; I'd seen something else a few days ago, where a cancellation
>>> test that had previously ended with a 'cancelled' state has now ended up
>>> in 'failed' (which is the state 7 you have above).
>>> I suspect there's something else going on as well; I think what is
>>> supposed to happen in the case of 'cancel' is that it spins in
>>> 'cancelling' for a while in migrate_fd_cancel and then at the bottom
>>> of migrate_fd_cancel it does the recovery, but because it's going to
>>> failed instead, it's jumping over that recovery.
>> Going back and actually looking at the patch again,
>> can I ask for one small change:
>> can you set s->block_inactive = false in the case where you
>> don't get the local_err (like we do at the bottom of migrate_fd_cancel)?
>>
>> Does that make sense?
>
> Ok, I'll resend.
>
> Hm, it looks like I'm fixing an outdated version (based on v2.9.0), and
> my reproducer isn't appropriate for upstream.
> But it looks like the current code has a possibility of the same failure:
>
> postcopy_start()
> ....
>     ret = qemu_file_get_error(ms->to_dst_file);
>     if (ret) {
>         error_report("postcopy_start: Migration stream errored");
>
> leads to "return MIG_ITERATE_SKIP;" in migration_iteration_run.
> Then the loop should finish, as the state should be
> MIGRATION_STATUS_FAILED, so we will not call migration_completion().
>
> Hm, I have questions now:
>
> 1. Should we check s->block_inactive, and if it is false, not
> invalidate? It is done in migrate_fd_cancel(), but not done in
> migration_completion().
> 2. Should we call qemu_mutex_lock_iothread() like in
> migration_completion()? Why is it needed in migration_completion()
> when the vm is not running?
Hm, forgotten thread. I should resend, but what do you think about these questions?

>
>> Thanks,
>>
>> Dave
>>
>>> Dave
>>>
>>>>> Dave
>>>>>
>>>>>> ---
>>>>>>
>>>>>>   migration/migration.c | 9 ++++++++-
>>>>>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>>> index 1e99ec9b7e..8f39e0dc02 100644
>>>>>> --- a/migration/migration.c
>>>>>> +++ b/migration/migration.c
>>>>>> @@ -2806,7 +2806,14 @@ static void migration_iteration_finish(MigrationState *s)
>>>>>>       case MIGRATION_STATUS_FAILED:
>>>>>>       case MIGRATION_STATUS_CANCELLED:
>>>>>>           if (s->vm_was_running) {
>>>>>> -            vm_start();
>>>>>> +            Error *local_err = NULL;
>>>>>> +            bdrv_invalidate_cache_all(&local_err);
>>>>>> +            if (local_err) {
>>>>>> +                error_reportf_err(local_err, "Can't invalidate disks before "
>>>>>> +                                  "source vm start");
>>>>>> +            } else {
>>>>>> +                vm_start();
>>>>>> +            }
>>>>>>           } else {
>>>>>>               if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
>>>>>>                   runstate_set(RUN_STATE_POSTMIGRATE);
>>>>>> --
>>>>>> 2.11.1
>>>>>>
>>>>> --
>>>>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>>>>
>>>> --
>>>> Best regards,
>>>> Vladimir
>>>
>>> --
>>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>> --
>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

--
Best regards,
Vladimir