Re: [Qemu-block] [PATCH] qemu-iotests: fix 203 migration completion race
On 2018-03-06 17:18, Stefan Hajnoczi wrote: > On Mon, Mar 05, 2018 at 05:04:52PM +0100, Max Reitz wrote: >> On 2018-03-05 16:59, Stefan Hajnoczi wrote: >>> There is a race between the test's 'query-migrate' QMP command after the >>> QMP 'STOP' event and completing the migration: >>> >>> The test case invokes 'query-migrate' upon receiving 'STOP'. At this >>> point the migration thread may still be in the process of completing. >>> Therefore 'query-migrate' can return 'status': 'active' for a brief >>> window of time instead of 'status': 'completed'. This results in >>> qemu-iotests 203 hanging. >>> >>> Solve the race by enabling the 'events' migration capability, which >>> causes QEMU to emit migration-specific QMP events that do not suffer >>> from this race condition. Wait for the QMP 'MIGRATION' event with >>> 'status': 'completed'. >>> >>> Reported-by: Max Reitz>>> Signed-off-by: Stefan Hajnoczi >>> --- >>> tests/qemu-iotests/203 | 15 +++ >>> tests/qemu-iotests/203.out | 5 + >>> 2 files changed, 16 insertions(+), 4 deletions(-) >> >> So much for "the ppoll() dungeon"... > > It was still a pain to debug :). > > I put a ring buffer into the QMP monitor input/output code. Oh, wow. > Then it was > possible to figure out the issue via GDB on a hung QEMU: > > (gdb) p current_run_state > RUN_STATE_POSTMIGRATE > (gdb) p current_migration->status > MIGRATION_STATUS_COMPLETED > (gdb) p monitor_out_ring > ...'STOP' event... > (gdb) p monitor_in_ring > ...query-migrate... <-- okay, the test checked if migration finished > > Then looking at the code: > > static void migration_completion(MigrationState *s) > { > ... > if (s->state == MIGRATION_STATUS_ACTIVE) { > qemu_mutex_lock_iothread(); > s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); > qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER); > s->vm_was_running = runstate_is_running(); > ret = global_state_store(); > > if (!ret) { > bool inactivate = !migrate_colo_enabled(); > > v The stop event comes from here > ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE); > ... > } > qemu_mutex_unlock_iothread(); <--- oh, no! > ... > if (!migrate_colo_enabled()) { > migrate_set_state(>state, current_active_state, > MIGRATION_STATUS_COMPLETED); <-- too late! > } > > return; OK... I guess the answer to this just is "the stop event doesn't mean anything, use the migration events instead" (i.e. what your patch does). Thanks a lot, applied to my block branch: https://github.com/XanClic/qemu/commits/block Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH] qemu-iotests: fix 203 migration completion race
On Mon, Mar 05, 2018 at 05:04:52PM +0100, Max Reitz wrote: > On 2018-03-05 16:59, Stefan Hajnoczi wrote: > > There is a race between the test's 'query-migrate' QMP command after the > > QMP 'STOP' event and completing the migration: > > > > The test case invokes 'query-migrate' upon receiving 'STOP'. At this > > point the migration thread may still be in the process of completing. > > Therefore 'query-migrate' can return 'status': 'active' for a brief > > window of time instead of 'status': 'completed'. This results in > > qemu-iotests 203 hanging. > > > > Solve the race by enabling the 'events' migration capability, which > > causes QEMU to emit migration-specific QMP events that do not suffer > > from this race condition. Wait for the QMP 'MIGRATION' event with > > 'status': 'completed'. > > > > Reported-by: Max Reitz> > Signed-off-by: Stefan Hajnoczi > > --- > > tests/qemu-iotests/203 | 15 +++ > > tests/qemu-iotests/203.out | 5 + > > 2 files changed, 16 insertions(+), 4 deletions(-) > > So much for "the ppoll() dungeon"... It was still a pain to debug :). I put a ring buffer into the QMP monitor input/output code. Then it was possible to figure out the issue via GDB on a hung QEMU: (gdb) p current_run_state RUN_STATE_POSTMIGRATE (gdb) p current_migration->status MIGRATION_STATUS_COMPLETED (gdb) p monitor_out_ring ...'STOP' event... (gdb) p monitor_in_ring ...query-migrate... <-- okay, the test checked if migration finished Then looking at the code: static void migration_completion(MigrationState *s) { ... if (s->state == MIGRATION_STATUS_ACTIVE) { qemu_mutex_lock_iothread(); s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER); s->vm_was_running = runstate_is_running(); ret = global_state_store(); if (!ret) { bool inactivate = !migrate_colo_enabled(); v The stop event comes from here ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE); ... } qemu_mutex_unlock_iothread(); <--- oh, no! ... if (!migrate_colo_enabled()) { migrate_set_state(>state, current_active_state, MIGRATION_STATUS_COMPLETED); <-- too late! } return; signature.asc Description: PGP signature
Re: [Qemu-block] [PATCH] qemu-iotests: fix 203 migration completion race
On 2018-03-05 16:59, Stefan Hajnoczi wrote: > There is a race between the test's 'query-migrate' QMP command after the > QMP 'STOP' event and completing the migration: > > The test case invokes 'query-migrate' upon receiving 'STOP'. At this > point the migration thread may still be in the process of completing. > Therefore 'query-migrate' can return 'status': 'active' for a brief > window of time instead of 'status': 'completed'. This results in > qemu-iotests 203 hanging. > > Solve the race by enabling the 'events' migration capability, which > causes QEMU to emit migration-specific QMP events that do not suffer > from this race condition. Wait for the QMP 'MIGRATION' event with > 'status': 'completed'. > > Reported-by: Max Reitz> Signed-off-by: Stefan Hajnoczi > --- > tests/qemu-iotests/203 | 15 +++ > tests/qemu-iotests/203.out | 5 + > 2 files changed, 16 insertions(+), 4 deletions(-) So much for "the ppoll() dungeon"... Thanks! Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
[Qemu-block] [PATCH] qemu-iotests: fix 203 migration completion race
There is a race between the test's 'query-migrate' QMP command after the QMP 'STOP' event and completing the migration: The test case invokes 'query-migrate' upon receiving 'STOP'. At this point the migration thread may still be in the process of completing. Therefore 'query-migrate' can return 'status': 'active' for a brief window of time instead of 'status': 'completed'. This results in qemu-iotests 203 hanging. Solve the race by enabling the 'events' migration capability, which causes QEMU to emit migration-specific QMP events that do not suffer from this race condition. Wait for the QMP 'MIGRATION' event with 'status': 'completed'. Reported-by: Max ReitzSigned-off-by: Stefan Hajnoczi --- tests/qemu-iotests/203 | 15 +++ tests/qemu-iotests/203.out | 5 + 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/tests/qemu-iotests/203 b/tests/qemu-iotests/203 index 2c811917d8..4874a1a0d8 100755 --- a/tests/qemu-iotests/203 +++ b/tests/qemu-iotests/203 @@ -49,11 +49,18 @@ with iotests.FilePath('disk0.img') as disk0_img_path, \ node_name='drive1-node', iothread='iothread0', force=True)) +iotests.log('Enabling migration QMP events...') +iotests.log(vm.qmp('migrate-set-capabilities', capabilities=[ +{ +'capability': 'events', +'state': True +} +])) + iotests.log('Starting migration...') iotests.log(vm.qmp('migrate', uri='exec:cat >/dev/null')) while True: -vm.get_qmp_event(wait=60.0) -result = vm.qmp('query-migrate') -status = result.get('return', {}).get('status', None) -if status == 'completed': +event = vm.event_wait('MIGRATION') +iotests.log(event, filters=[iotests.filter_qmp_event]) +if event['data']['status'] == 'completed': break diff --git a/tests/qemu-iotests/203.out b/tests/qemu-iotests/203.out index 3f1ff900e4..1a11f0975c 100644 --- a/tests/qemu-iotests/203.out +++ b/tests/qemu-iotests/203.out @@ -2,5 +2,10 @@ Launching VM... Setting IOThreads... {u'return': {}} {u'return': {}} +Enabling migration QMP events... +{u'return': {}} Starting migration... {u'return': {}} +{u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'setup'}, u'event': u'MIGRATION'} +{u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'active'}, u'event': u'MIGRATION'} +{u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'completed'}, u'event': u'MIGRATION'} -- 2.14.3