[PATCH] cpus: avoid stucking in pause_all_vcpus due to race

2020-03-16 Thread Longpeng(Mike)
From: Longpeng 

We found an issue when repeat reboot in guest during migration, it cause the
migration thread never be waken up again.

|
   |
LOCK BQL   |
...|
main_loop_should_exit  |
 pause_all_vcpus   |
  1. set all cpus ->stop=true  |
 and then kick |
  2. return if all cpus is paused  |
 (by '->stopped == true'), else|
  3. qemu_cond_wait [BQL UNLOCK]   |
   |LOCK BQL
   |...
   |do_vm_stop
   | pause_all_vcpus
   |  (A)set all cpus ->stop=true
   | and then kick
   |  (B)return if all cpus is paused
   | (by '->stopped == true'), else
   |  (C)qemu_cond_wait [BQL UNLOCK]
  4. be waken up and LOCK BQL  |  (D)be waken up BUT wait for  BQL
  5. goto 2.   |
 (BQL is still LOCKed) |
 resume_all_vcpus  |
  1. set all cpus ->stop=false |
 and ->stopped=false   |
...|
BQL UNLOCK |  (E)LOCK BQL
   |  (F)goto B. [but stopped is false now!]
   |Finally, sleep at step 3 forever.

As suggested by Paolo, resume_all_vcpus should notice this race, so we need
to move the change of runstate before pause_all_vcpus in do_vm_stop() and
ignore the resume request if runstate is not running.

Cc: Paolo Bonzini 
Cc: Dr . David Alan Gilbert 
Cc: Richard Henderson 
Signed-off-by: Longpeng 
---
 cpus.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index b4f8b84..ef441bd 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1026,9 +1026,9 @@ static int do_vm_stop(RunState state, bool send_stop)
 int ret = 0;
 
 if (runstate_is_running()) {
+runstate_set(state);
 cpu_disable_ticks();
 pause_all_vcpus();
-runstate_set(state);
 vm_state_notify(0, state);
 if (send_stop) {
 qapi_event_send_stop();
@@ -1899,6 +1899,10 @@ void resume_all_vcpus(void)
 {
 CPUState *cpu;
 
+if (!runstate_is_running()) {
+return;
+}
+
 qemu_clock_enable(QEMU_CLOCK_VIRTUAL, true);
 CPU_FOREACH(cpu) {
 cpu_resume(cpu);
-- 
1.8.3.1




Re: [PATCH] cpus: avoid stucking in pause_all_vcpus due to race

2020-03-16 Thread Paolo Bonzini
On 16/03/20 09:37, Longpeng(Mike) wrote:
> From: Longpeng 
> 
> We found an issue when repeat reboot in guest during migration, it cause the
> migration thread never be waken up again.
> 
> |
>|
> LOCK BQL   |
> ...|
> main_loop_should_exit  |
>  pause_all_vcpus   |
>   1. set all cpus ->stop=true  |
>  and then kick |
>   2. return if all cpus is paused  |
>  (by '->stopped == true'), else|
>   3. qemu_cond_wait [BQL UNLOCK]   |
>|LOCK BQL
>|...
>|do_vm_stop
>| pause_all_vcpus
>|  (A)set all cpus ->stop=true
>| and then kick
>|  (B)return if all cpus is paused
>| (by '->stopped == true'), else
>|  (C)qemu_cond_wait [BQL UNLOCK]
>   4. be waken up and LOCK BQL  |  (D)be waken up BUT wait for  BQL
>   5. goto 2.   |
>  (BQL is still LOCKed) |
>  resume_all_vcpus  |
>   1. set all cpus ->stop=false |
>  and ->stopped=false   |
> ...|
> BQL UNLOCK |  (E)LOCK BQL
>|  (F)goto B. [but stopped is false now!]
>|Finally, sleep at step 3 forever.
> 
> As suggested by Paolo, resume_all_vcpus should notice this race, so we need
> to move the change of runstate before pause_all_vcpus in do_vm_stop() and
> ignore the resume request if runstate is not running.
> 
> Cc: Paolo Bonzini 
> Cc: Dr . David Alan Gilbert 
> Cc: Richard Henderson 
> Signed-off-by: Longpeng 

Queued, thanks!

Paolo

> ---
>  cpus.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/cpus.c b/cpus.c
> index b4f8b84..ef441bd 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1026,9 +1026,9 @@ static int do_vm_stop(RunState state, bool send_stop)
>  int ret = 0;
>  
>  if (runstate_is_running()) {
> +runstate_set(state);
>  cpu_disable_ticks();
>  pause_all_vcpus();
> -runstate_set(state);
>  vm_state_notify(0, state);
>  if (send_stop) {
>  qapi_event_send_stop();
> @@ -1899,6 +1899,10 @@ void resume_all_vcpus(void)
>  {
>  CPUState *cpu;
>  
> +if (!runstate_is_running()) {
> +return;
> +}
> +
>  qemu_clock_enable(QEMU_CLOCK_VIRTUAL, true);
>  CPU_FOREACH(cpu) {
>  cpu_resume(cpu);
>