Re: [Qemu-devel] [PATCH COLO-Frame v11 25/39] COLO: implement default failover treatment

Dr. David Alan Gilbert Thu, 10 Dec 2015 11:03:12 -0800

* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> If we detect an error in COLO, we wait for some time, hoping that
> users also detect it. If users don't issue a failover command in that
> time, we go into the default failover procedure, in which the PVM takes
> over the work while the SVM exits by default.


I'm not sure this is needed; especially on the SVM.  I don't see any harm
in the SVM waiting forever to be told what to do - it could be told to
failover or quit; I don't see any benefit to it automatically exiting.

On the primary, I can understand it if you don't have some automated error
detection system (but I think that's rare); but you really would want to
make that failover delay configurable so that you could turn it off in a
system that does have failure detection, because automatically restarting
the primary after it had caused a failover to the secondary would be very bad.
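
To make the suggestion concrete, here is a rough sketch of a wait loop with a
configurable delay, where a delay of 0 means "never apply the default, keep
waiting to be told what to do". It is not taken from the patch: the names
FailoverHooks, FailoverDecision, wait_for_failover_request and the SimEnv
helpers are hypothetical, and the hooks stand in for whatever the real code
would use to check for a failover request and read the clock. The simulated
environment only exists so the loop can be tried without real time passing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of waiting after an error was detected. */
typedef enum {
    FAILOVER_REQUESTED, /* a failover request arrived while we waited */
    FAILOVER_DEFAULT,   /* the delay expired with no request */
} FailoverDecision;

/* Hypothetical hooks, so the sketch does not depend on QEMU internals. */
typedef struct {
    bool    (*request_is_active)(void *opaque);
    int64_t (*now_ms)(void *opaque);
    void    (*sleep_ms)(void *opaque, int64_t ms);
    void    *opaque;
} FailoverHooks;

/*
 * Poll for a failover request every poll_ms milliseconds.
 * delay_ms > 0: give up after delay_ms and report FAILOVER_DEFAULT.
 * delay_ms == 0: default failover is disabled; wait until a request arrives.
 */
static FailoverDecision wait_for_failover_request(const FailoverHooks *h,
                                                  int64_t delay_ms,
                                                  int64_t poll_ms)
{
    assert(poll_ms > 0);
    int64_t start = h->now_ms(h->opaque);

    for (;;) {
        if (h->request_is_active(h->opaque)) {
            return FAILOVER_REQUESTED;
        }
        if (delay_ms > 0 && h->now_ms(h->opaque) - start >= delay_ms) {
            return FAILOVER_DEFAULT;
        }
        h->sleep_ms(h->opaque, poll_ms);
    }
}

/* Simulated environment: a fake clock that only advances when we "sleep". */
typedef struct {
    int64_t now;        /* fake clock, in ms */
    int64_t request_at; /* time at which a request appears; < 0 means never */
} SimEnv;

static bool sim_request_is_active(void *opaque)
{
    SimEnv *e = opaque;
    return e->request_at >= 0 && e->now >= e->request_at;
}

static int64_t sim_now_ms(void *opaque)
{
    SimEnv *e = opaque;
    return e->now;
}

static void sim_sleep_ms(void *opaque, int64_t ms)
{
    SimEnv *e = opaque;
    e->now += ms;
}
```

With delay_ms set to 0 the caller never gets FAILOVER_DEFAULT, which is the
behaviour you would want on a system that has its own failure detection.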

Dave

> 
> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
> Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
> ---
>  migration/colo.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index f31e957..1e6d3dd 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -19,6 +19,14 @@
>  #include "qemu/sockets.h"
>  #include "migration/failover.h"
>  
> +/*
> + * The delay time before qemu begin the procedure of default failover treatment.
> + * Unit: ms
> + * Fix me: This value should be able to change by command
> + * 'migrate-set-parameters'
> + */
> +#define DEFAULT_FAILOVER_DELAY 2000
> +
>  /* colo buffer */
>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>  
> @@ -264,6 +272,7 @@ static void colo_process_checkpoint(MigrationState *s)
>  {
>      QEMUSizedBuffer *buffer = NULL;
>      int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +    int64_t error_time;
>      int ret = 0;
>      uint64_t value;
>  
> @@ -322,8 +331,25 @@ static void colo_process_checkpoint(MigrationState *s)
>      }
>  
>  out:
> +    current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      if (ret < 0) {
>          error_report("%s: %s", __func__, strerror(-ret));
> +        /* Give users time to get involved in this verdict */
> +        while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
> +            if (failover_request_is_active()) {
> +                error_report("Primary VM will take over work");
> +                break;
> +            }
> +            usleep(100 * 1000);
> +            current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +        }
> +
> +        qemu_mutex_lock_iothread();
> +        if (!failover_request_is_active()) {
> +            error_report("Primary VM will take over work in default");
> +            failover_request_active(NULL);
> +        }
> +        qemu_mutex_unlock_iothread();
>      }
>  
>      qsb_free(buffer);
> @@ -384,6 +410,7 @@ void *colo_process_incoming_thread(void *opaque)
>      QEMUFile *fb = NULL;
>      QEMUSizedBuffer *buffer = NULL; /* Cache incoming device state */
>      uint64_t  total_size;
> +    int64_t error_time, current_time;
>      int ret = 0;
>      uint64_t value;
>  
> @@ -499,9 +526,28 @@ void *colo_process_incoming_thread(void *opaque)
>      }
>  
>  out:
> +    current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      if (ret < 0) {
>          error_report("colo incoming thread will exit, detect error: %s",
>                       strerror(-ret));
> +        /* Give users time to get involved in this verdict */
> +        while (current_time - error_time <= DEFAULT_FAILOVER_DELAY) {
> +            if (failover_request_is_active()) {
> +                error_report("Secondary VM will take over work");
> +                break;
> +            }
> +            usleep(100 * 1000);
> +            current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +        }
> +        /* check flag again*/
> +        if (!failover_request_is_active()) {
> +            /*
> +            * We assume that Primary VM is still alive according to
> +            * heartbeat, just kill Secondary VM
> +            */
> +            error_report("SVM is going to exit in default!");
> +            exit(1);
> +        }
>      }
>  
>      if (fb) {
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
