See
https://docs.openstack.org/nova/latest/contributor/process.html#overview
for details about our blueprint process.

** Changed in: nova
       Status: New => Won't Fix

** Changed in: nova
   Importance: Undecided => Wishlist

** Changed in: nova
       Status: Won't Fix => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1745073

Title:
  Nova Compute unintentionally stopped to monitor live migration

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===========

  There is a case where nova-compute unintentionally stops monitoring a live migration even though the live migration operation thread (_live_migration_operation) is still running.
  This causes nova-compute to report "migration succeeded" to the Nova conductor, and a nova-compute periodic task then tries to delete everything under /var/lib/nova/instances/<instance-id>, because from Nova's point of view the live migration succeeded.
  This can break the live migration itself, and it also misleads the operator about the real status of the operation.

  "So it would be better if, at the very least, nova-compute kept monitoring
  the live migration while the _live_migration_operation thread is running."

  The case above does not normally happen: as long as libvirtd correctly maintains the domain job information and cleans it up after the job completes, it does not matter that nova-compute never checks whether the live migration operation thread has finished.
  But if libvirtd fails to maintain the domain job information correctly, or something goes wrong in the cleanup phase, nova-compute can mistake the live migration for a successful one even though it is obviously still in progress, because the _live_migration_operation thread is still running.
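  The interaction described above can be pictured with a small, hypothetical sketch (plain Python threads standing in for nova's greenthreads; none of these names are nova's actual code). If the monitor trusts the hypervisor's job state alone, a stale VIR_DOMAIN_JOB_NONE report makes it declare success while the operation thread is still blocked:

```python
import threading
import time

JOB_NONE = "VIR_DOMAIN_JOB_NONE"  # stand-in for libvirt's job-type constant

finish_event = threading.Event()

def live_migration_operation():
    # Stand-in for _live_migration_operation: behaves as if stuck in
    # virDomainMigrateToURI3 and only signals completion much later.
    time.sleep(5)
    finish_event.set()

def buggy_monitor(get_job_type):
    # Trusts the job state alone: returns "completed" as soon as the job
    # type reads JOB_NONE, without asking whether the operation thread
    # has actually returned.
    while get_job_type() != JOB_NONE:
        time.sleep(0.1)
    return "completed"

op = threading.Thread(target=live_migration_operation, daemon=True)
op.start()

# A buggy libvirtd immediately reports JOB_NONE for the domain.
status = buggy_monitor(lambda: JOB_NONE)
print(status, "- operation thread still alive:", op.is_alive())
```

  The monitor reports "completed" while the operation thread is still alive, which is exactly the mismatch the database and the instance directory cleanup then act on.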

  We could treat this as purely a libvirtd problem that Nova does not have to care about.
  But I think it is better to implement a safer approach if we can. I actually hit this situation with libvirtd 3.2.0, and it took some time to notice from the logs and from the migration status in the database that the live migration operation thread never finished.

  More specifically, I think finish_event should always be checked here, not
  only when the job type is VIR_DOMAIN_JOB_NONE:
  
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L6871
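  A hedged sketch of that idea (again plain Python threads and made-up names, not nova's actual code): treat JOB_NONE as completion only once finish_event confirms the operation thread has returned.

```python
import threading
import time

JOB_NONE = "VIR_DOMAIN_JOB_NONE"  # stand-in for libvirt's job-type constant

def safer_monitor(get_job_type, finish_event, poll=0.05):
    # Declare the migration complete only when the job info reads JOB_NONE
    # *and* the operation thread has signalled finish_event; JOB_NONE
    # without finish_event means the job info is stale, so keep monitoring.
    while True:
        if get_job_type() == JOB_NONE and finish_event.is_set():
            return "completed"
        time.sleep(poll)

finish_event = threading.Event()

def operation():
    # The migration call is still in flight for a while before it returns.
    time.sleep(0.3)
    finish_event.set()

t = threading.Thread(target=operation)
t.start()
# Even though the job type reads JOB_NONE from the start, the monitor
# waits for finish_event before reporting success.
status = safer_monitor(lambda: JOB_NONE, finish_event)
t.join()
print(status)
```

  With this guard the monitor keeps running until the operation thread really returns, instead of bailing out on the first stale JOB_NONE report.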

  The libvirtd-side problem is already fixed by
  https://www.redhat.com/archives/libvir-list/2017-April/msg00387.html
  and the fix is included in 3.3.0, but I still think nova-compute should
  change its behaviour to guard against similar problems in the future.

  Steps to reproduce
  ==================

  * *Use libvirtd-3.2.0, which has a bug related to live migration*
     -> This version of libvirtd would often (not always) block forever in the virDomainMigrateToURI3 call, leaving the _live_migration_operation thread running forever

  * Create a test VM with a swap disk

     ```
     $ openstack flavor create --ram 1024 --disk 20 --swap 4048 --vcpus 1 test
     +----------------------------+--------------------------------------+
     | Field                      | Value                                |
     +----------------------------+--------------------------------------+
     | disk                       | 20                                   |
     | id                         | d4e400a7-fd10-4c18-9dbc-f89f24e668af |
     | name                       | test                                 |
     | os-flavor-access:is_public | True                                 |
     | ram                        | 1024                                 |
     | rxtx_factor                | 1.0                                  |
     | swap                       | 4048                                 |
     | vcpus                      | 1                                    |
     +----------------------------+--------------------------------------+
     ```
     
     ```
     $ openstack server create --flavor test --image <something image> --nic net-id=<something network> test_server
     ```

  
  * Block-live-migrate the test VM from HV1 to HV2

     ```
     $ nova live-migration --block-migrate test_server HV2
     ```

  * Check migration status

     ```
     $ nova migration-list
     +-----+-------------+-----------+----------------+--------------+-----------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
     | Id  | Source Node | Dest Node | Source Compute | Dest Compute | Dest Host | Status    | Instance UUID                        | Old Flavor | New Flavor | Created At                 | Updated At                 | Type           |
     +-----+-------------+-----------+----------------+--------------+-----------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
     | 1   | -           | -         | HV1            | HV2          | -         | completed | e484eb18-2794-4651-a357-d2070940ed32 | 6          | 6          | 2018-01-09T03:02:10.000000 | 2018-01-09T03:02:20.000000 | live-migration |
     +-----+-------------+-----------+----------------+--------------+-----------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
     ```

  * Check vm status

  ```
  $ nova list
  +--------------------------------------+-------------+--------+------------+-------------+--------------------+
  | ID                                   | Name        | Status | Task State | Power State | Networks           |
  +--------------------------------------+-------------+--------+------------+-------------+--------------------+
  | a221c19b-4d4e-46d4-8888-10c14ca0fe27 | test_server | ACTIVE | -          | Paused      | net1=192.168.11.11 |
  +--------------------------------------+-------------+--------+------------+-------------+--------------------+
  ```

  
  Expected result
  ===============

  On the host running nova-api/nova-conductor:
   * the migration status should not change to "completed" until the _live_migration_operation thread finishes
   * the VM status should not change to "ACTIVE" until the _live_migration_operation thread finishes

  On the host running nova-compute:
   * monitoring of the live migration should continue while the _live_migration_operation thread is running

  
  Actual result
  =============

  On the host running nova-api:
  * the migration status was changed to "completed" in the nova database (checked with nova migration-list)
  * the VM status was changed to "ACTIVE" (checked with nova list)

  On the host running nova-compute:
  * live migration monitoring stopped although the _live_migration_operation thread was still running (checked by the log message "Live migration monitoring is all done")

  
  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following
    list for all releases: http://docs.openstack.org/releases/
     * 13.1.0-1.el7 (Centos7)
                        
  2. Which hypervisor did you use?
     * libvirt + KVM 
         * libvirt-daemon: 3.2.0-14.el7_4.7 
         * qemu-kvm: 2.6.0-28.el7.10.1

  3. Which storage type did you use?
     * local storage (just ephemeral disk)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1745073/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp