Public bug reported:

When live migrating an instance, it is supposed to retry some
(configurable) number of times. It only retries if the host
compatibility and migration pre-checks raise nova.exception.Invalid,
though:

https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L167-L174

If, for instance, a destination hypervisor has run out of disk space it
will not raise an Invalid subclass, but rather MigrationPreCheckError,
which causes the retry loop to short-circuit. Nova should instead retry
as long as either Invalid or MigrationPreCheckError is raised.

This can be tricky to reproduce because it only occurs if a host raises
MigrationPreCheckError before a valid host is found, so it's dependent
upon the order in which the scheduler supplies possible destinations to
the conductor. In theory, though, it can be reproduced by bringing up a
number of hypervisors, exhausting the disk on one -- ideally the one
that the scheduler will return first -- and then attempting a live
migration. It will fail with something like:

  $ nova live-migration --block-migrate stpierre-test-1
  ERROR (BadRequest): Migration pre-check error: Unable to migrate f44296dd-ffa6-4ec0-8256-c311d025d46c: Disk of instance is too large(available on destination host:-38654705664 < need:1073741824) (HTTP 400) (Request-ID: req-9951691a-c63c-4888-bec5-30a072dfe727)

Even when there are valid hosts to migrate to.

** Affects: nova
     Importance: Undecided
     Assignee: Chris St. Pierre (stpierre)
         Status: In Progress

https://bugs.launchpad.net/bugs/1480441

Title:
  Live migration doesn't retry on migration pre-check failure

Status in OpenStack Compute (nova):
  In Progress
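
For illustration, the behaviour requested in the report can be sketched as a small standalone retry loop. This is not nova's code: the exception classes are local stand-ins that only borrow the names Invalid and MigrationPreCheckError, and find_destination, check_host, NoValidHost, the max_retries parameter and the host names hv1/hv2 are all hypothetical. The one point it is meant to show is that the except clause names both exception types, so a pre-check failure on the first candidate moves on to the next host instead of aborting the migration.

```python
class Invalid(Exception):
    """Local stand-in for nova.exception.Invalid (not the real class)."""


class MigrationPreCheckError(Exception):
    """Local stand-in; deliberately not a subclass of Invalid."""


class NoValidHost(Exception):
    """Hypothetical error raised when no candidate host is accepted."""


def find_destination(candidates, check_host, max_retries):
    """Return the first host in `candidates` that passes `check_host`.

    A host is skipped when check_host raises Invalid or
    MigrationPreCheckError. The search gives up once more than
    `max_retries` hosts have been rejected.
    """
    attempted = []
    for host in candidates:
        if len(attempted) > max_retries:
            break
        try:
            check_host(host)
        except (Invalid, MigrationPreCheckError):
            # Both exception types are treated as "try the next host".
            attempted.append(host)
            continue
        return host
    raise NoValidHost("rejected hosts: %s" % ", ".join(attempted))


def check_host(host):
    # Hypothetical pre-check: pretend hv1 has run out of disk space.
    if host == "hv1":
        raise MigrationPreCheckError("Disk of instance is too large")


chosen = find_destination(["hv1", "hv2"], check_host, max_retries=3)
print(chosen)
```

If the except clause listed only Invalid, the MigrationPreCheckError raised for hv1 would propagate out of find_destination and hv2 would never be tried, which is the short-circuit described in the report.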

