Okay, to be honest, at first glance I did not think this was really a
bug fix: evacuate is an admin-only operation by policy, and reset-state
exists exactly to reconcile the VM state, so an operator can still fix
that.

The real problem I see with setting the instance.host field before
calling the driver is that all of our API actions do that *after* the
driver is called, so evacuate would behave differently from the rest.

See, I'm torn. Sure, we could fix that specific issue and take on some
tech debt, but I'm more in favor of making the libvirt driver more
robust, so that it can delete the temporary resources it created when
any error occurs. Like you said, that looks more like a blueprint to me.

** Changed in: nova
       Status: In Progress => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1626230

Title:
  instance artefacts are not removed by libvirt driver if it fails to
  spawn

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  When an instance is evacuated, an attempt is made to rebuild it on a
  different host.  If the spawn method in the driver fails and raises an
  exception, the instance is placed in an error state.  However, the
  instance is still recorded as being on the source node, while,
  depending on how far the spawn got, instance-related files will be
  present on the target and the instance may be running there.

  The XenAPI driver cleans up the instance artefacts if spawn fails, but
  the libvirt driver does not.

  Where compute nodes do not use shared storage, a subsequent attempt to
  evacuate the instance to the same target will fail because the
  instance directory is already present.

  Using reset-state and then evacuating to another node allows the
  instance to be evacuated successfully.  However, the 'orphaned' files
  and running instance on the original target will need to be cleaned up
  manually.

  We could update the instance's host once the claim is complete on the
  target.  In that case, if spawn fails, the instance will effectively
  have been evacuated, so the files on the original host will be cleaned
  up when that node is restored.

  However, maybe we should address this by bringing the libvirt driver
  into line with the XenAPI driver and getting it to clean up resources
  associated with an instance that fails to spawn?  Will raise a
  blueprint for this.
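
  As a rough illustration of the cleanup idea, here is a minimal Python
  sketch.  It is not nova or libvirt driver code: spawn(), failing_boot(),
  working_boot(), retry_after_failure() and the "uuid-1" directory name
  are all hypothetical, and the only behaviour modelled is the one
  described above, where a leftover instance directory makes a retry on
  the same target fail.

```python
import os
import shutil
import tempfile


class SpawnError(Exception):
    """Raised when the (hypothetical) driver fails to start an instance."""


def spawn(instances_path, instance_uuid, boot, cleanup_on_failure=True):
    """Create the instance directory, then run boot(instance_dir).

    Every name here is illustrative; this is not nova's driver interface.
    """
    instance_dir = os.path.join(instances_path, instance_uuid)
    # A directory left behind by an earlier attempt blocks the retry:
    # os.mkdir raises FileExistsError if the path is already present.
    os.mkdir(instance_dir)
    try:
        boot(instance_dir)
    except Exception:
        if cleanup_on_failure:
            # Remove whatever the failed attempt created, then re-raise
            # so the caller still sees the original error.
            shutil.rmtree(instance_dir, ignore_errors=True)
        raise
    return instance_dir


def failing_boot(instance_dir):
    # Simulate a spawn that gets part-way through before failing.
    with open(os.path.join(instance_dir, "disk"), "w") as f:
        f.write("partial")
    raise SpawnError("boot failed")


def working_boot(instance_dir):
    with open(os.path.join(instance_dir, "disk"), "w") as f:
        f.write("ok")


def retry_after_failure(cleanup_on_failure):
    """Return True if a second spawn succeeds after a first failed one."""
    with tempfile.TemporaryDirectory() as instances_path:
        try:
            spawn(instances_path, "uuid-1", failing_boot, cleanup_on_failure)
        except SpawnError:
            pass
        try:
            spawn(instances_path, "uuid-1", working_boot, cleanup_on_failure)
        except FileExistsError:
            return False
        return True
```

  With cleanup enabled the second attempt on the same path goes through;
  with it disabled the retry hits the leftover directory, which is the
  symptom reported here.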

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1626230/+subscriptions

