Re: [openstack-dev] [nova] instances stuck with task_state of REBOOTING

Chris Friesen Thu, 20 Mar 2014 12:07:10 -0700

On 03/20/2014 12:29 PM, Chris Friesen wrote:

The fact that there are no success or error logs in nova-compute.log
makes me wonder if we somehow got stuck in self.driver.reboot().


Also, I'm kind of wondering what would happen if nova-compute was
running reboot_instance() and we rebooted the controller at the same
time.  reboot_instance() could time out trying to update the instance
with the new power state and a task_state of None.  Later on in
_sync_power_states() we would update the power_state, but nothing would
update the task_state.  I don't think this is what happened to us though
since I'd expect to see logs of the timeout.
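
To make that first scenario concrete, here is a toy model of it. This is only a sketch and not nova code: InstanceRow, save_with_timeout() and the controller_up flag are made-up stand-ins for the instance record and the update that goes through the controller. The point is just that a power-state sync which only writes power_state leaves the stale task_state in place.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceRow:
    # Hypothetical stand-in for the instance record in the database.
    power_state: str = "running"
    task_state: Optional[str] = "rebooting"


def save_with_timeout(row, controller_up, **updates):
    # Hypothetical stand-in for an update that has to reach the controller.
    if not controller_up:
        raise TimeoutError("no reply from controller")
    for field, value in updates.items():
        setattr(row, field, value)


def reboot_instance(row, controller_up):
    # After the reboot, try to record the new power state and clear task_state.
    try:
        save_with_timeout(row, controller_up,
                          power_state="running", task_state=None)
        return True
    except TimeoutError as exc:
        # This is the log line I would expect to see in this scenario.
        print(f"reboot_instance: instance update timed out: {exc}")
        return False


def sync_power_states(row, observed_power_state):
    # Only power_state is reconciled; task_state is never touched here.
    row.power_state = observed_power_state
```

If the update fails while the controller is down, a later sync_power_states() call fixes power_state but task_state remains "rebooting".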

Actually, looking at the logs a bit more carefully, it appears that what happened is something like this:


We reboot the controllers.
Right after they come back up, something calls compute.api.API.reboot().

That sets instance.task_state = task_states.REBOOTING and then calls instance.save() to update the database.

Then it calls self.compute_rpcapi.reboot_instance() which does an rpc cast.

That message gets dropped on the floor due to communication issues between the controller and the compute.

Now we're stuck with a task_state of REBOOTING.
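
The sequence above can be modelled in a few lines. Again, this is a sketch under assumptions and not the actual nova code path: Instance, LossyCast, api_reboot() and compute_reboot_instance() are hypothetical names, and the "database" is a plain dict. It only shows why a fire-and-forget cast that gets lost leaves task_state set, since the API side persists the state before sending and never hears back either way.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

REBOOTING = "rebooting"


@dataclass
class Instance:
    # Hypothetical stand-in for the instance record.
    uuid: str
    task_state: Optional[str] = None


class LossyCast:
    """Fire-and-forget messaging stand-in: no reply, no error when dropped."""

    def __init__(self, deliver: bool):
        self.deliver = deliver
        self.handlers: List[Callable[[Instance], None]] = []

    def cast(self, instance: Instance) -> None:
        if not self.deliver:
            return  # message silently lost; the caller cannot tell
        for handler in self.handlers:
            handler(instance)


def compute_reboot_instance(instance: Instance) -> None:
    # Compute side: do the reboot (omitted), then clear the task state.
    instance.task_state = None


def api_reboot(db: dict, rpc: LossyCast, uuid: str) -> None:
    instance = db[uuid]
    instance.task_state = REBOOTING  # persisted before the cast is sent
    rpc.cast(instance)               # nothing comes back, success or failure
```

With LossyCast(deliver=False), api_reboot() returns normally and the instance keeps task_state "rebooting" with nothing left to clear it.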

I think that both of the RPC message loss scenarios are valid with current nova code, so we really do need an audit to clean up after this sort of thing.
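
As a rough illustration of what I mean by an audit, something along these lines could run periodically. This is a sketch only: InstanceRecord, the updated_at field as seconds since the epoch, and the timeout value are all assumptions for the example, and a real audit would presumably also want to check what the hypervisor reports before clearing anything.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class InstanceRecord:
    # Hypothetical stand-in for the instance record.
    uuid: str
    task_state: Optional[str]
    updated_at: float  # seconds since the epoch (assumed field)


def find_stale_reboots(instances: Iterable[InstanceRecord],
                       now: float,
                       timeout: float) -> List[InstanceRecord]:
    # An instance is stale if it has been "rebooting" longer than timeout.
    return [inst for inst in instances
            if inst.task_state == "rebooting"
            and now - inst.updated_at > timeout]


def clear_stale_reboots(instances: Iterable[InstanceRecord],
                        now: float,
                        timeout: float = 600.0) -> List[str]:
    # Reset task_state on stale instances and report which ones were touched.
    cleared = []
    for inst in find_stale_reboots(instances, now, timeout):
        inst.task_state = None
        inst.updated_at = now
        cleared.append(inst.uuid)
    return cleared
```

The timeout has to be comfortably longer than a legitimate reboot can take, otherwise the audit would race with a reboot that is still in progress.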


Chris


