Good day. On XCP 0.5, I've hit this bug with PV-only VMs on a host:
Xen suddenly stops killing a domain (I still don't know why). The domain has status 'd' (dying), and it was not possible to kill it even via xc.domain_destroy() from a Python script. When xapi starts, it checks whether any stray domains are present; if it finds one, it changes its UUID to a deadbeef value and keeps trying to kill it until it dies. During this time, the host sets its own state to 'disabled' in the xapi database. If the domain is unkillable, the host stays disabled, and this even prevents migrating VMs off the host (because the host switches to emergency mode until it manages to kill all stray domains).

I ran into this problem last week. I was able to fool xapi by editing the database (creating a fake deadbeef VM and setting resident-on and power-state correctly), and I even managed to migrate a few machines off the damaged host, but the domains listed after the 'bad one' were unmigratable: after the suspend, xapi was unable to destroy the domain and the migration process hung... Log in attachment.

I think this is clearly a Xen bug, but I am unable to reproduce it, so I cannot report it properly. The other question is xapi's behaviour with stray (or unkillable) domains: I think we should allow some kind of 'emergency migration' procedure, where we allow the migration and put the domain into an endless 'paused' state (we cannot kill it, but we can at least stop it from continuing to run).

--- wBR, George.
[20110414T08:14:12.584Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|dbsync] killing umanaged domain: deadbeef-dead-beef-dead-beef00000027
[20110414T08:14:12.585Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain.destroy: all known devices = [ ]
[20110414T08:14:12.585Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain.destroy calling Xc.domain_destroy (domid 39)
[20110414T08:14:12.585Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] No qemu-dm pid in xenstore; assuming this domain was PV
[20110414T08:14:12.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain.destroy: rm /local/domain/39
[20110414T08:14:12.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain.destroy: deleting backend paths
[20110414T08:14:12.588Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain 39 still exists (domid=39; uuid=deadbeef-dead-beef-dead-beef00000027): waiting for it to disappear.
[20110414T08:14:17.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain 39 still exists (domid=39; uuid=deadbeef-dead-beef-dead-beef00000027): waiting for it to disappear.
[20110414T08:14:22.400Z|debug|hostname|9 inet-RPC|session.slave_local_login D:dc2eac1d6541|xapi] Add session to local storage
[20110414T08:14:22.437Z| info|hostname|10 inet-RPC|pool.is_slave D:4fff824c9eb8|xapi] Pool.is_slave call received (I'm a slave)
[20110414T08:14:22.437Z|debug|hostname|10 inet-RPC|pool.is_slave D:4fff824c9eb8|xapi] About to kick the database connection to make sure it's still working...
[20110414T08:14:22.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain 39 still exists (domid=39; uuid=deadbeef-dead-beef-dead-beef00000027): waiting for it to disappear.
[20110414T08:14:27.585Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain 39 still exists (domid=39; uuid=deadbeef-dead-beef-dead-beef00000027): waiting for it to disappear.
[20110414T08:14:47.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xapi] Raised at domain.ml:374.9-44 -> list.ml:69.12-15 -> dbsync_slave.ml:502.4-63 -> pervasiveext.ml:22.2-9
[20110414T08:14:47.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|backtrace] Raised at pervasiveext.ml:26.22-25 -> dbsync_slave.ml:690.2-144 -> dbsync.ml:62.7-53 -> server_helpers.ml:70.10-22
[20110414T08:14:47.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|dispatcher] Server_helpers.exec exception_handler: Got exception INTERNAL_ERROR: [ Domain.Domain_stuck_in_dying_state(39) ]
[20110414T08:14:47.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|dispatcher] Raised at string.ml:150.25-34 -> stringext.ml:108.13-29
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|backtrace] Raised at string.ml:150.25-34 -> stringext.ml:108.13-29
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xapi] Raised at server_helpers.ml:92.14-15 -> pervasiveext.ml:22.2-9
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xapi] Raised at pervasiveext.ml:26.22-25 -> pervasiveext.ml:22.2-9
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|server_init D:bf90c0bf3715|backtrace] Raised at pervasiveext.ml:26.22-25 -> dbsync.ml:76.4-17
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|server_init D:bf90c0bf3715|dbsync] dbsync caught an exception: INTERNAL_ERROR: [ Domain.Domain_stuck_in_dying_state(39) ]
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|server_init D:bf90c0bf3715|dbsync] Raised at string.ml:150.25-34 -> stringext.ml:108.13-29
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|server_init D:bf90c0bf3715|xapi] Failure in slave dbsync; slave will pause and then restart to try again. Entering emergency mode.
[20110414T08:14:47.587Z| info|hostname|0 thread_zero|server_init D:bf90c0bf3715|xapi] Cannot contact master: running in slave emergency mode
[20110414T08:14:47.587Z| info|hostname|0 thread_zero|server_init D:bf90c0bf3715|xapi] Cannot contact master: running in slave emergency mode
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|server_init D:bf90c0bf3715|xapi] Cannot contact master: running in slave emergency mode
[20110414T08:14:47.594Z|debug|hostname|0 thread_zero|server_init D:bf90c0bf3715|xapi] Will restart management software in 166.0 seconds
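For what it's worth, the pattern visible in the log (call Xc.domain_destroy, then poll every ~5 seconds until the domain disappears, giving up after ~35 seconds with Domain_stuck_in_dying_state) can be sketched like this in Python. This is only my illustration of the behaviour; the names, callbacks, and timeout values are my assumptions, not xapi's actual (OCaml) code:

```python
import time


class DomainStuckInDyingState(Exception):
    """Raised when a domain refuses to disappear; mirrors xapi's
    Domain.Domain_stuck_in_dying_state internal error."""


def destroy_and_wait(domid, destroy, still_exists,
                     timeout=35.0, poll=5.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Issue a destroy, then poll until the domain is gone or we give up.

    `destroy` and `still_exists` are caller-supplied callables standing in
    for Xc.domain_destroy and the domain-list check; both are assumptions
    for illustration, not real xapi/Xc APIs.
    """
    destroy(domid)
    deadline = clock() + timeout
    while still_exists(domid):
        if clock() >= deadline:
            # The domain stayed in state 'd' (dying) past the deadline.
            raise DomainStuckInDyingState(domid)
        sleep(poll)
```

With an unkillable domain, `still_exists` never goes false, so the loop always ends in DomainStuckInDyingState, and in xapi that failure bubbles up through dbsync and drops the slave into emergency mode, which is exactly why the host stays disabled.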
_______________________________________________
xen-api mailing list
[email protected]
http://lists.xensource.com/mailman/listinfo/xen-api
