Good day. On XCP 0.5, I've hit this bug with PV-only VMs on a host:
Xen suddenly stops killing a domain (I still don't know why). The domain has status 'd' (dying), and it was not possible to kill it even via xc.domain_destroy() from a Python script. When xapi starts, it checks whether any stray domains are present; if it finds one, it changes its UUID to a deadbeef value and keeps trying to kill it until it dies. During this time, the host sets its own state to 'disabled' in the xapi database. If the domain is unkillable, the host stays disabled, and this even prevents migrating VMs off the host (because the host switches to emergency mode until it manages to kill all stray domains).

I ran into this problem last week. I was able to fool xapi by editing the database (creating a fake deadbeef VM and setting resident-on and power-state correctly), and I even managed to migrate a few machines off the damaged host, but the domains listed after the 'bad one' were unmigratable: after the suspend, xapi was unable to destroy the domain and the migration process hung... Log in attachment.

I think this is clearly a Xen bug, but I am unable to reproduce it, so I cannot report it properly. The other question is xapi's behaviour with stray (or unkillable) domains: I think we should allow some kind of 'emergency migration' procedure, where we allow the migration and put the domain into an endless 'paused' state (we cannot kill it, but we can at least stop it from continuing to run).

--- wBR, George.
[20110414T08:14:12.584Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|dbsync] killing umanaged domain: deadbeef-dead-beef-dead-beef00000027
[20110414T08:14:12.585Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain.destroy: all known devices = [ ]
[20110414T08:14:12.585Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain.destroy calling Xc.domain_destroy (domid 39)
[20110414T08:14:12.585Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] No qemu-dm pid in xenstore; assuming this domain was PV
[20110414T08:14:12.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain.destroy: rm /local/domain/39
[20110414T08:14:12.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain.destroy: deleting backend paths
[20110414T08:14:12.588Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain 39 still exists (domid=39; uuid=deadbeef-dead-beef-dead-beef00000027): waiting for it to disappear.
[20110414T08:14:17.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain 39 still exists (domid=39; uuid=deadbeef-dead-beef-dead-beef00000027): waiting for it to disappear.
[20110414T08:14:22.400Z|debug|hostname|9 inet-RPC|session.slave_local_login D:dc2eac1d6541|xapi] Add session to local storage
[20110414T08:14:22.437Z| info|hostname|10 inet-RPC|pool.is_slave D:4fff824c9eb8|xapi] Pool.is_slave call received (I'm a slave)
[20110414T08:14:22.437Z|debug|hostname|10 inet-RPC|pool.is_slave D:4fff824c9eb8|xapi] About to kick the database connection to make sure it's still working...
[20110414T08:14:22.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain 39 still exists (domid=39; uuid=deadbeef-dead-beef-dead-beef00000027): waiting for it to disappear.
[20110414T08:14:27.585Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xenops] Domain 39 still exists (domid=39; uuid=deadbeef-dead-beef-dead-beef00000027): waiting for it to disappear.
[20110414T08:14:47.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xapi] Raised at domain.ml:374.9-44 -> list.ml:69.12-15 -> dbsync_slave.ml:502.4-63 -> pervasiveext.ml:22.2-9
[20110414T08:14:47.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|backtrace] Raised at pervasiveext.ml:26.22-25 -> dbsync_slave.ml:690.2-144 -> dbsync.ml:62.7-53 -> server_helpers.ml:70.10-22
[20110414T08:14:47.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|dispatcher] Server_helpers.exec exception_handler: Got exception INTERNAL_ERROR: [ Domain.Domain_stuck_in_dying_state(39) ]
[20110414T08:14:47.586Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|dispatcher] Raised at string.ml:150.25-34 -> stringext.ml:108.13-29
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|backtrace] Raised at string.ml:150.25-34 -> stringext.ml:108.13-29
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xapi] Raised at server_helpers.ml:92.14-15 -> pervasiveext.ml:22.2-9
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|dbsync (update_env) D:e441d10b0cd8|xapi] Raised at pervasiveext.ml:26.22-25 -> pervasiveext.ml:22.2-9
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|server_init D:bf90c0bf3715|backtrace] Raised at pervasiveext.ml:26.22-25 -> dbsync.ml:76.4-17
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|server_init D:bf90c0bf3715|dbsync] dbsync caught an exception: INTERNAL_ERROR: [ Domain.Domain_stuck_in_dying_state(39) ]
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|server_init D:bf90c0bf3715|dbsync] Raised at string.ml:150.25-34 -> stringext.ml:108.13-29
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|server_init D:bf90c0bf3715|xapi] Failure in slave dbsync; slave will pause and then restart to try again. Entering emergency mode.
[20110414T08:14:47.587Z| info|hostname|0 thread_zero|server_init D:bf90c0bf3715|xapi] Cannot contact master: running in slave emergency mode
[20110414T08:14:47.587Z| info|hostname|0 thread_zero|server_init D:bf90c0bf3715|xapi] Cannot contact master: running in slave emergency mode
[20110414T08:14:47.587Z|debug|hostname|0 thread_zero|server_init D:bf90c0bf3715|xapi] Cannot contact master: running in slave emergency mode
[20110414T08:14:47.594Z|debug|hostname|0 thread_zero|server_init D:bf90c0bf3715|xapi] Will restart management software in 166.0 seconds
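For what it's worth, the pattern visible in the log (call Xc.domain_destroy, then poll every ~5 seconds until the domain disappears, giving up after ~35 seconds with Domain_stuck_in_dying_state) can be sketched like this in Python. This is only my illustration of the behaviour; the names, callbacks, and timeout values are my assumptions, not xapi's actual (OCaml) code:

```python
import time


class DomainStuckInDyingState(Exception):
    """Raised when a domain refuses to disappear; mirrors xapi's
    Domain.Domain_stuck_in_dying_state internal error."""


def destroy_and_wait(domid, destroy, still_exists,
                     timeout=35.0, poll=5.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Issue a destroy, then poll until the domain is gone or we give up.

    `destroy` and `still_exists` are caller-supplied callables standing in
    for Xc.domain_destroy and the domain-list check; both are assumptions
    for illustration, not real xapi/Xc APIs.
    """
    destroy(domid)
    deadline = clock() + timeout
    while still_exists(domid):
        if clock() >= deadline:
            # The domain stayed in state 'd' (dying) past the deadline.
            raise DomainStuckInDyingState(domid)
        sleep(poll)
```

With an unkillable domain, `still_exists` never goes false, so the loop always ends in DomainStuckInDyingState, and in xapi that failure bubbles up through dbsync and drops the slave into emergency mode, which is exactly why the host stays disabled.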
_______________________________________________
xen-api mailing list
[email protected]
http://lists.xensource.com/mailman/listinfo/xen-api
