On 04/14/2015 11:31 AM, Ian Jackson wrote:
Konrad Rzeszutek Wilk writes (libvirtd live-locking on CTX_LOCK when doing 'virsh
domid save /tmp/blah' with guest corrupting memory (on purpose).):
It looks like thread #10 is blocking in libxl_read_exactly waiting
for 'libxl-save-helper'. Said application (see below) has dispatched
a message through helper_getreply and is blocking on __read_nocancel.
This is not supposed to block.
helper_stdout_readable assumes that the fd is actually readable.
However, for complicated reasons it can happen in a multithreaded
program that the fd was _previously_ readable and is now no longer.
This was not clearly documented in the internal API documentation.
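To illustrate the general problem (this is only a sketch, not the content
of the patches mentioned in this thread): a callback that may be handed a
stale "readable" event can avoid blocking if the fd is in non-blocking mode
and the reader treats EAGAIN as "go back to poll". The helper names
set_nonblocking, try_read_exactly and demo_stale_readable are hypothetical
and are not libxl functions; only standard POSIX calls are used.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not libxl): switch fd to non-blocking mode. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper (not libxl): try to complete a fixed-size message.
 * *got carries the number of bytes already read across calls.
 * Returns 1 once all sz bytes are in buf, 0 if the read would block
 * (the caller should return to its poll loop), -1 on EOF or a real error.
 */
int try_read_exactly(int fd, unsigned char *buf, size_t sz, size_t *got)
{
    while (*got < sz) {
        ssize_t r = read(fd, buf + *got, sz - *got);
        if (r > 0) { *got += (size_t)r; continue; }
        if (r == 0) return -1;                 /* EOF before full message */
        if (errno == EINTR) continue;          /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;                          /* stale event: nothing to read */
        return -1;
    }
    return 1;
}

/* Simulates a stale "readable" event on a pipe; returns 0 on success. */
int demo_stale_readable(void)
{
    int p[2];
    unsigned char hdr[2];
    size_t got = 0;
    int rc = -1;

    if (pipe(p) < 0) return -1;
    if (set_nonblocking(p[0]) < 0) goto out;

    /* Pipe is empty: the call must report "would block", not hang. */
    if (try_read_exactly(p[0], hdr, sizeof hdr, &got) != 0) goto out;

    /* Now the data really arrives and the same call completes. */
    if (write(p[1], "\x01\x02", 2) != 2) goto out;
    if (try_read_exactly(p[0], hdr, sizeof hdr, &got) != 1) goto out;
    if (hdr[0] != 1 || hdr[1] != 2) goto out;
    rc = 0;
out:
    close(p[0]);
    close(p[1]);
    return rc;
}
```

With a blocking fd, the first call in demo_stale_readable would sit in
read() indefinitely, which is the shape of the hang described above.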
I have produced what I think are two patches that will fix this. I
have compiled them but I haven't tested them. Konrad, are you able to
check whether they fix your bug ?
I too saw this bug just before Konrad's report, but the patches don't seem to
help. Running a script that continually saves and restores domains will
eventually lock libvirtd with essentially the same traces as those reported by Konrad:
Thread 4 (Thread 0x7fffee3a0700 (LWP 39068)):
#0 0x73a9aa9d in read () from /lib64/libpthread.so.0
#1 0x74540ea0 in libxl_read_exactly (ctx=0x7fffe00445e0, fd=37,
data=0x7fffee39f36e,
sz=2, source=0x7fffc80010c0 domain 6 save/restore helper stdout pipe,
what=0x7458112a ipc msg header) at libxl_utils.c:430
#2 0x7454913a in helper_stdout_readable (egc=0x7fffee39f540,
ev=0x7fffc8002038, fd=37,
events=3, revents=1) at libxl_save_callout.c:281
#3 0x7454fafb in afterpoll_internal (egc=0x7fffee39f540,
poller=0x7fffea00, nfds=4,
fds=0x7fffe930, now=...) at libxl_event.c:1185
#4 0x7455127a in eventloop_iteration (egc=0x7fffee39f540,
poller=0x7fffea00)
at libxl_event.c:1645
#5 0x74551df1 in libxl__ao_inprogress (ao=0x7fffc8001060,
file=0x74575e1b libxl.c,
line=982, func=0x74578750 __func__.17561 libxl_domain_suspend) at
libxl_event.c:1896
#6 0x7450e051 in libxl_domain_suspend (ctx=0x7fffe00445e0, domid=6,
fd=29, flags=0,
ao_how=0x0) at libxl.c:982
#7 0x7fffe8774636 in libxlDoDomainSave (driver=0x7fffe011f1c0,
vm=0x7fffe004f950,
to=0x7fffc8000990 /tmp/sles12gm-pv.img) at libxl/libxl_driver.c:1584
#8 0x7fffe8774a35 in libxlDomainSaveFlags (dom=0x7fffc8000de0,
to=0x7fffc8000990 /tmp/sles12gm-pv.img, dxml=0x0, flags=0) at
libxl/libxl_driver.c:1653
#9 0x7fffe8774b11 in libxlDomainSave (dom=0x7fffc8000de0,
to=0x7fffc8000990 /tmp/sles12gm-pv.img) at libxl/libxl_driver.c:1678
#10 0x7751db15 in virDomainSave (domain=0x7fffc8000de0,
to=0x7fffc80009d0 /tmp/sles12gm-pv.img) at libvirt-domain.c:839
...
Thread 1 (Thread 0x77fc18c0 (LWP 39059)):
#0 0x73a9a7bc in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x73a964a4 in _L_lock_952 () from /lib64/libpthread.so.0
#2 0x73a96306 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x7454caf6 in libxl__ctx_lock (ctx=0x7fffe00445e0) at
libxl_internal.h:3268
#4 0x7454fe98 in libxl_osevent_occurred_fd (ctx=0x7fffe00445e0,
for_libxl=0x7fffe004f210, fd=32, events_ign=0, revents_ign=1) at
libxl_event.c:1242
#5 0x7fffe8770573 in libxlFDEventCallback (watch=24, fd=32, vir_events=1,
fd_info=0x55896c60) at libxl/libxl_driver.c:123
#6 0x773f71bc in virEventPollDispatchHandles (nfds=14,
fds=0x55897fa0)
at util/vireventpoll.c:508
#7 0x773f79f9 in virEventPollRunOnce () at util/vireventpoll.c:657
#8 0x773f58fa in virEventRunDefaultImpl () at util/virevent.c:308
#9 0x555c2131 in virNetServerRun (srv=0x55889980) at
rpc/virnetserver.c:1139
#10 0x5556cf88 in main (argc=2, argv=0x7fffe378) at libvirtd.c:1489
Regards,
Jim
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel