On Thu, Sep 14, 2017 at 03:50:22PM +0800, Peter Xu wrote: > This is not a problem if we are only having one single loop thread like > before. However, after per-monitor thread is introduced, this is not > true any more, and the race can happen. > > The race can be triggered with "make check -j8" sometimes: > > qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91: > io_watch_poll_finalize: Assertion `iwp->src == NULL' failed. > > This patch keeps the reference for the watch object when creating in > io_add_watch_poll(), so that the object will never be released in the > context main loop, especially when the context loop is running in > another standalone thread. Meanwhile, when we want to remove the watch > object, we always first detach the watch object from its owner context, > then we continue with the cleanup. > > Without this patch, calling io_remove_watch_poll() in main loop thread > is not thread-safe, since the other per-monitor thread may be modifying > the watch object at the same time.
This doesn't feel right to me. Why is the main loop thread doing anything at all with the Chardev, if there is a per-monitor thread ? The Chardev code isn't thread safe so it isn't safe to have two separate threads accessing the same Chardev. IOW, if we want a per-monitor thread, then we must make sure the main thread never touches that monitor's chardev at all. While your patch here might have avoided the assertion you mention above, I fear this is just papering over a fundamental problem that still exists, that can only be solved by not letting the mainloop touch the chardev at all. > > Reviewed-by: Marc-André Lureau <marcandre.lur...@redhat.com> > Signed-off-by: Peter Xu <pet...@redhat.com> > --- > chardev/char-io.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/chardev/char-io.c b/chardev/char-io.c > index f810524..3828c20 100644 > --- a/chardev/char-io.c > +++ b/chardev/char-io.c > @@ -122,7 +122,6 @@ GSource *io_add_watch_poll(Chardev *chr, > g_free(name); > > g_source_attach(&iwp->parent, context); > - g_source_unref(&iwp->parent); > return (GSource *)iwp; > } > > @@ -131,12 +130,24 @@ static void io_remove_watch_poll(GSource *source) > IOWatchPoll *iwp; > > iwp = io_watch_poll_from_source(source); > + > + /* > + * Here the order of destruction really matters. We need to first > + * detach the IOWatchPoll object from the context (which may still > + * be running in another loop thread), only after that could we > + * continue to operate on iwp->src, or there may be race condition > + * between current thread and the context loop thread. > + * > + * Let's blame the glib bug mentioned in commit 2b3167 (again) for > + * this extra complexity. > + */ > + g_source_destroy(&iwp->parent); > if (iwp->src) { > g_source_destroy(iwp->src); > g_source_unref(iwp->src); > iwp->src = NULL; > } > - g_source_destroy(&iwp->parent); > + g_source_unref(&iwp->parent); > } > > void remove_fd_in_watch(Chardev *chr) > -- > 2.7.4 > Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|