On Tue, 9 Sep 2014 13:10:48 +0200 Marek Chalupa <mchqwe...@gmail.com> wrote:
> On 4 September 2014 15:17, Pekka Paalanen <ppaala...@gmail.com> wrote: > > > On Fri, 29 Aug 2014 11:21:30 +0200 > > Marek Chalupa <mchqwe...@gmail.com> wrote: > > > > > When wl_display_read_events() returns with errno == EAGAIN, we > > > naturally try to call it again. But this next call results in > > > deadlock. > > > > > > Signed-off-by: Marek Chalupa <mchqwe...@gmail.com> > > > --- > > > tests/display-test.c | 88 > > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > 1 file changed, 88 insertions(+) > > > > > > diff --git a/tests/display-test.c b/tests/display-test.c > > > index 1289866..e1cf830 100644 > > > --- a/tests/display-test.c > > > +++ b/tests/display-test.c > > > @@ -32,6 +32,7 @@ > > > #include <sys/types.h> > > > #include <sys/stat.h> > > > #include <pthread.h> > > > +#include <poll.h> > > > > > > #include "wayland-private.h" > > > #include "wayland-server.h" > > > @@ -480,6 +481,93 @@ TEST(threading_cancel_read_tst) > > > display_destroy(d); > > > } > > > > > > +/* set this to 1 when it is right that > > > + * the thread is woken up */ > > > +static int threads_can_be_woken_up = 0; > > > + > > > +static void * > > > +thread_eagain_func(void *data) { > > > + struct client *c = data; > > > + > > > + register_reading(c->wl_display); > > > + > > > + c->display_stopped = 1; > > > + assert(wl_display_read_events(c->wl_display) == 0 > > > + && errno != EAGAIN); > > > + /* if thread was sleeping in read_events, this assert > > > + * must pass now */ > > > + assert(threads_can_be_woken_up && "Thread woke up too > > > early"); + > > > + pthread_exit(NULL); > > > +} > > > + > > > +static void > > > +threading_read_eagain(void) > > > +{ > > > + struct client *c = client_connect(); > > > + pthread_t th1, th2, th3; > > > + struct pollfd fds; > > > + int i; > > > + > > > + register_reading(c->wl_display); > > > + > > > + th1 = create_thread(c, thread_eagain_func); > > > + th2 = create_thread(c, thread_eagain_func); > > > + th3 = create_thread(c, thread_eagain_func); > > > + > > > + /* All the threads are sleeping, waiting until read or > > > cancel > > > + * is called. Since we have no data on socket waiting, > > > + * the wl_connection_read should end up with error and set > > > errno > > > + * to EAGAIN => wl_display_read_events() will return 0 and > > > set > > errno > > > + * to EAGAIN. It should not wake up the sleeping threads, > > > so that > > > + * this thread can try to read again */ > > > + alarm(3); > > > + assert(wl_display_read_events(c->wl_display) == 0); > > > + assert(errno == EAGAIN); > > > > I think there is a problem here. wl_display_read_events() returning > > 0 means success, and in that case errno may not be set if it was a > > real success. There is no EAGAIN to be handled, it's already done > > inside read_events(). > > > > In this test it won't be a real success, but you cannot code > > clients to check for errno after getting 0 from > > wl_display_read_events(). > > > > > > + > > > + /* the same should happen next times - but there was a bug > > > when > > > + * the main thread got blocked! */ > > > + for (i = 0; i < 5; ++i) { > > > + assert(wl_display_read_events(c->wl_display) == 0); > > > > Therefore you are here calling wl_display_read_events() again after > > it succeeded! And that IMO is a developer error. > > > > yes > > > > Besides, I cannot see anywhere where it says that it is ok to call > > wl_display_read_events() again even if it failed, without calling > > wl_display_prepare_read() first. > > > > > Since original piece of code returned 0 on EAGAIN, then I suppose the > author considered this as a success, so yes, wl_display_read_events() > is not probably meant > to be called again. But consider this: > if the thread that got EAGAIN (the only thread that is not sleeping) > calls wl_display_prepare_read() > + wl_display_read_events() again, it works. Problem is that it does > not have to call it > right after the failed wl_display_read_events and the other threads > are still sleeping. > Moreover, if there are not events on the socket for some time, this > one thread is looping in cycle > while the other threads are sleeping. > > What I meant to do was to give the programmer the choice whether to > loop or let > all threads try it again (cancel read). But as you said before, 0 is > wrong return value for that. > See end of the e-mail for conclusions. > > > > But I see something else: wl_display_read_events() does not > > decrement reader_count if it returns due to last_error. I think > > that needs a fix to guarantee the reader_count is decremented, so > > that the side-effects are consistent. > > > > Yes, we should fix this. There are cases that can cause a problem. > > > > > + assert(errno == EAGAIN); > > > + } > > > + > > > + /* generate something to read and wait until the event > > > comes */ > > > + wl_proxy_marshal((struct wl_proxy *) c->tc, > > > + 0 /* STOP DISPLAY */, 1 /* number */); > > > + assert(wl_display_flush(c->wl_display) > 0); > > > + > > > + fds.fd = wl_display_get_fd(c->wl_display); > > > + fds.events = POLLIN; > > > + assert(poll(&fds, 1, 3) == 1); > > > > Is timeout of 3 milliseconds intended? > > > > Nope, my bad. > > > > > > > + > > > + /* threads can be woken up now */ > > > + threads_can_be_woken_up = 1; > > > + > > > + errno = 0; > > > + assert(wl_display_read_events(c->wl_display) == 0); > > > + assert(errno != EAGAIN); > > > + assert(wl_display_dispatch_pending(c->wl_display) == 1); > > > + > > > + pthread_join(th1, NULL); > > > + pthread_join(th2, NULL); > > > + pthread_join(th3, NULL); > > > + > > > + client_disconnect(c); > > > +} > > > + > > > +TEST(threading_read_eagain_tst) > > > +{ > > > + struct display *d = display_create(); > > > + > > > + client_create(d, threading_read_eagain); > > > + > > > + display_run(d); > > > + display_resume(d); > > > + > > > + display_destroy(d); > > > +} > > > + > > > static void * > > > thread_prepare_and_read2(void *data) > > > { > > > > > > Thanks, > > pq > > > > First, thanks for review. > > Summing it up, in my point of view we have these options: > > * Let it as it is: > this can lead to starving of other threads for some time. > Moreover, if the thread that > got EAGAIN decides to exit (or simply not to read again) instead > of another trial to read, the other threads will block. > * return -1 and set EAGAIN > programmer can either try to read again, or cancel read and let > all threads do another cycle. Good thing is > that programmer knows that there was an error and can cancel read > in the case that the thread is exiting > instead of another cycle. > * wake up threads on EAGAIN > this solution will force all threads to do another cycle (or > whatever they want) and should not have any side effects. It can lead > to big overhead if there are no events on the socket for some time > (waking up threads again and again in cycle) > > + fix the bug with last_error, of course > > Is there something that I'm missing? Hi, just a quick comment, since I didn't get to think it all through yet. I believe wl_display_read_events() encountering EAGAIN is not an error. The socket fd is non-blocking (hmm, is it?), so EAGAIN is a normal condition when there is nothing to read. The program is using poll() to attempt reading only when there is/was something, and otherwise sleep in poll(). There should not be any busy-spinning. A good question is, should this work for blocking fds and without poll()? I'm not even sure libwayland-client gives out blocking fds, but OTOH I vaguely recall some documentation about blocking. window.c main event loop in display_run() uses epoll. simple-shm just calls wl_display_dispatch() in a loop doing nothing else. Maybe these were the only two use cases supported? Thanks, pq _______________________________________________ wayland-devel mailing list wayland-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/wayland-devel