Re: [PATCH] server: Fix crash when accessing client which is already freed
Hi Pekka, On 21 November 2016 at 13:30, Pekka Paalanen wrote: > On Mon, 21 Nov 2016 12:19:43 + Daniel Stone wrote: >> I'm would suggest we document that invoking wl_display_flush_clients() / >> wl_connection_flush() from a request handler may cause undefined behaviour, >> and that it must only be called outside of dispatch. > > FWIW, I agree with Daniel here. There are several things that are not > quite defined or expected to be done inside dispatch calls, and > flushing all clients is one of them. > > I could see, IIRC, how sending lots of events from inside dispatch > could cause the one client to be flushed. I think flush is done > automatically if the send buffer fills up. If the flush at that point > fails, there is no way to signal back that "oops, the compositor should > put that thing aside and do something else" until the connection > unblocks. We don't want to be checking and handling every send-event > possibly failing, so killing the client can happen. ISTR we don't > dynamically grow the send buffers, do we? But even the growing would > need to stop somewhere. Two paths, depending on whether the event is queued or immediately posted. If the latter, then the event will be flushed immediately. If the former, _and_ the queue is already full, then the queue will be flushed immediately before the event is written to the queue. In either case, if the flush fails, then client->error will be set. Since we work exclusively on NOWAIT connections, this includes if the kernel buffers are full because the client hasn't yet read. Either way, this is not related to the above patch, because it takes a different codepath (wl_resource_queue_event_array -> wl_closure_queue -> wl_connection_queue -> wl_connection_flush / wl_resource_post_event_array -> wl_closure_send -> wl_connection_write -> wl_connection_flush), not involving calling wl_display_flush_clients(). This implicit mid-dispatch flush is completely fine: the patch only fixes the compositor _explicitly_ flushing _all_ clients in the middle of request dispatch. (Side note: I suspect there might be some scope for increasing our - currently static - buffer limit, but making that limit too high just shifts the problem of unresponsive clients. But this also does not have any bearing on this patch.) Cheers, Daniel ___ wayland-devel mailing list wayland-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/wayland-devel
Re: [PATCH] server: Fix crash when accessing client which is already freed
On Mon, 21 Nov 2016 12:19:43 + Daniel Stone wrote: > Hi Hyun Kook, > > On 23 September 2016 at 00:40, Hyun Kook Khang > wrote: > > > in wl_client_connection_data(), > > there is no guarantee that client would never be destroyed in the process > > of handling the pending input. > > > > For example, > > 1. It comes to wl_client_connection_data() > > 2. wl_closure_invoke() would be invoked, and then wl_display_flush_clients > > would be invoked anyhow later in the same routine. (wl_closure_invoke() -> > > ? -> wl_display_flush_clients()) > > Here, client will be destroyed if wl_connection_flush() returns > > negative value and errno is not EAGAIN. > > > > Out of interest, what is the '?' here? I can't find any examples of this > happening in Wayland, Weston, or Mesa. If you could share your usecase, > this would be quite interesting. I believe the idea behind deferring the > flush was to minimise the number of system calls made, and also somewhat > decrease the latency when multiple clients are involved, by batching all > the flushes (potentially time-consuming) to a point where we do not have > any more requests to process (so are not sensitive to latency). > > In general, Wayland compositors should always be responsive: client request > handling should be lightweight and not time-consuming. Wayland clients > should also be completely asynchronous, rather than following a > RPC/method-return pattern of { send_request(); block_for_reply(); }. Given > that, the only reason to flush inside request processing would be because > you are sending a lot of data, but again the Wayland protocol itself should > not be carrying large amounts of data. So I'm not sure what usecase there > is for this. > > > > 3. It will come back to the while loop in wl_client_connection_data(). > > If wl_connection_pending_input() returns any values which is bigger > > than “size of p”; it will go into the while loop again, and then it will do > > something with the client which is already freed. > > > > > > The simplest way to solve this is checking the source(wl_event_source)’ > > fd, but for now there is no way to access wl_event_source in > > wayland-server.c. (wl_event_source will be freed a while later) > > > > Hm, this is racy though. > > I'm would suggest we document that invoking wl_display_flush_clients() / > wl_connection_flush() from a request handler may cause undefined behaviour, > and that it must only be called outside of dispatch. Hi, FWIW, I agree with Daniel here. There are several things that are not quite defined or expected to be done inside dispatch calls, and flushing all clients is one of them. I could see, IIRC, how sending lots of events from inside dispatch could cause the one client to be flushed. I think flush is done automatically if the send buffer fills up. If the flush at that point fails, there is no way to signal back that "oops, the compositor should put that thing aside and do something else" until the connection unblocks. We don't want to be checking and handling every send-event possibly failing, so killing the client can happen. ISTR we don't dynamically grow the send buffers, do we? But even the growing would need to stop somewhere. Before we can review a fix, we need to understand exactly when and why the problem happens. Is this a case of the compositor explicitly calling flush, or is it an automatic flush? What triggers it? Thanks, pq pgpD0A7ggfvxD.pgp Description: OpenPGP digital signature ___ wayland-devel mailing list wayland-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/wayland-devel
Re: [PATCH] server: Fix crash when accessing client which is already freed
Hi Hyun Kook, On 23 September 2016 at 00:40, Hyun Kook Khang wrote: > in wl_client_connection_data(), > there is no guarantee that client would never be destroyed in the process > of handling the pending input. > > For example, > 1. It comes to wl_client_connection_data() > 2. wl_closure_invoke() would be invoked, and then wl_display_flush_clients > would be invoked anyhow later in the same routine. (wl_closure_invoke() -> > ? -> wl_display_flush_clients()) > Here, client will be destroyed if wl_connection_flush() returns > negative value and errno is not EAGAIN. > Out of interest, what is the '?' here? I can't find any examples of this happening in Wayland, Weston, or Mesa. If you could share your usecase, this would be quite interesting. I believe the idea behind deferring the flush was to minimise the number of system calls made, and also somewhat decrease the latency when multiple clients are involved, by batching all the flushes (potentially time-consuming) to a point where we do not have any more requests to process (so are not sensitive to latency). In general, Wayland compositors should always be responsive: client request handling should be lightweight and not time-consuming. Wayland clients should also be completely asynchronous, rather than following a RPC/method-return pattern of { send_request(); block_for_reply(); }. Given that, the only reason to flush inside request processing would be because you are sending a lot of data, but again the Wayland protocol itself should not be carrying large amounts of data. So I'm not sure what usecase there is for this. > 3. It will come back to the while loop in wl_client_connection_data(). > If wl_connection_pending_input() returns any values which is bigger > than “size of p”; it will go into the while loop again, and then it will do > something with the client which is already freed. > > > The simplest way to solve this is checking the source(wl_event_source)’ > fd, but for now there is no way to access wl_event_source in > wayland-server.c. (wl_event_source will be freed a while later) > Hm, this is racy though. I'm would suggest we document that invoking wl_display_flush_clients() / wl_connection_flush() from a request handler may cause undefined behaviour, and that it must only be called outside of dispatch. Cheers, Daniel ___ wayland-devel mailing list wayland-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/wayland-devel
Re: [PATCH] server: Fix crash when accessing client which is already freed
Hi, could you write how to trigger the crash, also in the commit message maybe? Besides that, i have a comment inline below. 2016-09-21 9:08 GMT+02:00 Hyunkook Khang : > While processing pending data, client could be destroyed in the middle of > the process. (e.g. by invoking wl_display_flush_clients()). > In this case, client will be freed, but we are still in the processing data > of the client, so it could cause a crash. > > To avoid this, instead of destroying the client directly, > just set the error here and destroy the client where it needs to be. > > Signed-off-by: Hyunkook Khang > --- > src/wayland-server.c |8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/src/wayland-server.c b/src/wayland-server.c > index 9d7d9c1..89d0bac 100644 > --- a/src/wayland-server.c > +++ b/src/wayland-server.c > @@ -1103,10 +1103,16 @@ wl_display_terminate(struct wl_display *display) > WL_EXPORT void > wl_display_run(struct wl_display *display) > { > + struct wl_client *client, *next; > + > display->run = 1; > > while (display->run) { > wl_display_flush_clients(display); > + wl_list_for_each_safe(client, next, &display->client_list, > link) { > + if (client->error) > + wl_client_destroy(client); > + } wl_display_flush_clients() may be called manually by the user too, without using wl_display_run(). So now all the users would need to do the loop to destroy the clients, or else they would leak. Instead, you should do the loop at the end of wl_display_flush_clients(). Thanks, Giulio > wl_event_loop_dispatch(display->loop, -1); > } > } > @@ -1124,7 +1130,7 @@ wl_display_flush_clients(struct wl_display *display) > WL_EVENT_WRITABLE | > WL_EVENT_READABLE); > } else if (ret < 0) { > - wl_client_destroy(client); > + client->error = 1; > } > } > } > -- > 1.7.9.5 > > ___ > wayland-devel mailing list > wayland-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/wayland-devel ___ wayland-devel mailing list wayland-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/wayland-devel