Re: [PATCH] server: Fix crash when accessing client which is already freed

2016-11-21 Thread Daniel Stone
Hi Pekka,

On 21 November 2016 at 13:30, Pekka Paalanen  wrote:
> On Mon, 21 Nov 2016 12:19:43 + Daniel Stone  wrote:
>> I'm would suggest we document that invoking wl_display_flush_clients() /
>> wl_connection_flush() from a request handler may cause undefined behaviour,
>> and that it must only be called outside of dispatch.
>
> FWIW, I agree with Daniel here. There are several things that are not
> quite defined or expected to be done inside dispatch calls, and
> flushing all clients is one of them.
>
> I could see, IIRC, how sending lots of events from inside dispatch
> could cause the one client to be flushed. I think flush is done
> automatically if the send buffer fills up. If the flush at that point
> fails, there is no way to signal back that "oops, the compositor should
> put that thing aside and do something else" until the connection
> unblocks. We don't want to be checking and handling every send-event
> possibly failing, so killing the client can happen. ISTR we don't
> dynamically grow the send buffers, do we? But even the growing would
> need to stop somewhere.

Two paths, depending on whether the event is queued or immediately
posted. If the latter, then the event will be flushed immediately. If
the former, _and_ the queue is already full, then the queue will be
flushed immediately before the event is written to the queue. In
either case, if the flush fails, then client->error will be set. Since
we work exclusively on NOWAIT connections, this includes if the kernel
buffers are full because the client hasn't yet read.

Either way, this is not related to the above patch, because it takes a
different codepath (wl_resource_queue_event_array -> wl_closure_queue
-> wl_connection_queue -> wl_connection_flush /
wl_resource_post_event_array -> wl_closure_send -> wl_connection_write
-> wl_connection_flush), not involving calling
wl_display_flush_clients(). This implicit mid-dispatch flush is
completely fine: the patch only fixes the compositor _explicitly_
flushing _all_ clients in the middle of request dispatch.

(Side note: I suspect there might be some scope for increasing our -
currently static - buffer limit, but making that limit too high just
shifts the problem of unresponsive clients. But this also does not
have any bearing on this patch.)

Cheers,
Daniel
___
wayland-devel mailing list
wayland-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/wayland-devel


Re: [PATCH] server: Fix crash when accessing client which is already freed

2016-11-21 Thread Pekka Paalanen
On Mon, 21 Nov 2016 12:19:43 +
Daniel Stone  wrote:

> Hi Hyun Kook,
> 
> On 23 September 2016 at 00:40, Hyun Kook Khang 
> wrote:
> 
> > in wl_client_connection_data(),
> > there is no guarantee that client would never be destroyed in the process
> > of handling the pending input.
> >
> > For example,
> > 1. It comes to wl_client_connection_data()
> > 2. wl_closure_invoke() would be invoked, and then wl_display_flush_clients
> > would be invoked anyhow later in the same routine. (wl_closure_invoke() ->
> > ? -> wl_display_flush_clients())
> > Here, client will be destroyed if wl_connection_flush() returns
> > negative value and errno is not EAGAIN.
> >  
> 
> Out of interest, what is the '?' here? I can't find any examples of this
> happening in Wayland, Weston, or Mesa. If you could share your usecase,
> this would be quite interesting. I believe the idea behind deferring the
> flush was to minimise the number of system calls made, and also somewhat
> decrease the latency when multiple clients are involved, by batching all
> the flushes (potentially time-consuming) to a point where we do not have
> any more requests to process (so are not sensitive to latency).
> 
> In general, Wayland compositors should always be responsive: client request
> handling should be lightweight and not time-consuming. Wayland clients
> should also be completely asynchronous, rather than following a
> RPC/method-return pattern of { send_request(); block_for_reply(); }. Given
> that, the only reason to flush inside request processing would be because
> you are sending a lot of data, but again the Wayland protocol itself should
> not be carrying large amounts of data. So I'm not sure what usecase there
> is for this.
> 
> 
> > 3. It will come back to the while loop in wl_client_connection_data().
> > If wl_connection_pending_input() returns any values which is bigger
> > than “size of p”; it will go into the while loop again, and then it will do
> > something with the client which is already freed.
> >
> >
> > The simplest way to solve this is checking the source(wl_event_source)’
> > fd, but for now there is no way to access wl_event_source in
> > wayland-server.c. (wl_event_source will be freed a while later)
> >  
> 
> Hm, this is racy though.
> 
> I'm would suggest we document that invoking wl_display_flush_clients() /
> wl_connection_flush() from a request handler may cause undefined behaviour,
> and that it must only be called outside of dispatch.

Hi,

FWIW, I agree with Daniel here. There are several things that are not
quite defined or expected to be done inside dispatch calls, and
flushing all clients is one of them.

I could see, IIRC, how sending lots of events from inside dispatch
could cause the one client to be flushed. I think flush is done
automatically if the send buffer fills up. If the flush at that point
fails, there is no way to signal back that "oops, the compositor should
put that thing aside and do something else" until the connection
unblocks. We don't want to be checking and handling every send-event
possibly failing, so killing the client can happen. ISTR we don't
dynamically grow the send buffers, do we? But even the growing would
need to stop somewhere.

Before we can review a fix, we need to understand exactly when and why
the problem happens.

Is this a case of the compositor explicitly calling flush, or is it an
automatic flush? What triggers it?


Thanks,
pq


pgpD0A7ggfvxD.pgp
Description: OpenPGP digital signature
___
wayland-devel mailing list
wayland-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/wayland-devel


Re: [PATCH] server: Fix crash when accessing client which is already freed

2016-11-21 Thread Daniel Stone
Hi Hyun Kook,

On 23 September 2016 at 00:40, Hyun Kook Khang 
wrote:

> in wl_client_connection_data(),
> there is no guarantee that client would never be destroyed in the process
> of handling the pending input.
>
> For example,
> 1. It comes to wl_client_connection_data()
> 2. wl_closure_invoke() would be invoked, and then wl_display_flush_clients
> would be invoked anyhow later in the same routine. (wl_closure_invoke() ->
> ? -> wl_display_flush_clients())
> Here, client will be destroyed if wl_connection_flush() returns
> negative value and errno is not EAGAIN.
>

Out of interest, what is the '?' here? I can't find any examples of this
happening in Wayland, Weston, or Mesa. If you could share your usecase,
this would be quite interesting. I believe the idea behind deferring the
flush was to minimise the number of system calls made, and also somewhat
decrease the latency when multiple clients are involved, by batching all
the flushes (potentially time-consuming) to a point where we do not have
any more requests to process (so are not sensitive to latency).

In general, Wayland compositors should always be responsive: client request
handling should be lightweight and not time-consuming. Wayland clients
should also be completely asynchronous, rather than following a
RPC/method-return pattern of { send_request(); block_for_reply(); }. Given
that, the only reason to flush inside request processing would be because
you are sending a lot of data, but again the Wayland protocol itself should
not be carrying large amounts of data. So I'm not sure what usecase there
is for this.


> 3. It will come back to the while loop in wl_client_connection_data().
> If wl_connection_pending_input() returns any values which is bigger
> than “size of p”; it will go into the while loop again, and then it will do
> something with the client which is already freed.
>
>
> The simplest way to solve this is checking the source(wl_event_source)’
> fd, but for now there is no way to access wl_event_source in
> wayland-server.c. (wl_event_source will be freed a while later)
>

Hm, this is racy though.

I'm would suggest we document that invoking wl_display_flush_clients() /
wl_connection_flush() from a request handler may cause undefined behaviour,
and that it must only be called outside of dispatch.

Cheers,
Daniel
___
wayland-devel mailing list
wayland-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/wayland-devel


Re: [PATCH] server: Fix crash when accessing client which is already freed

2016-09-22 Thread Giulio Camuffo
Hi,

could you write how to trigger the crash, also in the commit message
maybe? Besides that, i have a comment inline below.

2016-09-21 9:08 GMT+02:00 Hyunkook Khang :
> While processing pending data, client could be destroyed in the middle of
> the process. (e.g. by invoking wl_display_flush_clients()).
> In this case, client will be freed, but we are still in the processing data
> of the client, so it could cause a crash.
>
> To avoid this, instead of destroying the client directly,
> just set the error here and destroy the client where it needs to be.
>
> Signed-off-by: Hyunkook Khang 
> ---
>  src/wayland-server.c |8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/src/wayland-server.c b/src/wayland-server.c
> index 9d7d9c1..89d0bac 100644
> --- a/src/wayland-server.c
> +++ b/src/wayland-server.c
> @@ -1103,10 +1103,16 @@ wl_display_terminate(struct wl_display *display)
>  WL_EXPORT void
>  wl_display_run(struct wl_display *display)
>  {
> +   struct wl_client *client, *next;
> +
> display->run = 1;
>
> while (display->run) {
> wl_display_flush_clients(display);
> +   wl_list_for_each_safe(client, next, &display->client_list, 
> link) {
> +   if (client->error)
> +   wl_client_destroy(client);
> +   }

wl_display_flush_clients() may be called manually by the user too,
without using wl_display_run(). So now all the users would need to do
the loop to destroy the clients, or else they would leak. Instead, you
should do the loop at the end of wl_display_flush_clients().


Thanks,
Giulio

> wl_event_loop_dispatch(display->loop, -1);
> }
>  }
> @@ -1124,7 +1130,7 @@ wl_display_flush_clients(struct wl_display *display)
>   WL_EVENT_WRITABLE |
>   WL_EVENT_READABLE);
> } else if (ret < 0) {
> -   wl_client_destroy(client);
> +   client->error = 1;
> }
> }
>  }
> --
> 1.7.9.5
>
> ___
> wayland-devel mailing list
> wayland-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/wayland-devel
___
wayland-devel mailing list
wayland-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/wayland-devel