On 2018-02-22 10:48 AM, Markus Ongyerth wrote:
On 2018/2月/22 09:34, Derek Foreman wrote:
On 2018-02-22 08:58 AM, Daniel Stone wrote:

On 22 February 2018 at 14:14, Markus Ongyerth <w...@ongy.net> wrote:
It seems that this patch makes that assumption invalid, and we would
need patches to weston, enlightenment, and mutter to prevent a
use-after-free during the signal emit?  Now I'm seeing valgrind errors
on E and weston during buffer destroy.

Personally, I don't think we should change this assumption and declare
the existing code that's worked for years suddenly buggy. :/

The code was buggy the whole time. Just because it was never triggered, does
not imply it's not a bug.
free()ing these struct wl_list without removing them from the signal list
leaves other struct wl_list that are outside the control of the current code
in an invalid, prone to use-after-free, state.

There's a difference between something being 'buggy' and a design with
non-obvious details you might not like. If destroy handlers not
removing their list elements were buggy, we would be seeing bugs from
that. But instead it's part of the API contract: when a destroy signal
is invoked, you are guaranteed that this will be the first and only
access to your list member. This implies that anyone trying to remove
their link from the list (accessing other listeners in the list) is

Suddenly allowing this is a breaking API change (*some* struct wl_list inside
a wl_listener) can suddenly become invalid for reasons outside the users

I don't know if I've quite parsed this right, but as above, not
removing elements of a destroy listener list, when the listener is
invoked, is our current API.

Related to this entire thing:
In [1] you added tests for this and promote something, that is in essence, a
breaking change.

It's not a breaking change though: it's the API we've pushed on everyone so far.

Also, it doesn't prevent external libraries from doing whichever they want
if they have complete control of the destroy listener list contents.

So you suggest we break a now mandated api and expose ourselves to funny
implementation detail changes that are now justified, because *we break API*?

I'm sorry, I'm have a hard time parsing this.

The suggested mandate is that libwayland internals won't touch the listener after you receive the notification.

That on receipt of a destroy notification you can free your stuff without removing your listener from the list.

What is prevented is libwayland's destroy notifier list walk accessing an
element again after it is potentially freed by external code.

Which could be fixed by said node removing itself from a list, instead of
leaving a list in invalid states for asumed behaviour.

And, of course, break the whole external world in the process.


We're not going to break years of working code built on what seems to have been a quite reasonable assumption.

We can completely replace the internal data structures in libwayland with
whatever we want, but we must preserve that behaviour.

Why can we change one implementation detail, but have to keep another one?

One is API, the other is implementation detail, so the question is irrelevant?

I understand what you're saying, I really do, but it's not pragmatic. Again, we can't break all external users of our library for very little real benefit.

It does not mean we can never rework the destroy signal emit path in
libwayland to allow some items to be removed by the notification handlers
and others just freed, or to allow a destroy notifier to touch the list.A

There is no destroy signal emit path.

Sure there is. It's currently an implementation detail that it happens to be the exact same path as other signal emitters.

There is a signal emit path, which may be called on destroy signals, and
suddenly has to follow different semantics because of what exactly?

Because a large amount of software will break if we change the currently expected behavior.

I realize this leaves me open to all manner of ridiculous slippery slope arguments, such as "if my software depended on a bug in an authentication system that forgot to ask for a password...", so all snark aside can we back away from that ledge now?

As library authors we have to be pragmatic. We have to avoid surprising our callers. While this API constraint is annoying, it is harmless, and demanding all old code suddenly conform to a new, different constraint that was never enforced before is too onerous.

And how exactly do we expose that to our listeners? By the name of the signal
in the struct?

Why do we need to? A notification callback written to be used as a destroy notifier will be written differently than one intended to be used for other things.

Most notably, a non-destruction notifier will be capable of being called more than once, or a crash could occur (right now).

It just makes it easy to verify that attempts to do that don't break
guarantees we've always made.

Again, kindly point me to the point where an implementation detail made it
towards a guarantee.
And for which signals exactly. Under which circumstances, oh and what
guarantee exactly.

We seem to be talking in circles - as above, we can't break all that software. I don't really like it either, but it's where we're at right now.

It would be nice if I had a proper proposal to take apart for this to be
explictly added to the API.
Otherwise I can provide a (intentionaly snarky) one :)

Oh, that's good, I was hoping we could turn up the snark at some point. ;)

Part one ammend [1] (wl_listener) with:
"The wl_list inside a wl_listener can be invalid (pointer towards free'd
memory) at any time the listener notify is called. For further details see


Part two ammend [2] (wl_signal) with:
Signals with the name `*_destroy` have special semantics.
If they are currently emitted, any wl_signal_add/wl_signal_get on the signal
or wl_list_remove on the link of any listener in it is invalid.
This is also the cause for invalid struct wl_list entries in wl_listener.



Seriously though, your #2 change is defining behaviour that currently crashes to continue to crash. That's an implementation detail, and nobody can possibly be relying on it. We can *fix* that instead of trying to leverage it to start a fight on the internet. ;)

[1] https://wayland.freedesktop.org/docs/html/apc.html#Server-structwl__listener
[2] https://wayland.freedesktop.org/docs/html/apc.html#Server-structwl__signal

Sure, the exact way they are specified here is a bit funny.
We could also add that to the various `wl_*_add_destroy_listener` functions.
Then we'd have the (from libwayland side breaking) change that e.g.
`wl_event_loop_get_destroy_listener` can't be called anymore under certain

I'll happily review a patch that mentions libwayland won't attempt to access the listeners in the destroy list more than once though. Should probably write one myself.



It also makes good wrapper implementations into managed languages annoying.
For example (admittedly my own) [2] ensures a wl_listener can never be lost
and leak memory. It is freed when the Handle is GC'd.
To prevent any use-after-free into this wl_listener, it removes the listener
from the list beforehand.
I would very much like to keep this code (since it is perfectly valid on the
current ABI) and is good design in managed languages.

Sure, that is annoying. In hindsight, it probably wasn't a good API
for particularly the new generation of managed languages. In the
meantime, probably the easiest way to do this, and come into line with
all the other users, would be to define a separate destroy-listener
type which intentionally leaks its wl_listener link after being
signaled, rather than removing it.

wayland-devel mailing list

wayland-devel mailing list

Reply via email to