Re: [PATH] core: implement a safe wl_signal_emit

Derek Foreman Thu, 22 Feb 2018 14:04:37 -0800

On 2018-02-22 01:36 PM, Markus Ongyerth wrote:

On 2018/Feb/22 12:31, Derek Foreman wrote:

On 2018-02-22 10:48 AM, Markus Ongyerth wrote:

On 2018/Feb/22 09:34, Derek Foreman wrote:

On 2018-02-22 08:58 AM, Daniel Stone wrote:

Hi,


On 22 February 2018 at 14:14, Markus Ongyerth <[email protected]> wrote:

It seems that this patch makes that assumption invalid, and we would
need patches to weston, enlightenment, and mutter to prevent a
use-after-free during the signal emit?  Now I'm seeing valgrind errors
on E and weston during buffer destroy.

Personally, I don't think we should change this assumption and declare
the existing code that's worked for years suddenly buggy. :/


The code was buggy the whole time. Just because it was never triggered does
not imply it's not a bug.
free()ing these struct wl_list without removing them from the signal list
leaves other struct wl_list that are outside the control of the current code
in an invalid state that is prone to use-after-free.
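For readers following along, here is a minimal sketch of why the neighbours are affected. It does not use libwayland: `sk_list` and the `sk_list_*` helpers are hypothetical stand-ins that only assume struct wl_list is a circular doubly linked list. The check function reports whether another link still points at an element, which is exactly the situation where releasing that element's memory leaves dangling pointers behind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct wl_list: a circular doubly linked list. */
struct sk_list {
	struct sk_list *prev;
	struct sk_list *next;
};

static void sk_list_init(struct sk_list *head)
{
	head->prev = head;
	head->next = head;
}

/* Link elm in right after pos. */
static void sk_list_insert(struct sk_list *pos, struct sk_list *elm)
{
	elm->prev = pos;
	elm->next = pos->next;
	pos->next = elm;
	elm->next->prev = elm;
}

/* Unlink elm. Note that this writes into BOTH neighbours. */
static void sk_list_remove(struct sk_list *elm)
{
	assert(elm->prev != NULL && elm->next != NULL);
	elm->prev->next = elm->next;
	elm->next->prev = elm->prev;
	elm->prev = NULL;
	elm->next = NULL;
}

/* Returns 1 if some other link reachable from head still points at elm.
 * Releasing elm's storage in that state would leave those pointers dangling. */
static int sk_list_still_referenced(const struct sk_list *head,
				    const struct sk_list *elm)
{
	const struct sk_list *pos = head;

	do {
		if (pos != elm && (pos->next == elm || pos->prev == elm))
			return 1;
		pos = pos->next;
	} while (pos != head);

	return 0;
}
```

In this sketch an element that is freed while `sk_list_still_referenced()` returns 1 is harmless only as long as nobody touches its neighbours' links afterwards. A later `sk_list_remove()` on a neighbour would write into the freed element.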


There's a difference between something being 'buggy' and a design with
non-obvious details you might not like. If destroy handlers not
removing their list elements were buggy, we would be seeing bugs from
that. But instead it's part of the API contract: when a destroy signal
is invoked, you are guaranteed that this will be the first and only
access to your list member. This implies that anyone trying to remove
their link from the list (accessing other listeners in the list) is
buggy.
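A rough sketch of the contract described here, for illustration only. The `sk_*` names are hypothetical stand-ins for wl_list, wl_listener and wl_signal, and the emit loop is an assumption about the general shape of such a walk (read the successor before calling the handler), not a copy of libwayland's code. The handler frees its own storage without unlinking, which is the pattern the existing compositors rely on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for wl_list, wl_listener and wl_signal. */
struct sk_list {
	struct sk_list *prev;
	struct sk_list *next;
};

struct sk_listener;
typedef void (*sk_notify_func)(struct sk_listener *listener, void *data);

struct sk_listener {
	struct sk_list link;
	sk_notify_func notify;
};

struct sk_signal {
	struct sk_list listener_list;
};

#define SK_CONTAINER_OF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void sk_signal_init(struct sk_signal *signal)
{
	signal->listener_list.prev = &signal->listener_list;
	signal->listener_list.next = &signal->listener_list;
}

/* Append listener at the tail of the signal's list. */
static void sk_signal_add(struct sk_signal *signal, struct sk_listener *listener)
{
	struct sk_list *tail = signal->listener_list.prev;

	listener->link.prev = tail;
	listener->link.next = &signal->listener_list;
	tail->next = &listener->link;
	signal->listener_list.prev = &listener->link;
}

/* Walk the list, reading the successor BEFORE the handler runs. */
static void sk_signal_emit(struct sk_signal *signal, void *data)
{
	struct sk_list *pos = signal->listener_list.next;

	while (pos != &signal->listener_list) {
		struct sk_list *next = pos->next;
		struct sk_listener *l =
			SK_CONTAINER_OF(pos, struct sk_listener, link);

		l->notify(l, data);
		/* l may be freed by now; it is not dereferenced again. */
		pos = next;
	}
}

/* Per-object state that owns its destroy listener. */
struct sk_state {
	struct sk_listener destroy;
	int *destroy_count;
};

/* Destroy handler that frees itself WITHOUT unlinking from the list. */
static void sk_handle_destroy(struct sk_listener *listener, void *data)
{
	struct sk_state *state =
		SK_CONTAINER_OF(listener, struct sk_state, destroy);

	(void)data;
	assert(state->destroy_count != NULL);
	(*state->destroy_count)++;
	free(state);
}
```

With this walk every handler is reached exactly once and no freed listener is read again. The list head is left pointing at freed memory, so the sketch only works if the signal itself is discarded right after the emit, and if no handler in the list tries to unlink itself next to an already freed neighbour.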

Suddenly allowing this is a breaking API change (*some* struct wl_list inside
a wl_listener can suddenly become invalid for reasons outside the user's
control).


I don't know if I've quite parsed this right, but as above, not
removing elements of a destroy listener list, when the listener is
invoked, is our current API.

Related to this entire thing:
In [1] you added tests for this and promote something that is, in essence, a
breaking change.


It's not a breaking change though: it's the API we've pushed on everyone so far.


Also, it doesn't prevent external libraries from doing whichever they want
if they have complete control of the destroy listener list contents.


So you suggest we break a now mandated api and expose ourselves to funny
implementation detail changes that are now justified, because *we break API*?


I'm sorry, I'm having a hard time parsing this.

The suggested mandate is that libwayland internals won't touch the listener
after you receive the notification.

Libwayland internals?
That would be fine. Then effectively nobody can rely on this either way, if
they ever want to have code that can be integrated with other consumers of
libwayland.

The problem is, that a single library that relies on this will force anyone
that uses it to adhere to it.
If we now have a listener that does things properly it will use-after-free if
it ever shares a signal list with that library.


I've just sent an RFC patch to the list that resolves that issue.


That on receipt of a destroy notification you can free your stuff without
removing your listener from the list.


Which is exactly what we don't want. Since that implies whenever we share a
destroy signal list with a listener from somewhere that's not inherently our
code, we can't rely on us being allowed to remove ourselves from the list
(that's everything from libwayland btw.).
And if we do, we'd have to take the blame for any integration that fails,
since *WE* break API now.


See above.

doing whichever they want

So if you suggest that we just break api here, you actually suggest we do
something that will break as soon as someone wants to integrate a library that
also works with another codebase that relies on this mandate.


See above.


What is prevented is libwayland's destroy notifier list walk accessing an
element again after it is potentially freed by external code.


Which could be fixed by said node removing itself from a list, instead of
leaving a list in invalid states for assumed behaviour.


And, of course, break the whole external world in the process.

No.

These users break interop with qt. If we ever expect a library that uses qt
(maybe in a plugin for weston?) to be used together with the codebase then
things break.


See above.


We're not going to break years of working code built on what seems to have
been a quite reasonable assumption.

And the assumption that memory that's pointed into (by pointers that are
*really* easy to fix) stays valid sounds pretty reasonable as well.


The assumption that this is really easy to fix is incorrect.

Yeah, it's a one liner in every destroy handler everywhere in code we
don't follow. That part is trivial.

Making sure every distro that packages both wayland and something that
calls wayland updates all those callers to new versions before they
update wayland to something that violates these assumptions is less trivial.

While I generally despise the word "ecosystem" when used to describe
anything that doesn't contain fish, we'd really be doing ours harm if we
broke a basic assumption that... right, you know where I'm going with
this as I've said it 37 times by now. ;)

I point out qt here (which afaik implies KDE) to also get a few years, though
I don't see a reason why 5 year old code that behaves badly would be worth
more than 3 month old code that (may or may not) behave badly, so I could also
talk about our implementations.

This is disingenuous though, what you're complaining about isn't
breaking Qt, it's breaking an interoperation in a single client between
Qt code and <other library> that makes the assumption some of us are
claiming should be considered API.

That is, it breaks no existent code - because all that code would
currently be crashing and someone would've complained to the list about
it with a concrete example.

While this mandate does not directly cause crashes in those (which admittedly
is a bit better), it does result in code that's no longer correct. So if
anyone cares about code correctness, this code is now effectively broken.


Disagree, it will continue to execute as it always has.  Also, see above ;)

You effectively can't choose to not break stuff. You have to pick which current
usecase you want to break.


See above.

I don't think there's a good measure, which consumers should get priority.
(Of course we should ;) )

Unfortunately, when you see my patch you'll notice how I've assigned
priorities. :(

What I don't see a way to do is allow a walk of the signal notifier list
during a destruction handler - (for the folks at home, that's something
that's never been possible, but this thread is a direct result of a
patch that tried to change that)


We can completely replace the internal data structures in libwayland with
whatever we want, but we must preserve that behaviour.


Why can we change one implementation detail, but have to keep another one?


One is API, the other is implementation detail, so the question is
irrelevant?


Could you kindly point me towards the point that made the implementation
detail of not touching a certain (currently rather unspecified) subset of
wl_list elements part of the API?

No such requirement exists and nobody is proposing making that a
requirement. You're claiming it follows directly from other actions
that it doesn't.

I can't find the point that guarantees it. Since that would also imply, that I
can't remove my own destruction listeners from the notify callback and I have
been doing that so far.

These are not mutually exclusive requirements. That is an
implementation detail. See above.


I understand what you're saying, I really do, but it's not pragmatic. Again,
we can't break all external users of our library for very little real
benefit.


Again, you break users either way. I do see the difference between
use-after-free and just making code incorrect, but both of those are breaking
changes.
And iirc further down in my previous mail, I pointed out a point in the
current api docs, that would have to get fixed, in a breaking way.


No, we don't.

BTW, if you think I've flippantly dismissed any concerns that weren't
isomorphic with "See above." do let me know, but I feel the entropy in
this discussion is dropping rapidly. I worried if I trimmed too
savagely at this point you'd think I dismissed something out of hand.

But I think at this point we're spiraling around the same points over
and over.


It does not mean we can never rework the destroy signal emit path in
libwayland to allow some items to be removed by the notification handlers
and others just freed, or to allow a destroy notifier to touch the list.


There is no destroy signal emit path.


Sure there is.  It's currently an implementation detail that it happens to
be the exact same path as other signal emitters.


:) I wouldn't mind to cooperate on this, since destroy signals are really the
thing we worry about (with this commit).
If we have a way to specify something as destroy listener, with the added
semantics, that every listener is expected to be removed from the list after
the emit, we can probably get something done that works for both usecases.

I think we can perhaps do that going forward - but it would have to be
in addition to something like the patch I've already proposed, which
would allow all currently functioning destroy signal users to continue
to function and interoperate. (and passes my proposed test case)

At that point the new api becomes more about solving the specific
problem of allowing list walks from a destroy notifier (for presumably a
subset of listeners that used the new api)

This turns into a class of problems that can be entirely solved in
different ways, so it may be hard to gain traction for a solution if
it's deemed invasive...

This is certainly the smaller breaking of listener semantics. It may require a
bit of a hacky thing to do, but it should be doable. I wouldn't mind having a
whack at that.
That should probably split what we currently have into:
`wl_signal_emit`, `wl_signal_emit_safe`, `wl_signal_emit_safe_destroy`

`wl_signal_emit` would be deprecated, but kept to not surprise library users.
`wl_signal_emit_safe` would be this patch's version of emit
`wl_signal_emit_safe_destroy` would make sure to not touch the memory of
already called listeners while keeping the semantics that allow us to remove
arbitrary listeners from the signal list.
I could probably hack compat between the two behaviours into that one as well,
thinking about what it could look like.
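To make the "safe" variant more concrete, here is a sketch of one known technique for tolerating arbitrary removals during a walk: a cursor and an end sentinel are spliced into the list, and the cursor is moved past each listener before its handler runs. This is an assumption about what such an emit could look like, not the code of the patch under review. All `sk_*` names are hypothetical stand-ins, and `sk_demo` exists only to exercise the walk.

```c
#include <assert.h>
#include <stddef.h>

struct sk_list {
	struct sk_list *prev;
	struct sk_list *next;
};

static void sk_list_init(struct sk_list *head)
{
	head->prev = head;
	head->next = head;
}

static void sk_list_insert(struct sk_list *pos, struct sk_list *elm)
{
	elm->prev = pos;
	elm->next = pos->next;
	pos->next = elm;
	elm->next->prev = elm;
}

static void sk_list_remove(struct sk_list *elm)
{
	assert(elm->prev != NULL && elm->next != NULL);
	elm->prev->next = elm->next;
	elm->next->prev = elm->prev;
	elm->prev = NULL;
	elm->next = NULL;
}

struct sk_listener;
typedef void (*sk_notify_func)(struct sk_listener *listener, void *data);

struct sk_listener {
	struct sk_list link;
	sk_notify_func notify;
};

struct sk_signal {
	struct sk_list listener_list;
};

#define SK_CONTAINER_OF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void sk_signal_add(struct sk_signal *signal, struct sk_listener *l)
{
	sk_list_insert(signal->listener_list.prev, &l->link);
}

static void sk_noop(struct sk_listener *listener, void *data)
{
	(void)listener;
	(void)data;
}

/* Emit that survives handlers unlinking themselves or any other listener. */
static void sk_signal_emit_safe(struct sk_signal *signal, void *data)
{
	struct sk_listener cursor = { .notify = sk_noop };
	struct sk_listener end = { .notify = sk_noop };

	sk_list_insert(&signal->listener_list, &cursor.link);  /* front */
	sk_list_insert(signal->listener_list.prev, &end.link); /* back */

	while (cursor.link.next != &end.link) {
		struct sk_list *pos = cursor.link.next;
		struct sk_listener *l =
			SK_CONTAINER_OF(pos, struct sk_listener, link);

		/* Step the cursor past l before l's handler can change the list. */
		sk_list_remove(&cursor.link);
		sk_list_insert(pos, &cursor.link);

		l->notify(l, data);
	}

	sk_list_remove(&cursor.link);
	sk_list_remove(&end.link);
}

/* Demo listener: unlinks itself and, optionally, one other listener. */
struct sk_demo {
	struct sk_listener listener;
	struct sk_demo *victim;
	int calls;
};

static void sk_demo_notify(struct sk_listener *listener, void *data)
{
	struct sk_demo *d = SK_CONTAINER_OF(listener, struct sk_demo, listener);

	(void)data;
	d->calls++;
	sk_list_remove(&d->listener.link);
	if (d->victim != NULL && d->victim->listener.link.next != NULL)
		sk_list_remove(&d->victim->listener.link);
}
```

Note the trade-off in this sketch: because the cursor is relinked on every step, the walk writes into the links of listeners that have already been called. A handler that frees itself without unlinking therefore turns the next cursor move into a use-after-free, which matches the kind of valgrind errors reported earlier in the thread.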

If you can, I'll happily review patches - as long as we don't require
any existing applications/toolkits to suddenly be broken.


I'd like to see this resolved too.

I have a hard time thinking about a solution to the arbitrary removal
case that doesn't break previous usage. A separate destroy listener
list that allows arbitrary walks (with an emit like emersion's code)
seems like it would resolve that, provided we still had a separate api
for old style users...


Whether that would land or not requires more opinions than my own.

There is a signal emit path, which may be called on destroy signals, and
suddenly has to follow different semantics because of what exactly?


Because a large amount of software will break if we change the currently
expected behavior.

I realize this leaves me open to all manner of ridiculous slippery slope
arguments, such as "if my software depended on a bug in an authentication
system that forgot to ask for a password...", so all snark aside can we back
away from that ledge now?

I didn't want to drive it that far. No worries, I think I can justify my
opinion with actual arguments.


Much appreciated. :D


As library authors we have to be pragmatic.  We have to avoid surprising our
callers.  While this API constraint is annoying, it is harmless, and


hehehe Thanks for that one.

demanding all old code suddenly conform to a new, different constraint that
was never enforced before is too onerous.

Then why is that the fix you propose?


See above. ;)

I think you have the same basic misunderstanding (from my point of view of
course) as Daniel.
That this change does not force anyone to change their current code.
I think I laid out my reasoning why I disagree in this (and the mail to
Daniel) at multiple points.

As I hope my patch illustrates, existing code can continue to run, and
even be made to interoperate, with no surprises.

And how exactly do we expose that to our listeners? By the name of the signal
in the struct?


Why do we need to?  A notification callback written to be used as a destroy
notifier will be written differently than one intended to be used for other
things.

We have to in the context of sane language bindings. It is exceedingly
annoying, if something silly like this prevents me from forcing proper
behaviour of the wl_list type on GC.
In the context of C alone it's a triviality, in the context of languages that
actually try to get some safety into types or the runtime, it's horrible.

In the context of a C library that's been in development for almost a
decade, there are decisions that have been made a long time ago without
the input of developers of different languages, and we're stuck with
some of them now. :(

Part one amend [1] (wl_listener) with:
"The wl_list inside a wl_listener can be invalid (pointer towards free'd
memory) at any time the listener notify is called. For further details see
wl_signal."


NAK.

Make a better one, that properly allows what you intend to allow :)


After we debate the patch I just sent out we can address that again.

I would never suggest to actually take this up, but that's the change you are
intending to make from my viewpoint.

I feel your viewpoint was based on assuming certain implementation
details were set in stone.

It is better constrained, but this is the change to the listeners, if we just
officially allow to free them inside a signal list.

Obviously we can't do that in a general sense, only for destruction
listeners.

Part two amend [2] (wl_signal) with:
Signals with the name `*_destroy` have special semantics.
If they are currently emitted, any wl_signal_add/wl_signal_get on the signal
or wl_list_remove on the link of any listener in it is invalid.
This is also the cause for invalid struct wl_list entries in wl_listener.


NAK.

HTH.

HTH?

Hope That Helps. bonus snark for your obviously inflammatory
suggestions. ;)


Seriously though, your #2 change is defining behaviour that currently
crashes to continue to crash.  That's an implementation detail, and nobody
can possibly be relying on it.  We can *fix* that instead of trying to
leverage it to start a fight on the internet. ;)

If it currently crashes (in the context of libwayland, not
weston/gnome/whatever) then it should probably be added to the documentation.
Since breaking wl_list_get was one of the points discussed about this patch on
irc before, so I would expect the wl_*_list_get functions to be valid at any
point.
If it is something that crashes in weston/gnome and should now be allowed due
to your mandate, I don't see how you can claim it's not a breaking change.

By this definition adding new API is a breaking change, as it enables
something that didn't previously work.

Except it's not a breaking change because nothing previously knew about
it or could depend on it and function.


I think the key issues we need to address right now are:

1) whether callers can continue to free listeners in destroy handlers
without removing them first.

2) whether callers can continue to remove their listeners from destroy
handlers.

3) how to make these interoperate.

I think with my proposed patch we're at yes, yes, and they do.

After that we can worry about list walks from destroy notifiers -
something that's never ever been possible...


Thanks,
Derek

[1] https://wayland.freedesktop.org/docs/html/apc.html#Server-structwl__listener
[2] https://wayland.freedesktop.org/docs/html/apc.html#Server-structwl__signal

Sure, the exact way they are specified here is a bit funny.
We could also add that to the various `wl_*_add_destroy_listener` functions.
Then we'd have the (from libwayland side breaking) change that e.g.
`wl_event_loop_get_destroy_listener` can't be called anymore under certain
circumstances.


I'll happily review a patch that mentions libwayland won't attempt to access
the listeners in the destroy list more than once though.  Should probably
write one myself.

Thanks,
Derek


Thanks,
Derek

It also makes good wrapper implementations into managed languages annoying.
For example (admittedly my own) [2] ensures a wl_listener can never be lost
and leak memory. It is freed when the Handle is GC'd.
To prevent any use-after-free into this wl_listener, it removes the listener
from the list beforehand.
I would very much like to keep this code (since it is perfectly valid on the
current ABI) and is good design in managed languages.


Sure, that is annoying. In hindsight, it probably wasn't a good API
for particularly the new generation of managed languages. In the
meantime, probably the easiest way to do this, and come into line with
all the other users, would be to define a separate destroy-listener
type which intentionally leaks its wl_listener link after being
signaled, rather than removing it.
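One way to read this suggestion, sketched under stated assumptions: a binding wraps its listener in a type that remembers whether the destroy signal has already fired, and its finalizer unlinks only if it has not. Everything named `sk_*` here is hypothetical, including the `signaled` flag and the release function; none of it exists in libwayland.

```c
#include <assert.h>
#include <stddef.h>

struct sk_list {
	struct sk_list *prev;
	struct sk_list *next;
};

static void sk_list_init(struct sk_list *head)
{
	head->prev = head;
	head->next = head;
}

static void sk_list_insert(struct sk_list *pos, struct sk_list *elm)
{
	elm->prev = pos;
	elm->next = pos->next;
	pos->next = elm;
	elm->next->prev = elm;
}

static void sk_list_remove(struct sk_list *elm)
{
	assert(elm->prev != NULL && elm->next != NULL);
	elm->prev->next = elm->next;
	elm->next->prev = elm->prev;
	elm->prev = NULL;
	elm->next = NULL;
}

struct sk_listener;
typedef void (*sk_notify_func)(struct sk_listener *listener, void *data);

struct sk_listener {
	struct sk_list link;
	sk_notify_func notify;
};

#define SK_CONTAINER_OF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical destroy-listener type for a language binding. */
struct sk_destroy_listener {
	struct sk_listener base;
	int signaled; /* nonzero once the destroy signal has reached us */
};

static void sk_destroy_listener_notify(struct sk_listener *listener, void *data)
{
	struct sk_destroy_listener *dl =
		SK_CONTAINER_OF(listener, struct sk_destroy_listener, base);

	(void)data;
	/* From here on the link is treated as dead and never touched again. */
	dl->signaled = 1;
}

static void sk_destroy_listener_attach(struct sk_list *listener_list,
				       struct sk_destroy_listener *dl)
{
	dl->signaled = 0;
	dl->base.notify = sk_destroy_listener_notify;
	sk_list_insert(listener_list->prev, &dl->base.link);
}

/* Meant to be called from the binding's finalizer before the storage goes
 * away. Returns 1 if the link was removed, 0 if it was deliberately left
 * alone because its neighbours may already have been freed. */
static int sk_destroy_listener_release(struct sk_destroy_listener *dl)
{
	if (dl->signaled)
		return 0;
	sk_list_remove(&dl->base.link);
	return 1;
}
```

The design choice is that the ordinary listener type can keep its "always unlink before free" rule, while only this destroy-specific type carries the exception. Whether that split is acceptable for a given binding is a judgment call this sketch does not settle.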

Cheers,
Daniel
_______________________________________________
wayland-devel mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/wayland-devel



