Re: How should we deal with weak refs to finalizable objects? (was: Re: [PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events))

2023-07-27 Thread Peter Bex
On Thu, Jul 27, 2023 at 12:30:46PM +0200, Peter Bex wrote:
> 1) There's no (efficient) way to know if an object is a finalizable one.
>    We need this because we can't simply clear *all* the objects inside
>    a finalizable object that aren't referenced elsewhere, because we do
>    want to keep foreign pointers etc which are only referenced by the
>    finalized object itself.  So we'd need an efficient way to know if
>    an object pointed to by a finalizable object is itself finalizable.

Strike that - I think it can be done at the cost of an additional
pointer per finalizer to encode the boundaries of objects that belong to
that finalizer's "reachable set".  But still, that leaves us with #2.

> 2) We have no (canonical) way of breaking strong references.  For weak
>    references, it is clear that we have some special indicator, the
>    "broken weak pointer" placeholder that gets put there when an object
>    reference is cleared.

Cheers,
Peter




Re: How should we deal with weak refs to finalizable objects? (was: Re: [PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events))

2023-07-27 Thread Peter Bex
On Wed, Jul 26, 2023 at 04:45:53PM +0100, Andy Bennett wrote:
> So how can we finalise a circular list of objects all of which have
> finalisers and still maintain atomicity?
> 
> The docs say the order is "undefined".
> It seems that the best way to finalise this structure is to explicitly break
> all the strong references between components of the list (as we do for
> external weak references) before any of the finalisers are called.

I thought about this one too - it would be nice if all the finalized objects
that refer to other finalized objects had these links cleared.
However, there are two main obstacles to that:

1) There's no (efficient) way to know if an object is a finalizable one.
   We need this because we can't simply clear *all* the objects inside
   a finalizable object that aren't referenced elsewhere, because we do
   want to keep foreign pointers etc which are only referenced by the
   finalized object itself.  So we'd need an efficient way to know if
   an object pointed to by a finalizable object is itself finalizable.

2) We have no (canonical) way of breaking strong references.  For weak
   references, it is clear that we have some special indicator, the
   "broken weak pointer" placeholder that gets put there when an object
   reference is cleared.

> It may also improve the memory model if we define the object that the
> finaliser receives as a "copy" of the object that has ("already") been
> garbage collected.

I don't really think this will make much of a difference either way - as
it is currently, you couldn't distinguish the "original" object from a
copy.

Possibly a better way is what MIT Scheme (and, gasp, JavaScript) does:
register a finalizer on an object with an extraction procedure that returns
the value to finalize.  That way, the object getting deleted is not the
object that is getting finalized.  For instance, when a port would be
finalized, the finalization procedure receives the underlying file
descriptor and not the entire port object.
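
As a rough illustration of the "finalize the extracted value" idea, here is
a minimal sketch in Python (not CHICKEN code).  It relies on the standard
weakref.finalize, which calls a function with arguments chosen at
registration time rather than with the dying object.  Port, close_fd and
register_port_finalizer are hypothetical names made up for this example, and
unlike the MIT Scheme design described above, the extraction happens eagerly
when the finalizer is registered.

```python
import weakref

closed_fds = []  # records which descriptors the finalizer "closed"


class Port:
    """Hypothetical port wrapper; fd stands in for a real file descriptor."""

    def __init__(self, fd):
        self.fd = fd


def close_fd(fd):
    # Receives only the extracted value, never the Port object itself.
    closed_fds.append(fd)


def register_port_finalizer(port):
    # Extraction step: pull out port.fd now, so the callback holds no
    # reference to the port and cannot resurrect it.
    return weakref.finalize(port, close_fd, port.fd)


port = Port(7)
fin = register_port_finalizer(port)
del port  # drop the only strong reference
fin()     # run the callback now if the collector has not done so yet
```

The callback only ever sees the integer, so nothing it does can bring the
port itself back to life.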

Although after giving that some more thought, I'm not 100% sure this
would really solve the issue - you can still extract complex objects
from the to-be-finalized object (or have the "identity" procedure as
the "extractor", so it returns the object itself) and therefore have
an object with other things pointing into it, and that could still
be resurrected...

Cheers,
Peter




Re: How should we deal with weak refs to finalizable objects? (was: Re: [PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events))

2023-07-26 Thread Andy Bennett

Hi,

> I think this clears things sufficiently up.

Please excuse me if I am missing some details...

> Strong references are already ensured to be gone
> and weak references are cleared using the incantations that sjamaan
> (in his wisdom) proposed


What is the definition of a weak reference in this case?


One definition is that weak references only refer to the object if there 
are one or more strong references to the object.


In this case, all the strong references are gone before the finaliser runs 
and therefore all the weak references are semantically invalid already.


Whilst the finaliser is running there are no strong references to the 
object but it's possible (if the finaliser decides to resurrect the object) 
that there will once again be a strong reference to the object (e.g. if the 
finaliser set!s it).


But then, as you say, it's reincarnated: a new object that will never be 
more than equal? (and certainly not eqv? or eq?) to the old one.


Everything's consistent in this model because the references that the GC 
has to objects are neither "weak" nor "strong" but merely an implementation 
detail hidden from the outside.


This seems to be the definition that sjamaan is using.



Another definition is that weak references can refer to an object that is 
still "in memory".


This way lies disaster because then the semantics of weak references are 
dependent on specific implementation details.





I think here we're struggling with the atomicity of the garbage collector 
because the finaliser is special user code that executes inside the garbage 
collector's "transaction" and that code has all the power and capabilities 
of any other scheme code.




What happens if there is a circular list of objects all of which have 
finalisers?


The CHICKEN GC usually handles circular lists well, but when this one is 
collected (i.e. there are no external strong references to the list 
anymore) all its members come up for finalisation at the same time.


Which order are the finalisers called in?

...and which of the other objects can be seen from the object that is 
finalised first?



This is similar to the argument above about the correct definition of weak 
pairs.


To preserve atomicity there needs to be no "happens before" or "happens 
after" relationships between things that happen in a single GC cycle 
(transaction).


If weak pairs are not *already* invalid by the time the finaliser runs then 
it's possible to see into the GC transaction because the weak pair 
invalidation can be observed to "happen after" the GC cycle.


When the managed memory for the circular list is freed it does not matter 
precisely which order it is freed in because there are no side effects of 
freeing it and it all happens in a single GC cycle.


But when the finalisers are run the user's code will see these objects in 
various different states depending on the order.
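
To make the ordering problem concrete, here is a small toy model in Python
(not CHICKEN code, and not how the CHICKEN GC works).  Node, finalise and
run_finalisers are hypothetical names for this sketch.  Two objects form a
cycle, each finaliser releases its own pretend resource, and each one records
whether its neighbour had already been released when it ran.

```python
class Node:
    """Hypothetical finalisable object holding a strong link to a peer."""

    def __init__(self, name):
        self.name = name
        self.peer = None
        self.released = False


def finalise(node, log):
    # Record what this finaliser can observe about its neighbour,
    # then release the node's own (pretend) resource.
    log.append((node.name, node.peer.released))
    node.released = True


def run_finalisers(order):
    # Build a fresh two-element cycle: a -> b -> a.
    a, b = Node("a"), Node("b")
    a.peer, b.peer = b, a
    nodes = {"a": a, "b": b}
    log = []
    for name in order:
        finalise(nodes[name], log)
    return log


first = run_finalisers(["a", "b"])   # [("a", False), ("b", True)]
second = run_finalisers(["b", "a"])  # [("b", False), ("a", True)]
```

Whichever finaliser runs second sees an already-released neighbour, so the
"undefined" order is directly observable from user code.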



So how can we finalise a circular list of objects all of which have 
finalisers and still maintain atomicity?


The docs say the order is "undefined".
It seems that the best way to finalise this structure is to explicitly 
break all the strong references between components of the list (as we do 
for external weak references) before any of the finalisers are called.




It may also improve the memory model if we define the object that the 
finaliser receives as a "copy" of the object that has ("already") been 
garbage collected.






Best wishes,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF



Re: How should we deal with weak refs to finalizable objects? (was: Re: [PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events))

2023-07-11 Thread Peter Bex
On Mon, Jul 10, 2023 at 09:28:19PM +0200, felix.winkelm...@bevuta.com wrote:
> After thinking some more about this, I realize that your approach
> (clearing weak refs to finalized data) is the right thing, since
> any other behaviour in the presence of multithreading leads to
> disaster.

Thank you for seeing the light.

Cheers,
Peter




How should we deal with weak refs to finalizable objects? (was: Re: [PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events))

2023-07-10 Thread felix . winkelmann
After thinking some more about this, I realize that your approach
(clearing weak refs to finalized data) is the right thing, since
any other behaviour in the presence of multithreading leads to
disaster.

Let me elaborate.

Finalization is a time of reckoning, a purgatory where an object
undergoes a final cleansing of possibly sinful state, of references
to things foreign and alien, the dirty underbelly of an object's
existence that must be brought to order in ways only the user can
truly know about. Since the object is in this state, all its worldly
connections have already ceased to exist, its identity forgotten
(or it wouldn't be ready for reclamation).

But what if other threads access a weakly remembered object
while it is in purgatory? They would deal with an empty husk, a ghost,
likely to be devoid of the things (external pointers and other resources)
that define its true self, a mere shadow, with consequences that do not
have to be explicitly mentioned here and are better left unsaid.

Should the user (for reasons we can and must never know) decide that
the object is not ready yet to go to the other world and should stay
for another cycle of suffering in this earthly existence and store the
value in some external location, then the object will internally be
the same and have the same true identity, but external pointers will
have ceased to exist. Strong references are already ensured to be gone
and weak references are cleared using the incantations that sjamaan
(in his wisdom) proposed (certain enlightened objects that have a
sufficiently advanced self-consciousness may know their true identity,
i.e. keep circular references to themselves, but these must necessarily be
internal and are irrelevant when seen from outside).

So reincarnation means the object _is_ identical, but the nature
of its identity is invisible to the outside world. Enlightened objects
may know about their true identity but trying to communicate that
beyond their inner selves is meaningless when seen from the outside.

I think this clears things sufficiently up.


felix




How should we deal with weak refs to finalizable objects? (was: Re: [PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events))

2023-07-10 Thread felix . winkelmann
> However, there's one more concern:
>
> > The potential use-after-free scenario can still happen if the object is
> > kept alive, regardless of how we handle weak refs, this is unavoidable
> > if we allow finalizers and keep the possibility of resurrection.
>
> I have thought about this a bit more but I came to the conclusion that
> from an abstraction point of view it's better to clear weak refs to
> finalized data.  The reason is that when a module exposes an object, the
> *user* should not need to know or care exactly how that object is
> implemented and that it happens to use a finalizer.
>

I concur - it indeed breaks the abstraction. IMHO both behaviours (clearing
weak refs or not) have potential to confuse the user, but merely adding a
finalizer should not semantically change the behaviour of weak refs.
Let me think a bit more about this, please.


felix




How should we deal with weak refs to finalizable objects? (was: Re: [PATCH] Bugfix and drop weak references to finalizable objects (was: Re: [PATCH] thread-safe handling of asynchronous events))

2023-07-10 Thread Peter Bex
On Fri, Jul 07, 2023 at 11:23:17PM +0200, felix.winkelm...@bevuta.com wrote:
> I'm not very comfortable with this change. This feels like trading in
> one inconsistency (weak refs being cleared for a potentially non-dead
> object) for another (potentially inconsistent ties of GC-controlled
> memory to non-GC'd resources).

It depends on how you view finalizers.  Personally, I would think a
finalizer should get run on what's *essentially* "already GCed" data.
Therefore, it makes no sense to pass a finalizer data that still holds
onto other cleared data.

But like I said, I can get behind your POV - you could also argue that
if an object holds onto other things and it's "already GCed", all the
things it (and nobody else) holds onto (even strongly) should be
cleared, and that absolutely makes no sense whatsoever.

(come to think of it, the ideal solution would probably be to clear
"outside" weak references to finalizable data but keep the object
itself internally intact.  But that's extremely hard to track in the GC)

However, there's one more concern:

> The potential use-after-free scenario can still happen if the object is
> kept alive, regardless of how we handle weak refs, this is unavoidable
> if we allow finalizers and keep the possibility of resurrection.

I have thought about this a bit more but I came to the conclusion that
from an abstraction point of view it's better to clear weak refs to
finalized data.  The reason is that when a module exposes an object, the
*user* should not need to know or care exactly how that object is
implemented and that it happens to use a finalizer.

So let's say an egg exposes an object, and I'm using it, but I want to
reference it weakly.  Then, all things will work fine most of the time.
However, there's a nasty race condition lurking: if the finalizer
happens to run before my code extracts the object from the weak ref,
but I extract it before the next GC, my code may crash.  Or it may not;
use-after-free is tricky like that.

This should not be the user's concern, and it's a breach of abstraction.
It's also a global issue, as it can affect *any* code that holds onto
an object from a "3rd party" (other module).

Note that this would be problematic *even* if the 3rd party wrote the
code so carefully that it clears the pointer from the object such that
there's no use-after-free bug.  Because the code will still raise an
exception when passed this invalidated object.  And again, that will
be a race condition for the person who uses this module.  This means
that the problem is spread to every single user who decides to use a
weak reference to a 3rd party object that involves resources that must
be freed.
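
A toy Python model of that scenario may help (not CHICKEN code).  It
assumes a collector that runs the finalizer but leaves weak references
intact.  Handle, finaliser, user_code and the one-element list standing in
for a weak reference are all hypothetical names for this sketch.

```python
class Handle:
    """Hypothetical wrapper around a foreign resource."""

    def __init__(self):
        self.pointer = 0xBEEF  # pretend foreign pointer

    def use(self):
        if self.pointer is None:
            raise RuntimeError("handle already released")
        return self.pointer


def finaliser(handle):
    # A careful finaliser: frees the resource and clears the pointer,
    # so later use raises instead of touching freed memory.
    handle.pointer = None


def user_code(weak_slot):
    # The user extracts the object from a weak reference and uses it.
    obj = weak_slot[0]
    return None if obj is None else obj.use()


# Model a collector that runs the finaliser but leaves the weak slot intact.
handle = Handle()
weak_slot = [handle]  # one-element list simulating a weak reference
finaliser(handle)

try:
    user_code(weak_slot)
    outcome = "ok"
except RuntimeError as exc:
    outcome = str(exc)
```

Even with a well-behaved finaliser, the user of the weak reference ends up
with an exception that depends purely on when the finaliser happened to run.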

On the other hand, if I'm writing an egg that exposes a foreign object
that needs to be collected, it *is* of my concern (and not a breach of
abstraction) that the data gets freed properly.  Let's say I decide to
store weak refs inside the object (which is not even *that* likely),
and those weak refs point to other things inside the same object, which
I know to be GCed at the time the finalizer is called.  Then that's
something I know and must take care of when writing the finalizer.
Finalizers of objects not seeing weak refs into that same object is
very much a localized concern.

Also, perhaps more importantly, this will break *immediately* and
*consistently*.  The finalizer will simply see broken weak pointers, all
the time, every time it runs on the "collected" object.  Since this is
also documented in my patch, I think this is the least impactful way of
doing things.
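
The "clear first, then finalize" ordering can be sketched with a toy model
in Python (not CHICKEN code, and not the actual patch).  BROKEN, WeakBox,
Thing and collect are hypothetical names, with BROKEN standing in for the
"broken weak pointer" placeholder.

```python
BROKEN = object()  # stand-in for the "broken weak pointer" placeholder


class WeakBox:
    """Simulated weak reference; not a real weakref."""

    def __init__(self, target):
        self.target = target


def collect(dead, weak_boxes, finalizer):
    # Step 1: break every weak reference to the dead object.
    for box in weak_boxes:
        if box.target is dead:
            box.target = BROKEN
    # Step 2: only now run the finalizer on the object.
    finalizer(dead)


class Thing:
    pass


thing = Thing()
other = Thing()
boxes = [WeakBox(thing), WeakBox(other)]
seen = []

# The finalizer records what it finds in the first weak box.
collect(thing, boxes, lambda obj: seen.append(boxes[0].target is BROKEN))
```

In this model the finalizer finds the weak box already broken on every run,
while weak references to unrelated live objects are left alone.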

I'm not even all that concerned about finalizers "reviving" dead
objects, but in that case it's *definitely* the responsibility of the
finalizer's author to make sure it doesn't cause any trouble.

So overall, IMHO the problem is a lot more self-contained/localized,
and more deterministic if we clear the references before running the
finalizer.  That has to count for something, I think!

Cheers,
Peter

