Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

2015-07-17 Thread Gordon Sim

On 07/17/2015 04:37 PM, Rafael Schloming wrote:

Are they really returned with a ref count of 0?


No, indeed they are not. I missed the fact that the finalizer will run 
before the object is returned to the application.


Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

2015-07-17 Thread Gordon Sim

On 07/17/2015 08:11 PM, Rafael Schloming wrote:

On Fri, Jul 17, 2015 at 10:45 AM, Gordon Sim  wrote:


Still digesting the explanation (thanks!) but one follow up question:

On 07/17/2015 04:37 PM, Rafael Schloming wrote:


it isn't actually possible to use the object when there refcount is 0.



What is the purpose of the incref/decref pattern then, e.g. as used in
pn_session_free()? That is designed to trigger the finalizer, right? If so
it would seem that can only happen if the reference count is 0 at the point
when pn_session_free() is called.

(And if it was valid to call pn_session_free for a given session, then
that session would have been valid for use before that, no?)



The refcount should never be zero going into pn_session_free. The situation
where it triggers the finalizer is when the refcount is 1 upon entering
pn_session_free, but the referenced boolean is false meaning that the one
remaining reference that exists is the parent -> child pointer (in this
case the connection -> session pointer). The incref triggers the
pn_session_incref hook which flips the reference around so that the session
-> connection pointer is now the reference and the connection -> session
pointer is the borrowed reference.


Ah, I see, I'd missed that hook. So the pn_incref(session) call actually 
increments the reference of the *connection*, not the session in this 
case. The pn_decref(session) then reduces the refcount of the session 
from 1 to 0 which triggers the finalizer.


Thanks! I'm slowly understanding how it works a little more.



Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

2015-07-17 Thread Rafael Schloming
On Fri, Jul 17, 2015 at 10:45 AM, Gordon Sim  wrote:

> Still digesting the explanation (thanks!) but one follow up question:
>
> On 07/17/2015 04:37 PM, Rafael Schloming wrote:
>
>> it isn't actually possible to use the object when there refcount is 0.
>>
>
> What is the purpose of the incref/decref pattern then, e.g. as used in
> pn_session_free()? That is designed to trigger the finalizer, right? If so
> it would seem that can only happen if the reference count is 0 at the point
> when pn_session_free() is called.
>
> (And if it was valid to call pn_session_free for a given session, then
> that session would have been valid for use before that, no?)
>

The refcount should never be zero going into pn_session_free. The situation
where it triggers the finalizer is when the refcount is 1 upon entering
pn_session_free, but the referenced boolean is false meaning that the one
remaining reference that exists is the parent -> child pointer (in this
case the connection -> session pointer). The incref triggers the
pn_session_incref hook which flips the reference around so that the session
-> connection pointer is now the reference and the connection -> session
pointer is the borrowed reference. Logically this momentarily increases the
refcount from 1 to 2 and then the hook decreases it back to 1 and the
decref triggers the finalizer.

Now if pn_session_free is called when the refcount is > 1, or if
pn_ep_decref ends up posting events (events create references to endpoint
objects and thereby increase the refcount), then the pn_incref/decref ends
up being a noop and the session is simply marked as free but finalized when
the last reference to it is decref'ed.

--Rafael


Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

2015-07-17 Thread Gordon Sim

Still digesting the explanation (thanks!) but one follow up question:

On 07/17/2015 04:37 PM, Rafael Schloming wrote:

it isn't actually possible to use the object when there refcount is 0.


What is the purpose of the incref/decref pattern then, e.g. as used in 
pn_session_free()? That is designed to trigger the finalizer, right? If 
so it would seem that can only happen if the reference count is 0 at the 
point when pn_session_free() is called.


(And if it was valid to call pn_session_free for a given session, then 
that session would have been valid for use before that, no?)





Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

2015-07-17 Thread Rafael Schloming
On Thu, Jul 16, 2015 at 7:46 AM, Gordon Sim  wrote:

> On 07/16/2015 02:40 PM, aconway wrote:
>
>> Can someone who understand the proton use of refcounts please add some
>> doc comments to explain the semantics? Apologies if this is already
>> there and I missed it, tell me to RTFM.
>>
>
> I'm not entirely sure I understand it. However having spent a couple of
> days reading and puzzling over the code, I'll try and offer some answers
> where I think I can, and add some questions of my own.
>
>  For comparison here is what "refcount" traditionally means ("tradition"
>> includes CORBA, boost::intrusive_ptr, std::shared_ptr etc.) I'm not
>> saying we have to make proton conform, but we should document how it
>> differs to save confusion.
>>
>> 1. A refcount is a count of the number of references (owning pointers)
>> to an object.
>>
>
> Yes, that broadly speaking is the purpose in proton. However this was a
> recent addition and it is not used uniformly inside the library itself. Not
> only that but the old api, where the application (generally) owned the
> pointers it created until it called pn_xxx_free, is still supported
> alongside the newer mode of use, as e.g. employed in the python binding,
> where the application uses only incref and decref.
>
>  2. Objects are created with refcuont=1, the creator owns the first
>> reference.
>>
>
> This is not always the case and was one of the points I found surprising.
> Though a newly created 'object' does indeed have its ref count set to 1,
> for pn_session() and pn_link(), which are the functions actually used by
> the application to created new sessions or links, the implementation of
> those functions includes an explicit decref, meaning the objects are
> returned with a ref count of 0.
>

Are they really returned with a ref count of 0? I don't think proton
objects can actually exist with a refcount of 0 outside a finalizer. What
should actually happen is that the finalizer for the newly created session
will run and cause the parent of the session or link (the connection or
session) to inspect the child's state. Based on that state it may create a
new reference to the child, e.g. if there is outstanding work on the wire
to be done for that session or link. In the case of a new session or link
this is always the case, so it will always end up creating a new reference
to it, but this should never result in a child with a non zero refcount
being returned or visible in any way to user code.

I suspect this was done to simplify things for the newer mode of use, where
> a wrapper object can always be instantiated and it simply increments the
> ref count. Would be good to have that confirmed though, as to me this is
> the source of confusion and complexity.
>

Yes, this is certainly true, but it is also to accommodate the memory model
the C interface exposes. In the C interface sessions and links are owned by
their parent objects, e.g. freeing the connection frees all the contained
sessions. The way this is accommodated is that the parent object is what
owns the reference to the child by default. What is returned from the
pn_session()/pn_link() calls is a borrowed reference. (This changes when
you do an incref, see below for more details.)


> However this means that in the old mode of use, e.g.
> pn_session()/pn_session_free(), the object may have a refcount of 0 even
> though it has not been freed by the application.
>

As mentioned above, this should never be able to happen outside of a
finalizer. Perhaps what is confusing here is that the finalizer can create
a new reference to the object that is being finalized thereby causing the
pn_decref() to *appear* to be a noop, however what is really happening is
an important state change, the last reference is being deleted, and the
finalizer has decided to create a new reference for other purposes (e.g.
because there is outstanding transport work to be done with that object or
because it is going into a pool) and the state of the object (or related
objects/state) is being changed in key ways to reflect this.

Note that this pattern is not at all unique to proton's refcount system.
All GC systems that have finalizers accommodate these sorts of semantics,
i.e. finalizers causing objects to become live again. This is fundamental
to finalizers since they are just user code and can do whatever they want,
including create new references.

What is confusing about it here is more to do with how this capability is
being used to interact with all the engine data structures that predate
both the refcount system and the collection objects that make use of the
refcount system. I believe ultimately the engine data structures should be
reworked to use the various collection objects now available, at which
point a lot of this will become much more centralized, self contained, and
understandable and likely won't require so much magic (or at least the
magic will be in one place rather than spread around like it is now).


>
>  3. If anot

Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

2015-07-16 Thread aconway
On Thu, 2015-07-16 at 15:11 +0100, Gordon Sim wrote:
> On 07/16/2015 02:40 PM, aconway wrote:
> > The fix mentioned above has this, which make no sense under 
> > traditional
> > refcounting:
> > 
> >  pn_incref(endpoint);
> >  pn_decref(endpoint);
> 
> Note that this is not added as part of my fix, it is already there. 
> [snip]
> 
> The reference counting logic may not match the ideal, but we can't 
> postpone a fix for the current issue pending some nicer overall 
> solution. We can avoid it, by backing out the problematic previous 
> commit, or we can adjust that commit.
> 

+1, I'm just proposing that we get a clear statement of the intended
semantics of the *existing* proton refcounting scheme into the code as
it has tripped up a couple of people so far. I for one am still not
very clear on it. This is not an issue for 0.10 but for longer term
code health.

Cheers,
Alan.


Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

2015-07-16 Thread Gordon Sim

On 07/16/2015 02:40 PM, aconway wrote:

Can someone who understand the proton use of refcounts please add some
doc comments to explain the semantics? Apologies if this is already
there and I missed it, tell me to RTFM.


I'm not entirely sure I understand it. However having spent a couple of 
days reading and puzzling over the code, I'll try and offer some answers 
where I think I can, and add some questions of my own.



For comparison here is what "refcount" traditionally means ("tradition"
includes CORBA, boost::intrusive_ptr, std::shared_ptr etc.) I'm not
saying we have to make proton conform, but we should document how it
differs to save confusion.

1. A refcount is a count of the number of references (owning pointers)
to an object.


Yes, that broadly speaking is the purpose in proton. However this was a 
recent addition and it is not used uniformly inside the library itself. 
Not only that but the old api, where the application (generally) owned 
the pointers it created until it called pn_xxx_free, is still supported 
alongside the newer mode of use, as e.g. employed in the python binding, 
where the application uses only incref and decref.



2. Objects are created with refcuont=1, the creator owns the first
reference.


This is not always the case and was one of the points I found 
surprising. Though a newly created 'object' does indeed have its ref 
count set to 1, for pn_session() and pn_link(), which are the functions 
actually used by the application to created new sessions or links, the 
implementation of those functions includes an explicit decref, meaning 
the objects are returned with a ref count of 0.


I suspect this was done to simplify things for the newer mode of use, 
where a wrapper object can always be instantiated and it simply 
increments the ref count. Would be good to have that confirmed though, 
as to me this is the source of confusion and complexity.


However this means that in the old mode of use, e.g. 
pn_session()/pn_session_free(), the object may have a refcount of 0 even 
though it has not been freed by the application.



3. If another owning reference is created, the refcount MUST BE
incremented.


This is not currently the case for all internal use which necessitates 
some extra logic and checks.



4. The owner of a reference MUST decrease the refcount on dropping the
reference and MUST NOT use the pointer after refcount is decremented.
It no longer "owns" a reference.

5. The object MAY be deleted when refcount goes to 0 or MAY be pooled
for re-use but it MUST NOT be used by any code (other than the pooling
system if there is one)


As above, that is currently not completely true at least for sessions 
and links, where they may be in use while still having a 0 ref count.



6. You never examine the refcount (except for debugging) Either you own
a reference, which means refcount > 0, or you don't, which means you
MUST NOT touch the object, even to examine its refcount.


My questions:

* what are the respective roles of the ref count in pn_endpoint_t, and 
in the pni_head_t record associated with pn_session_t and pn_link_t? 
i.e. why are there two separate ref counts for the same object?


* pn_link_new() and pn_session() both decrement the count (in the 
pni_head_t record) for the newly created objects. Why?


* what exactly does the 'referenced' flag indicate?

* when a finalize function is exited early due to the return value of 
pni_preserve_child, the reference count is incremented to prevent that 
object being freed by the code in pn_class_decref. Where/how is that 
then decremented again to retrigger the finalizer?


* when should pn_class_free be used v pn_class_decref? (The former does 
not check whether the finalizer increases the reference, the latter does)


* pn_class_free() has the precondition that the object passed in should 
have a refcount of 1 (which I understand) or -1. When and why would the 
refcount be -1?


Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

2015-07-16 Thread Gordon Sim

On 07/16/2015 02:40 PM, aconway wrote:

The fix mentioned above has this, which make no sense under traditional
refcounting:

 pn_incref(endpoint);
 pn_decref(endpoint);


Note that this is not added as part of my fix, it is already there. The 
simple explanation is that it is a trick to get a finalize function for 
the object in question to be re-run (when it was previously cut short). 
The broader context is that the reference counting is not used 
everywhere internally (it was a more recent addition), causing some 
additional twists and turns in the logic.




Traditionally, this sequence is a no-op if the caller owns the
reference and is illegal if not. We need a clear statement of the
semantics of such sequences: when it it legal and what does it mean?
 From the fix it seems to be related to the "freed" and "referenced"
members, what exactly is the relationship?


The 'freed' flag indicates that the pn_xxx_free() call was made. This is 
not always the case. There is a mode of use, e.g. in the python binding, 
that use inc-/dec- ref instead. I'm not as clear on the design behind 
the 'referenced' flag, but it is set to false when the finalizer is cut 
short.


The original fix for PROTON-905 triggered the re-run of the finalizer 
whenever the reference count was 0 and a session or link was moved out 
of the set of modified items. From the description it was only intended 
to do this where the object in question had been freed and had had its 
finalizer cut short. Hence the addition in my proposed fix of that extra 
qualification.


As it is, the code currently on master, pending the 0.10 release is 
*incompatible* with qpid-cpp. So either the original fix needs to be 
reverted or it needs to be corrected in some way (e.g. along the lines I 
have suggested).


The reference counting logic may not match the ideal, but we can't 
postpone a fix for the current issue pending some nicer overall 
solution. We can avoid it, by backing out the problematic previous 
commit, or we can adjust that commit.