Memory management practices

2011-09-12 Thread Sebastian Spaeth
On Fri, 9 Sep 2011 13:53:28 -0400, Austin Clements  wrote:
> Ah, the *Python* objects don't care, but the underlying C objects do.
[...]

Thanks for the elaboration. I understand now and agree with the analysis.

> Hence my suggestion that, rather than trying to emulate C-style memory
> management in bindings, bindings should create an additional talloc
> reference to the underlying objects and rather than calling
> notmuch_*_destroy during finalization, they should simply unlink this
> additional reference.

Agreed, that sounds like a much better option, although it would keep the
underlying C objects for the Query and all derived Messages around, even
when I explicitly "del query" in Python, as long as the Python GC keeps
any of those Message() objects alive, wouldn't it? (That would probably
be OK behavior.)

But the talloc ref/unref is not exposed through the lib currently, of course.

> Then there's also no need to replicate the library's reference
> structure in the bindings (though there is a danger of needlessly
> delaying free's when the library creates convenience references like
> the one from notmuch_query_t to notmuch_messages_t; for these I'd
> recommend that the bindings undo such references, which requires a
> little knowledge of the library's reference structure, but nothing
> beyond what should be documented).

Right, that would of course solve the above 'problem'.

Sebastian



Memory management practices

2011-09-11 Thread Austin Clements
Quoth Ben Gamari on Sep 11 at  5:47 pm:
> Sorry I've been so quiet on this recently. I've been a little under the
> weather.

No worries.

> On Fri, 9 Sep 2011 13:53:28 -0400, Austin Clements wrote:
> > Hence my suggestion that, rather than trying to emulate C-style memory
> > management in bindings, bindings should create an additional talloc
> > reference to the underlying objects and rather than calling
> > notmuch_*_destroy during finalization, they should simply unlink this
> > additional reference.
> 
> Currently talloc's reference counting interface is hidden behind
> _destroy. While this might be a fairly intrusive change, perhaps notmuch
> wants to just expose a pair of reference counting *_ref/unref functions
> instead of the *_destroy. Most users would simply need to change
> existing *_destroy()s to _unref()s. Furthermore, this would allow
> bindings authors to easily ensure non-broken GC behavior.

I think the _destroy functions are silly.  They all just call
talloc_free and, indeed, it would arguably be incorrect for them to do
anything more (any additional cleanup should be in a talloc
destructor).  talloc is never explicitly mentioned in lib/notmuch.h
(intentionally, I would assume) but talloc-style notions of
"ownership" pervade the library documentation.  IMO, the library
should just admit to using talloc, rather than try to wrap all of the
not-insubstantial talloc functionality a caller may need.

In the language of talloc, it's very natural to express the needs of
bindings in terms of talloc_reference and talloc_unlink.  The bindings
could maintain a per-Database context and track their own ownership by
adding a talloc reference from this context to each object returned
from the bindings; the finalizer would simply unlink the finalized
object from this context.  Bindings could also use a global context
(though that would obviously be awkward in Haskell without biting the
unsafePerformIO bullet).  Alternatively, bindings could use the NULL
context, which has the advantage of not actually tracking ownership in
talloc, but the disadvantage of making it harder to track down bugs
(since any code can reference or unlink from NULL).
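The per-Database context idea above can be sketched with a toy model. This is not talloc and not the notmuch API: ToyTalloc, its methods, and the "bindings" context name are hypothetical stand-ins, and the model assumes only that an object is freed once its last holder lets go. It also assumes the binding context is the query's only holder.

```python
class ToyTalloc:
    """Toy model (not talloc): an object lives while any context holds it."""

    def __init__(self):
        self.holders = {}  # object name -> set of contexts keeping it alive

    def alloc(self, ctx, obj):
        self.holders[obj] = {ctx}

    def reference(self, ctx, obj):
        self.holders[obj].add(ctx)

    def unlink(self, ctx, obj):
        self.holders[obj].discard(ctx)
        if not self.holders[obj]:
            self._free(obj)

    def _free(self, obj):
        del self.holders[obj]
        # Whatever the freed object was holding loses that holder too.
        held = [o for o, ctxs in self.holders.items() if obj in ctxs]
        for other in held:
            if other in self.holders:
                self.unlink(obj, other)

    def is_live(self, obj):
        return obj in self.holders


def run(order):
    t = ToyTalloc()
    t.alloc("bindings", "query")         # assumption: binding context holds the query
    t.alloc("query", "messages")         # library's convenience reference
    t.reference("bindings", "messages")  # binding's additional reference
    trace = []
    for obj in order:
        t.unlink("bindings", obj)        # all a finalizer would have to do
        trace.append((t.is_live("query"), t.is_live("messages")))
    return trace
```

In this model either finalization order ends with both objects freed and no operation on a dead object. When the messages object is finalized first, it stays alive until the query goes away, which is the delayed free mentioned in the thread.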


Memory management practices

2011-09-11 Thread Ben Gamari
Sorry I've been so quiet on this recently. I've been a little under the
weather.

On Fri, 9 Sep 2011 13:53:28 -0400, Austin Clements  wrote:
> Ah, the *Python* objects don't care, but the underlying C objects do.
> Suppose the Query were finalized first.  Python calls Query.__del__,
> which calls notmuch_query_destroy, which releases the underlying
> talloc references to the C notmuch_messages_t objects, causing talloc
> to free the notmuch_messages_t.  Messages._msgs now points to freed
> memory, so when Python then finalizes the Messages object,
> Messages.__del__ will pass this dangling pointer to
> notmuch_messages_destroy, which will crash.

Exactly. This is exactly what I suspect is happening in my case.

> 
> Hence my suggestion that, rather than trying to emulate C-style memory
> management in bindings, bindings should create an additional talloc
> reference to the underlying objects and rather than calling
> notmuch_*_destroy during finalization, they should simply unlink this
> additional reference.

Currently talloc's reference counting interface is hidden behind
_destroy. While this might be a fairly intrusive change, perhaps notmuch
wants to just expose a pair of reference counting *_ref/unref functions
instead of the *_destroy. Most users would simply need to change
existing *_destroy()s to _unref()s. Furthermore, this would allow
bindings authors to easily ensure non-broken GC behavior.

Does this sound completely insane, somewhat insane, or reasonable?

Cheers,

- Ben
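A minimal sketch of what the proposed *_ref/*_unref pair could look like, combined with the earlier idea that a child takes a reference on its parent. libnotmuch exposes no such functions at the time of this thread; Handle, ref and unref are hypothetical names used only for illustration.

```python
class Handle:
    """Hypothetical reference-counted handle; not a notmuch type."""

    def __init__(self, name, freed, parent=None):
        self.name = name
        self.freed = freed  # shared list recording the order of frees
        # Creating a child takes a reference on its parent.
        self.parent = parent.ref() if parent is not None else None
        self.count = 1      # the creator holds the first reference

    def ref(self):
        self.count += 1
        return self

    def unref(self):
        self.count -= 1
        if self.count == 0:
            self.freed.append(self.name)
            if self.parent is not None:
                self.parent.unref()  # release the child's hold on its parent


def demo(order):
    freed = []
    query = Handle("query", freed)
    messages = Handle("messages", freed, parent=query)
    handles = {"query": query, "messages": messages}
    for name in order:
        handles[name].unref()  # where a caller would have used *_destroy
    return freed
```

Under these assumptions the caller may drop its references in either order: the query is only freed after the messages object that depends on it.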


Memory management practices

2011-09-09 Thread Austin Clements
Quoth Sebastian Spaeth on Sep 09 at 11:27 am:
> On Thu, 8 Sep 2011 11:15:57 -0400, Austin Clements wrote:
> > In general, a garbage collector can't make any guarantees about
> > finalization order.  When a collection of objects all become
> > unreachable simultaneously (for example, the last reference to any
> > Messages object is dropped, causing the Query object and the Message
> > object to both become unreachable), the garbage collector *could*
> > finalize the Query first (causing talloc to free the
> > notmuch_messages_t) and then the Messages object (causing it to
> > crash).  There's no guarantee in general because, in the presence of
> > cycles, there is no meaningful finalization order.
> 
> Right, but that should not pose a problem for python. If e.g. both a
> Query and derived Message objects become unreachable, the python objects
> would not care which object is ditched and deleted first. Currently, it
> seems that we finalize the Messages first, and the Query second. But we
> would not fail if the Query were finalized first. Granted, the
> underlying libnotmuch Message objects were torn away while the python
> Message objects were still around. But they would ultimately also be
> swept away, and that would not cause any problems.
> 
> But I am sure that I am missing something. I'll leave this
> discussion to the pros :-).

Ah, the *Python* objects don't care, but the underlying C objects do.
Suppose the Query were finalized first.  Python calls Query.__del__,
which calls notmuch_query_destroy, which releases the underlying
talloc references to the C notmuch_messages_t objects, causing talloc
to free the notmuch_messages_t.  Messages._msgs now points to freed
memory, so when Python then finalizes the Messages object,
Messages.__del__ will pass this dangling pointer to
notmuch_messages_destroy, which will crash.

Hence my suggestion that, rather than trying to emulate C-style memory
management in bindings, bindings should create an additional talloc
reference to the underlying objects and rather than calling
notmuch_*_destroy during finalization, they should simply unlink this
additional reference.  Any remaining library-created references will
keep the object alive as long as it's still needed by the library.
Then there's also no need to replicate the library's reference
structure in the bindings (though there is a danger of needlessly
delaying free's when the library creates convenience references like
the one from notmuch_query_t to notmuch_messages_t; for these I'd
recommend that the bindings undo such references, which requires a
little knowledge of the library's reference structure, but nothing
beyond what should be documented).
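The crash scenario described above can be imitated with a toy heap. This is only a model: ToyHeap and finalize are hypothetical names, handles are plain integers, and a RuntimeError stands in for what would be a use of freed memory in C.

```python
class ToyHeap:
    """Stand-in for the C heap: tracks which handles are still allocated."""

    def __init__(self):
        self.live = set()
        self.children = {}  # parent handle -> handles freed along with it
        self._next = 0

    def alloc(self, parent=None):
        self._next += 1
        handle = self._next
        self.live.add(handle)
        self.children[handle] = []
        if parent is not None:
            self.children[parent].append(handle)
        return handle

    def free(self, handle):
        if handle not in self.live:
            # In C this would be a dangling pointer, not a clean error.
            raise RuntimeError(f"free of dead handle {handle}")
        for child in self.children.pop(handle):
            if child in self.live:
                self.free(child)
        self.live.discard(handle)


def finalize(order):
    heap = ToyHeap()
    query = heap.alloc()
    messages = heap.alloc(parent=query)  # freed together with the query
    handles = {"query": query, "messages": messages}
    try:
        for name in order:
            heap.free(handles[name])     # what each __del__ would call
    except RuntimeError as exc:
        return f"crash: {exc}"
    return "ok"
```

Finalizing the messages handle first is harmless in this model, while finalizing the query first leaves the binding holding a dead messages handle that it then tries to free.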


Memory management practices

2011-09-09 Thread Sebastian Spaeth
On Thu, 8 Sep 2011 11:15:57 -0400, Austin Clements  wrote:
> In general, a garbage collector can't make any guarantees about
> finalization order.  When a collection of objects all become
> unreachable simultaneously (for example, the last reference to any
> Messages object is dropped, causing the Query object and the Message
> object to both become unreachable), the garbage collector *could*
> finalize the Query first (causing talloc to free the
> notmuch_messages_t) and then the Messages object (causing it to
> crash).  There's no guarantee in general because, in the presence of
> cycles, there is no meaningful finalization order.

Right, but that should not pose a problem for python. If e.g. both a
Query and derived Message objects become unreachable, the python objects
would not care which object is ditched and deleted first. Currently, it
seems that we finalize the Messages first, and the Query second. But we
would not fail if the Query were finalized first. Granted, the
underlying libnotmuch Message objects were torn away while the python
Message objects were still around. But they would ultimately also be
swept away, and that would not cause any problems.

But I am sure that I am missing something. I'll leave this
discussion to the pros :-).

Sebastian



Memory management practices

2011-09-08 Thread Sebastian Spaeth
On Wed, 7 Sep 2011 23:05:19 -0400, Austin Clements  wrote:
> Sorry, I went back and re-read your earlier messages and now I see why
> your references were the way they were.  I stand by the rest of my
> previous message though.  I think the technique used in the Python
> bindings only works because Python's GC happens to finalize in a
> particular order (though I doubt that's guaranteed, and could easily
> not be the case if you stray into the realm of its cycle collector).
> In general, it seems like this approach is trying to recreate C-like memory
> management and is fragile as a result, whereas talloc should, I think,
> allow bindings to express their runtime's memory management rather
> naturally.

Mmmh? Why would the method in python be fragile? Each message object
holds a reference to its parent query object to keep it alive. Are you
saying cycle collectors could kill off the query object nonetheless?
(Assume that I know nothing of GCs, which comes close to reality.)

Sebastian



Memory management practices

2011-09-08 Thread Austin Clements
Quoth Sebastian Spaeth on Sep 08 at  3:50 pm:
> On Wed, 7 Sep 2011 23:05:19 -0400, Austin Clements wrote:
> > Sorry, I went back and re-read your earlier messages and now I see why
> > your references were the way they were.  I stand by the rest of my
> > previous message though.  I think the technique used in the Python
> > bindings only works because Python's GC happens to finalize in a
> > particular order (though I doubt that's guaranteed, and could easily
> > not be the case if you stray into the realm of its cycle collector).
> > In general, it seems like this approach is trying to recreate C-like memory
> > management and is fragile as a result, whereas talloc should, I think,
> > allow bindings to express their runtime's memory management rather
> > naturally.
> 
> Mmmh? Why would the method in python be fragile? Each message object
> holds a reference to its parent query object to keep it alive. Are you
> saying cycle collectors could kill off the query object nonetheless?
> (Assume that I know nothing of GCs, which comes close to reality.)

In general, a garbage collector can't make any guarantees about
finalization order.  When a collection of objects all become
unreachable simultaneously (for example, the last reference to any
Messages object is dropped, causing the Query object and the Message
object to both become unreachable), the garbage collector *could*
finalize the Query first (causing talloc to free the
notmuch_messages_t) and then the Messages object (causing it to
crash).  There's no guarantee in general because, in the presence of
cycles, there is no meaningful finalization order.

That being said, this approach might be (probably is) fine in Python
because Python has an unusual hybrid garbage collector.  Long ago,
Python had only a reference-count based garbage collector.  It now has
a cycle detector layered on top of that [1], but that only kicks in if
there are reference cycles.  Assuming there aren't cycles in the
objects created by the Python bindings, you should get the
deterministic behavior of the reference counted collector.  This isn't
the case in Haskell, which has a generational collector that makes no
guarantees about finalization order (guarantees it couldn't always
keep).


[1] One day a student came to Moon and said: "I understand how to make
a better garbage collector. We must keep a reference count of the
pointers to each cons."

Moon patiently told the student the following story:

"One day a student came to Moon and said: 'I understand how to
make a better garbage collector...
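The point about reference counting above can be observed directly in CPython. The Query and Messages classes below are stand-ins, not the real notmuch binding classes, and the ordering shown is CPython behavior, not something the Python language promises; other implementations may differ.

```python
import gc

order = []  # records the order in which finalizers run


class Query:
    def __del__(self):
        order.append("Query")


class Messages:
    def __init__(self, query):
        self._query = query  # child holds its parent, keeping it alive

    def __del__(self):
        order.append("Messages")


def drop_without_cycle():
    order.clear()
    q = Query()
    m = Messages(q)
    del q  # Query survives: m still references it
    del m  # count reaches zero: Messages is finalized, then Query
    return list(order)


def drop_with_cycle():
    order.clear()
    q = Query()
    m = Messages(q)
    q._msgs = m   # back-reference creates a cycle
    del q, m      # reference counting alone cannot free either object
    gc.collect()  # the cycle collector finalizes both, in no promised order
    return sorted(order)
```

Without a cycle the child is finalized before the parent it references. With a cycle both finalizers still run, but the result is sorted here because their relative order should not be relied on.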


Memory management practices

2011-09-07 Thread Austin Clements
On Wed, Sep 7, 2011 at 10:48 PM, Austin Clements  wrote:
> *snip*
>
> I'm a bit confused by the reference tree you drew.  The references in
> the underlying libnotmuch objects are the other way around.
> notmuch_query_t holds a talloc reference to every notmuch_messages_t
> it produces, not the other way around.

Sorry, I went back and re-read your earlier messages and now I see why
your references were the way they were.  I stand by the rest of my
previous message though.  I think the technique used in the Python
bindings only works because Python's GC happens to finalize in a
particular order (though I doubt that's guaranteed, and could easily
not be the case if you stray into the realm of its cycle collector).
In general, it seems like this approach is trying to recreate C-like memory
management and is fragile as a result, whereas talloc should, I think,
allow bindings to express their runtime's memory management rather
naturally.


Memory management practices

2011-09-07 Thread Austin Clements
On Wed, Sep 7, 2011 at 4:36 PM, Ben Gamari  wrote:
> On Mon, 29 Aug 2011 16:30:57 -0400, Ben Gamari wrote:
>> [SNIP]
>>
>> In general, it seems to me that memory management in notmuch bindings is
>> a little bit harder than it needs to be due to the decision not to
>> talloc_ref parent objects when a new child object is created. This means
>> that a bindings author needs to recreate the ownership tree in their
>> binding, a task which is fairly easily done (except in the case of
>> Haskell due to the weak GC finalization guarantees) but seems quite
>> unnecessary. Is there a reason this decision was made? Would a patch be
>> accepted adding talloc_ref'ing parents in those functions creating
>> children and talloc_frees in *_destroys?
>>
> Any opinions concerning whether this is an acceptable idea? I wouldn't
> mind putting together a patch-set, but I'd rather not waste my time if
> the set would ultimately be rejected due to some technical objection I
> have yet to think of.
>
> Cheers,

I've been meaning to look into this in depth.  (I still haven't, but
wanted to give you some reply.)

In general (though perhaps not always?), libnotmuch uses talloc() to
allocate child objects, which already implicitly creates a talloc
reference from the parent object to the child object.  You've
certainly thought about this harder than I have, but it seems like the
bindings should simply create an additional talloc reference and
unlink that reference in the GC finalizer, so that the library-created
references would maintain the integrity of the data structures, while
the binding-created references would maintain their extent.  Hence, I
don't see why simultaneous GC would cause problems with talloc, or why
the bindings would have to recreate the reference tree.

I'm a bit confused by the reference tree you drew.  The references in
the underlying libnotmuch objects are the other way around.
notmuch_query_t holds a talloc reference to every notmuch_messages_t
it produces, not the other way around.  (Though, in reality, these
objects are completely independent of each other.  This reference
exists purely as a convenience for C programmers to make it easy to
clean up all notmuch_messages_t objects when you destroy the
notmuch_query_t.  This is probably a poor interface; it may be better
to take an explicit talloc context, which could be the query object,
or could be something else.  In fact, I would expect this to cause
memory *leaks* in bindings if it were not handled carefully, rather
than premature GC.)
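The leak concern in the last paragraph can be made concrete with a small model. Nothing here calls talloc or notmuch: live_after_iteration is a hypothetical helper, and each object is represented only by the set of holders keeping it alive.

```python
def live_after_iteration(undo_convenience_ref, rounds=3):
    """Return the messages objects still alive while the query lives on."""
    holders = {}
    for i in range(rounds):
        name = f"messages{i}"
        # The library's query -> messages reference plus the binding's own.
        holders[name] = {"query", "bindings"}
        if undo_convenience_ref:
            holders[name].discard("query")  # binding undoes the library reference
        holders[name].discard("bindings")   # host-language finalizer runs
        if not holders[name]:
            del holders[name]               # last holder gone: freed
    return sorted(holders)
```

With the convenience reference left in place, every messages object produced by a long-lived query stays allocated after its wrapper has been finalized. Undoing that reference lets each one be freed as soon as the binding lets go.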


Memory management practices

2011-09-07 Thread Ben Gamari
On Mon, 29 Aug 2011 16:30:57 -0400, Ben Gamari wrote:
> [SNIP]
> 
> In general, it seems to me that memory management in notmuch bindings is
> a little bit harder than it needs to be due to the decision not to
> talloc_ref parent objects when a new child object is created. This means
> that a bindings author needs to recreate the ownership tree in their
> binding, a task which is fairly easily done (except in the case of
> Haskell due to the weak GC finalization guarantees) but seems quite
> unnecessary. Is there a reason this decision was made? Would a patch be
> accepted adding talloc_ref'ing parents in those functions creating
> children and talloc_frees in *_destroys?
> 
Any opinions concerning whether this is an acceptable idea? I wouldn't
mind putting together a patch-set, but I'd rather not waste my time if
the set would ultimately be rejected due to some technical objection I
have yet to think of.

Cheers,

- Ben


Memory management practices

2011-08-29 Thread Ben Gamari
Hey all,

Over the last few weeks I've been trying to fix some brokenness in the
notmuch-haskell bindings with respect to memory management.

In the discussion included below, I describe the issue as I approached it.
In short, it appears that GHC's garbage collector is quite liberal with
the order in which it frees resources (which is apparently permitted by
Haskell's FFI specification), allowing, for instance, a Messages object to be
freed before a Query object despite my attempts to hold the proper references in
the Haskell wrapper objects to keep the Query reachable.

In general, it seems to me that memory management in notmuch bindings is
a little bit harder than it needs to be due to the decision not to
talloc_ref parent objects when a new child object is created. This means
that a bindings author needs to recreate the ownership tree in their
binding, a task which is fairly easily done (except in the case of
Haskell due to the weak GC finalization guarantees) but seems quite
unnecessary. Is there a reason this decision was made? Would a patch be
accepted adding talloc_ref'ing parents in those functions creating
children and talloc_frees in *_destroys?

Cheers,

- Ben



On Mon, 29 Aug 2011 20:30:10 +0200, Bertram Felgenhauer  wrote:
> Dear Ben,
> 
> Ben Gamari wrote:
> > After looking into this issue in a bit more depth, I'm even more
> > confused. In fact, I would not be surprised if I have stumbled into a
> > bug in the GC.
> [...]
> > MessagesMessage
> >   |   
> >   |  msmpp
> >   \/
> > QueryMessages
> >   |
> >   |  qmpp
> >   \/
> > Query
> > 
> > As we can see, each MessagesMessage object in the Messages list
> > resulting from queryMessages holds a reference to the Query object from
> > which it originated. For this reason, I fail to see how it is possible
> > that the RTS would attempt to free the Query before freeing the
> > MessagesPtr.
> 
> When a garbage collection is performed, the RTS determines which heap
> objects are still reachable. The rest is then freed _simultaneously_,
> and the corresponding finalizers are run in some random order.
> 
> So assuming the application holds a reference to the MessagesMessage
> object for a while and then drops it, the GC will detect unreachability
> of all the three objects at the same time and in the end, the finalizer
> for MessagesMessage may be run before that of Query.
> 
> So I think this is not a bug.
> 
> To solve this problem properly, libnotmuch should stop imposing order
> constraints on when objects are freed - this would mean tracking
> references using talloc_ref and talloc_unlink instead of
> talloc_free inside the library.
> 
> For a bindings author who does not want to touch the library, the best
> idea I have is to add a layer with the sole purpose of tracking those
> implicit references.
> 
> Best regards,
> 
> Bertram
> 
> ___
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
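
For illustration, the "layer with the sole purpose of tracking those implicit references" could look roughly like the following sketch. It is written in Python to keep it short and is only a model: Handle, release and destroy_order are hypothetical names, and appending to a list stands in for the real notmuch_*_destroy call. Each handle counts its own finalizer plus one per live child, and the underlying object is destroyed only when that count reaches zero.

```python
import itertools


class Handle:
    """Hypothetical tracking layer around one libnotmuch object."""

    def __init__(self, name, destroyed, parent=None):
        self.name = name
        self.parent = parent
        self.pending = 1             # the wrapper's own finalizer
        self.destroyed = destroyed   # shared list recording destroy order
        if parent is not None:
            parent.pending += 1      # implicit reference: child needs parent alive

    def release(self):
        """Called by a finalizer, or by a child that has just been destroyed."""
        self.pending -= 1
        if self.pending == 0:
            self.destroyed.append(self.name)   # stand-in for notmuch_*_destroy
            if self.parent is not None:
                self.parent.release()


def destroy_order(finalizer_order):
    """Run finalizers in the given order; return the order of real destroys."""
    destroyed = []
    query = Handle("query", destroyed)
    messages = Handle("messages", destroyed, parent=query)
    message = Handle("message", destroyed, parent=messages)
    handles = {h.name: h for h in (query, messages, message)}
    for name in finalizer_order:   # the RTS may pick any order
        handles[name].release()
    return destroyed


if __name__ == "__main__":
    for order in itertools.permutations(["query", "messages", "message"]):
        print(order, "->", destroy_order(order))
```

Whatever order the finalizers run in, the destroys in this model happen child first. The plain integer counter assumes finalizers do not run concurrently; a real layer in a runtime where they can would presumably need atomic updates.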

