Re: [RFC] Split notmuch_database_close into two functions
Quoth Mark Walters on Apr 17 at 9:42 am: On Thu, 12 Apr 2012, Austin Clements amdra...@mit.edu wrote: Quoth Justus Winter on Apr 12 at 11:05 am: Quoting Austin Clements (2012-04-01 05:23:23) Quoth Justus Winter on Mar 21 at 1:55 am: I propose to split the function notmuch_database_close into notmuch_database_close and notmuch_database_destroy so that long running processes like alot can close the database while still using data obtained from queries to that database. Is this actually safe? My understanding of Xapian::Database::close is that, once you've closed the database, basically anything can throw a Xapian exception. A lot of data is retrieved lazily, both by notmuch and by Xapian, so simply having, say, a notmuch_message_t object isn't enough to guarantee that you'll be able to get data out of it after closing the database. Hence, I don't see how this interface could be used correctly. I do not know how, but both alot and afew (and occasionally the notmuch binary) are somehow safely using this interface on my box for the last three weeks. I see. TL;DR: This isn't safe, but that's okay if we document it. The bug report [0] you pointed to was quite informative. At its core, this is really a memory management issue. To sum up for the record (and to check my own thinking): It sounds like alot is careful not to use any notmuch objects after closing the database. The problem is that, currently, closing the database also talloc_free's it, which recursively free's everything derived from it. Python later GCs the wrapper objects, which *also* try to free their underlying objects, resulting in a double free. Before the change to expose notmuch_database_close, the Python bindings would only talloc_free from destructors. Furthermore, they prevented the library from recursively freeing things at other times by internally maintaining a reverse reference for every library talloc reference (e.g., message is a sub-allocation of query, so the bindings keep a reference from each message to its query to ensure the query doesn't get freed). The ability to explicitly talloc_free the database subverts this mechanism. So, I've come around to thinking that splitting notmuch_database_close and _destroy is okay. It certainly parallels the rest of the API better. However, notmuch_database_close needs a big warning similar to Xapian::Database::close's warning that retrieving information from objects derived from this database may not work after calling close. notmuch_database_close is really a specialty interface, and about the only thing you can guarantee after closing the database is that you can destroy other objects. This is also going to require a SONAME major version bump, as mentioned by others. Which, to be fair, would be a good opportunity to fix some other issues, too, like how notmuch_database_open can't return errors and how notmuch_database_get_directory is broken on read-only databases. The actual bump should be done at release time, but maybe we should drop a note somewhere (NEWS?) so we don't forget. Can I just check that there is no way to reopen the Xapian database readonly? (I may be using the wrong term: I mean is there a way of switching an open read-write database to read-only without losing the attached structures/messages/threads etc) If I understand it this would be sufficient as it would free the lock, but could be more generally useful for long lived notmuch processes. That would be handy and perfect for this situation, but no (I double-checked with Olly on IRC, which you probably saw). We might be able to lobby for this capability if it seems more generally useful. On the other hand, I think it would probably mix poorly with Xapian's optimistic snapshot isolation if we tried to use it for anything non-trivial (combined with real snapshot isolation it would be awesome). ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [RFC] Split notmuch_database_close into two functions
On Thu, 12 Apr 2012, Austin Clements amdra...@mit.edu wrote: Quoth Justus Winter on Apr 12 at 11:05 am: Quoting Austin Clements (2012-04-01 05:23:23) Quoth Justus Winter on Mar 21 at 1:55 am: I propose to split the function notmuch_database_close into notmuch_database_close and notmuch_database_destroy so that long running processes like alot can close the database while still using data obtained from queries to that database. Is this actually safe? My understanding of Xapian::Database::close is that, once you've closed the database, basically anything can throw a Xapian exception. A lot of data is retrieved lazily, both by notmuch and by Xapian, so simply having, say, a notmuch_message_t object isn't enough to guarantee that you'll be able to get data out of it after closing the database. Hence, I don't see how this interface could be used correctly. I do not know how, but both alot and afew (and occasionally the notmuch binary) are somehow safely using this interface on my box for the last three weeks. I see. TL;DR: This isn't safe, but that's okay if we document it. The bug report [0] you pointed to was quite informative. At its core, this is really a memory management issue. To sum up for the record (and to check my own thinking): It sounds like alot is careful not to use any notmuch objects after closing the database. The problem is that, currently, closing the database also talloc_free's it, which recursively free's everything derived from it. Python later GCs the wrapper objects, which *also* try to free their underlying objects, resulting in a double free. Before the change to expose notmuch_database_close, the Python bindings would only talloc_free from destructors. Furthermore, they prevented the library from recursively freeing things at other times by internally maintaining a reverse reference for every library talloc reference (e.g., message is a sub-allocation of query, so the bindings keep a reference from each message to its query to ensure the query doesn't get freed). The ability to explicitly talloc_free the database subverts this mechanism. So, I've come around to thinking that splitting notmuch_database_close and _destroy is okay. It certainly parallels the rest of the API better. However, notmuch_database_close needs a big warning similar to Xapian::Database::close's warning that retrieving information from objects derived from this database may not work after calling close. notmuch_database_close is really a specialty interface, and about the only thing you can guarantee after closing the database is that you can destroy other objects. This is also going to require a SONAME major version bump, as mentioned by others. Which, to be fair, would be a good opportunity to fix some other issues, too, like how notmuch_database_open can't return errors and how notmuch_database_get_directory is broken on read-only databases. The actual bump should be done at release time, but maybe we should drop a note somewhere (NEWS?) so we don't forget. Can I just check that there is no way to reopen the Xapian database readonly? (I may be using the wrong term: I mean is there a way of switching an open read-write database to read-only without losing the attached structures/messages/threads etc) If I understand it this would be sufficient as it would free the lock, but could be more generally useful for long lived notmuch processes. Best wishes Mark ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [RFC] Split notmuch_database_close into two functions
On Tue, Apr 17 2012, Justus Winter 4win...@informatik.uni-hamburg.de wrote: Hi everyone, Quoting Patrick Totzke (2012-04-13 10:33:58) Quoting Austin Clements (2012-04-01 04:23:23) Maybe you could describe your use case in more detail? Quoting Austin Clements (2012-04-12 17:57:44) Quoth Justus Winter on Apr 12 at 11:05 am: ... I see. TL;DR .. which should pretty much settle Austins opinion on libnotmuch users being second class citizens. Na, I think you misinterpreted Austin here, I think he summarized his position. Looking back at my mail I think I came across a lot harsher than necessary, I'm sorry for that. Let's get back to the issue at hand, shall we? Both Austin and Mark seem to support the split, any other opinions? I support it too (as did earlyer). Do we already have pushable series or is there a need for a new one ? It would be good to have this done so that application writers can start relying this being available in official notmuch (in 0.13 -- SONAME updated -- any other patches pending requiring SONAME bump to be reviewed ?). Cheers, Justus Tomi ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [RFC] Split notmuch_database_close into two functions
Quoting Austin Clements (2012-04-01 05:23:23) Quoth Justus Winter on Mar 21 at 1:55 am: I propose to split the function notmuch_database_close into notmuch_database_close and notmuch_database_destroy so that long running processes like alot can close the database while still using data obtained from queries to that database. Is this actually safe? My understanding of Xapian::Database::close is that, once you've closed the database, basically anything can throw a Xapian exception. A lot of data is retrieved lazily, both by notmuch and by Xapian, so simply having, say, a notmuch_message_t object isn't enough to guarantee that you'll be able to get data out of it after closing the database. Hence, I don't see how this interface could be used correctly. I do not know how, but both alot and afew (and occasionally the notmuch binary) are somehow safely using this interface on my box for the last three weeks. Maybe you could describe your use case in more detail? Uh, okay. There are two long running processes (alot and afew in my case) competing for a resource (writable access to the notmuch database). This access is serialized by a lock maintained by libxapian. This is a rather classic cs problem. In order to avoid starvation of one process, both processes must release the lock after a finite amount of time. Now for all practical purposes the requirement is usually that both processes should minimize the time spent while they aquired the lock. The only way to ensure that the lock is released is to ask notmuch to release the lock. When Patrick asked me for a way to release the lock I exposed the function notmuch_database_close [0]. I made a mistake. The mistake was to assume that a function named notmuch_database_close would actually do as the name implies. But that wasn't the case. I should have been more suspicious, but being somewhat naive I patched notmuch_database_close to actually close the database [1]. I'm basically trying to correct that mistake. One way to fix that is this patchset, the other one is to rename notmuch_database_close to notmuch_database_destroy and to remove the ability to call this function from the python bindings. There is a branch of alot [2] that needs this functionality and hence this patchset to avoid crashes (the ticket contains more information why this leads to crashes). Incidentally this branch also fixes another bug in alot [3]. Patrick spoke up in this thread stating that this patchset is useful for alot. There is a unpublished branch of afew that requires this functionality. I think it very much boils down to the question whether or not libnotmuch and its users are second class citizens. This issue is not a problem for the notmuch binary since it is short lived in nature. In fact I feel reminded of [4] when I pointed out problems in the code that are problematic for libnotmuch users. You - Austin - suggested to whip up patches instead of raising concerns. That's a valid response, after all, talk is cheap ;) But now that I have a patchset that is rather small (yes, it changes the API but any program can be adjusted automatically using sed...) and I kind of thought that it would be accepted more easily. And although the changes are trivial it took me quite some time to track it down. The thread [4] dried out without someone like you or David stating that he cares about libnotmuch and its users as much as the users of the notmuch binary. These kind of problems require invasive code and API changes, I asked in that very same thread how to do this properly but noone answered. I think I'm just not willing to devote my time to fix these problems if these issues aren't even perceived as problems by the majority of notmuch users and developers b/c they are using the emacs interface which just calls the notmuch binary that has no such problems. Cheers, Justus 0: b2734519db78fdec76eeafc5fe8f5631a6436cf6 1: cfc5f1059aa16753cba610c41601cacc97260e08 2: https://github.com/pazz/alot/issues/413 3: https://github.com/pazz/alot/issues/414 4: 20120221002921.8534.57...@thinkbox.jade-hamburg.de ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [RFC] Split notmuch_database_close into two functions
Quoth Justus Winter on Apr 12 at 11:05 am: Quoting Austin Clements (2012-04-01 05:23:23) Quoth Justus Winter on Mar 21 at 1:55 am: I propose to split the function notmuch_database_close into notmuch_database_close and notmuch_database_destroy so that long running processes like alot can close the database while still using data obtained from queries to that database. Is this actually safe? My understanding of Xapian::Database::close is that, once you've closed the database, basically anything can throw a Xapian exception. A lot of data is retrieved lazily, both by notmuch and by Xapian, so simply having, say, a notmuch_message_t object isn't enough to guarantee that you'll be able to get data out of it after closing the database. Hence, I don't see how this interface could be used correctly. I do not know how, but both alot and afew (and occasionally the notmuch binary) are somehow safely using this interface on my box for the last three weeks. I see. TL;DR: This isn't safe, but that's okay if we document it. The bug report [0] you pointed to was quite informative. At its core, this is really a memory management issue. To sum up for the record (and to check my own thinking): It sounds like alot is careful not to use any notmuch objects after closing the database. The problem is that, currently, closing the database also talloc_free's it, which recursively free's everything derived from it. Python later GCs the wrapper objects, which *also* try to free their underlying objects, resulting in a double free. Before the change to expose notmuch_database_close, the Python bindings would only talloc_free from destructors. Furthermore, they prevented the library from recursively freeing things at other times by internally maintaining a reverse reference for every library talloc reference (e.g., message is a sub-allocation of query, so the bindings keep a reference from each message to its query to ensure the query doesn't get freed). The ability to explicitly talloc_free the database subverts this mechanism. So, I've come around to thinking that splitting notmuch_database_close and _destroy is okay. It certainly parallels the rest of the API better. However, notmuch_database_close needs a big warning similar to Xapian::Database::close's warning that retrieving information from objects derived from this database may not work after calling close. notmuch_database_close is really a specialty interface, and about the only thing you can guarantee after closing the database is that you can destroy other objects. This is also going to require a SONAME major version bump, as mentioned by others. Which, to be fair, would be a good opportunity to fix some other issues, too, like how notmuch_database_open can't return errors and how notmuch_database_get_directory is broken on read-only databases. The actual bump should be done at release time, but maybe we should drop a note somewhere (NEWS?) so we don't forget. [0] https://github.com/pazz/alot/issues/413 ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [RFC] Split notmuch_database_close into two functions
Quoting Austin Clements (2012-04-12 18:57:44) Quoth Justus Winter on Apr 12 at 11:05 am: Quoting Austin Clements (2012-04-01 05:23:23) Quoth Justus Winter on Mar 21 at 1:55 am: I propose to split the function notmuch_database_close into notmuch_database_close and notmuch_database_destroy so that long running processes like alot can close the database while still using data obtained from queries to that database. Is this actually safe? My understanding of Xapian::Database::close is that, once you've closed the database, basically anything can throw a Xapian exception. A lot of data is retrieved lazily, both by notmuch and by Xapian, so simply having, say, a notmuch_message_t object isn't enough to guarantee that you'll be able to get data out of it after closing the database. Hence, I don't see how this interface could be used correctly. I do not know how, but both alot and afew (and occasionally the notmuch binary) are somehow safely using this interface on my box for the last three weeks. I see. TL;DR: This isn't safe, but that's okay if we document it. The bug report [0] you pointed to was quite informative. At its core, this is really a memory management issue. To sum up for the record (and to check my own thinking): It sounds like alot is careful not to use any notmuch objects after closing the database. The problem is that, currently, closing the database also talloc_free's it, which recursively free's everything derived from it. Python later GCs the wrapper objects, which *also* try to free their underlying objects, resulting in a double free. Before the change to expose notmuch_database_close, the Python bindings would only talloc_free from destructors. Furthermore, they prevented the library from recursively freeing things at other times by internally maintaining a reverse reference for every library talloc reference (e.g., message is a sub-allocation of query, so the bindings keep a reference from each message to its query to ensure the query doesn't get freed). The ability to explicitly talloc_free the database subverts this mechanism. Exactly. So, I've come around to thinking that splitting notmuch_database_close and _destroy is okay. It certainly parallels the rest of the API better. However, notmuch_database_close needs a big warning similar to Xapian::Database::close's warning that retrieving information from objects derived from this database may not work after calling close. Yes, but then again one should always expect function calls to fail and most APIs have mechanisms to communicate failures. OTOH this might be an indication that the notmuch API should be redesigned. Both alot and afew have their own wrappers around the notmuch API to work around some limitations (e.g. changes to messages are enqueued and executed at some point, with some kind of mechanism to cope with the notmuch database temporarily not being available, message objects have to be re-fetched if they got outdated (IIRC, whatever that means)). Justus ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [RFC] Split notmuch_database_close into two functions
You're right of course, updated patch sent as a follow up. Justus ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [RFC] Split notmuch_database_close into two functions
Justus Winter 4win...@informatik.uni-hamburg.de writes: I propose to split the function notmuch_database_close into notmuch_database_close and notmuch_database_destroy so that long running processes like alot can close the database while still using data obtained from queries to that database. I've updated the tools, the go, ruby and python bindings to use notmuch_database_destroy instead of notmuch_database_close to destroy database objects. This looks like a good idea. grep _destroy *.c outputs plenty of other matches so this is not a new term here. I wonder what (backward) compability issues there are, though. In message id:1332291311-28954-2-git-send-email-4win...@informatik.uni-hamburg.de there was typo in comment: notmuch_database_destroyed ? Cheers, Justus Tomi ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch