Re: Update on python-cffi bindings
Hi all, Daniel Kahn Gillmorwrites: > On Thu 2017-12-21 12:30:39 +0100, Floris Bruynooghe wrote: >> The API changes a lot and there is no easy migration. And history has >> shown that's a terrible way to get something new adopted. Last time I >> suggested a possible multi-tiered approach (maybe not as explicit): >> >> 1 I think it's possible to move the memory management technique to the >> existing API without API breakage. It would still allow users to call >> functions in the wrong order etc, but that's not any regression. >> >> 2 It's probably possible to either switch the existing API to use CFFI >> or create a drop-in replacement for it based on CFFI. The benefit >> here is two-fold: users get better PyPy performance and I believe it >> becomes easier to maintain the bindings, e.g. all you need to do to >> call notmuch_database_get_path is >> capi.ffi.string(capi.lib.notmuch_database_get_path(dbptr)) (see >> >> https://github.com/flub/notmuch/blob/cffi/bindings/python-cffi/notdb/_database.py#L263) >> for an actual example). But this really depends on what everyone else >> here that maintains the Python bindings here thinks. I'd encourage to >> have a look at the CFFI implementation to get an idea of this. > > these two steps on their own seem like they give us: > > * better and safer memory management > > * better performance on PyPy > > * arguably, easier maintenance of the bindings. > > These seem like unambiguous wins, and there is no downside -- people > using the notmuch module can just upgrade to the new version and it's > done. I'd love to see these changes happen. > > The only thing it doesn't do is provide more idiomatic bindings for new > users, which you describe as: > >> 3 As last step I still think providing the more idiomatic bindings is >> useful, especially for new users. It does take the burden of >> correctly calling the C functions somewhat more. This could then >> either treat a notmuch_cffi as a lower level API which more directly >> maps the C API or it could call C directly as it does now. I'm not >> currently sure on which is more feasible or better here. >> >> An additional thing that could be done here to ease migration is to >> allow creating the new objects from the old ones allowing a codebase >> to gradually adopt the newer API where appropriate. E.g.: >> >> db = notmuch.Database(...) >> msg = db.find_message(...) >> new_db = notdb.Database.from_notmuch(db) >> new_msg = notdb.Message.from_notmuch(msg) >> print('Tags not on msg: {}'.format(new_db.tags - new_msg.tags)) >> >> This would rely on the existing API to migrate to CFFI, otherwise it >> could still be possible but would be very hairy. > > I'm not sure i understand this last sentence. Are you saying that step > 3 depends on step 2 happening? The mixing of a new API with the existing API would be possible if we can successfully migrate the existing notmuch API to use CFFI. If not it may theoretically be possible but could be a lot of pain. > I'm not sure about the name "notdb" I'm in no way attached to this name, just needed something to work with. Naming things is hard, if anyone has a different idea please share. > but i don't mind the idea of there > being some other "more-pythonic" notmuch bindings. New users would > likely prefer it, and that'd be fine. > >> - For exposing completely new APIs, sure building the >> notdb.ImmutableTagSet and MutableTagSet was not straight forward, >> likewise for the PropertiesMap. Many other things are much easier >> though. One possible nice way to alleviate this with the idea of the >> existing notmuch API being the lower-level layer nearly mimicking the >> C-API directly. That way adding new APIs there is more or less >> straight forward and there is less time pressure to add them to the >> higher level API. Especially if mixing the APIs is supported. > > I think this is in line with the approach described above. I like it. I would really like to hear from some of the existing python binding users on all of the above and what they thing of this approach. Is this work worth it to you? Will it make you more or less likely to want to migrate to more pythonic bindings? In particular I'm not that convinced in the suitability of the existing library as a low-level binding as it currently sits somewhere above basic wrapping of libnotmuch. Anyway, I would appreciate any feedback on the proposed APIs and way forward. Cheers, Floris ___ notmuch mailing list notmuch@notmuchmail.org https://notmuchmail.org/mailman/listinfo/notmuch
Re: Update on python-cffi bindings
On Thu 2017-12-21 12:30:39 +0100, Floris Bruynooghe wrote: > The API changes a lot and there is no easy migration. And history has > shown that's a terrible way to get something new adopted. Last time I > suggested a possible multi-tiered approach (maybe not as explicit): > > 1 I think it's possible to move the memory management technique to the > existing API without API breakage. It would still allow users to call > functions in the wrong order etc, but that's not any regression. > > 2 It's probably possible to either switch the existing API to use CFFI > or create a drop-in replacement for it based on CFFI. The benefit > here is two-fold: users get better PyPy performance and I believe it > becomes easier to maintain the bindings, e.g. all you need to do to > call notmuch_database_get_path is > capi.ffi.string(capi.lib.notmuch_database_get_path(dbptr)) (see > > https://github.com/flub/notmuch/blob/cffi/bindings/python-cffi/notdb/_database.py#L263) > for an actual example). But this really depends on what everyone else > here that maintains the Python bindings here thinks. I'd encourage to > have a look at the CFFI implementation to get an idea of this. these two steps on their own seem like they give us: * better and safer memory management * better performance on PyPy * arguably, easier maintenance of the bindings. These seem like unambiguous wins, and there is no downside -- people using the notmuch module can just upgrade to the new version and it's done. I'd love to see these changes happen. The only thing it doesn't do is provide more idiomatic bindings for new users, which you describe as: > 3 As last step I still think providing the more idiomatic bindings is > useful, especially for new users. It does take the burden of > correctly calling the C functions somewhat more. This could then > either treat a notmuch_cffi as a lower level API which more directly > maps the C API or it could call C directly as it does now. I'm not > currently sure on which is more feasible or better here. > > An additional thing that could be done here to ease migration is to > allow creating the new objects from the old ones allowing a codebase > to gradually adopt the newer API where appropriate. E.g.: > > db = notmuch.Database(...) > msg = db.find_message(...) > new_db = notdb.Database.from_notmuch(db) > new_msg = notdb.Message.from_notmuch(msg) > print('Tags not on msg: {}'.format(new_db.tags - new_msg.tags)) > > This would rely on the existing API to migrate to CFFI, otherwise it > could still be possible but would be very hairy. I'm not sure i understand this last sentence. Are you saying that step 3 depends on step 2 happening? I'm not sure about the name "notdb" but i don't mind the idea of there being some other "more-pythonic" notmuch bindings. New users would likely prefer it, and that'd be fine. > - For exposing completely new APIs, sure building the > notdb.ImmutableTagSet and MutableTagSet was not straight forward, > likewise for the PropertiesMap. Many other things are much easier > though. One possible nice way to alleviate this with the idea of the > existing notmuch API being the lower-level layer nearly mimicking the > C-API directly. That way adding new APIs there is more or less > straight forward and there is less time pressure to add them to the > higher level API. Especially if mixing the APIs is supported. I think this is in line with the approach described above. I like it. > I must admit I didn't look too much at the existing tests untill just > now. If I understand correctly the existing tests are in > T390-python.sh. In this case I'd say the tests I added are a lot more > thorough. The reason I added tox.ini is to easily test against multiple > python versions, i.e. CPython 3.5, 3.6 & PyPy 3.5. If I had to guess at > the best way to integrate I'd say it's probably best to create a > TXXX-python-pytest-pyXY.sh for each supported python version and lose > tox.ini. I'd suggest for those tests to be a simplified version of what > tox does: create a virtualenv and run pytest inside of it. I personally would rather not deal with virtualenvs during development if we don't need them. Take a look at test_python() in test/test-lib.sh to see how the python tests are currently handled -- it's pretty simple, relies on the configured version of notmuch for testing, and it would be great to keep the simplicity. I think the approach you've outlined above sounds really good, and i would be happy to see it happen for notmuch. It strikes me as much more useful and less scary than a hard cutover or overhaul that requires adoption of a new API from the ground up. Thanks for continuing to push on this, Floris! --dkg signature.asc Description: PGP signature ___ notmuch mailing list notmuch@notmuchmail.org
Re: Update on python-cffi bindings
Daniel Kahn Gillmorwrites: > Hi Floris-- > > On Sun 2017-12-17 19:08:18 +0100, Floris Bruynooghe wrote: > > i've heard reported, and i also appreciate your attention to performance > concerns on different python platforms (e.g. making sure things are > performant on both CPython and PyPy). Oh btw, I can happily report this has paid off. These bindings perform much better on PyPy while performing slightly worse on CPython. I haven't proven this but my guess is most of the performance loss on CPython is due to the memory management which unfortunately involves a lot of calls on each object destruction, also each object needs to be destroyed individually so when destroying a parent all children will get destroyed first which is a waste from libnotmuch's point of view. This can be improved on by using a weakref.finalizer-modeled approach, but is some more work. > I confess that i haven't read the series in full, but i have two main > concerns that i'd generally use to evaluate proposals like this: > > 0) how much does the API change? that is, if we're expecting existing > users of the notmuch python bindings to adopt this new approach, how > much pain are we putting them through? How much of an effort has > been made to reduce that pain, and do we have a clear and > comprehensive porting guide? The API changes a lot and there is no easy migration. And history has shown that's a terrible way to get something new adopted. Last time I suggested a possible multi-tiered approach (maybe not as explicit): 1 I think it's possible to move the memory management technique to the existing API without API breakage. It would still allow users to call functions in the wrong order etc, but that's not any regression. 2 It's probably possible to either switch the existing API to use CFFI or create a drop-in replacement for it based on CFFI. The benefit here is two-fold: users get better PyPy performance and I believe it becomes easier to maintain the bindings, e.g. all you need to do to call notmuch_database_get_path is capi.ffi.string(capi.lib.notmuch_database_get_path(dbptr)) (see https://github.com/flub/notmuch/blob/cffi/bindings/python-cffi/notdb/_database.py#L263) for an actual example). But this really depends on what everyone else here that maintains the Python bindings here thinks. I'd encourage to have a look at the CFFI implementation to get an idea of this. 3 As last step I still think providing the more idiomatic bindings is useful, especially for new users. It does take the burden of correctly calling the C functions somewhat more. This could then either treat a notmuch_cffi as a lower level API which more directly maps the C API or it could call C directly as it does now. I'm not currently sure on which is more feasible or better here. An additional thing that could be done here to ease migration is to allow creating the new objects from the old ones allowing a codebase to gradually adopt the newer API where appropriate. E.g.: db = notmuch.Database(...) msg = db.find_message(...) new_db = notdb.Database.from_notmuch(db) new_msg = notdb.Message.from_notmuch(msg) print('Tags not on msg: {}'.format(new_db.tags - new_msg.tags)) This would rely on the existing API to migrate to CFFI, otherwise it could still be possible but would be very hairy. So do people reckon following this path would be more desirable and worth the extra effort? Would there be a willingness to change the exising notmuch API to a CFFI implementation? I didn't go down this path yet as last time there was no feedback on this suggestion while there was some moderate curiosity for a more idiomatic API. > 1) how accessible is the new implementation for contributions from > other developers? For example, a transition to a highly idiomatic > style of python that no other developers would be able to improve > would put a large maintenance burden on you. - For the CFFI-part I believe this is easier then the existing ctypes as I tried to say above. - For exposing completely new APIs, sure building the notdb.ImmutableTagSet and MutableTagSet was not straight forward, likewise for the PropertiesMap. Many other things are much easier though. One possible nice way to alleviate this with the idea of the existing notmuch API being the lower-level layer nearly mimicking the C-API directly. That way adding new APIs there is more or less straight forward and there is less time pressure to add them to the higher level API. Especially if mixing the APIs is supported. > Do you have any thoughts about these questions? > > For example, for point 0, have you tried to run alot or some other > python-based notmuch MUA against this newer python binding? what > changes were necessary? > > For example, for point 1, can you show me how i would (as a fellow notmuch > contributor) create a patch to add
Re: Update on python-cffi bindings
Hi Floris-- On Sun 2017-12-17 19:08:18 +0100, Floris Bruynooghe wrote: > Thanks for all the feedback on an early post of my CFFI-based libnotmuch > Python bindings. I've now completed these somewhat more and they now > have most of the functionality. Here's what's new since last time: thanks for this work! I appreciate and share your goals of fixing the memory management models so that notmuch objects can't incur the crashes i've heard reported, and i also appreciate your attention to performance concerns on different python platforms (e.g. making sure things are performant on both CPython and PyPy). I confess that i haven't read the series in full, but i have two main concerns that i'd generally use to evaluate proposals like this: 0) how much does the API change? that is, if we're expecting existing users of the notmuch python bindings to adopt this new approach, how much pain are we putting them through? How much of an effort has been made to reduce that pain, and do we have a clear and comprehensive porting guide? 1) how accessible is the new implementation for contributions from other developers? For example, a transition to a highly idiomatic style of python that no other developers would be able to improve would put a large maintenance burden on you. Do you have any thoughts about these questions? For example, for point 0, have you tried to run alot or some other python-based notmuch MUA against this newer python binding? what changes were necessary? For example, for point 1, can you show me how i would (as a fellow notmuch contributor) create a patch to add support for some of the recent (post 0.25.3) changes to notmuch in the python interface? Also, the old python bindings are actually directly exercised by the notmuch test suite. i see you've adopted the tox.ini convention to trigger a run of pytest, but that's not how the current test suite works. Do you think notmuch needs to adopt tox in order to use this series, or do you think you could integrate the testing of this module into the currently-existing test suite? the more things that this series changes at once, the harder it is to push for its integration. bite-size, piecemeal changes will have an easier path to wider adoption. thanks for this work! I look forward to us solving the problems you're working on it this series. I hope we can get there :) --dkg signature.asc Description: PGP signature ___ notmuch mailing list notmuch@notmuchmail.org https://notmuchmail.org/mailman/listinfo/notmuch
Update on python-cffi bindings
Hi all, Thanks for all the feedback on an early post of my CFFI-based libnotmuch Python bindings. I've now completed these somewhat more and they now have most of the functionality. Here's what's new since last time: - All tests pass on Python 3.5, 3.6 and pypy3.5 I could probably add 3.4 as well, but I got the impression from people this was less important. Likewise it seemed there was a more-or-less consensus that missing python 2.7 is no big deal. - properties supported as a mapping This behaves as a normal mapping, with one extra method: getall(prefix='', *, exact=False). Which returns an iterator rather then just one item. - queries and threads There's now a Database.messages(query, *, omit_excluded=EXCLUDE.TRUE, sort=SORT.UNSORTED, exclude=tags=None) function. It behaves as a generator, i.e. an iterator is returned. Likewise for Database.threads(). Also Database.count_messages() and Database.count_threads() all have the same signature (well, they return ints). - Message.frozen() as a context manager with msg.frozen(): msg.clear() raise Exception('oops') just works, that's less obvious to make this work then it looks but seemed worth it to look native in Python. I've pushed all changes to github.com/flub/notmuch on the cffi branch. I'd be keen to hear more feedback from other people. Things still to do: - Nice docs, there's comprehensive docstrings now but sphinx docs are clearly superior. - A few API bits, e.g. Database.upgrade() and some others. They're probably less important and I'll likely only do them as people request them. - Improve finalization. It'd be nice to give users direct access to .destroy() methods to free the memory, or even turn everything into (optional) context managers which would automatically call .destroy(): with db.threads('tag:foo') as threads: for thread in threads: pass This is probably nice for long-running applications which may want to be somewhat careful about their memory. But finalizers need to be implemented with a slightly more extended version of weakref.finalizer() for this to be safe. Let me know what you think. I'd also be happy to re-post a patch, but it's huge and also the commits not very clean currently. Please let me know the best way to proceed. Floris ___ notmuch mailing list notmuch@notmuchmail.org https://notmuchmail.org/mailman/listinfo/notmuch