Re: DRAFT Introduce CFFI-based Python bindings

2017-12-03 Thread Tomi Ollila
On Wed, Nov 29 2017, Floris Bruynooghe wrote:

> David Bremner  writes:
>
>> Floris Bruynooghe  writes:
>>
>>>
>>> Lastly there are some downsides to the choices I made:
>>> - I ended up going squarely for CPython 3.6+.  Choosing Python
>>>   3 allowed better API design, e.g. with keyword-only parameters
>>>   etc.  Choosing CPython 3.4+ restricts the madness that can
>>>   happen with __del__ and gives some newer (tho now unused)
>>>   features in weakref.finalizer.
>>> - This is no longer drop-in compatible.
>>> - I haven't got to a stage where my initial goal of speed has
>>>   been proven yet.
>>
>> I guess you'll have to convince the maintainers / users of alot and afew
>> that this makes sense before we go much further. I'd point out that
>> Debian stable is only at python 3.5, so that makes me a bit wary of this
>> (being able to run the test suite on debian stable and similar aged
>> distros useful for me, and I suspect other developers).
>>
>> I know there are issues with memory management in the current bindings,
>> so that may be a strong reason to push to python 3.6; it seems to need
>> more investigation at the moment.
>
> So on earlier Python versions, sure this is possible at not too much
> cost.
>
> - Python 3.4+ would just cost the use of some f-strings.  Not major, was
>   just nice to use.
> - Python <3.4 afaik would only need a tweak to the Database.tags and
>   Message.tags properties.  I *think* swapping the caching of these
>   using a weakref should suffice and not break the brittle
>   Python-libnotmuch memory management.
>   Mind you I think Python 3.0-3.3 are pretty old and not much point in
>   supporting them.  But this would also apply for 2.7 support.
> - Python 2.7 is probably the worst, in that keyword-only arguments would
>   be gone.  If python 2.7 is required I'd be much keener to have another
>   go at a drop-in replacement with the memory safety features and then
>   build the "notdb" API on top off it.  But for that to be worth it
>   people need to be convinced enough that maintaining a CFFI version is
>   nicer than a ctypes version I guess.

IMO Python 3.4+ would be OK, if python 2 support can be dropped. 
Even Ubuntu 14.04 has python 3.4. One notable distribution that has
Python 3.3 by default is RHEL 7, but there seems to be quite a few
packaged alternatives available...

>
> Kind regards,
> Floris


Tomi
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: DRAFT Introduce CFFI-based Python bindings

2017-12-01 Thread Floris Bruynooghe
Florian Klink  writes:

>>> I guess you'll have to convince the maintainers / users of alot and afew
>>> that this makes sense before we go much further. I'd point out that
>>> Debian stable is only at python 3.5, so that makes me a bit wary of this
>>> (being able to run the test suite on debian stable and similar aged
>>> distros useful for me, and I suspect other developers).
>>>
>>> I know there are issues with memory management in the current bindings,
>>> so that may be a strong reason to push to python 3.6; it seems to need
>>> more investigation at the moment.
>>
>>I am generally in favour of modernizing the notmuch python bindings,
>>especially when it comes to memory management and exception handling.
>>
>>At the moment, the alot interface officially only supports python v2.7
>>but our dependencies have now mostly been updated and we are working on
>>port to python 3, see here: https://github.com/pazz/alot/pull/1055
>
> afew maintainer here ;-)
>
> I'm also very much in favor of a more modern and pythonic interface, and would
> gladly support retiring python 2, moving to the new interface.
>
> I had a quick glimpse on the code, and would like to do some annotations. I 
> fear
> it's a bit awkward to do this inside the huge patch, which might already have
> changed, and send back via email.
> Did you publish a changeset to github, or somewhere else where I could comment
> on it?

I now also pushed it to github.com/flub/notmuch in the now, it's in the
cffi branch and all code is in bindings/notmuch-cffi.  Not sure if this
helps with commenting, I can make a pull-request to somewhere if you
want - just let me know where.

Cheers,
Floris
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: DRAFT Introduce CFFI-based Python bindings

2017-11-30 Thread Florian Klink

I guess you'll have to convince the maintainers / users of alot and afew
that this makes sense before we go much further. I'd point out that
Debian stable is only at python 3.5, so that makes me a bit wary of this
(being able to run the test suite on debian stable and similar aged
distros useful for me, and I suspect other developers).

I know there are issues with memory management in the current bindings,
so that may be a strong reason to push to python 3.6; it seems to need
more investigation at the moment.


I am generally in favour of modernizing the notmuch python bindings,
especially when it comes to memory management and exception handling.

At the moment, the alot interface officially only supports python v2.7
but our dependencies have now mostly been updated and we are working on
port to python 3, see here: https://github.com/pazz/alot/pull/1055


afew maintainer here ;-)

I'm also very much in favor of a more modern and pythonic interface, and would
gladly support retiring python 2, moving to the new interface.

I had a quick glimpse on the code, and would like to do some annotations. I fear
it's a bit awkward to do this inside the huge patch, which might already have
changed, and send back via email.
Did you publish a changeset to github, or somewhere else where I could comment
on it?

Cheers,
Florian


signature.asc
Description: PGP signature
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: DRAFT Introduce CFFI-based Python bindings

2017-11-29 Thread Floris Bruynooghe
Patrick Totzke  writes:

> Quoting David Bremner (2017-11-28 23:59:26)
>> Floris Bruynooghe  writes:
>> 
>> >
>> > Lastly there are some downsides to the choices I made:
>> > - I ended up going squarely for CPython 3.6+.  Choosing Python
>> >   3 allowed better API design, e.g. with keyword-only parameters
>> >   etc.  Choosing CPython 3.4+ restricts the madness that can
>> >   happen with __del__ and gives some newer (tho now unused)
>> >   features in weakref.finalizer.
>> > - This is no longer drop-in compatible.
>> > - I haven't got to a stage where my initial goal of speed has
>> >   been proven yet.
>> 
>> I guess you'll have to convince the maintainers / users of alot and afew
>> that this makes sense before we go much further. I'd point out that
>> Debian stable is only at python 3.5, so that makes me a bit wary of this
>> (being able to run the test suite on debian stable and similar aged
>> distros useful for me, and I suspect other developers).
>> 
>> I know there are issues with memory management in the current bindings,
>> so that may be a strong reason to push to python 3.6; it seems to need
>> more investigation at the moment.
>> 
>> d
>
>
> I am generally in favour of modernizing the notmuch python bindings,
> especially when it comes to memory management and exception handling.
>
> At the moment, the alot interface officially only supports python v2.7
> but our dependencies have now mostly been updated and we are working on
> port to python 3, see here: https://github.com/pazz/alot/pull/1055
>
> @Floris, you are welcome to join #alot on freenode if you want to
> discuss details on that.
>
> You mention that your new API breaks compatibility with the existing
> ones. Do you have some demo code that uses the new API for reference?

Short, untested, example which works with what's posted:

db = notdb.Database.create()
# or
db = notdb.Database(path=None, mode=notdb.Database.MODE.READ_WRITE)
print(db.path) -> pathlib.Path (a py34 dependency)
if 'unread' in db.tags:  # tags behaves like a set
print('unread mail!')
with db.atomic():
msg = db.add('/path/to/file')
mdg = db.get('/path/to/file')
msg = db.find('some-msgid')
db.remove('path/to/other/file')
# sorry, don't have a query interface yet
assert 'unread' in msg.tags
for tag in msg.tags:
print(f'a tag: {tag}')
with msg.frozen():  # Message.frozen() not yet implemented
msg.tags.clear()  # all set operations supported
msg.tags.add('atag')
msg.tags_to_flags()

I imagine the query interface would be something like:

with db.query('tag:atag') as query:
print(f'results: {query.count}')
for msg in query:
print(msg.path)

But to be honest I've been spending most time on getting the
memory-safety figured (which I hope I finally did) so far and I think
the tag handling is so far the nicest thing to show off.  They're
completely normal Python sets with no special behaviour at all (well,
that's not true - there's the binary interface, see the code posted for
this).

Actually, this last point is kind of important and I failed to mention
it before too.  The existing Python bindings convert many bytes from
libnotmuch to Python strings, that is unicode on Python 3.  For many it
uses b'bytes'.decode('utf-8', errors='ignore') which is a sane default
if you want to display things.  But if you need to round-trip a tag and
store it again you might be changing the tag.  I've not found the right
way to handle binary data (e.g. also needed for messageid) everywhere
yet but for tags I've gone with:

for tag in iter(msg.tags):  # iter() normally called implicitly by for loop
print(f'All unicode this: {tag}')
for tag in msg.tags.iter(encoding=None):
other_msg.tags.add(tag)  # This passes pure bytes around, loses nothing
# What I've used for message ID for now is a "BinString" type
print(f'All just unicode: {msg.messageid}')
binary_msgid = bytes(msg.messageid)  # No lossy conversion

This BinString stuff is somewhat hacky, not sure how sane that is.  The
second iterator on tags feels somewhat cleaner.  Likewise tags could be
BinString as well instead of plain str.

Cheers,
Floris
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: DRAFT Introduce CFFI-based Python bindings

2017-11-29 Thread Floris Bruynooghe
David Bremner  writes:

> Floris Bruynooghe  writes:
>
>>
>> Lastly there are some downsides to the choices I made:
>> - I ended up going squarely for CPython 3.6+.  Choosing Python
>>   3 allowed better API design, e.g. with keyword-only parameters
>>   etc.  Choosing CPython 3.4+ restricts the madness that can
>>   happen with __del__ and gives some newer (tho now unused)
>>   features in weakref.finalizer.
>> - This is no longer drop-in compatible.
>> - I haven't got to a stage where my initial goal of speed has
>>   been proven yet.
>
> I guess you'll have to convince the maintainers / users of alot and afew
> that this makes sense before we go much further. I'd point out that
> Debian stable is only at python 3.5, so that makes me a bit wary of this
> (being able to run the test suite on debian stable and similar aged
> distros useful for me, and I suspect other developers).
>
> I know there are issues with memory management in the current bindings,
> so that may be a strong reason to push to python 3.6; it seems to need
> more investigation at the moment.

So on earlier Python versions, sure this is possible at not too much
cost.

- Python 3.4+ would just cost the use of some f-strings.  Not major, was
  just nice to use.
- Python <3.4 afaik would only need a tweak to the Database.tags and
  Message.tags properties.  I *think* swapping the caching of these
  using a weakref should suffice and not break the brittle
  Python-libnotmuch memory management.
  Mind you I think Python 3.0-3.3 are pretty old and not much point in
  supporting them.  But this would also apply for 2.7 support.
- Python 2.7 is probably the worst, in that keyword-only arguments would
  be gone.  If python 2.7 is required I'd be much keener to have another
  go at a drop-in replacement with the memory safety features and then
  build the "notdb" API on top off it.  But for that to be worth it
  people need to be convinced enough that maintaining a CFFI version is
  nicer than a ctypes version I guess.

Kind regards,
Floris
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: DRAFT Introduce CFFI-based Python bindings

2017-11-29 Thread Patrick Totzke
Quoting David Bremner (2017-11-28 23:59:26)
> Floris Bruynooghe  writes:
> 
> >
> > Lastly there are some downsides to the choices I made:
> > - I ended up going squarely for CPython 3.6+.  Choosing Python
> >   3 allowed better API design, e.g. with keyword-only parameters
> >   etc.  Choosing CPython 3.4+ restricts the madness that can
> >   happen with __del__ and gives some newer (tho now unused)
> >   features in weakref.finalizer.
> > - This is no longer drop-in compatible.
> > - I haven't got to a stage where my initial goal of speed has
> >   been proven yet.
> 
> I guess you'll have to convince the maintainers / users of alot and afew
> that this makes sense before we go much further. I'd point out that
> Debian stable is only at python 3.5, so that makes me a bit wary of this
> (being able to run the test suite on debian stable and similar aged
> distros useful for me, and I suspect other developers).
> 
> I know there are issues with memory management in the current bindings,
> so that may be a strong reason to push to python 3.6; it seems to need
> more investigation at the moment.
> 
> d


I am generally in favour of modernizing the notmuch python bindings,
especially when it comes to memory management and exception handling.

At the moment, the alot interface officially only supports python v2.7
but our dependencies have now mostly been updated and we are working on
port to python 3, see here: https://github.com/pazz/alot/pull/1055

@Floris, you are welcome to join #alot on freenode if you want to
discuss details on that.

You mention that your new API breaks compatibility with the existing
ones. Do you have some demo code that uses the new API for reference?

Cheers,
P


signature.asc
Description: signature
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: DRAFT Introduce CFFI-based Python bindings

2017-11-28 Thread David Bremner
Floris Bruynooghe  writes:

>
> Lastly there are some downsides to the choices I made:
> - I ended up going squarely for CPython 3.6+.  Choosing Python
>   3 allowed better API design, e.g. with keyword-only parameters
>   etc.  Choosing CPython 3.4+ restricts the madness that can
>   happen with __del__ and gives some newer (tho now unused)
>   features in weakref.finalizer.
> - This is no longer drop-in compatible.
> - I haven't got to a stage where my initial goal of speed has
>   been proven yet.

I guess you'll have to convince the maintainers / users of alot and afew
that this makes sense before we go much further. I'd point out that
Debian stable is only at python 3.5, so that makes me a bit wary of this
(being able to run the test suite on debian stable and similar aged
distros useful for me, and I suspect other developers).

I know there are issues with memory management in the current bindings,
so that may be a strong reason to push to python 3.6; it seems to need
more investigation at the moment.

d
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


DRAFT Introduce CFFI-based Python bindings

2017-11-28 Thread Floris Bruynooghe
Hi all,

Here are the beginnings off CFFI-based Python bindings, rather
than the ctypes-based ones currently available.  I started this
work in order to get faster bindings on pypy since a script of
mine was running slower on pypy than CPython.  Initially aiming
for a drop-in replacement of the existing bindings I ended up
abandoning this to help enforce correct usage of the API.

The benefits of this approach are:
- More "Pythonic" API, e.g. tags behave like sets, iterators
  which get consumed can easily be re-created as is usual with
  collections, avoid allowing invalid combinations of args and
  calls on a Python-API level.
- CFFI, this works on both CPython and PyPy, on the latter it
  is (supposed to be) a lot faster as the JIT can cross the
  boundary between C and Python code where it otherwise has
  extra overheads to emulate the C-Python API.  Additionally
  it makes it safer to use compared to ctypes, it works on the
  API level using the compiler to figure out the correct details
  of the platform.  Compared to ctypes which only works on the
  ABI level and you need to rely on knowing the layout of code
  when writing the Python bindings.

Additionaly I belive these bindings fix a memory safety issue,
certain situations in my test-suite would lead to coredump which
is not something which should be possible from within Python.
I believe I have seen similar reports in the list archives so
am not the only one seeing these.  Sadly these are hard to
isolate and I have not managed to re-create this in a nice
minimal example, however I believe the root cause is that in
some situations, mostly interpreter shutdown, the __del__
method can have been called while there are still references
to the object and while child-objects are still alive.  This
effectively results in double-frees as the child object frees
memory already freed by the parent.  These bindings solve this
by adding the .alive property and using this to check parent
objects are still alive before destroying themselves.  This is
somewhat expensive, but works and is easy to implement.

Lastly there are some downsides to the choices I made:
- I ended up going squarely for CPython 3.6+.  Choosing Python
  3 allowed better API design, e.g. with keyword-only parameters
  etc.  Choosing CPython 3.4+ restricts the madness that can
  happen with __del__ and gives some newer (tho now unused)
  features in weakref.finalizer.
- This is no longer drop-in compatible.
- I haven't got to a stage where my initial goal of speed has
  been proven yet.

In theory I think it's possible to create a CFFI-based drop-in
replacement to the bindings, only adding the memory-safety fixes
and keeping the Python 2.7 compatibility.  It would then be
possible to build the API proposed in these bindings on top of
this, but once I was making these bindings safer it felt strange
to still allow the API to be misused.


There are a lot of details about this which can be discussed,
also many finer implementation points and even just getting the
proposed API right (you'll notice large gaps for now).  But
this mail is already too long.  I look forward to your comments
and feedback on the approach taken and on whether some form
of this could make it into the main repo.


Lastly a small note on the AUTHORS file patch, due to my own
unfortunate choice of employer I have strict rules to follow
on how to submit patches.  One of which is to add this line if
an AUTHORS file exists.  Given clearly not everyone is listed
here though maybe this is not appropriate.  I would also rather
receive email on f...@devork.be rather than the address I have
to use in the git commits.


Kind Regards,
Floris


___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch