More thoughts:
Regarding the int/long and str/unicode dichotomy in Python 2/3, the
solution that is the most unsurprising and convenient for PyGreSQL users
is to return the native types int and str in both Python 2 (as before)
and now also in Python 3.
The problem, of course, is that in Python 3 int is actually long and str
is actually unicode under the hood. This is inconvenient for us PyGreSQL
developers, since we need to add case distinctions in the PyGreSQL C
code to reflect that, but it is better for things to be inconvenient
for developers than for users. My hope is also that these case
distinctions can be minimized by using clever compatibility macros
from py3c.
The major difficulty is that Python 3 str is actually unicode, while
Postgres continues to provide only byte strings. So we need to decode
the byte strings, and the question is which encoding to use.
My suggestion is that for a first proof of concept, we should hardcode
the encoding to UTF8. This would work for UTF8 and ASCII databases since
UTF8 is ASCII-transparent, and thus cover the majority of cases, but of
course it would break for other encodings like LATIN1.
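To illustrate the trade-off, here is a minimal Python sketch of what a hardcoded UTF8 decoder would and would not handle. The helper name decode_utf8 is made up for this example; the real decoding would happen in the PyGreSQL C code.

```python
def decode_utf8(raw):
    """Decode a byte string from the server, always assuming UTF-8."""
    return raw.decode("utf-8")

# Pure ASCII data decodes fine, since UTF-8 is ASCII-transparent.
ascii_value = decode_utf8(b"hello")

# Data from a UTF8 database decodes fine as well.
utf8_value = decode_utf8(u"caf\u00e9".encode("utf-8"))

# Data from a LATIN1 database breaks: the byte 0xE9 is not valid UTF-8 here.
latin1_raw = u"caf\u00e9".encode("latin-1")
try:
    decode_utf8(latin1_raw)
    latin1_failed = False
except UnicodeDecodeError:
    latin1_failed = True
```

So the proof of concept would fail loudly rather than silently return wrong characters in this case, which seems acceptable for a first step.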
The proper solution, which can be implemented in a next step, would be
to:
1. get the client encoding using PQparameterStatus (maybe also make this
   available to users as a property of the connection),
2. add an internal mapping from Postgres encoding names to Python
   encoding names (since some need translation),
3. use this for decoding byte strings in Python 3.
As far as I can see, this is also how psycopg2 does it.
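The three steps could look roughly like this when sketched in Python. The mapping table is only a small illustrative subset, the names PG_TO_PY_ENCODING, python_encoding and decode_value are made up for this example, and treating SQL_ASCII as plain ASCII is an assumption. The client encoding is passed in as a string here; in the C code it would come from PQparameterStatus(conn, "client_encoding").

```python
import codecs

# Illustrative subset only: Postgres encoding name -> Python codec name.
PG_TO_PY_ENCODING = {
    "SQL_ASCII": "ascii",   # assumption: treat "no encoding" as plain ASCII
    "UTF8": "utf-8",
    "LATIN1": "iso8859-1",
    "LATIN9": "iso8859-15",
    "WIN1252": "cp1252",
    "EUC_JP": "euc_jp",
}

def python_encoding(pg_name):
    """Translate a Postgres encoding name into a Python codec name."""
    name = PG_TO_PY_ENCODING.get(pg_name.upper(), pg_name)
    # codecs.lookup raises LookupError if Python does not know the codec.
    return codecs.lookup(name).name

def decode_value(raw, client_encoding):
    """Decode a byte string from the server using the client encoding."""
    return raw.decode(python_encoding(client_encoding))
```

With this, decode_value(b"caf\xe9", "LATIN1") gives the expected four-character string instead of raising an error, and an unknown encoding name is reported as a LookupError instead of being guessed.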
In another step, we could even add configuration settings that allow
Python 2 users to receive unicode instead of str, or Python 3 users to
receive bytes instead of str, if they want to deviate from the default
(to get more convenience in Python 2 or more performance in Python 3).
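Just to make the idea concrete, such settings might behave like the following sketch. The function convert_text and the flags py2_unicode and py3_bytes are purely hypothetical names for this example, not an existing or planned PyGreSQL API.

```python
import sys

PY3 = sys.version_info[0] >= 3

def convert_text(raw, encoding="utf-8", py2_unicode=False, py3_bytes=False):
    """Convert a byte string from the server according to the settings.

    By default this returns the native str type in both Python versions.
    """
    if PY3:
        # Python 3: decode to str unless the user asked for raw bytes.
        return raw if py3_bytes else raw.decode(encoding)
    # Python 2: str is already a byte string, decode only on request.
    return raw.decode(encoding) if py2_unicode else raw
```

Skipping the decode step when py3_bytes is set is where the performance gain in Python 3 would come from.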
Sounds good?
-- Christoph
_______________________________________________
PyGreSQL mailing list
https://mail.vex.net/mailman/listinfo.cgi/pygresql