More thoughts:

Regarding the int/long and str/unicode dichotomy in Python 2/3, the least surprising and most convenient solution for PyGreSQL users is to return the native types int and str, in Python 2 (as before) and now also in Python 3.

The problem, of course, is that under the hood, int will actually be long and str will be unicode in Python 3. This is inconvenient for us PyGreSQL developers, since we need to add case distinctions to the PyGreSQL C code to reflect that, but it's better for things to be inconvenient for developers than for users. I also hope that these case distinctions can be minimized by using clever compatibility macros from py3c.
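
To make the user-facing side concrete, here is a rough Python model (not PyGreSQL's actual C code) of a typecast step that returns native types: integer columns become int, which in Python 3 has arbitrary precision like the old long, and text columns become str. The names cast_value, INT_OIDS and TEXT_OIDS are hypothetical; the OIDs are the built-in Postgres ones for int8/int2/int4 and text/bpchar/varchar.

```python
# Rough model of a typecast step returning native types (hypothetical names;
# the real logic would live in PyGreSQL's C extension).
INT_OIDS = {20, 21, 23}        # int8, int2, int4
TEXT_OIDS = {25, 1042, 1043}   # text, bpchar, varchar


def cast_value(raw: bytes, type_oid: int, encoding: str = "utf-8"):
    """Convert a raw text-format value from the server to a native type."""
    if type_oid in INT_OIDS:
        # int() accepts ASCII digits given as bytes; Python 3 int never overflows
        return int(raw)
    if type_oid in TEXT_OIDS:
        # str is unicode in Python 3, so the bytes must be decoded
        return raw.decode(encoding)
    return raw  # other types are left as bytes in this sketch


big = cast_value(b"9223372036854775807", 20)  # max int8 value
name = cast_value(b"PyGreSQL", 25)
print(type(big).__name__, type(name).__name__)  # int str
```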

The major difficulty is that str in Python 3 is actually unicode, while Postgres continues to provide only byte strings. So we need to decode the byte strings, and the question is which encoding to use.
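
A small Python illustration of why the choice of encoding matters: the same byte sequence from the server yields different str values depending on the codec used to decode it. The example string is arbitrary.

```python
# The same bytes decode to different text depending on the codec chosen.
raw = "é".encode("utf-8")          # b'\xc3\xa9', as a UTF8 server would send it

as_utf8 = raw.decode("utf-8")      # 'é'  (correct)
as_latin1 = raw.decode("latin-1")  # 'Ã©' (mojibake: wrong codec)
```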

My suggestion is that for a first proof of concept, we should hardcode the encoding to UTF8. This would work for UTF8 and ASCII databases since UTF8 is ASCII-transparent, and thus cover the majority of cases, but of course it would break for other encodings like LATIN1.
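
The proof of concept boils down to something like the following Python sketch (decode_text is a hypothetical name). ASCII and UTF8 data decode fine; bytes from a LATIN1 database usually raise UnicodeDecodeError as soon as they contain non-ASCII characters, or, worse, occasionally decode silently to the wrong characters.

```python
def decode_text(raw: bytes) -> str:
    """Proof-of-concept decoder that always assumes UTF-8 (hypothetical helper)."""
    return raw.decode("utf-8")


ascii_text = decode_text(b"hello")                # 'hello': ASCII is valid UTF-8
utf8_text = decode_text("grüß".encode("utf-8"))   # 'grüß': UTF8 database works

latin1_bytes = "café".encode("latin-1")           # b'caf\xe9' from a LATIN1 database
try:
    decode_text(latin1_bytes)
except UnicodeDecodeError as exc:
    print("LATIN1 data breaks:", exc.reason)
```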

The proper solution, which could be implemented as a next step, would be to:

1. get the client encoding using PQparameterStatus (and maybe also make it available to users as a property of the connection),
2. add an internal mapping from Postgres encoding names to Python encoding names (since some need translation), and
3. use this mapping to decode byte strings in Python 3.

As far as I can see, this is also how psycopg2 does it.
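
Steps 2 and 3 might look roughly like this in Python. The mapping is a partial, hypothetical table, and the encoding name is passed in as a plain string here, whereas in the C module it would come from PQparameterStatus(conn, "client_encoding"). Mapping SQL_ASCII to ascii is just one possible choice, not a settled decision.

```python
import codecs

# Partial, hypothetical mapping from Postgres encoding names (as reported
# by the client_encoding parameter) to Python codec names.
PG_TO_PY_ENCODING = {
    "SQL_ASCII": "ascii",   # assumption: treat SQL_ASCII as plain ASCII
    "UTF8": "utf-8",
    "LATIN1": "latin-1",
    "LATIN2": "iso8859-2",
    "LATIN9": "iso8859-15",
    "WIN1251": "cp1251",
    "WIN1252": "cp1252",
    "KOI8R": "koi8-r",
    "EUC_JP": "euc-jp",
    "SJIS": "shift-jis",
}


def python_encoding(pg_encoding: str) -> str:
    """Translate a Postgres encoding name into a Python codec name."""
    name = PG_TO_PY_ENCODING.get(pg_encoding.upper(), pg_encoding)
    # codecs.lookup validates the name and raises LookupError if unknown
    return codecs.lookup(name).name


def decode_text(raw: bytes, pg_encoding: str) -> str:
    """Decode a raw value using the connection's client encoding."""
    return raw.decode(python_encoding(pg_encoding))


from_latin1 = decode_text(b"caf\xe9", "LATIN1")     # 'café'
from_utf8 = decode_text(b"caf\xc3\xa9", "UTF8")     # 'café'
```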

In a further step, we could even add configuration settings that allow Python 2 users to receive unicode instead of str, or Python 3 users to receive bytes instead of str, if they want to deviate from the default (to get more convenience in Python 2 or more performance in Python 3).
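
One way such a setting could look, sketched in Python 3 with hypothetical names (TextCastSettings, cast_text); nothing here is an existing PyGreSQL option. With decoding switched off, text values are handed back as raw bytes, which saves the decoding work per value. A Python 2 "return unicode" switch would be the mirror image.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class TextCastSettings:
    """Hypothetical per-connection options for text columns."""
    decode: bool = True       # False: return raw bytes (faster in Python 3)
    encoding: str = "utf-8"   # codec used when decode is True


def cast_text(raw: bytes, settings: TextCastSettings) -> Union[str, bytes]:
    """Return str by default, or the undecoded bytes if decoding is disabled."""
    if settings.decode:
        return raw.decode(settings.encoding)
    return raw


as_text = cast_text(b"abc", TextCastSettings())                # 'abc'
as_bytes = cast_text(b"abc", TextCastSettings(decode=False))   # b'abc'
```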

Sounds good?

-- Christoph

_______________________________________________
PyGreSQL mailing list
[email protected]
https://mail.vex.net/mailman/listinfo.cgi/pygresql
