I've uncovered what seems to me to be a problem with Python Unicode string objects passed to extension modules. Or perhaps it's revealing a misunderstanding on my part :-) So I would like to get some clarification.
Extension modules written in C receive strings from Python via the PyArg_ParseTuple family of functions. Most extension modules use the 's' or 's#' format parameter, and many C libraries on Linux expect UTF-8. When passed a Unicode object, the 's' format encodes the string using the default encoding, which is immutably set to 'ascii' in site.py. Thus a binding that uses the 's' format in PyArg_ParseTuple for a C library expecting UTF-8 will get an encoding error when passed a Unicode string containing any code point outside the ASCII range.

Now my questions:

* Is the use of the 's' or 's#' format parameter in an extension binding expecting UTF-8 fundamentally broken and not expected to work? Should the binding instead use a format conversion which specifies the desired encoding, e.g. 'es' or 'es#'?

* Extension modules could successfully use the 's' or 's#' format conversion in a UTF-8 environment if the default encoding were UTF-8. Changing the default encoding to UTF-8 would in one easy stroke "fix" most extension modules, right? Why is the default encoding 'ascii' in UTF-8 environments, and why is it prohibited from being changed from 'ascii'?

* Did Python 2.5 introduce anything which now makes this issue visible, whereas before it was masked by some other behavior?

Summary: Python programs which use Unicode string objects for their i18n, and which "link" to C libraries expecting UTF-8 through a CPython binding that only uses the 's' or 's#' formats, often seem to fail with encoding errors. However, I have yet to see a CPython binding which does explicitly define its encoding requirements. This suggests to me that either I do not understand the issue in its entirety, or many CPython bindings in Linux UTF-8 environments are broken with respect to their i18n handling and the problem is currently not addressed.
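To make the failure concrete, here is a minimal pure-Python sketch of what I believe is going on. It assumes that the 's' format behaves like an encode with the default encoding ('ascii'), and that 'es' with an explicit "utf-8" argument behaves like an encode with that encoding. The helper names are hypothetical and only stand in for the two conversions; they are not part of any real binding or of the C API.

```python
# -*- coding: utf-8 -*-
# Sketch only: these helpers are hypothetical stand-ins for the conversions
# performed inside PyArg_ParseTuple, written under the assumptions above.

def encode_like_s(text, default_encoding="ascii"):
    """Assumed behaviour of 's': encode with the default encoding."""
    return text.encode(default_encoding)


def encode_like_es(text, encoding="utf-8"):
    """Assumed behaviour of 'es': encode with an explicitly named encoding."""
    return text.encode(encoding)


# One code point outside the ASCII range (U+00E9) is enough to trigger it.
sample = u"caf\u00e9"

try:
    encode_like_s(sample)
    default_failed = False
except UnicodeEncodeError:
    # The same kind of error a binding using 's' would surface to the caller.
    default_failed = True

# With an explicit encoding the C library would receive valid UTF-8 bytes.
utf8_bytes = encode_like_es(sample)
```

Pure ASCII text goes through both paths unchanged, which may be why bindings using 's' appear to work until the first non-ASCII string arrives.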
-- John Dennis <[EMAIL PROTECTED]>