[Phillip]
>> WSGI already copes, actually.  Note that Jython and IronPython have
>> this issue today, and see:
>>
>> http://www.python.org/dev/peps/pep-0333/#unicode-issues

[James]
> It would seem very odd, however, for WSGI/python3 to use
> strings-restricted-to-0xFF for network I/O while everywhere else in
> python3 is going to use bytes for the same purpose.

I think it's worth pointing out that the reason for the current
restriction to iso-8859-1 is *because* python did not have a bytes type
at the time the WSGI spec was drawn up. IIRC, the bytes type had not yet
even been proposed for Py3K. Cpython effectively held all byte sequences
as strings, a paradigm which is (still) followed by jython (not sure
about ironpython).

The restriction to iso-8859-1 is really a distraction; iso-8859-1 is
used simply as an identity encoding that also enforces that all "bytes"
in the string have a value from 0x00 to 0xff, so that they are suitable
for byte-oriented IO.

So, in output terms at least, WSGI *is* a byte-oriented protocol. The
problem is that python-the-language didn't have support for bytes at the
time WSGI was designed.

[James]
> You'd have to modify your app
> to call write(unicodetext.encode('utf-8').decode('latin-1')) or so....

Did you mean: write(unicodetext.encode('utf-8').encode('latin-1'))?

Either way, the second encode is not required;
write(unicodetext.encode('utf-8')) is sufficient, since it will generate
a byte sequence (string) which will (actually "should": see the [*] note
below) pass the following test.

    try:
        wsgi_response_data.encode('iso-8859-1')
    except UnicodeError:
        # Illegal WSGI response data!
        raise

On a side note, Philip Jenvey's excellent rework of the jython IO
subsystem to use java.nio is fundamentally byte oriented:

http://www.nabble.com/fileno-support-is-not-in-jython.-Reason--t4750734.html
http://fisheye3.cenqua.com/browse/jython/trunk/jython/src/org/python/core/io

That is because it is based on the new IO design for Python 3K, as
described in PEP 3116:

http://www.python.org/dev/peps/pep-3116/

Regards,

Alan.

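To make the "identity encoding" point concrete, here is a rough sketch in Python 3 terms. The helper names bytes_to_native and native_to_bytes are made up for illustration; they are not part of the WSGI spec or of any library. The sketch assumes only that iso-8859-1 maps byte N to code point N, which is what makes it usable as a carrier for raw bytes.

```python
# Sketch only: helper names are hypothetical, not part of the WSGI spec.

def bytes_to_native(data: bytes) -> str:
    """Carry raw bytes in a str: byte N becomes code point N."""
    return data.decode("iso-8859-1")


def native_to_bytes(text: str) -> bytes:
    """Recover the raw bytes; raises UnicodeEncodeError if any code point
    is above 0xFF, i.e. the string is not legal byte-carrying data."""
    return text.encode("iso-8859-1")


# Every possible byte value survives the round trip unchanged.
all_bytes = bytes(range(256))
assert native_to_bytes(bytes_to_native(all_bytes)) == all_bytes

# Application text is encoded once (here to UTF-8); the result is
# already a byte sequence and needs no second encode.
text = "interferon-gamma (IFN-\u03b3) responses in cattle"
wire = text.encode("utf-8")
assert native_to_bytes(bytes_to_native(wire)) == wire

# The unencoded text is not legal: U+03B3 does not fit in a byte.
try:
    native_to_bytes(text)
    legal = True
except UnicodeEncodeError:
    legal = False
print(legal)
```

The final check plays the same role as the try/except test on wsgi_response_data: text that has not been encoded to bytes yet is rejected, while anything already encoded passes.
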
[*] Although I notice that cpython 2.5, for a reason I don't fully
understand, fails this particular encoding sequence. (Maybe it's to do
with the possibility that the result of an encode operation is no longer
an encodable string?)

Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> response = u"interferon-gamma (IFN-\u03b3) responses in cattle"
>>> response.encode('utf-8').encode('latin-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 22: ordinal not in range(128)
>>>

Meaning that to enforce the WSGI iso-8859-1 convention on cpython 2.5,
you would have to carry out this rigmarole

>>> response.encode('utf-8').decode('latin-1').encode('latin-1')
'interferon-gamma (IFN-\xce\xb3) responses in cattle'
>>>

Perhaps this behaviour is an artifact of the cpython implementation?

Whereas jython passes it just fine (and correctly, IMHO)

Jython 2.2.1 on java1.4.2_15
Type "copyright", "credits" or "license" for more information.
>>> response = u"interferon-gamma (IFN-\u03b3) responses in cattle"
>>> response.encode('utf-8')
'interferon-gamma (IFN-\xCE\xB3) responses in cattle'
>>> response.encode('utf-8').encode('latin-1')
'interferon-gamma (IFN-\xCE\xB3) responses in cattle'
>>>
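
A note on the cpython 2.5 traceback: in CPython 2.x, calling .encode() on a byte string first decodes it implicitly with the default codec (ASCII) and only then applies the requested encoding, which is why the error is a UnicodeDecodeError from the 'ascii' codec and not an encode error. The sketch below imitates that two-step behaviour in Python 3, where no such implicit step exists. py2_style_encode is a hypothetical stand-in written for this illustration, and it says nothing about why jython behaves differently.

```python
# Sketch only: py2_style_encode is a hypothetical stand-in written for
# Python 3; it is not a real library function.

def py2_style_encode(byte_string: bytes, encoding: str) -> bytes:
    """Mimic calling .encode() on a CPython 2.x byte string: an implicit
    decode with the default codec (ASCII), followed by the encode."""
    return byte_string.decode("ascii").encode(encoding)


response = "interferon-gamma (IFN-\u03b3) responses in cattle"
utf8_bytes = response.encode("utf-8")

# The implicit ASCII decode is the step that fails, not the latin-1 encode.
try:
    py2_style_encode(utf8_bytes, "latin-1")
    failure = None
except UnicodeDecodeError as exc:
    failure = (exc.encoding, exc.start, exc.object[exc.start])

# Decoding explicitly with latin-1 first sidesteps the ASCII step.
round_tripped = utf8_bytes.decode("latin-1").encode("latin-1")
assert round_tripped == utf8_bytes
print(failure)
```

The failing position and byte correspond to the first byte of the UTF-8 encoded gamma, matching the traceback above, and the explicit decode('latin-1').encode('latin-1') round trip returns the UTF-8 bytes unchanged.
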