At 09:51 PM 1/4/2011 +1100, Graham Dumpleton wrote:
Add another point. FWIW, these are coming up because of questions being asked on the python-dev IRC channel about PEP 3333.

The issue, as it came down to, was that the PEP may not be clear enough in explaining that where str is unicode, something like PATH_INFO, although unicode, is actually bytes decoded as ISO-8859-1, and so needs to be re-encoded and then decoded in the required charset before use. They were thinking that because it was unicode already they could use it as is and not need to do anything. I.e., they didn't realise that they need to do:

    path_info = environ.get('PATH_INFO', '')
    path_info = path_info.encode('ISO-8859-1').decode('UTF-8')

for example, to get it interpreted as UTF-8 first. They were simply looking at concatenating new URL bits to the ISO-8859-1 variant from other unicode strings that weren't bytes represented as ISO-8859-1.

In Python 2.X it was obvious that since it wasn't unicode you had to decode it, but confusion may arise for Python 3.X if this requirement is not explicitly spelled out with a code example like the one above. We all may see it as obvious, and yes, perhaps it could be covered in separate articles or commentaries by people, but given this person was new to it and was confused, maybe it deserves more explanation in the PEP itself.
It would be really awesome if somebody would write separate Application Authors' and Middleware Authors' Guides to WSGI. Unlike server authors, application and middleware authors don't need to know absolutely everything in the PEP.
It could also be that the PEP covers it adequately already. I am too tired to read through it again right now.
It's pretty prominently stated early on that NO strings in the spec are really unicode, they're just bytes packed into unicode objects.
Obviously, no matter how prominently this is stated, some people will still make this mistake, but if desired, we could always put some additional info near the environ part of the spec for clarification.
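As a rough idea of what such a clarification could show, here is a minimal sketch of the round trip in both directions on Python 3. The helper names native_path_info and wsgi_carrier are hypothetical and do not come from the PEP or from wsgiref, and UTF-8 is only an assumed application charset:

```python
def native_path_info(environ, charset="utf-8"):
    """Return PATH_INFO as real text (hypothetical helper, not a WSGI API)."""
    raw = environ.get("PATH_INFO", "")
    # The str holds one code point per original byte, so encoding as
    # ISO-8859-1 recovers the bytes; then decode in the assumed charset.
    return raw.encode("iso-8859-1").decode(charset)


def wsgi_carrier(text, charset="utf-8"):
    """Pack real text back into a bytes-in-unicode str (hypothetical helper)."""
    return text.encode(charset).decode("iso-8859-1")


# What a server would hand over after receiving the UTF-8 bytes for "/café".
environ = {"PATH_INFO": "/café".encode("utf-8").decode("iso-8859-1")}

path = native_path_info(environ)   # real text, safe to combine with other text
new_path = path + "/menú"          # concatenate only after re-decoding
carrier = wsgi_carrier(new_path)   # back to the form WSGI strings use
```

Concatenating "/menú" directly onto environ["PATH_INFO"] would mix real text with byte-carrier text, which is the mistake described above.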
(It occurs to me in retrospect that I should probably have updated wsgiref in the stdlib to check the bytesy-ness of strings used to create Header objects. Too late for 3.2, though.)
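For illustration only, a check of that kind could look roughly like the following. This is a sketch under the assumption that "bytesy" means every code point fits in one ISO-8859-1 byte; is_bytes_like_str is a hypothetical name, and this is not what wsgiref actually does:

```python
def is_bytes_like_str(value):
    """Return True if value is a str whose code points all fit in one byte.

    Hypothetical helper: such a string can be encoded as ISO-8859-1
    without loss, which is the property assumed here for WSGI strings.
    """
    if not isinstance(value, str):
        return False
    try:
        value.encode("iso-8859-1")
    except UnicodeEncodeError:
        # At least one code point is above U+00FF, so this is real text
        # that was never packed into the byte-carrier form.
        return False
    return True
```

A header value containing a character such as the euro sign would fail this check, while plain ASCII values like "text/html" would pass.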
