New submission from Nick Coghlan:

The WSGI 1.0.1 standard (PEP 3333) mandates that binary data be decoded as latin-1 text: http://www.python.org/dev/peps/pep-3333/#unicode-issues
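
To make the effect of that rule concrete, here is a minimal sketch of what happens to a non-ASCII header value. The names `raw_header` and `environ_value` are hypothetical, chosen only for this illustration, and the sketch assumes the client sent UTF-8 encoded bytes:

```python
# Hypothetical header value as a client might send it: "café" encoded as UTF-8.
raw_header = "caf\u00e9".encode("utf-8")      # b'caf\xc3\xa9'

# The server decodes the raw bytes as latin-1 before handing them to the
# application, so each byte becomes exactly one code point.
environ_value = raw_header.decode("latin-1")

# The two UTF-8 bytes of "é" show up as two separate characters.
assert environ_value == "caf\u00c3\u00a9"

# latin-1 maps every byte value to one code point, so the original bytes
# can always be recovered without loss.
assert environ_value.encode("latin-1") == raw_header
```

The string the application receives is not the text the client meant, but the original bytes remain recoverable from it.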
This means that many WSGI headers will in fact contain *improperly encoded data*. Developers working directly with WSGI (rather than using a WSGI framework like Django, Flask or Pyramid) need to convert those strings back to bytes and decode them properly before passing them on to user applications.

I suggest adding a simple "fix_encoding" function to wsgiref that covers this:

    def fix_encoding(data, encoding, errors="surrogateescape"):
        return data.encode("latin-1").decode(encoding, errors)

The primary intended benefit is to make WSGI related code more self-documenting. Compare the proposal with the status quo:

    data = wsgiref.fix_encoding(data, "utf-8")

    data = data.encode("latin-1").decode("utf-8", "surrogateescape")

The proposal hides the mechanical details of what is going on in order to emphasise *why* the change is needed, and provides you with a name to go look up if you want to learn more. The latter just looks nonsensical unless you're already familiar with this particular corner of the WSGI specification.

----------
messages: 225814
nosy: ncoghlan
priority: normal
severity: normal
status: open
title: Add wsgiref.fix_encoding
type: enhancement
versions: Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22264>
_______________________________________