New submission from Nick Coghlan:
The WSGI 1.0.1 standard (PEP 3333) mandates that binary data be decoded as latin-1 text:
http://www.python.org/dev/peps/pep-3333/#unicode-issues
This means that many WSGI headers will in fact contain *improperly encoded
data*. Developers working directly with WSGI (rather than using a WSGI
framework like Django, Flask or Pyramid) need to convert those strings back to
bytes and decode them properly before passing them on to user applications.
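A small simulation of the problem follows. It does not involve a real server. It assumes a server that decodes the raw request bytes as latin-1, as PEP 3333 describes, and uses the illustrative names raw_path and path_info. Because latin-1 maps all 256 byte values one-to-one onto code points, the decoding can always be undone exactly:

```python
# Simulate what a PEP 3333 server does with raw request bytes: it decodes
# them as latin-1, so each byte becomes exactly one code point (U+0000-U+00FF).
raw_path = "/caf\u00e9".encode("utf-8")   # bytes as received: b"/caf\xc3\xa9"
path_info = raw_path.decode("latin-1")    # the native string the app sees
print(path_info)                          # prints /cafÃ© (improperly decoded)

# latin-1 round-trips every possible byte value losslessly ...
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes

# ... so re-encoding recovers the original bytes, which can then be
# decoded with the encoding the client actually used.
recovered = path_info.encode("latin-1").decode("utf-8")
print(recovered)                          # prints /café
```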
I suggest adding a simple "fix_encoding" function to wsgiref that covers this:
    def fix_encoding(data, encoding, errors="surrogateescape"):
        return data.encode("latin-1").decode(encoding, errors)
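The sketch below shows how the proposed helper behaves. fix_encoding is defined locally here because it is only a proposal, not an existing wsgiref function. The default errors="surrogateescape" means that bytes which are not valid in the target encoding do not raise. They become lone surrogates, so the original bytes can still be recovered later:

```python
def fix_encoding(data, encoding, errors="surrogateescape"):
    """Re-decode a WSGI native string that the server decoded as latin-1.

    Local sketch of the helper proposed in this issue (not a wsgiref API).
    """
    # Undo the latin-1 decoding to get the original bytes back, then decode
    # those bytes with the encoding the client actually used.
    return data.encode("latin-1").decode(encoding, errors)


# Valid UTF-8 is recovered cleanly.
print(fix_encoding("/caf\u00c3\u00a9", "utf-8"))  # prints /café

# Invalid UTF-8 (a lone 0xff byte) does not raise: surrogateescape maps it
# to the lone surrogate U+DCFF, and encoding with the same error handler
# gives back the original bytes.
broken = fix_encoding("/\u00ff", "utf-8")
assert broken == "/\udcff"
assert broken.encode("utf-8", "surrogateescape") == b"/\xff"
```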
The primary intended benefit is to make WSGI-related code more self-documenting.
Compare the proposal with the status quo:
    data = wsgiref.fix_encoding(data, "utf-8")
    data = data.encode("latin-1").decode("utf-8", "surrogateescape")
The proposal hides the mechanical details of what is going on in order to
emphasise *why* the change is needed, and provides you with a name to go look
up if you want to learn more.
The latter just looks nonsensical unless you're already familiar with this
particular corner of the WSGI specification.
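The following sketch shows what the self-documenting form could look like inside a bare WSGI application. It makes a few assumptions. The client is assumed to send UTF-8 paths. fix_encoding is again a local stand-in for the proposed helper, not an existing wsgiref function. The application is driven directly with a hand-built environ rather than through a real server:

```python
def fix_encoding(data, encoding, errors="surrogateescape"):
    # Local stand-in for the proposed helper (not part of wsgiref).
    return data.encode("latin-1").decode(encoding, errors)


def app(environ, start_response):
    # PATH_INFO arrives as a latin-1-decoded native string (PEP 3333);
    # assume the client sent UTF-8 and recover the real text.
    path = fix_encoding(environ.get("PATH_INFO", ""), "utf-8")
    body = "You asked for {}".format(path).encode("utf-8", "surrogateescape")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]


# Call the application directly, recording what it passes to start_response.
captured = {}

def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

environ = {"PATH_INFO": "/caf\u00e9".encode("utf-8").decode("latin-1")}
result = b"".join(app(environ, start_response))
print(result.decode("utf-8"))  # prints You asked for /café
```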
----------
messages: 225814
nosy: ncoghlan
priority: normal
severity: normal
status: open
title: Add wsgiref.fix_encoding
type: enhancement
versions: Python 3.5
_______________________________________
Python tracker
<http://bugs.python.org/issue22264>
_______________________________________