New submission from Nick Coghlan:

The WSGI 1.0.1 standard (PEP 3333) mandates that binary data be decoded as latin-1 text:
http://www.python.org/dev/peps/pep-3333/#unicode-issues

This means that many WSGI headers will in fact contain *improperly encoded 
data*. Developers working directly with WSGI (rather than using a WSGI 
framework like Django, Flask or Pyramid) need to convert those strings back to 
bytes and decode them properly before passing them on to user applications.
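Here is a minimal illustration of the problem. It assumes the client sent the UTF-8 encoded path `/café` (a sample value chosen for illustration). A PEP 3333 server hands the application those bytes decoded as latin-1, which produces mojibake. Latin-1 maps each of the 256 byte values to exactly one code point, so re-encoding as latin-1 recovers the original bytes losslessly.

```python
# Bytes as they might arrive on the wire (assumed to be UTF-8 from the client)
raw = "/café".encode("utf-8")          # b'/caf\xc3\xa9'

# What a PEP 3333 server stores in the environ: the bytes decoded as latin-1
native = raw.decode("latin-1")
print(native)                           # /cafÃ©  (improperly encoded text)

# latin-1 is a lossless byte <-> code point mapping, so this round-trips exactly
assert native.encode("latin-1") == raw

# Decoding the recovered bytes with the real encoding gives the intended text
print(native.encode("latin-1").decode("utf-8"))   # /café
```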

I suggest adding a simple "fix_encoding" function to wsgiref that covers this:

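    def fix_encoding(data, encoding, errors="surrogateescape"):
        # latin-1 maps each code point U+0000-U+00FF back to one byte, so this
        # recovers the original bytes, which are then decoded correctly
        return data.encode("latin-1").decode(encoding, errors)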
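Below is a short sketch of the proposed function in use. The function body is the one proposed above, and the sample values are assumptions for illustration. The default `errors="surrogateescape"` means that bytes which are invalid in the target encoding don't raise `UnicodeDecodeError`. Instead, they are carried through as lone surrogates, so the original bytes can still be recovered later.

```python
def fix_encoding(data, encoding, errors="surrogateescape"):
    # Recover the original bytes (lossless under latin-1), then decode them
    # with the encoding the application actually expects
    return data.encode("latin-1").decode(encoding, errors)

# A valid UTF-8 value that arrived as latin-1 text is repaired
assert fix_encoding("/caf\u00c3\u00a9", "utf-8") == "/café"

# An invalid UTF-8 byte (0xff) does not raise: surrogateescape maps it to the
# lone surrogate U+DCFF, and encoding with the same handler restores the byte
broken = fix_encoding("/caf\u00ff", "utf-8")
assert broken == "/caf\udcff"
assert broken.encode("utf-8", "surrogateescape") == b"/caf\xff"
```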

The primary intended benefit is to make WSGI-related code more self-documenting.
Compare the proposal with the status quo:

    # Proposed:
    data = wsgiref.fix_encoding(data, "utf-8")
    # Status quo:
    data = data.encode("latin-1").decode("utf-8", "surrogateescape")

The proposal hides the mechanical details of what is going on in order to 
emphasise *why* the change is needed, and provides you with a name to go look 
up if you want to learn more.

The latter just looks nonsensical unless you're already familiar with this 
particular corner of the WSGI specification.
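The following is a minimal sketch of how the helper might read inside a bare WSGI application. It assumes the client encoded the request path as UTF-8. `fix_encoding` is defined locally because it is only a proposal and is not part of wsgiref. The application `app` and the sample path are hypothetical. `wsgiref.util.setup_testing_defaults` is an existing standard-library helper, used here only to fill in the remaining environ keys.

```python
from wsgiref.util import setup_testing_defaults

def fix_encoding(data, encoding, errors="surrogateescape"):
    # Local stand-in for the proposed helper (hypothetical, not in wsgiref)
    return data.encode("latin-1").decode(encoding, errors)

def app(environ, start_response):
    # PATH_INFO is a latin-1 "native string"; assume the client used UTF-8
    path = fix_encoding(environ["PATH_INFO"], "utf-8")
    body = "You asked for {!r}\n".format(path).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]

# Build a test environ, setting PATH_INFO as a PEP 3333 server would:
# the raw request bytes decoded as latin-1
environ = {"PATH_INFO": "/café".encode("utf-8").decode("latin-1")}
setup_testing_defaults(environ)

captured = []

def start_response(status, headers):
    # Minimal stand-in for the server's start_response callable
    captured.append(status)

body = b"".join(app(environ, start_response))
print(captured[0])
print(body.decode("utf-8"))
```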

----------
messages: 225814
nosy: ncoghlan
priority: normal
severity: normal
status: open
title: Add wsgiref.fix_encoding
type: enhancement
versions: Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22264>
_______________________________________