[Web-SIG] Backup plan: WSGI 1 Addenda and wsgiref update for Py3

P.J. Eby Tue, 21 Sep 2010 09:14:31 -0700

While the Web-SIG is trying to hash out PEP 444, I thought it would be a good idea to have a backup plan that would allow the Python 3 stdlib to move forward, without needing a major new spec to settle out implementation questions.

After all, even if PEP 333 is ultimately replaced by PEP 444, it's probably a good idea to have *some* sort of WSGI 1-ish thing available on Python 3, with bytes/unicode and other matters settled.

In the past, I was waiting for some consensuses (consensi?) on Web-SIG about different approaches to Python 3, looking for some sort of definite, "yes, we all like this" response. However, I can see now that this just means it's my fault we don't have a spec yet. :-(

So, unless any last-minute showstopper rebuttals show up this week, I've decided to go ahead and officially bless nearly all of what Graham Dumpleton (who's not only the mod_wsgi author, but has put huge amounts of work into shepherding WSGI-on-Python3 proposals, WSGI amendments, etc.) has proposed, with a few minor exceptions.

In other words: almost none of the following is my own original work; it's like 90% Graham's. Any praise for this belongs to him; the only thing that belongs to me is the blame for not doing this sooner! (Sorry Graham. You asked me to do this ages ago, and you were right.)

Anyway, I'm posting this for comment to both Python-Dev and the Web-SIG. If you are commenting on the technical details of the amendments, please reply to the Web-SIG only. If you are commenting on the development agenda for wsgiref or other Python 3 library issues, please reply to Python-Dev only. That way, neither list will see off-topic discussions. Thanks!



The Plan
========

I plan to update the proposal below per comments and feedback during this week, then update PEP 333 itself over the weekend or early next week, followed by a code review of Python 3's wsgiref, and implementation of needed changes (such as recoding os.environ to latin1-captured bytes in the CGI handler).

To complete the changes, it is possible that I may need assistance from one or more developers who have more Python 3 experience. If, after reading the proposed changes to the spec, you would like to volunteer to help with updating wsgiref to match, please let me know!



The Proposal
============


Overview
--------

1. The primary purpose of this update is to provide a uniform porting pattern for moving Python 2 WSGI code to Python 3, meaning a pattern of changes that can be mechanically applied to as little code as practical, while still keeping the WSGI spec easy to programmatically validate (e.g. via ``wsgiref.validate``).


The Python 3 specific changes are to use:

* ``bytes`` for I/O streams in both directions
* ``str`` for environ keys and values
* ``bytes`` for arguments to start_response() and write()
* text stream for wsgi.errors

In other words, "strings in, bytes out" for headers, bytes for bodies.

In general, only changes that don't break Python 2 WSGI implementations are allowed. The changes should also not break mod_wsgi on Python 3, but may make some Python 3 WSGI applications non-compliant, despite continuing to function on mod_wsgi.

This is because mod_wsgi allows applications to output string headers and bodies, but I am ruling that option out because it forces every piece of middleware to be tested with arbitrary combinations of strings and bytes in order to test compliance. If you want your application to output strings rather than bytes, you can always use a decorator to do that. (And a sample one could be provided in wsgiref.)
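Such a decorator might look something like the sketch below. The name ``bytes_output`` and its ``charset`` parameter are hypothetical, and encoding the status and headers as latin-1 is an assumption made to fit the "bytes out" rule proposed here; none of this is taken from wsgiref.

```python
import functools


def bytes_output(charset="utf-8"):
    """Hypothetical decorator: the wrapped application speaks ``str``,
    and everything it outputs is converted to bytes for the server."""

    def decorator(app):
        @functools.wraps(app)
        def wrapper(environ, start_response):
            def text_start_response(status, headers, exc_info=None):
                # Assumption: status and header text is encoded as latin-1.
                write = start_response(
                    status.encode("latin-1"),
                    [(name.encode("latin-1"), value.encode("latin-1"))
                     for name, value in headers],
                    exc_info,
                )
                # write() takes bytes too, so hand back a str-accepting wrapper.
                return lambda text: write(text.encode(charset))

            result = app(environ, text_start_response)
            try:
                for chunk in result:
                    yield chunk.encode(charset)
            finally:
                # Keep the close() contract of the wrapped iterable intact.
                if hasattr(result, "close"):
                    result.close()

        return wrapper

    return decorator
```

An application would then be written entirely with ``str`` and wrapped with ``@bytes_output()``; middleware further up the stack only ever sees bytes, which is the point of ruling out mixed output.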

2. The secondary purpose of the update is to address some long-standing open issues documented here:


   http://www.wsgi.org/wsgi/Amendments_1.0

As with the Python 3 changes, only changes that don't retroactively invalidate existing implementations are allowed.

3. There is no tertiary purpose. ;-) (By which I mean, all other kinds of changes are out-of-scope for this update.)

4. The section below labeled "A Note On String Types" is proposed for verbatim addition to the "Specification Overview" section in the PEP; the other sections below describe changes to be made inline at the appropriate part of the spec, and changes that were proposed but are rejected for inclusion in this amendment.



A Note On String Types
----------------------

In general, HTTP deals with bytes, which means that this specification is mostly about handling bytes.

However, the content of those bytes often has some kind of textual interpretation, and in Python, strings are the most convenient way to handle text.

But in many Python versions and implementations, strings are Unicode, rather than bytes. This requires a careful balance between a usable API and correct translations between bytes and text in the context of HTTP... especially to support porting code between Python implementations with different ``str`` types.


WSGI therefore defines two kinds of "string":

* "Native" strings (which are always implemented using the type named ``str``)

* "Bytestrings" (which are implemented using the ``bytes`` type in Python 3, and ``str`` elsewhere)

So, even though HTTP is in some sense "really just bytes", there are many API conveniences to be had by using whatever Python's default ``str`` type is.

Do not be confused, however: even if Python's ``str`` is actually Unicode under the hood, the *content* of a native string is still restricted to bytes! See the section on `Unicode Issues`_ later in this document.

In short: where you see the word "string" in this document, it refers to a "native" string, i.e., an object of type ``str``, whether it is internally implemented as bytes or unicode. Where you see references to "bytestring", this should be read as "an object of type ``bytes`` under Python 3, or type ``str`` under Python 2".
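To make "restricted to bytes" concrete, here is a small Python 3 sketch. It assumes latin-1 is the encoding used to carry bytes inside a native string (in line with the "latin1-captured" wording used earlier for the CGI handler); the helper names ``bytes_to_native`` and ``native_to_text`` are made up for illustration.

```python
def bytes_to_native(raw):
    """What a Python 3 server might do with bytes taken off the wire.

    Assumption: latin-1, because it maps bytes 0-255 one-to-one onto
    code points 0-255 and so never loses information.
    """
    return raw.decode("latin-1")


def native_to_text(value, charset="utf-8"):
    """Recover the original bytes from a native string, then decode them
    with the charset the application actually expects."""
    raw = value.encode("latin-1")
    return raw.decode(charset)


# A UTF-8 encoded path as it might arrive in an HTTP request line.
wire = "/caf\u00e9".encode("utf-8")
path_info = bytes_to_native(wire)      # a str whose code points are all < 256
assert native_to_text(path_info) == "/caf\u00e9"
```

The intermediate ``path_info`` is a perfectly valid ``str``, but it is not yet meaningful text: each character stands for one byte, and only the application knows which charset turns those bytes into real characters.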



Clarifications (To be made in-line)
-----------------------------------

The following amendments are clarifications to parts of the existing spec that proved over the years to be ambiguous or insufficiently specified, as well as some attempts to correct practical errors.

(Note: many of these issues cannot be completely fixed in WSGI 1 without breaking existing implementations, and so the text below has notations such as "(MUST in WSGI 2)" to indicate where any replacement spec for WSGI 1 should strengthen them.)

* If an application returns a body iterator, a server (or middleware) MAY stop iterating over it and discard the remainder of the output, as long as it calls any close() method provided by the iterator. Applications returning a generator or other custom iterator SHOULD NOT assume that the entire iterator will be consumed. (This change makes it explicit that caching middleware or HEAD-processing servers can throw away the response body.)

* start_response() SHOULD (MUST in WSGI 2) check for errors in the status or headers at the time it's called, so that an error can be raised as close to the problem as possible.

* If start_response() raises an error when called normally (i.e. without exc_info), it SHOULD be an error to call it a second time without passing exc_info.

* The SERVER_PORT variable is of type str, just like any other CGI environ variable. (According to the WSGI wiki, "some implementations" expect it to be an integer, even though there is nothing in the WSGI spec that allows a CGI variable to be anything but a str.)

* A server SHOULD (MUST in WSGI 2) support the size hint argument to readline() on its wsgi.input stream.

* A server SHOULD (MUST in WSGI 2) return an empty bytestring from read() on wsgi.input to indicate an end-of-file condition. (In WSGI 2, language should be clarified to allow the input stream length and CONTENT_LENGTH to be out of sync, for reasons explained in Graham's blog post.)

* A server SHOULD (MUST in WSGI 2) allow read() to be called without an argument, and return the entire remaining contents of the stream.

* If an application provides a Content-Length header, the server SHOULD NOT (MUST NOT in WSGI 2) send more data to the client than was specified in that header, whether via write(), yielded body bytestrings, or via a wsgi.file_wrapper. (This rule applies to middleware as well.)


* wsgi.errors is a text stream accepting "native strings"
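As an illustration of the first clarification in the list above (discarding a body while still honoring close()), here is a rough sketch of HEAD-handling middleware. The name ``discard_body_for_head`` is hypothetical. Pulling a single item before stopping is a design choice of this sketch, made so that a generator-based application gets a chance to call start_response().

```python
def discard_body_for_head(app):
    """Hypothetical middleware: throw away the response body on HEAD."""

    def middleware(environ, start_response):
        result = app(environ, start_response)
        if environ.get("REQUEST_METHOD") != "HEAD":
            return result
        try:
            # Pull at most one item, so that an app written as a generator
            # runs far enough to call start_response(), then stop.
            for _ in result:
                break
        finally:
            # The remainder is discarded, but close() must still be called.
            close = getattr(result, "close", None)
            if close is not None:
                close()
        return []

    return middleware
```

An application wrapped this way cannot rely on code placed after its last ``yield`` ever running; any cleanup belongs in a ``finally`` block or a close() method.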

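The end-of-file rule for wsgi.input in the list above can be sketched from the application's side as follows. The helper name ``read_request_body`` and its ``chunk_size`` parameter are made up for illustration, and the loop assumes a server that really does return an empty bytestring at end-of-file; on a WSGI 1 server that does not, reading past CONTENT_LENGTH could block.

```python
import io


def read_request_body(environ, chunk_size=8192):
    """Read wsgi.input until read() returns an empty bytestring."""
    stream = environ["wsgi.input"]
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:           # b"" signals end-of-file
            break
        chunks.append(chunk)
    return b"".join(chunks)


# io.BytesIO stands in for a server-provided input stream.
environ = {"wsgi.input": io.BytesIO(b"name=value")}
assert read_request_body(environ, chunk_size=4) == b"name=value"
```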


Rejected Amendments
-------------------

* Manlio Perillo's suggestion to allow header specification to be delayed until the response iterator is producing non-empty output. This would've been a possible win for async WSGI, but could require substantial changes to existing servers.

