[Web-SIG] PEP 444

Alice Bevan-McGregor Sun, 21 Nov 2010 03:13:45 -0800

(A version of this is available at http://web-core.org/2.0/pep-0444/, where 
links are clickable and the code may be easier to read.)


PEP 444 is quite exciting to me.  So much so that I’ve spent the last few days 
writing a high-performance (C10K, 10K requests/sec) Py2.6+/3.1+ HTTP/1.1 server 
which implements much of the proposed standard.  The server is functional 
(minus web3.input at the time of this writing), but differs from PEP 444 in 
several ways.  It also adds several features I feel should be part of the spec.

Source for the server is available on GitHub:

        https://github.com/pulp/marrow.server.http

I made several notes about the PEP 444 specification while implementing the 
above, and have concerns about some implementation details:

First, async is poorly defined:

> If the origin server advertises that it has the web3.async capability, a Web3 
> application callable used by the server is permitted to return a callable 
> that accepts no arguments. When it does so, this callable is to be called 
> periodically by the origin server until it returns a non-None response, which 
> must be a normal Web3 response tuple.

Polling is not true async.  I believe that it should be up to the server to 
define how async is utilized, and that the specification should be clarified on 
this point.  (“Called periodically” is too vague.)  “Callable” should likely be 
redefined as “generator” (a callable that yields), as most applications need to 
hold on to state, and wrapping everything in functools.partial() is somewhat 
ugly.  Utilizing generators would improve support for existing Python async 
frameworks and allow four modes of operation: yield None (no response yet, keep 
waiting); yield response_tuple (standard response); return / raise 
StopIteration (close the async connection); and have the higher-level async 
framework send data back into the async callable.
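
A minimal sketch of the four generator modes listed above.  This is not taken 
from PEP 444 or marrow.server.http: the names slow_app and drive are 
hypothetical, and the list of events stands in for whatever a real server's 
I/O loop would deliver.

```python
def slow_app(environ):
    """Hypothetical async application written as a generator."""
    received = []
    # Mode 1: yield None while there is no response yet.
    # Mode 4: the value of the yield expression is data sent by the framework.
    while len(received) < 2:
        data = yield None
        if data is not None:
            received.append(data)
    # Mode 2: yield a normal (status, headers, body) response tuple.
    yield (b"200 OK", [(b"Content-Type", b"text/plain")], [b", ".join(received)])


def drive(app, environ, events):
    """Hypothetical server-side driver; `events` stands in for an I/O loop."""
    gen = app(environ)
    try:
        result = next(gen)            # run the application to its first yield
        for event in events:
            if result is not None:    # a response tuple was produced
                return result
            result = gen.send(event)  # pass data back into the generator
        return result                 # may still be None if events ran out
    except StopIteration:
        # Mode 3: the application returned, so close the async connection.
        return None
    finally:
        gen.close()


response = drive(slow_app, {}, [b"first", b"second"])
```

The driver never polls on a timer; it only resumes the generator when it has 
something to deliver, which is the distinction drawn above.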

Second, WSGI middleware, while impressive in capability, is somewhat 
heavyweight.  Heavily nesting function calls is wasteful of CPU and RAM, 
especially if the middleware decides it can’t operate, for example, when GZip 
compression disables itself for non-text/* MIME types.  The majority of WSGI 
middleware can, and probably should, be implemented as linear ingress or egress 
filters.  For example, on-disk static file serving could be an ingress filter, 
and GZip compression an egress filter.  m.s.http (marrow.server.http) supports 
this filtering and demonstrates one API for it.  I am also in the process of 
writing an example egress CompressionFilter.

An example filter API and its use (paraphrased from marrow.server.http):

> # No filters, near 0 overhead.
> for filter_ in ingress_filters:
>     # Can mutate the environment.
>     result = filter_(env)
>     
>     # Allow the filter to return a response rather than continuing.
>     if result:
>         # result is a status, headers, body_iter tuple
>         return result[0], result[1], result[2]
> 
> status, headers, body = application(env)
> 
> for filter_ in egress_filters:
>     # Can mutate the environment, status, headers, body, or
>     # return completely new status, headers, and body.
>     status, headers, body = filter_(env, status, headers, body)
> 
> return status, headers, body
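
For anyone who wants to run the paraphrase, here is a self-contained sketch of 
the same flow wrapped in a function.  The names handle, block_admin, 
add_example_header and app are hypothetical and are not the marrow.server.http 
API; bytestring values are assumed throughout.

```python
def handle(env, application, ingress_filters=(), egress_filters=()):
    """Run ingress filters, the application, then egress filters, in order."""
    for filter_ in ingress_filters:
        result = filter_(env)        # may mutate env
        if result:                   # a filter may answer the request itself
            return result
    status, headers, body = application(env)
    for filter_ in egress_filters:
        status, headers, body = filter_(env, status, headers, body)
    return status, headers, body


def block_admin(env):
    """Hypothetical ingress filter: refuse anything under /admin."""
    if env.get("PATH_INFO", b"").startswith(b"/admin"):
        return b"403 Forbidden", [(b"Content-Type", b"text/plain")], [b"Forbidden"]
    return None


def add_example_header(env, status, headers, body):
    """Hypothetical egress filter: append one header, leave the rest alone."""
    return status, headers + [(b"X-Example", b"1")], body


def app(env):
    return b"200 OK", [(b"Content-Type", b"text/plain")], [b"Hello"]


ok = handle({"PATH_INFO": b"/"}, app, [block_admin], [add_example_header])
denied = handle({"PATH_INFO": b"/admin"}, app, [block_admin], [add_example_header])
```

As in the paraphrase, a response returned by an ingress filter skips both the 
application and the egress filters.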

The environment has some minor issues.  I’ll write up my changes in RFC-style:

SERVER_NAME is REQUIRED and MUST contain the DNS name of the server OR virtual 
server name for the web server if available OR an empty bytestring if DNS 
resolution is unavailable.  SERVER_ADDR is REQUIRED and MUST contain the web 
server’s bound IP address.  URL reconstruction SHOULD use HTTP_HOST if 
available, SERVER_NAME if there is no HTTP_HOST, and fall back on SERVER_ADDR 
if SERVER_NAME is an empty bytestring.
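
The fallback order for URL reconstruction can be sketched as follows.  The 
function name url_host is hypothetical, and the sketch assumes native-string 
keys with bytestring values.

```python
def url_host(env):
    """Pick the host for URL reconstruction using the order proposed above."""
    host = env.get("HTTP_HOST")      # optional: only present if the client sent it
    if host:
        return host
    # SERVER_NAME and SERVER_ADDR are REQUIRED, so index them directly.
    # An empty SERVER_NAME (no DNS resolution) falls through to SERVER_ADDR.
    return env["SERVER_NAME"] or env["SERVER_ADDR"]


with_host = url_host({"HTTP_HOST": b"example.com",
                      "SERVER_NAME": b"web1", "SERVER_ADDR": b"10.0.0.5"})
no_dns = url_host({"SERVER_NAME": b"", "SERVER_ADDR": b"10.0.0.5"})
```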

CONTENT_LENGTH is REQUIRED and MUST be None if not defined by the client.  
Testing explicitly for None is more efficient than armoring against missing 
values; also, explicit is better than implicit.  (Paste’s WSGI1 server defines 
CONTENT_LENGTH as 0, but this implies the client explicitly declared it as 
zero, which is not the case.)
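
A short sketch of what the explicit None test looks like from the application 
side.  The helper name declared_length is hypothetical, and the value is 
assumed to arrive as a bytestring when the client supplied one.

```python
def declared_length(env):
    """Return the client-declared body length, or None if none was declared."""
    value = env["CONTENT_LENGTH"]    # REQUIRED key, so no .get() armoring
    if value is None:
        return None                  # the client sent no Content-Length at all
    return int(value)                # the client declared a length, possibly 0


absent = declared_length({"CONTENT_LENGTH": None})
zero = declared_length({"CONTENT_LENGTH": b"0"})
```

Here absent and zero are distinguishable, which they are not when the server 
substitutes 0 for a missing header.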

FRAGMENT and PARAMETERS are REQUIRED and are parsed out of the URL in the same 
way as the QUERY_STRING. FRAGMENT is the text after a hash mark (a.k.a. 
“anchor” to browsers, e.g. /foo#bar). PARAMETERS come before QUERY_STRING, and 
after PATH_INFO separated by a semicolon, e.g. /foo;bar?baz.  Both values MUST 
be empty bytestrings if not present in the URL. (Rarely used — I’ve only seen 
it in Java and ColdFusion applications — but still useful.)
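
One way a server might split these values out of the request URI, as a rough 
sketch.  The function name split_request_uri is hypothetical, and this 
simplified version treats the first semicolon as the start of PARAMETERS 
rather than handling parameters on every path segment.

```python
def split_request_uri(uri):
    """Split a bytestring request URI into the four environment values."""
    rest, _, fragment = uri.partition(b"#")   # fragment follows the hash mark
    rest, _, query = rest.partition(b"?")     # query string follows "?"
    path, _, params = rest.partition(b";")    # parameters follow ";"
    # partition() returns b"" for absent parts, matching the "empty
    # bytestring if not present" rule.
    return {
        "PATH_INFO": path,
        "PARAMETERS": params,
        "QUERY_STRING": query,
        "FRAGMENT": fragment,
    }


full = split_request_uri(b"/foo;bar?baz#qux")
bare = split_request_uri(b"/foo")
```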

Points of contention:

Changing the namespace seems needless.  Using the wsgi.* namespace with a 
wsgi.version of (2, 0) will allow applications to easily armor themselves 
against incompatible use.  That’s what wsgi.version is for!  I’d add this as a 
strong “point of contention”.  m.s.http keeps the wsgi namespace and uses a 
version of (2, 0).
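
A sketch of how an application could armor itself using wsgi.version.  The 
helper name require_major_version is hypothetical, and defaulting a missing 
key to (1, 0) is my assumption, not something any spec requires.

```python
def require_major_version(env, major=2):
    """Raise if the server does not speak the expected major version."""
    version = env.get("wsgi.version", (1, 0))   # assumed default if absent
    if version[0] != major:
        raise RuntimeError("unsupported wsgi.version: %r" % (version,))


def app(env):
    require_major_version(env)
    return b"200 OK", [(b"Content-Type", b"text/plain")], [b"Hello"]


response = app({"wsgi.version": (2, 0)})
```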

That’s it so far.  I may occasionally write in with additional ideas as I 
continue with my HTTP server implementation.

        — Alice.
