Alex Grönholm and I have been discussing async implementation details
(and other areas of PEP 444) for some time on IRC. Below is the
cleaned-up log transcription, with additional notes where needed.
Note: The logs are in mixed chronological order — discussion of one
topic is chronological, potentially spread across days, but separate
topics may jump around a bit in time. Because of this I have
eliminated the timestamps as they add nothing to the discussion.
Dialogue in square brackets indicates text added after-the-fact for
clarity. Topics are separated by three hyphens. Backslashes indicate
joined lines.
This should give a fairly comprehensive explanation of the rationale
behind some decisions in the rewrite; a version of these conversations
(in narrative style vs. discussion) will be added to the rewrite Real
Soon Now™ under the Rationale section.
— Alice.
--- General
agronholm: my greatest fear is that a standard is adopted that does not
solve existing problems
GothAlice: [Are] there any guarantees as to which thread / process a
callback [from the future instance] will be executed in?
--- 444 vs. 3333
agronholm: what new features does pep 444 propose to add to pep 3333? \
async, filters, no buffering?
GothAlice: Async, filters, no server-level buffering, native string
usage, the definition of "byte string" as "the format returned by
socket read" (which, on Java, is unicode!), and the allowance for
returned data to be Latin1 Unicode. \ All of this together will allow a
'''def hello(environ): return "200 OK", [], ["Hello world!"]''' example
application to work across Python versions without modification (or use
of b"" prefix)
agronholm: why the special casing for latin1 btw? is that an http thing?
GothAlice: Latin1 = \u0000 → \u00FF — it's one of the only formats that
can be decoded while preserving raw bytes, and if another encoding is
needed, transcode safely. \ Effectively requiring Latin1 for unicode
output ensures single byte conformance on the data. \ If an application
needs to return UTF-8, for example, it can return an encoded UTF-8
bytestream, which will be passed right through.
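[A quick after-the-fact illustration of the Latin-1 property discussed
above: Latin-1 maps every byte value 0x00–0xFF to the Unicode code
point of the same value, so a decode/encode round trip is always
lossless. This sketch is not part of the log.]

```python
# Latin-1 maps each byte 0x00-0xFF to the Unicode code point of the
# same value, so decoding can never fail and never loses information.
raw = bytes(range(256))                   # every possible byte value
text = raw.decode('latin-1')              # always succeeds
assert text.encode('latin-1') == raw      # lossless round trip

# An application that really wants UTF-8 output returns the
# already-encoded bytestream instead, which passes through untouched.
body = 'héllo'.encode('utf-8')
assert isinstance(body, bytes)
```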
--- Filters
agronholm: regarding middleware, you did have a point there --
exception handling would be pretty difficult with ingress/egress filters
GothAlice: Yup. It's pretty much a do or die scenario in filter-land.
agronholm: but if we're not ditching middleware, I wonder about the
overall benefits of filtering \ it surely complicates the scenario so
it'd better be worth it \ I don't so much agree with your reasoning
that [middleware] complicates debugging \ I don't see any obvious
performance improvements either (over middleware)
GothAlice: Simplified debugging of your application w/ reduced stack to
sort through, reduced nested stack overhead (memory allocation
improvement), clearer separation of tasks (egress compression is a good
example). This follows several of the Zen of Python guidelines: \
Simple is better than complex. \ Flat is better than nested. \ There
should be one-- and preferably only one --obvious way to do it. \ If
the implementation is hard to explain, it's a bad idea. \ If the
implementation is easy to explain, it may be a good idea.
agronholm: I would think that whatever memory the stack elements
consume is peanuts compared to the rest of the application \
ingress/egress isn't exactly simpler than middleware
GothAlice: The implementation for ingress/egress filters is two lines
each: a for loop and a call to the elements iterated over. Can't get
much simpler or easier to explain. ;) \ Middleware is pretty complex…
\ The majority of ingress filters won't have to examine wsgi.input, and
supporting async on egress would be relatively easy for the filters
(pass-through non-bytes data in body_iter). \ If you look at a system
that offers input filtering, output filtering, and decorators
(middleware), modifying input should "obviously" be an input filter,
and vice-versa.
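[After-the-fact sketch of the "two lines each" invocation GothAlice
describes: ingress filters mutate the environ in place before the
application runs, egress filters transform the response tuple on the
way out. The names serve, ingress_filters and egress_filters are
illustrative only, not part of any spec.]

```python
# Hypothetical server-side invocation of ingress/egress filters.
def serve(application, environ, ingress_filters, egress_filters):
    for ingress in ingress_filters:   # "a for loop and a call"
        ingress(environ)

    status, headers, body = application(environ)

    for egress in egress_filters:
        status, headers, body = egress(environ, status, headers, body)

    return status, headers, body

# The PEP 444 example application from the "444 vs. 3333" topic above.
def hello(environ):
    return b'200 OK', [], [b'Hello world!']
```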
agronholm: how does a server invoke the ingress filters \ in my
opinion, both ingress and egress filters should essentially be pipes \
compression filters are a good example of this \ once a block of
request data (body) comes through from the client, it should be sent
through the filter chain
agronholm: consider an application that receives a huge gzip encoded
upload \ the decompression filter decompresses as much as it can using
the incoming data \ the application only gets the next block once the
decompression filter has enough raw data to decompress
GothAlice: Ingress decompression, for example, would accept the environ
argument, detect gzip content-encoding, then decompress the wsgi.input
into its own buffer, and finally replace wsgi.input in the environ with
its decompressed version. \ Alternatively, it could decompress chunks
and have a more intelligent replacement for wsgi.input (to delay
decompression until it is needed).
agronholm: are you saying that the filter should decompress all of the
data at once? how would this work with async?
GothAlice: The first example is the easiest to implement, but you are
correct in that it would buffer all the data up-front. The second I
described (intelligent wsgi.input replacement) would work in an async
application environment. (But would be harder to code and unit-test.)
agronholm: I don't really see how it would work
GothAlice: environ = parse_headers() ; decompression_filter(environ)
agronholm: wouldn't it be simpler to just have ingress filters return
the data chunk, altered or not?
GothAlice: decompression_filter(environ): if
environ.get('HTTP_CONTENT_ENCODING', None) == 'gzip':
environ['wsgi.input'] = StreamDecompression(environ['wsgi.input'])
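[An after-the-fact sketch of the "intelligent wsgi.input replacement"
variant: decompression happens lazily as the application reads, rather
than buffering the whole body up front. StreamDecompression is a name
from the discussion; this body, and the use of HTTP_CONTENT_ENCODING
for a gzipped request body, are assumptions of the sketch.]

```python
import zlib

class StreamDecompression:
    """Lazy wsgi.input wrapper: decompress only as the app reads."""

    def __init__(self, raw):
        self.raw = raw
        # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header.
        self.decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        self.finished = False

    def read(self, size=4096):
        if self.finished:
            return b''
        data = b''
        while not data:
            chunk = self.raw.read(size)
            if not chunk:                     # underlying stream drained
                self.finished = True
                return self.decomp.flush()    # emit any remaining output
            data = self.decomp.decompress(chunk)
        return data

def decompression_filter(environ):
    # A gzipped request body arrives with Content-Encoding: gzip,
    # which CGI-style environs expose as HTTP_CONTENT_ENCODING.
    if environ.get('HTTP_CONTENT_ENCODING') == 'gzip':
        environ['wsgi.input'] = StreamDecompression(environ['wsgi.input'])
```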
agronholm: I'm not very comfortable with the idea of wsgi.input in
async apps \ I'm just thinking what would happen when you do
environ['wsgi.input'].read()
GothAlice: One of two things: in a sync environment, it blocks until it
can read, in an async environment [combined with yield] it
pauses/shelves your application until the data is available.
agronholm: I'd rather do away with wsgi.input altogether, but I haven't
yet figured out how the application would read the entire request body
then
agronholm: it should be fairly easy to write a helper function for that though
GothAlice: Returning the internal socket representation would improve
some things, and make things generally worse. :/
agronholm: returning socket from what?
GothAlice: In Tornado's HTTP server, you read and write directly
from/to the IOStream. \ wsgi.input, though, is more abstracted
agronholm: argh, I can't think of a way to make this work beautifully
GothAlice: Yeah. :(
agronholm: the requirements of async apps are a big problem
agronholm: returning magic values from the app sounds like a bad idea
agronholm: the best solution I can come up with is to have
wsgi.async_input or something, which returns an async token for any
given read operation
agronholm: most filters only deal with the headers \ so what if we made
it so that the filter chain is only accessed once, and filters that
need to modify the body as well would return a generator \ and when the
server receives more data, it would feed it to the first generator in
the chain, feed the results from that to the next etc.
agronholm: the generators could also return futures, at which point the
server adjourns processing of the chain until the callback fires \ in
multithreaded mode, the server would simply call .result() which would
block, and in single threaded mode, add a callback to the reactor
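[After-the-fact sketch of the generator-chain idea agronholm proposes
above; all names are hypothetical. A real server would additionally
check each yielded value for a future and adjourn the chain until its
callback fires, as described.]

```python
# Each body filter is a generator: the server send()s it a chunk of
# request body and it yields the transformed chunk for the next link.
def upper_filter():
    out = b''
    while True:
        chunk = yield out        # receive the next chunk from the server
        out = chunk.upper()      # transform it (toy example)

def feed_chain(chain, chunk):
    # Push one incoming chunk through every generator in order.
    for gen in chain:
        chunk = gen.send(chunk)
    return chunk

chain = [upper_filter()]
for gen in chain:
    next(gen)                    # prime each generator to its first yield
```

feed_chain(chain, b'abc') then returns b'ABC'; the result of each link
feeds the next, chunk by chunk, as data arrives from the client.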
GothAlice: Hmm.
agronholm: the ingress filters' return values would affect what is sent
to the application
agronholm: [I'm] trying to solve the inherent difficulties with having
a file-like object in the environ \ my solution would allow them to
work transparently with sync and async apps alike
GothAlice: Hmm. What would the argspec of an ingress filter be, then?
(The returned value, via yield, being wsgi.input chunks.)
agronholm: probably environ, body_generator or something
agronholm: the beauty in wsgi in general is of course that it requires
no importing of predefined functions or anything \ so there should be
some way for the application to read the entire request at once
GothAlice: I think combining wsgi.async with specific attributes on
wsgi.input which can be yielded as async tokens might be a way to go.
GothAlice: agronholm: yielding None from the application being a polite
way to re-schedule the application after a reactor cycle to give other
connections a chance before doing something potentially blocking.
agronholm: I thought None meant "I'm done here" \ otoh, the app has to
return some response
GothAlice: That's yielding an application response tuple followed by
StopIteration. \ (Not necessarily immediately returning StopIteration
after yielding the response; there may be clean-up to do; which is a
nice addition.)
GothAlice: Three options: yield None (reschedule to be nice/cooperative
behaviour), yield status, headers, body (deliver a response), and yield
AsyncToken.
agronholm: so what would the application yield if it wanted to generate
the body in chunks? (potentially a slow process)
GothAlice: A body_iter that generates the body in chunks, as per a
standard (non-generator) application callable. \ That wouldn't change.
\ But often an application would want to async stream the response body
in before starting body generation.
GothAlice: An application MUST be a callable returning (status_bytes,
header_list, body_iter) OR a generator. IF the application is a
generator, it MUST yield EITHER None (delay execution), a
(status_bytes, header_list, body_iter) tuple, or an async token. After
yielding a response the application generator MAY perform additional
actions before raising StopIteration, but MUST NOT yield anything but
None or async tokens from that point onward.
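[After-the-fact illustration of the three yield types just specified.
AsyncToken and the run() driver are hypothetical stand-ins for server
machinery, not part of the spec text.]

```python
class AsyncToken:
    """Marker object: 'resume me when this operation completes.'"""

def run(app_gen):
    # Minimal synchronous driver; a real reactor would actually wait
    # on async tokens (and yield to other connections on None)
    # instead of resuming immediately.
    response = None
    for yielded in app_gen:
        if yielded is None:
            continue                  # polite reschedule: just resume
        if isinstance(yielded, AsyncToken):
            continue                  # would wait for the token here
        response = yielded            # (status, headers, body_iter)
    return response

def example_app(environ):
    yield None                              # give other connections a turn
    yield b'200 OK', [], [b'Hello world!']  # the response tuple
    # clean-up could happen here, before StopIteration
```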
agronholm: one of my concerns is how a request body modifying
middleware will work with async apps unless it's specifically designed
with those in mind \ you suggested that such middleware replace
wsgi.input with their own
GothAlice: It would have to be; or it could simply yield through
non-bytes chunks, returning the result of the yield back up (which may
be ignored).
agronholm: what guarantee is there that the replacement has
.async_read() unless the filter was specifically designed to be async
aware?
GothAlice: Or, if the developer was in a particularly black mood, the
middleware could re-set wsgi.async to be false. ;)
agronholm: I don't quite understand the meaning or point of wsgi.async
GothAlice: wsgi.async is a boolean representing the capability of the
underlying server to accept async tokens.
agronholm: why would that ever be false? \ in a blocking/threaded
server, implementing support for that is trivial
GothAlice: Why does no HTTP server in Python conform to the HTTP/1.1
spec properly? Lazy developers. ;) [And lack of interest
down-stream. Calling server authors idiots was not my intention.]
agronholm: they could just as well forgo setting wsgi.async altogether
GothAlice: environ.get('wsgi.async', False) is the only way to armor
against that, I guess.
agronholm: well I think we're talking about *conforming* servers here \
there's not much that can be done about incomplete implementations
GothAlice: However, if wsgi.async is going to be in the WSGI2 spec,
it'll be required. If the server hasn't gotten around to implementing
async yet, it should be False.
agronholm: I think wsgi.async is useless \ "hasn't gotten around to"?
that's not a lot of work, really \
that flag just paves the way for half-assed implementations
GothAlice: Still, some method to detect the capability should be
present. Something more than attempting to access wsgi.input's
async_read attribute and catching the AttributeError exception.
agronholm: the capability should be *required* \ given how easy it is
to implement \ I don't see any justification not to require it
GothAlice: We'll have to see how easy it is to add to m.s.http before
I'll admit it's "easy" in the general sense. ;) If it turns out to be
simple (and not-too-badly-performance-impacting) I'll make it required.
agronholm: fair enough
agronholm: robert pointed out the difficulty of executing [the
filters] in the right order
GothAlice: Indeed; this problem exists with the current middleware system, too.
agronholm: it'd probably be easier to specify them as a list in the
deployment descriptor
GothAlice: (My note about appending to ingress_filters, prepending to
egress_filters to simulate middleware behaviour is functional, though
non-optimal; the filters, if co-dependent, should be middleware
instead.)
agronholm: webcore's current middleware system is too much magic imho
GothAlice: I agree. \ An init.d-style ordering system would have to be
its own PEP.
agronholm: also, I was thinking if we could [handle] filters that needed both
ingress/egress capabilities (such as session middleware) in a way that
only required specifying it once
GothAlice: … wouldn't that be middleware? ;) \ Thus far I've defined
ingress and egress filters as distinct and separate, with
dual-functionality requirements being fulfilled by middleware.
agronholm: we could probably simplify that
GothAlice: "There should be one, and preferably only one, right way to
do something." ;)
agronholm: yes, and that is the point of my idea :)
GothAlice: Replacing middleware isn't a small task; the power of
potentially redirecting application flow (amongst other gems the
middleware structure brings to the table) would be very difficult to
model cleanly when separated into ingress/egress.
agronholm: btw, I very much agreed with PJE's suggestion of making
filtering its own middleware instead of a static part of the interface
GothAlice: The problem with not mentioning filtering in the PEP is that
middleware authors won't take it into consideration when coding. (That's
why it's optional for servers to implement and includes an example
middleware implementation of the API.)
--- Async
agronholm: +1 for async wsgi using the new concurrent.futures stdlib feature
agronholm: I still don't like the idea of wsgi.executor \ imho that
should be left up to the application or framework \ not the web server
\ and I still disapprove of the wsgi.async flag
GothAlice: The server does, however, need to be able to capture async
read requests across environ['wsgi.input'].async_read*
GothAlice: What would the semantics be for a worker on a
single-threaded async server to wait for a long-running task? Was my
code example (the simplified try/except block) inadequate?
agronholm: if the app needs to do heavy lifting, it delegates the task
to a thread/process pool, which returns a future, which the app yields
back \ when the callback is activated, the reactor will resume
execution of that app \ I think you pretty much got it right in your
revised example code
GothAlice: Just replace environ['wsgi.executor'] with an
application-specific one?
agronholm: essentially, yes \ that would greatly simplify the
implementation of the interface
GothAlice: And it is all done via done_callbacks… hmm. For the
purposes of the callbacks, though, exceptions are ignored. :/
agronholm: what is your concern with this specifically?
GothAlice: That my desired syntax (try/except around a value=yield
future) won't be able to capture and elevate exceptions back to the
WSGI application.
agronholm: oh, that is not a problem since the reactor will call
.result() on it anyway and send any exceptions back to the application
GothAlice: Back to the environment issue for a moment: not providing an
executor in the environment means middleware will not be able to
utilize async features without having their own executor in addition to
the application's. How about I explicitly require that servers allow
overriding of the executor used? \ How often would an application want
to utilize multiple executors at once?
agronholm: the middleware could have a constructor argument for passing
an executor
GothAlice: That would then require passing an executor to multiple
layers of middleware, creating a number of additional references and
local variables, vs. configuring a "default executor" at the server
level.
agronholm: there are pros and cons with the wsgi.executor approach
GothAlice: There would be no requirement for the application to use
wsgi.executor; if an application has a default threaded executor
(wsgi.executor), it can use a multi-process one for specific jobs
[ignoring the one in the env] without too much worry.
agronholm: essentially wsgi.executor would be a convenience then
GothAlice: Exactly. \ (And mostly a helper to middleware so they don't
each need explicit configuration or management of their own executors.)
--- Optional Components
GothAlice: I think full HTTP/1.1 conformance should be a requirement
for WSGI2 servers, too. (chunked requests, not just chunked responses)
\ Because there's really no point in writing a -new- HTTP/1.0 server.
;)
agronholm: indeed
GothAlice: One thing I've been grappling with [while] rewriting PEP
444 is that pretty much nothing marked 'optional' or 'may' in WSGI 1 /
PEP 333 ever actually gets implemented by developers. Thus making
HTTP/1.1 support non-optional [in PEP 444].
GothAlice: Something I've noticed with Python HTTP code: none of it is
complete, and all of the servers that report HTTP/1.1 compliance
straight up lie. Zero I found support chunked response bodies, and
zero support chunked requests (which is required by HTTP/1.1). \ (The
servers I looked at universally had embedded comments along the lines
of: "Chunked responses are left up to application developers.")
GothAlice: If it's too demanding [or appears too daunting], a "may
implement" feature becomes a "never will be implemented" feature.
agronholm: I would prefer requiring HTTP/1.1 support from all WSGI2 servers
GothAlice: I mean, if I can do it in 172 Python opcodes, I'm certain it
can't be -that- hard to implement. ;)
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig