Re: [Web-SIG] Server-side async API implementation sketches

2011-01-10 Thread chris . dent

On Sun, 9 Jan 2011, Alice Bevan–McGregor wrote:


On 2011-01-09 09:03:38 -0800, P.J. Eby said:
Hm.  I'm not sure if I like that.  The typical app developer really 
shouldn't be yielding multiple body strings in the first place.


Wait; what?  So you want the app developer to load a 40MB talkcast MP3 into 
memory before sending it?


My reaction too. I've read this elsewhere on this list, in other
topics: a general statement that the correct way to make an
efficient WSGI (1) app is to return just one body string.

This runs contrary to everything I've ever understood about making
web apps that appear performant to the user: get the first byte out to
the browser as soon as possible.

This came up in discussions of wanting to have a cascading series of
generators (to save memory and improve responsiveness): store
generates data, serializers generates strings, handler generates
(sends out in chunks) the web page from those strings.

So, this is me saying: I'm in favor of a post-wsgi1 world where apps
are encouraged to be generators. To me they are just as useful in
sync and async contexts.
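As a minimal illustration of the generator style being advocated (hypothetical names; `expensive_results` stands in for the store/serializer chain), a WSGI 1 app can push the head of the page out before the slow part is computed:

```python
def expensive_results():
    # Stand-in for a slow step: a database query, search, or render.
    return b'<ul><li>result one</li></ul>'

def app(environ, start_response):
    # A WSGI 1 app written as a generator: the server can flush the
    # first chunk to the browser before the slow chunk is produced.
    start_response('200 OK', [('Content-Type', 'text/html')])
    yield b'<html><head><title>Search</title></head><body>'
    yield expensive_results()
    yield b'</body></html>'
```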

--
Chris Dent   http://burningchrome.com/
[...]___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] Server-side async API implementation sketches

2011-01-10 Thread James Y Knight

On Jan 10, 2011, at 4:48 AM, chris.d...@gmail.com wrote:

 My reaction too. I've read this elsewhere on this list too, in other
 topics. A general statement that the correct way to make an
 efficient WSGI (1) app is to return just one body string.
 
 This runs contrary to everything I've ever understood about making
 web apps that appear performant to the user: get the first byte out to
 the browser as soon as possible.

Wee. You want to get the earliest byte *which is required to display the 
page* out as soon as possible. The browser usually has to parse a whole lot of 
the response before it starts displaying anything useful.

And in order to do that, you really want to minimize the number of 
round-trip-times, which is heavily dependent upon the number of packets sent 
(not the amount of data!), when the data is small. Using a generator in WSGI 
forces the server to push out partial data as soon as possible, so it could end 
up using many more packets than if you buffered everything and sent it at once, 
and thus, will be slower.

As the buffering and streaming section of WSGI1 already says...:
 Generally speaking, applications will achieve the best throughput by 
 buffering their (modestly-sized) output and sending it all at once. This is a 
 common approach in existing frameworks such as Zope: the output is buffered 
 in a StringIO or similar object, then transmitted all at once, along with the 
 response headers.
 
 [...]
 
 For large files, however, or for specialized uses of HTTP streaming (such as 
 multipart server push), an application may need to provide output in 
 smaller blocks (e.g. to avoid loading a large file into memory). It's also 
 sometimes the case that part of a response may be time-consuming to produce, 
 but it would be useful to send ahead the portion of the response that 
 precedes it.
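For contrast, a minimal sketch of the buffered style the quoted passage recommends (an illustrative example, not code from the thread): the whole modestly-sized body is accumulated in a BytesIO and sent at once, which also lets the server set Content-Length and minimise packet count.

```python
from io import BytesIO

def app(environ, start_response):
    # Buffer the whole (modestly-sized) page, then transmit it all at
    # once along with the response headers, as PEP 333 suggests.
    buf = BytesIO()
    buf.write(b'<html><body>')
    buf.write(b'Hello, world')
    buf.write(b'</body></html>')
    body = buf.getvalue()
    start_response('200 OK', [('Content-Type', 'text/html'),
                              ('Content-Length', str(len(body)))])
    return [body]
```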

James


Re: [Web-SIG] Server-side async API implementation sketches

2011-01-10 Thread P.J. Eby

At 04:39 PM 1/9/2011 -0800, Alice Bevan–McGregor wrote:

On 2011-01-09 09:26:19 -0800, P.J. Eby said:

If wsgi.input offers any synchronous methods...


Regardless of whether or not wsgi.input is implemented in an async 
way, wrap it in a future and eventually get around to yielding 
it.  Problem /solved/.


Not the API problem.  If I'm accustomed to writing synchronous code, 
the async version looks ridiculous.  Also, an existing WSGI web 
framework isn't going to be able to be ported to this API without 
putting it in a future.


My hope was for an API that would be a simple enough translation that 
*everybody* could be persuaded to use it, but having to use futures 
just to write a normal application simply isn't going to work for 
the core WSGI API.  As a separate WSGI-A profile, sure, it works fine.



If it offers only asynchronous methods, OTOH, then you can't pass 
wsgi.input to any existing libraries (e.g. the cgi module).


Describe to me how a function can be suspended (other than magical 
greenthreads) if it does not yield; if I knew this, maybe I wouldn't 
be so confused.


I'm not sure what you're confused about.  I'm the one who forgot you 
have to read from wsgi.input in a blocking way to write a normal app.  ;-)


(Mainly, because I was so excited about the potential in your 
sketched API, and I got sucked into the process of implementing/improving it.)



I've deviated from your sketch, obviously, and any semblance of 
yielding a 3-tuple.  Stop thinking of my example code as conforming 
to your ideas; it's a new idea, or, worst case, a narrowing of an 
idea into its simplest form.


What I'm trying to point out is that you've missed two important API 
enhancements in my sketch, that make it so that app and middleware 
authors don't have to explicitly manage any generator methods or even 
future methods.



 The mechanics of yielding futures instances allows you to (in your 
server) implement the necessary async code however you wish while 
providing a uniform interface to both sync and async applications 
running on sync and async servers.  In fact, you would be able to 
safely run a sync application on an async server and 
vice-versa.  You can, on an async server:


:: Add a callback to the yielded future to re-schedule the 
application generator.


:: If using greenthreads, just block on future.result() then 
immediately wake up the application generator.


:: Do other things I can't think of because I'm still waking up.


I am not sure why you're reiterating these things.  The sample code I 
posted shows precisely where you'd *do* them in a sync or async 
server.  That's not where the problem lies.



That is not optimum, because now you have an optional API that 
applications who want to be compatible will need to detect and choose between.


It wasn't supposed to be optional, but it's beside the point since 
the presence of a blocking API means the application can block.


The issue might be addressable by having an environment key like, 
'wsgi.canblock' (indicating whether the application is already in a 
separate thread/process), and a piece of middleware that simply 
spawns its child app to a future if wsgi.canblock isn't set.  Then 
people who write blocking applications could use the decorator.
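A hedged sketch of that idea, with the key name 'wsgi.canblock' and the yielded-future convention taken from the discussion above (neither is a published spec):

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)

def canblock_middleware(app):
    # If the server already gave this request a thread of its own
    # ('wsgi.canblock' set), call the app directly; otherwise spawn the
    # possibly-blocking app into a future for the server to wait on.
    def wrapped(environ):
        if environ.get('wsgi.canblock'):
            yield app(environ)
        else:
            yield _pool.submit(app, environ)
    return wrapped
```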




Mostly, though, it seems to me that the need to be able to write 
blocking code does away with most of the benefit of trying to have 
a single API in the first place.


You have artificially created this need, ignoring the semantics of 
using the server-specific executor to detect async-capable requests 
and the yield mechanics I suggested; which happens to be a single, 
coherent API across sync and async servers and applications.


I haven't ignored them.  I'm simply representing the POV of existing 
WSGI apps and frameworks, which currently block, and are unlikely to 
be rewritten so as not to block.  I thought, briefly, that it was 
possible to make an API with a low-enough conceptual overhead to 
allow that porting to occur, and let my enthusiasm carry me away.


I was wrong, though: even the extremely minimalist version isn't 
going to be usable for ported code, which relegates the async version 
to a niche role.


I would note, though, that this is *still* better than my previous 
position, which was that there was no point making an async API *at all*.  ;-)




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-10 Thread P.J. Eby

At 05:06 PM 1/9/2011 -0800, Alice Bevan–McGregor wrote:

On 2011-01-09 09:03:38 -0800, P.J. Eby said:
Hm.  I'm not sure if I like that.  The typical app developer really 
shouldn't be yielding multiple body strings in the first place.


Wait; what?  So you want the app developer to load a 40MB talkcast 
MP3 into memory before sending it?


Statistically speaking, the typical app is producing a web page, 
made of HTML and severely limited in size by the short attention span 
of the human user reading it.  ;-)


Obviously, the spec should allow and support streaming.


  You want to completely eliminate the ability to stream an HTML 
page to the client in chunks (e.g. head block, headers + search 
box, search results, advertisements, footer -- the exact thing 
Google does with every search result)?  That sounds like 
artificially restricting application developers, to me.


First, I don't want to eliminate it.   Second, Google is hardly the 
typical app developer.  If you need the capability, it'll still be there.





In your approach, the above samples have to be rewritten as:
 return app(environ)
[snip]


My code does not use return.  At all.  Only yield.


If you return the calling of a generator, then you pass the original 
generator through to the caller, and it is the equivalent of writing 
a loop in place that iterates over the subgenerator, only without the 
additional complexity of needing to send/throw.



The above middleware pattern works with the sketches I gave on the 
PEAK wiki, and I've now updated the wiki to include an example app 
and middleware for clarity.


I'll need to re-read the code on your wiki; I find it incredibly 
difficult to grok, however, you can help me out a bit by answering a 
few questions about it: How does middleware trap exceptions raised 
by the application?


With try/except around the yield app(environ) call (main app run), 
or with try/except around the yield body_iter call (body iterator run).



 (Specifically how does the server pass the buck with 
exceptions?  And how does the exception get to the application to 
bubble out towards the server, through middleware, as it does now?)


All that is in the Coroutine class, which is a generator-based green 
thread implementation.


Remember how you were saying that your sketch would benefit from PEP 380?

The Coroutine class is a pure-Python implementation of PEP 380, minus 
the syntactic sugar.  It turns yield into yield from whenever the 
value you yield is itself a geniter.


So, if you pretend that yield app(environ) and yield body_iter 
are actually yield froms instead, then the mechanics should become clearer.


Coroutine runs a generator by sending or throwing into it.  It then 
takes the result (either a value or an exception) and decides where 
to send that.  If it's an object with send/throw methods, it pushes 
it on the stack, and passes None into it to start it running, thereby 
calling the subgenerator.  If it's an exception or a return value 
(e.g. StopIteration(value=None)), it pops the stack and propagates 
the exception or return value to the calling generator.


If it's a future or some other object the server cares about, then 
the server can pause the coroutine (by returning 'routine.PAUSE' when 
the coroutine asks it what to do).


Coroutine accepts a trampoline function and a completion callback as 
parameters: the trampoline function inspects a value yielded by a 
generator and then tells the coroutine whether it should PAUSE, CALL, 
RETURN, RESUME, or RAISE in response to that particular 
yield.  RESUME is used for synchronous replies, where the yield 
returns immediately.  RETURN means pop the current generator off the 
stack and return a value to the calling generator.  RAISE raises an 
error immediately in the top-of-stack generator.  CALL pushes a 
geniter on the stack.
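A stripped-down illustration of the generator-stack mechanics described above (my sketch, not the actual Coroutine class; it omits the pluggable trampoline and PAUSE handling, and treats any object with a send method as a geniter to CALL):

```python
def run(gen):
    # Minimal pure-Python trampoline: yielding a geniter "calls" it
    # (PEP 380's `yield from` by hand); StopIteration pops the stack
    # and returns its value to the calling generator; exceptions
    # propagate down the stack by being thrown into the caller.
    stack = [gen]
    value, exc = None, None
    while stack:
        try:
            if exc is not None:
                result = stack[-1].throw(exc)
                exc = None
            else:
                result = stack[-1].send(value)
        except StopIteration as stop:
            stack.pop()
            value = stop.value          # RETURN: pop and pass the value down
            continue
        except BaseException as e:
            stack.pop()
            if not stack:
                raise                   # nobody left to handle it
            exc = e                     # RAISE: throw into the caller
            continue
        if hasattr(result, 'send'):
            stack.append(result)        # CALL: push the subgenerator
            value = None                # ...and start it with send(None)
        else:
            value = result              # RESUME: echo a plain value back
    return value
```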


IOW, the Coroutine class lets you write servers with just a little 
glue code to tell it how you want the control to flow.  It's actually 
entirely independent of WSGI or any particular WSGI protocol...  I'm 
thinking that I should probably wrap it up into a PyPI package with 
some docs and tests, though I'm not sure when I'd get around to it.


(Heck, it's the sort of thing that probably ought to be in the stdlib 
-- certainly PEP 380 can be implemented in terms of it.)


Anyway, both the sync and async server examples have trampolines that 
detect futures and process them accordingly.  If you yield to a 
future, you get back its result -- either a value or an exception at 
the point where you yielded it.  You don't have to explicitly call 
.result() (in fact, you *can't*); it's already been called before 
control gets back to the place that yielded it.


IOW, in my sketch, yielding to a future looks like this:

data = yield submit(wsgi_input.read, 4096)

without the '.result()' on the end.


Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alice Bevan–McGregor

On 2011-01-08 20:06:19 -0800, Alex Grönholm said:

I liked the idea of having a separate async_read() method in 
wsgi.input, which would set the underlying socket in nonblocking mode 
and return a future. The event loop would watch the socket and read 
data into a buffer and trigger the callback when the given amount of 
data has been read. Conversely, .read() would set the socket in 
blocking mode. What kinds of problems would this cause?


Manipulating the underlying socket is potentially dangerous 
(pipelining) and, in fact, not possible AFAIK while being 
PEP444-compliant.  When the request body is fully consumed, additional 
attempts to read _must_ return empty strings.  Thus raw sockets are 
right out at a high level; internal to the reactor this may be 
possible, however.  It'd be interesting to adapt marrow.io to using 
futures in this way as an experiment.


OTOH, if you utilize callbacks extensively (as m.s.http does) you run 
into the problem of data passing.  Your application is called (wrapped 
in middleware), sets up some futures and callbacks, then returns.  No 
returned data.  Middleware just got shot in the foot.  The server, 
also, got shot in the foot.  How can it get a response tuple back from 
a callback?  How can middleware be utilized?  That's a weird problem to 
wrap my head around.  Blocking the application pending the results of 
various socket operations is something that would have to be mandated 
to avoid this issue.  :/


Multiple in-flight reads would also be problematic; you may end up with 
buffer interleaving issues.  (e.g. job A reads 128 bytes at a time and 
has been requested to return 4KB, job B does the same... what happens 
to the data?)  Then you begin to involve locking...


Notice that my write_body method [1] writes using async, passing the 
iterable to the callback which is itself.  This is after-the-fact 
(after the request has been returned) and is A-OK, though would need to 
be updated heavily to support the ideas of async floating around right 
now.  I'm also extremely careful to never have multiple async callbacks 
pending (and thus never have multiple jobs for a single connection 
working at once).


- Alice.

[1] 
https://github.com/pulp/marrow.server.http/blob/draft/marrow/server/http/protocol.py#L313-332 






Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alice Bevan–McGregor

On 2011-01-08 19:34:41 -0800, P.J. Eby said:


At 04:40 AM 1/9/2011 +0200, Alex Grönholm wrote:

On 09.01.2011 04:15, Alice Bevan–McGregor wrote:
I hope that clearly identifies my idea on the subject. Since 
async servers will /already/ be implementing their own executors, I 
don't see this as too crazy.
-1 on this. Those executors are meant for executing code in a 
threadpool. Mandating a magical socket operation filter here 
would considerably complicate server implementation.


Actually, the *reverse* is true.  If you do it the way Alice proposes, 
my sketches don't get any more complex, because the filtering goes in 
the executor facade or submit function.


Indeed; the executor is what then adds the file descriptor to the 
underlying server async reactor (select/epoll/kqueue/other).  In the 
case of the Marrow server, this would utilize a reactor callback (some 
might say deferred) to update the Future instance with the data, 
setting completion status, executing callbacks, etc.  One might even be 
able to use a threading.Event (or whatever is the opposite of a lock) 
to wake up blocking .result() calls, even if not multi-threaded 
(greenthreads, etc.).


Of course, adding the file descriptor to a pure async reactor then 
.result() blocking on it from your application would result in a 
deadlock; the .result() would never complete as the reactor would never 
get a chance to perform the pending request.  (This is why Marrow 
requires threading be enabled globally before adding an executor to the 
environment; this requires rather explicit documentation.)  This 
problem is solved completely by yielding the future instance (pausing 
the application) to let the reactor do its thing.  (Yielding the future 
becomes a replacement for the blocking behaviour of future.result().)


Effectively what I propose adds emulation of threading on top of async 
by mutating an Executor.  (The Executor would be a mixed 
threading+async executor.)


I suggest bubbling a future back up the yield stack instead of the 
actual result to allow the application (or middleware, or whatever 
happened to yield the future) to capture exceptions generated by the 
future'd request.  Bubbling the future instance avoids excessive 
exception handling cruft in each middleware layer; and I see no real 
issue with this.  AFAIK, you can use a shorthand (possibly wrapped in a 
try: block) if all you care about is the result:


   data = (yield my_future).result()

Truthfully, I don't really see the point of exposing the map() method 
(which is the only other executor method we'd expose), so it probably 
makes more sense to just offer a 'wsgi.submit' key... which can be a 
function as follows: [snip]


True; the executor itself could easily be hidden behind the filter.  In 
a multi-threaded environment, however, the map call poses no problem, 
and can be quite useful.  (E.g. with one of my use cases for inclusion 
of an executor in the environment: image scaling.)


Granted, this might be a rather long function.  However, since it's 
essentially an optimization, a given server can decide how many 
functions can be shortcut in this way.  The spec may wish to offer a 
guarantee or recommendation for specific methods of certain 
stdlib-provided types (sockets in particular) and wsgi.input.


+1

Personally, I do think it might be *better* to offer extended 
operations on wsgi.input that could be used via yield, e.g. yield 
input.nb_read().  But of course then the trampoline code has 
to recognize those values instead of futures.


Because wsgi.input is provided by the server, and the executor is 
provided by the server, is there a reason why these extended functions 
couldn't return... futures?  :)


Note, too, that this complexity also only affects servers that want to 
offer a truly async API.  A synchronous server has no reason to pay 
particular attention to what's in a future, since it can't offer any 
performance improvement.


I feel a sync server and async server should provide the same API for 
accessing the input.  E.g. the application/middleware must be agnostic 
to the server in this regard.  This is why a little bit of magic goes a 
long way.  The following code would work on any WSGI2 stack that offers 
an executor (sync, async, or provided by middleware):


   data = (yield env['wsgi.submit'](env['wsgi.input'].read, 4096)).result()

In a sync server, the blocking read would execute in another thread.  
In an async one appropriate actions would be taken to request a socket 
read from the client.  Both cases pause the application pending the 
result.  (If you don't immediately yield the future the behaviour 
between servers is the same!)
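A sketch of the synchronous half of such a 'wsgi.submit' facade (the async half, which would recognise known calls like wsgi.input.read and register the file descriptor with the reactor, is server-specific and omitted here):

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)

def submit(fn, *args, **kwargs):
    # Sync-server version of the facade: any blocking call is pushed to
    # a worker thread.  An async server would return an equivalent
    # future, but fulfil it from its reactor instead of a thread.
    return _pool.submit(fn, *args, **kwargs)
```

Because both servers hand back the same Future type, the application code that yields it is identical in either case.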


I do think that this sort of API discussion, though, is the most 
dangerous part of trying to do an async spec.  That is, I don't expect 
that everyone will spontaneously agree on the exact same API.  Alice's 
proposal (simply submitting object methods) has the advantage of 
severely limiting the scope 

Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alice Bevan–McGregor

On 2011-01-08 13:16:52 -0800, P.J. Eby said:

In the limit case, it appears that any WSGI 1 server could provide an 
(emulated) async WSGI2 implementation, simply by wrapping WSGI2 apps 
with a finished version of the decorator in my sketch.


Or, since users could do it themselves, this would mean that WSGI2 
deployment wouldn't be dependent on all server implementers immediately 
turning out their own WSGI2 implementations.


This, if you'll pardon my language, is bloody awesome.  :D  That would 
strongly drive adoption of WSGI2.  Note that adapting a WSGI1 
application to WSGI2 server would likewise be very handy, and I 
suspect, even easier to implement.


- Alice.




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread exarkun

On 11:36 am, al...@gothcandy.com wrote:

On 2011-01-08 19:34:41 -0800, P.J. Eby said:

At 04:40 AM 1/9/2011 +0200, Alex Grönholm wrote:

On 09.01.2011 04:15, Alice Bevan–McGregor wrote:
I hope that clearly identifies my idea on the subject. Since 
async servers will /already/ be implementing their own executors, I 
don't see this as too crazy.
-1 on this. Those executors are meant for executing code in a 
threadpool. Mandating a magical socket operation filter here 
would considerably complicate server implementation.


Actually, the *reverse* is true.  If you do it the way Alice proposes, 
my sketches don't get any more complex, because the filtering goes in 
the executor facade or submit function.


Indeed; the executor is what then adds the file descriptor to the 
underlying server async reactor (select/epoll/kqueue/other).  In the 
case of the Marrow server, this would utilize a reactor callback (some 
might say deferred) to


Don't say it if it's not true.  Deferreds aren't tied to a reactor, and 
Marrow doesn't appear to have anything called deferred.  So this 
parallel to Twisted's Deferred is misleading and confusing.


Since each async server will either implement or utilize a specific 
async framework, each will offer its own async-supported featureset. 
What I mean is that all servers should make wsgi.input calls async- 
able, some would go further to make all socket calls async.  Some might 
go even further than that and define an API for external libraries 
(e.g. DBs) to be truly cooperatively async.


I think this effort would benefit from more thought on how exactly 
accessing this external library support will work.  If async wsgi is 
limited to performing a single read asynchronously, then it hardly seems 
compelling.


Jean-Paul


Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alice Bevan–McGregor
On 2011-01-09 07:04:49 -0800, 
exar...@twistedmatrix.com said:
I think this effort would benefit from more thought on how exactly 
accessing this external library support will work.  If async wsgi is 
limited to performing a single read asynchronously, then it hardly 
seems compelling.


Apologies if the last e-mail was too harsh; I'm about to go to bed, and 
it's been a long night/morning.  ;)


Here's a proposed solution: a generator API on top of futures.

If the async server implementing the executor can detect a generator 
being submitted, then:


:: The executor accepts the generator and begins iteration (passing the 
executor and the arguments supplied to submit).


:: The generator is expected to be /fast/.

:: The generator does work until it needs an operation over a file 
descriptor, at which point it yields the fd and the operation (say, 
'r', or 'w').


:: The executor schedules with the async reactor the generator to be 
re-called when the operation is possible.


:: The Future is considered complete when the generator raises 
GeneratorExit and the first argument is used as the return value of the 
Future.


Yielding a 2-tuple of readers/writers would work, too, and allow for 
more concurrent utilization of sockets, though I'm not sure of the use 
cases for this.  If so, the generator would be woken up when any of the 
readers or writers are available and sent() a 2-tuple of 
available_readers, available_writers.


The executor is passed along for any operations the generator can not 
accomplish safely without threads, and the executor, as it's running 
through the generator, will accomplish the same semantics as iterating 
the WSGI application: if a future instance is yielded, the generator is 
suspended until the future is complete, allowing heavy processing to be 
mixed with async calls in a fully async server.


The wsgi.input operations can be implemented this way, as can database 
operations and pretty much anything that uses sockets, pipes, or 
on-disk files.  In fact, the WSGI application -itself- could be called 
in this way (with the omission of the executor or a simple wrapper that 
saves the executor into the environ).
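As a toy illustration of the proposed protocol (hypothetical names; a single job with no real reactor, and using StopIteration's return value rather than GeneratorExit to complete the future):

```python
import os
import select
from concurrent.futures import Future

def read_all(fd):
    # A job written to the protocol: yield (fd, op) whenever the next
    # operation could block; the executor resumes us when it's possible.
    data = b''
    while True:
        yield (fd, 'r')
        chunk = os.read(fd, 4096)
        if not chunk:
            return data
        data += chunk

def drive(gen):
    # Toy single-job "executor": wait on the yielded fd with select(),
    # then resume the generator; the future completes when it returns.
    future = Future()
    try:
        fd, op = gen.send(None)
        while True:
            select.select([fd] if op == 'r' else [],
                          [fd] if op == 'w' else [], [])
            fd, op = gen.send(None)
    except StopIteration as stop:
        future.set_result(stop.value)
    return future
```

A real executor would of course multiplex many such generators over one reactor rather than blocking in select() per job.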


Just a quick thought before running off to bed.

- Alice.




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread P.J. Eby

At 06:06 AM 1/9/2011 +0200, Alex Grönholm wrote:
A new feature here is that the application itself yields a (status, 
headers) tuple and then chunks of the body (or futures).


Hm.  I'm not sure if I like that.  The typical app developer really 
shouldn't be yielding multiple body strings in the first place.  I 
much prefer that the canonical example of a WSGI app just return a 
list with a single bytestring -- preferably in a single statement for 
the entire return operation, whether it's a yield or a return.


IOW, I want it to look like the normal way to do things is to just 
return the whole request at once, and use the additional difficulty 
of creating a second iterator to discourage people writing iterated 
bodies when they should just write everything to a BytesIO and be done with it.


Also, it makes middleware simpler: the last line can just yield the 
result of calling the app, or a modified version, i.e.:


yield app(environ)

or:

s, h, b = app(environ)
# ... modify or replace s, h, b
yield s, h, b

In your approach, the above samples have to be rewritten as:

return app(environ)

or:

result = app(environ)
s, h = yield result
# ... modify or replace s, h
yield s, h

for data in result:
 # modify data as we go
 yield data

Only that last bit doesn't actually work, because you have to be able 
to send future results back *into* the result.  Try actually making 
some code that runs on this protocol and yields to futures during the 
body iteration.


Really, this modified protocol can't work with a full async API the 
way my coroutine-based version does, AND the middleware is much more 
complicated.  In my version, your do-nothing middleware looks like this:



class NullMiddleware(object):
def __init__(self, app):
self.app = app

def __call__(self, environ):
# ACTION: pre-application environ mangling

s, h, body = yield self.app(environ)

# modify or replace s, h, body here

yield s, h, body


If you want to actually process the body in some way, it looks like:

class NullMiddleware(object):

def __init__(self, app):
self.app = app

def __call__(self, environ):
# ACTION: pre-application environ mangling

s, h, body = yield self.app(environ)

# modify or replace s, h, body here

yield s, h, self.process(body)

def process(self, body_iter):
while True:
chunk = yield body_iter
if chunk is None:
break
# process/modify chunk here
yield chunk

And that's still a lot simpler than your sketch.

Personally, I would write both of the above as:

def null_middleware(app):

def wrapped(environ):
# ACTION: pre-application environ mangling
s, h, body = yield app(environ)

# modify or replace s, h, body here
yield s, h, process(body)

def process(body_iter):
while True:
chunk = yield body_iter
if chunk is None:
break
# process/modify chunk here
yield chunk

return wrapped

But that's just personal taste.  Even as a class, it's much easier to 
write.  The above middleware pattern works with the sketches I gave 
on the PEAK wiki, and I've now updated the wiki to include an example 
app and middleware for clarity.


Really, the only hole in this approach is dealing with applications 
that block.  The elephant in the room here is that while it's easy to 
write these example applications so they don't block, in practice 
people read files and do database queries and whatnot in their 
requests, and those APIs are generally synchronous.  So, unless they 
somehow fold their entire application into a future, it doesn't work.



I liked the idea of having a separate async_read() method in 
wsgi.input, which would set the underlying socket in nonblocking 
mode and return a future. The event loop would watch the socket and 
read data into a buffer and trigger the callback when the given 
amount of data has been read. Conversely, .read() would set the 
socket in blocking mode. What kinds of problems would this cause?


That you could never *call* the .read() method outside of a future, 
or else you would block the server, thereby obliterating the point of 
having the async API in the first place.




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread P.J. Eby

At 04:25 AM 1/9/2011 -0800, Alice Bevan–McGregor wrote:

On 2011-01-08 13:16:52 -0800, P.J. Eby said:

In the limit case, it appears that any WSGI 1 server could provide 
an (emulated) async WSGI2 implementation, simply by wrapping WSGI2 
apps with a finished version of the decorator in my sketch.
Or, since users could do it themselves, this would mean that WSGI2 
deployment wouldn't be dependent on all server implementers 
immediately turning out their own WSGI2 implementations.


This, if you'll pardon my language, is bloody awesome.  :D  That 
would strongly drive adoption of WSGI2.  Note that adapting a WSGI1 
application to WSGI2 server would likewise be very handy, and I 
suspect, even easier to implement.


I very much doubt that.  You'd need greenlets or a thread with a 
communication channel in order to support WSGI 1 apps that use write() calls.


By the way, I don't really see the point of the new sketches you're 
doing, as they aren't nearly as general as the one I've already done, 
but still have the same fundamental limitation: wsgi.input.


If wsgi.input offers any synchronous methods, then they must be used 
from a future and must somehow raise an error when called from within 
the application -- otherwise it would block, nullifying the point of 
having a generator-based API.


If it offers only asynchronous methods, OTOH, then you can't pass 
wsgi.input to any existing libraries (e.g. the cgi module).


The latter problem is the worse one, because it means that the 
translation of an app between my original WSGI2 API and the current 
sketch is no longer just replace 'return' with 'yield'.


The only way this would work is if WSGI applications are still 
allowed to be written in a blocking style.  Greenlet-based frameworks 
would have no problem with this, of course, but servers like Twisted 
would still have to run WSGI apps in a worker thread pool, just 
because they *might* block.


If we're okay with this as a limitation, then adding _async method 
variants that return futures might work, and we can proceed from there.


Mostly, though, it seems to me that the need to be able to write 
blocking code does away with most of the benefit of trying to have a 
single API in the first place.  Either everyone ends up putting their 
whole app into a future, or else the server has to accept that the 
app could block... and put it into a future for them.  ;-)


So, the former case will be unacceptable to app developers who don't 
feel a need for async code, and the latter doesn't seem to offer 
anything to the developers of non-blocking servers.


(The exception to these conditions, of course, are greenlet-based 
servers, but they can run WSGI *1* apps in a non-blocking way, and so 
have no need for a new protocol.)




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alex Grönholm

09.01.2011 19:03, P.J. Eby wrote:

At 06:06 AM 1/9/2011 +0200, Alex Grönholm wrote:
A new feature here is that the application itself yields a (status, 
headers) tuple and then chunks of the body (or futures).


Hm.  I'm not sure if I like that.  The typical app developer really 
shouldn't be yielding multiple body strings in the first place.  I 
much prefer that the canonical example of a WSGI app just return a 
list with a single bytestring -- preferably in a single statement for 
the entire return operation, whether it's a yield or a return.

Uh, so don't yield multiple body strings then? How is that so difficult?



IOW, I want it to look like the normal way to do things is to just 
return the whole request at once, and use the additional difficulty of 
creating a second iterator to discourage people writing iterated 
bodies when they should just write everything to a BytesIO and be done 
with it.
I fail to understand why a second iterator is necessary when we can get 
away with just one.



Also, it makes middleware simpler: the last line can just yield the 
result of calling the app, or a modified version, i.e.:


yield app(environ)

or:

s, h, b = app(environ)
# ... modify or replace s, h, b
yield s, h, b
Asynchronous applications may not be ready to send the status line as 
the first thing coming out of the generator. Consider an app that 
receives a file. The first thing coming out of the app is a future. The 
app needs to receive the entire file until it can determine what status 
line to send. Maybe there was an I/O error writing the file, so it needs 
to send a 500 response instead of 200. This is not possible with a body 
iterator, and if we are already iterating the application generator, I 
really don't understand why the body needs to be an iterator as well.



In your approach, the above samples have to be rewritten as:

return app(environ)

or:

result = app(environ)
s, h = yield result
# ... modify or replace s, h
yield s, h

for data in result:
    # modify data as we go
    yield data

Only that last bit doesn't actually work, because you have to be able 
to send future results back *into* the result.  Try actually making 
some code that runs on this protocol and yields to futures during the 
body iteration.

Did you miss the gist posted by myself (and improved by Alice)?


Really, this modified protocol can't work with a full async API the 
way my coroutine-based version does, AND the middleware is much more 
complicated.  In my version, your do-nothing middleware looks like this:



class NullMiddleware(object):
    def __init__(self, app):
        self.app = app

    def __call__(self, environ):
        # ACTION: pre-application environ mangling

        s, h, body = yield self.app(environ)

        # modify or replace s, h, body here

        yield s, h, body


If you want to actually process the body in some way, it looks like:

class NullMiddleware(object):

    def __init__(self, app):
        self.app = app

    def __call__(self, environ):
        # ACTION: pre-application environ mangling

        s, h, body = yield self.app(environ)

        # modify or replace s, h, body here

        yield s, h, self.process(body)

    def process(self, body_iter):
        while True:
            chunk = yield body_iter
            if chunk is None:
                break
            # process/modify chunk here
            yield chunk

And that's still a lot simpler than your sketch.

Personally, I would write both of the above as:

def null_middleware(app):

    def wrapped(environ):
        # ACTION: pre-application environ mangling
        s, h, body = yield app(environ)

        # modify or replace s, h, body here
        yield s, h, process(body)

    def process(body_iter):
        while True:
            chunk = yield body_iter
            if chunk is None:
                break
            # process/modify chunk here
            yield chunk

    return wrapped

But that's just personal taste.  Even as a class, it's much easier to 
write.  The above middleware pattern works with the sketches I gave on 
the PEAK wiki, and I've now updated the wiki to include an example app 
and middleware for clarity.


Really, the only hole in this approach is dealing with applications 
that block.  The elephant in the room here is that while it's easy to 
write these example applications so they don't block, in practice 
people read files and do database queries and whatnot in their 
requests, and those APIs are generally synchronous.  So, unless they 
somehow fold their entire application into a future, it doesn't work.
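One way to do that folding, sketched under the same yield-a-future convention (asyncify is a hypothetical helper name, not part of any proposal in this thread):

```python
from concurrent.futures import ThreadPoolExecutor

def asyncify(blocking_app):
    """Fold a synchronous app into a single future on the server's executor."""
    def wrapped(environ):
        executor = environ['wsgi.executor']
        # The whole blocking app runs in a worker thread; the async
        # server only ever sees one future to wait on.
        status, headers, body = yield executor.submit(blocking_app, environ)
        yield status, headers, body
    return wrapped
```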



I liked the idea of having a separate async_read() method in 
wsgi.input, which would set the underlying socket in nonblocking mode 
and return a future. The event loop would watch the socket and read 
data into a buffer and trigger the callback when the given amount of 
data has been read. Conversely, .read() would set the socket in 
blocking mode. What kinds of problems would this cause?

Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alex Grönholm

09.01.2011 22:56, P.J. Eby wrote:

At 08:09 PM 1/9/2011 +0200, Alex Grönholm wrote:
Asynchronous applications may not be ready to send the status line as 
the first thing coming out of the generator.


So?  In the sketches that are the subject of this thread, it doesn't 
have to be the first thing.  If the application yields a future first, 
it will be paused...  and so will the middleware.  When this line is 
executed in the middleware:


status, headers, body = yield app(environ)

...the middleware is paused until the application actually yields its 
response tuple.


Specifically, this yield causes the app iterator to be pushed on the 
Coroutine object's .stack attribute, then iterated.  If the 
application yields a future, the server suspends the whole thing until 
it gets called back, at which point it .send()s the result back into 
the app iterator.


The app iterator then yields its response, which is tagged as a return 
value, so the app is popped off the .stack, and the response is sent 
via .send() into the middleware, which then proceeds as if nothing 
happened in the meantime.  It then yields *its* response, and whatever 
body iterator is given gets put into a second coroutine that proceeds 
similarly.


When the process_response() part of the middleware does a yield 
body_iter, the body iterator is pushed, and the middleware is paused 
until the body iterator yields a chunk.  If the body yields a future, 
the whole process is suspended and resumed.  The middleware won't be 
resumed until the body yields another chunk, at which point it is 
resumed.  If it yields a chunk of its own, then that's passed up to 
any response-processing middleware further up the stack.


In contrast, middleware based on the 2+body protocol cannot process a 
body without embedding coroutine management into the middleware 
itself.   For example, you can't write a standalone body processor 
function, and reuse it inside of two pieces of middleware, without 
doing a bunch of send()/throw() logic to make it work.
Some boilerplate code was necessary in WSGI 1 middleware too. Alice's 
cleaned up example didn't look too bad, and it would not require that 
Coroutine stack at all.
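The push/pop mechanics described above can be modelled in a few lines. This is a simplification of the Coroutine object, under the assumption that the first non-future value a pushed sub-generator yields counts as its tagged "return value":

```python
import types
from concurrent.futures import Future

def run(gen):
    """Drive nested generators: push on yield-of-generator, resolve
    futures, pop when a sub-generator yields its 'return' value."""
    stack = []
    value = None
    while True:
        result = gen.send(value)
        if isinstance(result, types.GeneratorType):
            stack.append(gen)        # push the caller, descend
            gen, value = result, None
        elif isinstance(result, Future):
            value = result.result()  # a real server would suspend here
        elif stack:
            gen = stack.pop()        # tagged return: resume the caller
            value = result
        else:
            return result            # top-level response
```

With this, `x = yield sub()` in a caller receives whatever `sub()` yields, exactly the suspend/resume behaviour being discussed.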


I think that at this point both sides need to present some code that 
really works, and those implementations could then be compared. The 
examples so far have been a bit too abstract to be fairly evaluated.



Outside of the application/middleware you mean? I hope there isn't 
any more confusion left about what a future is. The fact is that you 
cannot use synchronous API calls directly from an async app no matter 
what. Some workaround is always necessary.


Which pretty much kills the whole idea as being a single, universal 
WSGI protocol, since most people don't care about async.
I'm confused. Did you not know this? If so, why then were you at least 
initially receptive to the idea?
Personally I don't think that this is a big problem. Async apps will 
always have to take care not to block the reactor unreasonably long, and 
that is never going to change. Synchronous apps just need to follow the 
protocol, but beyond that they shouldn't have to care about the async 
side of things.







Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alice Bevan–McGregor

On 2011-01-09 09:26:19 -0800, P.J. Eby said:

By the way, I don't really see the point of the new sketches you're doing...


I'm sorry.

...as they aren't nearly as general as the one I've already done, but 
still have the same fundamental limitation: wsgi.input.


You missed the point entirely, then.


If wsgi.input offers any synchronous methods...


Regardless of whether or not wsgi.input is implemented in an async way, 
wrap it in a future and eventually get around to yielding it.  Problem 
/solved/.  Identical APIs for both sync and async, and if you have an 
async server but haven't gotten around to implementing your own 
executor yet, wrapping the blocking read call in a future also solves 
the problem (albeit not in the most efficient way).


I.e. wrap every call to a wsgi.input method by passing it to wsgi.submit.
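Concretely, an application under this convention and a toy server loop might look like the following (BytesIO stands in for wsgi.input, and drive() is an illustrative stand-in for a real server's scheduler, not part of the proposal):

```python
from concurrent.futures import Future, ThreadPoolExecutor
from io import BytesIO

def app(environ):
    submit = environ['wsgi.submit']
    input_ = environ['wsgi.input']
    # Same spelling on sync and async servers; only the executor differs.
    data = (yield submit(input_.read, 4096)).result()
    yield '200 OK', [('Content-Length', str(len(data)))], [data]

def drive(app, environ):
    # Toy server loop: hand each yielded future back into the app,
    # which extracts the value itself via .result().
    gen = app(environ)
    value = next(gen)
    while isinstance(value, Future):
        value = gen.send(value)
    return value
```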

...then they must be used from a future and must somehow raise an 
error when called from within the application -- otherwise it would 
block, nullifying the point of having a generator-based API.


See above.  No extra errors, nothing really that insane.

If it offers only asynchronous methods, OTOH, then you can't pass 
wsgi.input to any existing libraries (e.g. the cgi module).


Describe to me how a function can be suspended (other than magical 
greenthreads) if it does not yield; if I knew this, maybe I wouldn't be 
so confused.


The latter problem is the worse one, because it means that the 
translation of an app between my original WSGI2 API and the current 
sketch is no longer just replace 'return' with 'yield'.


I've deviated from your sketch, obviously, and any semblance of 
yielding a 3-tuple.  Stop thinking of my example code as conforming to 
your ideas; it's a new idea, or, worst case, a narrowing of an idea 
into its simplest form.


The only way this would work is if WSGI applications are still allowed 
to be written in a blocking style.  Greenlet-based frameworks would 
have no problem with this, of course, but servers like Twisted would 
still have to run WSGI apps in a worker thread pool, just because they 
*might* block.


Then that is not acceptable and would not work.  The mechanics of 
yielding futures instances allows you to (in your server) implement the 
necessary async code however you wish while providing a uniform 
interface to both sync and async applications running on sync and async 
servers.  In fact, you would be able to safely run a sync application 
on an async server and vice-versa.  You can, on an async server:


:: Add a callback to the yielded future to re-schedule the application 
generator.


:: If using greenthreads, just block on future.result() then 
immediately wake up the application generator.


:: Do other things I can't think of because I'm still waking up.

The first solution is how Marrow HTTPd would operate.

If we're okay with this as a limitation, then adding _async method 
variants that return futures might work, and we can proceed from there.


That is not optimum, because now you have an optional API that 
applications who want to be compatible will need to detect and choose 
between.


Mostly, though, it seems to me that the need to be able to write 
blocking code does away with most of the benefit of trying to have a 
single API in the first place.


You have artificially created this need, ignoring the semantics of 
using the server-specific executor to detect async-capable requests and 
the yield mechanics I suggested; which happens to be a single, coherent 
API across sync and async servers and applications.


- Alice.




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alice Bevan–McGregor

On 2011-01-09 09:03:38 -0800, P.J. Eby said:
Hm.  I'm not sure if I like that.  The typical app developer really 
shouldn't be yielding multiple body strings in the first place.


Wait; what?  So you want the app developer to load a 40MB talkcast MP3 
into memory before sending it?  You want to completely eliminate the 
ability to stream an HTML page to the client in chunks (e.g. head 
block, headers + search box, search results, advertisements, footer -- 
the exact thing Google does with every search result)?  That sounds 
like artificially restricting application developers, to me.


I much prefer that the canonical example of a WSGI app just return a 
list with a single bytestring...


Why is it wrapped in a list, then?

IOW, I want it to look like the normal way to do things is to just 
return the whole request at once, and use the additional difficulty of 
creating a second iterator to discourage people writing iterated bodies 
when they should just write everything to a BytesIO and be done with it.


It sounds to me like your "should" doesn't cover an extremely large 
range of common use cases.



In your approach, the above samples have to be rewritten as:

 return app(environ)

[snip]


My code does not use return.  At all.  Only yield.

Try actually making some code that runs on this protocol and yields to 
futures during the body iteration.


Sure.  I'll also implement my actual proposal of not having a separate 
body iterable.


The above middleware pattern works with the sketches I gave on the PEAK 
wiki, and I've now updated the wiki to include an example app and 
middleware for clarity.


I'll need to re-read the code on your wiki; I find it incredibly 
difficult to grok, however, you can help me out a bit by answering a 
few questions about it: How does middleware trap exceptions raised by 
the application?  (Specifically, how does the server pass the buck with 
exceptions?  And how does the exception get to the application to 
bubble out towards the server, through middleware, as it does now?)



Really, the only hole in this approach is dealing with applications that block.


That's what the executor in the environ is for.  If you have image 
scaling or something else that will block you submit it.  All 
networking calls?  You submit them.


The elephant in the room here is that while it's easy to write these 
example applications so they don't block, in practice people read files 
and do database queries and what not in their requests, and those APIs 
are generally synchronous.  So, unless they somehow fold their entire 
application into a future, it doesn't work.


Actually, that's how multithreading support in marrow.server[.http] was 
implemented.  Overhead?  40-60 RSecs.  The option is provided for those 
who can do nothing about their application blocking, while still 
maintaining the internally async nature of the server.


That you could never *call* the .read() method outside of a future, or 
else you would block the server, thereby obliterating the point of 
having the async API in the first place.


See above re: your confusion over the calling semantics of wsgi.input 
in regards to my (and Alex's) proposal.  Specifically:


   data = (yield submit(wsgi_input.read, 4096)).result()

This would work on sync and async servers, and with sync and async 
applications, with no difference in the code.


- Alice.




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alice Bevan–McGregor

On 2011-01-09 17:06:28 -0800, Alice Bevan-McGregor said:

On 2011-01-09 09:03:38 -0800, P.J. Eby said:

The elephant in the room here is that while it's easy to write these
example applications so they don't block, in practice people read files
and do database queries and what not in their requests, and those APIs
are generally synchronous.  So, unless they somehow fold their entire
application into a future, it doesn't work.


Actually, that's how multithreading support in marrow.server[.http] was
implemented.  Overhead?  40-60 RSecs.


Clarification here, that's less than 2% of total RSecs.

- Alice.




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-08 Thread Alice Bevan–McGregor

On 2011-01-08 17:22:44 -0800, Alex Grönholm said:


On 2011-01-08 13:16:52 -0800, P.J. Eby said:

I've written the sketches dealing only with PEP 3148 futures, but 
sockets were also proposed, and IMO there should be simple support for 
obtaining data from wsgi.input.


I'm a bit unclear as to how this will work with async. How do you 
propose that an asynchronous application receives the request body?


In my example https://gist.github.com/770743 (which has been simplified 
greatly by P.J. Eby in the Future- and Generator-Based Async Idea 
thread) for dealing with wsgi.input, I have:


   future = environ['wsgi.executor'].submit(environ['wsgi.input'].read, 4096)
   yield future

While ugly, if you were doing this, you'd likely:

submit = environ['wsgi.executor'].submit
input_ = environ['wsgi.input']

   future = yield submit(input_.read, 4096)
   data = future.result()

That's a bit nicer to read, and simplifies things if you need to make a 
number of async calls.


The idea here is that:

:: Your async server subclasses ThreadPoolExecutor.

:: The subclass overloads the submit method.

:: Your submit method detects bound methods on wsgi.input, sockets, and files.

:: If one of the above is detected, create a mock future that defines 
'fd' and 'operation' attributes or similar.


:: When yielding the mock future, your async reactor can detect 'fd' 
and do the appropriate thing for your async framework.  (Generally 
adding the fd to the appropriate select/epoll/kqueue readers/writers 
lists.)


:: When the condition is met, set_running_or_notify_cancel (when 
internally reading or writing data), set_result, saving the value, and 
return the future (filled with its data) back up to the application.


:: The application accepts the future instance as the return value of 
yield, and calls result across it to get the data.  (Obviously writes, 
if allowed, won't have data, but reads will.)


I hope that clearly identifies my idea on the subject.  Since async 
servers will /already/ be implementing their own executors, I don't see 
this as too crazy.
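A rough sketch of that executor-subclass idea, with AsyncInput standing in for the server's wsgi.input and the fd/operation attribute names as assumptions of this sketch:

```python
from concurrent.futures import Future, ThreadPoolExecutor

class AsyncInput:
    """Stand-in for a server-provided wsgi.input (an ABC in a real server)."""
    def __init__(self, fd):
        self.fd = fd

    def read(self, size):
        # Never called directly by an async server; the reactor does the I/O.
        raise NotImplementedError

class AsyncExecutor(ThreadPoolExecutor):
    def submit(self, fn, *args, **kw):
        ob = getattr(fn, '__self__', None)
        if isinstance(ob, AsyncInput):
            # "Mock" future: carries hints for the reactor instead of
            # running anything on the thread pool.
            future = Future()
            future.fd = ob.fd               # which descriptor to watch
            future.operation = fn.__name__  # e.g. 'read' or 'write'
            future.args = args
            return future
        return super().submit(fn, *args, **kw)
```

The reactor, seeing the fd attribute on a yielded future, would add the descriptor to its select/epoll/kqueue lists and later call set_result() itself; anything else falls through to the real thread pool.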


- Alice.




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-08 Thread P.J. Eby

At 04:40 AM 1/9/2011 +0200, Alex Grönholm wrote:

09.01.2011 04:15, Alice Bevan–McGregor wrote:
I hope that clearly identifies my idea on the subject. Since async 
servers will /already/ be implementing their own executors, I don't 
see this as too crazy.
-1 on this. Those executors are meant for executing code in a thread 
pool. Mandating a magical socket operation filter here would 
considerably complicate server implementation.


Actually, the *reverse* is true.  If you do it the way Alice 
proposes, my sketches don't get any more complex, because the 
filtering goes in the executor facade or submit function.


Truthfully, I don't really see the point of exposing the map() method 
(which is the only other executor method we'd expose), so it probably 
makes more sense to just offer a 'wsgi.submit' key...  which can be a 
function as follows:


  def submit(callable, *args, **kw):
      ob = getattr(callable, '__self__', None)
      if isinstance(ob, ServerProvidedSocket):  # could be an ABC
          future = MockFuture()
          if callable == ob.read:
              pass  # set up read callback to fire future
          elif callable == ob.write:
              pass  # set up write callback to fire future
          return future
      else:
          return real_executor.submit(callable, *args, **kw)

Granted, this might be a rather long function.  However, since it's 
essentially an optimization, a given server can decide how many 
functions can be shortcut in this way.  The spec may wish to offer a 
guarantee or recommendation for specific methods of certain 
stdlib-provided types (sockets in particular) and wsgi.input.


Personally, I do think it might be *better* to offer extended 
operations on wsgi.input that could be used via yield, e.g. yield 
input.nb_read().  But of course then the trampoline code has to 
recognize those values instead of futures.  Either way works, but 
somewhere there is going to be some type-testing (explicit or 
implicit) taking place to determine how to suspend and resume the app.


Note, too, that this complexity also only affects servers that want 
to offer a truly async API.  A synchronous server has no reason to 
pay particular attention to what's in a future, since it can't offer 
any performance improvement.


I do think that this sort of API discussion, though, is the most 
dangerous part of trying to do an async spec.  That is, I don't 
expect that everyone will spontaneously agree on the exact same 
API.  Alice's proposal (simply submitting object methods) has the 
advantage of severely limiting the scope of API discussions.  ;-)




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-08 Thread P.J. Eby

At 06:15 PM 1/8/2011 -0800, Alice Bevan–McGregor wrote:

On 2011-01-08 17:22:44 -0800, Alex Grönholm said:

On 2011-01-08 13:16:52 -0800, P.J. Eby said:
I've written the sketches dealing only with PEP 3148 futures, but 
sockets were also proposed, and IMO there should be simple support 
for obtaining data from wsgi.input.
I'm a bit unclear as to how this will work with async. How do you 
propose that an asynchronous application receives the request body?


In my example https://gist.github.com/770743 (which has been 
simplified greatly by P.J. Eby in the Future- and Generator-Based 
Async Idea thread) for dealing with wsgi.input, I have:


   future = environ['wsgi.executor'].submit(environ['wsgi.input'].read, 4096)
   yield future

While ugly, if you were doing this, you'd likely:

submit = environ['wsgi.executor'].submit
input_ = environ['wsgi.input']

   future = yield submit(input_.read, 4096)
   data = future.result()


I don't quite understand the above -- in my sketch, the above would be:

data = yield submit(input_.read, 4096)

It looks like your original sketch wants to call .result() on the 
future, whereas in my version, the return value of yielding a future 
is the result (or an error is thrown if the result was an error).


Is there some reason I'm missing, for why you'd want to explicitly 
fetch the result in a separate step?


Meanwhile, thinking about Alex's question, ISTM that if WSGI 2 is 
asynchronous, then the wsgi.input object should probably just have 
read(), readline() etc. methods that simply return (possibly-mock) 
futures.  That's *much* better than having to do all that submit() 
crud just to read data from wsgi.input().


OTOH, if you want to use the cgi module to parse a form POST from the 
input, you're going to need to write an async version of it in that 
case, or else feed the entire operation to an executor...  but then 
the methods would need to be synchronous...  *argh*.


I'm starting to not like this idea at all.  Alex has actually 
pinpointed a very weak spot in the scheme, which is that if 
wsgi.input is synchronous, you destroy the asynchrony, but if it's 
asynchronous, you can't use it with any normal code that operates on a stream.


I don't see any immediate fixes for this problem, so I'll let it 
marinate in the back of my mind for a while.  This might be the 
achilles heel for the whole idea of a low-rent async WSGI.




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-08 Thread Alex Grönholm

09.01.2011 05:45, P.J. Eby wrote:

At 06:15 PM 1/8/2011 -0800, Alice Bevan–McGregor wrote:

On 2011-01-08 17:22:44 -0800, Alex Grönholm said:

On 2011-01-08 13:16:52 -0800, P.J. Eby said:
I've written the sketches dealing only with PEP 3148 futures, but 
sockets were also proposed, and IMO there should be simple support 
for obtaining data from wsgi.input.
I'm a bit unclear as to how this will work with async. How do you 
propose that an asynchronous application receives the request body?


In my example https://gist.github.com/770743 (which has been 
simplified greatly by P.J. Eby in the Future- and Generator-Based 
Async Idea thread) for dealing with wsgi.input, I have:


   future = environ['wsgi.executor'].submit(environ['wsgi.input'].read, 4096)

   yield future

While ugly, if you were doing this, you'd likely:

submit = environ['wsgi.executor'].submit
input_ = environ['wsgi.input']

   future = yield submit(input_.read, 4096)
   data = future.result()


I don't quite understand the above -- in my sketch, the above would be:

data = yield submit(input_.read, 4096)

It looks like your original sketch wants to call .result() on the 
future, whereas in my version, the return value of yielding a future 
is the result (or an error is thrown if the result was an error).
I cooked up a simple do-nothing middleware example which Alice decorated 
with some comments:

https://gist.github.com/771398

A new feature here is that the application itself yields a (status, 
headers) tuple and then chunks of the body (or futures).



Is there some reason I'm missing, for why you'd want to explicitly 
fetch the result in a separate step?


Meanwhile, thinking about Alex's question, ISTM that if WSGI 2 is 
asynchronous, then the wsgi.input object should probably just have 
read(), readline() etc. methods that simply return (possibly-mock) 
futures.  That's *much* better than having to do all that submit() 
crud just to read data from wsgi.input().


OTOH, if you want to use the cgi module to parse a form POST from the 
input, you're going to need to write an async version of it in that 
case, or else feed the entire operation to an executor...  but then 
the methods would need to be synchronous...  *argh*.


I'm starting to not like this idea at all.  Alex has actually 
pinpointed a very weak spot in the scheme, which is that if wsgi.input 
is synchronous, you destroy the asynchrony, but if it's asynchronous, 
you can't use it with any normal code that operates on a stream.
I liked the idea of having a separate async_read() method in wsgi.input, 
which would set the underlying socket in nonblocking mode and return a 
future. The event loop would watch the socket and read data into a 
buffer and trigger the callback when the given amount of data has been 
read. Conversely, .read() would set the socket in blocking mode. What 
kinds of problems would this cause?
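A rough model of that async_read() idea, written against the modern selectors module (the Input class, run_once loop, and buffering policy are all assumptions of this sketch, not anything specified in the thread):

```python
import selectors
import socket
from concurrent.futures import Future

class Input:
    """Stand-in for wsgi.input backed by a socket."""
    def __init__(self, sock, selector):
        self.sock, self.selector = sock, selector

    def read(self, amount):
        # Synchronous flavour: put the socket back in blocking mode.
        self.sock.setblocking(True)
        return self.sock.recv(amount)

    def async_read(self, amount):
        # Nonblocking flavour: return a future immediately; the event
        # loop fulfils it once `amount` bytes have been buffered.
        self.sock.setblocking(False)
        future, buf = Future(), bytearray()

        def on_readable(sock):
            buf.extend(sock.recv(amount - len(buf)))
            if len(buf) >= amount:
                self.selector.unregister(sock)
                future.set_result(bytes(buf))

        self.selector.register(self.sock, selectors.EVENT_READ, on_readable)
        return future

def run_once(selector):
    # One tick of a toy event loop: dispatch ready callbacks.
    for key, _events in selector.select(timeout=1):
        key.data(key.fileobj)
```

The problem PJE raises still applies: code like the cgi module that calls .read() on this object would flip the socket back to blocking mode and stall the loop.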



I don't see any immediate fixes for this problem, so I'll let it 
marinate in the back of my mind for a while.  This might be the 
achilles heel for the whole idea of a low-rent async WSGI.

