Re: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to supplement middleware.

2010-12-13 Thread Robert Brewer
Alice Bevan–McGregor wrote:
 One issue I've seen repeatedly when working with WSGI1 is the use of
 middleware to process incoming data but not outgoing data, and vice
 versa: middleware that filters the output in some way but doesn't care
 about the input.
 
 Wrapping middleware around an application is simple and effective, but
 costly in terms of stack allocation overhead; it also makes debugging a
 bit more of a nightmare as the stack trace can be quite deep.
 
 My updated draft PEP 444[1] includes a section describing Filters, both
 ingress (input filtering) and egress (output filtering).  The API is
 trivially simple, optional (as filters can be easily adapted as
 middleware if the host server doesn't support filters) and easy to
 implement in a server.  (The Marrow HTTP/1.1 server implements them as
 two for loops.)
 
 Basically an input filter accepts the environment dictionary and can
 mutate it.  Ingress filters take a single positional argument that is
 the environ.  The return value is ignored.  (This is questionable; it
 may sometimes be good to have ingress filters return responses.  Not
 sure about that, though.)
 
 An egress filter accepts the status, headers, body tuple from the
 application and returns a status, headers, and body tuple of its own,
 which then replaces the response.  An example implementation is:
 
   for filter_ in ingress_filters:
       filter_(environ)

   response = application(environ)

   for filter_ in egress_filters:
       response = filter_(*response)

That looks amazingly like the code for CherryPy Filters circa 2005. In version 
2 of CherryPy, Filters were the canonical extension method (for the 
framework, not WSGI, but the same lessons apply). It was still expensive in 
terms of stack allocation overhead, because you had to call() each filter to 
see if it was on. It would be much better to find a way to write something 
like:

    for f in ingress_filters:
        if f.on:
            f(environ)

It was also fiendishly difficult to get executed in the right order: if you had 
a filter that was both ingress and egress, the natural tendency for core 
developers and users alike was to append each to each list, but this is almost 
never the correct order. But even if you solve the issue of static composition, 
there's still a demand for programmatic composition (if X then add Y after 
it), and even decomposition (find the caching filter my framework added 
automatically and turn it off), and list.insert()/remove() isn't stellar at 
that. Calling the filter to ask it whether it is on also leads filter 
developers down the wrong path; you really don't want to have Filter A trying 
to figure out if some other, conflicting Filter B has already run (or will run 
soon) that demands Filter A return without executing anything. You really, 
really want the set of filters to be both statically defined and statically 
analyzable.

Finally, you want the execution of filters to be configurable per URI and also 
configurable per controller. So the above should be rewritten again to 
something like:

    for f in ingress_filters(controller):
        if f.on(environ['PATH_INFO']):
            f(environ)

It was for these reasons that CherryPy 3 ditched its version 2 filters and 
replaced them with hooks and tools in version 3. You might find more insight 
by studying the latest cherrypy/_cptools.py


Robert Brewer
fuman...@aminus.org
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to supplement middleware.

2010-12-13 Thread Alice Bevan–McGregor
Robert Brewer wrote:
 That looks amazingly like the code for CherryPy Filters circa 2005. In
 version 2 of CherryPy, Filters were the canonical extension method
 (for the framework, not WSGI, but the same lessons apply). It was still
 expensive in terms of stack allocation overhead, because you had to
 call() each filter to see if it was on. It would be much better to
 find a way to write something like:

     for f in ingress_filters:
         if f.on:
             f(environ)

.on will need to be an @property in most cases, still not avoiding 
stack allocation and, in fact, doubling the overhead per filter.  
Statically disabled filters should not be added to the filter list.
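The alternative suggested here, deciding once at configuration time which filters exist so the per-request loop never asks, can be sketched as follows. The Filter class, its enabled flag, build_filter_list, and the 'example.filters' environ key are hypothetical names used only for illustration.

```python
class Filter:
    """Hypothetical ingress filter with a static on/off switch."""

    def __init__(self, name, enabled=True):
        self.name = name
        self.enabled = enabled  # decided by configuration, not per request

    def __call__(self, environ):
        # Record that this filter ran, so the effect is visible.
        environ.setdefault('example.filters', []).append(self.name)


def build_filter_list(candidates):
    # Evaluated once, at configuration time: disabled filters never make it
    # into the list, so the per-request loop needs no .on check at all.
    return [f for f in candidates if f.enabled]


ingress_filters = build_filter_list([
    Filter('session'),
    Filter('caching', enabled=False),  # statically disabled
    Filter('auth'),
])

environ = {}
for f in ingress_filters:  # one call per enabled filter, nothing else
    f(environ)
```

Compared with checking an .on property inside the loop, this saves one attribute lookup (and, for a property, one function call) per filter per request.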


 It was also fiendishly difficult to get executed in the right order: if
 you had a filter that was both ingress and egress, the natural tendency
 for core developers and users alike was to append each to each list,
 but this is almost never the correct order.


If something is both an ingress and egress filter, it should be 
implemented as middleware instead.  Nothing can prevent developers from 
doing bad things if they really try.  Appending to ingress and 
prepending to egress would be the right thing to simulate middleware 
behaviour with filters, but again, don't do that.  ;)
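The claim that appending to the ingress list and prepending to the egress list reproduces middleware ordering can be checked with a small execution trace. Everything below (app, make_middleware, make_filters, the trace list) is a hypothetical illustration, and it is not an endorsement of splitting middleware this way.

```python
trace = []


def app(environ):
    trace.append('app')
    return '200 OK', [], [b'']


def make_middleware(name, inner):
    # Classic wrapping: code runs before and after the inner call.
    def middleware(environ):
        trace.append(name + ' in')
        response = inner(environ)
        trace.append(name + ' out')
        return response
    return middleware


def make_filters(name):
    # The same behaviour split into an ingress/egress pair.
    def ingress(environ):
        trace.append(name + ' in')

    def egress(status, headers, body):
        trace.append(name + ' out')
        return status, headers, body
    return ingress, egress


# Middleware: A wraps B, which wraps app.
make_middleware('A', make_middleware('B', app))({})
middleware_trace = list(trace)

# Filters: append to ingress, prepend to egress.
trace.clear()
ingress_filters, egress_filters = [], []
for name in ('A', 'B'):
    ingress, egress = make_filters(name)
    ingress_filters.append(ingress)
    egress_filters.insert(0, egress)

environ = {}
for f in ingress_filters:
    f(environ)
response = app(environ)
for f in egress_filters:
    response = f(*response)
filter_trace = list(trace)
```

Both traces come out as A in, B in, app, B out, A out. Appending the egress halves instead would run A out before B out, which is the mis-ordering described above.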


 But even if you solve the issue of static composition, there's still a
 demand for programmatic composition (if X then add Y after it), and
 even decomposition (find the caching filter my framework added
 automatically and turn it off), and list.insert()/remove() isn't
 stellar at that.


I have plans for (and a partial implementation of) an init.d-style 
needs/uses/provides declaration with automatic dependency graphing.  
WebCore, for example, adds these declarations to existing middleware 
layers in order to sort the middleware.
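The message only describes the idea, so the following is a sketch under explicit assumptions, not WebCore's implementation: each filter carries hypothetical provides, needs (hard dependency: a provider must exist and run first), and uses (soft dependency: run after a provider only if one is present) sets, and the standard library's graphlib.TopologicalSorter (Python 3.9+) turns them into an execution order.

```python
from graphlib import TopologicalSorter


class Filter:
    """Hypothetical filter declaring init.d-style relationships."""

    def __init__(self, name, provides=(), needs=(), uses=()):
        self.name = name
        self.provides = set(provides) | {name}  # a filter provides itself
        self.needs = set(needs)  # hard: a provider must exist and run first
        self.uses = set(uses)    # soft: run after a provider only if present


def order_filters(filters):
    """Return filters so that every provider runs before its dependents."""
    providers = {}
    for f in filters:
        for feature in f.provides:
            providers.setdefault(feature, []).append(f.name)

    graph = {}  # filter name -> names that must run before it
    for f in filters:
        deps = set()
        for feature in f.needs:
            if feature not in providers:
                raise LookupError('%s needs %r, which nothing provides'
                                  % (f.name, feature))
            deps.update(providers[feature])
        for feature in f.uses:
            deps.update(providers.get(feature, ()))
        deps.discard(f.name)
        graph[f.name] = deps

    by_name = {f.name: f for f in filters}
    # static_order() raises graphlib.CycleError for circular declarations.
    return [by_name[name] for name in TopologicalSorter(graph).static_order()]


filters = [
    Filter('compress', uses={'session'}),
    Filter('auth', needs={'session'}),
    Filter('beaker', provides={'session'}),
]
ordered = [f.name for f in order_filters(filters)]
```

Because the declarations are plain data, the same graph also supports the decomposition case raised above: drop a filter from the input list and re-sort, and any filter that still needs what it provided fails loudly instead of silently running out of order.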


 Calling the filter to ask it whether it is on also leads filter
 developers down the wrong path; you really don't want to have Filter A
 trying to figure out if some other, conflicting Filter B has already
 run (or will run soon) that demands Filter A return without executing
 anything. You really, really want the set of filters to be both
 statically defined and statically analyzable.


Unfortunately, most, if not all, filters need to check request headers 
and response headers to determine whether they can run.  E.g. 
compression checks environ.get('HTTP_ACCEPT_ENCODING', '').lower() for 
'gzip', and checks the response to determine whether a 'Content-Encoding' 
header has already been specified.
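A compression egress filter doing exactly those two checks might look like the sketch below. One assumption needs flagging: the egress signature described in this thread receives only (status, headers, body), yet the Accept-Encoding check needs environ, so this hypothetical gzip_egress takes environ as an extra first argument that a server could bind per request with functools.partial. The substring test for 'gzip' is deliberately as crude as the description above and ignores q-values.

```python
import gzip
from functools import partial


def gzip_egress(environ, status, headers, body):
    # Request side: does the client accept gzip at all?
    if 'gzip' not in environ.get('HTTP_ACCEPT_ENCODING', '').lower():
        return status, headers, body

    # Response side: leave responses that are already encoded alone.
    if any(name.lower() == 'content-encoding' for name, _ in headers):
        return status, headers, body

    # body is assumed to be an iterable of byte strings.
    data = gzip.compress(b''.join(body))
    headers = [(n, v) for n, v in headers if n.lower() != 'content-length']
    headers += [('Content-Encoding', 'gzip'),
                ('Content-Length', str(len(data))),
                ('Vary', 'Accept-Encoding')]
    return status, headers, [data]


# Per request, bind environ so the filter fits the (status, headers, body) slot.
environ = {'HTTP_ACCEPT_ENCODING': 'gzip, deflate'}
egress = partial(gzip_egress, environ)
status, headers, body = egress(
    '200 OK', [('Content-Type', 'text/plain'), ('Content-Length', '5')], [b'hello'])
```

This illustrates the point being made: whether the filter does anything can only be decided from the request and the response, not from static configuration alone.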


 Finally, you want the execution of filters to be configurable per URI
 and also configurable per controller. So the above should be rewritten
 again to something like:

     for f in ingress_filters(controller):
         if f.on(environ['PATH_INFO']):
             f(environ)

 It was for these reasons that CherryPy 3 ditched its version 2
 filters and replaced them with hooks and tools in version 3.


This is possible by wrapping multiple applications, say, in the filter 
middleware adapter with differing filter setups, then using the 
separate wrapped applications with some form of dispatch.  You could 
also utilize filters as decorators.  This is an implementation detail 
left up to the framework utilizing WSGI2, however.  WSGI2 itself has no 
concept of controllers.
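The approach described above, wrapping the same application several times with different filter sets and choosing between the wrapped versions by URI, can be sketched like this. with_filters, dispatch, page, no_cache, and not_found are hypothetical names, and the longest-prefix match on PATH_INFO is just one simple dispatch rule (note that '/admin' would also match '/administrator').

```python
def with_filters(app, ingress=(), egress=()):
    """Hypothetical filter-to-middleware adapter (environ -> triple)."""
    ingress, egress = tuple(ingress), tuple(egress)

    def wrapped(environ):
        for f in ingress:
            f(environ)
        response = app(environ)
        for f in egress:
            response = f(*response)
        return response
    return wrapped


def dispatch(routes, default):
    """Pick a wrapped application by the longest matching PATH_INFO prefix."""
    # Longest prefix first, so '/admin/...' is tried before '/'.
    ordered = sorted(routes.items(), key=lambda item: len(item[0]), reverse=True)

    def app(environ):
        path = environ.get('PATH_INFO', '')
        for prefix, target in ordered:
            if path.startswith(prefix):
                return target(environ)
        return default(environ)
    return app


def page(environ):
    return '200 OK', [('Content-Type', 'text/plain')], [b'page']


def no_cache(status, headers, body):
    return status, headers + [('Cache-Control', 'no-store')], body


def not_found(environ):
    return '404 Not Found', [('Content-Type', 'text/plain')], [b'not found']


# Same application, two filter setups, selected per URI.
site = dispatch({
    '/admin': with_filters(page, egress=[no_cache]),
    '/': with_filters(page),
}, default=not_found)
```

Each filter set is fixed when its wrapper is built, so per-URI configuration stays statically defined; the only per-request decision is the dispatch itself.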


None of this prevents the simplified stack from being useful during 
exception handling, though.  ;)  What I was really trying to do is 
reduce the level of nesting on each request and make what used to be 
middleware more explicit in its purpose.

 You might find more insight by studying the latest cherrypy/_cptools.py


I'll give it a gander, though I firmly believe filter management (as 
middleware stack management) is the domain of a framework on top of 
WSGI2, not the domain of the protocol.


— Alice.
