Re: [Web-SIG] WSGI2: write callable?

2014-09-26 Thread Benoit Chesneau
On Fri, Sep 26, 2014 at 9:58 PM, PJ Eby p...@telecommunity.com wrote:

 On Thu, Sep 25, 2014 at 11:32 PM, Robert Collins
 robe...@robertcollins.net wrote:
  So I propose we drop the write callable, and include a queue-based
  implementation in the adapter for PEP- code.

 If you're dropping write(), then you might as well drop
 start_response() altogether, and replace it with returning a (status,
 headers, body-iterator) tuple, as in wsgi_lite (
 https://github.com/pjeby/wsgi_lite ) or as found in other languages'
 versions of WSGI.  (start_response+write was only ever needed in order
 to support legacy apps, so other languages never bothered.)

 wsgi_lite has a couple of other protocol extensions, namely the
 'wsgi_lite.closing' environment key, flagging callables' supported
 WSGI version (for transparent interop), and the argument binding
 protocol, but for the most part these are orthogonal to the calling
 schema.  I would suggest, however, that the calling protocol be
 flagged in some way to allow easier interop.
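The calling convention described above can be sketched as follows. This assumes an app takes only `environ` and returns a `(status, headers, body_iterable)` tuple. The names `hello_app` and `as_wsgi1` are hypothetical and are not part of wsgi_lite or any PEP. The shim shows that `start_response` is only needed at the boundary with an existing PEP 3333 server.

```python
def hello_app(environ):
    """Strawman app: environ in, (status, headers, body iterable) out."""
    path = environ.get('PATH_INFO', '/')  # a latin-1 "native string" per PEP 3333
    body = [b'Hello from ', path.encode('latin-1')]
    return '200 OK', [('Content-Type', 'text/plain')], body


def as_wsgi1(app):
    """Hypothetical shim: serve a tuple-returning app on a WSGI 1 server."""
    def wsgi1_app(environ, start_response):
        status, headers, body = app(environ)
        start_response(status, headers)  # the returned write() is never used
        return body  # the server iterates it and calls close() if present
    return wsgi1_app
```

To try it locally, `wsgiref.simple_server.make_server('localhost', 8000, as_wsgi1(hello_app)).serve_forever()` should serve the app with the standard library's reference server.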


I quite like the idea of always returning an iterator for the body; it would
simplify the code a lot...

About returning the status and other things, I quite agree, but IMO we also
need to return an extra parameter where the application or the middleware
could maintain state, or something like it. Thoughts?

- benoit



___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
https://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] WSGI2: write callable?

2014-09-26 Thread Robert Collins
On 27 September 2014 08:21, Benoit Chesneau bchesn...@gmail.com wrote:


 On Fri, Sep 26, 2014 at 5:32 AM, Robert Collins robe...@robertcollins.net
 wrote:
...
 So I propose we drop the write callable, and include a queue-based
 implementation in the adapter for PEP- code.

 -Rob


 What would be the advantage of using a queue compared to simply writing to the
 server? Internally the server can use a queue, but why should the client know
 about it? What is the reasoning behind it?

The point is to remove the complexity of having both an iterator over
content *and* a write method.

That's really complex for server [and middleware] writers. So the
interface to send bytes to the container would just be 'yield them'.
(Or return a fully populated list.)

So the point about the Queue is that to support PEP- we either
need to retain the write() callable, or we need an adapter that can
expose on its upper side the iterator we want, and on the lower side
accept *either* an iterator *or* use of the write() method. I think
you'll find that's quite hard to write without a Queue or similar
construct.
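A minimal sketch of the kind of Queue-based adapter described above follows. It assumes the new-style app returns `(status, headers, body_iterator)`. A WSGI 1 app runs in a worker thread, and `start_response()`, `write()` and the app's own iterable all feed one bounded `queue.Queue`. The upper side simply iterates that queue. `queue_adapter` and the message tags are hypothetical names. The sketch has two further limits:
- it ignores the PEP 3333 `exc_info` rules;
- it assumes the caller starts iterating the body, since a body generator that is never started cannot tell the worker to stop.

```python
import queue
import threading


class _Cancelled(Exception):
    """Raised in the worker thread once the consumer has stopped reading."""


def queue_adapter(wsgi1_app, maxsize=16):
    """Expose a WSGI 1 app (iterable and/or write()) as
    app(environ) -> (status, headers, body_iterator)."""
    def app(environ):
        q = queue.Queue(maxsize=maxsize)  # bounded, so write() gets back-pressure
        cancelled = threading.Event()

        def put(item):
            # Block while the consumer is slow, but give up once it goes away.
            while not cancelled.is_set():
                try:
                    q.put(item, timeout=0.1)
                    return
                except queue.Full:
                    pass
            raise _Cancelled()

        def write(data):
            put(('data', data))

        def start_response(status, headers, exc_info=None):
            put(('start', status, headers))  # exc_info handling omitted
            return write

        def worker():
            try:
                result = wsgi1_app(environ, start_response)
                try:
                    for chunk in result:
                        put(('data', chunk))
                finally:
                    if hasattr(result, 'close'):
                        result.close()
                put(('done',))
            except _Cancelled:
                pass
            except BaseException as exc:
                try:
                    put(('error', exc))  # re-raised on the consumer side
                except _Cancelled:
                    pass

        threading.Thread(target=worker, daemon=True).start()

        # Wait for start_response(), keeping any (normally empty) earlier chunks.
        early = []
        while True:
            item = q.get()
            if item[0] == 'start':
                status, headers = item[1], item[2]
                break
            if item[0] == 'data':
                early.append(item[1])
            elif item[0] == 'error':
                raise item[1]
            else:
                raise RuntimeError('application never called start_response()')

        def body():
            try:
                yield from early
                while True:
                    item = q.get()
                    if item[0] == 'data':
                        yield item[1]
                    elif item[0] == 'error':
                        raise item[1]
                    elif item[0] == 'done':
                        return
                    # a repeated 'start' message is ignored in this sketch
            finally:
                cancelled.set()  # lets a blocked worker unwind via _Cancelled

        return status, headers, body()
    return app
```

Funnelling every event through a single queue preserves the order in which `write()` output and iterable output were produced. The bound on the queue keeps a chatty `write()` loop from buffering an unbounded response in memory.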

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud


Re: [Web-SIG] WSGI2: write callable?

2014-09-26 Thread PJ Eby
On Fri, Sep 26, 2014 at 5:02 PM, Robert Collins
robe...@robertcollins.net wrote:
 But perhaps it would be nicer to say:
 iterator of headers_dict_or_body_bytes
 With the first item yielded having to be headers (or an error thrown), and
 the last item yielded may be a dict to emit trailers.

 So:
 def app(environ):
     yield {':status': '200'}
     yield b'hello world'
     yield {'Foo': 'Bar'}

 is an entirely valid, if trivial, app.

 What do you think?

I think this would make it harder to write middleware, actually, and
for the same reason that I dislike folding status into the headers.
It's a case of "flat is better than nested", I think, in both cases.
That is, if the status is always required, it's easier to validate its
presence in a 3-tuple than nested inside another data structure.  As
far as trailers go, I'm not sure what those are used for or how they'd
be used in practice, but my initial thought is that they should be
attached to the response body, analogous to how FileWrapper works.

The other alternative is to use a dict as the response object
(analogous to environ as the request object), with named keys for
status, headers, trailers, body, etc.  It would then be extensible to
handle things like the Associated content concept.

In this way, middleware that is simply passing things through
unchanged can do so, while middleware that is creating a new response
can discard the old object.
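A small sketch of the dict-as-response idea follows. It assumes the hypothetical key names `'status'`, `'headers'` and `'body'`; the message only says there would be named keys. Pass-through middleware returns the app's dict, touching at most one key. Middleware that builds a new response discards the old dict after closing its body.

```python
def hello_app(environ):
    return {
        'status': '200 OK',
        'headers': [('Content-Type', 'text/plain')],
        'body': [b'hello'],
    }


def add_header(app, name, value):
    """Pass-through middleware: adjust one key, leave everything else alone."""
    def middleware(environ):
        response = app(environ)
        response['headers'] = response['headers'] + [(name, value)]
        return response  # keys it does not know about (e.g. trailers) survive
    return middleware


def friendly_errors(app):
    """Replacing middleware: swap any 5xx response for a brand-new dict."""
    def middleware(environ):
        response = app(environ)
        if not response['status'].startswith('5'):
            return response
        close = getattr(response['body'], 'close', None)
        if close is not None:
            close()  # release the discarded body's resources
        return {
            'status': response['status'],
            'headers': [('Content-Type', 'text/plain')],
            'body': [b'Sorry, something went wrong.'],
        }
    return middleware
```

Because the response is a single extensible mapping, a pass-through layer never needs to know about keys added by later extensions.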


 wsgi_lite has a couple of other protocol extensions, namely the
 'wsgi_lite.closing' environment key, flagging callables' supported
 WSGI version (for transparent interop), and the argument binding
 protocol, but for the most part these are orthogonal to the calling
 schema.  I would suggest, however, that the calling protocol be
 flagged in some way to allow easier interop.

 We're bumping the WSGI version; will that serve as a sufficient flag?

I mean, flagged on the app end.  For example, wsgi_lite marks apps
that support wsgi_lite with a true-valued `__wsgi_lite__` attribute.
In this way, a container invoking the app knows it can be called with
just an environ (and no start_response).

So, I'm saying that an app callable would opt in to this new WSGI
version, so that servers and middleware don't need to grow new APIs
for registering apps -- they can auto-detect.  Also, having
auto-detection means you can write a decorator (e.g. in wsgiref), to
wrap and convert WSGI 1 apps to WSGI 2, without needing to know if
you're passing something already wrapped.  It means that a WSGI 2
server or middleware can just wrap whatever apps it sees, and get back
a WSGI 2 app, whether the thing it got was WSGI 1 or WSGI 2.
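The auto-detection idea can be sketched roughly as follows. This assumes a hypothetical `__wsgi2__` marker attribute, modelled on wsgi_lite's `__wsgi_lite__`, and a `(status, headers, body)` return value. `to_wsgi2` is idempotent: already-marked apps come back unchanged, so a server or middleware can wrap whatever it is handed. For brevity this adapter buffers the whole WSGI 1 response. A streaming adapter would need something like the Queue-based approach discussed in this thread.

```python
def wsgi2(app):
    """Mark an app as natively speaking the (hypothetical) new protocol."""
    app.__wsgi2__ = True
    return app


def to_wsgi2(app):
    """Return a new-protocol app, wrapping WSGI 1 apps only when needed."""
    if getattr(app, '__wsgi2__', False):
        return app  # already WSGI 2 (or already wrapped): no double wrapping

    @wsgi2
    def adapted(environ):
        response = {}
        chunks = []  # write() output and iterable output, in arrival order

        def start_response(status, headers, exc_info=None):
            response['status'], response['headers'] = status, headers
            return chunks.append  # this is the legacy write() callable

        result = app(environ, start_response)
        try:
            for chunk in result:
                chunks.append(chunk)
        finally:
            if hasattr(result, 'close'):
                result.close()
        return response['status'], response['headers'], chunks

    return adapted
```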


 The closing thing is nice - it's basically unittest.TestCase.addCleanup
 for WSGI, allowing apps to not have to write a deeply nested finally.
 Let's start a new thread about the design for that specifically. You
 note that exception management isn't defined yet - perhaps we can
 tackle that as a group?

Sure.
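One possible shape for the addCleanup-style `closing` idea is sketched below, purely as an illustration. The environ key `example.add_cleanup` and the `serve` helper are hypothetical, and wsgi_lite's actual `wsgi_lite.closing` semantics may differ. The server owns a `contextlib.ExitStack`. It runs the registered callbacks after the body has been consumed, most recently registered first (as `unittest.TestCase.addCleanup` does), so the app needs no nested `try`/`finally`.

```python
import contextlib
import io


def serve(app, environ):
    """Hypothetical server loop that owns the cleanup stack for one request."""
    with contextlib.ExitStack() as stack:
        # Apps register callbacks here instead of writing try/finally blocks.
        environ['example.add_cleanup'] = stack.callback
        status, headers, body = app(environ)
        data = b''.join(body)  # stand-in for streaming the body to the client
    # Leaving the with-block ran every callback, most recent first.
    return status, headers, data


def demo_app(environ):
    add_cleanup = environ['example.add_cleanup']
    resource = io.BytesIO(b'payload')  # stands in for a file or DB cursor
    add_cleanup(resource.close)

    def body():
        yield resource.read()  # still open: cleanups run after the body is consumed

    return '200 OK', [('Content-Type', 'text/plain')], body()
```

`ExitStack` also takes care of running the remaining callbacks when one of them raises, which a hand-rolled cleanup list would have to handle itself.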


Re: [Web-SIG] WSGI2: write callable?

2014-09-26 Thread Robert Collins
On 27 September 2014 10:31, PJ Eby p...@telecommunity.com wrote:
 On Fri, Sep 26, 2014 at 5:02 PM, Robert Collins
 robe...@robertcollins.net wrote:
 But perhaps it would be nicer to say:
 iterator of headers_dict_or_body_bytes
 With the first item yielded having to be headers (or an error thrown), and
 the last item yielded may be a dict to emit trailers.

 So:
 def app(environ):
     yield {':status': '200'}
     yield b'hello world'
     yield {'Foo': 'Bar'}

 is an entirely valid, if trivial, app.

 What do you think?

 I think this would make it harder to write middleware, actually, and
 for the same reason that I dislike folding status into the headers.
 It's a case of "flat is better than nested", I think, in both cases.
 That is, if the status is always required, it's easier to validate its
 presence in a 3-tuple than nested inside another data structure.

I'm intrigued here - validation of the status code is tied into
the details of the headers. For instance, 301/302 need a Location
header to be valid. So I don't understand how it's any easier with
status split out. I'd be delighted to whip up a few contrasting
middleware samples to let us compare and contrast.

Note too that folk can still return bad status codes with a different layout:

# (status, headers, body, trailers)
return None, {}, [], {}

One thing we could do with the status code in the headers dict is to
default to 200 - the vastly common case (in the same way that throwing
an error generates a 500). Then status wouldn't be required at all for
trivial uses. That would make things easier, no?


 As
 far as trailers go, I'm not sure what those are used for or how they'd
 be used in practice, but my initial thought is that they should be
 attached to the response body, analogous to how FileWrapper works.

So a classic example for Trailers is digitally signing streamed
content. Using the same strawman API as above:

import hashlib

def app(environ):
    yield {':status': '200'}
    md5sum = hashlib.md5()
    # block_reader() and sign_bytes() are helpers assumed to exist elsewhere.
    with open('foo', 'rb') as f:
        for chunk in block_reader(f, 65536):
            md5sum.update(chunk)
            yield chunk
    digest = md5sum.hexdigest()
    signature = sign_bytes(digest.encode('utf8'))
    yield {'Content-MD5Sum': digest, 'X-Signature': signature}

Note that this doesn't need to buffer or use a closure.

Writing that with a callback for trailers (which is the only
alternative - it's either a callback or a generator - because until the
body is fully handled the content of the trailers cannot be
determined):

import hashlib

def app(environ):
    md5sum = hashlib.md5()
    # block_reader() and sign_bytes() are helpers assumed to exist elsewhere.
    def body():
        with open('foo', 'rb') as f:
            for chunk in block_reader(f, 65536):
                md5sum.update(chunk)
                yield chunk
    def trailers():
        digest = md5sum.hexdigest()
        signature = sign_bytes(digest.encode('utf8'))
        yield {'Content-MD5Sum': digest, 'X-Signature': signature}
    return '200', {}, body, trailers

 The other alternative is to use a dict as the response object
 (analogous to environ as the request object), with named keys for
 status, headers, trailers, body, etc.  It would then be extensible to
 handle things like the Associated content concept.

That might work, though it will force more closures. One of the things
I like about the generator style is the clarity in code that we can
achieve.

 In this way, middleware that is simply passing things through
 unchanged can do so, while middleware that is creating a new response
 can discard the old object.

That seems to apply either way, right?

Here's a body-size logging middleware:

import logging

def logger(app):
    def middleware(environ):
        wrapped = app(environ)
        yield next(wrapped)  # the first item is always the headers dict
        body_bytes = 0
        for maybe_body in wrapped:
            if type(maybe_body) is bytes:
                body_bytes += len(maybe_body)
            yield maybe_body
        logging.info("Saw %d bytes for %s" % (body_bytes, environ['PATH_INFO']))
    return middleware

..
 We're bumping the WSGI version; will that serve as a sufficient flag?

 I mean, flagged on the app end.  For example, wsgi_lite marks apps
 that support wsgi_lite with a true-valued `__wsgi_lite__` attribute.
 In this way, a container invoking the app knows it can be called with
 just an environ (and no start_response).

OK, so we'd use the absence of such a mark to trigger the WSGI 1
adapter automagically? I'm curious whether that will work well enough
given wsgi_lite or other extensions to WSGI. Perhaps we should
refuse to guess and just supply the adapters and instructions?

 So, I'm saying that an app callable would opt in to this new WSGI
 version, so that servers and middleware don't need to grow new APIs
 for registering apps -- they can auto-detect.  Also, having
 auto-detection means you can write a decorator (e.g. in wsgiref), to
 wrap and convert WSGI 1 apps to WSGI 2, without needing to know if
 you're passing something already wrapped.  It means that a WSGI 2
 server or middleware can just wrap whatever apps it sees, and get back
 a WSGI 2 app, whether the thing it got was WSGI 1 or WSGI 2.

Re: [Web-SIG] WSGI2: write callable?

2014-09-26 Thread PJ Eby
On Fri, Sep 26, 2014 at 7:41 PM, Robert Collins
robe...@robertcollins.net wrote:
 One thing we could do with the status code in the headers dict is to
 default to 200 - the vastly common case (in the same way that throwing
 an error generates a 500). Then status wouldn't be required at all for
 trivial uses. That would make things easier, no?

At the cost of variation.  A core design principle of WSGI is that
variations make things *harder*, not easier, because it means more
alternatives that apps, servers, and middleware have to support, with
more code paths and fewer of them properly tested.  Every variation
that is part of the spec (as opposed to an extension) creates a LOT
of complexity in the field.  (Which is one reason it'll be nice to get
rid of start_response(), and all its convoluted sequencing logic.)


 So a classic example for Trailers is digitally signing streamed
 content. Using the same strawman API as above:

 import hashlib

 def app(environ):
     yield {':status': '200'}
     md5sum = hashlib.md5()
     # block_reader() and sign_bytes() are helpers assumed to exist elsewhere.
     with open('foo', 'rb') as f:
         for chunk in block_reader(f, 65536):
             md5sum.update(chunk)
             yield chunk
     digest = md5sum.hexdigest()
     signature = sign_bytes(digest.encode('utf8'))
     yield {'Content-MD5Sum': digest, 'X-Signature': signature}

 Note that this doesn't need to buffer or use a closure.

Please bear in mind that another core WSGI design principle is that we
don't make apps easier to write by making servers and middleware
harder to write.  That kills adoption and growth, because the audience
that *needs* to adopt WSGI (or any successor standard) is the audience
of people who write servers and middleware.  If a feature is sinfully
ugly for the app writer, but a thing of beauty for a middleware
author, we *want* that feature.

Conversely, if a feature means that *every* piece of middleware now
has to add an extra if statement to support the feature in order to
make it pretty for the app writer, then we do NOT want that feature,
and it should be taken out and shot *at once*.

It's not a fair tradeoff, because only server authors and middleware
authors *have to* deal with WSGI directly.  App authors can use
libraries to pretty it up, so we don't need to pretty it for them in
advance -- especially since we don't know what their *personal* idea
of pretty is going to be.  ;-)

The above API is cute and clean for the app writer, but for a
middleware writer it's a barrel of misery.  *Every* piece of
middleware that even wants to *read* anything from the response (let
alone modify it), now needs to check types of yielded values,
accumulate headers, and maybe buffer content.  And there are many ways
to write that middleware that will be wrong, but *appear* right
because the author didn't think of all the ways that an app could
violate the middleware author's assumptions.

On the other hand, if somebody wants to make a library implementing a
similar API to your proposal *on top* of WSGI, then sure, why not?
That's fine: it only adds overhead at a *single point*: the library
that implements the pretty API on top of WSGI.
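The single point of overhead mentioned above could look like this. The sketch shows one way a library could accept Robert's generator-style apps and present them as a plain `(status, headers, body)` triple. Such an app first yields a headers dict carrying `':status'`, then bytes, and optionally a final trailers dict. `from_generator_app`, `GeneratorBody` and its `trailers()` method are hypothetical names. Exposing trailers through a method on the body object is one reading of the "attach them to the response body" suggestion in this thread, not a settled design.

```python
class GeneratorBody:
    """Body iterable that yields bytes and remembers a trailing dict."""

    def __init__(self, gen):
        self._gen = gen
        self._trailers = None

    def __iter__(self):
        for item in self._gen:
            if self._trailers is not None:
                raise ValueError('trailers must be the last item yielded')
            if isinstance(item, bytes):
                yield item
            elif isinstance(item, dict):
                self._trailers = item
            else:
                raise TypeError('unexpected item: %r' % (item,))

    def trailers(self):
        """Trailer headers; only meaningful once the body is exhausted."""
        return dict(self._trailers or {})

    def close(self):
        self._gen.close()  # runs the app generator's finally blocks


def from_generator_app(gen_app):
    """Adapt a generator-style app so it returns (status, headers, body)."""
    def app(environ):
        gen = gen_app(environ)
        head = next(gen)  # the first item must be the headers dict
        if not isinstance(head, dict) or ':status' not in head:
            gen.close()
            raise ValueError('first item must be a headers dict with :status')
        headers = [(k, v) for k, v in head.items() if k != ':status']
        return head[':status'], headers, GeneratorBody(gen)
    return app
```

All the type-checking of yielded values lives in this one adapter, so middleware downstream of it only ever sees the flat triple.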


 Writing that with a callback for trailers (which is the only
 alternative - it's either a callback or a generator - because until the
 body is fully handled the content of the trailers cannot be
 determined):

Doesn't look bad to me.  It'd also be fine as a method on the response
body, and that would let us stick to (status, headers, body) as a
return value.


 The other alternative is to use a dict as the response object
 (analogous to environ as the request object), with named keys for
 status, headers, trailers, body, etc.  It would then be extensible to
 handle things like the Associated content concept.

 That might work, though it will force more closures. One of the things
 I like about the generator style is the clarity in code that we can
 achieve.

Please try to think instead of how you could implement those things in
a "make it nice" API for app authors.  WSGI wasn't made ugly on a
whim; it's the direct result of some very important design principles.
While the need for start_response() is gone, many of the other reasons
for its ugliness remain.

(In any case, you can still implement a generator-based API for
writing WSGI apps, without needing to make WSGI *itself* be
implemented that way.)

 Here's a body-size logging middleware:

 import logging

 def logger(app):
     def middleware(environ):
         wrapped = app(environ)
         yield next(wrapped)  # the first item is always the headers dict
         body_bytes = 0
         for maybe_body in wrapped:
             if type(maybe_body) is bytes:
                 body_bytes += len(maybe_body)
             yield maybe_body
         logging.info("Saw %d bytes for %s" % (body_bytes, environ['PATH_INFO']))
     return middleware

Perhaps you meant this as a sketch, but note that you're not calling
close() on the underlying iterator.  At minimum, you need a
try/finally to do that, or else you need to use the wsgi_lite closing
extension -- and you need to assume that your parent middleware