Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

2016-01-05 Thread Cory Benfield

> On 5 Jan 2016, at 00:12, Graham Dumpleton wrote:
> 
>> On 4 Jan 2016, at 11:27 PM, Cory Benfield wrote:
>> 
>> All,
>> 
>> **TL;DR: What do you believe WSGI 2.0 should and should not do? Should we do 
>> it at all?**
>> 
>> It’s a new year, and that means it’s time for another attempt to get WSGI 
>> 2.0 off the ground. Many of you may remember that we attempted to do this 
>> last year with Rob Collins leading the charge, but unfortunately personal 
>> commitments made it impossible for Rob to keep pushing that attempt forward.
> 
> Although you call this round 2, it isn’t really. Robert’s effort was not the 
> first time someone has pushed a WSGI 2.0 variant. So this is more like being 
> about round 5 or 6.
> 
> In part because of those repeated attempts by people to propose something and 
> label it as WSGI 2.0, I am very cool on reusing the WSGI 2.0 moniker. You 
> will find little or no mention of ‘WSGI 2.0’ as a label in:
> 
> https://github.com/python-web-sig/wsgi-ng 
> 
> 
> That is probably somewhat due to my grumbling about the use of ‘WSGI 2.0’ 
> back then.
> 
> Time has moved on and so the bad feelings and memories associated with the 
> ‘WSGI 2.0’ label due to early failed efforts have faded, but I would still 
> suggest avoiding the label ‘WSGI 2.0’ if at all possible.

Thanks for that feedback. Consider WSGI 2.0 a catch-all name for the purposes 
of this specific discussion (the “what do we want WSGI to be going forward” 
one). As you’ve suggested here, it’s entirely possible that the result of this 
discussion will be several PEPs/APIs, or none at all, and it’s entirely 
possible that none of them would be called WSGI 2.0.

> My general feeling is that if any proposed changes to the existing WSGI (PEP 
> ) specification cannot be technically implemented on all existing WSGI 
> server/adapter implementations, then any new specification should not still be 
> called WSGI.
> 
> In other words, even if many of these implementations may not be used much 
> any more, it must be able to work, without needing to mark things as 
> optional, on CGI, FASTCGI, SCGI, mod_wsgi, gunicorn, uWSGI, Waitress, etc etc.
> 
> This is purely to avoid the confusion whereby implementations cannot or 
> choose not to implement any new specification. The last thing any WSGI server 
> author wants is having to deal with a constant stream of questions and bug 
> reports about not supporting an updated specification where technically it 
> was never going to be possible. We have some obligation not to inflict this 
> on what are, in nearly all cases, volunteers in the Open Source world who 
> work on these things in their spare time and who are not doing it as part of 
> their paid employment.

Can I clarify this requirement a bit? Are you wanting to say that any future 
version of WSGI must be entirely compatible with PEP : that is, may not 
introduce optional features or change existing behaviour, only clarify? Please 
don’t mistake this for me challenging the idea: I’m wanting to get a good 
understanding of what you’re suggesting with this, not agreeing or disagreeing 
at this stage.

> For example, mod_wsgi already supports HTTP/2 by virtue of the fact that the 
> mod_h2 module in Apache exists. The existing internal APIs of Apache and how 
> mod_wsgi uses those means that HTTP/2 bridges into the WSGI world with no 
> code changes to mod_wsgi.

Agreed. If all we want is to keep the request/response cycle intact, then WSGI 
supports H2 already. One possibility that has already been suggested here would 
be to define a HTTP/2 extension to WSGI, advertised in the environ dict, that 
allows the application to signal pushes to the server. This would be a fairly 
simple extension to write and implement.
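
As a rough sketch of what such an extension could look like from the application side: the `http2.push` environ key and its one-argument call signature are invented here purely for illustration, and no existing server or specification defines them.

```python
def application(environ, start_response):
    # Hypothetical extension: a server that can do HTTP/2 push would
    # advertise a callable in the environ. The key name "http2.push" is
    # an assumption made for this sketch, not part of any specification.
    push = environ.get("http2.push")
    if push is not None:
        # Ask the server to push a resource the page is known to need.
        push("/static/site.css")

    body = b"<html><link rel='stylesheet' href='/static/site.css'></html>"
    start_response("200 OK", [
        ("Content-Type", "text/html"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

On a server that does not advertise the key, the application behaves exactly as it does today, which is what would make this an extension rather than a new API.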

> They are therefore two different APIs and so why WebSocket should be dealt 
> with in a separate specification and not carry the WSGI label at all. A 
> specific WSGI server could still support the new WebSocket API, but purely 
> because it decides to support both in the same process. Not because the 
> WebSocket API makes use of the WSGI specification.

That’s reasonable: I’d be happy to have websocket support either be a WSGI 
extension or, as you suggest here, a wholly new API. One difficulty with 
creating a new API from whole cloth is encouraging server authors to support 
it, but it’s certainly possible to do. I’d like to hear back from the uWSGI, 
gunicorn, and Twisted folks in addition to yourself about whether they’d be 
interested in implementing such a non-WSGI API.

>> - Graceful incremental adoption path - no upgrade-all-components requirement 
>> baked into the design.
> 
> It is hard to see what your expectations are here.
> 
> Prior attempts to force ASYNC into WSGI, and in some respects WebSockets 
> through forcing raw fd access have not been practical. WSGI 

Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

2016-01-05 Thread Graham Dumpleton

> On 6 Jan 2016, at 9:27 AM, chris.d...@gmail.com wrote:
> 
> On Wed, 6 Jan 2016, Graham Dumpleton wrote:
> 
>> 
>>> On 6 Jan 2016, at 12:09 AM, chris.d...@gmail.com wrote:
>>> 
>>> As someone who writes their WSGI applications as functions that take
>>> `start_response` and `environ` and doesn't bother with much
>>> framework the things I would like to see in a minor revision to WSGI
>>> are:
>>> 
>>> * A consistent way to access the raw un-decoded request URI. This is
>>> so I can reconstruct a realistic `PATH_INFO` that has not been
>>> subjected to destructive handling by the server (e.g. apache
>>> messing with `%2F`) before continuing on to a route dispatcher.
>> 
>> This is already available in some servers by way of the REQUEST_URI value.
> 
> Yes, and in others (as mentioned by Benoit) as RAW_URI. One
> ("consistent") way would be better.
> 
> [Lots of good information about the challenges associated with using
> that information to do anything useful, deleted.]
> 
> What I've done in one app is this:
> https://github.com/tiddlyweb/tiddlyweb/blob/cc6b67d2855ea4d8d908f1a3e58db0dce7e8d138/tiddlyweb/web/serve.py#L119
> 
> Despite the fact that that is not strictly correct, it does mostly work
> for the situation described in the comment and the context of that
> app. One of the things I want from a light rev of WSGI is not to have
> to jump through those hoops.
> 
> It may be that's not feasible but I reckon we're at the wishing
> stage of the discussion.

Yeah, that code would have problems.

One other thing I just remembered is that the path part of the request line is 
technically allowed to be a full URI.

GET http://hostname/a/b/c HTTP/1.0

This would yield:

REQUEST_URI: 'http://hostname/a/b/c'
SCRIPT_NAME: ''
PATH_INFO: '/a/b/c'
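
For illustration, a small sketch of pulling just the path out of either form of request target using the standard library. The helper name is made up for this example, and the path it returns is still percent-encoded and un-normalised.

```python
from urllib.parse import urlsplit


def raw_path_from_request_uri(request_uri):
    # urlsplit copes with both the usual origin form ("/a/b/c?x=1") and
    # the absolute form ("http://hostname/a/b/c?x=1"). It does not decode
    # percent-escapes, so "%2F" survives untouched.
    return urlsplit(request_uri).path
```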

I didn’t even mention the % encoding issues in the SCRIPT_NAME part, as you 
are obviously aware of those being an issue in PATH_INFO at least.

Lots of fun.

Graham
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
https://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

2016-01-05 Thread Graham Dumpleton

> On 6 Jan 2016, at 12:09 AM, chris.d...@gmail.com wrote:
> 
> As someone who writes their WSGI applications as functions that take
> `start_response` and `environ` and doesn't bother with much
> framework the things I would like to see in a minor revision to WSGI
> are:
> 
> * A consistent way to access the raw un-decoded request URI. This is
>  so I can reconstruct a realistic `PATH_INFO` that has not been
>  subjected to destructive handling by the server (e.g. apache
>  messing with `%2F`) before continuing on to a route dispatcher.

This is already available in some servers by way of the REQUEST_URI value.

This is the original first line of any HTTP request and can be split apart to 
get the path.

The problem is that you cannot easily use it unless you want to replicate 
normalisations that the underlying server may do.

The key problem is working out where SCRIPT_NAME ends and PATH_INFO starts with 
the original path given in REQUEST_URI.

Sure if you only deal with a web application mounted at the root of the host it 
is easier because SCRIPT_NAME would be empty, but when mounted at a sub URL it 
gets trickier.

This is because a web server will eliminate things like repeating slashes in 
the part of the path that may match the mount point (sub url) for the web 
application. The sub url here could be dictated by what is defined in a 
configuration file, or could instead be due to matching against a file system 
path.

Further, the web server will eliminate attempts at relative directory traversal 
using ‘..’ and ‘.’.

So an original path may be something like:

/a/b//c/../d/.//e/../f/g/h

If the mount point was ‘/a/b/d’, then that is what gets passed through 
SCRIPT_NAME.

Now if you instead go to the raw path you would need to replicate all the 
normalisations. Only then could you try to calculate, based on the length of 
SCRIPT_NAME, the number of component parts, or the actual components in the 
path, where SCRIPT_NAME ended and PATH_INFO started in the raw path.
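
A minimal sketch of that normalisation step for the example path above, using `posixpath.normpath` as a stand-in. This is an assumption for illustration only: a real server's rules will differ in the details.

```python
import posixpath

raw_path = "/a/b//c/../d/.//e/../f/g/h"
script_name = "/a/b/d"

# normpath collapses repeated slashes and resolves "." and ".." segments,
# which approximates (but does not exactly match) what a web server does.
normalised = posixpath.normpath(raw_path)

if normalised == script_name or normalised.startswith(script_name + "/"):
    path_info = normalised[len(script_name):]
else:
    # An internal rewrite, or a normalisation we did not replicate.
    path_info = None
```

Note that this only recovers the normalised PATH_INFO, which the server hands over anyway. It says nothing about where the split falls in the raw string.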

This will still all fail if a web server does internal rewrites though, as the 
final SCRIPT_NAME may not even match the raw path, although at that point URL 
reconstruction can be a problem as well if what the application is given by way 
of the rewrite isn’t a public path.

I have only looked at SCRIPT_NAME here. Servers will apply the same sort of 
normalisations to PATH_INFO as well.

So even this isn’t so simple to do properly if you want to go back and do it 
yourself using the raw path.

I have never seen anyone who is trying to extract repeating slashes intact out 
of a raw path even attempt to do it properly. They tend to assume that the raw 
path is pure, with nothing in it that needs to be normalised, and that 
rewrites aren’t occurring. As a result they assume that they can just strip a 
number of characters off the raw path based on the length of the SCRIPT_NAME 
passed through. This will be fragile though if the raw path isn’t pure.
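
To show how that shortcut goes wrong, here it is applied to the example path and mount point from earlier (a sketch only; the variable names are illustrative):

```python
raw_path = "/a/b//c/../d/.//e/../f/g/h"
script_name = "/a/b/d"

# The common shortcut: drop len(SCRIPT_NAME) characters from the raw path.
naive_path_info = raw_path[len(script_name):]
```

Here `naive_path_info` comes out as `c/../d/.//e/../f/g/h`, which is neither the normalised `/f/g/h` nor a meaningful raw remainder.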

Graham


Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

2016-01-05 Thread chris . dent

On Wed, 6 Jan 2016, Graham Dumpleton wrote:

>> On 6 Jan 2016, at 12:09 AM, chris.d...@gmail.com wrote:
>>
>> As someone who writes their WSGI applications as functions that take
>> `start_response` and `environ` and doesn't bother with much
>> framework the things I would like to see in a minor revision to WSGI
>> are:
>>
>> * A consistent way to access the raw un-decoded request URI. This is
>>  so I can reconstruct a realistic `PATH_INFO` that has not been
>>  subjected to destructive handling by the server (e.g. apache
>>  messing with `%2F`) before continuing on to a route dispatcher.
>
> This is already available in some servers by way of the REQUEST_URI value.

Yes, and in others (as mentioned by Benoit) as RAW_URI. One
("consistent") way would be better.
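
Until there is one consistent way, an application ends up with a lookup along these lines. This is a sketch: the helper name is made up, and neither key is defined by the WSGI specification.

```python
def get_raw_uri(environ):
    # REQUEST_URI and RAW_URI are the two server-specific names mentioned
    # in this thread; other servers may provide neither.
    for key in ("REQUEST_URI", "RAW_URI"):
        value = environ.get(key)
        if value:
            return value
    return None
```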

[Lots of good information about the challenges associated with using
that information to do anything useful, deleted.]

What I've done in one app is this:
https://github.com/tiddlyweb/tiddlyweb/blob/cc6b67d2855ea4d8d908f1a3e58db0dce7e8d138/tiddlyweb/web/serve.py#L119

Despite the fact that that is not strictly correct, it does mostly work
for the situation described in the comment and the context of that
app. One of the things I want from a light rev of WSGI is not to have
to jump through those hoops.

It may be that's not feasible but I reckon we're at the wishing
stage of the discussion.

--
Chris Dent   http://burningchrome.com/
[...]


Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

2016-01-05 Thread Graham Dumpleton

> On 6 Jan 2016, at 9:19 AM, Graham Dumpleton wrote:
> 
>> On 6 Jan 2016, at 12:09 AM, chris.d...@gmail.com wrote:
>> 
>> As someone who writes their WSGI applications as functions that take
>> `start_response` and `environ` and doesn't bother with much
>> framework the things I would like to see in a minor revision to WSGI
>> are:
>> 
>> * A consistent way to access the raw un-decoded request URI. This is
>>  so I can reconstruct a realistic `PATH_INFO` that has not been
>>  subjected to destructive handling by the server (e.g. apache
>>  messing with `%2F`) before continuing on to a route dispatcher.
> 
> This is already available in some servers by way of the REQUEST_URI value.
> 
> This is the original first line of any HTTP request and can be split apart to 
> get the path.

Whoops. My foggy memory. REQUEST_URI is only the raw path part, not the whole 
request line with method, protocol and path.

Graham


Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

2016-01-05 Thread Graham Dumpleton

> On 5 Jan 2016, at 8:40 PM, Cory Benfield wrote:
> 
>> On 5 Jan 2016, at 00:12, Graham Dumpleton wrote:
>> 
>>> On 4 Jan 2016, at 11:27 PM, Cory Benfield wrote:
>>> 
>>> All,
>>> 
>>> **TL;DR: What do you believe WSGI 2.0 should and should not do? Should we 
>>> do it at all?**
>>> 
>>> It’s a new year, and that means it’s time for another attempt to get WSGI 
>>> 2.0 off the ground. Many of you may remember that we attempted to do this 
>>> last year with Rob Collins leading the charge, but unfortunately personal 
>>> commitments made it impossible for Rob to keep pushing that attempt forward.
>> 
>> Although you call this round 2, it isn’t really. Robert’s effort was not the 
>> first time someone has pushed a WSGI 2.0 variant. So this is more like being 
>> about round 5 or 6.
>> 
>> In part because of those repeated attempts by people to propose something 
>> and label it as WSGI 2.0, I am very cool on reusing the WSGI 2.0 moniker. 
>> You will find little or no mention of ‘WSGI 2.0’ as a label in:
>> 
>> https://github.com/python-web-sig/wsgi-ng 
>> 
>> 
>> That is probably somewhat due to my grumbling about the use of ‘WSGI 2.0’ 
>> back then.
>> 
>> Time has moved on and so the bad feelings and memories associated with the 
>> ‘WSGI 2.0’ label due to early failed efforts have faded, but I would still 
>> suggest avoiding the label ‘WSGI 2.0’ if at all possible.
> 
> Thanks for that feedback. Consider WSGI 2.0 a catch-all name for the purposes 
> of this specific discussion (the “what do we want WSGI to be going forward” 
> one). As you’ve suggested here, it’s entirely possible that the result of 
> this discussion will be several PEPs/APIs, or none at all, and it’s entirely 
> possible that none of them would be called WSGI 2.0.
> 
>> My general feeling is that if any proposed changes to the existing WSGI (PEP 
>> ) specification cannot be technically implemented on all existing WSGI 
>> server/adapter implementations, then any new specification should not still 
>> be called WSGI.
>> 
>> In other words, even if many of these implementations may not be used much 
>> any more, it must be able to work, without needing to mark things as 
>> optional, on CGI, FASTCGI, SCGI, mod_wsgi, gunicorn, uWSGI, Waitress, etc 
>> etc.
>> 
>> This is purely to avoid the confusion whereby implementations cannot or 
>> choose not to implement any new specification. The last thing any WSGI 
>> server author wants is having to deal with a constant stream of questions 
>> and bug reports about not supporting an updated specification where 
>> technically it was never going to be possible. We have some obligation not 
>> to inflict this on what are, in nearly all cases, volunteers in the Open 
>> Source world who work on these things in their spare time and who are not 
>> doing it as part of their paid employment.
> 
> Can I clarify this requirement a bit? Are you wanting to say that any future 
> version of WSGI must be entirely compatible with PEP : that is, may not 
> introduce optional features or change existing behaviour, only clarify? 
> Please don’t mistake this for me challenging the idea: I’m wanting to get a 
> good understanding of what you’re suggesting with this, not agreeing or 
> disagreeing at this stage.

I am saying that any update to the WSGI specification should still be able to 
be implemented using any of the existing technologies that can already 
implement WSGI.

I would see it as just causing problems to bring out an updated WSGI 
specification which couldn’t be implemented on top of CGI, FASTCGI, SCGI or 
even mod_wsgi.

Further, it does really still need to be compatible with the existing 
specifications/applications. Changes I am talking about are clarifications or 
suggesting better ways of doing stuff like wsgi.file_wrapper to avoid known 
problems or to eliminate the use of assumptions about how something works.
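
For reference, wsgi.file_wrapper is typically used today following the pattern given in the WSGI specification itself. A sketch, with the `send_file` helper name chosen only for this example:

```python
import io


def send_file(environ, filelike, block_size=8192):
    # Use the server's optimised wrapper when it offers one...
    wrapper = environ.get("wsgi.file_wrapper")
    if wrapper is not None:
        return wrapper(filelike, block_size)
    # ...otherwise fall back to reading fixed-size blocks until EOF.
    return iter(lambda: filelike.read(block_size), b"")


# Usage without a server-provided wrapper:
chunks = list(send_file({}, io.BytesIO(b"hello world"), block_size=4))
```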

If a framework or application is made dependent on some new aspect of the WSGI 
specification which has no fallback because the specification was changed to 
not really be compatible with prior versions in some way then it is me as the 
author of a WSGI server who would have to endure the constant questions of why 
that framework or application doesn’t now work on mod_wsgi if the changes 
couldn’t be supported.

People will not care what version of WSGI the framework or application adhered 
to. Their attitude will be that it supports WSGI and since mod_wsgi says it 
supports WSGI it must work, but since it doesn’t mod_wsgi must be broken. They 
will ignore version vagaries. 

So I am being selfish in not wanting to have to put up with more users 
complaining about stuff. :-)

As for optional stuff, if they are truly extensions and can work through 
stuffing things in the WSGI 

Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

2016-01-05 Thread Armin Ronacher

Hi,

I just want to reply to this because I think many people seem to be 
missing why things are done in a certain way, especially if they appear 
to be odd.


On 05/01/2016 12:26, Cory Benfield wrote:

> 1. WSGI is prone to header injection vulnerabilities by design due to
> the conversion of HTTP headers to CGI-style environment variables: if
> the server doesn’t specifically prevent it, X-Foo and X_Foo both become
> HTTP_X_FOO. I don’t believe it’s a good choice to destructively encode
> headers, expect applications to undo the damage somehow, and introduce
> security vulnerabilities in the process. If mimicking CGI is still
> considered a must-have — 1% of current Python web programmers may have
> heard about it, most of them from PEP  — then that burden should be
> pushed onto the server, not the application.

Headers always will have to be encoded destructively if you want any 
form of generic processing.  We need header joining, and we need to 
normalize the keys at least to the extent of the HTTP 
specification.  I'm happy to not perform the conversion of dashes to 
underscores, but you will work in environments where this conversion was 
already done, so the spec will need to deal with that case anyway.
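
For anyone who has not seen the collision in question, a minimal sketch of the CGI-style conversion. The function names are made up for this example, and the special-casing of Content-Type and Content-Length is left out.

```python
def environ_key(header_name):
    # CGI-style mapping: upper-case, "-" becomes "_", prefix with HTTP_.
    return "HTTP_" + header_name.upper().replace("-", "_")


def headers_to_environ(headers):
    environ = {}
    for name, value in headers:
        # Two different header names can map to the same key, and the
        # later one silently overwrites the earlier one.
        environ[environ_key(name)] = value
    return environ


environ = headers_to_environ([
    ("X-Foo", "set-by-proxy"),
    ("X_Foo", "set-by-attacker"),
])
```

Both names end up as `HTTP_X_FOO`, so the application cannot tell which header the value came from.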


The WSGI spec currently also does not sufficiently explain how to join 
headers.  In particular the cookie header was written without header 
joining in mind, which is why it needs to be joined differently than all 
other headers.  Header joining also comes up as a big topic in HTTP/2, 
so the spec will need to cover this.
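
As a sketch of what joining request headers could look like, assuming the usual convention of ", " for repeated fields and "; " for Cookie (the function name and the lower-cased keys are choices made for this example):

```python
def join_headers(headers):
    # Group values by case-insensitive name, preserving arrival order.
    grouped = {}
    for name, value in headers:
        grouped.setdefault(name.lower(), []).append(value)

    joined = {}
    for name, values in grouped.items():
        # Cookie pairs are separated by "; ", everything else by ", ".
        separator = "; " if name == "cookie" else ", "
        joined[name] = separator.join(values)
    return joined
```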


> 2. More generally, I fail to see how mixing HTTP headers,
> server-related inputs, and environment variables in a dict adds
> value. It prevents iterating on each collection separately. It only
> makes sense if not offering more features than CGI is a design goal;
> in that case, this discussion doesn’t serve a purpose anyway. It
> would be nicer and possibly more secure if the application received
> separately:

I think this is largely a nice to have, not something that has any 
overall benefits.  I would rather just clean up the actual stupid things 
such as CONTENT_TYPE and CONTENT_LENGTH, which cause a lot more real world 
friction than just the names of keys in general.  This really should not 
turn into meaningless bikeshedding about what information should be 
called.  Also consider how much code out there already assumes CGI/WSGI 
variables, so any move off that really should have good reasons or we all 
will just waste enormous amounts of effort transposing between the two 
representations.



> a. Configuration information, which servers could read from
> environment variables by default for backwards compatibility, but could
> also get through more secure channels and restrict to what the
> application needs in order to better isolate it from the entire OS.

What WSGI traditionally lacked was a setup phase where data could be 
passed to the application that was server specific but not request 
bound.  For instance there is no reason an application cannot get hold 
of wsgi.errors before a request comes in.  I would like to see this 
fixed in a new specification.



> 3. Stop pretending that HTTP is a unicode protocol, or at least stop
> ignoring reality when doing so. WSGI enforces ISO-8859-1-decoded str
> objects in the environ, which is just wrong. It’s all the more a
> surprising choice since this change was driven by Python 3, that UTF-8
> is the correct choice, and that Python 3 defaults to UTF-8. Django has
> to re-encode and re-decode before doing anything with HTTP headers:

I agree with this but you will have to have that fight with others.  I 
said many times before that values should never have been unicode values 
in the first place but certain decisions in the Python 3 standard 
library at the time prevented that.  In particular until 3.2 or so it 
was impossible to parse byte URLs.
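
The re-encode and re-decode dance being complained about looks roughly like this. It is a sketch: the helper name is invented, and `errors="replace"` is just one possible policy for bytes that are not valid UTF-8.

```python
def environ_str_to_utf8(value):
    # Undo the server's ISO-8859-1 decoding to get the wire bytes back,
    # then decode them as UTF-8.
    return value.encode("iso-8859-1").decode("utf-8", errors="replace")


wire_bytes = "/café".encode("utf-8")
as_seen_in_environ = wire_bytes.decode("iso-8859-1")
recovered = environ_str_to_utf8(as_seen_in_environ)
```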



> 5. Improve request / response length handling and connection closure.
> Armin and Graham have talked about it in the past and know the topic
> better than I do. There’s also a rejected PEP by Armin which made
> sense to me.

I think last time I discussed that with Graham it was not clear what the 
solution is in the context of WSGI.  The idea that there is a 
content-length is laughable in the context of a real application where 
the server is performing conversions on the input and output stream.  We 
would need many more than just one content length and an automatically 
terminated input stream.


However at that point you will quickly realize that you can't have it 
both ways and you either have a WSGI like protocol, or raw access to 
sockets but certainly not both.  This topic has caused a lot of 
bikeshedding in the past and I fail to see how it will be different 
this time.


My current thinking is that the most realistic approach to most of those 
problems will be the concept of framing on both the input and output 
side.  That's somewhat compatible with both chunked transports as well as 
websockets.  But if we do go down 

Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

2016-01-05 Thread Cory Benfield
Forwarding this message from the django-developers list.

Hi Cory,

I’m not subscribed to web-sig but I read the discussion there. Feel free to 
forward my answer to the group if you think it’s useful.

I have roughly the same convictions as Graham Dumpleton. If you want to support 
HTTP/2 and WebSockets, don’t start with design decisions anchored in CGI. 
Figure out what a simple and flexible API for these new protocols would be, 
specify it, implement it, and make sure it degrades gracefully to HTTP/1. You 
may be able to channel most of the communication through a single generator, 
but it’s unclear to me that this will be the most convenient design.

If you want to improve WSGI, here’s a list of mistakes or shortcomings in PEP 
 that you can take a stab at. There’s a general theme: for a specification 
that looks at the future, I believe that making modern PaaS-based deployments 
secure by default matters more than not implementing anything beyond what’s 
available in legacy CGI-based deployments.

1. WSGI is prone to header injection vulnerabilities by design due to 
the conversion of HTTP headers to CGI-style environment variables: if the 
server doesn’t specifically prevent it, X-Foo and X_Foo both become HTTP_X_FOO. 
I don’t believe it’s a good choice to destructively encode headers, expect 
applications to undo the damage somehow, and introduce security vulnerabilities 
in the process. If mimicking CGI is still considered a must-have — 1% of 
current Python web programmers may have heard about it, most of them from PEP 
 — then that burden should be pushed onto the server, not the application.

2. More generally, I fail to see how mixing HTTP headers, server-related 
inputs, and environment variables in a dict adds value. It prevents iterating 
on each collection separately. It only makes sense if not offering more 
features than CGI is a design goal; in that case, this discussion doesn’t serve 
a purpose anyway. It would be nicer and possibly more secure if the application 
received separately:

a. Configuration information, which servers could read from environment 
variables by default for backwards compatibility, but could also get through 
more secure channels and restrict to what the application needs in order to 
better isolate it from the entire OS.
b. Server APIs mandated by the spec, per request.
c. HTTP headers, per request.

3. Stop pretending that HTTP is a unicode protocol, or at least stop ignoring 
reality when doing so. WSGI enforces ISO-8859-1-decoded str objects in the 
environ, which is just wrong. It’s all the more a surprising choice since this 
change was driven by Python 3, that UTF-8 is the correct choice, and that 
Python 3 defaults to UTF-8. Django has to re-encode and re-decode before doing 
anything with HTTP headers: 
https://github.com/django/django/blob/d5b90c8e120687863c1d41cf92a4cdb11413ad7f/django/core/handlers/wsgi.py#L231-L253

4. Normalize the way to tell the application about the original protocol, IP 
address and port. When dev and ops responsibilities are separate, this is 
clearly an ops responsibility, but due to the lack of standardization devs end 
up dealing with this problem in custom middleware, when they do it at all. 
Everyone keeps getting it wrong, which introduces security vulnerabilities. 
Also it always breaks silently on infrastructure changes.

5. Improve request / response length handling and connection closure. Armin and 
Graham have talked about it in the past and know the topic better than I do. 
There’s also a rejected PEP by Armin which made sense to me.
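
To make point 4 concrete, here is a minimal sketch of the kind of custom middleware logic people write today. The trusted proxy address and the function name are assumptions for illustration, and it only handles a single proxy hop.

```python
# Hypothetical address of the one reverse proxy we trust.
TRUSTED_PROXIES = {"10.0.0.1"}


def client_address(environ):
    remote = environ.get("REMOTE_ADDR", "")
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    if remote in TRUSTED_PROXIES and forwarded:
        # The right-most entry is the one our own proxy appended; entries
        # to the left of it were supplied by the client and can be forged.
        return forwarded.split(",")[-1].strip()
    # Connections that did not come via the trusted proxy: ignore the header.
    return remote
```

Trusting the header unconditionally, or taking the left-most entry, are the usual ways this goes wrong.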

As you can see from these comments, I don’t quite share the design choices that 
led to WSGI as it currently stands. I think it will be easier to build a new 
standard than evolve the current one.

I hope this helps!

Aymeric




Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

2016-01-05 Thread Graham Dumpleton

> On 5 Jan 2016, at 10:57 PM, Graham Dumpleton wrote:
> 
>> On 5 Jan 2016, at 10:26 PM, Cory Benfield wrote:
>> 
>> Forwarding this message from the django-developers list.
>> 
>> Hi Cory,
>> 
>> I’m not subscribed to web-sig but I read the discussion there. Feel free to 
>> forward my answer to the group if you think it’s useful.
>> 
>> I have roughly the same convictions as Graham Dumpleton. If you want to 
>> support HTTP/2 and WebSockets, don’t start with design decisions anchored in 
>> CGI. Figure out what a simple and flexible API for these new protocols would 
>> be, specify it, implement it, and make sure it degrades gracefully to 
>> HTTP/1. You may be able to channel most of the communication through a 
>> single generator, but it’s unclear to me that this will be the most 
>> convenient design.
>> 
>> If you want to improve WSGI, here’s a list of mistakes or shortcomings in 
>> PEP  that you can take a stab at. There’s a general theme: for a 
>> specification that looks at the future, I believe that making modern 
>> PaaS-based deployments secure by default matters more than not implementing 
>> anything beyond what’s available in legacy CGI-based deployments.
>> 
>> 1. WSGI is prone to header injection vulnerabilities by design due to 
>> the conversion of HTTP headers to CGI-style environment variables: if the 
>> server doesn’t specifically prevent it, X-Foo and X_Foo both become 
>> HTTP_X_FOO. I don’t believe it’s a good choice to destructively encode 
>> headers, expect applications to undo the damage somehow, and introduce 
>> security vulnerabilities in the process. If mimicking CGI is still 
>> considered a must-have — 1% of current Python web programmers may have heard 
>> about it, most of them from PEP  — then that burden should be pushed 
>> onto the server, not the application.
> 
> FWIW, Apache 2.4 will discard headers which would use underscore, as well as 
> many other characters. Basically it probably only accepts alphanumerics and 
> '-' in the original name.
> 
> In mod_wsgi, it does the same thing, even for Apache 2.2 where it wasn’t done.
> 
> So with mod_wsgi at least you are safe. Or at least if not still using some 
> ancient mod_wsgi version. (Death to LTS Linux versions and out of date 
> packages) :-)
> 
> The nginx server, if used as a front end and where it is populating CGI-like 
> variables for passing to a builtin module such as uWSGI, will I believe also 
> discard headers which don’t match that requirement.
> 
> I can’t remember whether gunicorn was updated to do something similar, or 
> whether uWSGI does it when it listens publicly via HTTP instead of being 
> used behind nginx via its uwsgi protocol.


I should clarify a point here. Apache 2.4 will discard the headers at the point 
of converting them to a CGI like environment when a handler asks for a CGI like 
set of variables. Raw headers will always be passed through as they were.

Graham


Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

2016-01-05 Thread Graham Dumpleton

> On 5 Jan 2016, at 10:26 PM, Cory Benfield wrote:
> 
> Forwarding this message from the django-developers list.
> 
> Hi Cory,
> 
> I’m not subscribed to web-sig but I read the discussion there. Feel free to 
> forward my answer to the group if you think it’s useful.
> 
> I have roughly the same convictions as Graham Dumpleton. If you want to 
> support HTTP/2 and WebSockets, don’t start with design decisions anchored in 
> CGI. Figure out what a simple and flexible API for these new protocols would 
> be, specify it, implement it, and make sure it degrades gracefully to HTTP/1. 
> You may be able to channel most of the communication through a single 
> generator, but it’s unclear to me that this will be the most convenient 
> design.
> 
> If you want to improve WSGI, here’s a list of mistakes or shortcomings in PEP 
>  that you can take a stab at. There’s a general theme: for a 
> specification that looks at the future, I believe that making modern 
> PaaS-based deployments secure by default matters more than not implementing 
> anything beyond what’s available in legacy CGI-based deployments.
> 
> 1. WSGI is prone to header injection vulnerabilities by design due to 
> the conversion of HTTP headers to CGI-style environment variables: if the 
> server doesn’t specifically prevent it, X-Foo and X_Foo both become 
> HTTP_X_FOO. I don’t believe it’s a good choice to destructively encode 
> headers, expect applications to undo the damage somehow, and introduce 
> security vulnerabilities in the process. If mimicking CGI is still considered 
> a must-have — 1% of current Python web programmers may have heard about it, 
> most of them from PEP  — then that burden should be pushed onto the 
> server, not the application.

FWIW, Apache 2.4 will discard headers whose names contain an underscore, as 
well as many other characters. Basically it probably only accepts alphanumerics 
and ‘-’ in the original name.

mod_wsgi does the same thing itself, even under Apache 2.2, where the server 
didn’t do it.

So with mod_wsgi at least you are safe. Or at least if not still using some 
ancient mod_wsgi version. (Death to LTS Linux versions and out of date 
packages) :-)

The nginx server, if used as a front end and populating CGI-like variables to 
pass to a builtin module such as uWSGI, will also, I believe, discard headers 
which don’t match that requirement.

I can’t remember whether gunicorn was updated to do something similar, or 
whether uWSGI does it when it isn’t used behind nginx via its uwsgi protocol 
but instead listens publicly via HTTP.
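
To make the collision and the server-side mitigation concrete, here is a 
minimal sketch. The function names are hypothetical, and the allowed character 
set (ASCII letters, digits and '-') is an assumption based on the description 
above, not a copy of what Apache, mod_wsgi or nginx actually implement.

```python
import re

# Assumption: only ASCII letters, digits and '-' are accepted in a header
# name, mirroring the rule described above. Real servers may differ.
_SAFE_HEADER_NAME = re.compile(r"[A-Za-z0-9-]+")


def cgi_key(header_name):
    """Map an HTTP header name to its CGI-style environ key."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def build_environ_headers(headers):
    """Convert (name, value) pairs to CGI-style keys, dropping unsafe names."""
    environ = {}
    for name, value in headers:
        if not _SAFE_HEADER_NAME.fullmatch(name):
            continue  # e.g. 'X_Foo' would otherwise collide with 'X-Foo'
        environ[cgi_key(name)] = value
    return environ
```

Without the filter, a client-supplied X_Foo sent after a proxy-set X-Foo would 
silently overwrite it, because both map to the same key.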

> 2. More generally, I fail to see how mixing HTTP headers, server-related 
> inputs, and environment variables in a dict adds value. It prevents 
> iterating on each collection separately. It only makes sense if not offering 
> more features than CGI is a design goal; in that case, this discussion 
> doesn’t serve a purpose anyway. It would be nicer and possibly more secure if 
> the application received separately:
> 
> a. Configuration information, which servers could read from environment 
> variables by default for backwards compatibility, but could also get through 
> more secure channels and restrict to what the application needs in order to 
> better isolate it from the entire OS.

I have always had a bit of a beef with the way that the use of environment 
variables for configuration was promoted by the 12 factor manifesto. It grew 
out of how a specific hosting service did things and ignored that various web 
servers used configuration files instead or did things in other ways. Of course 
the hosting service made it difficult to impossible to use some of those 
traditional web servers, so they were safe in their narrow view of things.

Anyway, if environment variables were used where appropriate and with an 
intermediate mapping layer within Python web applications that would have been 
fine. The problem was that you started to see direct lookup of environment 
variables deep in code bases. So people wedded themselves to use of environment 
variables.

The more sensible thing to do would have been to use an intermediate Python 
module/package providing an abstraction layer for getting configuration. Code 
would then use that. The configuration layer could then look up environment 
variables or use other means to get configuration, such as from more 
traditional configuration files, or pulling it down from configuration servers.

As far as I know there is no good Python package out there which serves as such 
an intermediary configuration system which could be plugged into any application 
and which doesn’t carry a huge amount of baggage. Would love to hear about one 
if it exists.
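
As a rough illustration of the kind of intermediate layer described above, the 
following sketch resolves a setting from an ordered list of sources. The 
`Config` class, `load_json_file` helper and the `app-config.json` filename are 
all hypothetical; this is not an existing package.

```python
import json
import os


class Config:
    """Look up settings from an ordered list of mapping-like sources."""

    def __init__(self, *sources):
        self._sources = sources

    def get(self, key, default=None):
        # The first source that knows the key wins.
        for source in self._sources:
            if key in source:
                return source[key]
        return default


def load_json_file(path):
    """Read a flat JSON object from a file; a missing file yields no settings."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


# Environment variables take priority, then the (hypothetical) config file.
config = Config(os.environ, load_json_file("app-config.json"))
```

Application code would then call `config.get("DATABASE_URL")` rather than 
reading `os.environ` directly, so the sources can change without touching the 
call sites.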

> b. Server APIs mandated by the spec, per request.
> c. HTTP headers, per request.
> 
> 3. Stop pretending that HTTP is a unicode protocol, or at least stop ignoring 
> reality when doing so. WSGI enforces ISO-8859-1-decoded str objects in the 

Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

2016-01-05 Thread Luke Plant
Just to add my 2c - as another Django developer, I agree completely with 
Aymeric here. My own experience was that the HTTP handling done by WSGI 
(especially URL handling, HTTP header mangling, os.environ as a 
destination - all due to CGI compatibility - and semi-broken unicode 
handling) only made things harder for us. We would much rather have 
dealt with raw streams of bytes and done all HTTP parsing ourselves.


Like Graham said, for HTTP/2 let's ignore the history of WSGI and start 
from scratch with an API that actually serves us well.


Regards,

Luke


On 05/01/16 11:26, Cory Benfield wrote:

Forwarding this message from the django-developers list.

Hi Cory,

I’m not subscribed to web-sig but I read the discussion there. Feel free to 
forward my answer to the group if you think it’s useful.

I have roughly the same convictions as Graham Dumpleton. If you want to support 
HTTP/2 and WebSockets, don’t start with design decisions anchored in CGI. 
Figure out what a simple and flexible API for these new protocols would be, 
specify it, implement it, and make sure it degrades gracefully to HTTP/1. You 
may be able to channel most of the communication through a single generator, 
but it’s unclear to me that this will be the most convenient design.

If you want to improve WSGI, here’s a list of mistakes or shortcomings in PEP 
 that you can take a stab at. There’s a general theme: for a specification 
that looks at the future, I believe that making modern PaaS-based deployments 
secure by default matters more than not implementing anything beyond what’s 
available in legacy CGI-based deployments.

1. WSGI is prone to header injection vulnerabilities by design due to 
the conversion of HTTP headers to CGI-style environment variables: if the 
server doesn’t specifically prevent it, X-Foo and X_Foo both become HTTP_X_FOO. 
I don’t believe it’s a good choice to destructively encode headers, expect 
applications to undo the damage somehow, and introduce security vulnerabilities 
in the process. If mimicking CGI is still considered a must-have — 1% of 
current Python web programmers may have heard about it, most of them from PEP 
 — then that burden should be pushed onto the server, not the application.

2. More generally, I fail to see how mixing HTTP headers, server-related 
inputs, and environment variables in a dict adds value. It prevents iterating 
on each collection separately. It only makes sense if not offering more 
features than CGI is a design goal; in that case, this discussion doesn’t serve 
a purpose anyway. It would be nicer and possibly more secure if the application 
received separately:

a. Configuration information, which servers could read from environment 
variables by default for backwards compatibility, but could also get through 
more secure channels and restrict to what the application needs in order to 
better isolate it from the entire OS.
b. Server APIs mandated by the spec, per request.
c. HTTP headers, per request.

3. Stop pretending that HTTP is a unicode protocol, or at least stop ignoring 
reality when doing so. WSGI enforces ISO-8859-1-decoded str objects in the 
environ, which is just wrong. It’s all the more surprising a choice given that 
this change was driven by Python 3, that UTF-8 is the correct choice, and that 
Python 3 defaults to UTF-8. Django has to re-encode and re-decode before doing 
anything with HTTP headers: 
https://github.com/django/django/blob/d5b90c8e120687863c1d41cf92a4cdb11413ad7f/django/core/handlers/wsgi.py#L231-L253

4. Normalize the way to tell the application about the original protocol, IP 
address and port. When dev and ops responsibilities are separate, this is 
clearly an ops responsibility, but due to the lack of standardization devs end 
up dealing with this problem in custom middleware, when they do it at all. 
Everyone keeps getting it wrong, which introduces security vulnerabilities. 
Also it always breaks silently on infrastructure changes.

5. Improve request / response length handling and connection closure. Armin and 
Graham have talked about this in the past and know the topic better than I do. 
There’s also a rejected PEP by Armin which made sense to me.
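
The re-encode and re-decode step mentioned in item 3 can be sketched roughly 
as follows. The helper names are hypothetical and this is not Django's code; 
it only shows the general round trip an application has to perform when the 
server hands it ISO-8859-1-decoded strings that actually carry UTF-8 bytes.

```python
def wsgi_str_to_bytes(value):
    """Recover the raw bytes behind an ISO-8859-1-decoded WSGI string."""
    return value.encode("iso-8859-1")


def wsgi_str_to_text(value, encoding="utf-8"):
    """Re-decode a WSGI string, replacing bytes that are invalid in `encoding`."""
    return wsgi_str_to_bytes(value).decode(encoding, errors="replace")


# What the server side does: bytes off the wire, decoded as ISO-8859-1.
raw = "/café".encode("utf-8")
environ_value = raw.decode("iso-8859-1")  # mojibake: '/cafÃ©'

# What the application has to do to get the intended text back.
text = wsgi_str_to_text(environ_value)  # '/café'
```

The round trip is lossless because ISO-8859-1 maps every byte to exactly one 
code point, but each application has to know to do it.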

As you can see from these comments, I don’t quite share the design choices that 
led to WSGI as it currently stands. I think it will be easier to build a new 
standard than evolve the current one.

I hope this helps!

Aymeric




--
"Trouble: Luck can't last a lifetime, unless you die young."
(despair.com)

Luke Plant || http://lukeplant.me.uk/


Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

2016-01-05 Thread Armin Ronacher

Hi,

On 05/01/2016 13:09, Luke Plant wrote:

> Just to add my 2c - as another Django developer, I agree completely with
> Aymeric here. My own experience was that the HTTP handling done by WSGI
> (especially URL handling, HTTP header mangling, os.environ as a
> destination - all due to CGI compatibility - and semi-broken unicode
> handling) only made things harder for us. We would much rather have
> dealt with raw streams of bytes and done all HTTP parsing ourselves.
>
> Like Graham said, for HTTP/2 let's ignore the history of WSGI and start
> from scratch with an API that actually serves us well.

Alright, I'll bite: if it were not done that way you would have different 
problems.  In particular, a request that comes up very often is that 
people want PATH_INFO and SCRIPT_NAME to be left undecoded.  That 
however completely breaks any form of routing you would want to do the 
moment they contain unicode characters.
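
One way to read the routing concern is sketched below: the same resource can 
arrive in several percent-encoded spellings, so a router that matches on the 
raw form misses unless every route is normalized first. The route table and 
function names are hypothetical, and this is an interpretation of the argument 
rather than anything taken from an existing framework.

```python
from urllib.parse import unquote

# Hypothetical route table, keyed by the decoded path.
ROUTES = {"/café": "cafe_view"}


def route(path):
    """Return the view name registered for `path`, or None."""
    return ROUTES.get(path)


# Three spellings of the same resource as they might appear on the wire.
RAW_PATHS = ["/caf%C3%A9", "/caf%c3%a9", "/%63af%C3%A9"]

for raw in RAW_PATHS:
    # Matching the raw form misses; matching the decoded form hits.
    print(raw, route(raw), route(unquote(raw)))
```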


I keep having the argument about PATH_INFO and the header semantics 
with people and I'm absolutely convinced (from the theory 
behind it as well as playing around with ideas for PEP 444 a few years 
ago) that it only gets worse the moment you stray too far from WSGI 
territory.


Likewise I wonder how many of the people who ask for more low-level access 
have concerned themselves with chunked requests/responses, transfer 
encodings and all the complexity that servers handle for you.  Yes, quite a 
bit of this is broken in WSGI, but it would have been trivial to fix without 
throwing the whole specification into the toilet :)



Regards,
Armin