Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

Armin Ronacher Tue, 05 Jan 2016 03:52:04 -0800

Hi,

I just want to reply to this because I think many people seem to be missing why things are done in a certain way, especially if they appear to be odd.


On 05/01/2016 12:26, Cory Benfield wrote:

1. WSGI is prone to header injection vulnerabilities issues by
design due to the conversion of HTTP headers to CGI-style environment
variables: if the server doesn’t specifically prevent it, X-Foo and
X_Foo both become HTTP_X_Foo. I don’t believe it’s a good choice to
destructively encode headers, expect applications to undo the damage
somehow, and introduce security vulnerabilities in the process. If
mimicking CGI is still considered a must-have — 1% of current Python web
programmers may have heard about it, most of them from PEP 3333 — then
that burden should be pushed onto the server, not the application.

Headers will always have to be encoded destructively if you want any form of generic processing. We need header joining, and we need to normalize the keys at least to the extent of the HTTP specification. I'm happy to not perform the conversion of dashes to underscores, but you will work in environments where this conversion was already done, so the spec will need to deal with that case anyway.
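To make the collision from the quoted point concrete, here is a minimal sketch of the CGI-style key conversion. The helper names (`cgi_key`, `build_environ_headers`) are made up for illustration, and dropping header names that contain an underscore is just one policy a server might choose, not something PEP 3333 requires.

```python
def cgi_key(header_name):
    """Map an HTTP header name to its CGI-style environ key."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def build_environ_headers(raw_headers):
    """Convert (name, value) pairs into CGI-style environ entries.

    Assumption: the server drops any header whose name contains an
    underscore, because it cannot be told apart from the dash form
    once the conversion has happened.
    """
    environ = {}
    for name, value in raw_headers:
        if "_" in name:
            continue  # ambiguous after conversion, so skip it
        environ[cgi_key(name)] = value
    return environ


# Both spellings end up under the same key.
print(cgi_key("X-Foo"), cgi_key("X_Foo"))
```

Without the underscore check, whichever of the two headers arrives last would silently win.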

The WSGI spec currently also does not sufficiently explain how to join headers. In particular the cookie header was written without header joining in mind, which is why it needs to be joined differently than all other headers. Header joining also comes up as a big topic in HTTP 2, so the spec will need to cover this.
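As a rough illustration of the joining problem: repeated request headers are normally combined with a comma, while repeated Cookie fields have to be combined with a semicolon. The function name below is hypothetical and the sketch only covers request headers.

```python
def join_header_values(name, values):
    """Join repeated request header values into one string.

    Cookie is the special case: its values are joined with "; ",
    everything else with ", ".
    """
    separator = "; " if name.lower() == "cookie" else ", "
    return separator.join(values)


print(join_header_values("Accept", ["text/html", "application/json"]))
print(join_header_values("Cookie", ["a=1", "b=2"]))
```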

2. More generally, I fail to see how mixing HTTP headers,
server-related inputs, and environment variables in a dict adds
values. It prevents iterating on each collection separately. It only
makes sense if not offering more features than CGI is a design goal;
in that case, this discussion doesn’t serve a purpose anyway. It
would be nicer and possibly more secure if the application received
separately:

I think this is largely a nice to have, not something that has any overall benefits. I would rather just clean up the actual stupid things such as CONTENT_TYPE and CONTENT_LENGTH, which cause a lot more real world friction than just the names of keys in general. This really should not turn into meaningless bikeshedding about what information should be called. Also consider how much code out there already assumes CGI/WSGI variables, so any move off that really should have good reasons or we will all just waste enormous amounts of effort just to transpose between the two representations.
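The friction with CONTENT_TYPE and CONTENT_LENGTH is that they are the only two request headers stored without the HTTP_ prefix, so every generic lookup needs a special case. A minimal sketch, with a hypothetical `get_header` helper:

```python
# The two CGI variables that carry header values without an HTTP_ prefix.
_UNPREFIXED = {"CONTENT_TYPE", "CONTENT_LENGTH"}


def get_header(environ, name, default=None):
    """Look up a request header in a WSGI environ by its HTTP name."""
    key = name.upper().replace("-", "_")
    if key not in _UNPREFIXED:
        key = "HTTP_" + key
    return environ.get(key, default)


environ = {"CONTENT_TYPE": "text/plain", "HTTP_ACCEPT": "*/*"}
print(get_header(environ, "Content-Type"), get_header(environ, "Accept"))
```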

a. Configuration information, which servers could read from
environment variables by default for backwards compatibility, but could
also get through more secure channels and restrict to what the
application needs in order to better isolate it from the entire OS.

What WSGI traditionally lacked was a setup phase where data could be passed to the application that was server specific but not request bound. For instance there is no reason an application cannot get hold of wsgi.errors before a request comes in. I would like to see this fixed in a new specification.
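One way to picture such a setup phase is a factory that the server calls once, before any request, with its server-specific data. This is purely a sketch: `make_app` and the `server_info` mapping are invented here and do not come from any existing or proposed specification.

```python
import sys


def make_app(server_info):
    """Hypothetical setup hook, called once by the server at startup."""
    # The error stream is available here, before any request exists.
    errors = server_info.get("wsgi.errors", sys.stderr)
    errors.write("application starting\n")

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]

    return app
```

The returned callable is an ordinary WSGI application, so only the startup handshake would be new.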

3. Stop pretending that HTTP is a unicode protocol, or at least stop
ignoring reality when doing so. WSGI enforces ISO-8859-1-decoded str
objects in the environ, which is just wrong. It’s all the more a
surprising choice since this change was driven by Python 3, that UTF-8
is the correct choice, and that Python 3 defaults to UTF-8. Django has
to re-encode and re-decode before doing anything with HTTP headers:

I agree with this but you will have to have that fight with others. I said many times before that values should never have been unicode values in the first place, but certain decisions in the Python 3 standard library at the time prevented that. In particular until 3.2 or so it was impossible to parse byte URLs.
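The re-encode and re-decode step mentioned in the quoted point looks roughly like the sketch below. The function name is hypothetical, and assuming UTF-8 as the real encoding is a choice made by the application, not something the environ tells you.

```python
def environ_value_to_text(value, encoding="utf-8"):
    """Undo the ISO-8859-1 decoding applied to WSGI environ strings.

    Encoding back to ISO-8859-1 recovers the original bytes, which
    are then decoded with the encoding the application expects.
    """
    return value.encode("iso-8859-1").decode(encoding, errors="replace")


raw = "café".encode("utf-8")        # bytes as sent on the wire
as_seen = raw.decode("iso-8859-1")  # what the environ contains
print(repr(as_seen), repr(environ_value_to_text(as_seen)))
```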

5. Improve request / response length handling and connection closure.
Armin and Graham have talked about in the past and know the topic
better than I do. There’s also a rejected PEP by Armin which made
sense to me.

I think last time I discussed that with Graham it was not clear what the solution is in the context of WSGI. The idea that there is a content-length is laughable in the context of a real application where the server is performing conversions on the input and output stream. We would need many more than just one content length and an automatically terminated input stream.

However at that point you will quickly realize that you can't have it both ways: you either have a WSGI-like protocol or raw access to sockets, but certainly not both. This topic has caused a lot of bikeshedding in the past and I fail to see how it will be different this time.

My current thinking is that the most realistic approach to most of those problems will be the concept of framing on both the input and output side. That's somewhat compatible with both chunked transports as well as websockets. But if we do go down this road we will most likely have to standardize on a library that implements WSGI, as the complexity of dealing with this sort of stuff is significantly higher than what we had to do in the past.
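To show what framing on the output side can mean for a chunked transport, here is a small sketch that turns application frames into HTTP/1.1 chunked transfer encoding. The `encode_chunked` name is made up, and this only illustrates the wire format, not the framing API a new spec would define.

```python
def encode_chunked(frames):
    """Yield each non-empty frame as one HTTP/1.1 chunk.

    A chunk is the payload length in hex, CRLF, the payload, CRLF.
    A zero-length chunk marks the end of the body, so empty frames
    are skipped to avoid terminating the stream early.
    """
    for frame in frames:
        if frame:
            yield b"%x\r\n%s\r\n" % (len(frame), frame)
    yield b"0\r\n\r\n"


print(b"".join(encode_chunked([b"hello", b"hi"])))
```

The receiver knows where each frame ends without any content length for the whole body.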



Regards,
Armin
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
https://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
