Re: [Web-SIG] WSGI for Python 3
At 02:37 PM 8/30/2010 +1000, Graham Dumpleton wrote:
> Anyway, rather than keep arguing the point, and in order to move forward, let us perhaps start now with the following definitions and new names to identify them. We can even go a bit stupid and give each its own code name so they are in part more memorable.
>
> Any next option based on your suggestions about changing the WHEAT option can be called MAIZE.
>
> And if you are thinking I am going stark raving mad and should be put in a white jacket and locked up, you could well be right. I am not a happy camper right now, but that is because of many things besides this WSGI stuff. :-)
>
> And yes I know about the page that has been just recently put up at:
>
> http://www.wsgi.org/wsgi/Python_3
>
> From memory, when I first read it I wasn't sure that it was completely accurate, but at least it doesn't now mention mod_python instead of mod_wsgi, which was mighty confusing.
>
> We can perhaps merge the following into that page, i.e., expand the table, and talk more about the abstract definitions rather than linking it to specific implementations at this point. We can perhaps then start capturing the pros and cons against each option in the page rather than losing them in the email chain.

I've added a column to the page called flat that captures my current proposal (native keys, surrogateescape values, byte stream in, strict bytes-only for all outputs). This seems to me an optimum balance between:

* Verifiability (especially *composable* verifiability)
* Low cognitive overhead (i.e., fewest things to remember)
* Low amount of finger-typing and fewer conversions

But I certainly could be convinced otherwise by example or argument.

(One other thing I consider a plus for this approach, btw: os.environ is still largely usable as a WSGI environ in the CGI case. This isn't so much valuable in itself as it is an indicator of low complexity and cognitive overhead.)
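To make the "surrogateescape values" part of the flat proposal concrete, here is a minimal sketch. The helper names `to_native` and `to_wire` and the choice of UTF-8 as the codec are assumptions made for illustration; the message above does not fix either.

```python
def to_native(raw, encoding="utf-8"):
    """Decode header/environ bytes into a native str without losing data."""
    # Bytes that are not valid in `encoding` become lone surrogates
    # (U+DC80..U+DCFF) instead of raising UnicodeDecodeError.
    return raw.decode(encoding, "surrogateescape")


def to_wire(value, encoding="utf-8"):
    """Recover the exact original bytes from a surrogate-escaped str."""
    return value.encode(encoding, "surrogateescape")


raw_path = b"/caf\xe9"            # Latin-1 bytes, not valid UTF-8
native_path = to_native(raw_path)  # usable as a str key/value in environ
round_tripped = to_wire(native_path)
```

The point of the round trip is that an application which disagrees with the server's guess about the encoding can still get back to the original bytes and decode them its own way.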
___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] WSGI for Python 3
On 28.08.2010 13:13, Armin Ronacher wrote:
> Hi,
>
> On 2010-08-28 1:04 PM, Georg Brandl wrote:
>> Let me just throw in here that it's *NOT* too late to do something about Python 3.2. It is not even in beta state yet, and I am very willing to introduce the changes to make web programming work again, or even hold up 3.2 for a bit if you need more time.
>
> Sorry if I was not clear. I was talking about only wsgiref here. And for that to be adapted to a possible new WSGI specification we would need more time than you can hold the 3.2 release, I think.

That is certainly true :)

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
Re: [Web-SIG] WSGI for Python 3
At 06:05 PM 8/27/2010 +0200, Christoph Zwerschke wrote:
> For instance,
>
>     user = 'özkan'.encode('latin1')
>     if user in request.META.get('REMOTE_USER', b'').lower():
>
> will not work if the user has logged in as 'Özkan'.

Isn't that a problem with code that does this now?
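The behaviour in question can be checked directly. The sketch below uses only built-in `bytes` and `str` methods; the variable names are illustrative, and the `REMOTE_USER` value is hard-coded rather than read from a real request object.

```python
user = 'özkan'.encode('latin1')          # b'\xf6zkan'
remote_user = 'Özkan'.encode('latin1')   # b'\xd6zkan', as a server might pass it

# bytes.lower() only maps ASCII letters; 0xD6 (Latin-1 'Ö') is left alone.
lowered = remote_user.lower()
bytes_match = user in lowered

# Decoding first lets str.lower() apply Unicode case mapping.
text_match = user.decode('latin1') in remote_user.decode('latin1').lower()
```

Here `bytes_match` comes out false and `text_match` true, which is the mismatch described above: the comparison only works once the value has been decoded with the right codec.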
Re: [Web-SIG] WSGI for Python 3
Paul Davis wrote:
> Since the major stumbling block, irrespective of other changes, to any sort of agreement is still bytes vs unicode
>
> I ran into this while I was attempting to put together enough code to play with a wsgiref2 that ran on both 2.x and 3.x. As Graham has deftly pointed out, it's a pretty big pain in the rear. Specifically, if we specify that all keys in the environ dictionary are byte strings, then there's a noticeable amount of pain in trying to write code that runs on both platforms.
>
> I object to 2to3.py on religious grounds, so when I was implementing this I was doing so with code that would run unmodified on both 2 and 3.

Religion is what gets us into this mess. Pragmatism will get us out. We have two options:

1. Continue to try to write code that runs unmodified on Python 2 and 3, or that runs when 2to3 is applied. Lurking here is a morass of corner cases and state machines that behave differently depending on when you look at them. You can all see where that is getting us: nowhere. By the time you all discover how to write a spec that deals with all the pain points which 2to3 introduces, Python 2 will be dead and you will have wasted your time.

2. Write a Python 3 version of your code. Yes, it's more drudge work. Suck it up. To ameliorate that, make the Python 3 version the default as soon as possible. Deprecate the Python 2 branch. Backport features as necessary to the Python 2 branch (just as Python itself has been doing, if you notice).

If you do that, we can write a WSGI for Python 3 now that doesn't suffer from any of the complexities of 2to3.

Robert Brewer
fuman...@aminus.org
Re: [Web-SIG] WSGI for Python 3
On Fri, Aug 27, 2010 at 4:04 PM, Robert Brewer fuman...@aminus.org wrote:
> Paul Davis wrote:
>> Since the major stumbling block, irrespective of other changes, to any sort of agreement is still bytes vs unicode
>>
>> I ran into this while I was attempting to put together enough code to play with a wsgiref2 that ran on both 2.x and 3.x. As Graham has deftly pointed out, it's a pretty big pain in the rear. Specifically, if we specify that all keys in the environ dictionary are byte strings, then there's a noticeable amount of pain in trying to write code that runs on both platforms.
>>
>> I object to 2to3.py on religious grounds, so when I was implementing this I was doing so with code that would run unmodified on both 2 and 3.
>
> Religion is what gets us into this mess. Pragmatism will get us out. We have two options:
>
> 1. Continue to try to write code that runs unmodified on Python 2 and 3, or that runs when 2to3 is applied. Lurking here is a morass of corner cases and state machines that behave differently depending on when you look at them. You can all see where that is getting us: nowhere. By the time you all discover how to write a spec that deals with all the pain points which 2to3 introduces, Python 2 will be dead and you will have wasted your time.
>
> 2. Write a Python 3 version of your code. Yes, it's more drudge work. Suck it up. To ameliorate that, make the Python 3 version the default as soon as possible. Deprecate the Python 2 branch. Backport features as necessary to the Python 2 branch (just as Python itself has been doing, if you notice).
>
> If you do that, we can write a WSGI for Python 3 now that doesn't suffer from any of the complexities of 2to3.
>
> Robert Brewer
> fuman...@aminus.org

No. What got us into this mess was the idea that it would be a good idea to silently type cast unicode objects into bytes.

Perhaps I could've been more clear on avoiding 2to3 though. I wanted to avoid coding any of its oddities into a reference implementation because, as you point out, it's just a source of confusion.

I'd like to point out that the code I posted works on both 2.x and 3.x. It's fairly easy to implement the backwards compatible code in Python. There's nothing near the level of requiring a branched/back-port strategy. Not to mention, a branched reference implementation is a bit of a contradiction in terms. The hard part is figuring out a specification that doesn't suck when people try to implement it on multiple interpreters.

Also, I think you're overestimating the rate at which people are going to be converting to Python 3. I still have people ask for Python 2.4 support. I wouldn't be the least bit surprised if there's a WSGI 3 before we deprecate 2.x support.

HTH,
Paul Davis
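The code Paul refers to is not reproduced in this thread. As a rough illustration of the kind of shim that lets one source file run unmodified on both 2.x and 3.x without 2to3, here is a minimal sketch; `PY3`, `text_type`, `to_bytes` and `to_text` are hypothetical names, and Latin-1 is an assumed default, not something taken from his implementation.

```python
import sys

PY3 = sys.version_info[0] >= 3

# A conditional expression evaluates only the selected branch, so the
# Python 2 name `unicode` is never looked up when running on Python 3.
text_type = str if PY3 else unicode  # noqa: F821


def to_bytes(value, encoding="latin-1"):
    """Return `value` as a byte string on either interpreter."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


def to_text(value, encoding="latin-1"):
    """Return `value` as a text string on either interpreter."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

Helpers like these keep the interpreter check in one place, so the rest of a module can call `to_bytes()` or `to_text()` at its boundaries instead of branching on the Python version.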
Re: [Web-SIG] WSGI for Python 3
Hi,

On 2010-08-27 6:05 PM, Christoph Zwerschke wrote:
> Btw, another problem with this is that the lower() method does not know that it has to use latin1 when lowercasing.

That is not a problem insofar as case-insensitive HTTP tokens are limited to ASCII only.

Regards,
Armin
Re: [Web-SIG] WSGI for Python 3
On Tuesday, July 20, 2010, Etienne Robillard e...@gthcfoundation.org wrote:
>>> AFAICT, the main difference is that under a bytes-only regime, the changes should be more consistent/mechanical, i.e., able to be performed by relatively superficial code inspection.
>>
>> The problem in all these discussions is that practically no one has been prepared to actually sit down and attempt to migrate any significant code over to any of these proposals and Python 3.0.
>>
>> The only notable attempt is the work Robert Brewer did with CherryPy. Ultimately though I don't think the CherryPy case tells us much, as it simply translates the interface into an internal way of doing things. The true litmus test will be the conversion of any framework which keeps the WSGI interface exposed, with it being used as a means of composing together components to make a stack.
>>
>> Until someone has done that we have absolutely no evidence one way or the other as to what proposal is easier or even viable given potential shortcomings, or otherwise, in the Python language and standard libraries.
>>
>> It is a chicken and egg problem though, in that I would say practically everyone doesn't want to do anything until the WSGI specification has been updated, as they don't want to waste their time. You can't update the specification, though, without truly knowing whether a particular approach will work, and to do that you have no choice but to actually try it.
>
> Hi Graham et al,
>
> One could maybe write a migration app for porting WSGI 1 apps to WSGI 2, in the same way 2to3.py was written. That's how at least I hoped to migrate notmm to Python 3. A switch could also be used to enable/disable bytes or text-mode only for HTTP headers parsing...
>
> Are there no such tools ready yet to slowly start moving ahead with WSGI 2?
>
> I recognize it's a chicken and egg problem but I don't think it's necessary for framework authors to migrate to Python 3 in an attempt to solve mystery encoding errors affecting Windows platforms...

The issues are not Windows specific. You are misunderstanding past comments if you believe that.

The purpose of actually trying it is to work out how viable bytes everywhere and/or users dealing with % encoding is. If dealing with bytes everywhere proves to be easy then great, going that way may be the best idea. If it is a PITA, as some have said dealing with bytes is in Python 3.0, then we will know rather than it being speculation at this point.

Graham

> An easy-to-follow roadmap to WSGI 2 and writing related development tools should be a more effective way to port frameworks (to WSGI 2) and stick with Python 2 if they want so! ;-)
>
> my 2 cents,
>
> E
>
> --
> Etienne Robillard
> Green Tea Hackers Club
> E-mail: e...@gthcfoundation.org
> Work phone: 1 (514) 962-7703
> Website: https://gthc.org/
> During times of universal deceit, telling the truth becomes a revolutionary act. -- George Orwell
Re: [Web-SIG] WSGI for Python 3
I'm still in denial regarding Python 3 generally speaking, but it looks like something important is going on here. Could someone summarize the main points (intelligible to a Python 2 troglodyte)?

thanks in advance,
-- Aaron Watters

===
% man less
less is more.
Re: [Web-SIG] WSGI for Python 3
Go back through my blog and read some of the posts there so you have some of the history. Recent discussions build on some of the stuff there and I don't think anyone has the time to keep explaining all this to every new person who comes along.

Graham

On Monday, July 19, 2010, Aaron Watters arw1...@yahoo.com wrote:
> I'm still in denial regarding Python 3 generally speaking, but it looks like something important is going on here. Could someone summarize the main points (intelligible to a Python 2 troglodyte)?
>
> thanks in advance,
> -- Aaron Watters
>
> ===
> % man less
> less is more.
Re: [Web-SIG] WSGI for Python 3
On Tue, Jul 20, 2010 at 12:37 AM, Graham Dumpleton graham.dumple...@gmail.com wrote:
> On 19 July 2010 03:19, P.J. Eby p...@telecommunity.com wrote:
>> At 01:01 PM 7/18/2010 +1000, Graham Dumpleton wrote:
>>> This is on the basis that if people are going to have to rewrite their code a fair bit to handle bytes everywhere,
>>
>> What do you mean by "rewrite their code a fair bit", and who is it that you think will have to do this? Or, more precisely, how is that any different from the text or text-and-bytes proposals?
>
> My comments are based on the mood I have got from listening to discussions here on this list and discussions in other forums and irc channels. To me there appears to be a tendency towards people thinking that having bytes everywhere will be harder to deal with than the text proposal.
>
>> AFAICT, the main difference is that under a bytes-only regime, the changes should be more consistent/mechanical, i.e., able to be performed by relatively superficial code inspection.
>
> The problem in all these discussions is that practically no one has been prepared to actually sit down and attempt to migrate any significant code over to any of these proposals and Python 3.0.
>
> The only notable attempt is the work Robert Brewer did with CherryPy. Ultimately though I don't think the CherryPy case tells us much, as it simply translates the interface into an internal way of doing things. The true litmus test will be the conversion of any framework which keeps the WSGI interface exposed, with it being used as a means of composing together components to make a stack.
>
> Until someone has done that we have absolutely no evidence one way or the other as to what proposal is easier or even viable given potential shortcomings, or otherwise, in the Python language and standard libraries.
>
> It is a chicken and egg problem though, in that I would say practically everyone doesn't want to do anything until the WSGI specification has been updated, as they don't want to waste their time. You can't update the specification, though, without truly knowing whether a particular approach will work, and to do that you have no choice but to actually try it.
>
> And before you argue that the hosting mechanisms haven't been there to do that, I will point out that mod_wsgi specifically implemented a way of being able to selectively say whether bytes or text was passed through. That code for bytes support sat there for six months or more and there was zero interest expressed to me by anyone in using it as a basis of some actual attempts at migrating existing code as a test. In the end it got thrown out due to that lack of interest and due to it holding up a new release of mod_wsgi.
>
> Distinct from mod_wsgi, it also wouldn't be that hard for interested people to modify wsgiref to implement the different proposals. I stress again that no one seems prepared to do that, and again, even if it was done, who is then going to try to use it?
>
> Thus we all just sit here on the fence waiting for others to do something, pushing our particular ideas and occasionally flip flopping between those ideas as well.
>
> Finally, and for the record, I will not be modifying mod_wsgi to change it in any way now until I see a separate proof of concept WSGI server and a decent sized framework ported to it. So yes, I am going to sit on the fence as well, but that is because I have been burned in the past in putting in effort on this only to see it go nowhere. I am not going to waste my time again like that.
>
> Graham

Just a quick note. I've started working on a project to try to get a version of wsgi running on 2.x and 3.x. I've been needing a reason to start using 3.1 for some time and this thread has managed to spur me into action.

To be clear, I'm coming at this from the point of view that as long as there are breaking changes, I might as well make things really broken. So I'll be incorporating ideas from [1] as well as other bits of trivia I've picked up. I realize this will lower the probability that anything comes of this work, but I reckon it'll at least be some code to discuss.

My current plan is to get a reference implementation with some tests that runs on 2.x and 3.x. Once I get there I'll try porting WebOb [2] and maybe Django [3] (depending on the progress of its port [4]). If I get that far I'll probably make a fork of Gunicorn [5] so that there's a whole stack that runs on both 2.x and 3.x.

Optimistically, I'd like to have enough code to show the reference implementation and tests by this weekend. Although, I'm still learning 3.x differences and workarounds so I could fail miserably.

Paul J. Davis

[1] http://wsgi.org/wsgi/WSGI_2.0
[2] http://pythonpaste.org/webob/
[3] http://www.djangoproject.com/
[4]
Re: [Web-SIG] WSGI for Python 3
At 01:01 PM 7/18/2010 +1000, Graham Dumpleton wrote:
> This is on the basis that if people are going to have to rewrite their code a fair bit to handle bytes everywhere,

What do you mean by "rewrite their code a fair bit", and who is it that you think will have to do this? Or, more precisely, how is that any different from the text or text-and-bytes proposals?

AFAICT, the main difference is that under a bytes-only regime, the changes should be more consistent/mechanical, i.e., able to be performed by relatively superficial code inspection.

> My personal opinion is that if you are going to go bytes everywhere, then you may as well throw out the complete WSGI specification as it stands now and fix all the other problems with the specification.

That may not be a bad idea; I'm certainly in favor of going ahead and ditching start_response/write while we're at it. The requirement to change both the entry and exit points to match the calling convention also seems to provide an ideal opportunity to insert any necessary encoding or decoding operations.
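One way to picture "changing the entry and exit points" is an adapter that wraps a start_response-style application in a callable that simply returns its response, encoding at the exit point. This is only a sketch: the `(status, headers, body)` return convention, the name `adapt`, and the use of Latin-1 for status and headers are assumptions made here, not part of any agreed specification.

```python
def adapt(app_v1, encoding="latin-1"):
    """Wrap a start_response-style app in a return-value-style callable."""
    def app_v2(environ):
        captured = {}
        written = []  # data passed to the legacy write() callable

        def start_response(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = headers
            return written.append

        result = app_v1(environ, start_response)
        try:
            # Consuming the iterable guarantees start_response has run,
            # even for generator-based applications.
            chunks = list(result)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()

        # Exit point: the single place where text becomes bytes.
        status = captured["status"].encode(encoding)
        headers = [(name.encode(encoding), value.encode(encoding))
                   for name, value in captured["headers"]]
        return status, headers, written + chunks
    return app_v2


def hello(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


status, headers, body = adapt(hello)({})
```

The sketch buffers the whole body for simplicity, so it gives up streaming; a real adapter would need to decide how to handle that.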
Re: [Web-SIG] WSGI for Python 3
On Saturday, July 17, 2010, Gustavo Narea m...@gustavonarea.net wrote:
> Hello,
>
> Ian said:
>> Having two ways of expressing the same information will lead to bugs related to which data is canonical. If an application is using SCRIPT_NAME/PATH_INFO and then updates those values in any way, and wsgi.raw_script_name/wsgi.raw_path_info are present, then there will be weird bugs and code will disagree about which one is correct. Since %2f can exist in the raw versions, there isn't even a way to chunk the two variables in the same way.
>
> I can't agree more.
>
> I would propose the following, and excuse me in advance if this has already been proposed and discarded -- I've tried to follow this topic on the mailing list over the past few months, until it became an endless discussion.
>
> I think only the raw values should be available. Even if a middleware changes them, it must put them with raw values. And because you cannot change those values without knowing what encoding the request uses, the character encoding *must* be present.
>
> I know that sounds easy but it's not, because browsers don't specify the charset in the Content-Type and instead they generate a new request using the charset from the previous response. So the charset is unknown to the server/gateway and the middleware stack.
>
> So, what we could do is introduce a mandatory variable called, say, wsgi.charset, and would be used as follows:

Something like this was proposed before, but it only applied to the keys that mattered, specifically PATH_INFO and maybe QUERY_STRING (the latter of which this discussion has been ignoring, and I can't remember how we worked out before that it should be treated). It didn't cover SCRIPT_NAME because, as I indicated before, the encoding of that is really dictated by the server and not the application, for the initial value at least.

The idea was that the server would pass them as Latin 1 and set the encoding key. If a consumer of it didn't like the encoding it was in, it would convert it back to bytes and then to what it wants, and update the encoding key to what it used. Thus you had a hint available to allow reliable transcoding.

This proposal didn't get acceptance either.

Graham

> - It MUST be set by the server or gateway on every request.
> - Every middleware or application that reads or writes these values MUST use the charset specified in wsgi.charset.
> - If a server, gateway, middleware or application wants to change the charset and it is possible*, it MUST convert the *entire* request into that charset and update wsgi.charset accordingly.
> - When the charset is not specified in the HTTP request, UTF-8 MUST be assumed by the server/gateway, unless another default charset has been specified by the user.
>
> I think/hope that will solve all the problems.
>
> What happens when a WSGI application is actually made up of two WSGI applications and they send the responses in different charsets? If it's not possible to configure them so that they both use the same charset, then one of them would have to be wrapped by a middleware which:
>
> - On egress, converts the responses using the charset used by the other application.
> - On ingress, if the charset is not specified in the request, it will assume it's the one used by the other application, and thus it will convert the request using the charset supported by the wrapped application.
>
> It would look like this:
>
> ===
> def application(environ, start_response):
>     # CharsetNormalizer and myapp are placeholders defined elsewhere.
>     if environ['PATH_INFO'].startswith('/trac/'):
>         # Say Trac only supports Latin-1 and we want responses to use UTF-8:
>         app = trac.web.main.dispatch_request
>         app = CharsetNormalizer(app, response='latin-1', request='utf8')
>     else:
>         # myapp uses UTF-8
>         app = myapp
>     return app(environ, start_response)
> ===
>
> Then there's the string vs bytes issue. Bytes would be the natural choice to represent these raw values, but it would probably cause more trouble than they solve. So, I think they should be strings that contain the ASCII raw encoded values (i.e., str on both versions of Python).
>
> What do you think about this? Again, sorry if this has been discarded before! :)
>
> * For example, you can always convert Latin-1 to UTF-8, but not every UTF-8 string can be converted to Latin-1.
>
> --
> Gustavo Narea xri://=Gustavo.
> | Tech blog: =Gustavo/(+blog)/tech ~ About me: =Gustavo/about |
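The "convert it back to bytes and then to what it wants, and update the encoding key" step described in Graham's reply can be sketched as follows. The key name `wsgi.charset` is borrowed from Gustavo's message; the function name `transcode` and the tuple of affected keys are assumptions for illustration, since neither proposal was ever adopted.

```python
# Keys assumed to be subject to transcoding in this sketch.
TRANSCODED_KEYS = ("SCRIPT_NAME", "PATH_INFO", "QUERY_STRING")


def transcode(environ, target):
    """Re-decode the affected environ values using `target` as the charset."""
    current = environ["wsgi.charset"]
    # Compute every new value first, so a decoding error leaves
    # environ untouched instead of half converted.
    updated = {
        key: environ[key].encode(current).decode(target)
        for key in TRANSCODED_KEYS
        if key in environ
    }
    environ.update(updated)
    environ["wsgi.charset"] = target


# The server received UTF-8 bytes but, per the proposal, exposed them as Latin-1.
environ = {
    "PATH_INFO": b"/caf\xc3\xa9".decode("latin-1"),
    "wsgi.charset": "latin-1",
}
transcode(environ, "utf-8")
```

Because every byte value is valid Latin-1, the initial Latin-1 decoding never loses information, which is what makes the later transcoding reliable.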
Re: [Web-SIG] WSGI for Python 3
On Sat, Jul 17, 2010 at 12:38 AM, Graham Dumpleton graham.dumple...@gmail.com wrote:
> On Friday, July 16, 2010, And Clover and...@doxdesk.com wrote:
>> On 07/14/2010 06:43 AM, Ian Bicking wrote:
>>> There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO, and HTTP_COOKIE.
>>
>> (And of those, PATH_INFO is the only one that really matters, in that no-one really uses non-ASCII script filenames,
>
> FWIW, I had to go to a lot of trouble to allow non-ASCII in the final SCRIPT_NAME in mod_wsgi. Specifically, using the AddHandler directive in Apache means a file system path can make up part of SCRIPT_NAME. I had someone who was specifically using Russian in a WSGI script file name, and because with AddHandler that becomes part of SCRIPT_NAME, you had to cater for it.
>
> Anyway, this was more of a Windows issue in having to use special file system functions to deal with the fact that on Windows filesystem paths aren't UTF-8 but something else.
>
> What this does highlight though is that although one can talk about passing the raw script name through to the application, that isn't necessarily right, as it isn't the application that dictates what encoding may be used but the web server which is performing the mapping of that part of the original URL path to a potential filesystem resource or, alternatively, where the mount point comes from file-based configuration, the encoding of the web server configuration file.

This is an Apache-specific issue. It definitely doesn't apply to paste.httpserver, and I doubt it applies to CherryPy or wsgiref. I don't really know how Nginx or other servers work.

-- Ian Bicking | http://blog.ianbicking.org
Re: [Web-SIG] WSGI for Python 3
Hi,

On 7/17/10 9:15 AM, Ian Bicking wrote:
> This is an Apache-specific issue. It definitely doesn't apply to paste.httpserver, I doubt CherryPy or wsgiref. I don't really know how Nginx or other servers work.

This will be an issue for every server that...

* supports unicode filesystems
* decides to do internal mapping based on URIs and not IRIs

In fact, this will be an issue for things like middlewares that want to map applications to paths. In fact, this is already an issue on Python 2, just that nobody cares.

Regards,
Armin
Re: [Web-SIG] WSGI for Python 3
Hi,

On 7/17/10 12:57 PM, Armin Ronacher wrote:
> In fact, this will be an issue for things like middlewares that want to map applications to paths. In fact, this is already an issue on Python 2, just that nobody cares.

s/applications/serving static files from folders/

Regards,
Armin
Re: [Web-SIG] WSGI for Python 3
On Fri, 16 Jul 2010, P.J. Eby wrote:
> At 02:28 PM 7/16/2010 -0500, Ian Bicking wrote:
>
> There should be one, and preferably *only* one, obvious way to do it. And given that HTTP is inherently a bunch of bytes, bytes is the one obvious way.

I think this makes sense. The thing which is assembling the WSGI environment should do bytes and things further down the stack can deal with it as they like. This aligns well with how I like to think about such stuff: bytes on the outside, unicode on the inside.

Given that app and framework developers can throw whatever keys they like back into the environment, they can cope as they like.[1] What would be horrible is if there need to be multiple coping strategies. Better to be able to say, "Oh it doesn't work? Try this way to cope: remember it is bytes."

However, unless I'm misreading the thread, the bytes issue isn't really the bone of contention. People seem okay with bytes as long as specific points of pain are addressed, such as:

* What's my PATH_INFO and SCRIPT_NAME?
* This server, which hosts, but is not, the WSGI environment builder doesn't play well with this model.
* Some others I can't remember now.

It seems then that perhaps a way forward is to say: "Okay, it's gonna be bytes. Now, given that, how do we deal with these other issues, which perhaps can be recast and encapsulated to be considered orthogonal to the bytes/not-bytes debate?"

Because we _know_ that any choice is going to come with costs, but as things have dragged on, the lack of choice thus far is starting to have as much of a cost as the costs that are wanting to be resolved.

[1] I'm not expecting or hoping for porting/migrating to Python 3 to be simple/automatic/easy, but perhaps I'm cruel.

--
Chris Dent http://burningchrome.com/~cdent/
[...]
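A small sketch of "bytes on the outside, unicode on the inside", assuming a hypothetical bytes-only environ in which keys, values, status and headers are all byte strings. That convention is one of the options under discussion in this thread, not an adopted specification, and the choice of UTF-8 for the path is an application-level assumption.

```python
def application(environ, start_response):
    # Outside: the server hands over raw bytes.
    raw_path = environ[b"PATH_INFO"]

    # Inside: decode once at the boundary and work with text from here on.
    path = raw_path.decode("utf-8", "replace")
    message = "You asked for %s\n" % path

    # Outside again: encode everything before it leaves the application.
    body = message.encode("utf-8")
    start_response(b"200 OK", [
        (b"Content-Type", b"text/plain; charset=utf-8"),
        (b"Content-Length", str(len(body)).encode("ascii")),
    ])
    return [body]


# Minimal stand-in for a server, used only to exercise the sketch.
calls = []


def fake_start_response(status, headers):
    calls.append((status, headers))


result = application({b"PATH_INFO": b"/caf\xc3\xa9"}, fake_start_response)
```

The application decides how to decode its own path, which is the "cope as they like" part; the server never has to guess.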
Re: [Web-SIG] WSGI for Python 3
[PJ Eby]
> IOW, the bytes/string discussion on Python-dev has kind of led me to realize that we might just as well make the *entire* stack bytes (incoming and outgoing headers *and* streams), and rewrite that bit in PEP 333 about using str on Python 3000 to say we go with bytes on Python 3+ for everything that's a str in today's WSGI.
>
> Or, to put it another way, if I knew then what I know *now*, I think I'd have written the PEP the other way around, such that the use of 'str' in WSGI would be a substitute for the future 'bytes' type, rather than viewing some byte strings as a forward-compatible substitute for Py3K unicode strings.
>
> Of course, this would be a WSGI 2 change, but IMO we're better off making a clean break with backward compatibility here anyway, rather than having conditionals. Also, going with bytes everywhere means we don't have to rename SCRIPT_NAME and PATH_INFO, which in turn avoids deeper rewrites being required in today's apps.

+1

> (Hm. Although actually, I suppose we *could* just borrow the time machine and pretend that WSGI called for byte-strings everywhere all along...)

+1/0

Alan.
Re: [Web-SIG] WSGI for Python 3
On 17-07-2010, chris.d...@gmail.com wrote:
> On Fri, 16 Jul 2010, P.J. Eby wrote:
>> At 02:28 PM 7/16/2010 -0500, Ian Bicking wrote:
>>
>> There should be one, and preferably *only* one, obvious way to do it. And given that HTTP is inherently a bunch of bytes, bytes is the one obvious way.
>
> I think this makes sense. The thing which is assembling the WSGI environment should do bytes and things further down the stack can deal with it as they like. This aligns well with how I like to think about such stuff: bytes on the outside, unicode on the inside.
>
> Given that app and framework developers can throw whatever keys they like back into the environment, they can cope as they like.[1] What would be horrible is if there need to be multiple coping strategies. Better to be able to say, "Oh it doesn't work? Try this way to cope: remember it is bytes."

This thread is difficult to follow, but this makes sense to me also.

KISS

--
William Dodé - http://flibuste.net
Independent IT professional
Re: [Web-SIG] WSGI for Python 3
Chris McDonough chr...@plope.com wrote:
> On Sat, 2010-07-17 at 01:33 +0200, Armin Ronacher wrote:
>> Hi,
>>
>> On 7/17/10 1:20 AM, Chris McDonough wrote:
>>> Let me know if I'm missing something.
>>
>> The only thing you miss is that the bytes type of Python 3 is badly supported in the stdlib (not an issue if we reimplement everything in our libraries, not an issue for me) and that the bytes type has no string formatting, which makes us do the encode/decode dance in our own implementations of the missing stdlib functions.
>
> This is why the docs mention bytes with benefits instead (like the Python 2 str type). The existence of such a type would be the result of us lobbying for its inclusion into some future Python 3, or at least the result of lobbying for a String ABC that would allow us to define our own.

I think the most effective way to lobby here would be to provide the String ABC and an implementation of encoded strings, i.e. strings with an internal representation that's a byte sequence in a particular encoding.

Bill
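For readers unfamiliar with the "encode/decode dance" Armin mentions, here is a minimal sketch of what formatting a header line looks like when only `str` supports `%` formatting. The function name `format_header` is hypothetical, and Latin-1 is assumed because it maps every byte to exactly one code point.

```python
def format_header(name, value):
    """Build b'Name: value\\r\\n' from byte strings via a detour through str."""
    # Decode, format as text, then encode again: the "dance".
    line = "%s: %s\r\n" % (name.decode("latin-1"), value.decode("latin-1"))
    return line.encode("latin-1")


header = format_header(b"Content-Type", b"text/plain")
```

Plain concatenation such as `name + b": " + value + b"\r\n"` avoids the detour for simple cases; the pain described above shows up once real templating or stdlib helpers that only accept `str` are involved.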
Re: [Web-SIG] WSGI for Python 3
On 17 July 2010 22:30, chris.d...@gmail.com wrote: On Fri, 16 Jul 2010, P.J. Eby wrote: At 02:28 PM 7/16/2010 -0500, Ian Bicking wrote: There should be one, and preferably *only* one, obvious way to do it. And given that HTTP is inherently a bunch of bytes, bytes is the one obvious way. I think this makes sense. The thing which is assembling the WSGI environment should do bytes and things further down the stack can deal with it as they like. This aligns well with how I like to think about such stuff: bytes on the outside, unicode on the inside. Given that app and frameworks developers can throw whatever keys they like back into the environment, they can cope as they like.[1] What would be horrible is if there need to be multiple coping strategies. Better to be able to say, Oh it doesn't work? Try this way to cope: remember it is bytes. However, unless I'm misreading the thread, the bytes issue isn't really the bone of contention. Actually it still is. There are still two competing camps. Some want text, some want bytes. The whole discussion started purely around basis of progressing the text based proposal. As usual, those wanting bytes step up and we get two interwoven discussions which if you don't know the history can be hard to follow. My personal opinion is that if you are going to go bytes everywhere, then you may as well throw out the complete WSGI specification as it stands now and fix all the other problems with the specification. This is on the basis that if people are going to have to rewrite their code a fair bit to handle bytes everywhere, you may as well structurally change the WSGI interface API as well to address other problems. Anyway, it seems to be moot at this point as some believe that bytes everywhere with Python language as it stands, plus state of stdlib would make use of bytes everywhere rather unmanageable, which is where ebytes comes in. 
Thus bytes everywhere doesn't sound like a short-term solution, and it requires changes in Python itself to make it viable. Graham

People seem okay with bytes as long as specific points of pain are addressed, such as:
* What's my PATH_INFO and SCRIPT_NAME?
* This server, which hosts, but is not, the WSGI environment builder doesn't play well with this model.
* Some others I can't remember now.
It seems then that perhaps a way forward is to say: "Okay, it's gonna be bytes. Now, given that, how do we deal with these other issues?", which perhaps can be recast and encapsulated to be considered orthogonal to the bytes/not-bytes debate. Because we _know_ that any choice is going to come with costs, but as things have dragged on, the lack of a choice thus far is starting to have as much of a cost as the costs that are wanting to be resolved. [1] I'm not expecting or hoping for porting/migrating to Python 3 to be simple/automatic/easy, but perhaps I'm cruel. -- Chris Dent http://burningchrome.com/~cdent/ [...]
Re: [Web-SIG] WSGI for Python 3
On Fri, 2010-07-16 at 23:38 -0500, Ian Bicking wrote: On Fri, Jul 16, 2010 at 9:43 PM, Chris McDonough chr...@plope.com wrote: Nah, not nearly that hard:

    path_info = urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')

I don't see the problem? If you want to distinguish %2f from /, then you'll do it slightly differently, like:

    path_parts = [
        urllib.parse.unquote_to_bytes(p).decode('UTF-8')
        for p in environ['wsgi.raw_path_info'].split('/')]

This second recipe is impossible to do currently with WSGI. So... before jumping to conclusions, what's the hard part with using text? It's extremely hard to swallow Python 3's current disregard for the primacy of bytes at I/O boundaries. I'm trying, but I can't help but feel that the existence of an API like unquote_to_bytes is more symptom treatment than solution. Of course something that unquotes a URL segment unquotes it into bytes; it's the only sane default because URL segments found in URLs on the internet are bytes. Yes, URL-quoted strings should decode to bytes, though arguably it is reasonable to also use the very reasonable UTF-8 default that urllib.parse.quote/unquote uses. So it's really just a question of naming: whether it should be quote_to_string or quote_to_bytes. Which honestly... whatever. After some careful consideration, I realize I'm only able to offer stop energy regarding the WSGI-as-text proposal, so I'll bow out of any maillist conversation about it for now. - C
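The two recipes quoted in this message can be tried out as a minimal sketch. It assumes a hand-built environ and uses the key name `wsgi.raw_path_info` exactly as proposed in the thread; that key is a proposal here, not part of any published WSGI specification, and the sample path is invented for illustration.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical environ: 'wsgi.raw_path_info' is only a proposed key name,
# and this sample value is made up for the example.
environ = {'wsgi.raw_path_info': '/caf%C3%A9/a%2Fb'}
raw = environ['wsgi.raw_path_info']

# Recipe 1: unquote the whole path at once, then decode the bytes.
# An escaped slash (%2F) becomes indistinguishable from a real one.
path_info = unquote_to_bytes(raw).decode('UTF-8')

# Recipe 2: split on literal slashes first, then unquote each segment,
# so an escaped slash stays inside its segment.
path_parts = [unquote_to_bytes(p).decode('UTF-8') for p in raw.split('/')]

print(path_info)
print(path_parts)
```

The second recipe is the one that needs the still-quoted path: once a server has unquoted PATH_INFO, the difference between `%2F` and `/` is gone.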
Re: [Web-SIG] WSGI for Python 3
On 07/14/2010 06:43 AM, Ian Bicking wrote: There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO, and HTTP_COOKIE. (And of those, PATH_INFO is the only one that really matters, in that no-one really uses non-ASCII script filenames, and non-ASCII characters in Cookie/Set-Cookie are still handled so differently/brokenly across browsers that you can't rely on them at all.) * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them exclusively with encoded versions For compatibility with existing apps, how about keeping the existing SCRIPT_NAME and PATH_INFO as-is (with all their problems), and specifying that the new 'raw' versions (whatever they are called) are added only if they really are raw, not reconstructed. Then existing scripts that don't care about non-ASCII and slashes can carry on as before, and for apps that do care about them, they'll be able to be *sure* the input is correct. Or they can fall back to PATH_INFO when not present, and avoid producing these kind of URLs in response. (Or an app might have enough special knowledge to try other fallback mechanisms when the raw versions are unavailable, such as REQUEST_URI or Windows ctypes envvar hacking. But if the server/gateway has good raw paths it shouldn't bother use these.) -- And Clover mailto:a...@doxdesk.com http://www.doxdesk.com/ ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
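The fallback behaviour described here (use the raw path when the server really provides one, otherwise fall back to PATH_INFO) might look roughly like the sketch below. The key name `wsgi.raw_path_info` and the helper `path_segments` are assumptions made for illustration; the message itself leaves the name of the raw key open.

```python
from urllib.parse import unquote_to_bytes

RAW_KEY = 'wsgi.raw_path_info'  # hypothetical key name, not standardised


def path_segments(environ, encoding='utf-8'):
    """Return (segments, is_exact).

    is_exact is True only when the server supplied a genuinely raw path,
    so that %2F inside a segment could be told apart from a separator.
    """
    raw = environ.get(RAW_KEY)
    if raw is not None:
        segments = [unquote_to_bytes(s).decode(encoding) for s in raw.split('/')]
        return segments, True
    # Fallback: PATH_INFO has already been unquoted by the server, so
    # escaped slashes can no longer be recovered.
    return environ.get('PATH_INFO', '').split('/'), False


print(path_segments({RAW_KEY: '/a%2Fb/c'}))
print(path_segments({'PATH_INFO': '/a/b/c'}))
```

An application that gets `is_exact == False` can then choose to avoid generating URLs that depend on escaped slashes or non-ASCII segments, as the message suggests.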
Re: [Web-SIG] WSGI for Python 3
On Friday, July 16, 2010, And Clover and...@doxdesk.com wrote: On 07/14/2010 06:43 AM, Ian Bicking wrote: There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO, and HTTP_COOKIE. (And of those, PATH_INFO is the only one that really matters, in that no-one really uses non-ASCII script filenames, and non-ASCII characters in Cookie/Set-Cookie are still handled so differently/brokenly across browsers that you can't rely on them at all.) * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them exclusively with encoded versions For compatibility with existing apps, how about keeping the existing SCRIPT_NAME and PATH_INFO as-is (with all their problems), and specifying that the new 'raw' versions (whatever they are called) are added only if they really are raw, not reconstructed. Then existing scripts that don't care about non-ASCII and slashes can carry on as before, and for apps that do care about them, they'll be able to be *sure* the input is correct. Or they can fall back to PATH_INFO when not present, and avoid producing these kind of URLs in response. (Or an app might have enough special knowledge to try other fallback mechanisms when the raw versions are unavailable, such as REQUEST_URI or Windows ctypes envvar hacking. But if the server/gateway has good raw paths it shouldn't bother use these.) Which is exactly what I have suggested in the past. If you do that, one has to ask the question, given it is more convention than anything, why it isn't just a x-wsgiorg extension specification like routing args is rather than a core part of the WSGI specification. Servers could still implement the extension as they are able to and don't have to worry about changing core specification then and what we have now stands. 
Graham
Re: [Web-SIG] WSGI for Python 3
On 07/16/2010 12:07 PM, Graham Dumpleton wrote: If you do that, one has to ask the question, given it is more convention than anything, why it isn't just a x-wsgiorg extension specification Yes, fine by me either way. I just want to be able to say this application can use Unicode paths when run on a server/gateway that supports standardised feature X, rather than the current mess of you can have Unicode paths if you use one of the dozen different server-and-platform combinations we've specifically coded workarounds for. -- And Clover mailto:a...@doxdesk.com http://www.doxdesk.com/ ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] WSGI for Python 3
At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote: And this doesn't help with Python 3: either we have byte values of SCRIPT_NAME and PATH_INFO in Python 3, or we have text values. I think bytes will be more awkward to port to than text, and inconsistent with other WSGI values. OTOH, it has the tremendous advantage of pushing the encoding question onto the app (or framework) developer... who's really the only one who can make the right decision for their particular application. And personally, I'd rather have clear boundaries between text and bytes, such that porting (even if tedious or awkward) is *consistent*, and clear as to when you're finished, not, oh, did I check to make sure I converted SCRIPT_NAME and PATH_INFO... not just in my app code, but in all the library code I call *from* my app? IOW, the bytes/string discussion on Python-dev has kind of led me to realize that we might just as well make the *entire* stack bytes (incoming and outgoing headers *and* streams), and rewrite that bit in PEP 333 about using str on Python 3000 to say we go with bytes on Python 3+ for everything that's a str in today's WSGI. Or, to put it another way, if I knew then what I know *now*, I think I'd have written the PEP the other way around, such that the use of 'str' in WSGI would be a substitute for the future 'bytes' type, rather than viewing some byte strings as a forward-compatible substitute for Py3K unicode strings. Of course, this would be a WSGI 2 change, but IMO we're better off making a clean break with backward compatibility here anyway, rather than having conditionals. Also, going with bytes everywhere means we don't have to rename SCRIPT_NAME and PATH_INFO, which in turn avoids deeper rewrites being required in today's apps. (Hm. Although actually, I suppose we *could* just borrow the time machine and pretend that WSGI called for byte-strings everywhere all along...) 
Re: [Web-SIG] WSGI for Python 3
On Friday, July 16, 2010, Ian Bicking wrote: We could make everything bytes and be done with it, but it would make it much harder to port Python 2 WSGI code to Python 3. I think this might be best having seen all of the discussion. One could easily write a compatibility middleware that makes porting Python 2 applications easy or even completely transparent (from a WSGI spec point of view). Regards, Stephan -- Entrepreneur and Software Geek Google me. Zope Stephan Richter ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] WSGI for Python 3
Hello, Ian said: Having two ways of expressing the same information will lead to bugs related to which data is canonical. If an application is using SCRIPT_NAME/PATH_INFO and then updates those values in any way, and wsgi.raw_script_name/wsgi.raw_path_info are present, then there will be weird bugs and code will disagree about which one is correct. Since %2f can exist in the raw versions, there isn't even a way to chunk the two variables in the same way. I can't agree more. I would propose the following, and excuse me in advance if this has already been proposed and discarded -- I've tried to follow this topic on the mailing list over the past few months, until it becomes an endless discussion. I think only the raw values should be available. Even if a middleware changes them, it must put them with raw values. And because you cannot change those values without knowing what encoding the request uses, the character encoding *must* be present. I know that sounds easy but it's not, because browsers don't specify the charset in the Content-Type and instead they generate a new request using the charset from the previous response. So the charset is unknown to the server/gateway and the middleware stack. So, what we could do is introduce a mandatory variable called, say, wsgi.charset, and would be used as follows: - It MUST be set by the server or gateway on every request. - Every middleware or application that reads or writes these values MUST use the charset specified in wsgi.charset. - If a server, gateway, middleware or application wants to change the charset and it is possible*, it MUST convert the *entire* request into that charset and update wsgi.charset accordingly. - When the charset is not specified in the HTTP request, UTF-8 MUST be assumed by the server/gateway. Unless another default charset has been specified by the user. I think/hope that will solve all the problems. 
What happens when a WSGI application is actually made up of two WSGI applications and they send their responses in different charsets? If it's not possible to configure them so that they both use the same charset, then one of them would have to be wrapped by a middleware which:
- On egress, converts the responses using the charset used by the other application.
- On ingress, if the charset is not specified in the request, assumes it's the one used by the other application, and thus converts the request using the charset supported by the wrapped application.
It would look like this:
===
def application(environ, start_response):
    if environ['PATH_INFO'].startswith('/trac/'):
        # Say Trac only supports Latin-1 and we want responses to use UTF-8:
        app = trac.web.main.dispatch_request
        app = CharsetNormalizer(app, response='latin-1', request='utf8')
    else:
        # myapp uses UTF-8
        app = myapp
    return app(environ, start_response)
===
Then there's the string vs bytes issue. Bytes would be the natural choice to represent these raw values, but it would probably cause more trouble than it solves. So, I think they should be strings that contain the ASCII raw encoded values (i.e., str on both versions of Python). What do you think about this? Again, sorry if this has been discarded before! :) * For example, you can always convert Latin-1 to UTF-8, but not every UTF-8 string can be converted to Latin-1. -- Gustavo Narea xri://=Gustavo. | Tech blog: =Gustavo/(+blog)/tech ~ About me: =Gustavo/about |
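The message names a `CharsetNormalizer` middleware but does not show one. The following is only a rough sketch of the egress half of that idea, under stated assumptions: the constructor parameters `source` and `target` are invented here (they are not the `response`/`request` arguments used in the message), the Content-Type charset parameter is left untouched, and the ingress conversion is not attempted at all.

```python
import codecs


class CharsetNormalizer:
    """Illustrative sketch: re-encode response bodies from `source` to `target`.

    Not the middleware from the message; parameter names are hypothetical.
    """

    def __init__(self, app, source='latin-1', target='utf-8'):
        self.app = app
        self.source = source
        self.target = target

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # The byte length can change after transcoding, so drop any
            # Content-Length rather than send a wrong one.
            headers = [(k, v) for k, v in headers
                       if k.lower() != 'content-length']
            return start_response(status, headers, exc_info)

        # An incremental decoder copes with multi-byte sequences that are
        # split across chunk boundaries.
        decoder = codecs.getincrementaldecoder(self.source)()
        result = self.app(environ, patched_start_response)
        try:
            for chunk in result:
                text = decoder.decode(chunk)
                if text:
                    yield text.encode(self.target)
            tail = decoder.decode(b'', final=True)
            if tail:
                yield tail.encode(self.target)
        finally:
            close = getattr(result, 'close', None)
            if close is not None:
                close()


def latin1_app(environ, start_response):
    # Stand-in for an application that only speaks Latin-1.
    start_response('200 OK', [('Content-Type', 'text/plain; charset=latin-1'),
                              ('Content-Length', '4')])
    return [b'caf\xe9']


seen = {}


def fake_start_response(status, headers, exc_info=None):
    seen['headers'] = headers


body = b''.join(CharsetNormalizer(latin1_app)({}, fake_start_response))
print(body)
```

A fuller version would also have to rewrite the charset parameter of Content-Type, which is exactly the kind of bookkeeping that the proposed wsgi.charset variable is meant to make explicit.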
Re: [Web-SIG] WSGI for Python 3
On Fri, Jul 16, 2010 at 1:40 PM, P.J. Eby p...@telecommunity.com wrote: At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote: And this doesn't help with Python 3: either we have byte values of SCRIPT_NAME and PATH_INFO in Python 3, or we have text values. I think bytes will be more awkward to port to than text, and inconsistent with other WSGI values. OTOH, it has the tremendous advantage of pushing the encoding question onto the app (or framework) developer... who's really the only one who can make the right decision for their particular application. And personally, I'd rather have clear boundaries between text and bytes, such that porting (even if tedious or awkward) is *consistent*, and clear as to when you're finished, not, oh, did I check to make sure I converted SCRIPT_NAME and PATH_INFO... not just in my app code, but in all the library code I call *from* my app? IOW, the bytes/string discussion on Python-dev has kind of led me to realize that we might just as well make the *entire* stack bytes (incoming and outgoing headers *and* streams), and rewrite that bit in PEP 333 about using str on Python 3000 to say we go with bytes on Python 3+ for everything that's a str in today's WSGI. This was my first intuition too, until I started thinking in more detail about the particular values involved. Some obviously are textish, like environ['SERVER_NAME']. Not a very useful value, but definitely text. Basically all the internal strings are textish, so we're left with: wsgi.url_scheme SCRIPT_NAME/PATH_INFO QUERY_STRING HTTP_*, CONTENT_TYPE, CONTENT_LENGTH (headers) response status response headers (name and value) And there's a few things like REMOTE_USER that are kind of in the middle. Everyone is in agreement that bodies should be bytes. One initial problem is that the Python 3 stdlib handles bytes poorly, so for instance there's no good way to reconstruct the URL using the stdlib. 
That explains certain tensions, but I think we should ignore that, and in fact that's what Python-Dev seemed to say pretty clearly. Now, the other keys:
* wsgi.url_scheme: clearly ASCII.
* SCRIPT_NAME/PATH_INFO: often UTF-8, could be no encoding, could be some old legacy encoding.
* raw request path: should be ASCII (non-ASCII should be URL-encoded). URL encoding happens at the byte layer, so a server could reasonably URL-encode any non-ASCII characters without imposing any encoding.
* QUERY_STRING: should be ASCII, same as raw request path.
* headers: Most are ASCII. Latin1 is a reasonable fallback and suggested by the specification. The spec also implies you have to use the RFC2047 inline encoding (like =?iso-8859-1?q?some=20text?=), but nothing supports this and supporting it would probably be a bad idea for security reasons. The Atompub spec (reasonably modern) specifically says Title headers should be encoded with RFC2047 (if they are not ISO-8859-1): http://tools.ietf.org/html/draft-ietf-atompub-protocol-08#page-17 -- decoding this kind of encoding at the application layer seems reasonable to me.
* cookie header: this specific header can easily have multiple encodings, as the browser encodes data then treats it as opaque bytes, so a cookie can be set via UTF-8 one place, Latin1 another, and those coexist in one header. That is, there is no real encoding and this should be treated as bytes. (Latin1 is an approximation of bytes... a spotty way to treat bytes, but entirely workable.)
* response status: I believe the spec says this must be Latin1/ISO-8859-1. In practice it is almost always ASCII, and since it is not user-visible it's not something that really needs localization.
* response headers: the spec implies Latin1, in practice the Set-Cookie header is bytes (since interoperation with wonky legacy systems is not uncommon). I'm not sure of any other exceptions?
So... to me it seems pretty reasonable for HTTP specifically that text can work.
And it feels weird that, say, environ['SERVER_NAME'] would be text and environ['HTTP_HOST'] not, and I don't know what environ['REMOTE_ADDR'] should be in that mode. And it would also be weird if environ['SERVER_NAME'] was bytes. In the past when we've gotten down to specifics, the only holdup has been SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those. -- Ian Bicking | http://blog.ianbicking.org
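Two of the points in this message can be checked with the standard library. The sketch below first shows why Latin-1 works as "an approximation of bytes" (every byte value survives a decode/encode round trip), and then decodes an RFC 2047 encoded word at the application layer with `email.header.decode_header`. The sample header value is the one used as an illustration in the message, not real traffic.

```python
from email.header import decode_header

# Latin-1 maps each of the 256 byte values to one code point, so bytes
# tunnelled through a Latin-1 str can always be recovered unchanged.
all_bytes = bytes(range(256))
round_tripped = all_bytes.decode('latin-1').encode('latin-1')

# Application-layer decoding of an RFC 2047 encoded word.
parts = decode_header('=?iso-8859-1?q?some=20text?=')
title = ''.join(
    chunk.decode(charset or 'ascii') if isinstance(chunk, bytes) else chunk
    for chunk, charset in parts
)

print(round_tripped == all_bytes)
print(title)
```

Whether a server or framework should ever do the RFC 2047 step automatically is a separate question; the message argues it belongs in the application.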
Re: [Web-SIG] WSGI for Python 3
On Fri, Jul 16, 2010 at 5:08 PM, Chris McDonough chr...@plope.com wrote: On Fri, 2010-07-16 at 17:47 -0400, Tres Seaver wrote: In the past when we've gotten down to specifics, the only holdup has been SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those. I think I favor PJE's suggestion: let WSGI deal only in bytes. I'd prefer that WSGI 2 was defined in terms of a bytes with benefits type (Python 2's ``str`` with an optional encoding attribute as a hint for cast to unicode str) instead of Python 3-style bytes. But if I had to make the Hobson's choice between Python 3 style bytes and Python 3 style str, I'd choose bytes. If I then needed to write middleware or applications, I'd use WebOb or an equivalent library to enable a policy which converted those bytes to strings on my behalf. Making it easy to write raw middleware or applications without using such a library doesn't seem as compelling a goal as being able to easily write one which allowed me direct control at the raw level. What are the concrete problems you envision with text request headers, text (URL-quoted) path, and text response status and headers? -- Ian Bicking | http://blog.ianbicking.org ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] WSGI for Python 3
On Fri, Jul 16, 2010 at 5:06 PM, Ian Bicking i...@colorstudy.com wrote: On Fri, Jul 16, 2010 at 4:47 PM, Tres Seaver tsea...@palladion.com wrote: Basically all the internal strings are textish, so we're left with: What do you mean by "internal"? Anything in the headers or the CGI environment is intrinsically bytes-ish to me. Do you mean that you want application programmers to have them transparently decoded? If so, we can make that the responsibility of the non-middleware framework / application. By "internal" I mean all the CGI variables that aren't representing HTTP, like SERVER_NAME. Actually I was thinking SERVER_SOFTWARE, though SERVER_NAME is somewhat similar as it doesn't come from HTTP, it comes from server configuration. -- Ian Bicking | http://blog.ianbicking.org
Re: [Web-SIG] WSGI for Python 3
On Fri, 2010-07-16 at 17:11 -0500, Ian Bicking wrote: On Fri, Jul 16, 2010 at 5:08 PM, Chris McDonough chr...@plope.com wrote: On Fri, 2010-07-16 at 17:47 -0400, Tres Seaver wrote: In the past when we've gotten down to specifics, the only holdup has been SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those. I think I favor PJE's suggestion: let WSGI deal only in bytes. I'd prefer that WSGI 2 was defined in terms of a "bytes with benefits" type (Python 2's ``str`` with an optional encoding attribute as a hint for cast to unicode str) instead of Python 3-style bytes. But if I had to make the Hobson's choice between Python 3 style bytes and Python 3 style str, I'd choose bytes. If I then needed to write middleware or applications, I'd use WebOb or an equivalent library to enable a policy which converted those bytes to strings on my behalf. Making it easy to write raw middleware or applications without using such a library doesn't seem as compelling a goal as being able to easily write one which allowed me direct control at the raw level. What are the concrete problems you envision with text request headers, text (URL-quoted) path, and text response status and headers?

Documentation is the main reason. For example, the documentation for making sense of path_info segments in a WSGI that used unicodey-strings would, as I understand it, read something like this:

  The PATH_INFO environment variable is a string. To decode it,

  - First, split it on slashes::

      segments = PATH_INFO.split('/')

  - Then turn each segment into bytes::

      bytes_segments = [ bytes(x, encoding='latin-1') for x in segments ]

  - Then, de-encode each segment's urlencoded portions::

      urldecoded_segments = [ urllib.unquote(x) for x in bytes_segments ]

  - Then re-encode each urldecoded segment into the encoding expected by your application::

      app_segments = [ str(x, encoding='utf-8') for x in urldecoded_segments ]

  .. note:: We decode from latin-1 above because WSGI tunnels the bytes representing the PATH_INFO by way of a string type which contains bytes as characters.

That looks pretty apologetic to me, and to be honest, I'm not even sure it will work reliably in the face of existing/legacy applications which have emitted URLs that are not url-encoded properly if those old URLs need to be supported. http://bugs.python.org/issue8136 contains a variation on this theme. I'd much rather be able to say:

  The PATH_INFO environment variable is a ``bytes-with-benefits`` type. To decode it:

  - First, split it on slashes::

      segments = PATH_INFO.split('/')

  - Then, de-encode each segment's urlencoded portions::

      urldecoded_segments = [ urllib.unquote(x) for x in segments ]

  - Then re-encode each urldecoded segment into the encoding expected by your application::

      app_segments = [ str(x, encoding='utf-8') for x in urldecoded_segments ]

Let me know if I'm missing something. - C
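The first, "apologetic" set of steps can be followed on current Python 3 with a sketch like the one below. It assumes a server that tunnels the raw path bytes through a str by decoding them as Latin-1, and it swaps in `urllib.parse.unquote_to_bytes` for the `urllib.unquote` call written in the message, since that is the standard-library function that accepts and returns bytes. The sample path is made up.

```python
from urllib.parse import unquote_to_bytes

# Simulate the server side: bytes from the wire, tunnelled as a Latin-1 str.
# One segment is percent-encoded, the other carries raw UTF-8 bytes.
wire_bytes = b'/caf%C3%A9/\xc3\xbcber'
PATH_INFO = wire_bytes.decode('latin-1')

# Step 1: split on slashes.
segments = PATH_INFO.split('/')

# Step 2: recover the original bytes of each segment.
bytes_segments = [s.encode('latin-1') for s in segments]

# Step 3: undo percent-encoding at the byte level.
urldecoded_segments = [unquote_to_bytes(b) for b in bytes_segments]

# Step 4: decode with the encoding the application expects.
app_segments = [b.decode('utf-8') for b in urldecoded_segments]

print(app_segments)
```

Step 4 is where a badly encoded legacy URL would raise UnicodeDecodeError, which is the reliability concern raised in the message.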
Re: [Web-SIG] WSGI for Python 3
At 02:28 PM 7/16/2010 -0500, Ian Bicking wrote: On Fri, Jul 16, 2010 at 1:40 PM, P.J. Eby p...@telecommunity.com wrote: At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote: And this doesn't help with Python 3: either we have byte values of SCRIPT_NAME and PATH_INFO in Python 3, or we have text values. I think bytes will be more awkward to port to than text, and inconsistent with other WSGI values. OTOH, it has the tremendous advantage of pushing the encoding question onto the app (or framework) developer... who's really the only one who can make the right decision for their particular application. And personally, I'd rather have clear boundaries between text and bytes, such that porting (even if tedious or awkward) is *consistent*, and clear as to when you're finished, not, "oh, did I check to make sure I converted SCRIPT_NAME and PATH_INFO... not just in my app code, but in all the library code I call *from* my app?" IOW, the bytes/string discussion on Python-dev has kind of led me to realize that we might just as well make the *entire* stack bytes (incoming and outgoing headers *and* streams), and rewrite that bit in PEP 333 about using str on Python 3000 to say we go with bytes on Python 3+ for everything that's a str in today's WSGI. This was my first intuition too, until I started thinking in more detail about the particular values involved. Some obviously are textish, like environ['SERVER_NAME']. Not a very useful value, but definitely text. Basically all the internal strings are textish, so we're left with: wsgi.url_scheme SCRIPT_NAME/PATH_INFO QUERY_STRING HTTP_*, CONTENT_TYPE, CONTENT_LENGTH (headers) response status response headers (name and value) What I'm getting at, though, is it's precisely this sort of "hm, which ones are bytes again?" stuff that makes you have to stop and *think*, i.e., it doesn't Fit My Brain(tm) any more. ;-) There should be one, and preferably *only* one, obvious way to do it.
And given that HTTP is inherently a bunch of bytes, bytes is the one obvious way. I previously was under the impression that bytes wouldn't interoperate with strings in 3.x, but they *do*, in much the same way as they did in 2.x. That means you'll be (mostly) bug-compatible in 3.x, only you'll likely encounter encoding issues *sooner*, rather than later. (i.e., the minute you combine non-ASCII inputs with your regular string constants). Yes, you will also be forced to convert your return values to bytes, but if you've used string constants *anywhere*, then you know you'll be outputting text, which you should already have been encoding for output. (So you'll just be forced to deal with errors on that side sooner as well.) All in all, I'd say this also fits with what people on Python-Dev keep hammering on as the One Obvious Way to deal with bytes and strings in a program: i.e., bytes for I/O, text for text processing. WSGI is HTTP, and HTTP is I/O, ergo, WSGI is I/O, and we should therefore byte the bullet here. ;-) ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] WSGI for Python 3
At 05:42 PM 7/16/2010 -0400, Tres Seaver wrote: P.J. Eby wrote: (Hm. Although actually, I suppose we *could* just borrow the time machine and pretend that WSGI called for byte-strings everywhere all along...) I like the idea of pushing responsibility for decoding stuff into the framework / app writer's hands. OTOH, doesn't that hose authors of existing middleware, due to the borkedness of working with bytes in Python3? It only creates a new problem if they are currently not using *any* unicode in 2.x, and are passing through bytes from the input to the output without any encoding or decoding. AFAICT, if any part of their app is currently unicode, they would have the same problems in 2.x. (Minus, of course, any problems introduced by missing bytes methods in 3.x, or the fact that single-subscripted bytes are ints rather than bytestrings.) Anyway, the problems introduced will be problems that can be solved by waving a fairly standard set of dead chickens at the problem, i.e. picking where you're going to encode/decode, and deciding what encoding(s) are meaningful to your app. And frameworks that already have a unicode API are ahead of the game here. So, AFAICT, the only people who'd be punished by a change to bytes are the people who have non-ASCII inputs or outputs, but haven't been using unicode (because 2to3 will convert them to using strings instead of bytes). From what I can tell, though, this is also the group it's most politically correct to hate on in Python-Dev, so we should be relatively safe in shifting the burden to them. ;-) ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
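The parenthetical about single-subscripted bytes is easy to trip over when porting, so here is a small sketch of what it means in practice. The sample request line is invented. The last part relies on bytes %-formatting, which was not available when this thread was written; it was added later, in Python 3.5 (PEP 461).

```python
request_line = b'GET /index HTTP/1.1'

# Indexing a bytes object gives an int, not a length-1 bytes object.
first = request_line[0]

# Slicing keeps the bytes type, so it is the usual porting fix.
first_as_bytes = request_line[0:1]

# Iterating over bytes also yields ints.
codes = list(b'ab')

# Since Python 3.5, bytes support %-formatting again.
header = b'%s: %s' % (b'Host', b'example.org')

print(first, first_as_bytes, codes, header)
```

Code that compared `value[0] == '/'` under Python 2 therefore needs `value[0:1] == b'/'` or `value.startswith(b'/')` when the value is bytes.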
Re: [Web-SIG] WSGI for Python 3
Hi, On 7/17/10 1:20 AM, Chris McDonough wrote: Let me know if I'm missing something. The only thing you miss is that the bytes type of Python 3 is badly supported in the stdlib (not an issue if we reimplement everything in our libraries, not an issue for me) and that the bytes type has no string formatting, which makes us do the encode/decode dance in our own implementations of some of the missing stdlib functions. So I am pretty sure we can't totally bypass the encoding/decoding. We might however require fewer encodes/decodes if we leave bytes on the WSGI layer. Regards, Armin
Re: [Web-SIG] WSGI for Python 3
P.J. Eby wrote: At 07:20 PM 7/16/2010 -0400, Chris McDonough wrote: I'd much rather be able to say: The PATH_INFO environment variable is a ``bytes-with-benefits`` type. To decode it: - First, split it on slashes:: segments = PATH_INFO.split('/') - Then, de-encode each segment's urlencoded portions: urldecoded_segments = [ urllib.unquote(x) for x in segments ] - Then re-encode each urldecoded segment into the encoding expected by your application app_segments = [ str(x, encoding='utf-8') for x in urldecoded_segments ] +1. I do wish we actually *had* a bytes-with-benefits type (as I proposed on Python-Dev), but I don't think we can really get one until the language moratorium is over. Plain old bytes are the next best thing. We might be able to write one which would work in reduced-instruction-set mode, and have the server wrap the environ values in it. Some operations might not be natural, and we might have to implement some wrappers around stdlib stuff, but maybe it would be worthwhile to try a spike on it. Tres. -- Tres Seaver +1 540-429-0999 tsea...@palladion.com Palladion Software "Excellence by Design" http://palladion.com
Re: [Web-SIG] WSGI for Python 3
On Sat, 2010-07-17 at 01:33 +0200, Armin Ronacher wrote:
> Hi,
>
> On 7/17/10 1:20 AM, Chris McDonough wrote:
>> Let me know if I'm missing something.
>
> The only thing you miss is that the bytes type of Python 3 is badly supported in the stdlib (not an issue if we reimplement everything in our libraries, not an issue for me) and that the bytes type has no string formatting, which makes us do the encode/decode dance in our own implementations of some of the missing stdlib functions.

This is why the docs mention "bytes with benefits" instead (like the Python 2 str type). The existence of such a type would be the result of us lobbying for its inclusion into some future Python 3, or at least the result of lobbying for a String ABC that would allow us to define our own.

But.. yeah. Stdlib support for bytes. Dunno. What I really don't want to do is implement a WSGI spec in terms of Unicodey strings just because the webby stuff in the stdlib cannot deal with bytes. Those stdlib implementations should be changed to deal with bytes-ish things instead.

I actually think fixing the stdlib will end up being a driver for the bytes with benefits type. Supporting such a type in the implementation of stdlib functions is clearly the right way to fix it in lots of cases, because they will be able to deal with BwB and Unicodey-strings in exactly the same way.

In the meantime, I think using bytes is the only sane thing to do in some interim specification, because moving from a spec which is bytes-oriented to a spec that is text-oriented now will leave us in the embarrassing position of needing to create yet another bytes-oriented spec later (as, well, I/O is bytes), when Python 3 matures and realizes it needs such a hybrid type.

- C
Re: [Web-SIG] WSGI for Python 3
On Fri, Jul 16, 2010 at 8:46 PM, Ian Bicking i...@colorstudy.com wrote:
> So... before jumping to conclusions, what's the hard part with using text?

Oh, the one thing that will be silly is cookies, but they are totally nuts already. They can be parsed equally well as bytes or latin1, and best only transcoded after parsing. Doing cookie_value.decode(app_encoding) or cookie_value.encode('ISO-8859-1').decode(app_encoding) isn't terribly different.

And cookies aren't fair because they are just stupid; like the standard library I don't think we should design anything around their idiosyncrasies.

--
Ian Bicking | http://blog.ianbicking.org
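Editorial sketch of the two cookie conversions compared above, assuming the cookie has already been parsed and that the application encoding is UTF-8 (both assumptions; the function names are hypothetical). The first applies when the server hands over bytes, the second when the server hands over text that was decoded as Latin-1.

```python
def cookie_value_from_bytes(raw: bytes, app_encoding: str = "utf-8") -> str:
    """Bytes environ: decode the parsed value directly."""
    return raw.decode(app_encoding)


def cookie_value_from_latin1_text(text: str, app_encoding: str = "utf-8") -> str:
    """Latin-1 text environ: recover the original bytes, then decode them."""
    # Encoding as ISO-8859-1 reverses the server's byte-to-code-point mapping.
    return text.encode("iso-8859-1").decode(app_encoding)
```

Either way the application ends up doing exactly one real decode; the Latin-1 variant only adds a lossless step in front of it.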
Re: [Web-SIG] WSGI for Python 3
On Saturday, July 17, 2010, Ian Bicking i...@colorstudy.com wrote:
> On Fri, Jul 16, 2010 at 6:20 PM, Chris McDonough chr...@plope.com wrote:
>>> What are the concrete problems you envision with text request headers, text (URL-quoted) path, and text response status and headers?
>>
>> Documentation is the main reason. For example, the documentation for making sense of path_info segments in a WSGI that used unicodey-strings would, as I understand it, read something like this:
>
> Nah, not nearly that hard:
>
>     path_info = urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')
>
> I don't see the problem? If you want to distinguish %2f from /, then you'll do it slightly differently, like:
>
>     path_parts = [
>         urllib.parse.unquote_to_bytes(p).decode('UTF-8')
>         for p in environ['wsgi.raw_path_info'].split('/')]
>
> This second recipe is impossible to do currently with WSGI.
>
> So... before jumping to conclusions, what's the hard part with using text?

Sorry, it is not that simple. The thing that everyone is ignoring is that SCRIPT_NAME and PATH_INFO are also normally normalized by the web server. That is, ".." instances are removed.

By passing the raw URL through to the application, you are now forcing every application to deal with that as well, with the possibility of directory traversal attacks when people get it wrong and the URL somehow maps to file system resources. It is a huge can of worms which at the moment the web server deals with.

I have other issues with the raw stuff, but haven't got to read the last dozen messages in this discussion as yet, so will leave those points to another time.

Graham
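Editorial sketch making the two quoted recipes runnable side by side. The key 'wsgi.raw_path_info' is the name proposed in this thread, not a key defined by any published WSGI specification, and the function names are hypothetical.

```python
from urllib.parse import unquote_to_bytes


def whole_path(environ: dict) -> str:
    """First recipe: unquote the entire raw path, then decode it."""
    return unquote_to_bytes(environ["wsgi.raw_path_info"]).decode("utf-8")


def path_parts(environ: dict) -> list:
    """Second recipe: split first, so an encoded slash stays in its segment."""
    return [
        unquote_to_bytes(part).decode("utf-8")
        for part in environ["wsgi.raw_path_info"].split("/")
    ]
```

With a raw path such as "/files/a%2Fb", the first recipe yields "/files/a/b" and can no longer tell the encoded slash from a real one, while the second keeps "a/b" as a single segment. Neither recipe removes ".." segments, which is the concern raised in the reply.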
Re: [Web-SIG] WSGI for Python 3
On Fri, Jul 16, 2010 at 11:28 PM, Graham Dumpleton graham.dumple...@gmail.com wrote:
>> Nah, not nearly that hard:
>>
>>     path_info = urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')
>>
>> I don't see the problem? If you want to distinguish %2f from /, then you'll do it slightly differently, like:
>>
>>     path_parts = [
>>         urllib.parse.unquote_to_bytes(p).decode('UTF-8')
>>         for p in environ['wsgi.raw_path_info'].split('/')]
>>
>> This second recipe is impossible to do currently with WSGI.
>>
>> So... before jumping to conclusions, what's the hard part with using text?
>
> Sorry, it is not that simple. The thing that everyone is ignoring is that SCRIPT_NAME and PATH_INFO are also normally normalized by the web server. That is, ".." instances are removed.
>
> By passing the raw URL through to the application, you are now forcing every application to deal with that as well, with the possibility of directory traversal attacks when people get it wrong and the URL somehow maps to file system resources. It is a huge can of worms which at the moment the web server deals with.

Well... at least to me "raw" only means not URL decoded, so it doesn't necessarily mean you can't clean up the request path. I guess an attacker could encode "." to make things harder.

Nevertheless, WSGI servers don't currently guarantee this cleaning. I added it to paste.httpserver, but I don't know one way or the other about any other servers. A quick test shows wsgiref does not clean paths. So apps shouldn't rely on a clean path.

--
Ian Bicking | http://blog.ianbicking.org
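Editorial sketch of one way an application could clean a still-quoted path itself, including the case where an attacker encodes "." as %2e. This is an illustration under stated assumptions, not what paste.httpserver or any other server does: the function name is hypothetical, empty segments are dropped (so repeated slashes collapse), and decoded segments containing "/" are rejected because this sketch targets file system mapping.

```python
from urllib.parse import unquote_to_bytes


def safe_fs_segments(raw_path: str) -> list:
    """Decode a raw (still URL-quoted) path into cleaned segments."""
    cleaned = []
    for part in raw_path.split("/"):
        # Decode before comparing, so "%2e%2e" is recognised as "..".
        segment = unquote_to_bytes(part).decode("utf-8", "surrogateescape")
        if segment in ("", "."):
            continue  # skip empty and current-directory segments
        if segment == "..":
            if cleaned:
                cleaned.pop()  # step up, but never above the root
            continue
        if "/" in segment:
            # An encoded slash could smuggle "../" into a single segment.
            raise ValueError("encoded slash in path segment")
        cleaned.append(segment)
    return cleaned
```

The important ordering is decode first, compare second; comparing the quoted text against ".." would miss the encoded form.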
Re: [Web-SIG] WSGI for Python 3
On Saturday, July 17, 2010, Ian Bicking i...@colorstudy.com wrote:
> On Fri, Jul 16, 2010 at 4:33 AM, And Clover and...@doxdesk.com wrote:
>> On 07/14/2010 06:43 AM, Ian Bicking wrote:
>>> There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO, and HTTP_COOKIE.
>>
>> (And of those, PATH_INFO is the only one that really matters, in that no-one really uses non-ASCII script filenames, and non-ASCII characters in Cookie/Set-Cookie are still handled so differently/brokenly across browsers that you can't rely on them at all.)
>>
>>> * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them exclusively with encoded versions
>>
>> For compatibility with existing apps, how about keeping the existing SCRIPT_NAME and PATH_INFO as-is (with all their problems), and specifying that the new 'raw' versions (whatever they are called) are added only if they really are raw, not reconstructed.
>
> Having two ways of expressing the same information will lead to bugs related to which data is canonical. If an application is using SCRIPT_NAME/PATH_INFO and then updates those values in any way, and wsgi.raw_script_name/wsgi.raw_path_info are present, then there will be weird bugs and code will disagree about which one is correct. Since %2f can exist in the raw versions, there isn't even a way to chunk the two variables in the same way.
>
>> Then existing scripts that don't care about non-ASCII and slashes can carry on as before, and for apps that do care about them, they'll be able to be *sure* the input is correct. Or they can fall back to PATH_INFO when not present, and avoid producing these kinds of URLs in response.
>
> I don't think it works to imagine you can just not care about non-ASCII. Requests come in. WSGI should represent those requests. If a request comes in with non-ASCII bytes then WSGI needs to do *something* with it. I don't want to have to configure servers with application policy; servers should just work.
>
> And this doesn't help with Python 3: either we have byte values of SCRIPT_NAME and PATH_INFO in Python 3, or we have text values. I think bytes will be more awkward to port to than text, and inconsistent with other WSGI values. If we have text then we have to choose an encoding. Latin1 will work, but it will be the exact wrong encoding most of the time as UTF-8 is the typical (unlike other headers, where Latin1 will mostly be an okay encoding, or as good a guess as we have). If we firmly remove these keys then we can avoid this choice entirely... and we conveniently also get a better representation of the request.

One reason I don't want to see the existing keys removed is for debugging purposes. In Apache, various Apache modules such as mod_rewrite will operate on that translated path. I am concerned that if only the raw one is available in the WSGI application then confusion may arise where something doesn't go right with rewrites, because the only information that an application can dump for debugging will be different to what other Apache modules operate on.

If you aren't going to make use of the CGI versions, then I would still like to see them present but perhaps renamed. That way you don't have a loss of information when it comes to trying to debug stuff. I could perhaps just put this in an Apache/mod_wsgi specific key as well, given that the issue is particular to it. Thus we might have apache.path_info or cgi.path_info.

Graham

> Note that libraries can smooth over this change; WebOb for instance will certainly still support req.script_name/req.path_info by decoding the raw values. Admittedly lots of code use these values directly... but at least if they get a KeyError the port/fix will be obvious (as opposed to out of sync values, which will only emerge as a problem occasionally -- I'd rather not invite more occasional bugs).
>
> --
> Ian Bicking | http://blog.ianbicking.org
Re: [Web-SIG] WSGI for Python 3
On Saturday, July 17, 2010, Ian Bicking i...@colorstudy.com wrote:
> On Fri, Jul 16, 2010 at 12:28 PM, Chris McDonough chr...@plope.com wrote:
>> On Fri, 2010-07-16 at 11:07 -0500, Ian Bicking wrote:
>>> And this doesn't help with Python 3: either we have byte values of SCRIPT_NAME and PATH_INFO in Python 3, or we have text values. I think bytes will be more awkward to port to than text, and inconsistent with other WSGI values. If we have text then we have to choose an encoding. Latin1 will work, but it will be the exact wrong encoding most of the time as UTF-8 is the typical (unlike other headers, where Latin1 will mostly be an okay encoding, or as good a guess as we have). If we firmly remove these keys then we can avoid this choice entirely... and we conveniently also get a better representation of the request.
>>
>> My $.02: I'd rather lobby the core folks for a string ABC (which we can hook with a stringlike bytes type) and consider all 3.X releases made so far dead to WSGI than to have to tunnel arbitrary bytes through some misleading Unicode encoding.
>
> While I think it would be generally useful, it's also a long way off at best, with serious performance dangers that could torpedo the whole thing. But... I'm also unsure how it would help here, except perhaps we could incrementally annotate bytes with an encoding? Well, I don't really know.
>
> Treating the raw request path as text is easy enough, as it should always be ASCII anyway. We don't have to worry what is right or wrong in this case. We could make everything bytes and be done with it, but it would make it much harder to port Python 2 WSGI code to Python

FWIW, I see the whole ebytes discussion only relevant were you to make absolutely everything bytes. We don't really need it otherwise.

Graham
Re: [Web-SIG] WSGI for Python 3
On Wed, Jul 14, 2010 at 12:19 AM, Graham Dumpleton graham.dumple...@gmail.com wrote:
>> * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them exclusively with encoded versions (that represent the original request URI). We use Latin1 encoding, but it should be ASCII anyway, like most of the headers.
>
> BTW, it should be highlighted whether this change is relevant to Python 3 or, like some of the other things you relegated as out of scope, purely a wish list item.

Certainly; most headers or metadata are pretty much constrained to ASCII, and any use of non-ASCII is... at least peculiar, and presumably application-specific. For instance, there's no reason you'd have anything but ASCII in Cache-Control. The one place encoded information happens regularly in headers (that I know of) is Cookie.

The request URI path is generally ASCII, but SCRIPT_NAME and PATH_INFO *aren't* the request URI path, they are URL decoded versions of the request URI path. And they are usually encoded in UTF8... but not every byte sequence is valid UTF8, so decoding them is problematic (though we could define that they must be decoded with surrogateescape). And while they are usually UTF8, they are sometimes no valid encoding at all, because anyone can assemble any set of characters they want and web browsers will accept it.

By avoiding URL-unquoting of these values, we can also stick to Latin1 and get something reasonable. It's not very attractive to me that we take something that is probably *not* Latin1, and may reasonably not be ASCII, and decode it as Latin1.

--
Ian Bicking | http://blog.ianbicking.org
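Editorial sketch contrasting the two decoding choices discussed above for URL-decoded path bytes: Latin-1 and UTF-8 with the surrogateescape error handler. The function names are hypothetical. Both round-trip arbitrary bytes without loss; they differ in how readable the intermediate text is.

```python
def latin1_text(raw: bytes) -> str:
    """Every byte becomes one code point; never fails, but UTF-8 data reads as mojibake."""
    return raw.decode("latin-1")


def utf8_escaped_text(raw: bytes) -> str:
    """Valid UTF-8 decodes normally; stray bytes become lone surrogates."""
    return raw.decode("utf-8", "surrogateescape")


def latin1_bytes(text: str) -> bytes:
    """Inverse of latin1_text."""
    return text.encode("latin-1")


def utf8_escaped_bytes(text: str) -> bytes:
    """Inverse of utf8_escaped_text."""
    return text.encode("utf-8", "surrogateescape")
```

For a path whose bytes are the UTF-8 encoding of "/café", latin1_text gives "/cafÃ©" while utf8_escaped_text gives "/café"; a byte such as 0xFF, which is not valid UTF-8, survives both round trips but would raise UnicodeDecodeError under a strict UTF-8 decode.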
Re: [Web-SIG] WSGI for Python 3
On 14 July 2010 15:04, Graham Dumpleton graham.dumple...@gmail.com wrote:
> On 14 July 2010 14:43, Ian Bicking i...@colorstudy.com wrote:
>> So... there's been some discussion of WSGI on Python 3 lately. I'm not feeling as pessimistic as some people, I feel like we were close but just didn't *quite* get there.
>
> What I took from the discussion wasn't that one couldn't specify a WSGI interface, and as you say we more or less have one now, the issue is more about how practical that is from a usability perspective for those who have to code stuff on top.
>
> The concern seems to be that although it may be easy to work with the specification for those who at the lowest layer immediately wrap it in a higher level abstraction that normalises stuff into something that is then used consistently in that way, for those who use lower level raw WSGI right through the stack, especially in the context of stackable WSGI middleware, that repetitive task of having to deal with the byte/unicode issues at every point is just a big PITA.
>
> That said, my job in writing the WSGI adapter is really easy as I don't have to worry about these issues. This is why I don't seem to really appreciate the concerns people are expressing. The above is how I read things though.
>
>> Here's my thoughts:
>>
>> * Everyone agrees keys in the environ should be native strings
>> * Bodies should stay bytes
>> * Can we make all standard values that are str on Python 2, str on Python 3 with a Latin1 encoding? This is basically what wsgiref did. This means HTTP_*, SERVER_NAME, etc. Everything CGIish, and everything with an all-caps key. There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO, and HTTP_COOKIE.
>> * I propose we let libraries handle HTTP_COOKIE however they want; don't bother transcoding *into* the environ, just do so when you parse the cookie (if you so choose). Happy developers will just urlencode all their cookie values to keep their cookies ASCII-clean. Unhappy developers who have to handle legacy cookies will just run environ['HTTP_COOKIE'].decode('latin1') and then do whatever sad magic they are forced to do.
>> * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them exclusively with encoded versions (that represent the original request URI). We use Latin1 encoding, but it should be ASCII anyway, like most of the headers.

BTW, it should be highlighted whether this change is relevant to Python 3 or, like some of the other things you relegated as out of scope, purely a wish list item.

Graham

>> * I'm terrible at naming, but let's say these new values are RAW_SCRIPT_NAME and RAW_PATH_INFO.
>
> My prior suggestion on that, since upper case keys for now effectively derive from CGI, was to make them wsgi.script_name and wsgi.path_info. Ie., push them into the wsgi namespace.
>
>> Does this solve everything? There's broken stuff in the stdlib, but we shouldn't bother ourselves with that -- if we need working code we should just write it and ignore the stdlib or submit our stuff as patches to the stdlib.
>
> The quick summary of what I suggested before is at:
>
> http://code.google.com/p/modwsgi/wiki/SupportForPython3X
>
> I believe the only difference I see is the raw SCRIPT_NAME and PATH_INFO, which got discussed to death previously with no consensus.
>
>> Some environments will have a hard time constructing RAW_SCRIPT_NAME and RAW_PATH_INFO, but in my opinion they can just encode SCRIPT_NAME and PATH_INFO and be done with it; it's not as accurate, but it's no less accurate than what we have now.
>>
>> Actual transcoding in the environ is not supported or encouraged in this scheme. If you want to adjust an encoding you should do it in your application/library code.
>>
>> There's some other topics, like chunked responses, unknown request body lengths, start_response, and maybe some other things, but these aren't Python 3 issues, they are just... generic issues.
>
> app_iter.close() might be worth thinking about given new iterator semantics introduced since WSGI was written.
>
> Graham
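Editorial sketch of the fallback described above for environments that cannot supply the raw values: re-quote the already-decoded PATH_INFO. The key name RAW_PATH_INFO is the placeholder used in this message (not a standardized key), the function name is hypothetical, and PATH_INFO is assumed to be a Latin-1 native string as proposed in the thread.

```python
from urllib.parse import quote


def raw_path_info(environ: dict) -> str:
    """Return the raw (URL-quoted) path, reconstructing it if necessary."""
    raw = environ.get("RAW_PATH_INFO")
    if raw is not None:
        return raw  # the server supplied the real thing
    # Recover the original bytes from the Latin-1 text, then URL-quote them.
    # "/" is left unquoted, so a %2F in the original request cannot be
    # recovered: "not as accurate, but no less accurate than what we have now".
    return quote(environ.get("PATH_INFO", "").encode("latin-1"))
```

The reconstruction is only an approximation of the request URI, which is why the thread treats it as a fallback rather than a replacement.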
Re: [Web-SIG] WSGI for Python 3
On 14 July 2010 15:18, Ian Bicking i...@colorstudy.com wrote:
> On Wed, Jul 14, 2010 at 12:04 AM, Graham Dumpleton graham.dumple...@gmail.com wrote:
>> On 14 July 2010 14:43, Ian Bicking i...@colorstudy.com wrote:
>>> So... there's been some discussion of WSGI on Python 3 lately. I'm not feeling as pessimistic as some people, I feel like we were close but just didn't *quite* get there.
>>
>> What I took from the discussion wasn't that one couldn't specify a WSGI interface, and as you say we more or less have one now, the issue is more about how practical that is from a usability perspective for those who have to code stuff on top.
>
> My intuition is that it won't be that bad. At least compared to any library that is dealing with str/unicode porting issues; which aren't easy, but so it goes.
>
>>> * I'm terrible at naming, but let's say these new values are RAW_SCRIPT_NAME and RAW_PATH_INFO.
>>
>> My prior suggestion on that, since upper case keys for now effectively derive from CGI, was to make them wsgi.script_name and wsgi.path_info. Ie., push them into the wsgi namespace.
>
> That's fine with me too.
>
>>> Does this solve everything? There's broken stuff in the stdlib, but we shouldn't bother ourselves with that -- if we need working code we should just write it and ignore the stdlib or submit our stuff as patches to the stdlib.
>>
>> The quick summary of what I suggested before is at:
>>
>> http://code.google.com/p/modwsgi/wiki/SupportForPython3X
>>
>> I believe the only difference I see is the raw SCRIPT_NAME and PATH_INFO, which got discussed to death previously with no consensus.
>
> Thanks, I was looking for that. I remember the primary objection to a SCRIPT_NAME/PATH_INFO change was from you. Do you still feel that way?

I accept that access to the raw information may help for people who want access to repeating slashes or other encoded information that an underlying web server may alter, but I can't remember in what way this helps with the Python 3 issues. That is why I just made the comment in the other email. Perhaps you can cover how this helps with Python 3.

> I generally agree with your interpretation, except I would want to strictly disallow unicode (Python 3 str) from response bodies. Latin1/ISO-8859-1 is an okay encoding for headers and status and raw SCRIPT_NAME/PATH_INFO, but for bodies it doesn't have any particular validity. I forgot to mention the response, which you cover; I guess I'm okay with being lenient on types there (allowing both bytes and str in Python 3)... though I'm not really that happy with it. I'd rather just keep it symmetric with the request, requiring native strings everywhere.

The reason for allowing it in the response content was so the canonical WSGI hello world would still work unmodified.

Graham
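Editorial sketch of what the hello world looks like under the stricter position above, where status and headers are native strings and the response body is bytes only. The `run_once` helper is hypothetical and exists only to call the application without a server; `wsgiref.util.setup_testing_defaults` is the stdlib function that fills in a minimal environ.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Hello world with a bytes body and native-string status and headers."""
    body = "Hello world!\n".encode("utf-8")  # explicit encode: the body is bytes
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def run_once(app):
    """Call a WSGI app once and return (status, headers, body bytes)."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # legacy write() callable, unused here

    app_iter = app(environ, start_response)
    try:
        body = b"".join(app_iter)  # fails if the app yields str instead of bytes
    finally:
        if hasattr(app_iter, "close"):
            app_iter.close()
    return captured["status"], captured["headers"], body
```

The only change from the classic Python 2 hello world is the explicit encode of the body, which is the cost being weighed against leniency in the message above.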