Re: [Web-SIG] WSGI for Python 3

2010-08-30 Thread P.J. Eby

At 02:37 PM 8/30/2010 +1000, Graham Dumpleton wrote:

Anyway, rather than keep arguing the point, and so we can move forward, let us
perhaps start now with the following definitions and new names to
identify them. We can even go a bit stupid and give each its own code
name so they are in part more memorable. Any next option based on your
suggestions about changing the WHEAT option can be called MAIZE. And
if you think I am going stark raving mad and should be put in a
white jacket and locked up, you could well be right. I am not a happy
camper right now, but that is because of many things besides this WSGI
stuff. :-)

 And yes I know about the page that has been just recently put up at:

  http://www.wsgi.org/wsgi/Python_3

From memory when I first read it I wasn't sure that it was
completely accurate, but at least it doesn't now mention mod_python
instead of mod_wsgi which was mighty confusing. We can perhaps merge
the following into that page, ie., expand the table, and talk more
about the abstract definitions rather than linking it to specific
implementations at this point. We can perhaps then start capturing the
pros and cons against each option in the page rather than losing them
in the email chain.


I've added a column to the page called "flat" that captures my 
current proposal (native keys, surrogateescape values, byte stream 
in, strict bytes-only for all outputs).  This seems to me an optimum 
balance between:


* Verifiability (especially *composable* verifiability)
* Low cognitive overhead (i.e., fewest things to remember)
* Low amount of finger-typing and fewer conversions

But I certainly could be convinced otherwise by example or argument.

(One other thing I consider a plus for this approach, btw: os.environ 
is still largely usable as a WSGI environ in the CGI case.  This 
isn't so much a valuable thing in itself, as that it's an indicator 
of low complexity and cognitive overhead.) 
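
A minimal sketch of what "surrogateescape values" could look like in practice. It assumes UTF-8 as the nominal encoding, which is an illustrative choice and not something the proposal above fixes. The point is only that arbitrary bytes survive the round trip through a native string.

```python
# Sketch only: UTF-8 as the nominal encoding is an assumption for illustration.
raw = b"/caf\xe9"          # Latin-1 bytes from the wire; not valid UTF-8

# Decoding never fails: undecodable bytes become lone surrogates (U+DC80..U+DCFF).
native = raw.decode("utf-8", "surrogateescape")

# The original bytes can always be recovered exactly...
recovered = native.encode("utf-8", "surrogateescape")

# ...and then decoded with whatever encoding the application actually expects.
text = recovered.decode("latin-1")
```

An application that cares about the real encoding pays for one encode/decode pair; one that only passes the value along pays nothing.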


___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] WSGI for Python 3

2010-08-28 Thread Georg Brandl
Am 28.08.2010 13:13, schrieb Armin Ronacher:
 Hi,
 
 On 2010-08-28 1:04 PM, Georg Brandl wrote:
 Let me just throw in here that it's *NOT* too late to do something about
 Python 3.2.  It is not even in beta state yet, and I am very willing to
 introduce the changes to make web programming work again, or even hold
 up 3.2 for a bit if you need more time.
 Sorry if I was not clear.  I was talking about only wsgiref here.  And 
 for that to be adapted to a possible new WSGI specification we would 
 need more time than you can hold the 3.2 release I think.

That is certainly true :)

Georg


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



Re: [Web-SIG] WSGI for Python 3

2010-08-27 Thread P.J. Eby

At 06:05 PM 8/27/2010 +0200, Christoph Zwerschke wrote:

 For instance,

user = 'özkan'.encode('latin1')
if user in request.META.get('REMOTE_USER', b'').lower():

will not work if the user has logged in as 'Özkan'.


Isn't that a problem with code that does this now? 
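
For reference, the behaviour under discussion can be reproduced with plain bytes and str objects, without any request object. This is an illustrative sketch; the variable names are made up.

```python
needle = "özkan".encode("latin1")         # b'\xf6zkan'
remote_user = "Özkan".encode("latin1")    # b'\xd6zkan'

# bytes.lower() only maps ASCII A-Z, so the 0xD6 byte is left untouched
# and the lowercase needle is not found.
bytes_match = needle in remote_user.lower()

# Decoding first lets str.lower() apply Unicode case rules.
text_match = needle.decode("latin1") in remote_user.decode("latin1").lower()
```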




Re: [Web-SIG] WSGI for Python 3

2010-08-27 Thread Robert Brewer
Paul Davis wrote:
  Since the major stumbling block, irrespective of other changes,
  to any sort of agreement is still bytes vs unicode

 I ran into this while I was attempting to put together enough code to
 play with a wsgiref2 that ran on both 2.x and 3.x. As Graham has
 deftly pointed out, it's a pretty big pain in the rear.
 
 Specifically, if we specify that all keys in the environ dictionary
 are byte strings, then there's a noticeable amount of pain in trying
 to write code that runs on both platforms. I object to 2to3.py on
 religious grounds, so when I was implementing this I was doing so with
 code that would run unmodified on both 2 and 3.

Religion is what gets us into this mess. Pragmatism will get us out. We
have two options:

 1. Continue to try to write code that runs unmodified on Python 2 and
3, or that runs when 2to3 is applied. There is a morass of corner cases
and state machines that behave differently depending on when you look at
them lurking here. You can all see where that is getting us: nowhere. By
the time you all discover how to write a spec that deals with all the
pain points which 2to3 introduces, Python 2 will be dead and you will
have wasted your time.
 2. Write a Python 3 version of your code. Yes, it's more drudge work.
Suck it up. To ameliorate that, make the Python 3 version the default as
soon as possible. Deprecate the Python 2 branch. Backport features as
necessary to the Python 2 branch (just as Python itself has been doing,
if you notice). If you do that, we can write a WSGI for Python 3 now
that doesn't suffer from any of the complexities of 2to3.


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] WSGI for Python 3

2010-08-27 Thread Paul Davis
On Fri, Aug 27, 2010 at 4:04 PM, Robert Brewer fuman...@aminus.org wrote:
 Paul Davis wrote:
  Since the major stumbling block, irrespective of other changes,
  to any sort of agreement is still bytes vs unicode

 I ran into this while I was attempting to put together enough code to
 play with a wsgiref2 that ran on both 2.x and 3.x. As Graham has
 deftly pointed out, it's a pretty big pain in the rear.

 Specifically, if we specify that all keys in the environ dictionary
 are byte strings, then there's a noticeable amount of pain in trying
 to write code that runs on both platforms. I object to 2to3.py on
 religious grounds, so when I was implementing this I was doing so with
 code that would run unmodified on both 2 and 3.

 Religion is what gets us into this mess. Pragmatism will get us out. We
 have two options:

  1. Continue to try to write code that runs unmodified on Python 2 and
 3, or that runs when 2to3 is applied. There is a morass of corner cases
 and state machines that behave differently depending on when you look at
 them lurking here. You can all see where that is getting us: nowhere. By
 the time you all discover how to write a spec that deals with all the
 pain points which 2to3 introduces, Python 2 will be dead and you will
 have wasted your time.
  2. Write a Python 3 version of your code. Yes, it's more drudge work.
 Suck it up. To ameliorate that, make the Python 3 version the default as
 soon as possible. Deprecate the Python 2 branch. Backport features as
 necessary to the Python 2 branch (just as Python itself has been doing,
 if you notice). If you do that, we can write a WSGI for Python 3 now
 that doesn't suffer from any of the complexities of 2to3.


 Robert Brewer
 fuman...@aminus.org


No. What got us into this mess was the idea that it would be good to
silently type cast unicode objects into bytes. Perhaps I could've been
more clear on avoiding 2to3 though. I wanted to avoid coding any of
its oddities into a reference implementation because as you point out
it's just a source of confusion.

I'd like to point out that the code I posted works on both 2.x and
3.x. It's fairly easy to implement the backwards compatible code in
Python. There's nothing near the level of requiring a
branched/back-port strategy. Not to mention, a branched reference
implementation is a bit of a contradiction in terms. The hard part is
figuring out a specification that doesn't suck when people try and
implement it on multiple interpreters.
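
To give a feel for what single-source 2.x/3.x code tends to involve, here is a small sketch of the usual helper pair. It is not the code referred to above; the names `to_bytes` and `to_native` and the Latin-1 default are assumptions made for illustration.

```python
import sys

PY3 = sys.version_info[0] >= 3
# 'unicode' is only looked up on Python 2, so this line is safe on both.
text_type = str if PY3 else unicode  # noqa: F821


def to_bytes(value, encoding="latin-1"):
    """Return value as a byte string on either interpreter."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


def to_native(value, encoding="latin-1"):
    """Return value as the interpreter's native str type."""
    if PY3 and isinstance(value, bytes):
        return value.decode(encoding)
    if not PY3 and isinstance(value, text_type):
        return value.encode(encoding)
    return value
```

Helpers like these keep the version checks in one place, which is roughly the trade-off being argued over: a little indirection everywhere versus maintaining two branches.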

Also, I think you're overestimating the rate at which people are going
to be converting to Python 3. I still have people ask for Python 2.4
support. I wouldn't be the least bit surprised if there's a WSGI 3
before we deprecate 2.x support.

HTH,
Paul Davis


Re: [Web-SIG] WSGI for Python 3

2010-08-27 Thread Armin Ronacher

Hi,

On 2010-08-27 6:05 PM, Christoph Zwerschke wrote:
 Btw, another problem with this is that the lower() method does not know
 that it has to use latin1 when lowercasing.
That is not a problem insofar as case insensitive HTTP tokens are 
limited to ASCII only.



Regards,
Armin


Re: [Web-SIG] WSGI for Python 3

2010-07-20 Thread Graham Dumpleton
On Tuesday, July 20, 2010, Etienne Robillard e...@gthcfoundation.org wrote:
 AFAICT, the main difference is that under a
 bytes-only regime, the changes should be more consistent/mechanical, i.e.,
 able to be performed by relatively superficial code inspection.



 The problem in all these discussions is that practically no one has
 been prepared to actually sit down and attempt to migrate any
 significant code over to any of these proposals and Python 3.0.

 The only notable attempt is the work Robert Brewer did with CherryPy.
 Ultimately though I don't think the CherryPy case tells us much as it
 simply translates the interface into an internal way of doing things.
 The true litmus test will be the conversion of any framework which
 keeps the WSGI interface exposed, with it being used as a means of
 composing together components to make a stack.

 Until someone has done that we have absolutely no evidence one way or
 the other as to what proposal is easier or even viable given potential
 shortcomings, or otherwise, in the Python language and standard
 libraries.

 It is a chicken and egg problem though in that I would say practically
 everyone doesn't want to do anything until the WSGI specification has
 been updated as they don't want to waste their time. You can't, though,
 update the specification without truly knowing whether a particular
 approach will work and to do that you have no choice but to actually
 try it.


 Hi Graham et al,

 One could maybe write a migration app for porting
 WSGI 1 apps to WSGI 2, in the same way 2to3.py was written.

 That's how at least I hoped to migrate notmm to Python 3. A switch
 could be used
 also to enable/disable bytes or text-mode only for HTTP headers
 parsing...

 Are there no such tools yet ready to slowly start moving ahead with
 WSGI 2? I recognize it's a chicken and egg problem but I don't think
 it's necessary for framework authors to migrate to Python 3 in an
 attempt to solve mystery encoding
 errors affecting Windows platforms...

The issues are not Windows specific. You are misunderstanding past
comments if you believe that.

The purpose to actually trying it is to work out how viable bytes
everywhere and/or users dealing with % encoding is. If dealing with
bytes everywhere proves to be easy then great, going that way may be
best idea. If it is a PITA as some have said dealing with bytes is in
Python 3.0 then we will know rather than it being speculation at this
point.

Graham

 An easy-to-follow roadmap to WSGI 2 and writing
 related development tools should be a more effective way to port
 frameworks (to WSGI 2) and stick with Python 2 if they so wish! ;-)

 my 2 cents,

 E
 --
 Etienne Robillard
 Green Tea Hackers Club

 E-mail: e...@gthcfoundation.org
 Work phone: 1 (514) 962-7703
 Website:https://gthc.org/

 During times of universal deceit, telling the truth becomes a revolutionary 
 act. -- George Orwell






Re: [Web-SIG] WSGI for Python 3

2010-07-19 Thread Aaron Watters
I'm still in denial regarding Python 3 generally speaking,
but it looks like something important is going on here.  Could
someone summarize the main points (intelligible to a Python 2
troglodyte)?

thanks in advance,  -- Aaron Watters

===
% man less
less is more.


Re: [Web-SIG] WSGI for Python 3

2010-07-19 Thread Graham Dumpleton
Go back through my blog and read some of the posts there so you have
some of the history. Recent discussions build on some of the stuff
there and I don't think anyone has the time to keep explaining all
this to every new person who comes along.

Graham

On Monday, July 19, 2010, Aaron Watters arw1...@yahoo.com wrote:
 I'm still in denial regarding Python 3 generally speaking,
 but it looks like something important is going on here.  Could
 someone summarize the main points (intelligible to a Python 2
 troglodyte)?

 thanks in advance,  -- Aaron Watters

 ===
 % man less
 less is more.



Re: [Web-SIG] WSGI for Python 3

2010-07-19 Thread Paul Davis
On Tue, Jul 20, 2010 at 12:37 AM, Graham Dumpleton
graham.dumple...@gmail.com wrote:
 On 19 July 2010 03:19, P.J. Eby p...@telecommunity.com wrote:
 At 01:01 PM 7/18/2010 +1000, Graham Dumpleton wrote:

 This is on the basis that if people are going to have to rewrite their
 code
 a fair bit to handle bytes everywhere,

 What do you mean by "rewrite their code a fair bit", and who is it that you
 think will have to do this?
 Or, more precisely, how is that any different from the text or
 text-and-bytes proposals?

 My comments are based on the mood I have got from listening to
 discussions here on this list and discussions in other forums and irc
 channels. To me there appears to be a tendency towards people thinking
 that having bytes everywhere will be harder to deal with than the text
 proposal.

 AFAICT, the main difference is that under a
 bytes-only regime, the changes should be more consistent/mechanical, i.e.,
 able to be performed by relatively superficial code inspection.

 The problem in all these discussions is that practically no one has
 been prepared to actually sit down and attempt to migrate any
 significant code over to any of these proposals and Python 3.0.

 The only notable attempt is the work Robert Brewer did with CherryPy.
 Ultimately though I don't think the CherryPy case tells us much as it
 simply translates the interface into an internal way of doing things.
 The true litmus test will be the conversion of any framework which
 keeps the WSGI interface exposed, with it being used as a means of
 composing together components to make a stack.

 Until someone has done that we have absolutely no evidence one way or
 the other as to what proposal is easier or even viable given potential
 shortcomings, or otherwise, in the Python language and standard
 libraries.

 It is a chicken and egg problem though in that I would say practically
 everyone doesn't want to do anything until the WSGI specification has
 been updated as they don't want to waste their time. You can't, though,
 update the specification without truly knowing whether a particular
 approach will work and to do that you have no choice but to actually
 try it.

 And before you argue that the hosting mechanisms haven't been there to
 do that I will point out that mod_wsgi specifically implemented a way
 of being able to selectively say whether bytes or text was passed
 through. That code for bytes support sat there for six months or more
 and there was zero interest expressed to me by anyone in using it as a
 basis of some actual attempts at migrating existing code as a test. In
 the end it got thrown out due to that lack of interest and due to it
 holding up a new release of mod_wsgi.

 Distinct from mod_wsgi, it also wouldn't be that hard for interested
 people to modify wsgiref to implement the different proposals. I
 stress again that no one seems prepared to do that and again even if
 it was done, who is then going to try and use it.

 Thus we all just sit here on the fence waiting for others to do
 something, pushing our particular ideas and occasionally flip flopping
 between those ideas as well.

 Finally and for the record, I will not be modifying mod_wsgi to change
 it in any way now until I see a separate proof of concept WSGI server
 and a decent sized framework ported to it. So yes I am going to sit on
 the fence as well, but that is because I have been burned in the past
 in putting in effort on this only to see it go nowhere. I am not going
 to waste my time again like that.

 Graham


Just a quick note. I've started working on a project to try and get a
version of wsgi running on 2.x and 3.x. I've been needing a reason to
start using 3.1 for some time and this thread has managed to spur me
into action.

To be clear, I'm coming at this from the point of view that as long as
there are breaking changes, I might as well make things really broken.
So I'll be incorporating ideas from [1] as well as other bits of
trivia I've picked up. I realize this will lower the probability that
anything comes of this work, but I reckon it'll at least be some code
to discuss.

My current plan is to get a reference implementation with some tests
that runs on 2.x and 3.x. Once I get there I'll try porting WebOb [2]
and maybe Django [3] (depending on the progress of its port [4]). If I
get that far I'll probably make a fork of Gunicorn [5] so that there's
a whole stack that runs on both 2.x and 3.x.

Optimistically, I'd like to have enough code to show the reference
implementation and tests by this weekend. Although, I'm still learning
3.x differences and workarounds so I could fail miserably.

Paul J. Davis

[1] http://wsgi.org/wsgi/WSGI_2.0
[2] http://pythonpaste.org/webob/
[3] http://www.djangoproject.com/
[4] 

Re: [Web-SIG] WSGI for Python 3

2010-07-18 Thread P.J. Eby

At 01:01 PM 7/18/2010 +1000, Graham Dumpleton wrote:

This is on the basis that if people are going to have to rewrite their code
a fair bit to handle bytes everywhere,


What do you mean by "rewrite their code a fair bit", and who is it that 
you think will have to do this?


Or, more precisely, how is that any different from the text or 
text-and-bytes proposals?  AFAICT, the main difference is that under 
a bytes-only regime, the changes should be more 
consistent/mechanical, i.e., able to be performed by relatively 
superficial code inspection.




My personal opinion is that if you are going to go bytes everywhere,
then you may as well throw out the complete WSGI specification as it
stands now and fix all the other problems with the specification.


That may not be a bad idea; I'm certainly in favor of going ahead and 
ditching start_response/write while we're at it.  The requirement to 
change both the entry and exit points to match the calling convention 
also seems to provide an ideal opportunity to insert any necessary 
encoding or decoding operations.
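
As a rough illustration of that last point, the sketch below wraps a start_response-style application so that it returns its result instead. The `(status, headers, body)` return convention and the name `return_style` are assumptions for the sake of the example; no calling convention has been agreed on in this thread.

```python
def return_style(wsgi1_app, encoding="latin-1"):
    """Wrap a start_response-style app so it returns (status, headers, body)."""
    def app(environ):
        captured = {}
        written = []

        def start_response(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)
            return written.append      # stands in for the legacy write() callable

        result = wsgi1_app(environ, start_response)
        try:
            chunks = list(result)      # buffers the whole body; fine for a sketch
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()

        # The exit point is a natural place for the single encoding step.
        status = captured["status"].encode(encoding)
        headers = [(name.encode(encoding), value.encode(encoding))
                   for name, value in captured["headers"]]
        return status, headers, written + chunks
    return app
```

Because every application has to pass through a wrapper like this (or be rewritten) anyway, the conversion to bytes costs no additional touch points.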




Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Graham Dumpleton
On Saturday, July 17, 2010, Gustavo Narea m...@gustavonarea.net wrote:
 Hello,

 Ian said:
 Having two ways of expressing the same information will lead to bugs
 related to which data is canonical.  If an application is using
 SCRIPT_NAME/PATH_INFO and then updates those values in any way, and
 wsgi.raw_script_name/wsgi.raw_path_info are present, then there will be
 weird bugs and code will disagree about which one is correct.  Since %2f
 can exist in the raw versions, there isn't even a way to chunk the two
 variables in the same way.
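
 The %2f point is easy to see with a made-up request path (the path itself is hypothetical):

```python
from urllib.parse import unquote

raw_path = "/files/a%2Fb/edit"      # hypothetical raw request path

# Splitting the raw form keeps the encoded slash inside one segment...
raw_segments = raw_path.split("/")

# ...while splitting the decoded form yields one segment more.
decoded_segments = unquote(raw_path).split("/")
```

 Any split between SCRIPT_NAME and PATH_INFO chosen on one form therefore cannot always be expressed on the other.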

 I can't agree more.

 I would propose the following, and excuse me in advance if this has already
 been proposed and discarded -- I've tried to follow this topic on the mailing
 list over the past few months, until it became an endless discussion.

 I think only the raw values should be available. Even if a middleware changes
 them, it must put them with raw values. And because you cannot change those
 values without knowing what encoding the request uses, the character encoding
 *must* be present.

 I know that sounds easy but it's not, because browsers don't specify the
 charset in the Content-Type and instead they generate a new request using the
 charset from the previous response. So the charset is unknown to the
 server/gateway and the middleware stack.

 So, what we could do is introduce a mandatory variable called, say,
 wsgi.charset, and would be used as follows:

Something like this was proposed before, but it only applied to the
keys that mattered, specifically PATH_INFO and maybe QUERY_STRING,
(the latter of which this discussion has been ignoring and I can't
remember how we worked out before it should be treated). It didn't
cover SCRIPT_NAME because, as I indicated before, the encoding of that is
really dictated by the server and not the application for the initial
value at least.

The idea was that the server would pass them as Latin 1 and set the
encoding key. If a consumer of it didn't like the encoding it was in,
it would convert it back to bytes and then to what it wants and update
the encoding key to what it used. Thus you had a hint available to
allow reliable transcoding. This proposal didn't get acceptance
either.
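
A sketch of how a consumer might have used such a hint. The key name `x.uri_encoding` and the function `transcode` are hypothetical, invented here for illustration; they are not names from the earlier proposal.

```python
def transcode(environ, wanted, keys=("PATH_INFO", "QUERY_STRING"),
              encoding_key="x.uri_encoding"):
    """Re-decode selected environ values and record the encoding used."""
    current = environ.get(encoding_key, "latin-1")
    if current.lower() == wanted.lower():
        return
    new_values = {}
    for key in keys:
        if key in environ:
            raw = environ[key].encode(current)      # back to the original bytes
            new_values[key] = raw.decode(wanted)    # may raise UnicodeDecodeError
    # Only update once every value has decoded cleanly.
    environ.update(new_values)
    environ[encoding_key] = wanted
```

Since Latin-1 maps every byte to a code point, the server's initial decoding can never fail, and the step back to bytes is always lossless.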

Graham

  - It MUST be set by the server or gateway on every request.
  - Every middleware or application that reads or writes these values MUST use
 the charset specified in wsgi.charset.
  - If a server, gateway, middleware or application wants to change the charset
 and it is possible*, it MUST convert the *entire* request into that charset
 and update wsgi.charset accordingly.
  - When the charset is not specified in the HTTP request, UTF-8 MUST be
 assumed by the server/gateway. Unless another default charset has been
 specified by the user.

 I think/hope that will solve all the problems.

 What happens when a WSGI application is actually made up of two WSGI applications
 and they send the responses in different charsets? If it's not possible to
 configure them so that they both use the same charsets, then one of them would
 have to be wrapped by a middleware which:
  - On egress, converts the responses using the charset used by the other
 application.
  - On ingress, if the charset is not specified in the request, it will assume
 it's the one used by the other application, and thus it will convert the
 request using the charset supported by the wrapped application.

 It would look like this:
 ===
 import trac.web.main

 # CharsetNormalizer and myapp are assumed to be defined elsewhere.
 def application(environ, start_response):
     if environ['PATH_INFO'].startswith('/trac/'):
         # Say Trac only supports Latin-1 and we want responses to use UTF-8:
         app = trac.web.main.dispatch_request
         app = CharsetNormalizer(app, response='latin-1', request='utf8')
     else:
         # myapp uses UTF-8
         app = myapp
     return app(environ, start_response)
 ===
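
 A possible shape for the egress half of such a middleware is sketched below. It is an assumption-laden illustration, not an existing class: the parameter `target` is invented here, the ingress (request) side is left out, and data sent through the legacy write() callable is not re-encoded.

```python
import codecs


class CharsetNormalizer(object):
    """Hypothetical middleware: re-encode response bodies to another charset."""

    def __init__(self, app, response, target="utf-8"):
        self.app = app
        self.source = response      # charset the wrapped app emits
        self.target = target        # charset the combined site should emit

    def __call__(self, environ, start_response):
        def recoding_start_response(status, headers, exc_info=None):
            fixed = []
            for name, value in headers:
                lowered = name.lower()
                if lowered == "content-length":
                    continue        # length changes after re-encoding
                if lowered == "content-type" and "charset=" in value.lower():
                    # Simplification: other Content-Type parameters are dropped.
                    value = value.split(";")[0].strip() + "; charset=" + self.target
                fixed.append((name, value))
            return start_response(status, fixed, exc_info)

        body = self.app(environ, recoding_start_response)
        return self._recode(body)

    def _recode(self, body):
        # An incremental decoder copes with characters split across chunks.
        decoder = codecs.getincrementaldecoder(self.source)()
        try:
            for chunk in body:
                text = decoder.decode(chunk)
                if text:
                    yield text.encode(self.target)
            tail = decoder.decode(b"", final=True)
            if tail:
                yield tail.encode(self.target)
        finally:
            close = getattr(body, "close", None)
            if close is not None:
                close()
```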

 Then there's the string vs bytes issue. Bytes would be the natural choice to
 represent these raw values, but they would probably cause more trouble than they
 solve. So, I think they should be strings that contain the ASCII raw
 encoded values (i.e., str on both versions of Python).

 What do you think about this? Again, sorry if this has been discarded before!
 :)

 * For example, you can always convert Latin-1 to UTF-8, but not every UTF-8
 string can be converted to Latin-1.
 --
 Gustavo Narea xri://=Gustavo.
 | Tech blog: =Gustavo/(+blog)/tech  ~  About me: =Gustavo/about |



Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Ian Bicking
On Sat, Jul 17, 2010 at 12:38 AM, Graham Dumpleton 
graham.dumple...@gmail.com wrote:

 On Friday, July 16, 2010, And Clover and...@doxdesk.com wrote:
  On 07/14/2010 06:43 AM, Ian Bicking wrote:
 
 
  There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
  and HTTP_COOKIE.
 
 
  (And of those, PATH_INFO is the only one that really matters, in that
 no-one really uses non-ASCII script filenames,

 FWIW, I had to go to a lot of trouble to allow non ASCII in final
 SCRIPT_NAME in mod_wsgi. Specifically using AddHandler directive in
 Apache means a file system path can make up part of SCRIPT_NAME. I had
 someone who was specifically using Russian in a WSGI script file name
 and because with AddHandler that becomes part of SCRIPT_NAME you had
 to cater for it. Anyway this was more of a Windows issue in having to
 use special file system functions to deal with fact that on Windows
 filesystem paths aren't UTF-8 but something else.

 What this does highlight though is that although one can talk about
 passing raw script name through to application, that isn't necessarily
 right as it isn't the application that dictates what encoding may be
 used but the web server which is performing the mapping of that part
 of the original URL path to a potential filesystem resource, or
 alternatively where file based configuration for mount point, the
 encoding of the web server configuration file.


This is an Apache-specific issue.  It definitely doesn't apply to
paste.httpserver, and I doubt it applies to CherryPy or wsgiref.  I don't really know how
Nginx or other servers work.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Armin Ronacher

Hi,

On 7/17/10 9:15 AM, Ian Bicking wrote:

This is an Apache-specific issue.  It definitely doesn't apply to
paste.httpserver, and I doubt it applies to CherryPy or wsgiref.  I don't really know how
Nginx or other servers work.

This will be an issue for every server that...

 * supports unicode filesystems
 * decides to do internal mapping based on URIs and not IRIs

In fact, this will be an issue for things like middlewares that want to 
map applications to paths.  Indeed, this is already an issue on Python 
2, just that nobody cares.



Regards,
Armin


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Armin Ronacher

Hi,

On 7/17/10 12:57 PM, Armin Ronacher wrote:

In fact, this will be an issue for things like middlewares that want to
map applications to paths. Indeed, this is already an issue on Python 2,
just that nobody cares.

s/applications/serving static files from folders/


Regards,
Armin


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread chris . dent

On Fri, 16 Jul 2010, P.J. Eby wrote:


At 02:28 PM 7/16/2010 -0500, Ian Bicking wrote:
There should be one, and preferably *only* one, obvious way to do it.

And given that HTTP is inherently a bunch of bytes, bytes is the one obvious 
way.


I think this makes sense. The thing which is assembling the WSGI
environment should do bytes and things further down the stack can
deal with it as they like. This aligns well with how I like to think
about such stuff: bytes on the outside, unicode on the inside.
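
A toy sketch of that layering, under loud assumptions: native-string keys with bytes values, a return-style calling convention, and UTF-8 as the application's chosen encoding. None of these are settled by this thread.

```python
def greet(name):
    """Application logic: text in, text out."""
    return "Hello, %s!" % name


def app(environ):
    # Assumed convention: environ values arrive as bytes.
    name = environ.get("QUERY_STRING", b"").decode("utf-8")   # bytes -> text at entry
    body = greet(name).encode("utf-8")                        # text -> bytes at exit
    headers = [(b"Content-Type", b"text/plain; charset=utf-8"),
               (b"Content-Length", str(len(body)).encode("ascii"))]
    return b"200 OK", headers, [body]
```

Everything between the two conversion lines deals only in text, which is the whole appeal of the model.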

Given that app and frameworks developers can throw whatever keys
they like back into the environment, they can cope as they like.[1]

What would be horrible is if there need to be multiple coping
strategies. Better to be able to say, Oh it doesn't work? Try this
way to cope: remember it is bytes.

However, unless I'm misreading the thread, the bytes issue isn't
really the bone of contention. People seem okay with bytes as long
as specific points of pain are addressed, such as:

* What's my PATH_INFO and SCRIPT_NAME?
* This server, which hosts, but is not, the WSGI environment builder
  doesn't play well with this model.
* Some others I can't remember now.

It seems then that perhaps a way forward is to say: Okay, it's gonna
be bytes. Now, given that, how do we deal with these other issues,
which perhaps can be recast and encapsulated to be considered
orthogonal to the bytes/not-bytes debate.

Because we _know_ that any choice is going to come with costs, but
as things have dragged on, the lack of choice thus far is starting
to have as much of a cost as the costs that are wanting to be
resolved.

[1] I am not expecting or hoping for porting/migrating to Python 3 to
be simple/automatic/easy, but perhaps I'm cruel.
--
Chris Dent  http://burningchrome.com/~cdent/
  [...]


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Alan Kennedy
[PJ Eby]
 IOW, the bytes/string discussion on Python-dev has kind of led me to realize
 that we might just as well make the *entire* stack bytes (incoming and
 outgoing headers *and* streams), and rewrite that bit in PEP 333 about using
 str on Python 3000 to say we go with bytes on Python 3+ for everything
 that's a str in today's WSGI.

 Or, to put it another way, if I knew then what I know *now*, I think I'd
 have written the PEP the other way around, such that the use of 'str' in
 WSGI would be a substitute for the future 'bytes' type, rather than viewing
 some byte strings as a forward-compatible substitute for Py3K unicode
 strings.

 Of course, this would be a WSGI 2 change, but IMO we're better off making a
 clean break with backward compatibility here anyway, rather than having
 conditionals.  Also, going with bytes everywhere means we don't have to
 rename SCRIPT_NAME and PATH_INFO, which in turn avoids deeper rewrites being
 required in today's apps.

+1

 (Hm.  Although actually, I suppose we *could* just borrow the time machine
 and pretend that WSGI called for byte-strings everywhere all along...)

+1/0

Alan.


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread William Dode
On 17-07-2010, chris.d...@gmail.com wrote:
 On Fri, 16 Jul 2010, P.J. Eby wrote:

 At 02:28 PM 7/16/2010 -0500, Ian Bicking wrote:
 There should be one, and preferably *only* one, obvious way to do it.

 And given that HTTP is inherently a bunch of bytes, bytes is the one obvious 
 way.

 I think this makes sense. The thing which is assembling the WSGI
 environment should do bytes and things further down the stack can
 deal with it as they like. This aligns well with how I like to think
 about such stuff: bytes on the outside, unicode on the inside.

 Given that app and frameworks developers can throw whatever keys
 they like back into the environment, they can cope as they like.[1]

 What would be horrible is if there need to be multiple coping
 strategies. Better to be able to say, Oh it doesn't work? Try this
 way to cope: remember it is bytes.

This thread is difficult to follow, but this makes sense to me also. KISS 

-- 
William Dodé - http://flibuste.net
Informaticien Indépendant



Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Bill Janssen
Chris McDonough chr...@plope.com wrote:

 On Sat, 2010-07-17 at 01:33 +0200, Armin Ronacher wrote:
  Hi,
  
  On 7/17/10 1:20 AM, Chris McDonough wrote:
Let me know if I'm missing something.
  The only thing you miss is that the bytes type of Python 3 is badly 
  supported in the stdlib (not an issue if we reimplement everything in 
  our libraries, not an issue for me) and that the bytes type has no 
  string formatting, which makes us do the encode/decode dance in our own 
  implementations of some of the missing stdlib functions.
 
 This is why the docs mention bytes with benefits instead (like the
 Python 2 str type). The existence of such a type would be the result
 of us lobbying for its inclusion into some future Python 3, or at least
 the result of lobbying for a String ABC that would allow us to define
 our own.

I think the most effective way to lobby here would be to provide the
String ABC and an implementation of encoded strings, i.e. strings with
an internal representation that's a byte sequence in a particular
encoding.
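
A rough sketch of what such an encoded-string type might look like follows. The class name `EncodedBytes` and its `encoding` attribute are hypothetical, chosen only for illustration; nothing like this exists in the stdlib or in any WSGI draft. It only shows the core idea of a byte sequence that remembers the encoding to use when cast to text.

```python
class EncodedBytes(bytes):
    """Hypothetical 'bytes with benefits': bytes plus an encoding hint."""

    def __new__(cls, data, encoding='utf-8'):
        self = super().__new__(cls, data)
        # The hint is advisory; the payload itself stays untouched bytes.
        self.encoding = encoding
        return self

    def __str__(self):
        # Casting to text uses the stored hint instead of guessing.
        return self.decode(self.encoding)


name = EncodedBytes('caf\xe9'.encode('latin-1'), encoding='latin-1')
text = str(name)            # decoded with the stored hint
parts = name.split(b'f')    # ordinary bytes methods still work
```

Note that methods such as `split` return plain `bytes` here, so the hint is lost on derived values; a real implementation would have to decide how to propagate it.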

Bill


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Graham Dumpleton
On 17 July 2010 22:30,  chris.d...@gmail.com wrote:
 On Fri, 16 Jul 2010, P.J. Eby wrote:

 At 02:28 PM 7/16/2010 -0500, Ian Bicking wrote:
 There should be one, and preferably *only* one, obvious way to do it.

 And given that HTTP is inherently a bunch of bytes, bytes is the one
 obvious way.

 I think this makes sense. The thing which is assembling the WSGI
 environment should do bytes and things further down the stack can
 deal with it as they like. This aligns well with how I like to think
 about such stuff: bytes on the outside, unicode on the inside.

 Given that app and frameworks developers can throw whatever keys
 they like back into the environment, they can cope as they like.[1]

 What would be horrible is if there need to be multiple coping
 strategies. Better to be able to say, Oh it doesn't work? Try this
 way to cope: remember it is bytes.

 However, unless I'm misreading the thread, the bytes issue isn't
 really the bone of contention.

Actually it still is. There are still two competing camps. Some want
text, some want bytes. The whole discussion started purely on the
basis of progressing the text-based proposal. As usual, those wanting
bytes step up and we get two interwoven discussions which if you don't
know the history can be hard to follow.

My personal opinion is that if you are going to go bytes everywhere,
then you may as well throw out the complete WSGI specification as it
stands now and fix all the other problems with the specification. This
is on the basis that if people are going to have to rewrite their code
a fair bit to handle bytes everywhere, you may as well structurally
change the WSGI interface API as well to address other problems.

Anyway, it seems to be moot at this point as some believe that bytes
everywhere with Python language as it stands, plus state of stdlib
would make use of bytes everywhere rather unmanageable, which is where
ebytes comes in. Thus bytes everywhere doesn't sound like a short term
solution and requires changes in Python itself to make it viable.

Graham

 People seem okay with bytes as long
 as specific points of pain are addressed, such as:

 * What's my PATH_INFO and SCRIPT_NAME?
 * This server, which hosts, but is not, the WSGI environment builder
  doesn't play well with this model.
 * Some others I can't remember now.

 It seems then that perhaps a way forward is to say: Okay, it's gonna
 be bytes. Now, given that, how do we deal with these other issues,
 which perhaps can be recast and encapsulated to be considered
 orthogonal to the bytes/not-bytes debate.

 Because we _know_ that any choice is going to come with costs, but
 as things have dragged on, the lack of choice thus far is starting
 to have as much of a cost as the costs that are wanting to be
 resolved.

 [1] I'm not expecting or hoping for porting/migrating to Python 3 to
 be simple/automatic/easy, but perhaps I'm cruel.
 --
 Chris Dent                      http://burningchrome.com/~cdent/
                              [...]



Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Chris McDonough
On Fri, 2010-07-16 at 23:38 -0500, Ian Bicking wrote:
 On Fri, Jul 16, 2010 at 9:43 PM, Chris McDonough chr...@plope.com
 wrote:
 
  Nah, not nearly that hard:
 
  path_info = urllib.parse.unquote_to_bytes(
      environ['wsgi.raw_path_info']).decode('UTF-8')
 
  I don't see the problem?  If you want to distinguish %2f from /, then
  you'll do it slightly differently, like:
 
  path_parts = [
      urllib.parse.unquote_to_bytes(p).decode('UTF-8')
      for p in environ['wsgi.raw_path_info'].split('/')]
 
  This second recipe is impossible to do currently with WSGI.
 
  So... before jumping to conclusions, what's the hard part with using
  text?
 
 
 It's extremely hard to swallow Python 3's current disregard for the
 primacy of bytes at I/O boundaries.  I'm trying, but I can't help but
 feel that the existence of an API like unquote_to_bytes is more
 symptom treatment than solution.  Of course something that unquotes a
 URL segment unquotes it into bytes; it's the only sane default because
 URL segments found in URLs on the internet are bytes.
 
 Yes, URL quoted strings should decode to bytes, though arguably it is
 reasonable to also use the very reasonable UTF-8 default that
 urllib.parse.quote/unquote uses.  So it's really just a question of
 names: should that name be quote_to_string or quote_to_bytes?  Which
 honestly... whatever.

After some careful consideration, I realize I'm only able to offer stop
energy regarding the WSGI-as-text proposal, so I'll bow out of any
maillist conversation about it for now.

- C







Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread And Clover

On 07/14/2010 06:43 AM, Ian Bicking wrote:


There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
and HTTP_COOKIE.


(And of those, PATH_INFO is the only one that really matters, in that 
no-one really uses non-ASCII script filenames, and non-ASCII characters 
in Cookie/Set-Cookie are still handled so differently/brokenly across 
browsers that you can't rely on them at all.)



* I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
exclusively with encoded versions


For compatibility with existing apps, how about keeping the existing 
SCRIPT_NAME and PATH_INFO as-is (with all their problems), and 
specifying that the new 'raw' versions (whatever they are called) are 
added only if they really are raw, not reconstructed.


Then existing scripts that don't care about non-ASCII and slashes can 
carry on as before, and for apps that do care about them, they'll be 
able to be *sure* the input is correct. Or they can fall back to 
PATH_INFO when not present, and avoid producing these kind of URLs in 
response.
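

The fallback described above could be sketched roughly as follows. The key name `wsgi.raw_path_info` is an assumption borrowed from elsewhere in this thread (the proposal itself leaves the name open), and `path_segments` is a hypothetical helper, not part of any WSGI spec.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical key name; a server would add it only when the value is truly raw.
RAW_KEY = 'wsgi.raw_path_info'


def path_segments(environ, encoding='utf-8'):
    """Return decoded path segments, preferring the raw path when present."""
    raw = environ.get(RAW_KEY)
    if raw is not None:
        # Still URL-quoted: split first, so an encoded %2F stays in its segment.
        return [unquote_to_bytes(part).decode(encoding)
                for part in raw.split('/')]
    # Fallback: PATH_INFO is already unquoted, so %2F and / look the same.
    return environ.get('PATH_INFO', '').split('/')


with_raw = path_segments({RAW_KEY: '/a%2Fb/c'})
without_raw = path_segments({'PATH_INFO': '/a/b/c'})
```

An app that cares about slashes inside segments can check for the raw key and refuse to generate such URLs when it is missing.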


(Or an app might have enough special knowledge to try other fallback 
mechanisms when the raw versions are unavailable, such as REQUEST_URI or 
Windows ctypes envvar hacking. But if the server/gateway has good raw 
paths it shouldn't bother using these.)


--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Graham Dumpleton
On Friday, July 16, 2010, And Clover and...@doxdesk.com wrote:
 On 07/14/2010 06:43 AM, Ian Bicking wrote:


 There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
 and HTTP_COOKIE.


 (And of those, PATH_INFO is the only one that really matters, in that no-one 
 really uses non-ASCII script filenames, and non-ASCII characters in 
 Cookie/Set-Cookie are still handled so differently/brokenly across browsers 
 that you can't rely on them at all.)


 * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
 exclusively with encoded versions


 For compatibility with existing apps, how about keeping the existing 
 SCRIPT_NAME and PATH_INFO as-is (with all their problems), and specifying 
 that the new 'raw' versions (whatever they are called) are added only if they 
 really are raw, not reconstructed.

 Then existing scripts that don't care about non-ASCII and slashes can carry 
 on as before, and for apps that do care about them, they'll be able to be 
 *sure* the input is correct. Or they can fall back to PATH_INFO when not 
 present, and avoid producing these kind of URLs in response.

 (Or an app might have enough special knowledge to try other fallback 
 mechanisms when the raw versions are unavailable, such as REQUEST_URI or 
 Windows ctypes envvar hacking. But if the server/gateway has good raw paths 
 it shouldn't bother using these.)

Which is exactly what I have suggested in the past. If you do that,
one has to ask the question, given it is more convention than
anything, why it isn't just a x-wsgiorg extension specification like
routing args is rather than a core part of the WSGI specification.
Servers could still implement the extension as they are able to and
don't have to worry about changing core specification then and what we
have now stands.

Graham

 --
 And Clover
 mailto:a...@doxdesk.com
 http://www.doxdesk.com/



Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread And Clover

On 07/16/2010 12:07 PM, Graham Dumpleton wrote:


If you do that, one has to ask the question, given it is more convention than
anything, why it isn't just a x-wsgiorg extension specification


Yes, fine by me either way.

I just want to be able to say "this application can use Unicode paths 
when run on a server/gateway that supports standardised feature X", 
rather than the current mess of "you can have Unicode paths if you use 
one of the dozen different server-and-platform combinations we've 
specifically coded workarounds for".


--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread P.J. Eby

At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote:
And this doesn't help with Python 3: either we have byte values of 
SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I 
think bytes will be more awkward to port to than text, and 
inconsistent with other WSGI values.


OTOH, it has the tremendous advantage of pushing the encoding 
question onto the app (or framework) developer...  who's really the 
only one who can make the right decision for their particular 
application.  And personally, I'd rather have clear boundaries 
between text and bytes, such that porting (even if tedious or 
awkward) is *consistent*, and clear as to when you're finished, not, 
"oh, did I check to make sure I converted SCRIPT_NAME and 
PATH_INFO...  not just in my app code, but in all the library code I 
call *from* my app?"


IOW, the bytes/string discussion on Python-dev has kind of led me to 
realize that we might just as well make the *entire* stack bytes 
(incoming and outgoing headers *and* streams), and rewrite that bit 
in PEP 333 about using str on Python 3000 to say we go with bytes 
on Python 3+ for everything that's a str in today's WSGI.


Or, to put it another way, if I knew then what I know *now*, I think 
I'd have written the PEP the other way around, such that the use of 
'str' in WSGI would be a substitute for the future 'bytes' type, 
rather than viewing some byte strings as a forward-compatible 
substitute for Py3K unicode strings.


Of course, this would be a WSGI 2 change, but IMO we're better off 
making a clean break with backward compatibility here anyway, rather 
than having conditionals.  Also, going with bytes everywhere means we 
don't have to rename SCRIPT_NAME and PATH_INFO, which in turn avoids 
deeper rewrites being required in today's apps.


(Hm.  Although actually, I suppose we *could* just borrow the time 
machine and pretend that WSGI called for byte-strings everywhere 
all along...)




Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Stephan Richter
On Friday, July 16, 2010, Ian Bicking wrote:
 We could make everything bytes and be done with it, but it would make it
 much harder to port Python 2 WSGI code to Python 3.

I think this might be best having seen all of the discussion. One could easily 
write a compatibility middleware that makes porting Python 2 applications easy 
or even completely transparent (from a WSGI spec point of view).

Regards,
Stephan
-- 
Entrepreneur and Software Geek
Google me. Zope Stephan Richter


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Gustavo Narea
Hello,

Ian said:
 Having two ways of expressing the same information will lead to bugs
 related to which data is canonical.  If an application is using
 SCRIPT_NAME/PATH_INFO and then updates those values in any way, and
 wsgi.raw_script_name/wsgi.raw_path_info are present, then there will be
 weird bugs and code will disagree about which one is correct.  Since %2f
 can exist in the raw versions, there isn't even a way to chunk the two
 variables in the same way.

I can't agree more.

I would propose the following, and excuse me in advance if this has already 
been proposed and discarded -- I've tried to follow this topic on the mailing 
list over the past few months, until it becomes an endless discussion.

I think only the raw values should be available. Even if a middleware changes 
them, it must put them with raw values. And because you cannot change those 
values without knowing what encoding the request uses, the character encoding 
*must* be present.

I know that sounds easy but it's not, because browsers don't specify the 
charset in the Content-Type and instead they generate a new request using the 
charset from the previous response. So the charset is unknown to the 
server/gateway and the middleware stack.

So, what we could do is introduce a mandatory variable called, say, 
wsgi.charset, and would be used as follows:
 - It MUST be set by the server or gateway on every request.
 - Every middleware or application that reads or writes these values MUST use 
the charset specified in wsgi.charset.
 - If a server, gateway, middleware or application wants to change the charset 
and it is possible*, it MUST convert the *entire* request into that charset 
and update wsgi.charset accordingly.
 - When the charset is not specified in the HTTP request, UTF-8 MUST be 
assumed by the server/gateway. Unless another default charset has been 
specified by the user.
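
As a rough illustration of the first and last rules, a server or gateway might fill in the variable like this. The `wsgi.charset` key is the one proposed above and is not part of any WSGI specification; `set_wsgi_charset` and its `default` parameter are hypothetical names for this sketch.

```python
from email.message import Message


def set_wsgi_charset(environ, default='utf-8'):
    """Set the proposed 'wsgi.charset' key from the request's Content-Type."""
    msg = Message()
    content_type = environ.get('CONTENT_TYPE')
    if content_type:
        msg['Content-Type'] = content_type
    # get_content_charset() returns None when no charset parameter is given.
    environ['wsgi.charset'] = msg.get_content_charset() or default
    return environ


declared = set_wsgi_charset({'CONTENT_TYPE': 'text/plain; charset=ISO-8859-1'})
undeclared = set_wsgi_charset({})
```

The `default` argument stands in for the user-configurable default charset mentioned in the last rule.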

I think/hope that will solve all the problems.

What happens when a WSGI application is actually made up of two WSGI applications 
and they send the responses in different charsets? If it's not possible to 
configure them so that they both use the same charsets, then one of them would 
have to be wrapped by a middleware which:
 - On egress, converts the responses using the charset used by the other 
application.
 - On ingress, if the charset is not specified in the request, it will assume 
it's the one used by the other application, and thus it will convert the 
request using the charset supported by the wrapped application.

It would look like this:
===
def application(environ, start_response):
    # Assuming the request path is read from PATH_INFO:
    if environ["PATH_INFO"].startswith("/trac/"):
        # Say Trac only supports Latin-1 and we want responses to use UTF-8:
        app = trac.web.main.dispatch_request
        app = CharsetNormalizer(app, response="latin-1", request="utf8")
    else:
        # myapp uses UTF-8
        app = myapp
    return app(environ, start_response)
===
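
For illustration only, the egress half of such a `CharsetNormalizer` might be sketched as below. This is one reading of the idea, not an existing class: it assumes `response` names the charset the wrapped application emits and `request` names the charset the rest of the stack uses, and it ignores headers such as Content-Length and Content-Type, which a real middleware would also have to rewrite.

```python
import codecs


class CharsetNormalizer:
    """Hypothetical middleware: transcode the wrapped app's response body."""

    def __init__(self, app, response, request):
        self.app = app
        self.response = response  # charset emitted by the wrapped app
        self.request = request    # charset used by the rest of the stack

    def __call__(self, environ, start_response):
        # An incremental decoder copes with characters split across chunks.
        decoder = codecs.getincrementaldecoder(self.response)()
        for chunk in self.app(environ, start_response):
            yield decoder.decode(chunk).encode(self.request)


def legacy_app(environ, start_response):
    # Stand-in for an application that only speaks Latin-1.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['caf\xe9'.encode('latin-1')]


wrapped = CharsetNormalizer(legacy_app, response='latin-1', request='utf8')
body = b''.join(wrapped({}, lambda status, headers: None))
```

The ingress direction (converting the request for the wrapped application) would follow the same pattern applied to the environ values and input stream.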

Then there's the string vs bytes issue. Bytes would be the natural choice to 
represent these raw values, but they would probably cause more trouble than they 
solve. So, I think they should be strings that contain the ASCII raw 
encoded values (i.e., str on both versions of Python).

What do you think about this? Again, sorry if this has been discarded before! 
:)

* For example, you can always convert Latin-1 to UTF-8, but not every UTF-8 
string can be converted to Latin-1.
-- 
Gustavo Narea xri://=Gustavo.
| Tech blog: =Gustavo/(+blog)/tech  ~  About me: =Gustavo/about |


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 1:40 PM, P.J. Eby p...@telecommunity.com wrote:

 At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote:

 And this doesn't help with Python 3: either we have byte values of
 SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I think
 bytes will be more awkward to port to than text, and inconsistent with other
 WSGI values.


 OTOH, it has the tremendous advantage of pushing the encoding question onto
 the app (or framework) developer...  who's really the only one who can make
 the right decision for their particular application.  And personally, I'd
 rather have clear boundaries between text and bytes, such that porting (even
 if tedious or awkward) is *consistent*, and clear as to when you're
 finished, not, oh, did I check to make sure I converted SCRIPT_NAME and
 PATH_INFO...  not just in my app code, but in all the library code I call
 *from* my app?

 IOW, the bytes/string discussion on Python-dev has kind of led me to
 realize that we might just as well make the *entire* stack bytes (incoming
 and outgoing headers *and* streams), and rewrite that bit in PEP 333 about
 using str on Python 3000 to say we go with bytes on Python 3+ for
 everything that's a str in today's WSGI.


This was my first intuition too, until I started thinking in more detail
about the particular values involved.  Some obviously are textish, like
environ['SERVER_NAME'].  Not a very useful value, but definitely text.

Basically all the internal strings are textish, so we're left with:

wsgi.url_scheme
SCRIPT_NAME/PATH_INFO
QUERY_STRING
HTTP_*, CONTENT_TYPE, CONTENT_LENGTH (headers)
response status
response headers (name and value)

And there's a few things like REMOTE_USER that are kind of in the middle.
Everyone is in agreement that bodies should be bytes.

One initial problem is that the Python 3 stdlib handles bytes poorly, so for
instance there's no good way to reconstruct the URL using the stdlib.  That
explains certain tensions, but I think we should ignore that, and in fact
that's what Python-Dev seemed to say pretty clearly.

Now, the other keys:

wsgi.url_scheme: clearly ASCII

SCRIPT_NAME/PATH_INFO: often UTF-8, could be no encoding, could be some old
legacy encoding.
raw request path: should be ASCII (non-ASCII should be URL-encoded).  URL
encoding happens at the byte layer, so a server could reasonably URL encode
any non-ASCII characters without imposing any encoding.

QUERY_STRING: should be ASCII, same as raw request path

headers: Most are ASCII.  Latin1 is a reasonable fallback and suggested by
the specification.  The spec also implies you have to use the RFC2047 inline
encoding (like =?iso-8859-1?q?some=20text?=), but nothing supports this and
supporting it would probably be a bad idea for security reasons.  The
Atompub spec (reasonably modern) specifically says Title headers should be
encoded with RFC2047 (if they are not ISO-8859-1):
http://tools.ietf.org/html/draft-ietf-atompub-protocol-08#page-17 --
decoding this kind of encoding at the application layer seems reasonable to
me.
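
A minimal sketch of doing that decoding at the application layer, using the stdlib `email.header.decode_header` function. The helper name `decode_rfc2047` and the Latin-1 fallback for unlabelled parts are assumptions made for this example.

```python
from email.header import decode_header


def decode_rfc2047(value, fallback='latin-1'):
    """Decode RFC 2047 encoded words in a header value to text."""
    parts = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # Encoded words arrive as bytes plus their declared charset.
            parts.append(data.decode(charset or fallback))
        else:
            # Values without encoded words are passed through as str.
            parts.append(data)
    return ''.join(parts)


title = decode_rfc2047('=?iso-8859-1?q?some=20text?=')
plain = decode_rfc2047('plain value')
```

Because the application opts in per header, the server never has to guess which headers use this encoding.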

cookie header: this specific header can easily have multiple encodings, as
the browser encodes data then treats it as opaque bytes, so a cookie can be
set via UTF-8 one place, Latin1 another, and those coexist in one header.
That is, there is no real encoding and this should be treated as bytes.
(Latin1 is an approximation of bytes... a spotty way to treat bytes, but
entirely workable.)
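
The "Latin1 as an approximation of bytes" point can be checked with a small sketch: every byte value maps to exactly one Latin-1 code point, so tunnelling bytes through text and back is lossless. The helper names `tunnel` and `untunnel` are made up for this example.

```python
def tunnel(raw):
    """Carry arbitrary bytes inside a str, one code point per byte."""
    return raw.decode('latin-1')


def untunnel(text):
    """Recover the original bytes from a tunnelled str."""
    return text.encode('latin-1')


# A cookie header whose two values were written in different encodings.
cookie = (b'a=' + 'caf\xe9'.encode('utf-8') +
          b'; b=' + 'caf\xe9'.encode('latin-1'))
round_tripped = untunnel(tunnel(cookie))

# The round trip holds for every possible byte value, not just this header.
all_bytes = bytes(range(256))
lossless = untunnel(tunnel(all_bytes)) == all_bytes
```

The tunnelled string is not meaningful text for the UTF-8 value; it only preserves the bytes so the application can decode each cookie value itself.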

response status: I believe the spec says this must be Latin1/ISO-8859-1.  In
practice it is almost always ASCII, and since it is not user-visible it's
not something that really needs localization.

response headers: the spec implies Latin1, in practice the Set-Cookie header
is bytes (since interoperation with wonky legacy systems is not uncommon).
I'm not sure of any other exceptions?


So... to me it seems pretty reasonable for HTTP specifically that text can
work.  And it feels weird that, say, environ['SERVER_NAME'] be text and
environ['HTTP_HOST'] not, and I don't know what environ['REMOTE_ADDR']
should be in that mode.  And it would also be weird if
environ['SERVER_NAME'] was bytes.

In the past when we've gotten down to specifics, the only holdup has been
SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 5:08 PM, Chris McDonough chr...@plope.com wrote:

 On Fri, 2010-07-16 at 17:47 -0400, Tres Seaver wrote:

   In the past when we've gotten down to specifics, the only holdup has
   been SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those.
 
  I think I favor PJE's suggestion:  let WSGI deal only in bytes.

 I'd prefer that WSGI 2 was defined in terms of a bytes with benefits
 type (Python 2's ``str`` with an optional encoding attribute as a hint
 for cast to unicode str) instead of Python 3-style bytes.

 But if I had to make the Hobson's choice between Python 3 style bytes
 and Python 3 style str, I'd choose bytes.  If I then needed to write
 middleware or applications, I'd use WebOb or an equivalent library to
 enable a policy which converted those bytes to strings on my behalf.
 Making it easy to write raw middleware or applications without using
 such a library doesn't seem as compelling a goal as being able to easily
 write one which allowed me direct control at the raw level.


What are the concrete problems you envision with text request headers, text
(URL-quoted) path, and text response status and headers?

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 5:06 PM, Ian Bicking i...@colorstudy.com wrote:

 On Fri, Jul 16, 2010 at 4:47 PM, Tres Seaver tsea...@palladion.com wrote:

   Basically all the internal strings are textish, so we're left with:

 What do you mean by internal?  Anything in the headers or the CGI
 environment is intrinsically bytes-ish to me.  Do you mean that you
 want application programmers to have them transparently decoded?  If so,
 we can make that the responsibility of the non-middleware framework /
 application.


 By internal I mean all the CGI variables that aren't representing HTTP,
 like SERVER_NAME.


Actually I was thinking SERVER_SOFTWARE, though SERVER_NAME is somewhat
similar as it doesn't come from HTTP, it comes from server configuration.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Chris McDonough
On Fri, 2010-07-16 at 17:11 -0500, Ian Bicking wrote:
 On Fri, Jul 16, 2010 at 5:08 PM, Chris McDonough chr...@plope.com
 wrote:
 On Fri, 2010-07-16 at 17:47 -0400, Tres Seaver wrote:
 
   In the past when we've gotten down to specifics, the only holdup has
   been SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those.
 
  I think I favor PJE's suggestion:  let WSGI deal only in bytes.
 
 
 I'd prefer that WSGI 2 was defined in terms of a bytes with benefits
 type (Python 2's ``str`` with an optional encoding attribute as a hint
 for cast to unicode str) instead of Python 3-style bytes.
 
 But if I had to make the Hobson's choice between Python 3 style bytes
 and Python 3 style str, I'd choose bytes.  If I then needed to write
 middleware or applications, I'd use WebOb or an equivalent library to
 enable a policy which converted those bytes to strings on my behalf.
 Making it easy to write raw middleware or applications without using
 such a library doesn't seem as compelling a goal as being able to easily
 write one which allowed me direct control at the raw level.
 
 What are the concrete problems you envision with text request headers,
 text (URL-quoted) path, and text response status and headers?

Documentation is the main reason.  For example, the documentation for
making sense of path_info segments in a WSGI that used unicodey-strings
would, as I understand it, read something like this:


The PATH_INFO environment variable is a string.  To decode it,

- First, split it on slashes::

    segments = PATH_INFO.split('/')

- Then turn each segment into bytes::

    bytes_segments = [ bytes(x, encoding='latin-1') for x in segments ]

- Then, de-encode each segment's urlencoded portions::

    urldecoded_segments = [ urllib.parse.unquote_to_bytes(x)
                            for x in bytes_segments ]

- Then decode each urldecoded segment using the encoding expected
  by your application::

    app_segments = [ str(x, encoding='utf-8') for x in
                     urldecoded_segments ]

.. note:: We encode to latin-1 above because WSGI tunnels the bytes
   representing the PATH_INFO by way of a string type which contains bytes
   as characters.


That looks pretty apologetic to me, and to be honest, I'm not even sure
it will work reliably in the face of existing/legacy applications which
have emitted URLs that are not url-encoded properly if those old URLs
need to be supported.   http://bugs.python.org/issue8136 contains a
variation on this theme.
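
For reference, the four steps of the tunnelled-text recipe above can be put together as a runnable Python 3 sketch. It assumes PATH_INFO arrives as a str carrying bytes as Latin-1 characters; the function name `decode_path_info` and the `app_encoding` parameter are made up for this example.

```python
from urllib.parse import unquote_to_bytes


def decode_path_info(path_info, app_encoding='utf-8'):
    """Split, un-tunnel, URL-decode and decode a text PATH_INFO value."""
    segments = path_info.split('/')
    # Undo the latin-1 tunnelling to get the bytes the server received.
    bytes_segments = [s.encode('latin-1') for s in segments]
    # Resolve any %XX escapes that are still present, staying in bytes.
    urldecoded_segments = [unquote_to_bytes(b) for b in bytes_segments]
    # Only now interpret the bytes in the application's own encoding.
    return [b.decode(app_encoding) for b in urldecoded_segments]


quoted = decode_path_info('/caf%C3%A9/menu')
tunnelled = decode_path_info('/' + 'caf\xe9'.encode('utf-8').decode('latin-1'))
```

A segment that is not valid in `app_encoding` raises `UnicodeDecodeError` in the last step, which is where legacy, badly encoded URLs would surface.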

I'd much rather be able to say:


The PATH_INFO environment variable is a ``bytes-with-benefits`` type.
To decode it:

- First, split it on slashes::

    segments = PATH_INFO.split('/')

- Then, de-encode each segment's urlencoded portions::

    urldecoded_segments = [ urllib.unquote(x) for x in segments ]

- Then decode each urldecoded segment using the encoding expected
  by your application::

    app_segments = [ str(x, encoding='utf-8') for x in
                     urldecoded_segments ]


Let me know if I'm missing something.

- C





Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread P.J. Eby

At 02:28 PM 7/16/2010 -0500, Ian Bicking wrote:
On Fri, Jul 16, 2010 at 1:40 PM, P.J. Eby p...@telecommunity.com wrote:

At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote:
And this doesn't help with Python 3: either we have byte values of 
SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I 
think bytes will be more awkward to port to than text, and 
inconsistent with other WSGI values.



OTOH, it has the tremendous advantage of pushing the encoding 
question onto the app (or framework) developer...  who's really the 
only one who can make the right decision for their particular 
application.  And personally, I'd rather have clear boundaries 
between text and bytes, such that porting (even if tedious or 
awkward) is *consistent*, and clear as to when you're finished, not, 
oh, did I check to make sure I converted SCRIPT_NAME and 
PATH_INFO...  not just in my app code, but in all the library code 
I call *from* my app?


IOW, the bytes/string discussion on Python-dev has kind of led me to 
realize that we might just as well make the *entire* stack bytes 
(incoming and outgoing headers *and* streams), and rewrite that bit 
in PEP 333 about using str on Python 3000 to say we go with bytes 
on Python 3+ for everything that's a str in today's WSGI.



This was my first intuition too, until I started thinking in more 
detail about the particular values involved.  Some obviously are 
textish, like environ['SERVER_NAME'].  Not a very useful value, but 
definitely text.


Basically all the internal strings are textish, so we're left with:

wsgi.url_scheme
SCRIPT_NAME/PATH_INFO
QUERY_STRING
HTTP_*, CONTENT_TYPE, CONTENT_LENGTH (headers)
response status
response headers (name and value)


What I'm getting at, though, is it's precisely this sort of "hm, 
which ones are bytes again?" stuff that makes you have to stop and 
*think*, i.e., it doesn't Fit My Brain(tm) any more.  ;-)


There should be one, and preferably *only* one, obvious way to do it.

And given that HTTP is inherently a bunch of bytes, bytes is the one 
obvious way.


I previously was under the impression that bytes wouldn't 
interoperate with strings in 3.x, but they *do*, in much the same way 
as they did in 2.x.  That means you'll be (mostly) bug-compatible in 
3.x, only you'll likely encounter encoding issues *sooner*, rather 
than later.  (i.e., the minute you combine non-ASCII inputs with your 
regular string constants).


Yes, you will also be forced to convert your return values to bytes, 
but if you've used string constants *anywhere*, then you know you'll 
be outputting text, which you should already have been encoding for 
output.  (So you'll just be forced to deal with errors on that side 
sooner as well.)


All in all, I'd say this also fits with what people on Python-Dev 
keep hammering on as the One Obvious Way to deal with bytes and 
strings in a program: i.e., bytes for I/O, text for text processing.


WSGI is HTTP, and HTTP is I/O, ergo, WSGI is I/O, and we should 
therefore byte the bullet here.  ;-)


___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread P.J. Eby

At 05:42 PM 7/16/2010 -0400, Tres Seaver wrote:

P.J. Eby wrote:

 (Hm.  Although actually, I suppose we *could* just borrow the time
 machine and pretend that WSGI called for byte-strings everywhere
 all along...)

I like the idea of pushing responsibility for decoding stuff into the
framework / app writer's hands.  OTOH, doesn't that hose authors of
existing middleware, due to the borkedness of working with bytes in Python3?


It only creates a new problem if they are currently not using *any* 
unicode in 2.x, and are passing through bytes from the input to the 
output without any encoding or decoding.  AFAICT, if any part of 
their app is currently unicode, they would have the same problems in 2.x.


(Minus, of course, any problems introduced by missing bytes methods 
in 3.x, or the fact that single-subscripted bytes are ints rather 
than bytestrings.)
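
The subscripting difference mentioned above can be seen directly on Python 3. This is only an illustration of the language behaviour, not of any WSGI API:

```python
data = b"GET"

first_item = data[0]      # indexing a bytes object yields an int
first_slice = data[0:1]   # slicing yields a length-1 bytes object
items = list(data)        # iteration yields ints as well
```

Code ported from 2.x that compares `data[0]` against a one-character string therefore needs a slice (or an integer comparison) instead.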


Anyway, the problems introduced will be problems that can be solved 
by waving a fairly standard set of dead chickens at the problem, i.e. 
picking where you're going to encode/decode, and deciding what 
encoding(s) are meaningful to your app.  And frameworks that already 
have a unicode API are ahead of the game here.


So, AFAICT, the only people who'd be punished by a change to bytes 
are the people who have non-ASCII inputs or outputs, but haven't been 
using unicode (because 2to3 will convert them to using strings 
instead of bytes).


From what I can tell, though, this is also the group it's most 
politically correct to hate on in Python-Dev, so we should be 
relatively safe in shifting the burden to them.  ;-)




Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Armin Ronacher

Hi,

On 7/17/10 1:20 AM, Chris McDonough wrote:
 Let me know if I'm missing something.
The only thing you miss is that the bytes type of Python 3 is badly 
supported in the stdlib (not an issue if we reimplement everything in 
our libraries, not an issue for me) and that the bytes type has no 
string formatting, which makes us do the encode/decode dance in our own 
implementations of the missing stdlib functions.


So I am pretty sure we can't totally bypass the encoding/decoding.  We 
might however require less encodes/decodes if we leave bytes on the WSGI 
layer.
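
A sketch of the encode/decode dance described here, as it looked while bytes had no formatting operations. The helper name `status_line` is hypothetical. Later Python versions (3.5 and up) added %-formatting for bytes, so this reflects the situation at the time of the thread:

```python
def status_line(code: int, reason: bytes) -> bytes:
    """Build b'<code> <reason>' by detouring through str."""
    # Latin-1 maps bytes 0-255 one-to-one onto code points, so this
    # decode/encode round trip cannot lose or alter any byte.
    text = "%d %s" % (code, reason.decode("latin-1"))
    return text.encode("latin-1")
```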



Regards,
Armin


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Tres Seaver

P.J. Eby wrote:
 At 07:20 PM 7/16/2010 -0400, Chris McDonough wrote:
 I'd much rather be able to say:

 
 The PATH_INFO environment variable is a ``bytes-with-benefits`` type.
 To decode it:

 - First, split it on slashes::

 segments = PATH_INFO.split('/')

 - Then, decode each segment's urlencoded portions:

 urldecoded_segments = [ urllib.unquote(x) for x in segments ]

 - Then decode each urldecoded segment using the encoding expected
   by your application

 app_segments = [ str(x, encoding='utf-8') for x in
  urldecoded_segments ]
 
 
 +1.  I do wish we actually *had* a bytes-with-benefits type (as I 
 proposed on Python-Dev), but I don't think we can really get one 
 until the language moratorium is over.  Plain old bytes are the next 
 best thing. 
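
With plain bytes on Python 3, the three steps quoted above might look like the following. This is a sketch that assumes the server hands the application a bytes PATH_INFO that is still URL-quoted; the function name `decode_path_info` is made up for illustration:

```python
from urllib.parse import unquote_to_bytes

def decode_path_info(path_info: bytes, encoding: str = "utf-8") -> list:
    """Split on slashes, URL-decode each segment, then decode to text."""
    segments = path_info.split(b"/")                      # step 1: split
    urldecoded = [unquote_to_bytes(s) for s in segments]  # step 2: unquote
    return [s.decode(encoding) for s in urldecoded]       # step 3: to text
```

Because the split happens before unquoting, an encoded slash (`%2F`) stays inside its segment instead of creating a new one.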

We might be able to write one which would work in reduced-instruction-set
mode, and have the server wrap the environ values in it.  Some
operations might not be natural, and we might have to implement some
wrappers around stdlib stuff, but maybe it would be worthwhile to try a
spike on it.


Tres.
- --
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Designhttp://palladion.com



Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Chris McDonough
On Sat, 2010-07-17 at 01:33 +0200, Armin Ronacher wrote:
 Hi,
 
 On 7/17/10 1:20 AM, Chris McDonough wrote:
   Let me know if I'm missing something.
 The only thing you miss is that the bytes type of Python 3 is badly 
 supported in the stdlib (not an issue if we reimplement everything in 
 our libraries, not an issue for me) and that the bytes type has no 
 string formatting, which makes us do the encode/decode dance in our own 
 implementations of the missing stdlib functions.

This is why the docs mention "bytes with benefits" instead (like the
Python 2 str type). The existence of such a type would be the result
of us lobbying for its inclusion into some future Python 3, or at least
the result of lobbying for a String ABC that would allow us to define
our own.

But.. yeah.  Stdlib support for bytes.  Dunno.   What I really don't
want to do is implement a WSGI spec in terms of Unicodey strings just
because the webby stuff in the stdlib cannot deal with bytes.  Those
stdlib implementations should be changed to deal with bytes-ish things
instead.  I actually think fixing the stdlib will end up being a driver
for the "bytes with benefits" type.  Supporting such a type in the
implementation of stdlib functions is clearly the right way to fix it in
lots of cases, because they will be able to deal with BwB and
Unicodey-strings in exactly the same way.

In the meantime, I think using bytes is the only sane thing to do in
some interim specification, because moving from a spec which is
bytes-oriented to a spec that is text-oriented now will leave us in the
embarrassing position of needing to create yet another bytes-oriented
spec later (as, well, I/O is bytes), when Python 3 matures and realizes
it needs such a hybrid type.

- C




Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 8:46 PM, Ian Bicking i...@colorstudy.com wrote:

 So... before jumping to conclusions, what's the hard part with using text?


Oh, the one thing that will be silly is cookies, but they are totally nuts
already.  They can be parsed equally well as bytes or latin1, and best only
transcoded after parsing.  Doing cookie_value.decode(app_encoding) or
cookie_value.encode('ISO-8859-1').decode(app_encoding) isn't terribly
different.  And cookies aren't fair because they are just stupid; like the
standard library I don't think we should design anything around their
idiosyncrasies.
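
The two spellings mentioned above can be compared side by side. This sketch assumes the browser sent UTF-8 and the server exposed the header as a Latin-1-decoded string; the variable names are illustrative only:

```python
app_encoding = "utf-8"

wire_bytes = "name=caf\u00e9".encode(app_encoding)  # what arrived on the socket
cookie_value = wire_bytes.decode("iso-8859-1")      # Latin-1 text the server put in environ

# Text variant: undo the Latin-1 step, then decode with the real encoding.
recovered = cookie_value.encode("iso-8859-1").decode(app_encoding)

# Bytes variant: decode the raw bytes directly.
direct = wire_bytes.decode(app_encoding)
```

Both routes end at the same text, which is why the Latin-1 detour costs one extra call but no information.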

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Graham Dumpleton
On Saturday, July 17, 2010, Ian Bicking i...@colorstudy.com wrote:
 On Fri, Jul 16, 2010 at 6:20 PM, Chris McDonough chr...@plope.com wrote:



 What are the concrete problems you envision with text request headers,
 text (URL-quoted) path, and text response status and headers?

 Documentation is the main reason.  For example, the documentation for
 making sense of path_info segments in a WSGI that used unicodey-strings
 would, as I understand it, read something like this:

 Nah, not nearly that hard:

 path_info = 
 urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')

 I don't see the problem?  If you want to distinguish %2f from /, then you'll 
 do it slightly differently, like:

 path_parts = [
     urllib.parse.unquote_to_bytes(p).decode('UTF-8')
     for p in environ['wsgi.raw_path_info'].split('/')]

 This second recipe is impossible to do currently with WSGI.
 So... before jumping to conclusions, what's the hard part with using
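
The two recipes quoted above can be tried against a hand-built environ. The key `wsgi.raw_path_info` is the name proposed in this thread, not something current servers are known to provide, and the sample path is made up:

```python
import urllib.parse

# Hypothetical environ carrying the still-quoted request path as text.
environ = {"wsgi.raw_path_info": "/docs/a%2Fb/caf%C3%A9"}

# Recipe 1: unquote the whole path, then decode.
path_info = urllib.parse.unquote_to_bytes(
    environ["wsgi.raw_path_info"]).decode("UTF-8")

# Recipe 2: split first, so %2F stays inside its segment.
path_parts = [
    urllib.parse.unquote_to_bytes(p).decode("UTF-8")
    for p in environ["wsgi.raw_path_info"].split("/")]
```

Recipe 1 yields `/docs/a/b/café`, where the encoded slash is indistinguishable from a real one. Recipe 2 keeps `a/b` as a single segment.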

Sorry, it is not that simple. The thing that everyone is ignoring is
that SCRIPT_NAME and PATH_INFO are also normalized by the web server
normally. That is, ".." instances are removed. By passing the raw URL
through to the application, you are now forcing every application to
have to deal with that as well with the possibility of directory
traversal attacks when people get it wrong and the URL is mapping
somehow to file system resources. It is a huge can of worms which at
the moment the web server deals with.
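
To make the concern concrete, here is a sketch of the kind of cleaning an application would have to do itself if it received a raw path. The function name `clean_path` and the policy of rejecting escapes outright are assumptions for illustration, not what any particular server does:

```python
from urllib.parse import unquote_to_bytes

def clean_path(raw_path: str) -> str:
    """URL-decode a raw path, collapse "." and "..", reject escapes."""
    # Decode first, so that an encoded ".." (%2e%2e) is caught as well.
    decoded = unquote_to_bytes(raw_path).decode("utf-8")
    segments = []
    for segment in decoded.split("/"):
        if segment in ("", "."):
            continue                      # drop empty and no-op segments
        if segment == "..":
            if not segments:
                raise ValueError("path escapes the root: %r" % raw_path)
            segments.pop()                # step up one level
        else:
            segments.append(segment)
    return "/" + "/".join(segments)
```

Note that decoding before splitting treats `%2F` as a separator, which is the conservative choice when the result is mapped onto the file system.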

I have other issues with the raw stuff, but haven't got to read the
last dozen messages in this discussion as yet, so will leave those
points to another time.

Graham


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 11:28 PM, Graham Dumpleton 
graham.dumple...@gmail.com wrote:

  Nah, not nearly that hard:
 
  path_info =
 urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')
 
  I don't see the problem?  If you want to distinguish %2f from /, then
 you'll do it slightly differently, like:
 
  path_parts = [
  urllib.parse.unquote_to_bytes(p).decode('UTF-8')
  for p in environ['wsgi.raw_path_info'].split('/')]
 
  This second recipe is impossible to do currently with WSGI.
  So... before jumping to conclusions, what's the hard part with using

 Sorry, it is not that simple. The thing that everyone is ignoring is
 that SCRIPT_NAME and PATH_INFO are also normalized by the web server
 normally. That is, ".." instances are removed. By passing the raw URL
 through to the application, you are now forcing every application to
 have to deal with that as well with the possibility of directory
 traversal attacks when people get it wrong and the URL is mapping
 somehow to file system resources. It is a huge can of worms which at
 the moment the web server deals with.


Well... at least to me "raw" only means not URL decoded, so it doesn't
necessarily mean you can't clean up the request path.  I guess an attacker
could encode "." to make things harder.

Nevertheless, WSGI servers don't currently guarantee this cleaning.  I added
it to paste.httpserver, but I don't know one way or the other about any
other servers.  A quick test shows wsgiref does not clean paths.  So apps
shouldn't rely on a clean path.


-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Graham Dumpleton
On Saturday, July 17, 2010, Ian Bicking i...@colorstudy.com wrote:
 On Fri, Jul 16, 2010 at 4:33 AM, And Clover and...@doxdesk.com wrote:


 On 07/14/2010 06:43 AM, Ian Bicking wrote:


 There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
 and HTTP_COOKIE.



 (And of those, PATH_INFO is the only one that really matters, in that no-one 
 really uses non-ASCII script filenames, and non-ASCII characters in 
 Cookie/Set-Cookie are still handled so differently/brokenly across browsers 
 that you can't rely on them at all.)




 * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
 exclusively with encoded versions



 For compatibility with existing apps, how about keeping the existing 
 SCRIPT_NAME and PATH_INFO as-is (with all their problems), and specifying 
 that the new 'raw' versions (whatever they are called) are added only if they 
 really are raw, not reconstructed.

 Having two ways of expressing the same information will lead to bugs related 
 to which data is canonical.  If an application is using SCRIPT_NAME/PATH_INFO 
 and then updates those values in any way, and 
 wsgi.raw_script_name/wsgi.raw_path_info are present, then there will be weird 
 bugs and code will disagree about which one is correct.  Since %2f can exist 
 in the raw versions, there isn't even a way to chunk the two variables in the 
 same way.
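
A small sketch of the out-of-sync problem described above. The helper `naive_shift` is a made-up stand-in for middleware that only knows about the decoded keys; the `wsgi.raw_*` keys are the names proposed in this thread:

```python
def naive_shift(environ: dict) -> str:
    """Move the first PATH_INFO segment onto SCRIPT_NAME (decoded keys only)."""
    head, _, rest = environ["PATH_INFO"].lstrip("/").partition("/")
    environ["SCRIPT_NAME"] += "/" + head
    environ["PATH_INFO"] = "/" + rest if rest else ""
    return head

environ = {
    "SCRIPT_NAME": "",
    "PATH_INFO": "/a/b/c",                # decoded form of the raw path below
    "wsgi.raw_script_name": "",
    "wsgi.raw_path_info": "/a%2Fb/c",     # first raw segment is "a%2Fb"
}
shifted = naive_shift(environ)
raw_first = environ["wsgi.raw_path_info"].lstrip("/").split("/")[0]
```

After the call the decoded keys say the application is mounted at `/a`, while the untouched raw keys still describe a first segment of `a%2Fb`, so the two views can no longer be split at the same point.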


 Then existing scripts that don't care about non-ASCII and slashes can carry 
 on as before, and for apps that do care about them, they'll be able to be 
 *sure* the input is correct. Or they can fall back to PATH_INFO when not 
 present, and avoid producing these kind of URLs in response.

 I don't think it works to imagine you can just not care about non-ASCII.  
 Requests come in.  WSGI should represent those requests.  If a request comes 
 in with non-ASCII bytes then WSGI needs to do *something* with it.  I don't 
 want to have to configure servers with application policy; servers should 
 just work.

 And this doesn't help with Python 3: either we have byte values of 
 SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I think bytes 
 will be more awkward to port to than text, and inconsistent with other WSGI 
 values.  If we have text then we have to choose an encoding.  Latin1 will 
 work, but it will be the exact wrong encoding most of the time as UTF-8 is 
 the typical encoding (unlike other headers, where Latin1 will mostly be an okay 
 encoding, or as good a guess as we have).  If we firmly remove these keys 
 then we can avoid this choice entirely... and we conveniently also get a 
 better representation of the request.

One reason I don't want to see the existing keys removed is for
debugging purposes. In Apache, various Apache modules such as
mod_rewrite will operate on that translated path. I am concerned that
if only the raw one is available in the WSGI application then
confusion may arise where something doesn't go right with rewrites
because the only information that may be able to be dumped in the way
of debug by an application will be different to what other Apache
modules may operate on. If you aren't going to make use of CGI
versions, then I would still like to see them present but perhaps
renamed. That way you don't have a loss of information when it comes
to trying to debug stuff. I could perhaps just put this in an
Apache/mod_wsgi specific key as well given that the issue is
particular to it. Thus we might have apache.path_info or cgi.path_info.

Graham

 Note that libraries can smooth over this change; WebOb for instance will 
 certainly still support req.script_name/req.path_info by decoding the raw 
 values.  Admittedly lots of code uses these values directly... but at least if 
 they get a KeyError the port/fix will be obvious (as opposed to out of sync 
 values, which will only emerge as a problem occasionally -- I'd rather not 
 invite more occasional bugs).

 --
 Ian Bicking  |  http://blog.ianbicking.org



Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Graham Dumpleton
On Saturday, July 17, 2010, Ian Bicking i...@colorstudy.com wrote:
 On Fri, Jul 16, 2010 at 12:28 PM, Chris McDonough chr...@plope.com wrote:


 On Fri, 2010-07-16 at 11:07 -0500, Ian Bicking wrote:

 And this doesn't help with Python 3: either we have byte values of
 SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I
 think bytes will be more awkward to port to than text, and
 inconsistent with other WSGI values.  If we have text then we have to
 choose an encoding.  Latin1 will work, but it will be the exact wrong
 encoding most of the time as UTF-8 is the typical encoding (unlike other
 headers, where Latin1 will mostly be an okay encoding, or as good a
 guess as we have).  If we firmly remove these keys then we can avoid
 this choice entirely... and we conveniently also get a better
 representation of the request.

 My $.02: I'd rather lobby the core folks for a string ABC (which we can
 hook with a stringlike bytes type) and consider all 3.X releases made so
 far dead to WSGI than to have to tunnel arbitrary bytes through some
 misleading Unicode encoding.

 While I think it would be generally useful, it's also a long way off at best, 
 with serious performance dangers that could torpedo the whole thing.  But... 
 I'm also unsure how it would help here, except perhaps we could incrementally 
 annotate bytes with an encoding?  Well, I don't really know.  Treating the 
 raw request path as text is easy enough, as it should always be ASCII 
 anyway.  We don't have to worry what is right or wrong in this case.

 We could make everything bytes and be done with it, but it would make it much 
 harder to port Python 2 WSGI code to Python

FWIW, I see the whole ebytes discussion as only relevant were you to make
absolutely everything bytes. We don't really need it otherwise.

Graham


Re: [Web-SIG] WSGI for Python 3

2010-07-14 Thread Ian Bicking
On Wed, Jul 14, 2010 at 12:19 AM, Graham Dumpleton 
graham.dumple...@gmail.com wrote:

   * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
  exclusively with encoded versions (that represent the original request
  URI).  We use Latin1 encoding, but it should be ASCII anyway, like most
  of the headers.

 BTW, it should be highlighted whether this change is relevant to
 Python 3 or, like some of the other things you relegated as out of
 scope, purely a wish list item.


Certainly; most headers or metadata is pretty much constrained to ASCII, and
any use of non-ASCII is... at least peculiar, and presumably
application-specific.  For instance, there's no reason you'd have anything
but ASCII in Cache-Control.  The one place encoded information happens
regularly in headers (that I know of) is Cookie.  The request URI path is
generally ASCII, but SCRIPT_NAME and PATH_INFO *aren't* the request URI
path, they are URL decoded versions of the request URI path.  And they are
usually encoded in UTF8... but UTF8 is a lossy encoding, so decoding them is
problematic (though we could define that they must be decoded with
surrogateescape).  And while they are usually UTF8, they are sometimes no
valid encoding at all, because anyone can assemble any set of characters
they want and web browsers will accept it.
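
The surrogateescape option mentioned above can be illustrated with a path that is not valid UTF-8. This is only a demonstration of the error handler itself; the sample bytes are made up:

```python
raw = b"/caf\xe9"   # Latin-1 encoded, so the last byte is invalid as UTF-8

# A strict decode would raise UnicodeDecodeError here; surrogateescape
# instead smuggles the offending byte through as a lone surrogate.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_trip = text.encode("utf-8", "surrogateescape")
```

The resulting string is not printable or encodable in strict mode, which is the trade-off: nothing is lost, but the application still has to decide what the bytes really meant.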

By avoiding URL-unquoting of these values, we can also stick to Latin1 and
get something reasonable.  It's not very attractive to me that we take
something that is probably *not* Latin1, and may reasonably not be ASCII,
and decode it as Latin1.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-13 Thread Graham Dumpleton
On 14 July 2010 15:04, Graham Dumpleton graham.dumple...@gmail.com wrote:
 On 14 July 2010 14:43, Ian Bicking i...@colorstudy.com wrote:
 So... there's been some discussion of WSGI on Python 3 lately.  I'm not
 feeling as pessimistic as some people, I feel like we were close but just
 didn't *quite* get there.

 What I took from the discussion wasn't that one couldn't specify a
 WSGI interface, and as you say we more or less have one now, the issue
 is more about how practical that is from a usability perspective for
 those who have to code stuff on top.

 The concern seems to be that although it may be easy to work with the
 specification for those who at the lowest layer immediately wrap it in
 a higher level abstraction that normalises stuff into something that
 is then used consistently in that way, for those who use lower level
 raw WSGI right through the stack, especially in the context of
 stackable WSGI middleware, that repetitive task of having to deal with
 the byte/unicode issues at every point is just a big PITA.

 That said, my job in writing the WSGI adapter is really easy as I
 don't have to worry about these issues. This is why I don't seem to
 really appreciate the concerns people are expressing. The above is how
 I read things though.

 Here's my thoughts:

 * Everyone agrees keys in the environ should be native strings
 * Bodies should stay bytes
 * Can we make all standard values that are str on Python 2, str on Python
 3 with a Latin1 encoding?  This is basically what wsgiref did.  This means
 HTTP_*, SERVER_NAME, etc.  Everything CGIish, and everything with an
 all-caps key.  There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
 and HTTP_COOKIE.
 * I propose we let libraries handle HTTP_COOKIE however they want; don't
 bother transcoding *into* the environ, just do so when you parse the cookie
 (if you so choose).  Happy developers will just urlencode all their cookie
 values to keep their cookies ASCII-clean.  Unhappy developers who have to
 handle legacy cookies will just run environ['HTTP_COOKIE'].decode('latin1')
 and then do whatever sad magic they are forced to do.
 * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
 exclusively with encoded versions (that represent the original request
 URI).  We use Latin1 encoding, but it should be ASCII anyway, like most of
 the headers.

BTW, it should be highlighted whether this change is relevant to
Python 3 or, like some of the other things you relegated as out of
scope, purely a wish list item.

Graham

 * I'm terrible at naming, but let's say these new values are RAW_SCRIPT_NAME
 and RAW_PATH_INFO.

 My prior suggestion on that since upper case keys for now effectively
 derive from CGI, was to make them wsgi.script_name and wsgi.path_info.
 Ie., push them into the wsgi namespace.

 Does this solve everything?  There's broken stuff in the stdlib, but we
 shouldn't bother ourselves with that -- if we need working code we should
 just write it and ignore the stdlib or submit our stuff as patches to the
 stdlib.

 The quick summary of what I suggested before is at:

  http://code.google.com/p/modwsgi/wiki/SupportForPython3X

 I believe the only difference I see is the raw SCRIPT_NAME and
 PATH_INFO, which got discussed to death previously with no consensus.

 Some environments will have a hard time constructing RAW_SCRIPT_NAME and
 RAW_PATH_INFO, but in my opinion they can just encode SCRIPT_NAME and
 PATH_INFO and be done with it; it's not as accurate, but it's no less
 accurate than what we have now.

 Actual transcoding in the environ is not supported or encouraged in this
 scheme.  If you want to adjust an encoding you should do it in your
 application/library code.

 There's some other topics, like chunked responses, unknown request body
 lengths, start_response, and maybe some other things, but these aren't
 Python 3 issues, they are just... generic issues.  app_iter.close() might be
 worth thinking about given new iterator semantics introduced since WSGI was
 written.

 Graham



Re: [Web-SIG] WSGI for Python 3

2010-07-13 Thread Graham Dumpleton
On 14 July 2010 15:18, Ian Bicking i...@colorstudy.com wrote:
 On Wed, Jul 14, 2010 at 12:04 AM, Graham Dumpleton
 graham.dumple...@gmail.com wrote:

 On 14 July 2010 14:43, Ian Bicking i...@colorstudy.com wrote:
  So... there's been some discussion of WSGI on Python 3 lately.  I'm not
  feeling as pessimistic as some people, I feel like we were close but
  just didn't *quite* get there.

 What I took from the discussion wasn't that one couldn't specify a
 WSGI interface, and as you say we more or less have one now, the issue
 is more about how practical that is from a usability perspective for
 those who have to code stuff on top.

 My intuition is that it won't be that bad.  At least compared to any library
 that is dealing with str/unicode porting issues; which aren't easy, but so
 it goes.


  * I'm terrible at naming, but let's say these new values are
  RAW_SCRIPT_NAME and RAW_PATH_INFO.

 My prior suggestion on that since upper case keys for now effectively
 derive from CGI, was to make them wsgi.script_name and wsgi.path_info.
 Ie., push them into the wsgi namespace.

 That's fine with me too.


  Does this solve everything?  There's broken stuff in the stdlib, but we
  shouldn't bother ourselves with that -- if we need working code we
  should just write it and ignore the stdlib or submit our stuff as
  patches to the stdlib.

 The quick summary of what I suggested before is at:

  http://code.google.com/p/modwsgi/wiki/SupportForPython3X

 I believe the only difference I see is the raw SCRIPT_NAME and
 PATH_INFO, which got discussed to death previously with no consensus.

 Thanks, I was looking for that.  I remember the primary objection to a
 SCRIPT_NAME/PATH_INFO change was from you.  Do you still feel that way?

I accept that access to the raw information may help for people who
want access to repeating slashes or other encoded information that an
underlying web server may alter, but I can't remember in what way this
helps with the Python 3 issues. That is why I just made the comment in
other email.

Perhaps you can cover how this helps with Python 3.

 I generally agree with your interpretation, except I would want to strictly
 disallow unicode (Python 3 str) from response bodies.  Latin1/ISO-8859-1 is
 an okay encoding for headers and status and raw SCRIPT_NAME/PATH_INFO, but
 for bodies it doesn't have any particular validity.

 I forgot to mention the response, which you cover; I guess I'm okay with
 being lenient on types there (allowing both bytes and str in Python 3)...
 though I'm not really that happy with it.  I'd rather just keep it symmetric
 with the request, requiring native strings everywhere.

The reason for allowing it in the response content was so the
canonical WSGI hello world still works unmodified.
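
For comparison, here is a sketch of a hello world under the stricter reading discussed above (native-string status and headers, bytes-only body), together with a tiny driver. The driver `call_app` is a made-up helper for illustration; `setup_testing_defaults` is the stdlib wsgiref function:

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    body = b"Hello world!\n"   # bytes literal: the one change from a str body
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

def call_app(app):
    """Invoke a WSGI app once and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)   # fill in a minimal valid environ
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    chunks = list(app(environ, start_response))
    return captured["status"], captured["headers"], b"".join(chunks)
```

Under the lenient rule the body could stay a plain string literal. Under the strict rule the only edit is the `b` prefix.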

Graham