Re: [Web-SIG] WSGI for Python 3

2010-08-30 Thread P.J. Eby

At 02:37 PM 8/30/2010 +1000, Graham Dumpleton wrote:

> Anyway, rather than keep arguing the point, and to move forward, let us
> perhaps start now with the following definitions and new names to
> identify them. We can even go a bit stupid and give each its own code
> name so they are in part more memorable. Any next option based on your
> suggestions about changing the WHEAT option can be called MAIZE. And
> if you think I am going stark raving mad and should be put in a
> white jacket and locked up, you could well be right. I am not a happy
> camper right now, but that is because of many things besides this WSGI
> stuff. :-)
>
> And yes I know about the page that has been just recently put up at:
>
>   http://www.wsgi.org/wsgi/Python_3
>
> From memory when I first read it I wasn't sure that it was
> completely accurate, but at least it doesn't now mention mod_python
> instead of mod_wsgi which was mighty confusing. We can perhaps merge
> the following into that page, i.e., expand the table, and talk more
> about the abstract definitions rather than linking it to specific
> implementations at this point. We can perhaps then start capturing the
> pros and cons against each option in the page rather than losing them
> in the email chain.


I've added a column to the page called "flat" that captures my 
current proposal (native keys, surrogateescape values, byte stream 
in, strict bytes-only for all outputs).  This seems to me an optimum 
balance between:


* Verifiability (especially *composable* verifiability)
* Low cognitive overhead (i.e., fewest things to remember)
* Low amount of finger-typing and fewer conversions

But I certainly could be convinced otherwise by example or argument.

(One other thing I consider a plus for this approach, btw: os.environ 
is still largely usable as a WSGI environ in the CGI case.  This 
isn't so much a valuable thing in itself, as that it's an indicator 
of low complexity and cognitive overhead.) 
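
As a rough illustration of the "surrogateescape values" part of the proposal, here is a minimal sketch of the round trip. The choice of UTF-8 as the codec and the sample bytes are assumptions made for the example; the proposal text above does not fix them. On POSIX, os.environ itself decodes with the surrogateescape handler, which is why it stays close to a usable environ.

```python
# Hypothetical raw header value: Latin-1 encoded, so not valid UTF-8.
raw = b"/caf\xe9"

# Server side (assumed): decode with surrogateescape so decoding never fails.
# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
value = raw.decode("utf-8", "surrogateescape")

# Application side: the original bytes are recoverable exactly ...
recovered = value.encode("utf-8", "surrogateescape")

# ... and can then be decoded with whatever encoding the application expects.
text = recovered.decode("latin-1")
```

The point of the sketch is only that no information is lost between `raw` and `recovered`, so middleware can pass `value` along without knowing the real encoding.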


___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] WSGI for Python 3

2010-08-30 Thread Ian Bicking
Just to narrow in on one case, URLs, there are a few pieces of information
that make up the URL:

wsgi.url_scheme: this is *not* present in the request, it's inferred somehow
(e.g., by the port the client connected to)

HTTP_HOST: this is a header.  It typically contains both the hostname and
the port.  The encoding is generally idna, though you have to split the port
off first.  The unicode version of the hostname is not widely supported in
client libraries (it's usually applied at the UI level).

SCRIPT_NAME/PATH_INFO: these represent a portion of the request path (before
?).  As submitted these are generally ASCII (URL-quoted).  After unquoting,
they are typically UTF-8, but may be of any or no encoding.  If an unsafe
character is present in the URL-quoted version of the path, it may be quoted
at the byte level.  The '?' character is effectively a byte-oriented marker
and encodings cannot affect it.

QUERY_STRING: this is also generally ASCII (URL-quoted).  Unsafe characters
could be quoted at the byte level.
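
The pieces listed above can be sketched as follows. The host name, request target, and variable names are made up for the example, and the port split deliberately ignores IPv6 literals; treat it as an illustration of where each encoding applies rather than a complete parser.

```python
from urllib.parse import unquote_to_bytes

scheme = "http"                          # inferred by the server, not sent by the client
http_host = "xn--bcher-kva.example:8080" # hypothetical Host header, IDNA-encoded
raw_target = "/caf%C3%A9?q=%E9"          # hypothetical quoted path and query, plain ASCII

# Split the port off first, then apply IDNA to the host name only.
host, _, port = http_host.partition(":")
unicode_host = host.encode("ascii").decode("idna")

# '?' is a byte-level marker, so splitting can happen before any decoding.
raw_path, _, raw_query = raw_target.partition("?")

# Unquoting yields bytes; UTF-8 is typical for the path but not guaranteed.
path_bytes = unquote_to_bytes(raw_path)
path = path_bytes.decode("utf-8")

# The query here unquotes to a lone 0xE9 byte, which is not valid UTF-8.
query_bytes = unquote_to_bytes(raw_query)
try:
    query = query_bytes.decode("utf-8")
except UnicodeDecodeError:
    query = None  # the application would have to pick another encoding
```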

Generally I'm unaware of any reasonable situation where quoting unsafe
characters in an HTTP request would be improper, or even lose any meaningful
information.  Mostly because I don't know of any clients that actually would
expect unsafe characters to work.  Quoting HTTP_HOST is difficult, as it's
not a byte-oriented quoting, it's a fairly complex encoding.  But I'm also
not sure where in a stack you could actually handle unsafe characters in
HTTP_HOST -- it seems like simply an invalid request, and deferring the
error won't give another part of the stack the opportunity to do the right
thing.

In their quoted form all these values (at least including the quoted path,
not the unquoted SCRIPT_NAME/PATH_INFO) *should* be ASCII, and I believe a
WSGI server could ensure they were all ASCII without any loss of useful
information (either by simply rejecting the request or by applying
quoting).  I don't see any place where bytes are advantageous.  Representing
invalid requests does not seem particularly helpful -- *some* invalid
requests are useful to handle (e.g., weird cookies) but in the case of the
URL variables I don't see any benefit.
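
One way a server could "apply quoting" so that the path is always ASCII is sketched below. The function name and the exact set of characters left unquoted are assumptions for the example, not part of any proposal in this thread.

```python
from urllib.parse import quote_from_bytes

# Characters left alone: '/', existing '%' escapes, and the RFC 3986
# sub-delimiters plus ':' and '@' that may legally appear in a path.
SAFE = "/%:@!$&'()*+,;="

def ensure_ascii_path(raw: bytes) -> str:
    """Percent-quote every byte that is not already safe ASCII."""
    return quote_from_bytes(raw, safe=SAFE)

quoted = ensure_ascii_path(b"/caf\xe9/a%20b c")
```

Because the quoting is byte-oriented, it does not need to know what encoding the client intended, and the original bytes can still be recovered by unquoting.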

IMHO all the tricky encoding issues are in the request and response bodies,
and I'm pretty sure we have consensus that those should be bytes.

Reiterating other encoding issues I'm aware of:

Cookie encodings, but parsing cookies as bytes or Latin1 is basically
equivalent, and I don't believe that, for instance, they should ever be
parsed as UTF-8.  Parsing as bytes might avoid an unnecessary
encoding/decoding, but it's all tricky enough that libraries should do it
anyway, and the encoding overhead alone isn't very important.
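
To illustrate why parsing cookies as bytes or as Latin1 is basically equivalent, here is a deliberately simplistic sketch. The cookie value and the two toy parsers are invented for the example; a real library would also handle quoting and attributes.

```python
raw_cookie = b"session=abc123; name=caf\xe9"  # hypothetical Cookie header bytes

def parse_bytes(header: bytes) -> dict:
    # Toy parser working directly on bytes.
    return dict(item.split(b"=", 1) for item in header.split(b"; "))

def parse_latin1(header: str) -> dict:
    # The same toy parser working on text decoded as Latin-1.
    return dict(item.split("=", 1) for item in header.split("; "))

as_bytes = parse_bytes(raw_cookie)
as_text = parse_latin1(raw_cookie.decode("latin-1"))

# Latin-1 maps each byte to exactly one code point, so encoding the text
# result back gives the byte result unchanged.
round_tripped = {
    k.encode("latin-1"): v.encode("latin-1") for k, v in as_text.items()
}
```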

Another example is the Atom Title header (
http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-08.html#rfc.section.8.1.2)
but that's supposed to be Latin1 with RFC2047 encodings, and I don't believe
anyone is proposing that RFC2047 encodings be handled generally at the WSGI
layer (I think CherryPy does or used to handle these, but there were many
objections at least on this list about it, in part due to security
concerns).  A 2047 encoding is like "Title:
=?utf-8?q?stuff-with=-escaping?=".
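
For reference, decoding an RFC 2047 encoded word in an application or library (rather than at the WSGI layer) can be sketched with the standard library's email package. The header value below is a made-up example.

```python
from email.header import decode_header

raw_title = "=?utf-8?q?caf=C3=A9?="  # hypothetical Title header value

# decode_header returns a list of (decoded, charset) pairs; for an
# encoded word the decoded part is bytes and the charset is a string.
decoded_bytes, charset = decode_header(raw_title)[0]
title = decoded_bytes.decode(charset)
```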

Response headers are equivalent to request headers.  Response status is
constrained by the spec to Latin1, and there are no use cases I know of
(even really obscure ones) where it would be necessary to use other
encodings.

And that's it!  HTTP has a fairly finite amount of surface area.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-08-29 Thread Graham Dumpleton
On 30 August 2010 13:07, P.J. Eby  wrote:
> At 11:16 AM 8/30/2010 +1000, Graham Dumpleton wrote:
>>
>> Although I almost begged that if we are going to discuss bytes,
>> compared to text/unicode, that agreement at least first be made about
>> the definition of the bytes leaning option, that request has pretty
>> well fallen on deaf ears.
>
> Did you not see my reply?  I (thought I) answered your question, and I
> actually also suggested that a variation of your unicode proposal might
> work, too.  See:
>
> http://mail.python.org/pipermail/web-sig/2010-August/004545.html

I was purely asking about bytes, what that means to people who want to
push that, and set aside the unicode one for the moment.

There have been others as well in the past who have pushed bytes, but
they haven't said anything about what it means and I really wanted
more input given that in the past the discussions had over the unicode
leaning proposals between us core people have been in part derailed by
these people who sit mostly on the sidelines and start shouting 'I
want bytes instead'. So, I want to give those critics their chance to
confirm what they mean by bytes, else we will keep having them pop up
time and time again when we are trying to discuss other stuff. So it
is the lack of response beyond the usual suspects that I am grumpy
about.

Even in what you mention about bytes you are a bit fuzzy. Having the value
of wsgi.url_scheme be bytes is reasonable, and I have no issue with that
given that other URL components will be bytes as well, but when you
yourself mention keys, you are a bit unsure because of the 'b' plague.
So, there is still no clarity on that point, and if people are going to
keep raising bytes, I would like a better definition of what they are
talking about.

The only other person who has said anything about bytes is Armin but
all that he really said was 'all bytes only'. This isn't much clearer
than when people have in the past said 'bytes everywhere', but in some
cases didn't actually mean keys. This is why I asked that people cut
and paste the definition I gave and change it to exactly what they
meant, so that nobody has to second guess. FWIW, from a separate discussion
I understand that Armin does mean bytes for keys.

So, I was really after that clarity so we can say without confusion that
our starting point from now is that we have two overall proposals and
that they be A and B as defined, with possibly even a C and D if need
be, not even using the labels bytes and unicode. We can then discuss
each in isolation as to whether as defined they would work or not.
From that one or more might die, or might mutate further and actually
become closer to the other option but where all are still valid
options. Either way, people up till now have had this bytes vs unicode
divide stuck in their heads when strictly speaking it isn't
necessarily pure bytes vs pure unicode, but merely a number of
different proposals with certain bits in one case using unicode
instead of bytes.

Given that we have dedicated most time to the unicode leaning
solution, I would like to go and look properly at the bytes leaning
solutions now. That way we have the definitions and also have done the
analysis and when people come along later and say 'bytes everywhere',
we have something proper to refer back to about it.

Anyway, rather than keep arguing the point, and to move forward, let us
perhaps start now with the following definitions and new names to
identify them. We can even go a bit stupid and give each its own code
name so they are in part more memorable. Any next option based on your
suggestions about changing the WHEAT option can be called MAIZE. And
if you think I am going stark raving mad and should be put in a
white jacket and locked up, you could well be right. I am not a happy
camper right now, but that is because of many things besides this WSGI
stuff. :-)

And yes I know about the page that has been just recently put up at:

  http://www.wsgi.org/wsgi/Python_3

From memory when I first read it I wasn't sure that it was
completely accurate, but at least it doesn't now mention mod_python
instead of mod_wsgi which was mighty confusing. We can perhaps merge
the following into that page, i.e., expand the table, and talk more
about the abstract definitions rather than linking it to specific
implementations at this point. We can perhaps then start capturing the
pros and cons against each option in the page rather than losing them
in the email chain.

OPTION : BARLEY

1. The application is passed an instance of a Python dictionary
containing what is referred to as the WSGI environment. All keys in
this dictionary are byte strings.

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
environment, the value of the variable should be a byte string.

3. For the CGI variables contained in the WSGI environment, the values
of the variables are byte strings.

4. The WSGI input stream 'wsgi.input' contained in the WSGI
environment and from which request content is

Re: [Web-SIG] WSGI for Python 3

2010-08-29 Thread P.J. Eby

At 11:16 AM 8/30/2010 +1000, Graham Dumpleton wrote:

> Although I almost begged that if we are going to discuss bytes,
> compared to text/unicode, that agreement at least first be made about
> the definition of the bytes leaning option, that request has pretty
> well fallen on deaf ears.


Did you not see my reply?  I (thought I) answered your question, and 
I actually also suggested that a variation of your unicode proposal 
might work, too.  See:


http://mail.python.org/pipermail/web-sig/2010-August/004545.html



Re: [Web-SIG] WSGI for Python 3

2010-08-29 Thread Graham Dumpleton
On 30 August 2010 11:02, Ian Bicking  wrote:
> Ugh... why are we back at bytes again?

Because no official decision, by way of a vote or even consensus, has
ever been made, the bytes option never goes away.

The problem with bytes, before one even tries to compare it to
text/unicode option, is that there is no clear description of what is
meant by the bytes option. For all I can see, there are potentially
multiple interpretations of what is meant by bytes.

Although I almost begged that if we are going to discuss bytes,
compared to text/unicode, that agreement at least first be made about
the definition of the bytes leaning option, that request has pretty
well fallen on deaf ears. Thus the discussion yet again is going in the
direction of just dithering with a lot of navel gazing and not much
else.

As I brought up almost two years ago, if we are going to make any
progress on this, we are probably going to need a core group of people
nominated who can officially make the decision of what is done based
on a proper vote. This will be the only way there is going to be any
sort of acceptance of a decision. This idea that we can reach a
consensus just isn't working.

Graham

> I don't know of any concrete
> problems with using Latin1 (basically how mod_wsgi works).  It would be nice
> to try out some tricky cases -- cookie parsing, HTTP proxies,
> output-modifying middleware, a few other cases.  But I don't see a reason to
> expect they won't work.  It also doesn't feel particularly *wrong*.  The
> parsed portions of the request and response are mostly ASCII anyway, and the
> exceptions generally require wonky code anyway so a little transcoding isn't
> so bad.
>
> --
> Ian Bicking  |  http://blog.ianbicking.org
>


Re: [Web-SIG] WSGI for Python 3

2010-08-29 Thread Ian Bicking
Ugh... why are we back at bytes again?  I don't know of any concrete
problems with using Latin1 (basically how mod_wsgi works).  It would be nice
to try out some tricky cases -- cookie parsing, HTTP proxies,
output-modifying middleware, a few other cases.  But I don't see a reason to
expect they won't work.  It also doesn't feel particularly *wrong*.  The
parsed portions of the request and response are mostly ASCII anyway, and the
exceptions generally require wonky code anyway so a little transcoding isn't
so bad.
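
The "little transcoding" involved in a Latin1-based environ can be sketched like this. The sample path is invented, and the split into a "server side" and an "application side" is an assumption about where each step would live.

```python
# Bytes as sent by a hypothetical client that used UTF-8 for the path.
wire = "/caf\u00e9".encode("utf-8")

# Server side: decode as Latin-1. Every byte maps to one code point, so this
# never fails, although non-ASCII text looks garbled at this stage.
environ_value = wire.decode("latin-1")

# Application side: recover the bytes and decode with the real encoding.
path = environ_value.encode("latin-1").decode("utf-8")
```

For the mostly ASCII parts of a request the first step is already the final answer; only the exceptions need the second step.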

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-08-28 Thread Georg Brandl
Am 28.08.2010 13:13, schrieb Armin Ronacher:
> Hi,
> 
> On 2010-08-28 1:04 PM, Georg Brandl wrote:
>> Let me just throw in here that it's *NOT* too late to do something about
>> Python 3.2.  It is not even in beta state yet, and I am very willing to
>> introduce the changes to make web programming work again, or even hold
>> up 3.2 for a bit if you need more time.
> Sorry if I was not clear.  I was talking about only wsgiref here.  And 
> for that to be adapted to a possible new WSGI specification we would 
> need more time than you can hold the 3.2 release I think.

That is certainly true :)

Georg


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



Re: [Web-SIG] WSGI for Python 3

2010-08-28 Thread Armin Ronacher

Hi,

On 2010-08-28 1:04 PM, Georg Brandl wrote:

> Let me just throw in here that it's *NOT* too late to do something about
> Python 3.2.  It is not even in beta state yet, and I am very willing to
> introduce the changes to make web programming work again, or even hold
> up 3.2 for a bit if you need more time.

Sorry if I was not clear.  I was talking about only wsgiref here.  And
for that to be adapted to a possible new WSGI specification we would
need more time than you can hold the 3.2 release I think.

> However, someone who actually *does* web programming has to do that, in
> other words, one of you.  All I see is complaints that it will not work
> and one has to forget the stdlib.  That is somewhat sad.

While I am not happy with the decisions of the stdlib for unicode in
some parts, my mail was not related to that.



Regards,
Armin


Re: [Web-SIG] WSGI for Python 3

2010-08-28 Thread Georg Brandl
Am 27.08.2010 01:37, schrieb Armin Ronacher:
> Hi,
> 
> Is there a status update on that now I missed?  Did something decide on 
> bytes for the environment values or are we still unsure about that?
> 
>  From a discussion lately I had with Graham on #pocoo it seems like he 
> lost interest on supporting WSGI on Python 3 for the time being due to 
> lack of interest.
> 
> My personal pet project of actively redesigning WSGI to see if a 
> higher-level protocol would solve the unicode issue better failed and 
> was not worth the effort.
> 
> As I understand Python 3.0/1/2 will be broken for WSGI anyways so we can 
> stop caring about the stdlib.

Let me just throw in here that it's *NOT* too late to do something about
Python 3.2.  It is not even in beta state yet, and I am very willing to
introduce the changes to make web programming work again, or even hold
up 3.2 for a bit if you need more time.

However, someone who actually *does* web programming has to do that, in
other words, one of you.  All I see is complaints that it will not work
and one has to forget the stdlib.  That is somewhat sad.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



Re: [Web-SIG] WSGI for Python 3

2010-08-27 Thread Armin Ronacher

Hi,

On 2010-08-27 6:05 PM, Christoph Zwerschke wrote:
> Btw, another problem with this is that the lower() method does not know
> that it has to use latin1 when lowercasing.
That is not a problem insofar that case insensitive HTTP tokens are 
limited to ASCII only.



Regards,
Armin


Re: [Web-SIG] WSGI for Python 3

2010-08-27 Thread Paul Davis
On Fri, Aug 27, 2010 at 4:04 PM, Robert Brewer  wrote:
> Paul Davis wrote:
>> > Since the major stumbling block, irrespective of other changes,
>> > to any sort of agreement is still bytes vs unicode
>>
>> I ran into this while I was attempting to put together enough code to
>> play with a wsgiref2 that ran on both 2.x and 3.x. As Graham has
>> deftly pointed out, it's a pretty big pain in the rear.
>>
>> Specifically, if we specify that all keys in the environ dictionary
>> are byte strings, then there's a noticeable amount of pain in trying
>> to write code that runs on both platforms. I object to 2to3.py on
>> religious grounds, so when I was implementing this I was doing so with
>> code that would run unmodified on both 2 and 3.
>
> Religion is what gets us into this mess. Pragmatism will get us out. We
> have two options:
>
>  1. Continue to try to write code that runs unmodified on Python 2 and
> 3, or that runs when 2to3 is applied. There is a morass of corner cases
> and state machines that behave differently depending on when you look at
> them lurking here. You can all see where that is getting us: nowhere. By
> the time you all discover how to write a spec that deals with all the
> pain points which 2to3 introduces, Python 2 will be dead and you will
> have wasted your time.
>  2. Write a Python 3 version of your code. Yes, it's more drudge work.
> Suck it up. To ameliorate that, make the Python 3 version the default as
> soon as possible. Deprecate the Python 2 branch. Backport features as
> necessary to the Python 2 branch (just as Python itself has been doing,
> if you notice). If you do that, we can write a WSGI for Python 3 now
> that doesn't suffer from any of the complexities of 2to3.
>
>
> Robert Brewer
> fuman...@aminus.org
>

No. What got us into this mess was the idea that it would be a good idea to
silently type cast unicode objects into bytes. Perhaps I could've been
clearer about avoiding 2to3 though. I wanted to avoid coding any of
its oddities into a reference implementation because as you point out
it's just a source of confusion.

I'd like to point out that the code I posted works on both 2.x and
3.x. It's fairly easy to implement the backwards compatible code in
Python. There's nothing near the level of requiring a
branched/back-port strategy. Not to mention, a branched reference
implementation is a bit of a contradiction in terms. The hard part is
figuring out a specification that doesn't suck when people try to
implement it on multiple interpreters.

Also, I think you're overestimating the rate at which people are going
to be converting to Python 3. I still have people ask for Python 2.4
support. I wouldn't be the least bit surprised if there's a WSGI 3
before we deprecate 2.x support.

HTH,
Paul Davis


Re: [Web-SIG] WSGI for Python 3

2010-08-27 Thread Robert Brewer
Paul Davis wrote:
> > Since the major stumbling block, irrespective of other changes,
> > to any sort of agreement is still bytes vs unicode
>
> I ran into this while I was attempting to put together enough code to
> play with a wsgiref2 that ran on both 2.x and 3.x. As Graham has
> deftly pointed out, it's a pretty big pain in the rear.
> 
> Specifically, if we specify that all keys in the environ dictionary
> are byte strings, then there's a noticeable amount of pain in trying
> to write code that runs on both platforms. I object to 2to3.py on
> religious grounds, so when I was implementing this I was doing so with
> code that would run unmodified on both 2 and 3.

Religion is what gets us into this mess. Pragmatism will get us out. We
have two options:

 1. Continue to try to write code that runs unmodified on Python 2 and
3, or that runs when 2to3 is applied. There is a morass of corner cases
and state machines that behave differently depending on when you look at
them lurking here. You can all see where that is getting us: nowhere. By
the time you all discover how to write a spec that deals with all the
pain points which 2to3 introduces, Python 2 will be dead and you will
have wasted your time.
 2. Write a Python 3 version of your code. Yes, it's more drudge work.
Suck it up. To ameliorate that, make the Python 3 version the default as
soon as possible. Deprecate the Python 2 branch. Backport features as
necessary to the Python 2 branch (just as Python itself has been doing,
if you notice). If you do that, we can write a WSGI for Python 3 now
that doesn't suffer from any of the complexities of 2to3.


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] WSGI for Python 3

2010-08-27 Thread Paul Davis
On Fri, Aug 27, 2010 at 12:17 AM, Graham Dumpleton
 wrote:
> On 27 August 2010 13:45, P.J. Eby  wrote:
>> At 01:37 AM 8/27/2010 +0200, Armin Ronacher wrote:
>>>
>>> Hi,
>>>
>>> Is there a status update on that now I missed?  Did something decide on
>>> bytes for the environment values or are we still unsure about that?
>>
>> To the extent we're "unsure", I think the holdup is simply that nobody has
>> tried doing an all-bytes WSGI implementation -- unless of course you count
>> all our Python 2.x experience as experience with an all-bytes
>> implementation.  ;-)
>>
>> (Of course, that experience won't help us with Python 3 stdlib issues.)
>>
>>
>>> At that point I don't care at all about what is decided on as long as
>>> something is decided.  Can someone please stand up and just do that? :)
>>
>> Essentially the problem right now is that unless such a choice is made,
>> there's little hope of getting the stdlib issues to be resolved, because we
>> can't exactly file bug reports against the stdlib if we don't know what we
>> want it to do.  ;-)
>>
>> My personal inclination is to define WSGI 2 as a bytes-oriented protocol,
>> and then encourage people to port to WSGI 2 before moving to Python 3.
>
> Since the major stumbling block, irrespective of other changes, to any
> sort of agreement is still bytes vs unicode, and where we have a
> reasonably clear definition of what the unicode suggestion is, can we
> please as a first step get a definition of what bytes actually implies
> so everyone knows what we are talking about. I specifically ask this,
> as it isn't clear because people don't explain in detail what they
> mean when they are saying 'bytes'.
>
> Going back to my definition #2 in my blog post from a year ago, I had:
>
> 1. The application is passed an instance of a Python dictionary
> containing what is referred to as the WSGI environment. All keys in
> this dictionary are native strings. For CGI variables, all names are
> going to be ISO-8859-1 and so where native strings are unicode
> strings, that encoding is used for the names of CGI variables
>
> 2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
> environment, the value of the variable should be a native string.
>
> 3. For the CGI variables contained in the WSGI environment, the values
> of the variables are byte strings.
>
> 4. The WSGI input stream 'wsgi.input' contained in the WSGI
> environment and from which request content is read, should yield byte
> strings.
>
> 5. The status line specified by the WSGI application must be a byte string.
>
> 6. The list of response headers specified by the WSGI application must
> contain tuples consisting of two values, where each value is a byte
> string.
>
> 7. The iterable returned by the application and from which response
> content is derived, must yield byte strings.
>
> The points of disagreement I have seen about this are as follows.
>
> For (1), the keys should also be bytes, including names of 'wsgi.' special 
> keys.
>
> For (2), the value of 'wsgi.url_scheme' should be bytes.
>
> So, do you really want bytes absolutely everywhere, or are keys still
> going to be unicode taken as ISO-8859-1?
>
> Note that we are not agreeing to the final solution here, just what
> bytes means in contrast to the unicode option, so we know that we are
> comparing only two options and not many options because people have
> different interpretations of what bytes means.
>
> As contrast, what we generally mean by the unicode option is
> definition #3 from my blog post. That being:
>
> 1. The application is passed an instance of a Python dictionary
> containing what is referred to as the WSGI environment. All keys in
> this dictionary are native strings. For CGI variables, all names are
> going to be ISO-8859-1 and so where native strings are unicode
> strings, that encoding is used for the names of CGI variables
>
> 2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
> environment, the value of the variable should be a native string.
>
> 3. For the CGI variables contained in the WSGI environment, the values
> of the variables are native strings. Where native strings are unicode
> strings, ISO-8859-1 encoding would be used such that the original
> character data is preserved and as necessary the unicode string can be
> converted back to bytes and thence decoded to unicode again using a
> different encoding.
>
> 4. The WSGI input stream 'wsgi.input' contained in the WSGI
> environment and from which request content is read, should yield byte
> strings.
>
> 5. The status line specified by the WSGI application should be a byte
> string. Where native strings are unicode strings, the native string
> type can also be returned in which case it would be encoded as
> ISO-8859-1.
>
> 6. The list of response headers specified by the WSGI application
> should contain tuples consisting of two values, where each value is a
> byte string. Where native strings are unicode strings, the native
> string ty

Re: [Web-SIG] WSGI for Python 3

2010-08-27 Thread Christoph Zwerschke

Am 27.08.2010 18:27 schrieb P.J. Eby:
> At 06:05 PM 8/27/2010 +0200, Christoph Zwerschke wrote:
>> user = 'özkan'.encode('latin1')
>> if user == request.META.get('REMOTE_USER', b'').lower():
>>
>> will not work if the user has logged in as 'Özkan'.
>
> Isn't that a problem with code that does this now?

You mean in Python 2? If the locale is set properly, lower() will 
account for non-ascii. I don't think Python 3 does this with bytes.
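
The Python 3 behaviour in question can be checked with a short sketch. The user name is taken from the example above; everything else is plain standard-library behaviour.

```python
name = "\u00d6zkan".encode("latin-1")  # 'Özkan' as Latin-1 bytes: b'\xd6zkan'

# bytes.lower() only maps ASCII A-Z, so the byte 0xD6 is left untouched.
lowered_bytes = name.lower()

# Lowercasing as text works once the bytes are decoded with the right encoding.
lowered_text = name.decode("latin-1").lower()
lowered_again = lowered_text.encode("latin-1")
```

So a bytes comparison against 'özkan' fails for 'Özkan', while HTTP tokens that are defined as case insensitive are unaffected because they are limited to ASCII.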


-- Christoph


Re: [Web-SIG] WSGI for Python 3

2010-08-27 Thread P.J. Eby

At 02:17 PM 8/27/2010 +1000, Graham Dumpleton wrote:

> Since the major stumbling block, irrespective of other changes, to any
> sort of agreement is still bytes vs unicode, and where we have a
> reasonably clear definition of what the unicode suggestion is, can we
> please as a first step get a definition of what bytes actually implies
> so everyone knows what we are talking about. I specifically ask this,
> as it isn't clear because people don't explain in detail what they
> mean when they are saying 'bytes'.
>
> Going back to my definition #2 in my blog post from a year ago, I had:
>
> 1. The application is passed an instance of a Python dictionary
> containing what is referred to as the WSGI environment. All keys in
> this dictionary are native strings. For CGI variables, all names are
> going to be ISO-8859-1 and so where native strings are unicode
> strings, that encoding is used for the names of CGI variables


FYI, one thing that's changed here is the existence of os.environb in 
Python 3.2, at least on non-Windows OSes.
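
A minimal sketch of reading the environment as bytes, assuming Python 3.2 or later. The variable looked up (PATH) is just an example; the fallback branch is an assumption about what code on Windows might do instead.

```python
import os

if os.supports_bytes_environ:
    # POSIX: os.environb exposes the environment with bytes keys and values.
    raw = os.environb.get(b"PATH", b"")
else:
    # Windows: no os.environb, so encode the text value instead.
    raw = os.fsencode(os.environ.get("PATH", ""))
```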




> 2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
> environment, the value of the variable should be a native string.


Since any meaningful use of this value is going to end up needing to 
be bytes again (e.g. Location headers), and for consistency's sake, I 
lean towards saying this is bytes too.




> 3. For the CGI variables contained in the WSGI environment, the values
> of the variables are byte strings.
>
> 4. The WSGI input stream 'wsgi.input' contained in the WSGI
> environment and from which request content is read, should yield byte
> strings.
>
> 5. The status line specified by the WSGI application must be a byte string.
>
> 6. The list of response headers specified by the WSGI application must
> contain tuples consisting of two values, where each value is a byte
> string.
>
> 7. The iterable returned by the application and from which response
> content is derived, must yield byte strings.
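
To make definition #2 concrete, here is a sketch of what an application and a stand-in for the server might look like under it. This follows only the seven points quoted above (native-string keys, byte-string CGI values, bytes for status, headers and body); the environ contents and the tiny driver are invented for the example, and it is not meant to match any existing server.

```python
def application(environ, start_response):
    # Point 1: keys are native strings. Point 3: CGI values are bytes.
    path = environ.get("PATH_INFO", b"/")
    body = b"You requested " + path + b"\n"
    # Points 5 and 6: status line and header values are byte strings.
    start_response(b"200 OK", [
        (b"Content-Type", b"text/plain"),
        (b"Content-Length", str(len(body)).encode("ascii")),
    ])
    # Point 7: the returned iterable yields byte strings.
    return [body]

# Hypothetical stand-in for a server, just enough to call the application.
captured = {}

def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

environ = {
    "wsgi.url_scheme": "http",   # point 2: native string under this definition
    "REQUEST_METHOD": b"GET",
    "PATH_INFO": b"/hello",
}
result = b"".join(application(environ, start_response))
```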

The points of disagreement I have seen about this is are as follows.

For (1), the keys should also be bytes, including names of 'wsgi.' 
special keys.


For (2), the value of 'wsgi.url_scheme' should be bytes.

So, do you really want bytes absolutely everywhere, or are keys still
going to be unicode taken as ISO-8859-1?


If we follow the example of os.environb, then the keys have to be bytes also.

However, I can already see that the big problem with all of this is 
that WSGI code is going to be littered with a plague of "b"s hanging 
off the front of every string literal, and that 2to3 is probably not 
going to handle it correctly.  Making the keys bytes as well just 
multiplies the problem.
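To make the "plague of b's" concrete, here is a minimal sketch of an application written against a bytes-everywhere environ (bytes keys as well as bytes values). No published WSGI specification defines this variant; the names bytes_app and call, and the hand-built environ, are assumptions for illustration only.

```python
# Hypothetical "bytes everywhere" variant: keys AND values are bytes.
# This is not any published WSGI spec, only an illustration of the literal noise.

def bytes_app(environ, start_response):
    # Every key, default, and comparison needs a b'' prefix.
    method = environ.get(b'REQUEST_METHOD', b'GET')
    agent = environ.get(b'HTTP_USER_AGENT', b'').lower()
    scheme = environ.get(b'wsgi.url_scheme', b'http')

    body = b'method=' + method + b' scheme=' + scheme
    if b'msie' in agent:
        body += b' (MSIE)'

    start_response(b'200 OK', [
        (b'Content-Type', b'text/plain'),
        (b'Content-Length', str(len(body)).encode('ascii')),
    ])
    return [body]


def call(app, environ):
    """Tiny illustrative driver: collects status, headers and body."""
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = headers

    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


status, headers, body = call(bytes_app, {
    b'REQUEST_METHOD': b'GET',
    b'HTTP_USER_AGENT': b'Mozilla/4.0 (compatible; MSIE 6.0)',
    b'wsgi.url_scheme': b'https',
})
```

With native-string keys, only the values and outputs would carry the prefix, which is roughly half of the literals above.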





Note that we are not agreeing to the final solution here, just what
bytes means in contrast to the unicode option, so we know that we are
comparing only two options and not many options because people have
different interpretations of what bytes means.

As contrast, what we generally mean by the unicode option is
definition #3 from my blog post. That being:

1. The application is passed an instance of a Python dictionary
containing what is referred to as the WSGI environment. All keys in
this dictionary are native strings. For CGI variables, all names are
going to be ISO-8859-1 and so where native strings are unicode
strings, that encoding is used for the names of CGI variables

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
environment, the value of the variable should be a native string.

3. For the CGI variables contained in the WSGI environment, the values
of the variables are native strings. Where native strings are unicode
strings, ISO-8859-1 encoding would be used such that the original
character data is preserved and as necessary the unicode string can be
converted back to bytes and thence decoded to unicode again using a
different encoding.

4. The WSGI input stream 'wsgi.input' contained in the WSGI
environment and from which request content is read, should yield byte
strings.

5. The status line specified by the WSGI application should be a byte
string. Where native strings are unicode strings, the native string
type can also be returned in which case it would be encoded as
ISO-8859-1.

6. The list of response headers specified by the WSGI application
should contain tuples consisting of two values, where each value is a
byte string. Where native strings are unicode strings, the native
string type can also be returned in which case it would be encoded as
ISO-8859-1.

7. The iterable returned by the application and from which response
content is derived, should yield byte strings. Where native strings
are unicode strings, the native string type can also be returned in
which case it would be encoded as ISO-8859-1.

Even though we call it unicode, it actually has bytes in places as well.
The key issue over bytes vs unicode has been the values in the
dictionary, but as pointed out above, it is not clear whether, for the
bytes option, we are talking about bytes for keys as well and for the
value of 'wsgi.url_scheme'.
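
The ISO-8859-1 round trip described in point 3 of the unicode option can be sketched as follows. This is a minimal illustration, assuming the client actually sent UTF-8; the variable names are made up for the example.

```python
# Raw bytes as a server might receive them: 'caf\u00e9' (cafe with an acute e)
# encoded by the client as UTF-8.
raw = 'caf\u00e9'.encode('utf-8')            # b'caf\xc3\xa9'

# "Unicode option": the server decodes as ISO-8859-1, which maps each byte
# 0x00-0xFF to the code point with the same number, so nothing is lost.
environ_value = raw.decode('iso-8859-1')     # looks like mojibake, but is lossless

# The application recovers the original bytes and applies the real encoding.
recovered = environ_value.encode('iso-8859-1')
text = recovered.decode('utf-8')

# The same holds for every possible byte value, not just this sample.
all_bytes = bytes(range(256))
round_tripped = all_bytes.decode('iso-8859-1').encode('iso-8859-1')
```

The intermediate string is only a transport format; it is not meaningful text until the application re-decodes it with the encoding it believes the client used.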


The

Re: [Web-SIG] WSGI for Python 3

2010-08-27 Thread P.J. Eby

At 06:05 PM 8/27/2010 +0200, Christoph Zwerschke wrote:

 For instance,

user = 'özkan'.encode('latin1')
if user in request.META.get('REMOTE_USER', b'').lower():

will not work if the user has logged in as 'Özkan'.


Isn't that a problem with code that does this now? 


___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] WSGI for Python 3

2010-08-27 Thread Christoph Zwerschke

Am 27.08.2010 16:22 schrieb Armin Ronacher:
> For an all bytes approach a tool would have to recognize that this is
> from a WSGI environment and change the code to this:
>
> if b'msie' in request.META.get('HTTP_USER_AGENT', b'').lower():

Btw, another problem with this is that the lower() method does not know 
that it has to use latin1 when lowercasing. For instance,


user = 'özkan'.encode('latin1')
if user in request.META.get('REMOTE_USER', b'').lower():

will not work if the user has logged in as 'Özkan'.
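
A minimal sketch of the effect, assuming (as in the example above) that REMOTE_USER arrives latin1-encoded. The variable names are illustrative only.

```python
stored = '\u00f6zkan'.encode('latin1')        # 'özkan' -> b'\xf6zkan'
remote_user = '\u00d6zkan'.encode('latin1')   # 'Özkan' -> b'\xd6zkan'

# bytes.lower() only maps ASCII A-Z, so byte 0xD6 is left untouched
# and the substring test fails.
bytes_match = stored in remote_user.lower()

# Decoding first lets str.lower() apply Unicode case rules.
text_match = stored.decode('latin1') in remote_user.decode('latin1').lower()
```

Here bytes_match is False while text_match is True, so a case-insensitive comparison on non-ASCII names has to happen on text, whichever type the environ carries.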

-- Christoph


Re: [Web-SIG] WSGI for Python 3

2010-08-27 Thread Armin Ronacher

Hi,

On 2010-08-27 5:45 AM, P.J. Eby wrote:
> At 01:37 AM 8/27/2010 +0200, Armin Ronacher wrote:
> To the extent we're "unsure", I think the holdup is simply that nobody
> has tried doing an all-bytes WSGI implementation -- unless of course you
> count all our Python 2.x experience as experience with an all-bytes
> implementation. ;-)
I have a private branch of Werkzeug that is all bytes only.  Untested 
unfortunately because porting the testsuite over is a huge task on its 
own and not all parts work properly yet.  But it's okayish.


Werkzeug does not use anything from the standard library in the latest 
version except urljoin from the url parse package which I would have to 
rewrite for my little experiment.  In my attempt to port it I'm doing 
the encode/decode dance in a wrapper function.


> In theory, if we did it correctly it could actually minimize the porting
> pain for Python 3.
>
> In practice, I'm not sure how to do this, as I lack experience with 2to3
> at the moment, or any production experience with Python 3 whatsoever.
The big problem for me is that we *will* have to run 2to3 because 
WSGI sometimes leaks from the framework to the application.  This is 
especially true for Django where request.META is directly passed as WSGI 
environment to the user and no accessor functions exist.  So everybody 
is parsing the headers themselves there.


So when frameworks are starting to support any version of WSGI on Python 
3 they will also have to ship custom 2to3 fixers that add tiny shims for 
decoding/encoding either side of comparisons etc.


For example it's pretty common to see stuff like this:

if 'msie' in request.META.get('HTTP_USER_AGENT', '').lower():

For an all bytes approach a tool would have to recognize that this is 
from a WSGI environment and change the code to this:


if b'msie' in request.META.get('HTTP_USER_AGENT', b'').lower():

That's not impossible to do and in my mind the right decision, but it 
also means extra work to be done.  And if extra work is required when 
porting a framework and application over to Python 3 we could reward the 
people doing that with improvements of the specification itself.
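
As a rough idea of the kind of tiny shim such a fixer might insert, here is a sketch. The helper name env_text is hypothetical and is not part of 2to3, Django, or Werkzeug; it only assumes that values are either bytes or native strings.

```python
def env_text(environ, key, default='', encoding='iso-8859-1'):
    """Hypothetical shim: return an environ value as text, whether the
    server supplied bytes or a native string."""
    value = environ.get(key, default)
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def is_msie(environ):
    # The comparison stays in text, so no b'' literals are needed here.
    return 'msie' in env_text(environ, 'HTTP_USER_AGENT').lower()


from_bytes = is_msie({'HTTP_USER_AGENT': b'Mozilla/4.0 (compatible; MSIE 6.0)'})
from_text = is_msie({'HTTP_USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 6.0)'})
missing = is_msie({})
```

A fixer would then only need to rewrite the environ lookup, not every literal on the other side of the comparison.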


I'm thinking about improving file_wrapper (so that middlewares can 
either detect that a file_wrapper is here and they should not consume 
the app iter, or just replacing it with a custom header), the input 
stream etc.



Regards,
Armin


Re: [Web-SIG] WSGI for Python 3

2010-08-26 Thread Graham Dumpleton
On 27 August 2010 13:45, P.J. Eby  wrote:
> At 01:37 AM 8/27/2010 +0200, Armin Ronacher wrote:
>>
>> Hi,
>>
>> Is there a status update on this that I missed?  Did someone decide on
>> bytes for the environment values or are we still unsure about that?
>
> To the extent we're "unsure", I think the holdup is simply that nobody has
> tried doing an all-bytes WSGI implementation -- unless of course you count
> all our Python 2.x experience as experience with an all-bytes
> implementation.  ;-)
>
> (Of course, that experience won't help us with Python 3 stdlib issues.)
>
>
>> At that point I don't care at all about what is decided on as long as
>> something is decided.  Can someone please stand up and just do that? :)
>
> Essentially the problem right now is that unless such a choice is made,
> there's little hope of getting the stdlib issues to be resolved, because we
> can't exactly file bug reports against the stdlib if we don't know what we
> want it to do.  ;-)
>
> My personal inclination is to define WSGI 2 as a bytes-oriented protocol,
> and then encourage people to port to WSGI 2 before moving to Python 3.

Since the major stumbling block to any sort of agreement, irrespective
of other changes, is still bytes vs unicode, and since we have a
reasonably clear definition of what the unicode suggestion is, can we
please as a first step get a definition of what bytes actually implies,
so everyone knows what we are talking about? I specifically ask this
because it isn't clear: people don't explain in detail what they
mean when they say 'bytes'.

Going back to my definition #2 in my blog post from a year ago, I had:

1. The application is passed an instance of a Python dictionary
containing what is referred to as the WSGI environment. All keys in
this dictionary are native strings. For CGI variables, all names are
going to be ISO-8859-1 and so where native strings are unicode
strings, that encoding is used for the names of CGI variables

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
environment, the value of the variable should be a native string.

3. For the CGI variables contained in the WSGI environment, the values
of the variables are byte strings.

4. The WSGI input stream 'wsgi.input' contained in the WSGI
environment and from which request content is read, should yield byte
strings.

5. The status line specified by the WSGI application must be a byte string.

6. The list of response headers specified by the WSGI application must
contain tuples consisting of two values, where each value is a byte
string.

7. The iterable returned by the application and from which response
content is derived, must yield byte strings.

The points of disagreement I have seen about this are as follows.

For (1), the keys should also be bytes, including names of 'wsgi.' special keys.

For (2), the value of 'wsgi.url_scheme' should be bytes.

So, do you really want bytes absolutely everywhere, or are keys still
going to be unicode taken as ISO-8859-1?

Note that we are not agreeing to the final solution here, just what
bytes means in contrast to the unicode option, so we know that we are
comparing only two options and not many options because people have
different interpretations of what bytes means.

As contrast, what we generally mean by the unicode option is
definition #3 from my blog post. That being:

1. The application is passed an instance of a Python dictionary
containing what is referred to as the WSGI environment. All keys in
this dictionary are native strings. For CGI variables, all names are
going to be ISO-8859-1 and so where native strings are unicode
strings, that encoding is used for the names of CGI variables

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
environment, the value of the variable should be a native string.

3. For the CGI variables contained in the WSGI environment, the values
of the variables are native strings. Where native strings are unicode
strings, ISO-8859-1 encoding would be used such that the original
character data is preserved and as necessary the unicode string can be
converted back to bytes and thence decoded to unicode again using a
different encoding.

4. The WSGI input stream 'wsgi.input' contained in the WSGI
environment and from which request content is read, should yield byte
strings.

5. The status line specified by the WSGI application should be a byte
string. Where native strings are unicode strings, the native string
type can also be returned in which case it would be encoded as
ISO-8859-1.

6. The list of response headers specified by the WSGI application
should contain tuples consisting of two values, where each value is a
byte string. Where native strings are unicode strings, the native
string type can also be returned in which case it would be encoded as
ISO-8859-1.

7. The iterable returned by the application and from which response
content is derived, should yield byte strings. Where native strings
are unicode strings, t

Re: [Web-SIG] WSGI for Python 3

2010-08-26 Thread P.J. Eby

At 01:37 AM 8/27/2010 +0200, Armin Ronacher wrote:

Hi,

Is there a status update on this that I missed?  Did someone decide 
on bytes for the environment values or are we still unsure about that?


To the extent we're "unsure", I think the holdup is simply that 
nobody has tried doing an all-bytes WSGI implementation -- unless of 
course you count all our Python 2.x experience as experience with an 
all-bytes implementation.  ;-)


(Of course, that experience won't help us with Python 3 stdlib issues.)


At that point I don't care at all about what is decided on as long 
as something is decided.  Can someone please stand up and just do that? :)


Essentially the problem right now is that unless such a choice is 
made, there's little hope of getting the stdlib issues to be 
resolved, because we can't exactly file bug reports against the 
stdlib if we don't know what we want it to do.  ;-)


My personal inclination is to define WSGI 2 as a bytes-oriented 
protocol, and then encourage people to port to WSGI 2 before moving 
to Python 3.


In theory, if we did it correctly it could actually minimize the 
porting pain for Python 3.


In practice, I'm not sure how to do this, as I lack experience with 
2to3 at the moment, or any production experience with Python 3 whatsoever.




Re: [Web-SIG] WSGI for Python 3

2010-08-26 Thread Armin Ronacher

Hi,

Is there a status update on this that I missed?  Did someone decide on 
bytes for the environment values or are we still unsure about that?


From a discussion I had lately with Graham on #pocoo, it seems he has 
lost interest in supporting WSGI on Python 3 for the time being due to 
lack of interest.


My personal pet project of actively redesigning WSGI to see if a 
higher-level protocol would solve the unicode issue better failed and 
was not worth the effort.


As I understand Python 3.0/1/2 will be broken for WSGI anyways so we can 
stop caring about the stdlib.


CherryPy seems to be the only system currently with an actively 
maintained Python 3 version of WSGI which from my understanding is based 
on unicode and bytes, where unicode is seen as latin1.


At that point I don't care at all about what is decided on as long as 
something is decided.  Can someone please stand up and just do that? :)



Regards,
Armin


Re: [Web-SIG] WSGI for Python 3

2010-07-20 Thread Graham Dumpleton
On Tuesday, July 20, 2010, Etienne Robillard  wrote:
> Sorry to disagree. I don't think I've misunderstood any comments in this
> thread.
> At least some (encoding) issues seem to be happening on Windows.

Can you please then point out which specific issue you are talking about?

The only Windows reference in this discussion that I recollect is my
own reference to it as part of an extended example about the fact that
the server is what ultimately dictates how any characters, including %
encodings, in the SCRIPT_NAME are represented. This is because the
server derives that part of the URL and not the WSGI application. That
underlying issue isn't Windows specific however.

Graham

> The point I attempted to make was that WSGI 2 could fix the chicken and
> egg problem. Python 3 is not a solution but part of the problem, which
> is why a script could be written to port WSGI 1 apps to WSGI 2,
> assuming such a spec exists and stipulates how to parse HTTP headers
> in Python 3...
>
> Regards,
>
> Etienne
>
> Graham Dumpleton wrote:
>
>   On Tuesday, July 20, 2010, Etienne Robillard 
>   wrote:
>
> AFAICT, the main difference is that under a
> bytes-only regime, the changes should be more consistent/mechanical, i.e.,
> able to be performed by relatively superficial code inspection.
>
>
>
> The problem in all these discussions is that practically no one has
> been prepared to actually sit down and attempt to migrate any
> significant code over to any of these proposals and Python 3.0.
>
> The only notable attempt is the work Robert Brewer did with CherryPy.
> Ultimately though I don't think the CherryPy case tells us much as it
> simply translates the interface into an internal way of doing things.
> The true litmus test will be the conversion of any framework which
> keeps the WSGI interface exposed, with it being used as a means of
> composing together components to make a stack.
>
> Until someone has done that we have absolutely no evidence one way or
> the other as to what proposal is easier or even viable given potential
> shortcomings, or otherwise, in the Python language and standard
> libraries.
>
> It is a chicken and egg problem though in that I would say practically
> everyone doesn't want to do anything until the WSGI specification has
> been updated as they don't want to waste their time. You can't though
> update the specification without truly knowing whether a particular
> approach will work and to do that you have no choice but to actually
> try it.
>
>
> Hi Graham et al,
>
> One could maybe write a migration app for porting
> WSGI 1 apps to WSGI 2, in the same way 2to3.py was written.
>
> That's how at least I hoped to migrate notmm to Python 3. A switch
> could also be used to enable/disable bytes or text-mode only for HTTP
> header parsing...
>
> Are there no such tools yet ready to slowly start moving ahead with
> WSGI 2? I recognize it's a chicken and egg problem but I don't think
> it's necessary for framework authors to migrate to Python 3 in an
> attempt to solve mystery encoding errors affecting Windows platforms...
>
>
>
> The issues are not Windows specific. You are misunderstanding past
> comments if you believe that.
>
> The purpose of actually trying it is to work out how viable bytes
> everywhere and/or users dealing with % encoding is. If dealing with
> bytes everywhere proves to be easy then great, going that way may be
> the best idea. If it is a PITA, as some have said dealing with bytes is
> in Python 3.0, then we will know rather than it being speculation at
> this point.
>
> Graham
>
>
>
> An easy-to-follow roadmap to WSGI 2 and writing related development
> tools should be a more effective way to port frameworks (to WSGI 2)
> and stick with Python 2 if they so wish! ;-)
>
> my 2 cents,
>
> E
> --
> Etienne Robillard
> Green Tea Hackers Club
>
> E-mail: e...@gthcfoundation.org
> Work phone: 1 (514) 962-7703
> Website:https://gthc.org/
>
> During times of universal deceit, telling the truth becomes a revolutionary 
> act. -- George Orwell


Re: [Web-SIG] WSGI for Python 3

2010-07-20 Thread Graham Dumpleton
On Tuesday, July 20, 2010, Etienne Robillard  wrote:
> AFAICT, the main difference is that under a
> bytes-only regime, the changes should be more consistent/mechanical, i.e.,
> able to be performed by relatively superficial code inspection.
>
>
>
> The problem in all these discussions is that practically no one has
> been prepared to actually sit down and attempt to migrate any
> significant code over to any of these proposals and Python 3.0.
>
> The only notable attempt is the work Robert Brewer did with CherryPy.
> Ultimately though I don't think the CherryPy case tells us much as it
> simply translates the interface into an internal way of doing things.
> The true litmus test will be the conversion of any framework which
> keeps the WSGI interface exposed, with it being used as a means of
> composing together components to make a stack.
>
> Until someone has done that we have absolutely no evidence one way or
> the other as to what proposal is easier or even viable given potential
> shortcomings, or otherwise, in the Python language and standard
> libraries.
>
> It is a chicken and egg problem though in that I would say practically
> everyone doesn't want to do anything until the WSGI specification has
> been updated as they don't want to waste their time. You can't though
> update the specification without truly knowing whether a particular
> approach will work and to do that you have no choice but to actually
> try it.
>
>
> Hi Graham et al,
>
> One could maybe write a migration app for porting
> WSGI 1 apps to WSGI 2, in the same way 2to3.py was written.
>
> That's how at least I hoped to migrate notmm to Python 3. A switch
> could also be used to enable/disable bytes or text-mode only for HTTP
> header parsing...
>
> Are there no such tools yet ready to slowly start moving ahead with
> WSGI 2? I recognize it's a chicken and egg problem but I don't think
> it's necessary for framework authors to migrate to Python 3 in an
> attempt to solve mystery encoding errors affecting Windows platforms...

The issues are not Windows specific. You are misunderstanding past
comments if you believe that.

The purpose of actually trying it is to work out how viable bytes
everywhere and/or users dealing with % encoding is. If dealing with
bytes everywhere proves to be easy then great, going that way may be
the best idea. If it is a PITA, as some have said dealing with bytes is
in Python 3.0, then we will know rather than it being speculation at
this point.

Graham

> An easy-to-follow roadmap to WSGI 2 and writing related development
> tools should be a more effective way to port frameworks (to WSGI 2)
> and stick with Python 2 if they so wish! ;-)
>
> my 2 cents,
>
> E


Re: [Web-SIG] WSGI for Python 3

2010-07-19 Thread Paul Davis
On Tue, Jul 20, 2010 at 12:37 AM, Graham Dumpleton
 wrote:
> On 19 July 2010 03:19, P.J. Eby  wrote:
>> At 01:01 PM 7/18/2010 +1000, Graham Dumpleton wrote:
>>>
>>> This is on the basis that if people are going to have to rewrite their
>>> code
>>> a fair bit to handle bytes everywhere,
>>
>> What do you mean by "rewrite their code a fair bit", and who is it that you
>> think will have to do this?
>> Or, more precisely, how is that any different from the text or
>> text-and-bytes proposals?
>
> My comments are based on the mood I have got from listening to
> discussions here on this list and discussions in other forums and irc
> channels. To me there appears to be a tendency towards people thinking
> that having bytes everywhere will be harder to deal with than the text
> proposal.
>
>> AFAICT, the main difference is that under a
>> bytes-only regime, the changes should be more consistent/mechanical, i.e.,
>> able to be performed by relatively superficial code inspection.
>
> The problem in all these discussions is that practically no one has
> been prepared to actually sit down and attempt to migrate any
> significant code over to any of these proposals and Python 3.0.
>
> The only notable attempt is the work Robert Brewer did with CherryPy.
> Ultimately though I don't think the CherryPy case tells us much as it
> simply translates the interface into an internal way of doing things.
> The true litmus test will be the conversion of any framework which
> keeps the WSGI interface exposed, with it being used as a means of
> composing together components to make a stack.
>
> Until someone has done that we have absolutely no evidence one way or
> the other as to what proposal is easier or even viable given potential
> shortcomings, or otherwise, in the Python language and standard
> libraries.
>
> It is a chicken and egg problem though in that I would say practically
> everyone doesn't want to do anything until the WSGI specification has
> been updated as they don't want to waste their time. You can't though
> update the specification without truly knowing whether a particular
> approach will work and to do that you have no choice but to actually
> try it.
>
> And before you argue that the hosting mechanisms haven't been there to
> do that I will point out that mod_wsgi specifically implemented a way
> of being able to selectively say whether bytes or text was passed
> through. That code for bytes support sat there for six months or more
> and there was zero interest expressed to me by anyone in using it as a
> basis of some actual attempts at migrating existing code as a test. In
> the end it got thrown out due to that lack of interest and due to it
> holding up a new release of mod_wsgi.
>
> Distinct from mod_wsgi, it also wouldn't be that hard for interested
> people to modify wsgiref to implement the different proposals. I
> stress again that no one seems prepared to do that and again even if
> it was done, who is then going to try and use it.
>
> Thus we all just sit here on the fence waiting for others to do
> something, pushing our particular ideas and occasionally flip flopping
> between those ideas as well.
>
> Finally and for the record, I will not be modifying mod_wsgi to change
> it in any way now until I see a separate proof of concept WSGI server
> and a decent sized framework ported to it. So yes I am going to sit on
> the fence as well, but that is because I have been burned in the past
> in putting in effort on this only to see it go nowhere. I am not going
> to waste my time again like that.
>
> Graham

Just a quick note. I've started working on a project to try and get a
version of wsgi running on 2.x and 3.x. I've been needing a reason to
start using 3.1 for some time and this thread has managed to spur me
into action.

To be clear, I'm coming at this from the point of view that as long as
there are breaking changes, I might as well make things really broken.
So I'll be incorporating ideas from [1] as well as other bits of
trivia I've picked up. I realize this will lower the probability that
anything comes of this work, but I reckon it'll at least be some code
to discuss.

My current plan is to get a reference implementation with some tests
that runs on 2.x and 3.x. Once I get there I'll try porting WebOb [2]
and maybe Django [3] (depending on the progress of its port [4]). If I
get that far I'll probably make a fork of Gunicorn [5] so that there's
a whole stack that runs on both 2.x and 3.x.

Optimistically, I'd like to have enough code to show the reference
implementation and tests by this weekend. Although, I'm still learning
3.x differences and workarounds so I could fail miserably.

Paul J. Davis

[1] http://wsgi.org/wsgi/WSGI_2.0
[2] http://pythonpaste.org/

Re: [Web-SIG] WSGI for Python 3

2010-07-19 Thread Graham Dumpleton
On 19 July 2010 03:19, P.J. Eby  wrote:
> At 01:01 PM 7/18/2010 +1000, Graham Dumpleton wrote:
>>
>> This is on the basis that if people are going to have to rewrite their
>> code
>> a fair bit to handle bytes everywhere,
>
> What do you mean by "rewrite their code a fair bit", and who is it that you
> think will have to do this?
> Or, more precisely, how is that any different from the text or
> text-and-bytes proposals?

My comments are based on the mood I have got from listening to
discussions here on this list and discussions in other forums and irc
channels. To me there appears to be a tendency towards people thinking
that having bytes everywhere will be harder to deal with than the text
proposal.

> AFAICT, the main difference is that under a
> bytes-only regime, the changes should be more consistent/mechanical, i.e.,
> able to be performed by relatively superficial code inspection.

The problem in all these discussions is that practically no one has
been prepared to actually sit down and attempt to migrate any
significant code over to any of these proposals and Python 3.0.

The only notable attempt is the work Robert Brewer did with CherryPy.
Ultimately though I don't think the CherryPy case tells us much as it
simply translates the interface into an internal way of doing things.
The true litmus test will be the conversion of any framework which
keeps the WSGI interface exposed, with it being used as a means of
composing together components to make a stack.

Until someone has done that we have absolutely no evidence one way or
the other as to what proposal is easier or even viable given potential
shortcomings, or otherwise, in the Python language and standard
libraries.

It is a chicken and egg problem though in that I would say practically
everyone doesn't want to do anything until the WSGI specification has
been updated as they don't want to waste their time. You can't though
update the specification without truly knowing whether a particular
approach will work and to do that you have no choice but to actually
try it.

And before you argue that the hosting mechanisms haven't been there to
do that I will point out that mod_wsgi specifically implemented a way
of being able to selectively say whether bytes or text was passed
through. That code for bytes support sat there for six months or more
and there was zero interest expressed to me by anyone in using it as a
basis of some actual attempts at migrating existing code as a test. In
the end it got thrown out due to that lack of interest and due to it
holding up a new release of mod_wsgi.

Distinct from mod_wsgi, it also wouldn't be that hard for interested
people to modify wsgiref to implement the different proposals. I
stress again that no one seems prepared to do that and again even if
it was done, who is then going to try and use it.
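
Short of modifying wsgiref itself, one way to experiment is a wrapper that presents a different convention to the application. The following is only a sketch under stated assumptions: the wrapper name bytes_values is hypothetical, it implements the "native keys, bytes CGI values, bytes status and headers" reading of the bytes option, and it assumes the underlying server hands over and expects native strings. Only setup_testing_defaults comes from wsgiref.

```python
from wsgiref.util import setup_testing_defaults


def bytes_values(app):
    """Hypothetical experiment wrapper: present CGI-style values to `app`
    as bytes, and translate its bytes status/headers back to native
    strings for a server that expects native strings."""
    def wrapper(environ, start_response):
        converted = {}
        for key, value in environ.items():
            # Leave 'wsgi.*' entries (streams, flags, url_scheme) untouched.
            if isinstance(value, str) and not key.startswith('wsgi.'):
                value = value.encode('iso-8859-1')
            converted[key] = value

        def bytes_start_response(status, headers, exc_info=None):
            native_headers = [(name.decode('iso-8859-1'),
                               val.decode('iso-8859-1'))
                              for name, val in headers]
            return start_response(status.decode('iso-8859-1'),
                                  native_headers, exc_info)

        return app(converted, bytes_start_response)
    return wrapper


def demo_app(environ, start_response):
    # Written against the bytes-values convention: native key, bytes value.
    body = b'path=' + environ['PATH_INFO']
    start_response(b'200 OK', [(b'Content-Type', b'text/plain')])
    return [body]


environ = {}
setup_testing_defaults(environ)   # fills in a minimal native-string environ
seen = {}


def record(status, headers, exc_info=None):
    seen['status'], seen['headers'] = status, headers


result = b''.join(bytes_values(demo_app)(environ, record))
```

Swapping the conversion rules inside the wrapper would let the same test application be tried against each proposal without touching the server.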

Thus we all just sit here on the fence waiting for others to do
something, pushing our particular ideas and occasionally flip flopping
between those ideas as well.

Finally and for the record, I will not be modifying mod_wsgi to change
it in any way now until I see a separate proof of concept WSGI server
and a decent sized framework ported to it. So yes I am going to sit on
the fence as well, but that is because I have been burned in the past
in putting in effort on this only to see it go nowhere. I am not going
to waste my time again like that.

Graham


Re: [Web-SIG] WSGI for Python 3

2010-07-19 Thread Graham Dumpleton
Go back through my blog and read some of the posts there so you have
some of the history. Recent discussions build on some of the stuff
there and I don't think anyone has the time to keep explaining all
this to every new person who comes along.

Graham

On Monday, July 19, 2010, Aaron Watters  wrote:
> I'm still in denial regarding Python 3 generally speaking,
> but it looks like something important is going on here.  Could
> someone summarize the main points (intelligible to a Python 2
> troglodyte)?
>
> thanks in advance,  -- Aaron Watters
>
> ===
> % man less
> less is more.
>


Re: [Web-SIG] WSGI for Python 3

2010-07-19 Thread Aaron Watters
I'm still in denial regarding Python 3 generally speaking,
but it looks like something important is going on here.  Could
someone summarize the main points (intelligible to a Python 2
troglodyte)?

thanks in advance,  -- Aaron Watters

===
% man less
less is more.


Re: [Web-SIG] WSGI for Python 3

2010-07-18 Thread P.J. Eby

At 01:01 PM 7/18/2010 +1000, Graham Dumpleton wrote:

> This is on the basis that if people are going to have to rewrite their code
> a fair bit to handle bytes everywhere,

What do you mean by "rewrite their code a fair bit", and who is it that
you think will have to do this?


Or, more precisely, how is that any different from the text or 
text-and-bytes proposals?  AFAICT, the main difference is that under 
a bytes-only regime, the changes should be more 
consistent/mechanical, i.e., able to be performed by relatively 
superficial code inspection.




> My personal opinion is that if you are going to go bytes everywhere,
> then you may as well throw out the complete WSGI specification as it
> stands now and fix all the other problems with the specification.


That may not be a bad idea; I'm certainly in favor of going ahead and 
ditching start_response/write while we're at it.  The requirement to 
change both the entry and exit points to match the calling convention 
also seems to provide an ideal opportunity to insert any necessary 
encoding or decoding operations.




Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Chris McDonough
On Fri, 2010-07-16 at 23:38 -0500, Ian Bicking wrote:
> On Fri, Jul 16, 2010 at 9:43 PM, Chris McDonough 
> wrote:
> 
> > > Nah, not nearly that hard:
> > >
> > > path_info = urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')
> > >
> > > I don't see the problem?  If you want to distinguish %2f from /, then
> > > you'll do it slightly differently, like:
> > >
> > > path_parts = [
> > >     urllib.parse.unquote_to_bytes(p).decode('UTF-8')
> > >     for p in environ['wsgi.raw_path_info'].split('/')]
> > >
> > > This second recipe is impossible to do currently with WSGI.
> > >
> > > So... before jumping to conclusions, what's the hard part with using
> > > text?
> >
> > It's extremely hard to swallow Python 3's current disregard for the
> > primacy of bytes at I/O boundaries.  I'm trying, but I can't help but
> > feel that the existence of an API like "unquote_to_bytes" is more
> > symptom treatment than solution.  Of course something that unquotes a
> > URL segment unquotes it into bytes; it's the only sane default because
> > URL segments found in URLs on the internet are bytes.
> 
> Yes, URL quoted strings should decode to bytes, though arguably it is
> reasonable to also use the very reasonable UTF-8 default that
> urllib.parse.quote/unquote uses.  So it's really just a question of
> names: should that name be quote_to_string or quote_to_bytes?  Which
> honestly... whatever.
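
A rough sketch of the two recipes quoted above, for anyone following
along. It assumes an environ that carries a 'wsgi.raw_path_info' key
holding the still-URL-encoded path; that key is only a proposal from
this thread, not part of any published WSGI specification, and the
sample path is made up for illustration.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical environ: 'wsgi.raw_path_info' is the key proposed in this
# thread, not something a real server is known to provide.
environ = {"wsgi.raw_path_info": "/docs/a%2Fb/caf%C3%A9"}

# Recipe 1: unquote the whole path at once. After this step an escaped
# slash (%2F) can no longer be told apart from a literal '/'.
path_info = unquote_to_bytes(environ["wsgi.raw_path_info"]).decode("UTF-8")

# Recipe 2: split on the literal '/' first, then unquote each segment,
# so an escaped slash stays inside the segment it belongs to.
path_parts = [
    unquote_to_bytes(p).decode("UTF-8")
    for p in environ["wsgi.raw_path_info"].split("/")
]

print(path_info)
print(path_parts)
```

The second form only works when the raw, still-encoded path is
available, which is the point being argued in the quoted text.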

After some careful consideration, I realize I'm only able to offer stop
energy regarding the WSGI-as-text proposal, so I'll bow out of any
maillist conversation about it for now.

- C







Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Graham Dumpleton
On 17 July 2010 22:30,   wrote:
> On Fri, 16 Jul 2010, P.J. Eby wrote:
>
>> At 02:28 PM 7/16/2010 -0500, Ian Bicking wrote:
>> There should be one, and preferably *only* one, obvious way to do it.
>>
>> And given that HTTP is inherently a bunch of bytes, bytes is the one
>> obvious way.
>
> I think this makes sense. The thing which is assembling the WSGI
> environment should do bytes and things further down the stack can
> deal with it as they like. This aligns well with how I like to think
> about such stuff: bytes on the outside, unicode on the inside.
>
> Given that app and frameworks developers can throw whatever keys
> they like back into the environment, they can cope as they like.[1]
>
> What would be horrible is if there need to be multiple coping
> strategies. Better to be able to say, "Oh it doesn't work? Try this
> way to cope: remember it is bytes."
>
> However, unless I'm misreading the thread, the bytes issue isn't
> really the bone of contention.

Actually it still is. There are still two competing camps. Some want
text, some want bytes. The whole discussion started purely on the
basis of progressing the text-based proposal. As usual, those wanting
bytes step up and we get two interwoven discussions which, if you don't
know the history, can be hard to follow.

My personal opinion is that if you are going to go bytes everywhere,
then you may as well throw out the complete WSGI specification as it
stands now and fix all the other problems with the specification. This
is on the basis that if people are going to have to rewrite their code
a fair bit to handle bytes everywhere, you may as well structurally
change the WSGI interface API as well to address other problems.

Anyway, it seems to be moot at this point, as some believe that, with
the Python language as it stands plus the state of the stdlib, using
bytes everywhere would be rather unmanageable, which is where ebytes
comes in. Thus bytes everywhere doesn't sound like a short-term
solution and requires changes in Python itself to make it viable.

Graham

> People seem okay with bytes as long
> as specific points of pain are addressed, such as:
>
> * What's my PATH_INFO and SCRIPT_NAME?
> * This server, which hosts, but is not, the WSGI environment builder
>  doesn't play well with this model.
> * Some others I can't remember now.
>
> It seems then that perhaps a way forward is to say: Okay, it's gonna
> be bytes. Now, given that, how do we deal with these other issues,
> which perhaps can be recast and encapsulated to be considered
> orthogonal to the bytes/not-bytes debate.
>
> Because we _know_ that any choice is going to come with costs, but
> as things have dragged on, the lack of choice thus far is starting
> to have as much of a cost as the costs that are wanting to be
> resolved.
>
> [1] I'm not expecting or hoping for porting/migrating to Python 3 to
> be simple/automatic/easy, but perhaps I'm cruel.
> --
> Chris Dent                      http://burningchrome.com/~cdent/
>                              [...]


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Ian Bicking
On Sat, Jul 17, 2010 at 5:57 AM, Armin Ronacher  wrote:

>  On 7/17/10 9:15 AM, Ian Bicking wrote:
>
>> This is an Apache-specific issue.  It definitely doesn't apply to
>> paste.httpserver, I doubt CherryPy or wsgiref.  I don't really know how
>> Nginx or other servers work.
>>
>
> This will be an issue for every server that...
>
>  * supports unicode filesystems
>  * decides to do internal mapping based on URIs and not IRIs
>

I think specifically it's hard to go back and forth between URL-encoded and
decoded paths, so if a system parses the decoded path then it's difficult to
go back to a raw form.  For example Paste includes several URL mappers, and
they would require (minor) rewriting; but then they can be rewritten so it's
not as concerning.  Apache cannot be rewritten to parse the encoded URL.  I
think working on the encoded URLs is a better representation of HTTP, and
HTTP URLs, and of browser behavior... but there is a legacy concern.
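
To make the round-trip problem concrete, here is a small sketch using
only urllib.parse from the Python 3 stdlib. The sample path is invented
for illustration; the point is just that once a path has been decoded,
re-encoding it does not recover the original raw form.

```python
from urllib.parse import quote, unquote

# A raw request path in which one segment contains an escaped slash.
raw_path = "/files/a%2Fb/report"

# Decoding turns %2F into a plain '/', merging it with the separators.
decoded = unquote(raw_path)

# Re-encoding leaves '/' alone (it is in quote()'s default safe set),
# so the escaped slash is not restored.
re_encoded = quote(decoded)

print(decoded)
print(re_encoded)
print(re_encoded == raw_path)
```

A mapper that only ever sees the decoded path therefore has no way to
get back to what the client actually sent.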

I don't think IRI or URI matters in this case; by decoding you *could*
transcode URLs from UTF-8 to some local encoding, but that's not the issue I
see us encountering here, it's really the more simple issue of URL encoding.

(I should say I appreciate this concrete concern; it keeps us grounded when
we discuss HTTP *specifically*, not bytes-v-unicode generally)

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Bill Janssen
Chris McDonough  wrote:

> On Sat, 2010-07-17 at 01:33 +0200, Armin Ronacher wrote:
> > Hi,
> > 
> > On 7/17/10 1:20 AM, Chris McDonough wrote:
> >  > Let me know if I'm missing something.
> > The only thing you miss is that the bytes type of Python 3 is badly
> > supported in the stdlib (not an issue if we reimplement everything in
> > our libraries, not an issue for me) and that the bytes type has no
> > string formatting, which makes us do the encode/decode dance in our own
> > implementations of the missing stdlib functions.
> 
> This is why the docs mention "bytes with benefits" instead (like the
> Python 2 "str" type). The existence of such a type would be the result
> of us lobbying for its inclusion into some future Python 3, or at least
> the result of lobbying for a String ABC that would allow us to define
> our own.

I think the most effective way to lobby here would be to provide the
String ABC and an implementation of "encoded strings", i.e. strings with
an internal representation that's a byte sequence in a particular
encoding.

Bill


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread William Dode
On 17-07-2010, chris.d...@gmail.com wrote:
> On Fri, 16 Jul 2010, P.J. Eby wrote:
>
>> At 02:28 PM 7/16/2010 -0500, Ian Bicking wrote:
>> There should be one, and preferably *only* one, obvious way to do it.
>>
>> And given that HTTP is inherently a bunch of bytes, bytes is the one obvious 
>> way.
>
> I think this makes sense. The thing which is assembling the WSGI
> environment should do bytes and things further down the stack can
> deal with it as they like. This aligns well with how I like to think
> about such stuff: bytes on the outside, unicode on the inside.
>
> Given that app and frameworks developers can throw whatever keys
> they like back into the environment, they can cope as they like.[1]
>
> What would be horrible is if there need to be multiple coping
> strategies. Better to be able to say, "Oh it doesn't work? Try this
> way to cope: remember it is bytes."

This thread is difficult to follow, but this makes sense to me too. KISS

-- 
William Dodé - http://flibuste.net
Informaticien Indépendant



Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Alan Kennedy
[PJ Eby]
> IOW, the bytes/string discussion on Python-dev has kind of led me to realize
> that we might just as well make the *entire* stack bytes (incoming and
> outgoing headers *and* streams), and rewrite that bit in PEP 333 about using
> str on "Python 3000" to say we go with bytes on Python 3+ for everything
> that's a str in today's WSGI.
>
> Or, to put it another way, if I knew then what I know *now*, I think I'd
> have written the PEP the other way around, such that the use of 'str' in
> WSGI would be a substitute for the future 'bytes' type, rather than viewing
> some byte strings as a forward-compatible substitute for Py3K unicode
> strings.
>
> Of course, this would be a WSGI 2 change, but IMO we're better off making a
> clean break with backward compatibility here anyway, rather than having
> conditionals.  Also, going with bytes everywhere means we don't have to
> rename SCRIPT_NAME and PATH_INFO, which in turn avoids deeper rewrites being
> required in today's apps.

+1

> (Hm.  Although actually, I suppose we *could* just borrow the time machine
> and pretend that WSGI called for "byte-strings everywhere" all along...)

+1/0

Alan.


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread chris . dent

On Fri, 16 Jul 2010, P.J. Eby wrote:


> At 02:28 PM 7/16/2010 -0500, Ian Bicking wrote:
> There should be one, and preferably *only* one, obvious way to do it.
>
> And given that HTTP is inherently a bunch of bytes, bytes is the one obvious
> way.


I think this makes sense. The thing which is assembling the WSGI
environment should do bytes and things further down the stack can
deal with it as they like. This aligns well with how I like to think
about such stuff: bytes on the outside, unicode on the inside.

Given that app and frameworks developers can throw whatever keys
they like back into the environment, they can cope as they like.[1]

What would be horrible is if there need to be multiple coping
strategies. Better to be able to say, "Oh it doesn't work? Try this
way to cope: remember it is bytes."

However, unless I'm misreading the thread, the bytes issue isn't
really the bone of contention. People seem okay with bytes as long
as specific points of pain are addressed, such as:

* What's my PATH_INFO and SCRIPT_NAME?
* This server, which hosts, but is not, the WSGI environment builder
  doesn't play well with this model.
* Some others I can't remember now.

It seems then that perhaps a way forward is to say: Okay, it's gonna
be bytes. Now, given that, how do we deal with these other issues,
which perhaps can be recast and encapsulated to be considered
orthogonal to the bytes/not-bytes debate.

Because we _know_ that any choice is going to come with costs, but
as things have dragged on, the lack of choice thus far is starting
to have as much of a cost as the costs that are wanting to be
resolved.

[1] I'm not expecting or hoping for porting/migrating to Python 3 to
be simple/automatic/easy, but perhaps I'm cruel.
--
Chris Dent  http://burningchrome.com/~cdent/
  [...]


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Armin Ronacher

Hi,

On 7/17/10 12:57 PM, Armin Ronacher wrote:

> In fact, this will be an issue for things like middlewares that want to
> map applications to paths. In fact, this is already an issue on Python 2,
> just that nobody cares.

s/applications/serving static files from folders/


Regards,
Armin


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Armin Ronacher

Hi,

On 7/17/10 9:15 AM, Ian Bicking wrote:

> This is an Apache-specific issue.  It definitely doesn't apply to
> paste.httpserver, I doubt CherryPy or wsgiref.  I don't really know how
> Nginx or other servers work.

This will be an issue for every server that...

 * supports unicode filesystems
 * decides to do internal mapping based on URIs and not IRIs

In fact, this will be an issue for things like middlewares that want to
map applications to paths.  In fact, this is already an issue on Python
2, just that nobody cares.



Regards,
Armin


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Graham Dumpleton
On Saturday, July 17, 2010, Ian Bicking  wrote:
> On Sat, Jul 17, 2010 at 12:38 AM, Graham Dumpleton 
>  wrote:
>
>
> On Friday, July 16, 2010, And Clover  wrote:
>> On 07/14/2010 06:43 AM, Ian Bicking wrote:
>>
>>
>> There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
>> and HTTP_COOKIE.
>>
>>
>> (And of those, PATH_INFO is the only one that really matters, in that no-one 
>> really uses non-ASCII script filenames,
>
> FWIW, I had to go to a lot of trouble to allow non ASCII in final
> SCRIPT_NAME in mod_wsgi. Specifically using AddHandler directive in
> Apache means a file system path can make up part of SCRIPT_NAME. I had
> someone who was specifically using Russian in a WSGI script file name
> and because with AddHandler that becomes part of SCRIPT_NAME you had
> to cater for it. Anyway this was more of a Windows issue in having to
> use special file system functions to deal with fact that on Windows
> filesystem paths aren't UTF-8 but something else.
>
> What this does highlight though is that although one can talk about
> passing raw script name through to application, that isn't necessarily
> right as it isn't the application that dictates what encoding may be
> used but the web server which is performing the mapping of that part
> of the original URL path to a potential filesystem resource, or
> alternatively, where file-based configuration is used for the mount
> point, the encoding of the web server configuration file.
>
> This is an Apache-specific issue.  It definitely doesn't apply to 
> paste.httpserver, I doubt CherryPy or wsgiref.  I don't really know how Nginx 
> or other servers work.

The only reason it doesn't apply to paste.httpserver is because it
doesn't have a URL mapping system of its own. That is, you host a
single WSGI application at the root of the server. Any server which
allows hosting of a WSGI application at a sub URL will have such
issues. Specifically, the details of the sub URL are worked out by the
server, be it by mapping a URL to the file system or through matching
to a configuration parameter in a server configuration file.
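
As a minimal sketch of what "the server works out the sub URL" means in
practice, the function below splits a request path into SCRIPT_NAME and
PATH_INFO for a configured mount point. The function name and the
sample mount points are hypothetical, and this is not how mod_wsgi or
any particular server implements it. The point is only that the mount
point string comes from the server side (a config file or the file
system), so its encoding is decided there and not by the application.

```python
def split_at_mount_point(request_path, mount_point):
    """Return (SCRIPT_NAME, PATH_INFO), or None if the path is not mounted here.

    Hypothetical helper: both arguments are already-decoded text, and
    mount_point is assumed to come from server configuration.
    """
    # An application mounted at the root owns the whole path.
    if mount_point in ("", "/"):
        return "", request_path
    mount_point = mount_point.rstrip("/")
    # Match only on a whole path segment, so '/wiki' does not claim '/wikipedia'.
    if request_path == mount_point or request_path.startswith(mount_point + "/"):
        return mount_point, request_path[len(mount_point):]
    return None


print(split_at_mount_point("/wiki/FrontPage", "/wiki"))
print(split_at_mount_point("/wikipedia", "/wiki"))
```

A server hosting a single application at the root never has to make
this split, which matches the remark about paste.httpserver above.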


Graham


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Ian Bicking
On Sat, Jul 17, 2010 at 12:38 AM, Graham Dumpleton <
graham.dumple...@gmail.com> wrote:

> On Friday, July 16, 2010, And Clover  wrote:
> > On 07/14/2010 06:43 AM, Ian Bicking wrote:
> >
> >
> > There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
> > and HTTP_COOKIE.
> >
> >
> > (And of those, PATH_INFO is the only one that really matters, in that
> > no-one really uses non-ASCII script filenames,
>
> FWIW, I had to go to a lot of trouble to allow non ASCII in final
> SCRIPT_NAME in mod_wsgi. Specifically using AddHandler directive in
> Apache means a file system path can make up part of SCRIPT_NAME. I had
> someone who was specifically using Russian in a WSGI script file name
> and because with AddHandler that becomes part of SCRIPT_NAME you had
> to cater for it. Anyway this was more of a Windows issue in having to
> use special file system functions to deal with fact that on Windows
> filesystem paths aren't UTF-8 but something else.
>
> What this does highlight though is that although one can talk about
> passing raw script name through to application, that isn't necessarily
> right as it isn't the application that dictates what encoding may be
> used but the web server which is performing the mapping of that part
> of the original URL path to a potential filesystem resource, or
> alternatively, where file-based configuration is used for the mount
> point, the encoding of the web server configuration file.
>

This is an Apache-specific issue.  It definitely doesn't apply to
paste.httpserver, I doubt CherryPy or wsgiref.  I don't really know how
Nginx or other servers work.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Graham Dumpleton
On Saturday, July 17, 2010, Graham Dumpleton  wrote:
> On Saturday, July 17, 2010, Ian Bicking  wrote:
>> On Fri, Jul 16, 2010 at 1:40 PM, P.J. Eby  wrote:
>>
>>
>> At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote:
>>
>> And this doesn't help with Python 3: either we have byte values of 
>> SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I think 
>> bytes will be more awkward to port to than text, and inconsistent with other 
>> WSGI values.
>>
>>
>> OTOH, it has the tremendous advantage of pushing the encoding question onto 
>> the app (or framework) developer...  who's really the only one who can make 
>> the right decision for their particular application.  And personally, I'd 
>> rather have clear boundaries between text and bytes, such that porting (even 
>> if tedious or awkward) is *consistent*, and clear as to when you're 
>> finished, not, "oh, did I check to make sure I converted SCRIPT_NAME and 
>> PATH_INFO...  not just in my app code, but in all the library code I call 
>> *from* my app?"
>>
>> IOW, the bytes/string discussion on Python-dev has kind of led me to realize 
>> that we might just as well make the *entire* stack bytes (incoming and 
>> outgoing headers *and* streams), and rewrite that bit in PEP 333 about using 
>> str on "Python 3000" to say we go with bytes on Python 3+ for everything 
>> that's a str in today's WSGI.
>>
>> This was my first intuition too, until I started thinking in more detail 
>> about the particular values involved.  Some obviously are textish, like 
>> environ['SERVER_NAME'].  Not a very useful value, but definitely text.
>>
>> Basically all the internal strings are textish, so we're left with:
>>
>> wsgi.url_scheme
>> SCRIPT_NAME/PATH_INFO
>> QUERY_STRING
>> HTTP_*, CONTENT_TYPE, CONTENT_LENGTH (headers)
>> response status
>> response headers (name and value)
>>
>> And there's a few things like REMOTE_USER that are kind of in the middle.  
>> Everyone is in agreement that bodies should be bytes.
>>
>> One initial problem is that the Python 3 stdlib handles bytes poorly, so for 
>> instance there's no good way to reconstruct the URL using the stdlib.  That 
>> explains certain tensions, but I think we should ignore that, and in fact 
>> that's what Python-Dev seemed to say pretty clearly.
>>
>> Now, the other keys:
>>
>> wsgi.url_scheme: clearly ASCII
>>
>> SCRIPT_NAME/PATH_INFO: often UTF-8, could be no encoding, could be some old 
>> legacy encoding.
>> raw request path: should be ASCII (non-ASCII should be URL-encoded).  URL 
>> encoding happens at the byte layer, so a server could reasonably URL encode 
>> any non-ASCII characters without imposing any  encoding.
>>
>> QUERY_STRING: should be ASCII, same as raw request path
>>
>> headers: Most are ASCII.  Latin1 is a reasonable fallback and suggested by
>> the specification.  The spec also implies you have to use the RFC2047 inline
>> encoding (like =?iso-8859-1?q?some=20text?=), but nothing supports this and
>> supporting it would probably be a bad idea for security reasons.  The 
>> Atompub spec (reasonably modern) specifically says Title headers should be 
>> encoded with RFC2047 (if they are not ISO-8859-1): 
>> http://tools.ietf.org/html/draft-ietf-atompub-protocol-08#page-17 -- 
>> decoding this kind of encoding at the application layer seems reasonable to 
>> me.
>>
>> cookie header: this specific header can easily have multiple encodings, as 
>> the browser encodes data then treats it as opaque bytes, so a cookie can be 
>> set via UTF-8 one place, Latin1 another, and those coexist in one header.  
>> That is, there is no real encoding and this should be treated as bytes.  
>> (Latin1 is an approximation of bytes... a spotty way to treat bytes, but 
>> entirely workable.)
>>
>> response status: I believe the spec says this must be Latin1/ISO-8859-1.  In 
>> practice it is almost always ASCII, and since it is not user-visible it's 
>> not something that really needs localization.
>>
>> response headers: the spec implies Latin1, in practice the Set-Cookie header 
>> is bytes (since interoperation with wonky legacy systems is not uncommon).  
>> I'm not sure of any other exceptions?
>>
>>
>> So... to me it seems pretty reasonable for HTTP specifically that text can 
>> work.  And it feels weird that, say, environ['SERVER_NAME'] be text and
>> environ['HTTP_HOST'] not, and I don't know what environ['REMOTE_ADDR'] 
>> should be in that mode.  And it would also be weird if 
>> environ['SERVER_NAME'] was bytes.
>>
>> In the past when we've gotten down to specifics, the only holdup has been 
>> SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those.
>
> There were a few other weird ones, though, which are server specific.
> For example PATH_TRANSLATED (??). These are ones where again the
> server or operating system dictates the encoding due to them having
> bits in them deriving from things like filesystem paths and server
> configuration files. I laboriously went through all these in

Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Graham Dumpleton
On Saturday, July 17, 2010, Ian Bicking  wrote:
> On Fri, Jul 16, 2010 at 1:40 PM, P.J. Eby  wrote:
>
>
> At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote:
>
> And this doesn't help with Python 3: either we have byte values of 
> SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I think 
> bytes will be more awkward to port to than text, and inconsistent with other 
> WSGI values.
>
>
> OTOH, it has the tremendous advantage of pushing the encoding question onto 
> the app (or framework) developer...  who's really the only one who can make 
> the right decision for their particular application.  And personally, I'd 
> rather have clear boundaries between text and bytes, such that porting (even 
> if tedious or awkward) is *consistent*, and clear as to when you're finished, 
> not, "oh, did I check to make sure I converted SCRIPT_NAME and PATH_INFO...  
> not just in my app code, but in all the library code I call *from* my app?"
>
> IOW, the bytes/string discussion on Python-dev has kind of led me to realize 
> that we might just as well make the *entire* stack bytes (incoming and 
> outgoing headers *and* streams), and rewrite that bit in PEP 333 about using 
> str on "Python 3000" to say we go with bytes on Python 3+ for everything 
> that's a str in today's WSGI.
>
> This was my first intuition too, until I started thinking in more detail 
> about the particular values involved.  Some obviously are textish, like 
> environ['SERVER_NAME'].  Not a very useful value, but definitely text.
>
> Basically all the internal strings are textish, so we're left with:
>
> wsgi.url_scheme
> SCRIPT_NAME/PATH_INFO
> QUERY_STRING
> HTTP_*, CONTENT_TYPE, CONTENT_LENGTH (headers)
> response status
> response headers (name and value)
>
> And there's a few things like REMOTE_USER that are kind of in the middle.  
> Everyone is in agreement that bodies should be bytes.
>
> One initial problem is that the Python 3 stdlib handles bytes poorly, so for 
> instance there's no good way to reconstruct the URL using the stdlib.  That 
> explains certain tensions, but I think we should ignore that, and in fact 
> that's what Python-Dev seemed to say pretty clearly.
>
> Now, the other keys:
>
> wsgi.url_scheme: clearly ASCII
>
> SCRIPT_NAME/PATH_INFO: often UTF-8, could be no encoding, could be some old 
> legacy encoding.
> raw request path: should be ASCII (non-ASCII should be URL-encoded).  URL 
> encoding happens at the byte layer, so a server could reasonably URL encode 
> any non-ASCII characters without imposing any  encoding.
>
> QUERY_STRING: should be ASCII, same as raw request path
>
> headers: Most are ASCII.  Latin1 is a reasonable fallback and suggested by 
> the specification.  The spec also implies you have to use the RFC2047 inline
> encoding (like =?iso-8859-1?q?some=20text?=), but nothing supports this and
> supporting it would probably be a bad idea for security reasons.  The Atompub 
> spec (reasonably modern) specifically says Title headers should be encoded 
> with RFC2047 (if they are not ISO-8859-1): 
> http://tools.ietf.org/html/draft-ietf-atompub-protocol-08#page-17 -- decoding 
> this kind of encoding at the application layer seems reasonable to me.
>
> cookie header: this specific header can easily have multiple encodings, as 
> the browser encodes data then treats it as opaque bytes, so a cookie can be 
> set via UTF-8 one place, Latin1 another, and those coexist in one header.  
> That is, there is no real encoding and this should be treated as bytes.  
> (Latin1 is an approximation of bytes... a spotty way to treat bytes, but 
> entirely workable.)
>
> response status: I believe the spec says this must be Latin1/ISO-8859-1.  In 
> practice it is almost always ASCII, and since it is not user-visible it's not 
> something that really needs localization.
>
> response headers: the spec implies Latin1, in practice the Set-Cookie header 
> is bytes (since interoperation with wonky legacy systems is not uncommon).  
> I'm not sure of any other exceptions?
>
>
> So... to me it seems pretty reasonable for HTTP specifically that text can 
> work.  And it feels weird that, say, environ['SERVER_NAME'] be text and
> environ['HTTP_HOST'] not, and I don't know what environ['REMOTE_ADDR'] should 
> be in that mode.  And it would also be weird if environ['SERVER_NAME'] was 
> bytes.
>
> In the past when we've gotten down to specifics, the only holdup has been 
> SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those.
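
The quoted remark that Latin1 is "an approximation of bytes" can be
illustrated with a short sketch. The cookie names and values here are
made up; the only claim is that decoding arbitrary bytes as Latin-1
never fails and is exactly reversible, so a text-typed header value can
still carry cookies that were set in different encodings.

```python
# Two cookies set by different parts of a site, in different encodings
# (invented sample data).
raw_header = (
    b"name=" + "caf\u00e9".encode("utf-8")
    + b"; city=" + "Besan\u00e7on".encode("latin-1")
)

# Latin-1 maps each byte 0-255 to exactly one code point, so this decode
# cannot raise, whatever the bytes are...
as_text = raw_header.decode("latin-1")
# ...and encoding back recovers the original bytes unchanged.
assert as_text.encode("latin-1") == raw_header

# The application can then recover each cookie's bytes and decode them
# with whatever encoding it knows that particular cookie uses.
cookies = dict(pair.split("=", 1) for pair in as_text.split("; "))
name = cookies["name"].encode("latin-1").decode("utf-8")
city = cookies["city"].encode("latin-1").decode("latin-1")

print(name, city)
```

The intermediate text is not meaningful as text (the UTF-8 cookie shows
up as mojibake until it is re-decoded), which is presumably the "spotty"
part.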

There were a few other weird ones, though, which are server specific.
For example PATH_TRANSLATED (??). These are ones where again the
server or operating system dictates the encoding due to them having
bits in them deriving from things like filesystem paths and server
configuration files. I laboriously went through all these in an email
last year or earlier.

Same reason why SCRIPT_NAME is really dictated by server and raw value
perhaps should be going through to application.

Graham

Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Graham Dumpleton
On Saturday, July 17, 2010, Gustavo Narea  wrote:
> Hello,
>
> Ian said:
>> Having two ways of expressing the same information will lead to bugs
>> related to which data is canonical.  If an application is using
>> SCRIPT_NAME/PATH_INFO and then updates those values in any way, and
>> wsgi.raw_script_name/wsgi.raw_path_info are present, then there will be
>> weird bugs and code will disagree about which one is correct.  Since %2f
>> can exist in the raw versions, there isn't even a way to chunk the two
>> variables in the same way.
>
> I can't agree more.
>
> I would propose the following, and excuse me in advance if this has already
> been proposed and discarded -- I've tried to follow this topic on the mailing
> list over the past few months, until it becomes an endless discussion.
>
> I think only the raw values should be available. Even if a middleware changes
> them, it must put them with raw values. And because you cannot change those
> values without knowing what encoding the request uses, the character encoding
> *must* be present.
>
> I know that sounds easy but it's not, because browsers don't specify the
> charset in the Content-Type and instead they generate a new request using the
> charset from the previous response. So the charset is unknown to the
> server/gateway and the middleware stack.
>
> So, what we could do is introduce a mandatory variable called, say,
> wsgi.charset, and would be used as follows:

Something like this was proposed before, but it only applied to the
keys that mattered, specifically PATH_INFO and maybe QUERY_STRING,
(the latter of which this discussion has been ignoring and I can't
remember how we worked out before it should be treated). It didn't
cover SCRIPT_NAME because, as I indicated before, the encoding of that is
really dictated by the server and not the application for the initial
value at least.

The idea was that the server would pass them as Latin 1 and set the
encoding key. If a consumer of it didn't like the encoding it was in,
it would convert it back to bytes and then to what it wants and update
the encoding key to what it used. Thus you had a hint available to
allow reliable transcoding. This proposal didn't get acceptance
either.
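
A minimal sketch of that transcoding step, assuming the server decoded
PATH_INFO as Latin-1. The key name "wsgi.path_info_encoding" and the helper
transcode_path_info are hypothetical; the thread never settled on names.

```python
# Hypothetical hint key; not part of any WSGI specification.
ENCODING_KEY = "wsgi.path_info_encoding"


def transcode_path_info(environ, wanted="utf-8"):
    """Re-decode PATH_INFO as `wanted` and update the encoding hint."""
    current = environ.get(ENCODING_KEY, "latin-1")
    if current.lower() == wanted.lower():
        return environ["PATH_INFO"]
    # Recover the original bytes using the encoding the value was decoded with.
    raw = environ["PATH_INFO"].encode(current)
    # This may raise UnicodeDecodeError if the bytes are not valid in `wanted`.
    environ["PATH_INFO"] = raw.decode(wanted)
    environ[ENCODING_KEY] = wanted
    return environ["PATH_INFO"]


# The server received UTF-8 bytes but exposed them as Latin-1 text.
environ = {
    "PATH_INFO": b"/caf\xc3\xa9".decode("latin-1"),
    ENCODING_KEY: "latin-1",
}
print(transcode_path_info(environ))
```

Because Latin-1 maps every byte to exactly one code point, the encode step
always recovers the bytes the server saw; only the second decode can fail.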

Graham

>  - It MUST be set by the server or gateway on every request.
>  - Every middleware or application that reads or writes these values MUST use
> the charset specified in wsgi.charset.
>  - If a server, gateway, middleware or application wants to change the charset
> and it is possible*, it MUST convert the *entire* request into that charset
> and update wsgi.charset accordingly.
>  - When the charset is not specified in the HTTP request, UTF-8 MUST be
> assumed by the server/gateway. Unless another default charset has been
> specified by the user.
>
> I think/hope that will solve all the problems.
>
> What happens when a WSGI application is actually made up two WSGI applications
> and they send the responses in different charsets? If it's not possible to
> configure them so that they both use the same charsets, then one of them would
> have to be wrapped by a middleware which:
>  - On egress, converts the responses using the charset used by the other
> application.
>  - On ingress, if the charset is not specified in the request, it will assume
> it's the one used by the other application, and thus it will convert the
> request using the charset supported by the wrapped application.
>
> It would look like this:
> ===
> def application(environ, start_response):
>     if environ["PATH_INFO"].startswith("/trac/"):
>         # Say Trac only supports Latin-1 and we want responses to use UTF-8:
>         app = trac.web.main.dispatch_request
>         app = CharsetNormalizer(app, response="latin-1", request="utf8")
>     else:
>         # myapp uses UTF-8
>         app = myapp
>     return app(environ, start_response)
> ===
>
> Then there's the string vs bytes issue. Bytes would be the natural choice to
> represent these raw values, but they would probably cause more trouble than they
> solve. So, I think they should be strings that contain the ASCII raw
> encoded values (i.e., str on both versions of Python).
>
> What do you think about this? Again, sorry if this has been discarded before!
> :)
>
> * For example, you can always convert Latin-1 to UTF-8, but not every UTF-8
> string can be converted to Latin-1.
> --
> Gustavo Narea .
> | Tech blog: =Gustavo/(+blog)/tech  ~  About me: =Gustavo/about |
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Graham Dumpleton
On Saturday, July 17, 2010, Ian Bicking  wrote:
> On Fri, Jul 16, 2010 at 12:28 PM, Chris McDonough  wrote:
>
>
> On Fri, 2010-07-16 at 11:07 -0500, Ian Bicking wrote:
>
>> And this doesn't help with Python 3: either we have byte values of
>> SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I
>> think bytes will be more awkward to port to than text, and
>> inconsistent with other WSGI values.  If we have text then we have to
>> choose an encoding.  Latin1 will work, but it will be the exact wrong
>> encoding most of the time as UTF-8 is the typical  (unlike other
>> headers, where Latin1 will mostly be an okay encoding, or as good a
>> guess as we have).  If we firmly remove these keys then we can avoid
>> this choice entirely... and we conveniently also get a better
>> representation of the request.
>
> My $.02: I'd rather lobby the core folks for a string ABC (which we can
> hook with a stringlike bytes type) and consider all 3.X releases made so
> far "dead to WSGI" than to have to tunnel arbitrary bytes through some
> misleading Unicode encoding.
>
> While I think it would be generally useful, it's also a long way off at best, 
> with serious performance dangers that could torpedo the whole thing.  But... 
> I'm also unsure how it would help here, except perhaps we could incrementally 
> annotate bytes with an encoding?  Well, I don't really know.  Treating the 
> raw request path as text is easy enough, as it should always be ASCII 
> anyway.  We don't have to worry what is "right" or "wrong" in this case.
>
> We could make everything bytes and be done with it, but it would make it much 
> harder to port Python 2 WSGI code to Python

FWIW, I see the whole ebytes discussion as only relevant if you were to
make absolutely everything bytes. We don't really need it otherwise.

Graham


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Graham Dumpleton
On Saturday, July 17, 2010, Ian Bicking  wrote:
> On Fri, Jul 16, 2010 at 4:33 AM, And Clover  wrote:
>
>
> On 07/14/2010 06:43 AM, Ian Bicking wrote:
>
>
> There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
> and HTTP_COOKIE.
>
>
>
> (And of those, PATH_INFO is the only one that really matters, in that no-one 
> really uses non-ASCII script filenames, and non-ASCII characters in 
> Cookie/Set-Cookie are still handled so differently/brokenly across browsers 
> that you can't rely on them at all.)
>
>
>
>
> * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
> exclusively with encoded versions
>
>
>
> For compatibility with existing apps, how about keeping the existing 
> SCRIPT_NAME and PATH_INFO as-is (with all their problems), and specifying 
> that the new 'raw' versions (whatever they are called) are added only if they 
> really are raw, not reconstructed.
>
> Having two ways of expressing the same information will lead to bugs related 
> to which data is canonical.  If an application is using SCRIPT_NAME/PATH_INFO 
> and then updates those values in any way, and 
> wsgi.raw_script_name/wsgi.raw_path_info are present, then there will be weird 
> bugs and code will disagree about which one is correct.  Since %2f can exist 
> in the raw versions, there isn't even a way to chunk the two variables in the 
> same way.
>
>
> Then existing scripts that don't care about non-ASCII and slashes can carry 
> on as before, and for apps that do care about them, they'll be able to be 
> *sure* the input is correct. Or they can fall back to PATH_INFO when not 
> present, and avoid producing these kind of URLs in response.
>
> I don't think it works to imagine you can just not care about non-ASCII.  
> Requests come in.  WSGI should represent those requests.  If a request comes 
> in with non-ASCII bytes then WSGI needs to do *something* with it.  I don't 
> want to have to configure servers with application policy; servers should 
> just work.
>
> And this doesn't help with Python 3: either we have byte values of 
> SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I think bytes 
> will be more awkward to port to than text, and inconsistent with other WSGI 
> values.  If we have text then we have to choose an encoding.  Latin1 will 
> work, but it will be the exact wrong encoding most of the time as UTF-8 is 
> the typical  (unlike other headers, where Latin1 will mostly be an okay 
> encoding, or as good a guess as we have).  If we firmly remove these keys 
> then we can avoid this choice entirely... and we conveniently also get a 
> better representation of the request.

One reason I don't want to see the existing keys removed is for
debugging purposes. In Apache, various Apache modules such as
mod_rewrite will operate on that translated path. I am concerned that
if only the raw one is available in the WSGI application then
confusion may arise where something doesn't go right with rewrites
because the only information that may be able to be dumped in the way
of debug by an application will be different to what other Apache
modules may operate on. If you aren't going to make use of the CGI
versions, then I would still like to see them present but perhaps
renamed. That way you don't have a loss of information when it comes
to trying to debug stuff. I could perhaps just put this in an
Apache/mod_wsgi specific key as well, given that the issue is
particular to it. Thus we might have apache.path_info or cgi.path_info.

Graham

> Note that libraries can smooth over this change; WebOb for instance will 
> certainly still support req.script_name/req.path_info by decoding the raw 
> values.  Admittedly lots of code use these values directly... but at least if 
> they get a KeyError the port/fix will be obvious (as opposed to out of sync 
> values, which will only emerge as a problem occasionally -- I'd rather not 
> invite more occasional bugs).
>
> --
> Ian Bicking  |  http://blog.ianbicking.org
>


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Graham Dumpleton
On Friday, July 16, 2010, And Clover  wrote:
> On 07/14/2010 06:43 AM, Ian Bicking wrote:
>
>
> There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
> and HTTP_COOKIE.
>
>
> (And of those, PATH_INFO is the only one that really matters, in that no-one 
> really uses non-ASCII script filenames,

FWIW, I had to go to a lot of trouble to allow non ASCII in final
SCRIPT_NAME in mod_wsgi. Specifically using AddHandler directive in
Apache means a file system path can make up part of SCRIPT_NAME. I had
someone who was specifically using Russian in a WSGI script file name
and because with AddHandler that becomes part of SCRIPT_NAME you had
to cater for it. Anyway this was more of a Windows issue in having to
use special file system functions to deal with the fact that on Windows
filesystem paths aren't UTF-8 but something else.

What this does highlight though is that although one can talk about
passing the raw script name through to the application, that isn't
necessarily right, as it isn't the application that dictates what
encoding may be used but the web server, which is performing the
mapping of that part of the original URL path to a potential filesystem
resource or, where file based configuration is used for the mount
point, the encoding of the web server configuration file.

We touched on all of this before in prior discussions, thus original
raw value is only relevant in PATH_INFO and not SCRIPT_NAME as in the
case of the latter it is the web server that dictates the charset
based on configuration file encoding or file system encoding.

Graham


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 9:43 PM, Chris McDonough  wrote:

> > Nah, not nearly that hard:
> >
> > path_info =
> >
> urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')
> >
> > I don't see the problem?  If you want to distinguish %2f from /, then
> > you'll do it slightly differently, like:
> >
> > path_parts = [
> > urllib.parse.unquote_to_bytes(p).decode('UTF-8')
> > for p in environ['wsgi.raw_path_info'].split('/')]
> >
> > This second recipe is impossible to do currently with WSGI.
> >
> > So... before jumping to conclusions, what's the hard part with using
> > text?
>
> It's extremely hard to swallow Python 3's current disregard for the
> primacy of bytes at I/O boundaries.  I'm trying, but I can't help but
> feel that the existence of an API like "unquote_to_bytes" is more
> symptom treatment than solution.  Of course something that unquotes a
> URL segment unquotes it into bytes; it's the only sane default because
> URL segments found in URLs on the internet are bytes.
>

Yes, URL quoted strings should decode to bytes, though arguably it is
reasonable to also use the very reasonable UTF-8 default that
urllib.parse.quote/unquote uses.  So it's really just a question of names:
should that name be quote_to_string or quote_to_bytes?  Which honestly...
whatever.

> So I guess the "hard part" is more meta.  When you have legitimate
> backwards compatibility constraints, suboptimal choices made during
> protocol design are excusable.  But it just seems really very weird to
> design one (WSGI 2) from scratch with such choices when the only reason
> to do so is a systematic low-level denial of reality.  Why would we use
> (and, worse, by doing so, implicitly promote) such a system in the first
> place?
>
> On the other hand, indignance about the issue shouldn't rule the day
> either.  To me, the most pragmatic thing to do that doesn't deny reality
> would be to use bytes.  It's also the easiest thing to remember (the
> values in the environment are all bytes) and I think we'll be able to
> drive the Py3K stdlib forward in a much saner direction if we choose
> bytes than if we choose text to represent things that are naturally more
> bytes-like.
>

I do feel like indignance has played a part here.  And in my brief forays
into Python 3 I have been frustrated by the over-textification of APIs.
But... if a compromise works let's not let those experiences color our
choices.

So, here's my criteria for resolving this particular Python 3 issue:

* We should not lose information from the request.  Decoding with UTF-8
(without surrogateescape) would be an example.  URL-decoding loses us
information currently; which is why I wouldn't be sad to see it go (though
if it was only for that reason I wouldn't bother -- the unicode issue just
makes it serendipitous).

* We shouldn't produce wildly inaccurate strings.  E.g., decoding something
with Latin1 when it's an implausible encoding.

* Encoding/decoding errors should only possibly happen at the application
level, or maybe middleware if you are playing around with stuff.  Servers
specifically should never have them (because they can't gracefully handle
them).

* We should avoid server configuration with respect to application policy
(we've avoided it so far, yay!)

* We should support eclectic application layouts, e.g., an application that
sometimes serves Latin-1, sometimes UTF-8 (like if the application proxies
requests or serves up legacy content/apps).

* We should make things as easy to port as possible.  Errors in porting
should be loud.

* As much as possible WSGI should be readable and usable.  Maybe most people
will use a library, but we also have a lot of libraries that handle WSGI,
and it's nice that's been able to happen, so we don't want to make things
any harder than they have to be.  E.g., clearly we should use text environ
keys (luckily we don't have to worry about non-ASCII header names, I guess?)
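
The first two criteria can be made concrete with a short comparison of the
decoding options discussed in this thread. This is only an illustrative
sketch; the sample byte strings are made up.

```python
raw = b"/caf\xe9"                       # Latin-1 bytes, not valid UTF-8
utf8 = "/caf\u00e9".encode("utf-8")     # the same path sent as UTF-8

# Strict UTF-8 decoding can fail on input the server does not control.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape never fails, and encoding again restores the exact bytes.
text = raw.decode("utf-8", "surrogateescape")
round_trip = text.encode("utf-8", "surrogateescape")

# Latin-1 never fails either, but for UTF-8 input the resulting text is wrong.
mojibake = utf8.decode("latin-1")

print(strict_ok, round_trip == raw, repr(text), repr(mojibake))
```

Both surrogateescape and Latin-1 preserve the bytes; they differ in how
plausible the intermediate string looks to code that treats it as text.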

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 11:28 PM, Graham Dumpleton <
graham.dumple...@gmail.com> wrote:

> > Nah, not nearly that hard:
> >
> > path_info =
> urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')
> >
> > I don't see the problem?  If you want to distinguish %2f from /, then
> you'll do it slightly differently, like:
> >
> > path_parts = [
> > urllib.parse.unquote_to_bytes(p).decode('UTF-8')
> > for p in environ['wsgi.raw_path_info'].split('/')]
> >
> > This second recipe is impossible to do currently with WSGI.
> > So... before jumping to conclusions, what's the hard part with using
>
> Sorry, it is not that simple. The thing that everyone is ignoring is
> that SCRIPT_NAME and PATH_INFO are also normalized by the web server
> normally. That is, .. instances are removed. By passing the raw URL
> through to the application, you are now forcing every application to
> have to deal with that as well with the possibility of directory
> traversal attacks when people get it wrong and the URL is mapping
> somehow to file system resources. It is a huge can of worms which at
> the moment the web server deals with.
>

Well... at least to me "raw" only means "not URL decoded", so it doesn't
necessarily mean you can't clean up the request path.  I guess an attacker
could encode "." to make things harder.

Nevertheless, WSGI servers don't currently guarantee this cleaning.  I added
it to paste.httpserver, but I don't know one way or the other about any
other servers.  A quick test shows wsgiref does not clean paths.  So apps
shouldn't rely on a clean path.
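
For an application that maps URLs to filesystem resources, a defensive
check might look like the following. This is a sketch under assumptions:
the helper name clean_segments is hypothetical, and rejecting an encoded
slash inside a segment is one possible policy, not something WSGI mandates.

```python
from urllib.parse import unquote_to_bytes


def clean_segments(raw_path):
    """Split a URL-quoted path, decode each segment, and reject traversal."""
    segments = []
    for part in raw_path.split("/"):
        # Decode after splitting so an encoded "." or "/" cannot hide.
        seg = unquote_to_bytes(part).decode("utf-8", "surrogateescape")
        if seg in ("", "."):
            continue  # collapse empty and current-directory segments
        if seg == ".." or "/" in seg or "\x00" in seg:
            raise ValueError("unsafe path segment: %r" % seg)
        segments.append(seg)
    return segments


print(clean_segments("/static//css/./site.css"))
```

An attacker who sends "%2e%2e" gets the same ValueError as one who sends
"..", because the comparison happens on the decoded segment.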


-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Graham Dumpleton
On Saturday, July 17, 2010, Ian Bicking  wrote:
> On Fri, Jul 16, 2010 at 6:20 PM, Chris McDonough  wrote:
>
>
>
>> What are the concrete problems you envision with text request headers,
>> text (URL-quoted) path, and text response status and headers?
>
> Documentation is the main reason.  For example, the documentation for
> making sense of path_info segments in a WSGI that used unicodey-strings
> would, as I understand it, read something like this:
>
> Nah, not nearly that hard:
>
> path_info = 
> urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')
>
> I don't see the problem?  If you want to distinguish %2f from /, then you'll 
> do it slightly differently, like:
>
> path_parts = [
>     urllib.parse.unquote_to_bytes(p).decode('UTF-8')
>     for p in environ['wsgi.raw_path_info'].split('/')]
>
> This second recipe is impossible to do currently with WSGI.
> So... before jumping to conclusions, what's the hard part with using

Sorry, it is not that simple. The thing that everyone is ignoring is
that SCRIPT_NAME and PATH_INFO are also normalized by the web server
normally. That is, .. instances are removed. By passing the raw URL
through to the application, you are now forcing every application to
have to deal with that as well with the possibility of directory
traversal attacks when people get it wrong and the URL is mapping
somehow to file system resources. It is a huge can of worms which at
the moment the web server deals with.

I have other issues with the raw stuff, but haven't got to read the
last dozen messages in this discussion as yet, so will leave those
points to another time.

Graham


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Chris McDonough
On Fri, 2010-07-16 at 20:46 -0500, Ian Bicking wrote:
> On Fri, Jul 16, 2010 at 6:20 PM, Chris McDonough 
> wrote:
> > What are the concrete problems you envision with text
> request headers,
> > text (URL-quoted) path, and text response status and
> headers?
> 
> 
> Documentation is the main reason.  For example, the
> documentation for
> making sense of path_info segments in a WSGI that used
> unicodey-strings
> would, as I understand it, read something like this:
> 
> Nah, not nearly that hard:
> 
> path_info =
> urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')
> 
> I don't see the problem?  If you want to distinguish %2f from /, then
> you'll do it slightly differently, like:
> 
> path_parts = [
> urllib.parse.unquote_to_bytes(p).decode('UTF-8')
> for p in environ['wsgi.raw_path_info'].split('/')]
>  
> This second recipe is impossible to do currently with WSGI.
> 
> So... before jumping to conclusions, what's the hard part with using
> text?

It's extremely hard to swallow Python 3's current disregard for the
primacy of bytes at I/O boundaries.  I'm trying, but I can't help but
feel that the existence of an API like "unquote_to_bytes" is more
symptom treatment than solution.  Of course something that unquotes a
URL segment unquotes it into bytes; it's the only sane default because
URL segments found in URLs on the internet are bytes.

So I guess the "hard part" is more meta.  When you have legitimate
backwards compatibility constraints, suboptimal choices made during
protocol design are excusable.  But it just seems really very weird to
design one (WSGI 2) from scratch with such choices when the only reason
to do so is a systematic low-level denial of reality.  Why would we use
(and, worse, by doing so, implicitly promote) such a system in the first
place?

On the other hand, indignance about the issue shouldn't rule the day
either.  To me, the most pragmatic thing to do that doesn't deny reality
would be to use bytes.  It's also the easiest thing to remember (the
values in the environment are all bytes) and I think we'll be able to
drive the Py3K stdlib forward in a much saner direction if we choose
bytes than if we choose text to represent things that are naturally more
bytes-like.

- C



Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 8:46 PM, Ian Bicking  wrote:

> So... before jumping to conclusions, what's the hard part with using text?
>

Oh, the one thing that will be silly is cookies, but they are totally nuts
already.  They can be parsed equally well as bytes or latin1, and best only
transcoded after parsing.  Doing cookie_value.decode(app_encoding) or
cookie_value.encode('ISO-8859-1').decode(app_encoding) isn't terribly
different.  And cookies aren't fair because they are just stupid; like the
standard library I don't think we should design anything around their
idiosyncrasies.
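
To illustrate why the two spellings are not terribly different, here is a
sketch that parses a cookie header first and transcodes only the parsed
value. The header contents and variable names are made up for the example.

```python
app_encoding = "utf-8"
wire = "name=caf\u00e9".encode(app_encoding)   # bytes as sent by the client

# Text-based environ: the server exposes the header as Latin-1 text.
header = wire.decode("iso-8859-1")
name, _, cookie_value = header.partition("=")  # "=" is ASCII, safe to split on
value_from_text = cookie_value.encode("ISO-8859-1").decode(app_encoding)

# Bytes-based environ: parse the bytes, then decode the value.
_, _, cookie_bytes = wire.partition(b"=")
value_from_bytes = cookie_bytes.decode(app_encoding)

print(value_from_text == value_from_bytes)
```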

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 6:20 PM, Chris McDonough  wrote:

>  > What are the concrete problems you envision with text request headers,
> > text (URL-quoted) path, and text response status and headers?
>
> Documentation is the main reason.  For example, the documentation for
> making sense of path_info segments in a WSGI that used unicodey-strings
> would, as I understand it, read something like this:
>

Nah, not nearly that hard:

path_info =
urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')

I don't see the problem?  If you want to distinguish %2f from /, then you'll
do it slightly differently, like:

path_parts = [
urllib.parse.unquote_to_bytes(p).decode('UTF-8')
for p in environ['wsgi.raw_path_info'].split('/')]

This second recipe is impossible to do currently with WSGI.
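
Put together as a runnable sketch, with a made-up request path, the two
recipes behave as follows. Note that wsgi.raw_path_info is the key proposed
in this thread, not an existing WSGI key.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical environ carrying the proposed URL-quoted path.
environ = {"wsgi.raw_path_info": "/docs/a%2Fb/caf%C3%A9"}
raw = environ["wsgi.raw_path_info"]

# Recipe 1: decode the whole path; "%2F" becomes indistinguishable from "/".
path_info = unquote_to_bytes(raw).decode("UTF-8")

# Recipe 2: split first, then decode; "a/b" stays a single segment.
path_parts = [unquote_to_bytes(p).decode("UTF-8") for p in raw.split("/")]

print(path_info)
print(path_parts)
```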

So... before jumping to conclusions, what's the hard part with using text?

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Chris McDonough
On Sat, 2010-07-17 at 01:33 +0200, Armin Ronacher wrote:
> Hi,
> 
> On 7/17/10 1:20 AM, Chris McDonough wrote:
>  > Let me know if I'm missing something.
> The only thing you miss is that the bytes type of Python 3 is badly 
> supported in the stdlib (not an issue if we reimplement everything in 
> our libraries, not an issue for me) and that the bytes type has no 
> string formattings which makes us do the encode/decode dance in our own 
> implementation of some of the missing stdlib functions.

This is why the docs mention "bytes with benefits" instead (like the
Python 2 "str" type). The existence of such a type would be the result
of us lobbying for its inclusion into some future Python 3, or at least
the result of lobbying for a String ABC that would allow us to define
our own.

But.. yeah.  Stdlib support for bytes.  Dunno.   What I really don't
want to do is implement a WSGI spec in terms of Unicodey strings just
because the webby stuff in the stdlib cannot deal with bytes.  Those
stdlib implementations should be changed to deal with bytes-ish things
instead.  I actually think fixing the stdlib will end up being a driver
for the "bytes with benefits" type.  Supporting such a type in the
implementation of stdlib functions is clearly the right way to fix it in
lots of cases, because they will be able to deal with BwB and
Unicodey-strings in exactly the same way.

In the meantime, I think using bytes is the only sane thing to do in
some interim specification, because moving from a spec which is
bytes-oriented to a spec that is text-oriented now will leave us in the
embarrassing position of needing to create yet another bytes-oriented
spec later (as, well, I/O is bytes), when Python 3 matures and realizes
it needs such a hybrid type.

- C




Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Tres Seaver

P.J. Eby wrote:
> At 07:20 PM 7/16/2010 -0400, Chris McDonough wrote:
>> I'd much rather be able to say:
>>
>> """
>> The PATH_INFO environment variable is a ``bytes-with-benefits`` type.
>> To decode it:
>>
>> - First, split it on slashes::
>>
>> segments = PATH_INFO.split('/')
>>
>> - Then, de-encode each segment's urlencoded portions:
>>
>> urldecoded_segments = [ urllib.unquote(x) for x in segments ]
>>
>> - Then re-encode each urldecoded segment into the encoding expected
>>   by your application
>>
>> app_segments = [ str(x, encoding='utf-8') for x in
>>  urldecoded_segments ]
>> """
> 
> +1.  I do wish we actually *had* a bytes-with-benefits type (as I 
> proposed on Python-Dev), but I don't think we can really get one 
> until the language moratorium is over.  Plain old bytes are the next 
> best thing. 

We might be able to write one which would work in reduced-instruction-set
mode, and have the server wrap the environ values in it.  Some
operations might not be "natural", and we might have to implement some
wrappers around stdlib stuff, but maybe it would be worthwhile to try a
spike on it.


Tres.
-- 
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   "Excellence by Design"http://palladion.com



Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Armin Ronacher

Hi,

On 7/17/10 1:20 AM, Chris McDonough wrote:
> Let me know if I'm missing something.
The only thing you miss is that the bytes type of Python 3 is badly 
supported in the stdlib (not an issue if we reimplement everything in 
our libraries, not an issue for me) and that the bytes type has no 
string formattings which makes us do the encode/decode dance in our own 
implementation of some of the missing stdlib functions.


So I am pretty sure we can't totally bypass the encoding/decoding.  We 
might however require less encodes/decodes if we leave bytes on the WSGI 
layer.
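
For readers unfamiliar with the "encode/decode dance", it looks roughly
like this when the inputs are bytes and text-style formatting is wanted.
The helper names are made up for illustration; Latin-1 is used only because
it maps every byte to one code point and back.

```python
def format_header(name, value):
    """Build a header line from bytes by formatting as text."""
    # Decode to text, format, then encode the result back to bytes.
    line = "%s: %s\r\n" % (name.decode("latin-1"), value.decode("latin-1"))
    return line.encode("latin-1")


def format_header_joined(name, value):
    """The same result without formatting, staying in bytes throughout."""
    return b"".join([name, b": ", value, b"\r\n"])


print(format_header(b"Content-Type", b"text/plain"))
```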



Regards,
Armin


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread P.J. Eby

At 07:20 PM 7/16/2010 -0400, Chris McDonough wrote:

I'd much rather be able to say:

"""
The PATH_INFO environment variable is a ``bytes-with-benefits`` type.
To decode it:

- First, split it on slashes::

segments = PATH_INFO.split('/')

- Then, de-encode each segment's urlencoded portions:

urldecoded_segments = [ urllib.unquote(x) for x in segments ]

- Then re-encode each urldecoded segment into the encoding expected
  by your application

app_segments = [ str(x, encoding='utf-8') for x in
 urldecoded_segments ]
"""


+1.  I do wish we actually *had* a bytes-with-benefits type (as I 
proposed on Python-Dev), but I don't think we can really get one 
until the language moratorium is over.  Plain old bytes are the next 
best thing. 
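
With plain bytes, the three steps quoted above can be sketched on Python 3
as below. This assumes PATH_INFO arrives as bytes, which is the proposal
under discussion rather than current WSGI behaviour; the sample value is
made up.

```python
from urllib.parse import unquote_to_bytes

PATH_INFO = b"/docs/a%2Fb/caf%C3%A9"   # assumed to be bytes under this proposal

# First, split on slashes (a bytes separator, since the value is bytes).
segments = PATH_INFO.split(b"/")

# Then decode each segment's urlencoded portions, still as bytes.
urldecoded_segments = [unquote_to_bytes(x) for x in segments]

# Then decode each segment into text using the application's encoding.
app_segments = [str(x, encoding="utf-8") for x in urldecoded_segments]

print(app_segments)
```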




Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread P.J. Eby

At 05:42 PM 7/16/2010 -0400, Tres Seaver wrote:

P.J. Eby wrote:

> (Hm.  Although actually, I suppose we *could* just borrow the time
> machine and pretend that WSGI called for "byte-strings everywhere"
> all along...)

I like the idea of pushing responsibility for decoding stuff into the
framework / app writer's hands.  OTOH, doesn't that hose authors of
existing middleware, due to the borkedness of working with bytes in Python3?


It only creates a "new" problem if they are currently not using *any* 
unicode in 2.x, and are passing through bytes from the input to the 
output without any encoding or decoding.  AFAICT, if any part of 
their app is currently unicode, they would have the same problems in 2.x.


(Minus, of course, any problems introduced by missing bytes methods 
in 3.x, or the fact that single-subscripted bytes are ints rather 
than bytestrings.)
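
The subscripting difference mentioned in that parenthetical is easy to
trip over when porting, so here is a short illustration with a made-up
value:

```python
data = b"GET"

first = data[0]          # an int in Python 3, not a length-1 bytestring
first_slice = data[0:1]  # slicing still yields bytes

print(first, first_slice, bytes([first]) == first_slice)
```

Code ported from 2.x that compares data[0] against a one-character string
needs to use a slice or compare against an integer instead.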


Anyway, the problems introduced will be problems that can be solved 
by waving a fairly standard set of dead chickens at the problem, i.e. 
picking where you're going to encode/decode, and deciding what 
encoding(s) are meaningful to your app.  And frameworks that already 
have a unicode API are ahead of the game here.


So, AFAICT, the only people who'd be punished by a change to bytes 
are the people who have non-ASCII inputs or outputs, but haven't been 
using unicode (because 2to3 will convert them to using strings 
instead of bytes).


From what I can tell, though, this is also the group it's most 
politically correct to hate on in Python-Dev, so we should be 
relatively safe in shifting the burden to them.  ;-)




Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread P.J. Eby

At 02:28 PM 7/16/2010 -0500, Ian Bicking wrote:
On Fri, Jul 16, 2010 at 1:40 PM, P.J. Eby 
<p...@telecommunity.com> wrote:

At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote:
And this doesn't help with Python 3: either we have byte values of 
SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I 
think bytes will be more awkward to port to than text, and 
inconsistent with other WSGI values.



OTOH, it has the tremendous advantage of pushing the encoding 
question onto the app (or framework) developer...  who's really the 
only one who can make the right decision for their particular 
application.  And personally, I'd rather have clear boundaries 
between text and bytes, such that porting (even if tedious or 
awkward) is *consistent*, and clear as to when you're finished, not, 
"oh, did I check to make sure I converted SCRIPT_NAME and 
PATH_INFO...  not just in my app code, but in all the library code 
I call *from* my app?"


IOW, the bytes/string discussion on Python-dev has kind of led me to 
realize that we might just as well make the *entire* stack bytes 
(incoming and outgoing headers *and* streams), and rewrite that bit 
in PEP 333 about using str on "Python 3000" to say we go with bytes 
on Python 3+ for everything that's a str in today's WSGI.



This was my first intuition too, until I started thinking in more 
detail about the particular values involved.  Some obviously are 
textish, like environ['SERVER_NAME'].  Not a very useful value, but 
definitely text.


Basically all the internal strings are textish, so we're left with:

wsgi.url_scheme
SCRIPT_NAME/PATH_INFO
QUERY_STRING
HTTP_*, CONTENT_TYPE, CONTENT_LENGTH (headers)
response status
response headers (name and value)


What I'm getting at, though, is it's precisely this sort of "hm, 
which ones are bytes again?" stuff that makes you have to stop and 
*think*, i.e., it doesn't Fit My Brain any more.  ;-)


There should be one, and preferably *only* one, obvious way to do it.

And given that HTTP is inherently a bunch of bytes, bytes is the one 
obvious way.


I previously was under the impression that bytes wouldn't 
interoperate with strings in 3.x, but they *do*, in much the same way 
as they did in 2.x.  That means you'll be (mostly) bug-compatible in 
3.x, only you'll likely encounter encoding issues *sooner*, rather 
than later.  (i.e., the minute you combine non-ASCII inputs with your 
regular string constants).


Yes, you will also be forced to convert your return values to bytes, 
but if you've used string constants *anywhere*, then you know you'll 
be outputting text, which you should already have been encoding for 
output.  (So you'll just be forced to deal with errors on that side 
sooner as well.)


All in all, I'd say this also fits with what people on Python-Dev 
keep hammering on as the One Obvious Way to deal with bytes and 
strings in a program: i.e., bytes for I/O, text for text processing.
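
As a minimal sketch of "bytes for I/O, text for text processing" (the function name and the UTF-8 default are assumptions made for illustration, not anything the thread specifies):

```python
def shout(raw_name: bytes, encoding: str = "utf-8") -> bytes:
    """Decode at the input edge, work on text, encode at the output edge."""
    name = raw_name.decode(encoding)           # bytes -> text, once, on the way in
    message = "HELLO, " + name.upper() + "!"   # all processing happens on str
    return message.encode(encoding)            # text -> bytes, once, on the way out
```

The only places that need to know about an encoding are the first and last lines, which is where the application author picks it.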


WSGI is HTTP, and HTTP is I/O, ergo, WSGI is I/O, and we should 
therefore "byte" the bullet here.  ;-)




Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Chris McDonough
On Fri, 2010-07-16 at 17:11 -0500, Ian Bicking wrote:
> On Fri, Jul 16, 2010 at 5:08 PM, Chris McDonough wrote:
> > On Fri, 2010-07-16 at 17:47 -0400, Tres Seaver wrote:
> >
> > > > In the past when we've gotten down to specifics, the only
> > > > holdup has been SCRIPT_NAME/PATH_INFO, hence my suggestion to
> > > > eliminate those.
> > >
> > > I think I favor PJE's suggestion:  let WSGI deal only in bytes.
> >
> > I'd prefer that WSGI 2 was defined in terms of a "bytes with
> > benefits" type (Python 2's ``str`` with an optional encoding
> > attribute as a hint for cast to unicode str) instead of Python
> > 3-style bytes.
> >
> > But if I had to make the Hobson's choice between Python 3 style
> > bytes and Python 3 style str, I'd choose bytes.  If I then needed
> > to write middleware or applications, I'd use WebOb or an
> > equivalent library to enable a policy which converted those bytes
> > to strings on my behalf.  Making it easy to write "raw" middleware
> > or applications without using such a library doesn't seem as
> > compelling a goal as being able to easily write one which allowed
> > me direct control at the raw level.
>
> What are the concrete problems you envision with text request headers,
> text (URL-quoted) path, and text response status and headers?

Documentation is the main reason.  For example, the documentation for
making sense of path_info segments in a WSGI that used unicodey-strings
would, as I understand it, read something like this:

"""
The PATH_INFO environment variable is a string.  To decode it,

- First, split it on slashes::

    segments = PATH_INFO.split('/')

- Then turn each segment into bytes::

    bytes_segments = [ bytes(x, encoding='latin-1') for x in segments ]

- Then, de-encode each segment's urlencoded portions::

    urldecoded_segments = [ urllib.unquote(x) for x in bytes_segments ]

- Then re-encode each urldecoded segment into the encoding expected
  by your application::

    app_segments = [ str(x, encoding='utf-8') for x in
                     urldecoded_segments ]

.. note:: We decode from latin-1 above because WSGI tunnels the bytes
representing the PATH_INFO by way of a string type which contains bytes
as characters.
"""

That looks pretty apologetic to me, and to be honest, I'm not even sure
it will work reliably in the face of existing/legacy applications which
have emitted URLs that are not url-encoded properly if those old URLs
need to be supported.   http://bugs.python.org/issue8136 contains a
variation on this theme.
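
For reference, the latin-1 tunnelling recipe described in the hypothetical documentation above can be sketched as runnable Python 3. This is only an illustration under stated assumptions: the helper name `decode_path_segments` is invented here, the application encoding is assumed to be UTF-8, and `urllib.parse.unquote_to_bytes` stands in for the `urllib.unquote` call in the recipe:

```python
from urllib.parse import unquote_to_bytes


def decode_path_segments(path_info, app_encoding="utf-8"):
    """Recover text path segments from a latin-1-tunnelled PATH_INFO str."""
    # Split the tunnelled string on slashes.
    segments = path_info.split("/")
    # Turn each segment back into the bytes that were tunnelled through it.
    byte_segments = [seg.encode("latin-1") for seg in segments]
    # Undo any URL quoting at the byte level.
    unquoted = [unquote_to_bytes(seg) for seg in byte_segments]
    # Finally decode with the encoding the application expects.
    return [seg.decode(app_encoding) for seg in unquoted]
```

A path that does not hold valid data in the chosen encoding raises UnicodeDecodeError at the last step, which is the reliability concern raised above for legacy URLs.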

I'd much rather be able to say:

"""
The PATH_INFO environment variable is a ``bytes-with-benefits`` type.
To decode it:

- First, split it on slashes::

    segments = PATH_INFO.split('/')

- Then, de-encode each segment's urlencoded portions::

    urldecoded_segments = [ urllib.unquote(x) for x in segments ]

- Then re-encode each urldecoded segment into the encoding expected
  by your application::

    app_segments = [ str(x, encoding='utf-8') for x in
                     urldecoded_segments ]
"""

Let me know if I'm missing something.

- C





Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 5:06 PM, Ian Bicking  wrote:

> On Fri, Jul 16, 2010 at 4:47 PM, Tres Seaver wrote:
>
>>  > Basically all the internal strings are textish, so we're left with:
>>
>> What do you mean by "internal"?  Anything in the headers or the CGI
>> environment is intrinsically "bytes-ish" to me.  Do you mean that you
>> want application programmers to have them transparently decoded?  If so,
>> we can make that the responsibility of the non-middleware framework /
>> application.
>>
>
> By internal I mean all the CGI variables that aren't representing HTTP,
> like SERVER_NAME.
>

Actually I was thinking SERVER_SOFTWARE, though SERVER_NAME is somewhat
similar as it doesn't come from HTTP, it comes from server configuration.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 5:08 PM, Chris McDonough  wrote:

> On Fri, 2010-07-16 at 17:47 -0400, Tres Seaver wrote:
>
> > > In the past when we've gotten down to specifics, the only holdup has
> > > been SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those.
> >
> > I think I favor PJE's suggestion:  let WSGI deal only in bytes.
>
> I'd prefer that WSGI 2 was defined in terms of a "bytes with benefits"
> type (Python 2's ``str`` with an optional encoding attribute as a hint
> for cast to unicode str) instead of Python 3-style bytes.
>
> But if I had to make the Hobson's choice between Python 3 style bytes
> and Python 3 style str, I'd choose bytes.  If I then needed to write
> middleware or applications, I'd use WebOb or an equivalent library to
> enable a policy which converted those bytes to strings on my behalf.
> Making it easy to write "raw" middleware or applications without using
> such a library doesn't seem as compelling a goal as being able to easily
> write one which allowed me direct control at the raw level.
>

What are the concrete problems you envision with text request headers, text
(URL-quoted) path, and text response status and headers?

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Chris McDonough
On Fri, 2010-07-16 at 17:47 -0400, Tres Seaver wrote:

> > In the past when we've gotten down to specifics, the only holdup has been
> > SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those.
> 
> I think I favor PJE's suggestion:  let WSGI deal only in bytes.

I'd prefer that WSGI 2 was defined in terms of a "bytes with benefits"
type (Python 2's ``str`` with an optional encoding attribute as a hint
for cast to unicode str) instead of Python 3-style bytes.

But if I had to make the Hobson's choice between Python 3 style bytes
and Python 3 style str, I'd choose bytes.  If I then needed to write
middleware or applications, I'd use WebOb or an equivalent library to
enable a policy which converted those bytes to strings on my behalf.
Making it easy to write "raw" middleware or applications without using
such a library doesn't seem as compelling a goal as being able to easily
write one which allowed me direct control at the raw level.

- C




Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 4:47 PM, Tres Seaver  wrote:

>  > Basically all the internal strings are textish, so we're left with:
>
> What do you mean by "internal"?  Anything in the headers or the CGI
> environment is intrinsically "bytes-ish" to me.  Do you mean that you
> want application programmers to have them transparently decoded?  If so,
> we can make that the responsibility of the non-middleware framework /
> application.
>

By internal I mean all the CGI variables that aren't representing HTTP, like
SERVER_NAME.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Tres Seaver

Ian Bicking wrote:

>> IOW, the bytes/string discussion on Python-dev has kind of led me to
>> realize that we might just as well make the *entire* stack bytes (incoming
>> and outgoing headers *and* streams), and rewrite that bit in PEP 333 about
>> using str on "Python 3000" to say we go with bytes on Python 3+ for
>> everything that's a str in today's WSGI.
>>
> 
> This was my first intuition too, until I started thinking in more detail
> about the particular values involved.  Some obviously are textish, like
> environ['SERVER_NAME'].  Not a very useful value, but definitely text.
> 
> Basically all the internal strings are textish, so we're left with:

What do you mean by "internal"?  Anything in the headers or the CGI
environment is intrinsically "bytes-ish" to me.  Do you mean that you
want application programmers to have them transparently decoded?  If so,
we can make that the responsibility of the non-middleware framework /
application.

> wsgi.url_scheme
> SCRIPT_NAME/PATH_INFO
> QUERY_STRING
> HTTP_*, CONTENT_TYPE, CONTENT_LENGTH (headers)
> response status
> response headers (name and value)
> 
> And there's a few things like REMOTE_USER that are kind of in the middle.
> Everyone is in agreement that bodies should be bytes.
> 
> One initial problem is that the Python 3 stdlib handles bytes poorly, so for
> instance there's no good way to reconstruct the URL using the stdlib.  That
> explains certain tensions, but I think we should ignore that, and in fact
> that's what Python-Dev seemed to say pretty clearly.

python-dev seems to me to be coming to the realization that they should
have tried harder to make real-world apps work before they froze their
choices.

> Now, the other keys:
> 
> wsgi.url_scheme: clearly ASCII
> 
> SCRIPT_NAME/PATH_INFO: often UTF-8, could be no encoding, could be some old
> legacy encoding.
> raw request path: should be ASCII (non-ASCII should be URL-encoded).  URL
> encoding happens at the byte layer, so a server could reasonably URL encode
> any non-ASCII characters without imposing any encoding.
> 
> QUERY_STRING: should be ASCII, same as raw request path
> 
> headers: Most are ASCII.  Latin1 is a reasonable fallback and suggested by
> the specification.  The spec also implies you have to use the RFC2047 inline
> encoding (like =?iso-8859-1?q?some=20text?=), but nothing supports this and
> supporting it would probably be a bad idea for security reasons.  The
> Atompub spec (reasonably modern) specifically says Title headers should be
> encoded with RFC2047 (if they are not ISO-8859-1):
> http://tools.ietf.org/html/draft-ietf-atompub-protocol-08#page-17 --
> decoding this kind of encoding at the application layer seems reasonable to
> me.
> 
> cookie header: this specific header can easily have multiple encodings, as
> the browser encodes data then treats it as opaque bytes, so a cookie can be
> set via UTF-8 one place, Latin1 another, and those coexist in one header.
> That is, there is no real encoding and this should be treated as bytes.
> (Latin1 is an approximation of bytes... a spotty way to treat bytes, but
> entirely workable.)
> 
> response status: I believe the spec says this must be Latin1/ISO-8859-1.  In
> practice it is almost always ASCII, and since it is not user-visible it's
> not something that really needs localization.
> 
> response headers: the spec implies Latin1, in practice the Set-Cookie header
> is bytes (since interoperation with wonky legacy systems is not uncommon).
> I'm not sure of any other exceptions?
> 
> 
> So... to me it seems pretty reasonable for HTTP specifically that text can
> work.  And it feels weird that, say, environ['SERVER_NAME'] be text and
> environ['HTTP_HOST'] not, and I don't know what environ['REMOTE_ADDR']
> should be in that mode.  And it would also be weird if
> environ['SERVER_NAME'] was bytes.


> In the past when we've gotten down to specifics, the only holdup has been
> SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those.

I think I favor PJE's suggestion:  let WSGI deal only in bytes.



Tres.
- --
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   "Excellence by Design"http://palladion.com



Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Tres Seaver

P.J. Eby wrote:

> (Hm.  Although actually, I suppose we *could* just borrow the time 
> machine and pretend that WSGI called for "byte-strings everywhere" 
> all along...)

I like the idea of pushing responsibility for decoding stuff into the
framework / app writer's hands.  OTOH, doesn't that hose authors of
existing middleware, due to the borkedness of working with bytes in Python3?


Tres.
- --
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   "Excellence by Design"http://palladion.com



Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Gustavo Narea
Gustavo said:
>  - On ingress, if the charset is not specified in the request, it will
> assume  it's the one used by the other application, and thus it will
> convert the request using the charset supported by the wrapped
> application.

That should actually be:

"On ingress, if the charset in wsgi.charset differs from the charset supported 
by the wrapped application, the request will be converted into the charset 
supported by the wrapped application."
-- 
Gustavo Narea .
| Tech blog: =Gustavo/(+blog)/tech  ~  About me: =Gustavo/about |


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 1:40 PM, P.J. Eby  wrote:

> At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote:
>
>> And this doesn't help with Python 3: either we have byte values of
>> SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I think
>> bytes will be more awkward to port to than text, and inconsistent with other
>> WSGI values.
>>
>
> OTOH, it has the tremendous advantage of pushing the encoding question onto
> the app (or framework) developer...  who's really the only one who can make
> the right decision for their particular application.  And personally, I'd
> rather have clear boundaries between text and bytes, such that porting (even
> if tedious or awkward) is *consistent*, and clear as to when you're
> finished, not, "oh, did I check to make sure I converted SCRIPT_NAME and
> PATH_INFO...  not just in my app code, but in all the library code I call
> *from* my app?"
>
> IOW, the bytes/string discussion on Python-dev has kind of led me to
> realize that we might just as well make the *entire* stack bytes (incoming
> and outgoing headers *and* streams), and rewrite that bit in PEP 333 about
> using str on "Python 3000" to say we go with bytes on Python 3+ for
> everything that's a str in today's WSGI.
>

This was my first intuition too, until I started thinking in more detail
about the particular values involved.  Some obviously are textish, like
environ['SERVER_NAME'].  Not a very useful value, but definitely text.

Basically all the internal strings are textish, so we're left with:

wsgi.url_scheme
SCRIPT_NAME/PATH_INFO
QUERY_STRING
HTTP_*, CONTENT_TYPE, CONTENT_LENGTH (headers)
response status
response headers (name and value)

And there's a few things like REMOTE_USER that are kind of in the middle.
Everyone is in agreement that bodies should be bytes.

One initial problem is that the Python 3 stdlib handles bytes poorly, so for
instance there's no good way to reconstruct the URL using the stdlib.  That
explains certain tensions, but I think we should ignore that, and in fact
that's what Python-Dev seemed to say pretty clearly.

Now, the other keys:

wsgi.url_scheme: clearly ASCII

SCRIPT_NAME/PATH_INFO: often UTF-8, could be no encoding, could be some old
legacy encoding.
raw request path: should be ASCII (non-ASCII should be URL-encoded).  URL
encoding happens at the byte layer, so a server could reasonably URL encode
any non-ASCII characters without imposing any encoding.
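
A small sketch of that point, using the standard library's `urllib.parse.quote`, which accepts bytes directly. The sample path is made up and deliberately mixes a Latin-1 byte with UTF-8 bytes:

```python
from urllib.parse import quote, unquote_to_bytes

# One Latin-1 encoded segment and one UTF-8 encoded segment in the same path.
raw_path = b"/caf\xe9/men\xc3\xba"

# Quoting works byte by byte, so no character encoding has to be chosen.
ascii_path = quote(raw_path)

# The original bytes are fully recoverable from the ASCII form.
recovered = unquote_to_bytes(ascii_path)
```

Here `ascii_path` is pure ASCII and `recovered` equals `raw_path`, so a server can hand the quoted form to the application as text without guessing an encoding.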

QUERY_STRING: should be ASCII, same as raw request path

headers: Most are ASCII.  Latin1 is a reasonable fallback and suggested by
the specification.  The spec also implies you have to use the RFC2047 inline
encoding (like =?iso-8859-1?q?some=20text?=), but nothing supports this and
supporting it would probably be a bad idea for security reasons.  The
Atompub spec (reasonably modern) specifically says Title headers should be
encoded with RFC2047 (if they are not ISO-8859-1):
http://tools.ietf.org/html/draft-ietf-atompub-protocol-08#page-17 --
decoding this kind of encoding at the application layer seems reasonable to
me.
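
For what application-layer decoding of such a header could look like, here is a hedged sketch built on the standard library's `email.header.decode_header`. The helper name `decode_rfc2047` and the Latin-1 fallback are assumptions made for illustration:

```python
from email.header import decode_header


def decode_rfc2047(value, fallback="latin-1"):
    """Decode RFC 2047 encoded-words in a header value into text."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus the declared charset.
            parts.append(chunk.decode(charset or fallback))
        else:
            # Values without encoded-words come back as str unchanged.
            parts.append(chunk)
    return "".join(parts)
```

Because this runs in the application, a server or middleware never has to interpret the header, which sidesteps the security concern mentioned above for everything below the application.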

cookie header: this specific header can easily have multiple encodings, as
the browser encodes data then treats it as opaque bytes, so a cookie can be
set via UTF-8 one place, Latin1 another, and those coexist in one header.
That is, there is no real encoding and this should be treated as bytes.
(Latin1 is an approximation of bytes... a spotty way to treat bytes, but
entirely workable.)
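
The reason Latin1 is workable as a stand-in for bytes is that it maps every byte value to exactly one code point, so the round trip is lossless. A short sketch, with a made-up cookie value that mixes encodings:

```python
# Every possible byte value survives a latin-1 decode/encode round trip.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
round_tripped = as_text.encode("latin-1")

# A cookie header holding one UTF-8 value and one Latin-1 value.
cookie = b"a=caf\xc3\xa9; b=caf\xe9"
cookie_round_trip = cookie.decode("latin-1").encode("latin-1")

# Decoding the same header as UTF-8 fails, since it is not valid UTF-8.
try:
    cookie.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

The decoded text is not meaningful as text (the UTF-8 value shows up as two odd characters), which is the "spotty" part, but no information is lost.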

response status: I believe the spec says this must be Latin1/ISO-8859-1.  In
practice it is almost always ASCII, and since it is not user-visible it's
not something that really needs localization.

response headers: the spec implies Latin1, in practice the Set-Cookie header
is bytes (since interoperation with wonky legacy systems is not uncommon).
I'm not sure of any other exceptions?


So... to me it seems pretty reasonable for HTTP specifically that text can
work.  And it feels weird that, say, environ['SERVER_NAME'] be text and
environ['HTTP_HOST'] not, and I don't know what environ['REMOTE_ADDR']
should be in that mode.  And it would also be weird if
environ['SERVER_NAME'] was bytes.

In the past when we've gotten down to specifics, the only holdup has been
SCRIPT_NAME/PATH_INFO, hence my suggestion to eliminate those.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Gustavo Narea
Hello,

Ian said:
> Having two ways of expressing the same information will lead to bugs
> related to which data is canonical.  If an application is using
> SCRIPT_NAME/PATH_INFO and then updates those values in any way, and
> wsgi.raw_script_name/wsgi.raw_path_info are present, then there will be
> weird bugs and code will disagree about which one is correct.  Since %2f
> can exist in the raw versions, there isn't even a way to chunk the two
> variables in the same way.

I can't agree more.

I would propose the following, and excuse me in advance if this has already 
been proposed and discarded -- I've tried to follow this topic on the mailing 
list over the past few months, but it has become an endless discussion.

I think only the raw values should be available. Even if a middleware changes 
them, it must put them with raw values. And because you cannot change those 
values without knowing what encoding the request uses, the character encoding 
*must* be present.

I know that sounds easy but it's not, because browsers don't specify the 
charset in the Content-Type and instead they generate a new request using the 
charset from the previous response. So the charset is unknown to the 
server/gateway and the middleware stack.

So, what we could do is introduce a mandatory variable called, say, 
wsgi.charset, and would be used as follows:
 - It MUST be set by the server or gateway on every request.
 - Every middleware or application that reads or writes these values MUST use 
the charset specified in wsgi.charset.
 - If a server, gateway, middleware or application wants to change the charset 
and it is possible*, it MUST convert the *entire* request into that charset 
and update wsgi.charset accordingly.
 - When the charset is not specified in the HTTP request, UTF-8 MUST be 
assumed by the server/gateway. Unless another default charset has been 
specified by the user.
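
A rough sketch of how a server or gateway might fill in the proposed key under these rules. Note that `wsgi.charset` exists only in this proposal and is not part of any WSGI specification, and both helper names are hypothetical:

```python
def request_charset(environ, default="utf-8"):
    """Return the charset named in CONTENT_TYPE, or the configured default."""
    content_type = environ.get("CONTENT_TYPE", "")
    # Parameters follow the media type, separated by semicolons.
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()
    return default


def add_wsgi_charset(environ, default="utf-8"):
    """Set the proposed (non-standard) wsgi.charset key on every request."""
    environ["wsgi.charset"] = request_charset(environ, default)
    return environ
```

The `default` argument corresponds to the last rule: UTF-8 unless the user has configured something else.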

I think/hope that will solve all the problems.

What happens when a WSGI application is actually made up of two WSGI applications 
and they send the responses in different charsets? If it's not possible to 
configure them so that they both use the same charsets, then one of them would 
have to be wrapped by a middleware which:
 - On egress, converts the responses using the charset used by the other 
application.
 - On ingress, if the charset is not specified in the request, it will assume 
it's the one used by the other application, and thus it will convert the 
request using the charset supported by the wrapped application.

It would look like this:
===
def application(environ, start_response):
    # environ is a dict, so dispatch on the request path it carries.
    if environ["PATH_INFO"].startswith("/trac/"):
        # Say Trac only supports Latin-1 and we want responses to use UTF-8:
        app = trac.web.main.dispatch_request
        app = CharsetNormalizer(app, response="latin-1", request="utf8")
    else:
        # myapp uses UTF-8
        app = myapp
    return app(environ, start_response)
===
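
The egress half of such a wrapper could be sketched roughly as follows. This is not the `CharsetNormalizer` from the example above, whose behaviour the message does not define. `ResponseTranscoder` is a hypothetical name, it assumes PEP 3333 style text headers with bytes bodies, and it naively treats every response as text:

```python
import codecs


class ResponseTranscoder:
    """Re-encode response bodies from app_charset to outer_charset."""

    def __init__(self, app, app_charset, outer_charset):
        self.app = app
        self.app_charset = app_charset
        self.outer_charset = outer_charset

    def __call__(self, environ, start_response):
        def fixed_start_response(status, headers, exc_info=None):
            new_headers = []
            for name, value in headers:
                lowered = name.lower()
                if lowered == "content-length":
                    continue  # the byte length changes after transcoding
                if lowered == "content-type":
                    media_type = value.split(";")[0].strip()
                    value = media_type + "; charset=" + self.outer_charset
                new_headers.append((name, value))
            return start_response(status, new_headers, exc_info)

        # An incremental decoder copes with characters split across chunks.
        decoder = codecs.getincrementaldecoder(self.app_charset)()

        def body():
            result = self.app(environ, fixed_start_response)
            try:
                for chunk in result:
                    text = decoder.decode(chunk)
                    if text:
                        yield text.encode(self.outer_charset)
                tail = decoder.decode(b"", final=True)
                if tail:
                    yield tail.encode(self.outer_charset)
            finally:
                if hasattr(result, "close"):
                    result.close()

        return body()
```

A real wrapper would also need the ingress conversion described above and would have to leave non-text responses alone; both are omitted to keep the sketch short.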

Then there's the string vs bytes issue. Bytes would be the natural choice to 
represent these raw values, but they would probably cause more trouble than they 
solve. So, I think they should be strings that contain the ASCII raw 
encoded values (i.e., str on both versions of Python).

What do you think about this? Again, sorry if this has been discarded before! 
:)

* For example, you can always convert Latin-1 to UTF-8, but not every UTF-8 
string can be converted to Latin-1.
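
The asymmetry in the footnote can be checked directly. A brief sketch with made-up sample strings:

```python
# Latin-1 to UTF-8 always works: every Latin-1 character exists in Unicode.
latin1_bytes = "caf\u00e9".encode("latin-1")
as_utf8 = latin1_bytes.decode("latin-1").encode("utf-8")

# UTF-8 to Latin-1 can fail: the euro sign has no Latin-1 code point.
utf8_bytes = "5 \u20ac".encode("utf-8")
try:
    utf8_bytes.decode("utf-8").encode("latin-1")
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False
```

This is why the third rule says "and it is possible": a converter has to be prepared for the failure case.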
-- 
Gustavo Narea .
| Tech blog: =Gustavo/(+blog)/tech  ~  About me: =Gustavo/about |


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Stephan Richter
On Friday, July 16, 2010, Ian Bicking wrote:
> We could make everything bytes and be done with it, but it would make it
> much harder to port Python 2 WSGI code to Python 3.

I think this might be best having seen all of the discussion. One could easily 
write a compatibility middleware that makes porting Python 2 applications easy 
or even completely transparent (from a WSGI spec point of view).
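
One possible shape for such a compatibility middleware is sketched below. It assumes a hypothetical bytes-based WSGI in which environ keys are native strings, CGI-style values are bytes, and `start_response` takes bytes; none of this is a published specification, and `text_compat` is an invented name:

```python
def text_compat(app, encoding="latin-1"):
    """Let a text-oriented app run on a hypothetical bytes-based WSGI server."""

    def wrapper(bytes_environ, bytes_start_response):
        # Decode bytes values to text; leave other values (tuples, streams) alone.
        environ = {
            key: value.decode(encoding) if isinstance(value, bytes) else value
            for key, value in bytes_environ.items()
        }

        def start_response(status, headers, exc_info=None):
            # Encode the text status and headers back to bytes for the server.
            return bytes_start_response(
                status.encode(encoding),
                [(n.encode(encoding), v.encode(encoding)) for n, v in headers],
                exc_info,
            )

        return app(environ, start_response)

    return wrapper
```

Latin-1 is used as the default only because it round-trips arbitrary bytes; an application that knows its real encoding could pass a different one.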

Regards,
Stephan
-- 
Entrepreneur and Software Geek
Google me. "Zope Stephan Richter"


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread P.J. Eby

At 11:07 AM 7/16/2010 -0500, Ian Bicking wrote:
And this doesn't help with Python 3: either we have byte values of 
SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I 
think bytes will be more awkward to port to than text, and 
inconsistent with other WSGI values.


OTOH, it has the tremendous advantage of pushing the encoding 
question onto the app (or framework) developer...  who's really the 
only one who can make the right decision for their particular 
application.  And personally, I'd rather have clear boundaries 
between text and bytes, such that porting (even if tedious or 
awkward) is *consistent*, and clear as to when you're finished, not, 
"oh, did I check to make sure I converted SCRIPT_NAME and 
PATH_INFO...  not just in my app code, but in all the library code I 
call *from* my app?"


IOW, the bytes/string discussion on Python-dev has kind of led me to 
realize that we might just as well make the *entire* stack bytes 
(incoming and outgoing headers *and* streams), and rewrite that bit 
in PEP 333 about using str on "Python 3000" to say we go with bytes 
on Python 3+ for everything that's a str in today's WSGI.
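
To make the idea concrete, a bytes-everywhere application might look roughly like this. It is a sketch of the proposal as described here, not of any adopted WSGI version, and it assumes native-string environ keys with bytes values:

```python
def bytes_app(environ, start_response):
    """Hypothetical app where every str of today's WSGI is bytes instead."""
    path = environ.get("PATH_INFO", b"/")
    # The application, not the server, decides how the path is decoded.
    name = path.rsplit(b"/", 1)[-1].decode("utf-8", "replace") or "world"
    body = ("Hello, %s!\n" % name).encode("utf-8")
    start_response(
        b"200 OK",
        [
            (b"Content-Type", b"text/plain; charset=utf-8"),
            (b"Content-Length", str(len(body)).encode("ascii")),
        ],
    )
    return [body]
```

Every decode and encode is explicit and sits in application code, which is the "clear boundaries" property argued for above.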


Or, to put it another way, if I knew then what I know *now*, I think 
I'd have written the PEP the other way around, such that the use of 
'str' in WSGI would be a substitute for the future 'bytes' type, 
rather than viewing some byte strings as a forward-compatible 
substitute for Py3K unicode strings.


Of course, this would be a WSGI 2 change, but IMO we're better off 
making a clean break with backward compatibility here anyway, rather 
than having conditionals.  Also, going with bytes everywhere means we 
don't have to rename SCRIPT_NAME and PATH_INFO, which in turn avoids 
deeper rewrites being required in today's apps.


(Hm.  Although actually, I suppose we *could* just borrow the time 
machine and pretend that WSGI called for "byte-strings everywhere" 
all along...)




Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 12:28 PM, Chris McDonough  wrote:

> On Fri, 2010-07-16 at 11:07 -0500, Ian Bicking wrote:
>
> > And this doesn't help with Python 3: either we have byte values of
> > SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I
> > think bytes will be more awkward to port to than text, and
> > inconsistent with other WSGI values.  If we have text then we have to
> > choose an encoding.  Latin1 will work, but it will be the exact wrong
> > encoding most of the time as UTF-8 is the typical one (unlike other
> > headers, where Latin1 will mostly be an okay encoding, or as good a
> > guess as we have).  If we firmly remove these keys then we can avoid
> > this choice entirely... and we conveniently also get a better
> > representation of the request.
>
> My $.02: I'd rather lobby the core folks for a string ABC (which we can
> hook with a stringlike bytes type) and consider all 3.X releases made so
> far "dead to WSGI" than to have to tunnel arbitrary bytes through some
> misleading Unicode encoding.
>

While I think it would be generally useful, it's also a long way off at
best, with serious performance dangers that could torpedo the whole thing.
But... I'm also unsure how it would help here, except perhaps we could
incrementally annotate bytes with an encoding?  Well, I don't really know.
Treating the raw request path as text is easy enough, as it should always be
ASCII anyway.  We don't have to worry what is "right" or "wrong" in this
case.

We could make everything bytes and be done with it, but it would make it
much harder to port Python 2 WSGI code to Python 3.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Chris McDonough
On Fri, 2010-07-16 at 11:07 -0500, Ian Bicking wrote:

> And this doesn't help with Python 3: either we have byte values of
> SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I
> think bytes will be more awkward to port to than text, and
> inconsistent with other WSGI values.  If we have text then we have to
> choose an encoding.  Latin1 will work, but it will be the exact wrong
> encoding most of the time as UTF-8 is the typical one (unlike other
> headers, where Latin1 will mostly be an okay encoding, or as good a
> guess as we have).  If we firmly remove these keys then we can avoid
> this choice entirely... and we conveniently also get a better
> representation of the request.

My $.02: I'd rather lobby the core folks for a string ABC (which we can
hook with a stringlike bytes type) and consider all 3.X releases made so
far "dead to WSGI" than to have to tunnel arbitrary bytes through some
misleading Unicode encoding.

- C




Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Ian Bicking
On Fri, Jul 16, 2010 at 4:33 AM, And Clover  wrote:

> On 07/14/2010 06:43 AM, Ian Bicking wrote:
>
>  There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
>> and HTTP_COOKIE.
>>
>
> (And of those, PATH_INFO is the only one that really matters, in that
> no-one really uses non-ASCII script filenames, and non-ASCII characters in
> Cookie/Set-Cookie are still handled so differently/brokenly across browsers
> that you can't rely on them at all.)
>
>
> > * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
> > exclusively with encoded versions
> >
>
> For compatibility with existing apps, how about keeping the existing
> SCRIPT_NAME and PATH_INFO as-is (with all their problems), and specifying
> that the new 'raw' versions (whatever they are called) are added only if
> they really are raw, not reconstructed.
>

Having two ways of expressing the same information will lead to bugs related
to which data is canonical.  If an application is using
SCRIPT_NAME/PATH_INFO and then updates those values in any way, and
wsgi.raw_script_name/wsgi.raw_path_info are present, then there will be
weird bugs and code will disagree about which one is correct.  Since %2f can
exist in the raw versions, there isn't even a way to chunk the two variables
in the same way.
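
A minimal sketch of the %2f problem, using a made-up request path (the path
and variable names are illustrative only). Once the path is URL-decoded, an
encoded slash can no longer be told apart from a real segment separator, so
the raw and decoded forms split into different segments:

```python
from urllib.parse import unquote

# Hypothetical raw request path; %2F is an encoded slash inside one segment.
raw_path = "/files/a%2Fb/c"

# Splitting the raw path keeps "a%2Fb" together as a single segment.
raw_segments = raw_path.split("/")

# Splitting the decoded path (what PATH_INFO would carry) yields one more segment.
decoded_segments = unquote(raw_path).split("/")

print(raw_segments)      # ['', 'files', 'a%2Fb', 'c']
print(decoded_segments)  # ['', 'files', 'a', 'b', 'c']
```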

> Then existing scripts that don't care about non-ASCII and slashes can carry
> on as before, and for apps that do care about them, they'll be able to be
> *sure* the input is correct. Or they can fall back to PATH_INFO when not
> present, and avoid producing these kinds of URLs in response.
>

I don't think it works to imagine you can just not care about non-ASCII.
Requests come in.  WSGI should represent those requests.  If a request comes
in with non-ASCII bytes then WSGI needs to do *something* with it.  I don't
want to have to configure servers with application policy; servers should
just work.

And this doesn't help with Python 3: either we have byte values of
SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I think
bytes will be more awkward to port to than text, and inconsistent with other
WSGI values.  If we have text then we have to choose an encoding.  Latin1
will work, but it will be the exact wrong encoding most of the time as UTF-8
is the typical encoding (unlike other headers, where Latin1 will mostly be an okay
encoding, or as good a guess as we have).  If we firmly remove these keys
then we can avoid this choice entirely... and we conveniently also get a
better representation of the request.

Note that libraries can smooth over this change; WebOb for instance will
certainly still support req.script_name/req.path_info by decoding the raw
values.  Admittedly lots of code uses these values directly... but at least
if they get a KeyError the port/fix will be obvious (as opposed to out of
sync values, which will only emerge as a problem occasionally -- I'd rather
not invite more occasional bugs).
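
As a rough illustration of how a library could smooth this over (this is not
WebOb's actual code; the helper name, the UTF-8 default and the "replace"
error handling are all assumptions made for the sketch), a request wrapper
might decode the raw value on demand and let a missing key fail loudly:

```python
from urllib.parse import unquote_to_bytes


def path_info(environ, encoding="utf-8"):
    """Decode the raw path the way a request library might.

    'wsgi.raw_path_info' is the key name floated in this thread, not part of
    any published spec; the UTF-8 default is an application-level assumption.
    """
    raw = environ["wsgi.raw_path_info"]  # a KeyError here makes a missing key obvious
    # The raw value is expected to be percent-encoded ASCII carried as text.
    return unquote_to_bytes(raw).decode(encoding, "replace")


environ = {"wsgi.raw_path_info": "/caf%C3%A9/menu"}
print(path_info(environ))
```

Code that reads the old keys directly would hit the KeyError right away
instead of quietly working with stale values.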

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread And Clover

On 07/16/2010 12:07 PM, Graham Dumpleton wrote:


> If you do that, one has to ask the question, given it is more convention than
> anything, why it isn't just a x-wsgiorg extension specification


Yes, fine by me either way.

I just want to be able to say "this application can use Unicode paths 
when run on a server/gateway that supports ", 
rather than the current mess of "you can have Unicode paths if you use 
one of the dozen different server-and-platform combinations we've 
specifically coded workarounds for".


--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread Graham Dumpleton
On Friday, July 16, 2010, And Clover  wrote:
> On 07/14/2010 06:43 AM, Ian Bicking wrote:
>
>
> There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
> and HTTP_COOKIE.
>
>
> (And of those, PATH_INFO is the only one that really matters, in that no-one 
> really uses non-ASCII script filenames, and non-ASCII characters in 
> Cookie/Set-Cookie are still handled so differently/brokenly across browsers 
> that you can't rely on them at all.)
>
>
> * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
> exclusively with encoded versions
>
>
> For compatibility with existing apps, how about keeping the existing 
> SCRIPT_NAME and PATH_INFO as-is (with all their problems), and specifying 
> that the new 'raw' versions (whatever they are called) are added only if they 
> really are raw, not reconstructed.
>
> Then existing scripts that don't care about non-ASCII and slashes can carry 
> on as before, and for apps that do care about them, they'll be able to be 
> *sure* the input is correct. Or they can fall back to PATH_INFO when not 
> present, and avoid producing these kind of URLs in response.
>
> (Or an app might have enough special knowledge to try other fallback 
> mechanisms when the raw versions are unavailable, such as REQUEST_URI or 
> Windows ctypes envvar hacking. But if the server/gateway has good raw paths 
> it shouldn't bother use these.)

Which is exactly what I have suggested in the past. If you do that,
one has to ask the question, given it is more convention than
anything, why it isn't just a x-wsgiorg extension specification like
routing args is rather than a core part of the WSGI specification.
Servers could still implement the extension as they are able to and
don't have to worry about changing core specification then and what we
have now stands.

Graham

> --
> And Clover
> mailto:a...@doxdesk.com
> http://www.doxdesk.com/


Re: [Web-SIG] WSGI for Python 3

2010-07-16 Thread And Clover

On 07/14/2010 06:43 AM, Ian Bicking wrote:


> There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
> and HTTP_COOKIE.


(And of those, PATH_INFO is the only one that really matters, in that 
no-one really uses non-ASCII script filenames, and non-ASCII characters 
in Cookie/Set-Cookie are still handled so differently/brokenly across 
browsers that you can't rely on them at all.)



> * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
> exclusively with encoded versions


For compatibility with existing apps, how about keeping the existing 
SCRIPT_NAME and PATH_INFO as-is (with all their problems), and 
specifying that the new 'raw' versions (whatever they are called) are 
added only if they really are raw, not reconstructed.


Then existing scripts that don't care about non-ASCII and slashes can 
carry on as before, and for apps that do care about them, they'll be 
able to be *sure* the input is correct. Or they can fall back to 
PATH_INFO when not present, and avoid producing these kinds of URLs in 
response.


(Or an app might have enough special knowledge to try other fallback 
mechanisms when the raw versions are unavailable, such as REQUEST_URI or 
Windows ctypes envvar hacking. But if the server/gateway has good raw 
paths it shouldn't bother using these.)
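
The fallback described above could look roughly like this. It is only a
sketch: 'wsgi.raw_path_info' is the key name used elsewhere in this thread
rather than a standardized one, raw_path() is a hypothetical helper, and the
fallback assumes the server hands over PATH_INFO as a Latin1-decoded native
string.

```python
from urllib.parse import quote


def raw_path(environ):
    """Return (path, is_really_raw) for the current request."""
    if "wsgi.raw_path_info" in environ:
        # The server promises this value is the real thing, not a reconstruction.
        return environ["wsgi.raw_path_info"], True
    # Reconstructed, not raw: encoded slashes and the like are already lost.
    # Assumes PATH_INFO was decoded as Latin1, so encoding restores the bytes.
    return quote(environ.get("PATH_INFO", "").encode("latin-1")), False


print(raw_path({"wsgi.raw_path_info": "/a%2Fb"}))  # ('/a%2Fb', True)
print(raw_path({"PATH_INFO": "/caf\xc3\xa9"}))     # ('/caf%C3%A9', False)
```

The boolean lets an application that cares avoid generating tricky URLs when
it only has the reconstructed value.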


--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/


Re: [Web-SIG] WSGI for Python 3

2010-07-13 Thread Ian Bicking
On Wed, Jul 14, 2010 at 12:19 AM, Graham Dumpleton <
graham.dumple...@gmail.com> wrote:

> >> * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
> >> exclusively with encoded versions (that represent the original request
> >> URI).  We use Latin1 encoding, but it should be ASCII anyway, like most of
> >> the headers.
>
> BTW, it should be highlighted whether this change is relevant to
> Python 3 or, like some of the other things you relegated as out of
> scope, purely a wish list item.
>

Certainly; most headers or metadata is pretty much constrained to ASCII, and
any use of non-ASCII is... at least peculiar, and presumably
application-specific.  For instance, there's no reason you'd have anything
but ASCII in Cache-Control.  The one place encoded information happens
regularly in headers (that I know of) is Cookie.  The request URI path is
generally ASCII, but SCRIPT_NAME and PATH_INFO *aren't* the request URI
path, they are URL decoded versions of the request URI path.  And they are
usually encoded in UTF8... but decoding arbitrary bytes as UTF8 can fail or
lose data, so decoding them is problematic (though we could define that they
must be decoded with surrogateescape).  And while they are usually UTF8, they
are sometimes no valid encoding at all, because anyone can assemble any set of
characters they want and web browsers will accept it.

By avoiding URL-unquoting of these values, we can also stick to Latin1 and
get something reasonable.  It's not very attractive to me that we take
something that is probably *not* Latin1, and may reasonably not be ASCII,
and decode it as Latin1.
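
To make the trade-off concrete, here is a small sketch with made-up byte
strings. It shows strict UTF8 decoding failing on a URL-decoded path,
surrogateescape round-tripping the same bytes, and Latin1 always succeeding
while mislabeling UTF8 input:

```python
# Bytes of a URL-decoded path that is not valid UTF-8 (0xE9 is Latin1 'e-acute').
path_bytes = b"/caf\xe9"

# Strict UTF-8 decoding fails outright on such input.
try:
    path_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape carries the undecodable byte through as a lone surrogate,
# so encoding with the same handler restores the original bytes exactly.
escaped = path_bytes.decode("utf-8", "surrogateescape")
restored = escaped.encode("utf-8", "surrogateescape")

# Latin1 never fails and also round-trips, but it turns real UTF-8 into mojibake.
as_latin1 = b"/caf\xc3\xa9".decode("latin-1")

print(strict_ok, restored == path_bytes, ascii(escaped), ascii(as_latin1))
```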

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-13 Thread Graham Dumpleton
On 14 July 2010 15:18, Ian Bicking  wrote:
> On Wed, Jul 14, 2010 at 12:04 AM, Graham Dumpleton
>  wrote:
>>
>> On 14 July 2010 14:43, Ian Bicking  wrote:
>> > So... there's been some discussion of WSGI on Python 3 lately.  I'm not
>> > feeling as pessimistic as some people, I feel like we were close but
>> > just didn't *quite* get there.
>>
>> What I took from the discussion wasn't that one couldn't specify a
>> WSGI interface, and as you say we more or less have one now, the issue
>> is more about how practical that is from a usability perspective for
>> those who have to code stuff on top.
>
> My intuition is that won't be that bad.  At least compared to any library
> that is dealing with str/unicode porting issues; which aren't easy, but so
> it goes.
>
>>
>> > * I'm terrible at naming, but let's say these new values are
>> > RAW_SCRIPT_NAME and RAW_PATH_INFO.
>>
>> My prior suggestion on that since upper case keys for now effectively
>> derive from CGI, was to make them wsgi.script_name and wsgi.path_info.
>> Ie., push them into the wsgi namespace.
>
> That's fine with me too.
>
>>
>> > Does this solve everything?  There's broken stuff in the stdlib, but we
>> > shouldn't bother ourselves with that -- if we need working code we
>> > should just write it and ignore the stdlib or submit our stuff as
>> > patches to the stdlib.
>>
>> The quick summary of what I suggest before is at:
>>
>>  http://code.google.com/p/modwsgi/wiki/SupportForPython3X
>>
>> I believe the only difference I see is the raw SCRIPT_NAME and
>> PATH_INFO, which got discussed to death previously with no consensus.
>
> Thanks, I was looking for that.  I remember the primary objection to a
> SCRIPT_NAME/PATH_INFO change was from you.  Do you still feel that way?

I accept that access to the raw information may help for people who
want access to repeating slashes or other encoded information that an
underlying web server may alter, but I can't remember in what way this
helps with the Python 3 issues. That is why I just made the comment in
other email.

Perhaps you can cover how this helps with Python 3.

> I generally agree with your interpretation, except I would want to strictly
> disallow unicode (Python 3 str) from response bodies.  Latin1/ISO-8859-1 is
> an okay encoding for headers and status and raw SCRIPT_NAME/PATH_INFO, but
> for bodies it doesn't have any particular validity.
>
> I forgot to mention the response, which you cover; I guess I'm okay with
> being lenient on types there (allowing both bytes and str in Python 3)...
> though I'm not really that happy with it.  I'd rather just keep it symmetric
> with the request, requiring native strings everywhere.

The reason for allowing it in the response content was so the
canonical WSGI hello world would still work unmodified.
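
For illustration, this is the sort of hello world in question, plus a sketch
of the leniency being discussed. lenient_body() and fake_start_response() are
hypothetical helpers written for this example, and encoding str chunks as
Latin1 is an assumption about what a lenient gateway would do:

```python
def hello_app(environ, start_response):
    """The familiar hello world, written with native strings throughout."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return ["Hello World!\n"]


def lenient_body(app_iter):
    """Hypothetical gateway-side helper: pass bytes through, encode str as Latin1."""
    for chunk in app_iter:
        yield chunk.encode("latin-1") if isinstance(chunk, str) else chunk


def fake_start_response(status, headers, exc_info=None):
    # Stand-in for a real server callback; it just ignores its arguments.
    return None


body = b"".join(lenient_body(hello_app({}, fake_start_response)))
print(body)
```

Under a strict bytes-only rule the application would instead have to return
[b"Hello World!\n"] itself.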

Graham


Re: [Web-SIG] WSGI for Python 3

2010-07-13 Thread Graham Dumpleton
On 14 July 2010 15:04, Graham Dumpleton  wrote:
> On 14 July 2010 14:43, Ian Bicking  wrote:
>> So... there's been some discussion of WSGI on Python 3 lately.  I'm not
>> feeling as pessimistic as some people, I feel like we were close but just
>> didn't *quite* get there.
>
> What I took from the discussion wasn't that one couldn't specify a
> WSGI interface, and as you say we more or less have one now, the issue
> is more about how practical that is from a usability perspective for
> those who have to code stuff on top.
>
> The concern seems to be that although it may be easy to work with the
> specification for those who at the lowest layer immediately wrap it in
> a higher level abstraction that normalises stuff into something that
> is then used consistently in that way, for those who use lower level
> raw WSGI right through the stack, especially in the context of
> stackable WSGI middleware, that repetitive task of having to deal with
> the byte/unicode issues at every point is just a big PITA.
>
> That said, my job in writing the WSGI adapter is really easy as I
> don't have to worry about these issues. This is why I don't seem to
> really appreciate the concerns people are expressing. The above is how
> I read things though.
>
>> Here's my thoughts:
>>
>> * Everyone agrees keys in the environ should be native strings
>> * Bodies should stay bytes
>> * Can we make all "standard" values that are str on Python 2, str on Python
>> 3 with a Latin1 encoding?  This is basically what wsgiref did.  This means
>> HTTP_*, SERVER_NAME, etc.  Everything CGIish, and everything with an
>> all-caps key.  There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
>> and HTTP_COOKIE.
>> * I propose we let libraries handle HTTP_COOKIE however they want; don't
>> bother transcoding *into* the environ, just do so when you parse the cookie
>> (if you so choose).  Happy developers will just urlencode all their cookie
>> values to keep their cookies ASCII-clean.  Unhappy developers who have to
>> handle legacy cookies will just run environ['HTTP_COOKIE'].decode('latin1')
>> and then do whatever sad magic they are forced to do.
>> * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
>> exclusively with encoded versions (that represent the original request
>> URI).  We use Latin1 encoding, but it should be ASCII anyway, like most of
>> the headers.

BTW, it should be highlighted whether this change is relevant to
Python 3 or, like some of the other things you relegated as out of
scope, purely a wish list item.

Graham

>> * I'm terrible at naming, but let's say these new values are RAW_SCRIPT_NAME
>> and RAW_PATH_INFO.
>
> My prior suggestion on that since upper case keys for now effectively
> derive from CGI, was to make them wsgi.script_name and wsgi.path_info.
> Ie., push them into the wsgi namespace.
>
>> Does this solve everything?  There's broken stuff in the stdlib, but we
>> shouldn't bother ourselves with that -- if we need working code we should
>> just write it and ignore the stdlib or submit our stuff as patches to the
>> stdlib.
>
> The quick summary of what I suggest before is at:
>
>  http://code.google.com/p/modwsgi/wiki/SupportForPython3X
>
> I believe the only difference I see is the raw SCRIPT_NAME and
> PATH_INFO, which got discussed to death previously with no consensus.
>
>> Some environments will have a hard time constructing RAW_SCRIPT_NAME and
>> RAW_PATH_INFO, but in my opinion they can just encode SCRIPT_NAME and
>> PATH_INFO and be done with it; it's not as accurate, but it's no less
>> accurate than what we have now.
>>
>> Actual transcoding in the environ is not supported or encouraged in this
>> scheme.  If you want to adjust an encoding you should do it in your
>> application/library code.
>>
>> There's some other topics, like chunked responses, unknown request body
>> lengths, start_response, and maybe some other things, but these aren't
>> Python 3 issues, they are just... generic issues.  app_iter.close() might be
>> worth thinking about given new iterator semantics introduced since WSGI was
>> written.
>
> Graham
>


Re: [Web-SIG] WSGI for Python 3

2010-07-13 Thread Ian Bicking
On Wed, Jul 14, 2010 at 12:04 AM, Graham Dumpleton <
graham.dumple...@gmail.com> wrote:

> On 14 July 2010 14:43, Ian Bicking  wrote:
> > So... there's been some discussion of WSGI on Python 3 lately.  I'm not
> > feeling as pessimistic as some people, I feel like we were close but just
> > didn't *quite* get there.
>
> What I took from the discussion wasn't that one couldn't specify a
> WSGI interface, and as you say we more or less have one now, the issue
> is more about how practical that is from a usability perspective for
> those who have to code stuff on top.
>

My intuition is that won't be that bad.  At least compared to any library
that is dealing with str/unicode porting issues; which aren't easy, but so
it goes.


> > * I'm terrible at naming, but let's say these new values are
> > RAW_SCRIPT_NAME and RAW_PATH_INFO.
>
> My prior suggestion on that since upper case keys for now effectively
> derive from CGI, was to make them wsgi.script_name and wsgi.path_info.
> Ie., push them into the wsgi namespace.
>

That's fine with me too.


> > Does this solve everything?  There's broken stuff in the stdlib, but we
> > shouldn't bother ourselves with that -- if we need working code we should
> > just write it and ignore the stdlib or submit our stuff as patches to the
> > stdlib.
>
> The quick summary of what I suggest before is at:
>
>  http://code.google.com/p/modwsgi/wiki/SupportForPython3X
>
> I believe the only difference I see is the raw SCRIPT_NAME and
> PATH_INFO, which got discussed to death previously with no consensus.
>

Thanks, I was looking for that.  I remember the primary objection to a
SCRIPT_NAME/PATH_INFO change was from you.  Do you still feel that way?

I generally agree with your interpretation, except I would want to strictly
disallow unicode (Python 3 str) from response bodies.  Latin1/ISO-8859-1 is
an okay encoding for headers and status and raw SCRIPT_NAME/PATH_INFO, but
for bodies it doesn't have any particular validity.

I forgot to mention the response, which you cover; I guess I'm okay with
being lenient on types there (allowing both bytes and str in Python 3)...
though I'm not really that happy with it.  I'd rather just keep it symmetric
with the request, requiring native strings everywhere.

-- 
Ian Bicking  |  http://blog.ianbicking.org


Re: [Web-SIG] WSGI for Python 3

2010-07-13 Thread Graham Dumpleton
On 14 July 2010 14:43, Ian Bicking  wrote:
> So... there's been some discussion of WSGI on Python 3 lately.  I'm not
> feeling as pessimistic as some people, I feel like we were close but just
> didn't *quite* get there.

What I took from the discussion wasn't that one couldn't specify a
WSGI interface, and as you say we more or less have one now, the issue
is more about how practical that is from a usability perspective for
those who have to code stuff on top.

The concern seems to be that although it may be easy to work with the
specification for those who at the lowest layer immediately wrap it in
a higher level abstraction that normalises stuff into something that
is then used consistently in that way, for those who use lower level
raw WSGI right through the stack, especially in the context of
stackable WSGI middleware, that repetitive task of having to deal with
the byte/unicode issues at every point is just a big PITA.

That said, my job in writing the WSGI adapter is really easy as I
don't have to worry about these issues. This is why I don't seem to
really appreciate the concerns people are expressing. The above is how
I read things though.

> Here's my thoughts:
>
> * Everyone agrees keys in the environ should be native strings
> * Bodies should stay bytes
> * Can we make all "standard" values that are str on Python 2, str on Python
> 3 with a Latin1 encoding?  This is basically what wsgiref did.  This means
> HTTP_*, SERVER_NAME, etc.  Everything CGIish, and everything with an
> all-caps key.  There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
> and HTTP_COOKIE.
> * I propose we let libraries handle HTTP_COOKIE however they want; don't
> bother transcoding *into* the environ, just do so when you parse the cookie
> (if you so choose).  Happy developers will just urlencode all their cookie
> values to keep their cookies ASCII-clean.  Unhappy developers who have to
> handle legacy cookies will just run environ['HTTP_COOKIE'].decode('latin1')
> and then do whatever sad magic they are forced to do.
> * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
> exclusively with encoded versions (that represent the original request
> URI).  We use Latin1 encoding, but it should be ASCII anyway, like most of
> the headers.
> * I'm terrible at naming, but let's say these new values are RAW_SCRIPT_NAME
> and RAW_PATH_INFO.

My prior suggestion on that since upper case keys for now effectively
derive from CGI, was to make them wsgi.script_name and wsgi.path_info.
Ie., push them into the wsgi namespace.

> Does this solve everything?  There's broken stuff in the stdlib, but we
> shouldn't bother ourselves with that -- if we need working code we should
> just write it and ignore the stdlib or submit our stuff as patches to the
> stdlib.

The quick summary of what I suggested before is at:

  http://code.google.com/p/modwsgi/wiki/SupportForPython3X

I believe the only difference I see is the raw SCRIPT_NAME and
PATH_INFO, which got discussed to death previously with no consensus.

> Some environments will have a hard time constructing RAW_SCRIPT_NAME and
> RAW_PATH_INFO, but in my opinion they can just encode SCRIPT_NAME and
> PATH_INFO and be done with it; it's not as accurate, but it's no less
> accurate than what we have now.
>
> Actual transcoding in the environ is not supported or encouraged in this
> scheme.  If you want to adjust an encoding you should do it in your
> application/library code.
>
> There's some other topics, like chunked responses, unknown request body
> lengths, start_response, and maybe some other things, but these aren't
> Python 3 issues, they are just... generic issues.  app_iter.close() might be
> worth thinking about given new iterator semantics introduced since WSGI was
> written.

Graham


[Web-SIG] WSGI for Python 3

2010-07-13 Thread Ian Bicking
So... there's been some discussion of WSGI on Python 3 lately.  I'm not
feeling as pessimistic as some people, I feel like we were close but just
didn't *quite* get there.

Here are my thoughts:

* Everyone agrees keys in the environ should be native strings
* Bodies should stay bytes
* Can we make all "standard" values that are str on Python 2, str on Python
3 with a Latin1 encoding?  This is basically what wsgiref did.  This means
HTTP_*, SERVER_NAME, etc.  Everything CGIish, and everything with an
all-caps key.  There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
and HTTP_COOKIE.
* I propose we let libraries handle HTTP_COOKIE however they want; don't
bother transcoding *into* the environ, just do so when you parse the cookie
(if you so choose).  Happy developers will just urlencode all their cookie
values to keep their cookies ASCII-clean.  Unhappy developers who have to
handle legacy cookies will just run environ['HTTP_COOKIE'].decode('latin1')
and then do whatever sad magic they are forced to do.
* I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
exclusively with encoded versions (that represent the original request
URI).  We use Latin1 encoding, but it should be ASCII anyway, like most of
the headers.
* I'm terrible at naming, but let's say these new values are RAW_SCRIPT_NAME
and RAW_PATH_INFO.
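
A minimal sketch of the Latin1 idea in the list above, with made-up helper
names (environ_value and recode are not from any library). The server decodes
header bytes as Latin1, which never fails and loses nothing, and the
application re-decodes later if it knows better. Note that on Python 3 a
native string has no .decode() method, so the cookie step would be an encode
back to bytes followed by a decode:

```python
def environ_value(wire_bytes):
    """What a server would do under this scheme: decode header bytes as Latin1."""
    return wire_bytes.decode("latin-1")


def recode(value, encoding="utf-8"):
    """What an application can do later: recover the bytes and decode them properly."""
    return value.encode("latin-1").decode(encoding)


# A cookie header whose value was sent as raw UTF-8 bytes (a legacy, non-ASCII cookie).
wire = "name=Zo\u00eb".encode("utf-8")
environ = {"HTTP_COOKIE": environ_value(wire)}

print(ascii(environ["HTTP_COOKIE"]))          # mojibake, but no bytes were lost
print(ascii(recode(environ["HTTP_COOKIE"])))  # the intended text is recoverable
```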

Does this solve everything?  There's broken stuff in the stdlib, but we
shouldn't bother ourselves with that -- if we need working code we should
just write it and ignore the stdlib or submit our stuff as patches to the
stdlib.

Some environments will have a hard time constructing RAW_SCRIPT_NAME and
RAW_PATH_INFO, but in my opinion they can just encode SCRIPT_NAME and
PATH_INFO and be done with it; it's not as accurate, but it's no less
accurate than what we have now.

Actual transcoding in the environ is not supported or encouraged in this
scheme.  If you want to adjust an encoding you should do it in your
application/library code.

There are some other topics, like chunked responses, unknown request body
lengths, start_response, and maybe some other things, but these aren't
Python 3 issues, they are just... generic issues.  app_iter.close() might be
worth thinking about given new iterator semantics introduced since WSGI was
written.
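
For reference on the app_iter.close() point, here is a rough sketch of how a
server loop honors it today. run_app() and Body are hypothetical names for
this example, and real servers do much more than this:

```python
class Body:
    """An iterable response body that records whether close() was called."""

    def __init__(self):
        self.closed = False

    def __iter__(self):
        return iter([b"hello ", b"world"])

    def close(self):
        self.closed = True


def run_app(app, environ):
    """Tiny stand-in for a server loop: collect the body, then call close()."""

    def start_response(status, headers, exc_info=None):
        return None

    app_iter = app(environ, start_response)
    chunks = []
    try:
        for chunk in app_iter:
            chunks.append(chunk)
    finally:
        # close() is optional on the iterable, so look it up defensively.
        close = getattr(app_iter, "close", None)
        if close is not None:
            close()
    return b"".join(chunks)


body = Body()
output = run_app(lambda environ, start_response: body, {})
print(output, body.closed)
```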

-- 
Ian Bicking  |  http://blog.ianbicking.org