On Fri, Jul 16, 2010 at 4:33 AM, And Clover <and...@doxdesk.com> wrote:

> On 07/14/2010 06:43 AM, Ian Bicking wrote:
>
>> There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
>> and HTTP_COOKIE.
>>
>
> (And of those, PATH_INFO is the only one that really matters, in that
> no-one really uses non-ASCII script filenames, and non-ASCII characters in
> Cookie/Set-Cookie are still handled so differently/brokenly across browsers
> that you can't rely on them at all.)
>
>
>> * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
>> exclusively with encoded versions
>>
>
> For compatibility with existing apps, how about keeping the existing
> SCRIPT_NAME and PATH_INFO as-is (with all their problems), and specifying
> that the new 'raw' versions (whatever they are called) are added only if
> they really are raw, not reconstructed.
>

Having two ways of expressing the same information will lead to bugs about
which data is canonical.  If an application uses SCRIPT_NAME/PATH_INFO and
then updates those values in any way while wsgi.raw_script_name and
wsgi.raw_path_info are also present, there will be weird bugs, and different
pieces of code will disagree about which one is correct.  Since %2f (an
encoded slash) can appear in the raw versions, there isn't even a way to
split the two variables into path segments in the same way.
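
A small illustration of the %2f point, as a sketch only: the request path
below is made up, and it assumes a server that percent-decodes the path
before filling PATH_INFO.  Splitting the raw path and the decoded path on "/"
yields a different number of segments, so any segment-by-segment update (for
example moving one segment from PATH_INFO into SCRIPT_NAME) cannot be
mirrored in both values.

```python
from urllib.parse import unquote

# Hypothetical request path: %2F is an encoded slash that belongs inside
# the single segment "a/b".
raw_path_info = "/files/a%2Fb/c"

# A server that decodes before populating PATH_INFO loses that distinction.
path_info = unquote(raw_path_info)

# Split the raw form first, then decode each segment.
raw_segments = [unquote(s) for s in raw_path_info.split("/")]
# The decoded form can only be split after the encoded slash has vanished.
decoded_segments = path_info.split("/")

print(raw_segments)      # ['', 'files', 'a/b', 'c']
print(decoded_segments)  # ['', 'files', 'a', 'b', 'c']
```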

> Then existing scripts that don't care about non-ASCII and slashes can carry
> on as before, and for apps that do care about them, they'll be able to be
> *sure* the input is correct. Or they can fall back to PATH_INFO when not
> present, and avoid producing these kinds of URLs in response.
>
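
The fallback suggested in the quoted text might look roughly like the sketch
below.  The key name 'wsgi.raw_path_info' is only one of the names floated in
this thread, not a settled spec, and the helper request_path() is
hypothetical.  It also assumes PATH_INFO is text that should be re-encoded as
UTF-8; in the fallback branch an original %2F can no longer be recovered.

```python
from urllib.parse import quote


def request_path(environ):
    """Return a percent-encoded request path (hypothetical helper).

    Assumes 'wsgi.raw_path_info' (a proposed, not standard, key) is present
    only when the server really has the raw, undecoded path.
    """
    raw = environ.get("wsgi.raw_path_info")
    if raw is not None:
        return raw
    # Fallback: re-encode the decoded PATH_INFO as UTF-8. Any %2F in the
    # original request is indistinguishable from "/" at this point.
    return quote(environ.get("PATH_INFO", ""), safe="/")
```

For example, an environ carrying both keys returns the raw value unchanged,
while one with only PATH_INFO set to "/caf\u00e9" yields "/caf%C3%A9".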

I don't think it works to imagine you can just not care about non-ASCII.
Requests come in.  WSGI should represent those requests.  If a request comes
in with non-ASCII bytes then WSGI needs to do *something* with it.  I don't
want to have to configure servers with application policy; servers should
just work.

And this doesn't help with Python 3: either we have byte values of
SCRIPT_NAME and PATH_INFO in Python 3, or we have text values.  I think
bytes will be more awkward to port to than text, and inconsistent with other
WSGI values.  If we have text then we have to choose an encoding.  Latin1
will work, but it will be exactly the wrong encoding most of the time, as
UTF-8 is the typical encoding (unlike other headers, where Latin1 will mostly
be an okay encoding, or as good a guess as we have).  If we firmly remove these keys
then we can avoid this choice entirely... and we conveniently also get a
better representation of the request.
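
To make the Latin1 trade-off concrete, here is a minimal sketch (the path is
an arbitrary example): Latin1 "works" in the sense that decoding never fails
and is reversible, but for a path a browser sent as UTF-8 it produces the
wrong text until the application re-encodes and decodes it again.

```python
# A path whose bytes are UTF-8, as a browser typically sends them
# ("/caf%C3%A9" after percent-decoding).
raw_bytes = "/caf\u00e9".encode("utf-8")

# Latin-1 maps every byte to a code point, so this never raises...
as_latin1 = raw_bytes.decode("latin-1")
print(as_latin1)  # '/cafÃ©': lossless, but not the text the user typed

# ...and the original bytes can be recovered and decoded as UTF-8.
as_utf8 = as_latin1.encode("latin-1").decode("utf-8")
print(as_utf8)    # '/café'
```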

Note that libraries can smooth over this change; WebOb for instance will
certainly still support req.script_name/req.path_info by decoding the raw
values.  Admittedly lots of code uses these values directly... but at least
if they get a KeyError the port/fix will be obvious (as opposed to out of
sync values, which will only emerge as a problem occasionally -- I'd rather
not invite more occasional bugs).
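
A library-level shim of the kind described above could be sketched as
follows.  This is not WebOb's actual code: the Request class is hypothetical,
the 'wsgi.raw_script_name'/'wsgi.raw_path_info' key names are only the ones
proposed in this thread, and UTF-8 is assumed as the decoding.  Note that a
server which does not provide the raw keys produces an immediate KeyError
rather than silently out-of-sync values.

```python
from urllib.parse import unquote


class Request:
    """Hypothetical request wrapper (not WebOb's implementation).

    Assumes the server supplies only percent-encoded raw keys, named as
    proposed in this thread, and that paths should be decoded as UTF-8.
    """

    def __init__(self, environ):
        self.environ = environ

    @property
    def script_name(self):
        # A missing key raises KeyError, so an unported server is obvious.
        return unquote(self.environ["wsgi.raw_script_name"], encoding="utf-8")

    @property
    def path_info(self):
        return unquote(self.environ["wsgi.raw_path_info"], encoding="utf-8")
```

For instance, Request({"wsgi.raw_script_name": "/app",
"wsgi.raw_path_info": "/caf%C3%A9"}).path_info gives "/caf\u00e9".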

-- 
Ian Bicking  |  http://blog.ianbicking.org
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig