P.J. Eby <p...@telecommunity.com> wrote:

> At 02:28 PM 8/4/2009 +1000, Graham Dumpleton wrote:
> >2009/8/4 P.J. Eby <p...@telecommunity.com>:
> > > I'm not clear on your logic here.  If I request foo/bar/baz (where baz
> > > actually has an accent over the 'a') in latin-1 encoding, and foo/bar
> > > is the script, then the (accented) baz is legitimate for pass-through
> > > to the application, no?
> >
> >Technically, but what I am pointing out is that Apache pretty well
> >says that foo/bar needs to be UTF-8.
> 
> Which doesn't change the fact that you haven't yet proposed what a 
> WSGI server should *do* with such non-UTF8 bytes in PATH_INFO and 
> QUERY_STRING.  Apache can and does pass through such bytes, so the 
> spec needs to say what we do with them.

Particularly QUERY_STRING.  The original thinking around
application/x-www-form-urlencoded was that it was always Latin-1.  You
were supposed to use "multipart/form-data" for non-Latin-1 encodings.
There was a long thread on www-talk circa 1994 about this.

I think bytes are the safest way to go here.  It would be nice if we
could automagically detect the correct encoding, but there's no
foolproof way of doing that.
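
To illustrate why detection can't be foolproof, here is a minimal
sketch, assuming Python 3 and the standard urllib.parse module.  The
helper name candidate_decodings and the two sample paths are made up
for illustration; they are not part of any WSGI proposal.  It shows
that the Latin-1 request only decodes one way, but the UTF-8 request
decodes cleanly under both encodings, so the bytes alone don't tell
you which one the client meant.

```python
from urllib.parse import unquote_to_bytes


def candidate_decodings(raw, encodings=("utf-8", "latin-1")):
    """Return {encoding: text} for each encoding that decodes raw without error.

    Hypothetical helper for illustration only.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # This encoding cannot represent the byte sequence; skip it.
            pass
    return results


# The accented "baz" from the example above, percent-encoded two ways.
latin1_path = unquote_to_bytes("/foo/bar/b%E1z")     # client used Latin-1
utf8_path = unquote_to_bytes("/foo/bar/b%C3%A1z")    # client used UTF-8

# Only Latin-1 accepts the first form: 0xE1 followed by "z" is invalid UTF-8.
print(candidate_decodings(latin1_path))

# Both encodings accept the second form, yielding different text, because
# Latin-1 maps every byte value to some character and so never fails.
print(candidate_decodings(utf8_path))
```

Handing the application the raw bytes (latin1_path, utf8_path above)
leaves that choice to code that may actually know the site's encoding.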

Bill
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig