P.J. Eby <p...@telecommunity.com> wrote:
> At 02:28 PM 8/4/2009 +1000, Graham Dumpleton wrote:
> >2009/8/4 P.J. Eby <p...@telecommunity.com>:
> > > I'm not clear on your logic here. If I request foo/bar/baz (where baz
> > > actually has an accent over the 'a') in latin-1 encoding, and foo/bar
> > > is the script, then the (accented) baz is legitimate for pass-through
> > > to the application, no?
> >
> >Technically, but what I am pointing out is that Apache pretty well
> >says that foo/bar needs to be UTF-8.
>
> Which doesn't change the fact that you haven't yet proposed what a
> WSGI server should *do* with such non-UTF8 bytes in PATH_INFO and
> QUERY_STRING. Apache can and does pass through such bytes, so the
> spec needs to say what we do with them.
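
To make the quoted foo/bar/baz case concrete, here is a rough sketch in Python 3 (the choice of Python 3, the example path, and the helper name `try_decode` are all assumptions for illustration, not anything from the spec). It shows that a Latin-1 accented "baz" is not valid UTF-8, and that UTF-8 bytes read as Latin-1 decode without any error, just to the wrong characters.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical request path: "baz" with an accented 'a',
# percent-encoded by a client that used Latin-1 (0xE1).
raw_path = "/foo/bar/b%E1z"
path_bytes = unquote_to_bytes(raw_path)


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Latin-1 maps every byte to a character, so this always succeeds.
latin1_view = try_decode(path_bytes, "latin-1")

# The same bytes are not a valid UTF-8 sequence, so this yields None.
utf8_view = try_decode(path_bytes, "utf-8")

# The reverse case: a UTF-8 client sends 0xC3 0xA1 for the accented 'a'.
# Decoding as Latin-1 raises no error, it just produces two wrong characters.
utf8_bytes = "/foo/bar/b\u00e1z".encode("utf-8")
misread = try_decode(utf8_bytes, "latin-1")
```

Since the Latin-1 decode never fails, a successful decode says nothing about whether Latin-1 was the encoding the client actually used.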

Particularly QUERY_STRING. The original thinking around urlencoded was that it was always Latin-1. You were supposed to use "multipart/form-data" for non-Latin-1 encodings. Long thread on www-talk circa 1994 about this.

I think bytes are the safest way to go here. It would be nice if we could automagically detect the correct encoding, but there's no foolproof way of doing that.

Bill
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com