2009/9/19 Armin Ronacher <armin.ronac...@active-4.com>:
> Graham's suggestion for URL encodings means that the URL encoding would
> have to be passed to the WSGI server from outside (he proposed the
> apache config as an example). This means that the application behavior
> will change based on the server configuration, causing even more confusion.
No, it doesn't, and you could still have things work without needing to override the default encodings applied. The default rule inside the WSGI adapter would be:

    # raw_* are the undecoded bytes taken from the request.
    try:
        script_name = raw_script_name.decode('utf-8')
        path_info = raw_path_info.decode('utf-8')
        query_string = raw_query_string.decode('utf-8')
        uri_encoding = 'utf-8'
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to one character, so this cannot fail.
        script_name = raw_script_name.decode('iso-8859-1')
        path_info = raw_path_info.decode('iso-8859-1')
        query_string = raw_query_string.decode('iso-8859-1')
        uri_encoding = 'iso-8859-1'

    environ['SCRIPT_NAME'] = script_name
    environ['PATH_INFO'] = path_info
    environ['QUERY_STRING'] = query_string
    environ['wsgi.uri_encoding'] = uri_encoding

At the WSGI application level, if it provides for use of an alternate URI encoding, I see that all it would need to do (ignoring encoding name equivalence issues for now) is:

    uri_encoding = environ['wsgi.uri_encoding']
    if application_uri_encoding != uri_encoding:
        # Re-encode with the adapter's codec to recover the raw bytes,
        # then decode them with the application's preferred codec.
        raw_script_name = environ['SCRIPT_NAME'].encode(uri_encoding)
        raw_path_info = environ['PATH_INFO'].encode(uri_encoding)
        raw_query_string = environ['QUERY_STRING'].encode(uri_encoding)
        script_name = raw_script_name.decode(application_uri_encoding)
        path_info = raw_path_info.decode(application_uri_encoding)
        query_string = raw_query_string.decode(application_uri_encoding)
    else:
        script_name = environ['SCRIPT_NAME']
        path_info = environ['PATH_INFO']
        query_string = environ['QUERY_STRING']

So there is no strict need to make the WSGI adapter do it differently. You might only want to do that if concerned about the overhead of transcoding, and transcoding just these three values is most probably going to be less overhead than the WSGI adapter having to set up both unicode and raw values in a dictionary for everything.

Even with your iso-8859-4 example, I can't see how you can lose knowledge of what the original characters were, as having wsgi.uri_encoding available always allows you to transcode to what you needed when what was supplied didn't match.
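To make the round trip concrete, here is a minimal sketch of both halves of that scheme as plain functions, assuming Python 3 where environ values are str. The helper names decode_uri_parts() and transcode_uri_parts() are hypothetical, chosen only for illustration; they are not part of any WSGI specification or server.

```python
# Hypothetical helpers illustrating the adapter/application split above.
URI_KEYS = ('SCRIPT_NAME', 'PATH_INFO', 'QUERY_STRING')


def decode_uri_parts(environ, raw_parts):
    """Adapter side: try UTF-8 for all parts, else fall back to ISO-8859-1."""
    try:
        decoded = {key: raw_parts[key].decode('utf-8') for key in URI_KEYS}
        encoding = 'utf-8'
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to exactly one code point, so this
        # fallback never fails and never loses information.
        decoded = {key: raw_parts[key].decode('iso-8859-1') for key in URI_KEYS}
        encoding = 'iso-8859-1'
    environ.update(decoded)
    environ['wsgi.uri_encoding'] = encoding
    return environ


def transcode_uri_parts(environ, application_uri_encoding):
    """Application side: re-decode only if the app wants another encoding."""
    server_encoding = environ['wsgi.uri_encoding']
    if application_uri_encoding == server_encoding:
        return {key: environ[key] for key in URI_KEYS}
    # Encoding with the adapter's codec recovers the original raw bytes,
    # which are then decoded with the application's codec.
    return {
        key: environ[key].encode(server_encoding).decode(application_uri_encoding)
        for key in URI_KEYS
    }


raw = {'SCRIPT_NAME': b'', 'PATH_INFO': b'/\xe8', 'QUERY_STRING': b'x=1'}
environ = decode_uri_parts({}, raw)
# A lone b'\xe8' is not valid UTF-8, so the adapter fell back to ISO-8859-1.
parts = transcode_uri_parts(environ, 'iso-8859-4')
# parts['PATH_INFO'] now holds the original bytes decoded as ISO-8859-4.
```

The design point being illustrated is that the adapter only ever picks a codec that round-trips exactly, so recording that choice in wsgi.uri_encoding is enough for the application to get back to the original bytes.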
As to the separate argument about repeated slashes and percent-encoded slashes losing their distinction, the definition using wsgi.uri_encoding also provided REQUEST_URI as bytes anyway, so you can get exactly what you wanted directly from that, just as with the bytes-everywhere solution.

Now you can go back to your monologue, as I'm definitely going to sleep now. ;-)

Graham
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig