Hello again, ok, getting back on topic... away from py3k porting methods...
Using an API where the user can request the type wanted solves a lot of encoding issues. This is similar to Graham's suggestion, but instead allows a user to request which encoding they want, and also to get access to the raw data if needed.

What is proposed:

1. utf-8 is used by default.
2. A buffer is used for raw data.
3. New keys which are callables, used to request the encoding you want.
4. Encoding keys are specified:
   4.a URI encoding key 'wsgi.uri_encoding'
   4.b Form data encoding key 'wsgi.form_encoding'
   4.c Page encoding key 'wsgi.page_encoding'
   4.d Header encoding key 'wsgi.header_encoding'
5. For the next version of wsgi (1.1 or 2.0), an adapter is used for backwards compat, so wsgi 1.0 apps can run on a wsgi 2 server.

Why this is good (numbered to match the points above):

1. utf-8 is the most common encoding for frameworks and web browsers.
2.a Raw values can be accessed in the rare cases they are needed.
2.b More performant wsgi servers (zero-copy and zero-allocation become possible with buffers).
2.c Avoids the bytes type and syntax, for compatibility with python <= 2.5.4 (buffer and unicode are used instead).
3. Transcoding only happens if needed.
4. The URI encoding can be explicitly stated in a URI key.
5. Backwards compat for wsgi 1.0 apps on a wsgi 2 server. Also wsgi 2.0 apps on a wsgi 1.0 server, with an adapter.

How applications use this proposal:

    # here we get the default encoding and type - unicode utf-8, and it's urldecoded.
    script_name_default_type = environ['SCRIPT_NAME']()

    # we can pass in the encoding we want.
    script_name = environ['SCRIPT_NAME'](application_uri_encoding)
    script_name_utf8 = environ['SCRIPT_NAME']('utf-8')
    script_name_iso_8859_1 = environ['SCRIPT_NAME']('iso-8859-1')

    # we can get it as a buffer with raw bytes.
    script_name_buffer = environ['SCRIPT_NAME'](as_buffer = True, no_urldecoding = True)

    # we can get it as whatever the raw native type is.
    script_name_native = environ['SCRIPT_NAME'](native_type = True, no_urldecoding = True)

For servers:

Servers store only the native raw version in the environ (as a buffer, or whatever their native type and encoding is), plus callables to do any transcoding as needed. If the application does not ask for a value, the server spends no resources transcoding it or storing different transcoded versions.

Adapters:

To make backwards compatibility easier, wsgiref should have adapters for old servers and clients.

For wsgi 1.0 apps on wsgi 2.0 servers: an adapter would be written to return an environ with keys suitable for wsgi 1.0.

For wsgi 2.0 apps on wsgi 1.0 servers: an adapter should be available to let wsgi 2.0 apps run on wsgi 1.0 servers.

Issues with proposal? Things this proposal did not consider?

- Maybe we could be explicit about what the http server, http client, wsgi client, and application think the encodings are. This might allow 'fail fast' and sanity checking, so things aren't messed up silently. If the webserver, web client and application developer all specify what they are expecting, then checks could be done. Otherwise, if one of them can't specify for some reason, then it's the situation we are in now. Haven't thought this through much.
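
To make the server side and the adapter idea above a bit more concrete, here is a rough sketch. It is not part of any spec. The names LazyEnvironValue and to_wsgi1_environ are made up for illustration, it is written for current Python 3 (memoryview stands in for the old buffer type, str for unicode), and it assumes the 'wsgi.*_encoding' keys are plain strings. The keyword arguments are the ones used in the application example.

```python
from urllib.parse import unquote_to_bytes


class LazyEnvironValue:
    """Hypothetical callable environ value: stores only the raw bytes."""

    def __init__(self, raw, default_encoding="utf-8"):
        self._raw = bytes(raw)  # the server's native raw form
        self._default_encoding = default_encoding

    def __call__(self, encoding=None, as_buffer=False, native_type=False,
                 no_urldecoding=False):
        # Nothing is decoded or transcoded until the application asks.
        data = self._raw if no_urldecoding else unquote_to_bytes(self._raw)
        if as_buffer:
            # memoryview is the assumed modern stand-in for buffer();
            # it does not copy the underlying bytes.
            return memoryview(data)
        if native_type:
            return data
        return data.decode(encoding or self._default_encoding)


def to_wsgi1_environ(environ2):
    """Hypothetical adapter: resolve callables so an old-style app sees plain values.

    Assumption: the default (utf-8, urldecoded) form is what the old app wants.
    """
    return {key: value() if isinstance(value, LazyEnvironValue) else value
            for key, value in environ2.items()}


environ = {
    "SCRIPT_NAME": LazyEnvironValue(b"/caf%C3%A9"),
    "wsgi.uri_encoding": "utf-8",
}
script_name_default_type = environ["SCRIPT_NAME"]()
script_name_buffer = environ["SCRIPT_NAME"](as_buffer=True, no_urldecoding=True)
wsgi1_environ = to_wsgi1_environ(environ)
```

The point of the design is that the transcoding cost is only paid when the callable is actually called. A real server would also have to decide whether to cache decoded results, which this sketch does not do.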