On Sat, Sep 19, 2009 at 6:00 PM, Armin Ronacher <armin.ronac...@active-4.com> wrote:
> Hi,
>
> René Dudfield wrote:
>> What is proposed:
> Where was that proposed?
>
>> 1. Default utf-8 to be used.
> That's a possibility yes, but it has to be carefully considered.
>
>> 2. A buffer to be used for raw data.
> What is raw data?  If you mean we keep the unencoded data around, I
> would strongly argue against that.  Otherwise it makes middlewares even
> harder to write.
>
Raw data in this case is whatever data comes from the server.  The idea is
to convert it on demand.

>> 3. New keys which are callables to request the encoding you want.
> Did I miss something?  Why are we requesting encodings now?
>

You can request encodings.  The idea is to be explicit about which encoding
you want.  This also allows no conversion to take place if it isn't needed.

Converting strings is a waste of time if it's not needed.

    >>> b = "a" * 4096
    >>> %timeit b.decode('utf-8')
    100000 loops, best of 3: 15 µs per loop

Even length 1 strings take a while too.

    >>> b = "a"
    >>> %timeit b.decode('utf-8')
    1000000 loops, best of 3: 1.84 µs per loop

Note that you need a method call for the decode anyway.  In comparison, a
method call takes a tiny amount of time.

    >>> a = {'SCRIPT_INFO':"asdfasdf", 'SCRIPT_INFO2': lambda : 'asdfasdf2'}
    >>> %timeit a['SCRIPT_INFO2']()
    1000000 loops, best of 3: 267 ns per loop
    >>> %timeit a['SCRIPT_INFO']
    10000000 loops, best of 3: 122 ns per loop

This is why avoiding encode/decode work is better.

If environ were allowed to be a real object rather than a dict, it would be
possible to avoid the dict key lookup and the method call.

>> 4. Encoding keys are specified.
>>    4.a URI encoding key 'wsgi.uri_encoding'
>>    4.b Form data encoding key 'wsgi.form_encoding'
>>    4.c Page encoding key 'wsgi.page_encoding'
>>    4.d Header encoding key 'wsgi.header_encoding'
> I don't know where you are getting that from.  The only WSGI key would
> be `wsgi.uri_encoding` and that is only set by the server and only used
> for legacy non UTF-8 URLs.
>

I got that from your list of things with different encodings.  Why not use
it for the other parts as well?  Some header keys use different encodings,
as do form data and pages.

>> 5. For next version of wsgi (1.1 or 2.0), using an adapter for
>>    backwards compat for wsgi 1.0 apps on wsgi2 server.
> No decision about WSGI versioning was made so far.
> If WSGI in Python 3 is based on unicode, then the version is raised to 1.1,
> 2.0 is not yet discussed as far as I'm concerned.
>

Sure, it's a separate issue.  However, I'm addressing it here.  WSGI 2.0 has
been discussed in various emails recently, and in Graham's blog post.  There
is also a WSGI 2.0 wiki page on wsgi.org.

>> 2.c Avoiding bytes type and syntax for compatibility with <=
>>     python 2.5.4 (buffer, and unicode)
> If WSGI for Python 3 is based on Unicode it will use '' for textual
> context and b'' for bytes.  If it's based on bytes it will obviously use
> the byte literals.

Again, using bytes doesn't seem as nice as using buffers along with unicode,
since buffers can be faster (they are not immutable, so you can avoid memory
allocation and make use of zero-copy networking) and are available in more
versions of Python.

>
>> 3. Transcoding to only happen if needed.
> I can't see how that would work if it's based on unicode, if it's based
> on bytes that's already what happens in WSGI 1.
>

Since you can request different encodings, if an encoding is already
available it can be given, and if it's not available the conversion can be
made.  If you don't need the conversion at all, it can be avoided completely.

>> 4. URI encoding can be explicitly stated in a URI key
> This value is only *set* by the server on decode, the value is to be
> ignored by the actual application or middleware except for QUERY_STRING
> and REQUEST_URI decoding.  Everything else makes things a lot more
> complicated without improving anything.
>

Yeah, the server states what is happening.  As the application requests what
it wants, it doesn't need to query those keys.

>> 5. Backwards compat for wsgi 1.0 apps on wsgi 2 server.  Also wsgi
>>    2.0 apps on wsgi 1.0 server with an adapter.
> Again, WSGI 2.0 is something that has to be discussed separately,
> otherwise we totally lose track.
>
>> Issues with proposal?  Things this proposal did not consider?
> Yes you did:
>
>  - it has no real world advantage over either WSGI based on unicode
>    that is utf-8 with latin1 fallback or a WSGI based on bytes.

I listed all the advantages in the 'This allows or this is good because:'
section.  Can you explain why they are not real?

>  - it's backwards incompatible in every way, even to CGI.

Why is it?  WSGI apps can use an adapter to use it.  WSGI 1.0 servers can
also use an adapter.

>  - it is slow because every dict access would also cause a function
>    call.

As explained above, the transcoding cost can be avoided or reduced, function
calls need to be made anyway (the decode() calls), and there's also the
possibility of using buffers to avoid memory allocation and allow zero-copy
networking.

>    Furthermore middlewares would most likely start causing
>    circular dependencies when they replace the callable with a new
>    callable and they do not alias the value as a local in the frame
>    that created it.
>

Yes, I think the callables will need a set method, rather than letting the
middleware replace callables.

I think this could be used for middleware:

    environ['SCRIPT_NAME'](set = "/bla/", urldecoding = False, encoding ='utf-8')

but then this (one callable) would probably be better ;)

    environ(what='SCRIPT_NAME', set = "/bla/", urldecoding = False, encoding ='utf-8')

A change made by middleware could potentially trigger the rest of the
decoding.  In some situations you would want to avoid reading from the socket
at all, so middleware changing things would mean you need to read from the
socket (obviously you need to read data before changing it).

Why would you not want to read from the socket at all?  (WSGI 1.0 makes
these impossible.)
  - to block certain hosts by looking at their IP.
  - you might just care about a connection, e.g. any connection triggers an
    action.
  - for load balancing.
  - to look at the port number, e.g. to check if port 443 is used.
  - if you are overloaded (DoS), you want to drop the connection right away.
  - ... others.
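
A minimal sketch of the callable-with-a-set-argument idea above, written for Python 3.  The names `LazyValue` and `decode_calls` are hypothetical and exist only for illustration; nothing here is part of WSGI or of any server.  It assumes the server hands over raw bytes and that the value is only decoded when an encoding is actually requested.

```python
class LazyValue:
    """Hypothetical environ entry: keeps raw bytes, decodes only on request."""

    def __init__(self, raw):
        self._raw = raw          # bytes exactly as received from the server
        self._cache = {}         # encoding name -> already decoded text
        self.decode_calls = 0    # counts real decode work, for illustration

    def __call__(self, encoding=None, set=None):
        if set is not None:
            # Middleware changes the value through the callable itself,
            # so the callable stored in environ is never replaced.
            self._raw = set.encode(encoding or "utf-8")
            self._cache.clear()
            return None
        if encoding is None:
            return self._raw     # no conversion requested, none performed
        if encoding not in self._cache:
            self.decode_calls += 1
            self._cache[encoding] = self._raw.decode(encoding)
        return self._cache[encoding]


environ = {"SCRIPT_NAME": LazyValue(b"/caf\xc3\xa9")}

raw = environ["SCRIPT_NAME"]()                     # raw bytes, no decoding
text = environ["SCRIPT_NAME"](encoding="utf-8")    # decoded on first request
again = environ["SCRIPT_NAME"](encoding="utf-8")   # served from the cache
decodes_so_far = environ["SCRIPT_NAME"].decode_calls

environ["SCRIPT_NAME"](set="/bla/", encoding="utf-8")
after_set = environ["SCRIPT_NAME"]()
```

An application that only ever asks for the raw value never pays for a decode, and middleware that needs to change the value does so without swapping the callable out of environ.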
So allowing the server to avoid most processing before the application
requests certain data could be a good thing.

With middleware changing the environ, all those callables need to be linked
so that the rest of them know something has been changed.  When one thing is
changed, it drops back to WSGI 1.0 behaviour - that is, some of the encoding
is done just before any change is allowed.

Or maybe middleware has to call an environ['changing']() callable, which
could then trigger the callables' internal transcoding, socket reading, etc.

I'm still not sure whether it will make middleware harder to use or not.  I'm
working through the function Philip sent to see how it turns out, and will
send an updated proposal after that.
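
A rough sketch of the 'changing' hook described above, written for Python 3.  `LazyEnviron` and its methods are hypothetical names chosen for illustration, and the hook is a plain method here rather than an environ['changing'] key.  It assumes a single encoding for every value, which is a simplification.

```python
class LazyEnviron:
    """Hypothetical environ: values stay raw until requested or changed."""

    def __init__(self, raw_items, encoding="utf-8"):
        self._raw = dict(raw_items)   # name -> bytes, untouched so far
        self._encoding = encoding
        self._decoded = {}            # name -> text, filled in on demand
        self.eager = False            # True once changing() has run

    def get(self, name):
        # Decode a single value the first time it is asked for.
        if name not in self._decoded:
            self._decoded[name] = self._raw[name].decode(self._encoding)
        return self._decoded[name]

    def changing(self):
        # Middleware announces a change: decode everything that is still
        # pending, which is the "drop back to WSGI 1.0 behaviour" step.
        for name in self._raw:
            self.get(name)
        self.eager = True

    def set(self, name, value):
        if not self.eager:
            raise RuntimeError("call changing() before modifying the environ")
        self._decoded[name] = value


env = LazyEnviron({"SCRIPT_NAME": b"/app", "PATH_INFO": b"/x"})
env.get("PATH_INFO")                   # only PATH_INFO is decoded here
decoded_before = len(env._decoded)

env.changing()                         # everything is decoded from now on
env.set("SCRIPT_NAME", "/bla/")
decoded_after = len(env._decoded)
```

The cost of the hook is that the first change forces all pending work at once, so the lazy path only pays off for requests that pass through middleware unmodified.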