On 5 Dec 2011, 08:19 pm, [email protected] wrote:
>Sorry it took me so long to get to this. Hopefully it's still relevant
>;).

Heh. Heh heh heh. Heh.

>On Nov 26, 2011, at 11:52 AM, [email protected] wrote:
>>Apart from various issues relating to the lack of patterns in
>>twisted.web.template,
>
>I had some trepidation about marking
><http://twistedmatrix.com/trac/ticket/5040> as "closed" :). What kind
>of issues came up with patterns? Anything you feel needs fixing?

The approach facilitated by #5040 seems to result in much more
boilerplate than the approach facilitated by Nevow's patterns. The code
for #4896 has many, many Elements. An implementation using Nevow
probably would have had far fewer, perhaps only one. Which of these is
better, I don't know. I certainly got bored very early on in the #4896
work, though.

>>the main difficulty is in handling non-ascii contents in the
>>traceback. Apart from any unicode that may show up in the source code
>>being rendered (or, perhaps, eventually, the values of variables to be
>>rendered - though for now I do not plan to implement this) the
>>no-break space characters which are necessary to get traceback lines
>>indented properly mean that there is always some non-ascii to include
>>in the output.
>
>Looking at the actual output now, these characters strike me as
>an accident of how browsers collapse different types of whitespace.
>They could be replaced with a <span style="width: 4em;" /> to avoid
>this problem for now, which is probably more expressive.

If I understood Jonathan's reply properly, it sounds like the hack is
the best we've got.

>>twisted.web.template encodes its output using UTF-8, and this is not
>>customizable. Thus, using twisted.web.template, formatFailure's
>>result will be a str containing UTF-8 encoded text. Previously the
>>result was a str containing only ASCII encoded text, with no-break
>>space represented as `&nbsp;´. Consequently, callers of
>>`formatFailure´ will probably mishandle the result - the caller in
>>`twisted.web.server´ does, at least, including the bytes in a page
>>with a content type of "text/html".
>>
>>The solutions that come to mind are all about removing this
>>incompatible change and making it so `formatFailure´ can continue to
>>return a str with ASCII-encoded text.
>>
>>One solution is to add support for named entities or numeric character
>>references to twisted.web.template. Very likely this is a good idea
>>regardless (Nevow supported these).
>
>I think that this is probably a necessary feature regardless,
>eventually. Did you end up filing a ticket for it?

Yep, this has been filed and is up for review (for weeks now ;): #5408.

>>Another solution is to use a different encoding in
>>`twisted.web.template´ - ASCII, with xmlcharrefreplace as the error
>>handler. This is tempting since it avoids an obtrusive non-ASCII
>>support API (the way Nevow supports these is via `nevow.entities´,
>>which must be used rather than normal Python unicode objects).
>
>I like this idea, because it's so hard to get wrong even if you have
>other problems (missing charset, buggy proxies, overly aggressive
>encoding detection, etc). We can still say it's UTF-8 but it will work
>anywhere ASCII will work :).

>>Perhaps another question is whether the encoding used by
>>`twisted.web.template´ should be a parameter. A related question
>>raised might be whether `twisted.web.template´ should encode to bytes
>>at all, or delegate the responsibility for that to code closer to a
>>socket.
>
>Personal experience looking at profiles of applications which serialize
>a lot of XML suggests to me that encoding and decoding text in Python
>is a huge chunk of CPU work and memory footprint; keeping the encoding
>in t.w.t provides an opportunity for a potentially important
>optimization which might not be possible if it were done closer to the
>socket.
>
>For example, if we're generating a long table that generates 10MB of
>HTML, if this is encoded incrementally (even foregoing any smarter
>optimizations, like caching the encoded form of strings) then there's a
>small working set of encoded data which can be collected as the
>template renders, and by the time the final string is emitted by
>cStringIO.getvalue() or what have you, you're using 20-ish megabytes of
>heap to store your UTF-8 bytes (10 in the StringIO and 10 in the str).
>If you build this as a unicode string instead, you'll end up using
>50MB; 40MB for your unicode string, 10MB for the decoded bytes. Part
>of this is just an implementation issue, but even if Python gets a
>smarter unicode representation, you still need more space, because you
>need to store the encoded and decoded representations concurrently.

This all seems to suppose the non-existence of the
twisted.web.template.flatten style interface. Doesn't that give you
what's needed to do your incremental encoding outside of the flattener?

>
>
>It might be a while until I get around to implementing something smart
>in this area, but I'd prefer we have an interface that makes such
>optimizations possible without breaking compatibility.

>>As a work-around in `formatFailure´ I can decode the output of the
>>flattener using UTF-8 and then re-encode it to avoid non-ASCII, but it
>>seems like this should be solved in `twisted.web.template´ rather than
>>over and over again in application code.
>
>If this does end up happening in formatFailure or anywhere else, please
>(whoever does it) make sure to file a ticket to fix it; this should
>never be more than a temporary workaround.

Okay. #4896 is still up for review, and the branch implementing it does
use the decode/encode hack. I'll file a ticket for fixing that if I
ever get to merge the branch (someone review it please).
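
For concreteness, the decode/encode hack amounts to roughly the
following. This is only a sketch, not the code in the #4896 branch: the
helper name `ascii_with_charrefs` is made up for illustration, and it is
written against Python 3 bytes/str rather than the Python 2 str/unicode
types discussed in this thread.

```python
def ascii_with_charrefs(utf8_markup):
    """Re-encode UTF-8 markup bytes as ASCII-only bytes.

    Hypothetical helper (not part of Twisted). Every non-ASCII character
    is replaced by a numeric character reference, so the result is safe
    to include in a page regardless of the declared charset.
    """
    # Undo the UTF-8 encoding applied by the flattener...
    text = utf8_markup.decode("utf-8")
    # ...then encode again, letting the standard "xmlcharrefreplace"
    # error handler turn each unencodable character into &#NNN;.
    return text.encode("ascii", "xmlcharrefreplace")


# Stand-in for flattener output: two no-break spaces used as indentation.
flattened = "\u00a0\u00a0x = 1".encode("utf-8")
print(ascii_with_charrefs(flattened))  # b'&#160;&#160;x = 1'
```

Note that this holds both the decoded text and the re-encoded bytes in
memory at once, which is part of why it should stay a temporary
workaround.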

Jean-Paul

_______________________________________________
Twisted-web mailing list
[email protected]
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-web
