On Jan 3, 2012, at 10:54 AM, [email protected] wrote:

> On 5 Dec 2011, 08:19 pm, [email protected] wrote:
>> Sorry it took me so long to get to this. Hopefully it's still relevant
>> ;).
>
> Heh. Heh heh heh. Heh.

So it goes ;-).

>> On Nov 26, 2011, at 11:52 AM, [email protected] wrote:
>>> Apart from various issues relating to the lack of patterns in
>>> twisted.web.template,
>>
>> I had some trepidation about marking
>> <http://twistedmatrix.com/trac/ticket/5040> as "closed" :). What kind
>> of issues came up with patterns? Anything you feel needs fixing?
>
> The approach facilitated by #5040 seems to result in much more
> boilerplate than the approach facilitated by Nevow's patterns. The code
> for #4896 has many, many Elements. An implementation using Nevow
> probably would have had far fewer, perhaps only one.
>
> Which of these is better, I don't know. I certainly got bored very
> early on in the #4896 work, though.

Well, if the approach on #5040 is way more verbose, what does it have
in its favor? Simplicity? I must imagine that we can get both somehow.

>>> the main difficulty is in handling non-ascii contents in the
>>> traceback. Apart from any unicode that may show up in the source code
>>> being rendered (or, perhaps, eventually, the values of variables to be
>>> rendered - though for now I do not plan to implement this) the
>>> no-break space characters which are necessary to get traceback lines
>>> indented properly mean that there is always some non-ascii to include
>>> in the output.
>>
>> Looking at the actual output now, these characters strike me as
>> an accident of how browsers collapse different types of whitespace.
>> They could be replaced with a <span style="width: 4em;" /> to avoid
>> this problem for now, which is probably more expressive.
>
> If I understood Jonathan's reply properly, it sounds like the
> hack is the best we've got.

I don't _want_ to read Jonathan's reply thoroughly enough to understand
it, so I'll have to take your word for it.

>>> twisted.web.template encodes its output using UTF-8, and this is not
>>> customizable. Thus, using twisted.web.template, formatFailure's
>>> result will be a str containing UTF-8 encoded text.
>>> Previously the result was a str containing only ASCII encoded text,
>>> with no-break space represented as ` ´. Consequently, callers of
>>> `formatFailure´ will probably mishandle the result - the caller in
>>> `twisted.web.server´ does, at least, including the bytes in a page
>>> with a content type of "text/html".
>>>
>>> The solutions that come to mind are all about removing this
>>> incompatible change and making it so `formatFailure´ can continue to
>>> return a str with ASCII-encoded text.
>>>
>>> One solution is to add support for named entities or numeric character
>>> references to twisted.web.template. Very likely this is a good idea
>>> regardless (Nevow supported these).
>>
>> I think that this is probably a necessary feature regardless,
>> eventually. Did you end up filing a ticket for it?
>
> Yep, this has been filed and is up for review (for weeks now ;): #5408.

Great, okay.

>>> Another solution is to use a different encoding in
>>> `twisted.web.template´ - ASCII, with xmlcharrefreplace as the error
>>> handler. This is tempting since it avoids an obtrusive non-ASCII
>>> support API (the way Nevow supports these is via `nevow.entities´,
>>> which must be used rather than normal Python unicode objects).
>>
>> I like this idea, because it's so hard to get wrong even if you have
>> other problems (missing charset, buggy proxies, overly aggressive
>> encoding detection, etc). We can still say it's UTF-8 but it will work
>> anywhere ASCII will work :).
>>
>>> Perhaps another question is whether the encoding used by
>>> `twisted.web.template´ should be a parameter. A related question
>>> raised might be whether `twisted.web.template´ should encode to bytes
>>> at all, or delegate the responsibility for that to code closer to a
>>> socket.
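
To make the two ASCII-only options above concrete, here is a minimal sketch using only the Python 3 standard library, where `bytes` plays the role of the Python 2 `str` discussed in this thread. It is not twisted.web.template's API. `to_ascii_html` uses the built-in xmlcharrefreplace handler (numeric character references); the "htmlentityreplace" handler is a hypothetical name registered only for this example, to show what named-entity output could look like instead.

```python
import codecs
from html.entities import codepoint2name

NBSP = "\u00a0"  # the no-break space used to indent traceback lines


def to_ascii_html(text):
    """Encode text as ASCII bytes; non-ASCII becomes numeric character references."""
    return text.encode("ascii", "xmlcharrefreplace")


def named_entity_replace(exc):
    """Hypothetical codec error handler: emit a named entity (e.g. &nbsp;)
    when HTML 4 defines one, otherwise fall back to a numeric reference."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        pieces.append("&%s;" % name if name else "&#%d;" % ord(ch))
    # Return the replacement text and the position to resume encoding from.
    return "".join(pieces), exc.end


# "htmlentityreplace" is a name made up for this sketch, not a built-in handler.
codecs.register_error("htmlentityreplace", named_entity_replace)

line = NBSP * 4 + "return x"
print(to_ascii_html(line))                        # b'&#160;&#160;&#160;&#160;return x'
print(line.encode("ascii", "htmlentityreplace"))  # b'&nbsp;&nbsp;&nbsp;&nbsp;return x'

# Turning already-UTF-8-encoded output back into ASCII-only bytes
# takes a full decode and re-encode pass over the data.
utf8_output = line.encode("utf-8")
print(utf8_output.decode("utf-8").encode("ascii", "xmlcharrefreplace"))
```

Either handler yields output that is valid ASCII and therefore also valid UTF-8, which is why the charset can still be declared as UTF-8.
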
>>
>> Personal experience looking at profiles of applications which serialize
>> a lot of XML suggests to me that encoding and decoding text in Python
>> is a huge chunk of CPU work and memory footprint; keeping the encoding
>> in t.w.t provides an opportunity for a potentially important
>> optimization which might not be possible if it were done closer to the
>> socket.
>>
>> For example, if we're generating a long table that generates 10MB of
>> HTML, if this is encoded incrementally (even forgoing any smarter
>> optimizations, like caching the encoded form of strings) then there's a
>> small working set of encoded data which can be collected as the
>> template renders, and by the time the final string is emitted by
>> cStringIO.getvalue() or what have you, you're using 20-ish megabytes of
>> heap to store your UTF-8 bytes (10 in the StringIO and 10 in the str).
>> If you build this as a unicode string instead, you'll end up using
>> 50MB; 40MB for your unicode string, 10MB for the encoded bytes. Part
>> of this is just an implementation issue, but even if Python gets a
>> smarter unicode representation, you still need more space, because you
>> need to store the encoded and decoded representations concurrently.
>
> This all seems to suppose the non-existence of the
> twisted.web.template.flatten style interface. Doesn't that give you
> what's needed to do your incremental encoding outside of the flattener?

Hmmmmmm. Okay, generating a couple of short encoded strings does leave
one with a much smaller working set. There should definitely be many
more convenience functions in this area to just do the right thing in
the various contexts one might want to flatten something (for which
there are already a few tickets, such as <http://tm.tl/5395>). As I
recall you've spoken against the flatten() style interface because it
makes error-handling somewhat more challenging, but if #5395 were fixed
it could take care of those complexities internally.
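
A rough sketch of what "incremental encoding outside of the flattener" could look like, assuming a flattener that hands text fragments to a write-style callback. Everything here is hypothetical and stdlib-only: `flatten_chunks` is a stand-in for a flattener (not twisted.web.template.flatten), and `render_incrementally` encodes each fragment with a `codecs` incremental encoder and passes the bytes on immediately, so the whole document never exists as one large text string.

```python
import codecs
import io


def flatten_chunks(rows):
    """Hypothetical stand-in for a flattener: yields text fragments one at a time.

    No HTML escaping is done here, for brevity.
    """
    yield "<table>"
    for row in rows:
        yield "<tr><td>%s</td></tr>" % row
    yield "</table>"


def render_incrementally(chunks, write, encoding="utf-8", errors="strict"):
    """Encode each text fragment as it is produced and hand the bytes to `write`.

    Only encoded bytes accumulate, wherever `write` chooses to put them.
    """
    encoder = codecs.getincrementalencoder(encoding)(errors)
    for chunk in chunks:
        data = encoder.encode(chunk)
        if data:
            write(data)
    tail = encoder.encode("", final=True)  # flush any buffered encoder state
    if tail:
        write(tail)


out = io.BytesIO()
render_incrementally(flatten_chunks(["caf\u00e9", "x"]), out.write)
print(out.getvalue())
# b'<table><tr><td>caf\xc3\xa9</td></tr><tr><td>x</td></tr></table>'
```

Passing `encoding="ascii", errors="xmlcharrefreplace"` would combine this with the ASCII-only output option discussed earlier, without changing the shape of the callback.
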

>> It might be a while until I get around to implementing something smart
>> in this area, but I'd prefer we have an interface that makes such
>> optimizations possible without breaking compatibility.
>>
>>> As a work-around in `formatFailure´ I can decode the output of the
>>> flattener using UTF-8 and then re-encode it to avoid non-ASCII, but it
>>> seems like this should be solved in `twisted.web.template´ rather than
>>> over and over again in application code.
>>
>> If this does end up happening in formatFailure or anywhere else, please
>> (whoever does it) make sure to file a ticket to fix it; this should
>> never be more than a temporary workaround.
>
> Okay. #4896 is still up for review, and the branch implementing it does
> use the decode/encode hack. I'll file a ticket for fixing that if I
> ever get to merge the branch (someone review it please).

Why not just file the ticket now? As you said before: "Heh. Heh heh
heh. Heh." It might be a while before sufficient review bandwidth
becomes available. (If history is any indicator, things will stall out
between now and February, and March will be crazily active.)

-glyph

_______________________________________________
Twisted-web mailing list
[email protected]
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-web
