On 7/22/09 6:08 AM, Christian Boos wrote:
> Hello Shane,
>
> First, great job at looking into the bowels of Trac ;-)
>
> Then, as a general comment, I see that some of your suggestions actually
> go against some of the changes I did in #6614, so we have not
> surprisingly a trade-off of memory vs. speed. In some environments where
> memory is strictly restricted, we have no choice but to optimize for
> memory, at the detriment of speed. But in most environments, the extra
> memory needed for achieving decent speed might be affordable. So I see
> here the opportunity for a configuration setting, something like [trac]
> favor_speed_over_memory, defaulting to true, that people having low
> resources could turn off.
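For reference, I read the proposed setting as just one more boolean in trac.ini, along these lines (the name comes from your suggestion above -- it is not an existing option today):

    [trac]
    # hypothetical option, per the suggestion above; defaults to true,
    # people on memory-constrained hosts could turn it off
    favor_speed_over_memory = true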
For the gc.collect item, I think it should be a configurable background thread, rather than something happening in the request loop. I've been meaning to explore the memory use of source browsing and the timeline so I can understand what is happening there, but I haven't got around to it. For the encode loop, I was hoping that sending the output straight through would be a gain; I think there still might be some opportunity around that idea. Another thought, again using a background thread: monitor memory usage and run gc.collect once it crosses some threshold (rough sketch further down). Low-memory environments would end up doing this more often.

>> Even with the gains I still get 100% cpu spikes every request, but at
>> least it's a shorter duration. Still digging further on that. I also
>> have done a bit on overall browser page load.
>
> That spike is probably occurring during Genshi's generate and render calls.

Yeah, yesterday I added a CPU usage monitoring class that throws an exception when CPU use gets high. It always happens deep in Genshi, during the iterations through the stream.

>> Here's my current brain dump. If some of my conclusions are silly,
>> please let me know why.
>>
>> == General ==
>>
>> In general there are only a couple of big wins. For me it was removing
>> gc.collect (see trac.main) and the Timing and Estimation plugin.
>> Everything else was small potatoes in comparison (10ms here, 5ms there),
>> but together they added up to a good 40-50ms per request. Think of it
>> this way: at 100% cpu and 50ms/request you are limited to a max of 20
>> requests/second/cpu. Every ms counts if we want decent throughput. I'd
>> like to get under 30ms.
>
> The gc on every request is the typical memory vs. speed trade-off. If it
> can be shown that despite not doing gc after every request, the memory
> usage stays within bound, then I think we can make that optional. As you
> said elsewhere, it's quite possible that this explicit gc simply hides a
> real memory leak that can be avoided by other means (like fixing the db
> pool issue with PostgreSQL).

Out of sight, out of mind ;)

>> Upgrade to jQuery 1.3; it's much faster. Trac trunk has done this, and
>> there is a diff somewhere that shows what type of changes have to be
>> made. You'd have to examine any plugins for similar jQuery updates that
>> need to be done.
>
> Backport from trunk to 0.11-stable welcomed ;-)

My patches are against 0.11-stable; I'll try to get a set of bugs with patches up this week. I also patched a number of plugins; those changes are just updates to the attribute selectors for jQuery.

>> I have a sneaking suspicion (unproven) that people who use mod_python
>> and claim they need to turn off keep-alive and/or mod_deflate are
>> actually having problems due to gc.collect. As I understand Apache
>> filters (e.g. mod_deflate), they won't finish up the request until the
>> mod_python handler has returned. So the extra time in gc.collect delays
>> the request being returned, which delays mod_deflate finishing. It also
>> delays the start of the next request over a persistent socket connection
>> (i.e. keep-alive).
>
> With regard to mod_deflate, I'm not sure how an extra 100ms can explain
> the reported difference in behavior.
> IIUC, you're using mod_deflate without any trouble, and switching it off
> doesn't make a difference?
> Was that also the case for you with 0.11.4?

Switching it off never made a difference for me; when I read that report I tried it right away. However, if it does happen in some setups, gc.collect could be a contributor. It's also possible that it is not mod_deflate alone, but mod_deflate in combination with some other Apache filter.
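Going back to the threshold idea above, here is roughly what I have in mind -- an untested sketch, nothing Trac-specific about it. The threshold and interval are arbitrary and would want to be configurable, and reading /proc makes it Linux-only:

    import gc
    import resource
    import threading
    import time

    # Illustrative values only -- both would want to be trac.ini options.
    RSS_THRESHOLD = 200 * 1024 * 1024   # start collecting past ~200MB resident
    CHECK_INTERVAL = 30                 # seconds between checks

    def _current_rss():
        # Linux-specific: the second field of /proc/self/statm is the
        # number of resident pages.
        pages = int(open('/proc/self/statm').read().split()[1])
        return pages * resource.getpagesize()

    def _gc_monitor():
        # Collect only when the process has actually grown, instead of
        # paying for a full gc.collect() on every request.
        while True:
            time.sleep(CHECK_INTERVAL)
            if _current_rss() > RSS_THRESHOLD:
                gc.collect()

    def start_gc_monitor():
        t = threading.Thread(target=_gc_monitor)
        t.setDaemon(True)   # don't keep the process alive on shutdown
        t.start()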
>> === filters ===
>>
>> Going through all the match_request implementations and removing
>> permission checking (put it in process_request), making sure regexes
>> are precompiled, and generally simplifying things helps. There are a
>> number of those in trac core, but plugins are the bigger abusers in this
>> area. Also examine any IRequestFilter use and simplify it.
>
> Not sure if the advice of removing permission checking in match_request
> is a good one.
> If done after the path_info test, doing the permission check shouldn't
> have a big impact and might be needed to enable the use of a fallback
> handler.

If I recall correctly, quite often it's done prior to path matching. The other item is the use of uncompiled regexes to match a path. I would say that 80% of plugins are simple, don't need to do more than a simple path check, and can defer the permission check until process_request.
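For what it's worth, the pattern I'd like plugins to follow looks roughly like this -- an illustrative sketch only, not a real plugin (the URL, template name and permission action are made up):

    import re

    from trac.core import Component, implements
    from trac.web.api import IRequestHandler

    class ExampleRequestHandler(Component):
        """Illustrative handler: keep match_request cheap, defer the
        permission check to process_request."""

        implements(IRequestHandler)

        # Compiled once at import time, not on every request.
        _path_re = re.compile(r'/example(?:/(.+))?$')

        # IRequestHandler methods

        def match_request(self, req):
            # Just a path test -- no permission checks, no database access.
            match = self._path_re.match(req.path_info)
            if match:
                req.args['page'] = match.group(1)
                return True
            return False

        def process_request(self, req):
            # The permission check moves here, after routing is settled.
            req.perm.require('WIKI_VIEW')
            data = {'page': req.args.get('page')}
            return 'example.html', data, None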
>> === trac.config ===
>>
>> Other than Genshi, profiling showed trac.config to be the single largest
>> cost in simple template generation. Profiling showed me that the get
>> method in the Configuration class (trac.config.Configuration) was slow.
>> I added caching there for a few extra milliseconds of boost. I'm also
>> looking at removing the auto-reload when the ini file changes, maybe
>> using spread or memcached to create reload notifications, to get rid of
>> the file stats, but timing doesn't show this to be a large issue on my
>> laptop.
>
> Interesting, can you make a ticket out of that?

Sure, with a patch :) At least for the caching part, which I have working.

>> === repository ===
>>
>> While I still want to remove the sync on every request (to get rid of
>> system calls/file stats), I have been unable to show that performance
>> changes much when I test with it removed. There are still bigger fish
>> to fry.
>
> This will be addressed in the multirepos branch. I think we discussed
> making a setting for this, in order to keep a backward compatible
> behavior for the default repository.

>> === database pool ===
>>
>> Watching the postgres log file, I can tell that a lot of cursors get
>> created without actually being used to do a query. This shows up
>> because psycopg2 executes a BEGIN when a cursor is created. I haven't
>> yet looked into where that is happening, but it's extra work the system
>> is doing for nothing.
>
> That's probably also worth a ticket on its own, unless this could be
> handled in #8443.

I think it's a different issue from #8443. If someone does env.get_db_cnx()/cnx.cursor() and then does some other check and bails before using the cursor, you get the extra transaction start/rollback.

>> === wiki ===
>>
>> The current wiki parser design is slow, doing the same large regex over
>> each line. I think a general redesign to use re.finditer, rather than
>> line split followed by re.match, would help improve performance here.
>> However, post-caching the content would be better. I'm experimenting
>> with partial caching of the wiki content and have reduced my request
>> timing on WikiStart from 120ms to 75ms while still getting correct
>> content. The patch I have doesn't cache macros unless the macro
>> arguments have 'cache' in them (good for the page outline macro, which
>> btw seems to re-parse the entire page, macros included). There may be
>> other issues with the approach I have taken; I haven't tested it much.
>> Once I get further, I might actually pre-process when the wiki page is
>> saved and stash the result into the database, avoiding the parse in most
>> requests.
>
> Interesting as well, be sure to read the discussions about caching wiki
> content (#7739 and #1216).

Yeah, I looked at those before doing my own thing. Doing a partial cache requires being in the formatter; I don't think it can be done properly as a generic solution via a plugin. Having some generic memcache for sharing cache data between instances would be great, but I think I like the idea of pre-processing the wiki text and stashing it in the db, or fixing the wiki parser to be faster in combination with a simple cache like the one I've implemented.
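To make the pre-processing idea a bit more concrete, the shape would be roughly the following -- purely illustrative: the wiki_cache table and the render_wiki_html() helper are made up, and invalidating dynamic content such as macros is exactly the part that still needs thought:

    # On wiki save: render once and stash the HTML next to the wiki text.
    def store_rendered(db, name, version, text):
        html = render_wiki_html(text)   # hypothetical wrapper around the formatter
        cursor = db.cursor()
        cursor.execute("INSERT INTO wiki_cache (name, version, html) "
                       "VALUES (%s, %s, %s)", (name, version, html))
        db.commit()

    # On request: serve the stored rendering when present, parse otherwise.
    def fetch_rendered(db, name, version, text):
        cursor = db.cursor()
        cursor.execute("SELECT html FROM wiki_cache "
                       "WHERE name=%s AND version=%s", (name, version))
        row = cursor.fetchone()
        if row:
            return row[0]
        return render_wiki_html(text)   # cache miss -> same cost as today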
I'll try to get the time this week to get some more bugs/patches posted.

Shane