On 7/22/09 6:08 AM, Christian Boos wrote:
> Hello Shane,
>
> First, great job at looking into the bowels of Trac ;-)
>
> Then, as a general comment, I see that some of your suggestions actually
> go against some of the changes I did in #6614, so we have not
> surprisingly a trade-off of memory vs. speed. In some environments where
> memory is strictly restricted, we have no choice but to optimize for
> memory, at the detriment of speed. But in most environments, the extra
> memory needed for achieving decent speed might be affordable. So I see
> here the opportunity for a configuration setting, something like [trac]
> favor_speed_over_memory, defaulting to true, that people having low
> resources could turn off.
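For reference, I read the proposed setting as just one more boolean in trac.ini, along these lines (the name comes from your suggestion above -- it is not an existing option today):

    [trac]
    # hypothetical option, per the suggestion above; defaults to true,
    # people on memory-constrained hosts could turn it off
    favor_speed_over_memory = true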
For the gc.collect item, I think it should be a configurable background thread, rather than something happening in the request loop. I've been meaning to explore the memory use of source browsing and the timeline so I can understand what is happening there, but I haven't got around to it. For the encode loop, I was hoping that sending the output straight through would be a gain; I think there still might be some opportunity around that idea. Another thought, again using a background thread: monitor memory usage and run gc.collect once it crosses some threshold (rough sketch further down). Low-memory environments would end up doing this more often.

>> Even with the gains I still get 100% cpu spikes every request, but at
>> least it's a shorter duration. Still digging further on that. I also
>> have done a bit on overall browser page load.
>
> That spike is probably occurring during Genshi's generate and render calls.

Yeah, yesterday I added a CPU usage monitoring class that throws an exception when CPU use gets high. It always happens deep in Genshi, during the iterations through the stream.

>> Here's my current brain dump. If some of my conclusions are silly,
>> please let me know why.
>>
>> == General ==
>>
>> In general there are only a couple of big wins. For me it was removing
>> gc.collect (see trac.main) and the Timing and Estimation plugin.
>> Everything else was small potatoes in comparison (10ms here, 5ms there),
>> but together they added up to a good 40-50ms per request. Think of it
>> this way: at 100% cpu and 50ms/request you are limited to a max of 20
>> requests/second/cpu. Every ms counts if we want decent throughput. I'd
>> like to get under 30ms.
>
> The gc on every request is the typical memory vs. speed trade-off. If it
> can be shown that despite not doing gc after every request, the memory
> usage stays within bound, then I think we can make that optional. As you
> said elsewhere, it's quite possible that this explicit gc simply hides a
> real memory leak that can be avoided by other means (like fixing the db
> pool issue with PostgreSQL).

Out of sight, out of mind ;)

>> Upgrade to jQuery 1.3; it's much faster. Trac trunk has done this, and
>> there is a diff somewhere that shows what type of changes have to be
>> made. You'd have to examine any plugins for similar jQuery updates that
>> need to be done.
>
> Backport from trunk to 0.11-stable welcomed ;-)

My patches are against 0.11-stable; I'll try to get a set of bugs with patches up this week. I also patched a number of plugins; those changes are just updates to the attribute selectors for jQuery.

>> I have a sneaking suspicion (unproven) that people who use mod_python
>> and claim they need to turn off keep-alive and/or mod_deflate are
>> actually having problems due to gc.collect. As I understand Apache
>> filters (e.g. mod_deflate), they won't finish up the request until the
>> mod_python handler has returned. So the extra time in gc.collect delays
>> the request being returned, which delays mod_deflate finishing. It also
>> delays the start of the next request over a persistent socket connection
>> (i.e. keep-alive).
>
> With regard to mod_deflate, I'm not sure how an extra 100ms can explain
> the reported difference in behavior.
> IIUC, you're using mod_deflate without any trouble, and switching it off
> doesn't make a difference?
> Was that also the case for you with 0.11.4?

Switching it off never made a difference for me; when I read that report I tried it right away. However, if it does happen in some setups, gc.collect could be a contributor. It's also possible that it is not mod_deflate alone, but mod_deflate in combination with some other Apache filter.
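Going back to the threshold idea above, here is roughly what I have in mind -- an untested sketch, nothing Trac-specific about it. The threshold and interval are arbitrary and would want to be configurable, and reading /proc makes it Linux-only:

    import gc
    import resource
    import threading
    import time

    # Illustrative values only -- both would want to be trac.ini options.
    RSS_THRESHOLD = 200 * 1024 * 1024   # start collecting past ~200MB resident
    CHECK_INTERVAL = 30                 # seconds between checks

    def _current_rss():
        # Linux-specific: the second field of /proc/self/statm is the
        # number of resident pages.
        pages = int(open('/proc/self/statm').read().split()[1])
        return pages * resource.getpagesize()

    def _gc_monitor():
        # Collect only when the process has actually grown, instead of
        # paying for a full gc.collect() on every request.
        while True:
            time.sleep(CHECK_INTERVAL)
            if _current_rss() > RSS_THRESHOLD:
                gc.collect()

    def start_gc_monitor():
        t = threading.Thread(target=_gc_monitor)
        t.setDaemon(True)   # don't keep the process alive on shutdown
        t.start()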
>> === filters ===
>>
>> Going through all the match_request implementations and removing
>> permission checking (put it in process_request), making sure regexes
>> are precompiled, and generally simplifying things helps. There are a
>> number of those in trac core, but plugins are the bigger abusers in this
>> area. Also examine any IRequestFilter use and simplify it.
>
> Not sure if the advice of removing permission checking in match_request
> is a good one.
> If done after the path_info test, doing the permission check shouldn't
> have a big impact and might be needed to enable the use of a fallback
> handler.

If I recall correctly, quite often it's done prior to path matching. The other item is the use of uncompiled regexes to match a path. I would say that 80% of plugins are simple, don't need to do more than a simple path check, and can defer the permission check until process_request.
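For what it's worth, the pattern I'd like plugins to follow looks roughly like this -- an illustrative sketch only, not a real plugin (the URL, template name and permission action are made up):

    import re

    from trac.core import Component, implements
    from trac.web.api import IRequestHandler

    class ExampleRequestHandler(Component):
        """Illustrative handler: keep match_request cheap, defer the
        permission check to process_request."""

        implements(IRequestHandler)

        # Compiled once at import time, not on every request.
        _path_re = re.compile(r'/example(?:/(.+))?$')

        # IRequestHandler methods

        def match_request(self, req):
            # Just a path test -- no permission checks, no database access.
            match = self._path_re.match(req.path_info)
            if match:
                req.args['page'] = match.group(1)
                return True
            return False

        def process_request(self, req):
            # The permission check moves here, after routing is settled.
            req.perm.require('WIKI_VIEW')
            data = {'page': req.args.get('page')}
            return 'example.html', data, None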
>> === trac.config ===
>>
>> Other than Genshi, profiling showed trac.config to be the single largest
>> cost in simple template generation. Profiling showed me that the get
>> method in the Configuration class (trac.config.Configuration) was slow.
>> I added caching there for a few extra milliseconds of boost. I'm also
>> looking at removing the auto-reload when the ini file changes, maybe
>> using spread or memcached to create reload notifications, to get rid of
>> the file stats, but timing doesn't show this to be a large issue on my
>> laptop.
>
> Interesting, can you make a ticket out of that?

Sure, with a patch :) At least for the caching part, which I have working.

>> === repository ===
>>
>> While I still want to remove the sync on every request (to get rid of
>> system calls/file stats), I have been unable to show that performance
>> changes much when I test with it removed. There are still bigger fish
>> to fry.
>
> This will be addressed in the multirepos branch. I think we discussed
> making a setting for this, in order to keep a backward compatible
> behavior for the default repository.

>> === database pool ===
>>
>> Watching the postgres log file, I can tell that a lot of cursors get
>> created without actually being used to do a query. This shows up
>> because psycopg2 executes a BEGIN when a cursor is created. I haven't
>> yet looked into where that is happening, but it's extra work the system
>> is doing for nothing.
>
> That's probably also worth a ticket on its own, unless this could be
> handled in #8443.

I think it's a different issue from #8443. If someone does env.get_db_cnx()/cnx.cursor() and then does some other check and bails before using the cursor, you get the extra transaction start/rollback.

>> === wiki ===
>>
>> The current wiki parser design is slow, doing the same large regex over
>> each line. I think a general redesign to use re.finditer, rather than
>> line split followed by re.match, would help improve performance here.
>> However, post-caching the content would be better. I'm experimenting
>> with partial caching of the wiki content and have reduced my request
>> timing on WikiStart from 120ms to 75ms while still getting correct
>> content. The patch I have doesn't cache macros unless the macro
>> arguments have 'cache' in them (good for the page outline macro, which
>> btw seems to re-parse the entire page, macros included). There may be
>> other issues with the approach I have taken; I haven't tested it much.
>> Once I get further, I might actually pre-process when the wiki page is
>> saved and stash the result into the database, avoiding the parse in most
>> requests.
>
> Interesting as well, be sure to read the discussions about caching wiki
> content (#7739 and #1216).

Yeah, I looked at those before doing my own thing. Doing a partial cache requires being in the formatter; I don't think it can be done properly as a generic solution via a plugin. Having some generic memcache for sharing cache data between instances would be great, but I think I like the idea of pre-processing the wiki text and stashing it in the db, or fixing the wiki parser to be faster in combination with a simple cache like the one I've implemented.
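To make the pre-processing idea a bit more concrete, the shape would be roughly the following -- purely illustrative: the wiki_cache table and the render_wiki_html() helper are made up, and invalidating dynamic content such as macros is exactly the part that still needs thought:

    # On wiki save: render once and stash the HTML next to the wiki text.
    def store_rendered(db, name, version, text):
        html = render_wiki_html(text)   # hypothetical wrapper around the formatter
        cursor = db.cursor()
        cursor.execute("INSERT INTO wiki_cache (name, version, html) "
                       "VALUES (%s, %s, %s)", (name, version, html))
        db.commit()

    # On request: serve the stored rendering when present, parse otherwise.
    def fetch_rendered(db, name, version, text):
        cursor = db.cursor()
        cursor.execute("SELECT html FROM wiki_cache "
                       "WHERE name=%s AND version=%s", (name, version))
        row = cursor.fetchone()
        if row:
            return row[0]
        return render_wiki_html(text)   # cache miss -> same cost as today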
I'll try to get the time this week to get some more bugs/patches posted.

Shane