I was going to do this as a ticket since it's so long, but Edgewall likes 
to think of me as a spammer sometimes.  This was partly written as a 
response/suggestion to people posting on ticket 7490 as well, so 
includes suggestions.

Shane

----

I've recently reduced per-request time for *simple* template generation 
from 150ms to 45ms on my laptop (1.6GHz dual-core Core Duo, 2GB RAM, 
tested with ab against a simple template).  I also reduced page load times (reported 
via google page load/firebug) to around a second (from 2-8 seconds in 
some cases).  These timings are without any network overhead since I'm 
testing on the local machine.  Times are all taken on my laptop with my 
typical environment/applications open (lots of crap).

This is all done with Trac 0.11.5.

Even with the gains I still get 100% cpu spikes every request, but at 
least it's a shorter duration.  Still digging further on that.  I also 
have done a bit on overall browser page load.

Here's my current brain dump.  If some of my conclusions are silly, 
please let me know why.

== General ==

In general there are only a couple of big wins.  For me it was removing 
gc.collect (see trac.main) and the Timing and Estimation plugin. 
Everything else was small potatoes in comparison (10ms here, 5ms there), 
but together they added up to a good 40-50ms per request.  Think of it 
this way: at 100% CPU and 50ms/request you're limited to a max of 20 
requests/second/CPU.  Every ms counts if we want decent throughput.  I'd 
like to get under 30ms.

=== How I timed ===

My test setup is Apache prefork with WSGI daemon mode, one process, and 
mod_deflate enabled.  My laptop is pretty loaded, with very little free 
memory, but generally enough that trac shouldn't be swapping.  Postgres 
is the database, also running on the laptop.

For timing template generation I used ApacheBench and tossed out 
"warmup" requests, sometimes testing with keep-alive (getting slightly 
better req/s):

{{{
ab [-k] -c 1 -n 10 url
}}}

With zero network latency, the timing is essentially how much time trac 
takes to generate the content.  I tested against a simple template 
generated by a small plugin that just renders a template file (so as to 
involve as little other processing as possible, such as querying the 
ticket table or permissions), and against WikiStart.

For "page load" I used the Google Page Load extension for Firebug and 
Firefox 3.5.  I also tested with Safari 4, which has similar 
functionality now; it's much faster than Firefox :(

I tested with plugins enabled since I wanted to also discover whether 
any plugins were causing me grief; some were.

I tested against empty projects, so table indexing and table size are not 
involved.  I also tested against a small project in some 
instances, which showed up issues in a couple of plugins (e.g. T&E).

=== Profiling issues ===

It seems to me that cProfile adds time from generators to the function 
that uses them, masking where the real time is spent.  That makes 
profiling Genshi terribly difficult.  I also tried dtrace; it gives 
the same rough data as cProfile.  Knowing a tiny bit about the Python 
engine (having written a C-level debugger extension), I figure 
it has something to do with how generators work in the Python core.

A couple of areas where cProfile claimed a lot of exclusive time in Genshi 
I rewrote in C.  This didn't change my performance at all, but did shift 
timing to other areas, which helped me find a couple of the actual time 
sinks in Genshi (see below).
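A toy way to check how your profiler charges generator time (all names 
here are hypothetical examples, not Genshi code) is to profile a consumer 
of a generator and look at which frame the time lands in:

```python
import cProfile
import io
import pstats

def running_totals(n):
    # the actual work happens inside this generator
    total = 0
    for i in range(n):
        total += i * i
        yield total

def consume(n):
    # the consumer just iterates; compare where the profiler
    # charges the time (the generator's frame vs. this caller)
    return sum(running_totals(n))

profiler = cProfile.Profile()
profiler.enable()
result = consume(100000)
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("tottime").print_stats(5)
print(buf.getvalue())
```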

== For general performance ==

Use either a multithreaded Apache if you use mod_python, or use WSGI in 
multithreaded daemon mode.  It won't remove high-CPU slow requests, but 
it does lower memory use to some extent.  While that didn't improve 
per-request performance, I got higher concurrent throughput with that 
setup.  Apache prefork with mod_python, or WSGI in embedded mode, will 
chew up memory, and due to high CPU usage, processes will fight 
each other more for resources.

=== customizations and plugins ===

The single largest gain I got with plugins was by removing the Timing and 
Estimation plugin (primarily seen on the ticket reports page, where it does 
another SQL query for each ticket, doubling the request time).  Its 
stream filters slow lots of stuff down.  Removing other plugins with 
stream filters, or reworking them to not use stream filters where possible, 
is a win.  In some instances the TracHours plugin is a better solution if 
you don't need all the T&E features.

Get rid of stuff like the Google Analytics plugin and just add the code 
to a site or theme template.  You'll get better performance.  It's not 
that it's a bad plugin, but it does something simple that adds *Genshi* 
overhead.

Examine any custom templates for extraneous py:match use and get rid of 
it.  I had one match in the site template that was removing something from 
an admin panel, which I had forgotten about; a big mistake that cost 
performance on *every* page.  I patched the template directly 
instead and saved myself 10ms per request.  This kind of thing doesn't 
show up in the profiler; it was some debug dumps I inserted deep into 
the Genshi bowels that showed me the light.

Make sure all static resources for all plugins are exported.  trac-admin 
deploy will only deploy static resources for enabled plugins in that 
project.

=== javascript ===

This is mostly about page load times.

Minify all the JavaScript output by trac-admin deploy, and make sure it 
gets cache headers via the Apache conf.

Upgrade to jQuery 1.3; it's much faster.  Trac trunk has done this, and 
there is a diff somewhere that shows what kind of changes have to be 
made.  You'd have to examine any plugins for similar jQuery updates that 
need to be done.

If you use Firebug, be sure you have disabled it (or at least the 
console tab) for trac pages; the console somehow interferes and adds 
a couple of seconds of delay in the browser.

=== css ===

I haven't gotten to this yet, but there is a lot of slow CSS there. 
Removing all the CSS as a test, my page load in the browser was 200ms 
faster.  Google Page Load is good at reporting which CSS *might* be slow, 
and it reports lots of inefficient CSS in trac.

== trac issues ==

=== gc.collect ===

The next big item was removing gc.collect.  If I recall correctly, 
gc.collect took at least 50ms of the request time, sometimes upwards of 
100ms.  It also hides other bugs (ticket:8443 for example; the postgres 
idle bug came back without gc.collect, and I've got a patch in that bug). 
gc.collect could be done in a background thread from tracd, but I 
wouldn't suggest background threads for mod_python prefork.  With WSGI 
daemon mode, just limit the number of requests before restart to 
something you think is fine and test memory consumption.  I think the 
use of gc.collect should be seriously reconsidered.
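For the tracd case, the background-thread idea could look something like 
this (a hypothetical sketch, not a tested patch; the 30-second interval 
is a guess and would need tuning against real memory use):

```python
import gc
import threading
import time

def start_background_gc(interval=30.0):
    """Run gc.collect() periodically from a daemon thread instead of
    once per request.  Only sensible for a long-lived single process
    like tracd, not for mod_python prefork."""
    def loop():
        while True:
            time.sleep(interval)
            gc.collect()
    t = threading.Thread(target=loop, name="gc-collector")
    t.daemon = True   # don't block interpreter shutdown
    t.start()
    return t
```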

I have a sneaking suspicion (unproven) that people who use mod_python 
and report problems with keep-alive and/or mod_deflate are having problems 
due to gc.collect.  As I understand Apache filters (e.g. mod_deflate), 
they won't finish up the request until the mod_python handler has 
returned.  So the extra time in gc.collect delays the request being 
returned, which delays mod_deflate finishing.  It also delays the start 
of the next request over a persistent socket connection (i.e. keep-alive).

=== filters ===

Going through all the match_request implementations, removing 
permission checking (put it in process_request), making sure regex matches 
are precompiled, and generally simplifying things helps.  There are a 
number of those in trac core, but plugins are the greater abusers in this 
area.  Also examine any IRequestFilter use and simplify.
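As a sketch of the pattern (a hypothetical plugin handler, not trac core 
code): compile the regex once at module level, keep match_request cheap, 
and leave the permission check to process_request:

```python
import re

# compiled once at import time, not per request
_TICKET_PATH = re.compile(r'/myplugin/ticket/([0-9]+)$')

class MyPluginHandler(object):
    # hypothetical IRequestHandler-style component

    def match_request(self, req):
        # cheap: one precompiled match, no permission or DB work here
        match = _TICKET_PATH.match(req.path_info)
        if match:
            req.args['ticket_id'] = match.group(1)
            return True
        return False

    def process_request(self, req):
        # permission checking belongs here, where it only runs for
        # requests this handler actually owns
        req.perm.require('TICKET_VIEW')
        ...
```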

=== trac.config ===

Other than Genshi, profiling showed trac.config to be the single largest 
cost in simple template generation.  Profiling showed me that the get 
method of the Configuration class (trac.config.Configuration) was slow. 
I added caching there for a few extra milliseconds of boost.  I'm also 
looking at removing the auto-reload when the ini file changes, maybe using 
spread or memcached to create reload notifications and get rid of the file 
stats, but timing doesn't show this to be a large issue on my laptop.
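The caching amounts to roughly this (a simplified sketch, not the actual 
patch; `_backend` here is a stand-in for the ConfigParser-level lookup):

```python
class CachingConfiguration(object):
    """Memoize (section, name) lookups; clear the cache whenever
    the ini file is re-read."""

    def __init__(self, backend):
        self._backend = backend   # object with .get(section, name, default)
        self._cache = {}

    def get(self, section, name, default=''):
        key = (section, name)
        if key not in self._cache:
            self._cache[key] = self._backend.get(section, name, default)
        return self._cache[key]

    def invalidate(self):
        # call after reparsing trac.ini (or on a reload notification
        # from something like spread/memcached)
        self._cache.clear()
```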

=== repository ===

While I still want to remove the sync on every request (to get rid of 
system calls/file stats), I have been unable to show that performance 
changes much when I test with it removed.  There are still bigger fish 
to fry.

=== database pool ===

Watching the postgres log file, I can tell that a lot of cursors get 
created without actually being used to do a query.  This shows up 
because psycopg2 executes a BEGIN when a cursor is created.  I haven't 
yet looked into where that is happening, but it's extra work the system 
is doing for nothing.
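One way to avoid the wasted BEGINs, assuming the cursors really are being 
created speculatively somewhere in the pool layer, would be a lazy wrapper 
(hypothetical sketch, not trac code):

```python
class LazyCursor(object):
    """Defer creating the real DB-API cursor until a query is
    actually executed, so request paths that never run a query
    never trigger the transaction BEGIN."""

    def __init__(self, connection):
        self._connection = connection
        self._cursor = None

    def _real(self):
        if self._cursor is None:
            self._cursor = self._connection.cursor()
        return self._cursor

    def execute(self, sql, params=()):
        return self._real().execute(sql, params)

    def __getattr__(self, name):
        # anything else (fetchone, description, ...) also goes to
        # the real cursor, creating it on first use
        return getattr(self._real(), name)
```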

=== wiki ===

The current wiki parser design is slow, applying the same large regex to 
each line.  I think a general redesign to use re.finditer, rather than 
line split then re.match, would help improve performance here.  However, 
post-caching the content would be better.  I'm experimenting with 
partial caching of the wiki content and have reduced my request timing 
on WikiStart from 120ms to 75ms while still getting correct content. 
The patch I have doesn't cache macros unless the macro arguments contain 
'cache' (good for the page outline macro, which btw seems to 
re-parse the entire page, macros included).  There may be other issues 
with the approach I have taken; I haven't tested it much.  Once I get 
further, I might actually pre-process when the wiki page is saved and 
stash the result in the database, avoiding the parse in most requests.
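The shape of the post-caching idea, reduced to a sketch (all names 
hypothetical; the real patch has to deal with macros and with 
invalidation on save):

```python
_wiki_cache = {}

def render_wiki(page_name, version, text, format_func):
    """Cache rendered wiki HTML keyed by (page, version).  A saved
    page gets a new version number, so stale entries are never
    served; macros would need a 'cache' opt-in as described above."""
    key = (page_name, version)
    if key not in _wiki_cache:
        _wiki_cache[key] = format_func(text)
    return _wiki_cache[key]
```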

== Genshi ==

Use Genshi trunk; there's some performance gain there.

With Genshi, one big win is turning off the whitespace filter; I think I 
got around 10ms on a small page, and the gain would be larger on larger 
pages.  I did that by patching trac.web.chrome where there is a call to 
stream.render, adding strip_whitespace=False.  While this slightly 
increases the file size delivered (and adds a lot of whitespace to the 
source view), that can be countered by mod_deflate.  The namespace 
flattener is another slow filter, but it's necessary.

Another win with Genshi is patching genshi.output.encode and getting rid 
of the iteration that calls _encode a bazillion times.  Instead, just do:

{{{
  out.write(_encode(u''.join(list(iterator))))
}}}

This is much faster.
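A toy model of the difference (hypothetical functions; the real change is 
inside genshi.output.encode):

```python
def encode_per_chunk(chunks, encoding='utf-8'):
    # what the original loop effectively does: one encode call
    # per chunk coming out of the stream iterator
    out = []
    for chunk in chunks:
        out.append(chunk.encode(encoding))
    return b''.join(out)

def encode_joined(chunks, encoding='utf-8'):
    # join the unicode chunks first, then encode exactly once
    return u''.join(chunks).encode(encoding)
```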

I should note that I also tried changing trac so that the output went 
straight through to the WSGI layer (rather than buffering it in a 
cStringIO object).  While this improved the initial wait time 
dramatically, it actually slowed down the overall request time.  I 
believe this slowdown was also due to the _encode calls inside the 
iteration over the stream.  I'm still going to experiment with this more 
at some point.

Overall, I like Genshi; now that I kind of get how it works it has a 
certain elegance, but I am left unconvinced that its design and 
flexibility outweigh the performance issues and the added difficulty 
in figuring them out (refer to profiling at the top).  One way I'm 
thinking of working around this is to beef up the XML-RPC interfaces and 
do everything with Ajax.

== Misc ==

=== plugins ===

I don't think most plugin authors know where they will generate 
performance penalties.  It's hard to know all those details without 
spending a lot of time on it.

Generally, pay attention to any plugins that use stream filters or 
request filters.  These are called for every request and more often than 
not do something that is time-intensive.  Additionally, IMO, a request 
handler's match_request should never do anything beyond 
req.path_info.startswith('/mypath').  Many match_request implementations 
do a lot more, which gets executed for every request.
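In other words, the whole method body should be about this small (a 
sketch of a hypothetical handler, with '/mypath' standing in for the 
plugin's URL prefix):

```python
class MyPathHandler(object):
    def match_request(self, req):
        # the only work done on every request for every handler:
        # a single string prefix test
        return req.path_info.startswith('/mypath')
```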

=== wysiwyg plugin ===

This adds a large JS file to *every* request, whether useful or not.  I 
removed the large script from the request filter, then changed the 
load script to check whether there is a textarea element in the DOM, 
and load the larger script only if necessary.



--
You received this message because you are subscribed to the Google Groups 
"Trac Development" group.