* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> i'd be surprised if it was twice as fast - cache-cold linear checkouts 
> are _seek_ limited, and it doesn't matter whether after a 1-2 msec 
> track-to-track disk seek the DMA engine spends another 30 microseconds 
> DMA-ing 60K of uncompressed data instead of 30K compressed... (there are 
> other factors, but this is the main thing.)

i've benchmarked cache-cold compressed vs. uncompressed performance, to 
shed some more light on the performance differences between flat and 
compressed repositories.

i did a lot of testing, and i concentrated primarily on being able to 
_trust_ the benchmark results, not on generating some quick numbers. The 
major problem was that the timing of the reads associated with 'checking 
out a large tree' is very unstable, even on a completely isolated test 
system with very common (and predictable) IO hardware.

the content i tested was a vanilla 2.6.10 kernel tree, with 19042 files 
in it, taking 246 MB uncompressed, and 110 MB compressed (via gzip -9).  
Average file size is 13.2 KB uncompressed, 5.9 KB compressed.
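
(for reference, figures like these can be gathered along these lines - 
just a sketch, not necessarily the exact commands i used:)

        cd linux-2.6.10
        find . -type f | wc -l      # number of files
        du -sm .                    # size in MB, uncompressed
        find . -type f | xargs gzip -9
        du -sm .                    # size in MB, per-file compressed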

Firstly, the timings are very sensitive to the way the tree was created.  
To get a 'fair' on-disk layout the trees have to be created in an 
identical fashion: e.g. it is not valid to copy the uncompressed tree 
and then run gzip over it - that creates a 'sparse' on-disk layout which 
penalizes the compressed tree and makes it 30% slower than the 
uncompressed one! So i first created the two trees, then "cp -a"-ed them 
over into a new directory, one after the other, so that they end up in 
similar on-disk positions as well. I also created 2 more pairs of such 
trees this way, to make sure the disk layout is fair.
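
(schematically, the tree preparation went something like this - the 
tarball name and the trees-2/trees-3 directories are just illustrative, 
not necessarily what i used:)

        # flat-1: the original tree, created via tar:
        tar xjf linux-2.6.10.tar.bz2
        mv linux-2.6.10 flat-1

        # gzip-1: a cp -a copy of it, per-file compressed in place
        # afterwards (this is the 'sparse layout' case):
        cp -a flat-1 gzip-1
        find gzip-1 -type f | xargs gzip -9

        # flat-2/gzip-2: copied one after the other into a new directory,
        # so that both get a comparable, freshly allocated on-disk layout:
        mkdir trees-2
        cp -a flat-1 trees-2/flat-2
        cp -a gzip-1 trees-2/gzip-2

        # flat-3/gzip-3: one more such pair, copied from flat-2/gzip-2:
        mkdir trees-3
        cp -a trees-2/flat-2 trees-3/flat-3
        cp -a trees-2/gzip-2 trees-3/gzip-3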

all timings were taken fresh after a reboot, on a UP Athlon64 3200+ with 
1 GB RAM, using a large, top-of-the-line IDE disk. The kernel was 
2.6.12-rc2, the filesystem was ext3 with enough free space not to be 
fragmented, and both noatime and nodiratime were specified so that no 
write activity whatsoever occurs during the 'checkout'.
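
(the mount options in question, roughly - device and mount point are 
placeholders:)

        # no access-time updates for files or directories on the test fs:
        mount -o remount,noatime,nodiratime /dev/hda3 /test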

the operation timed was a simple:

        time find . -type f | xargs cat > /dev/null

done in the root of the given tree. This generates the very same 
read-only IO pattern for each test. I ran the tests 10 times (i.e. did 
10 fresh reboots), and after every reboot i permuted the order of the 
trees tested - to make sure there is no interaction between trees.  
(there was no interaction.)
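
(the per-boot measurement loop was along these lines - this is just an 
illustration of the procedure, not the exact script i used:)

        #!/bin/bash
        # usage: ./measure.sh <tree> <tree> ...
        # run once after each fresh reboot, with the trees listed in
        # the permuted order for that boot.
        for tree in "$@"; do
            (
                cd "$tree" || exit 1
                echo -n "$tree: "
                # the very same read-only IO pattern for every tree:
                { time find . -type f | xargs cat > /dev/null; } 2>&1 | grep real
            ) >> checkout-times.log
        done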

here are the raw numbers, elapsed real time in seconds:

 flat-1:  29.7 29.5 29.4 29.4 29.5 29.5 29.7 29.6 29.4 29.6 29.5 29.4:  29.5
 gzip-1:  41.2 40.9 40.7 40.7 40.5 41.7 41.0 40.3 40.6 40.8 40.8 40.9:  40.8

 flat-2:  28.0 28.2 27.7 27.9 27.8 27.9 27.7 27.9 27.9 28.1 27.9 28.0:  27.9
 gzip-2:  27.2 27.4 27.4 27.2 27.2 27.2 27.2 27.2 27.1 27.3 27.2 27.4:  27.2
 flat-3:  27.0 27.8 27.6 27.7 27.8 27.8 27.8 27.7 27.8 27.6 27.8 27.8:  27.6
 gzip-3:  25.8 26.8 26.6 26.5 26.5 26.5 26.6 26.4 26.5 26.7 26.6 26.7:  26.5

The final column is the average. (Standard deviation is below 0.1 sec, 
less than 0.3%.)
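
(the per-line average and standard deviation can be computed with a bit 
of awk - e.g. for the gzip-2 line:)

        echo 27.2 27.4 27.4 27.2 27.2 27.2 27.2 27.2 27.1 27.3 27.2 27.4 | \
            awk '{ for (i = 1; i <= NF; i++) { s += $i; ss += $i * $i }
                   m = s / NF
                   printf "avg %.1f  stddev %.2f\n", m, sqrt(ss / NF - m * m) }'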

flat-1 is the original tree, created via tar. gzip-1 is a cp -a copy of 
it, per-file compressed afterwards. flat-2 is a cp -a copy of flat-1, 
gzip-2 is a cp -a copy of gzip-1. flat-3/gzip-3 are cp -a copies of 
flat-2/gzip-2.

note that gzip-1 is ~40% slower due to the 'sparse layout', so its 
results approximate a repository with 'bad' file layout. I'd not expect 
GIT repositories to have such a layout normally, so we can disregard it.

flat-2/3 and gzip-2/3 can be compared directly. Firstly, the results 
show that an identical on-disk layout cannot be constructed reliably - 
there's a 1% systematic difference between flat-2 and flat-3, and a 3% 
systematic difference between gzip-2 and gzip-3. Both systematic 
differences are well above the standard deviation of the measurements, 
so they are not measurement errors but real layout properties of these 
trees.

the most interesting result is that gzip-2 is 2.5% faster than flat-2, 
and gzip-3 is 4% faster than flat-3. These differences are close to the 
layout-related systematic error, but slightly above it, so i'd conclude 
that a compressed repository is 3% faster on this hardware.
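
(for clarity, the percentages above are simply the relative differences 
of the per-line averages:)

        # gzip-2 (27.2s) vs flat-2 (27.9s):
        awk 'BEGIN { printf "%.1f%%\n", (27.9 - 27.2) / 27.9 * 100 }'   # => 2.5%
        # gzip-3 (26.5s) vs flat-3 (27.6s):
        awk 'BEGIN { printf "%.1f%%\n", (27.6 - 26.5) / 27.6 * 100 }'   # => 4.0%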

(since these results were in line with my expectations i double-checked 
everything again and did another 10 reboot tests - same results.)

conclusion [*]: there's a negligible cache-cold performance hit from 
using an uncompressed repository, because cache-cold performance is 
dominated by the number of seeks, which is almost identical in the two 
cases.

        Ingo

[*] lots of caveats apply: these weren't flat/compressed GIT 
repositories (although they were quite similar to them), nor was the 
GIT workload itself measured (although the one measured should be quite 
close to it).
