Wow, good work Jim.
A bit of perspective:
50,000 active sessions with an average duration of 15 minutes per session means
roughly 4,800,000 requests a day, or about 55 requests per second.
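The arithmetic behind that estimate, as a quick sketch. It assumes sessions turn over back to back, so every 15-minute slot brings a fresh set of 50,000 sessions; the variable names are just for illustration.

```python
# Back-of-the-envelope check of the figures above.
active_sessions = 50000
session_minutes = 15

slots_per_day = 24 * 60 // session_minutes    # 96 fifteen-minute slots in a day
per_day = active_sessions * slots_per_day     # 4,800,000
per_second = per_day / (24 * 60 * 60)         # a little over 55

print(per_day, round(per_second, 1))
```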
Jim Gallacher wrote:
Hello All,
In testing the new req.get_session() method for the upcoming 3.2.0
release I've noticed that the performance of FileSession degrades badly
as the number of session files rises. Assuming this is related to putting a
large number of files in a single directory, I decided to do some
benchmarking outside of mod_python to investigate further.
I tested 4 different persistent stores on a linux system (kernel
2.6.9, ext3 file system). A simple dict was used to represent a typical
session object. The session id is generated using the same method found
in mod_python/Session.py, but adapted for use outside of mod_python.
Tests
=====
Dbm
---
Pickle saved in a dbhash (Berkeley DB) table. The db table is opened and
closed for each record saved. This is the same as the current
implementation of DbmSession.
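A minimal sketch of the open/write/close pattern described above. This is not the DbmSession code: it uses Python 3's generic dbm module as a stand-in for dbhash, the function names are hypothetical, and it does no locking.

```python
import dbm
import os
import pickle
import tempfile

def save_session(db_path, sid, data):
    # Open, write, and close the table for every record, as in the Dbm test.
    db = dbm.open(db_path, "c")
    try:
        db[sid.encode("ascii")] = pickle.dumps(data)
    finally:
        db.close()

def load_session(db_path, sid):
    # Reading pays the same open/close cost as writing.
    db = dbm.open(db_path, "c")
    try:
        return pickle.loads(db[sid.encode("ascii")])
    finally:
        db.close()

db_path = os.path.join(tempfile.mkdtemp(), "sessions")
sid = "00fe9c4b32bcb01b60a61cc674aa0ac9"
save_session(db_path, sid, {"user": "jim", "hits": 1})
restored = load_session(db_path, sid)
```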
mysql
-----
Pickle saved as a blob in a mysql table. A new connection is
opened and closed for each record saved.
mysql2
------
Same as mysql but only one connection is opened for the complete test.
Gives an indication of the overhead resulting from opening/closing 1000
connections.
FS
--
Pickle saved as a file in a single directory. This is the same as the
current implementation of FileSession.
FS2 no_sync
-----------
Pickle saved as a file in a directory derived from the session id.
There are 256 possible subdirectories.
eg.
sess/00/00fe9c4b32bcb01b60a61cc674aa0ac9
sess/ab/abff1785c78200baaa59e683da4038dd
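One way to derive that layout, as a sketch. The helper names are hypothetical, and the session ids are assumed to be 32-character hex strings like the examples above.

```python
import os
import pickle
import tempfile

def session_path(root, sid):
    # The first two hex characters of the id select one of 256 subdirectories.
    return os.path.join(root, sid[:2], sid)

def save_session(root, sid, data):
    path = session_path(root, sid)
    os.makedirs(os.path.dirname(path), exist_ok=True)   # create e.g. sess/ab/ on demand
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return path

root = os.path.join(tempfile.mkdtemp(), "sess")
saved = save_session(root, "abff1785c78200baaa59e683da4038dd", {"user": "jim"})
```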
FS2 sync
--------
Same as FS2 no_sync, but /bin/sync was called after each batch of 1000
files was created to flush the OS write buffer.
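For anyone wanting to reproduce this kind of measurement, here is a rough sketch of a per-batch timing loop. It is not the original benchmark script: uuid4 stands in for the Session.py id generator, the function name is made up, and including the sync call inside the timed region is an assumption.

```python
import os
import pickle
import tempfile
import time
import uuid

def time_batches(root, batches, batch_size=1000, sync=False):
    """Return the seconds taken to add each batch of session files."""
    timings = []
    for _ in range(batches):
        start = time.perf_counter()
        for _ in range(batch_size):
            sid = uuid.uuid4().hex              # 32 hex chars, stand-in session id
            subdir = os.path.join(root, sid[:2])
            os.makedirs(subdir, exist_ok=True)
            with open(os.path.join(subdir, sid), "wb") as f:
                pickle.dump({"user": "test", "hits": 0}, f)
        if sync and hasattr(os, "sync"):
            os.sync()                           # like running /bin/sync (Unix only)
        timings.append(time.perf_counter() - start)
    return timings

# Small run for illustration; the tests above used 50 batches of 1000.
timings = time_batches(tempfile.mkdtemp(), batches=3, batch_size=100)
```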
Discussion
==========
For comparison, apache on the system used for these tests can serve
plain text files at 1650 req/second, or mod_python req.send_file()
requests at 850 req/second. Reading and writing the session object is a
possible limiting factor. By examining the results below we can estimate
the upper bound that may be imposed when session handling is enabled and
the impact of a given session storage mechanism.
FS
--
The FS store shows O(n) performance. For a small number of session files
the performance is good. For the first 1000 session files our upper bound
would be 2857 requests/second. There will be little impact on
performance since this is faster than apache can serve a page.
At 50,000 files the *best* you could expect within mod_python is 72
requests per second. This will obviously have a severe impact on
performance. The current FileSession implementation scales poorly and
needs to be re-written.
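The upper bounds quoted here follow directly from the timing table: divide 1000 records by the seconds taken to add them. A small sketch (the function name is made up for illustration):

```python
def upper_bound(seconds_per_1000):
    # Requests per second if session storage were the only cost per request.
    return 1000 / seconds_per_1000

fs_first = upper_bound(0.35)     # FS, first 1000 files
fs_last = upper_bound(13.75)     # FS, with 49000 files already present

print(int(fs_first), int(fs_last))
```

The same calculation applies to the other columns, for example 1.65 s and 8.22 s for Dbm.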
FS2
---
Using the multiple directory layout gives a significant performance
improvement over a single directory as the number of session files
increases. For 50000 session files it looks like O(1) behaviour. (This
is not strictly true as the behaviour is really O(n), but this does not
become significant until the number of session files rises beyond 5x10**5).
Using this storage scheme imposes an upper bound of 2000 requests/second,
with the following caveat: the write buffers of the underlying OS need
to be flushed to disk. The time variations seen in the table (FS2 run 1,
run 2) are a result of this synchronization at random intervals, which
imparts a significant performance penalty. See row 47000 for a really
egregious example: 22 seconds for 1000 files. The FS2-sync test shows
that the storage scheme scales well for a large number of files, but
since we can't control file syncing in a production environment there
will be a performance penalty at high request levels.
FileSession with the FS2 directory layout will give the fastest response
time, but only when the server is under a light to medium load. What
constitutes a medium load will depend on the underlying OS and its I/O
subsystem.
Dbm
---
The dbhash table shows O(n) behaviour and does not scale well for a
larger number of session files, although it is better than the current
FileSession. The benchmarks used in this study did not do any file
locking, but in the actual DbmSession implementation only one
process/thread is allowed access to the dbm file at a time which will
have a negative impact on performance. Likewise, there are no obvious
optimizations for removing expired sessions, which is another potential
bottleneck.
The best case for Dbm is 606 req/second at the 1000 file mark, and 122
req/second at the 50,000 file mark.
MySQL
-----
The data for a mysql backend suggests O(1) behaviour. It may not be
obvious from the table, but there is a cost when the mysql db flushes
its records to disk. For the number of records inserted in this test
the time for syncing is small and consistent. The sync time looks like
O(n) and may become a consideration as the number of sessions increases
beyond 100,000 but this will not be discussed further here.
For the number of sessions under consideration, we might expect an upper
bound of 660 requests/second. As the number of sessions rises above
100,000 the performance will decrease due to the amount of time required
for flushing the db buffers.
A session class using a SQL DB backend may be the best compromise
between speed and consistent response times for serving a large number
of sessions under a heavy load.
MySQL - 2
---------
A further performance boost can be achieved if the hypothetical
MySqlSession class can borrow a connection object from the user's code.
For example, if the code is already opening a db connection, it could
pass the connection object to the MySqlSession constructor and thus
avoid the cost of opening a new connection.
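A sketch of what borrowing a connection could look like. The class and table names are hypothetical, and sqlite3 stands in for MySQL so that the example runs on its own; a MySQL driver would use its own placeholder style (typically %s rather than ?) and would need a cursor instead of calling execute on the connection.

```python
import pickle
import sqlite3

class SqlSessionStore:
    """Hypothetical store that borrows a connection from the caller."""

    def __init__(self, conn):
        self.conn = conn    # opened by the caller; never opened or closed here
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (sid TEXT PRIMARY KEY, data BLOB)"
        )

    def save(self, sid, data):
        # Pickle saved as a blob, as in the mysql tests.
        self.conn.execute(
            "REPLACE INTO sessions (sid, data) VALUES (?, ?)",
            (sid, pickle.dumps(data)),
        )
        self.conn.commit()

    def load(self, sid):
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE sid = ?", (sid,)
        ).fetchone()
        return pickle.loads(row[0]) if row else None

conn = sqlite3.connect(":memory:")   # the application's own connection
store = SqlSessionStore(conn)
store.save("abff1785c78200baaa59e683da4038dd", {"user": "jim"})
restored = store.load("abff1785c78200baaa59e683da4038dd")
```

The connection cost is paid once by the application, which corresponds to the mysql2 test rather than the mysql test.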
The upper bound for this case is approx 1700 requests/second. Note that
the synchronization behaviour is more obvious here - see the times for
20000, 29000, 37000, and 45000 records.
---------+---------------------------------------------------------+
Existing |     Time in seconds to Add Additional 1000 Records      |
Session  +--------+-------+-------+-------+-------+-------+--------+
Records  |   FS   |  FS2  |  FS2  |  FS2  |  Dbm  | mysql | mysql2 |
         |        |no_sync|no_sync| sync  |       |       |        |
         |        | run 1 | run 2 |       |       |       |        |
---------+--------+-------+-------+-------+-------+-------+--------+
       0 |  0.35    0.36    4.25    0.53    1.65    1.70     0.47  |
    1000 |  0.59    0.40    3.20    0.72    1.64    1.33     0.53  |
    2000 |  0.77    0.40    2.34    0.91    1.73    1.33     0.51  |
    3000 |  0.98    0.40    3.52    0.71    2.43    1.36     0.52  |
    4000 |  1.19    0.40    0.85    0.56    2.38    1.33     0.51  |
    5000 |  1.38    0.40    0.59    0.49    2.52    1.33     0.52  |
    6000 |  1.64    0.41    4.46    0.56    3.52    1.34     0.52  |
    7000 |  1.96    0.41    2.06    0.49    4.04    1.37     0.52  |
    8000 |  2.36    0.42    1.55    0.67    3.84    1.35     0.52  |
    9000 |  2.68    0.42    0.83    0.57    4.62    1.35     0.52  |
   10000 |  3.02    0.43    0.71    0.63    4.14    1.38     0.52  |
   11000 |  3.35    0.42    5.58    0.54    4.42    1.36     0.55  |
   12000 |  3.71    0.43    4.27    0.41    5.16    1.35     0.52  |
   13000 |  4.01    0.38    1.47    0.37    5.39    1.35     0.52  |
   14000 |  4.38    0.42    1.19    0.49    5.44    1.38     0.53  |
   15000 |  4.73    1.05    2.82    0.52    5.29    1.35     0.52  |
   16000 |  5.09    0.42    0.90    0.41    4.71    1.36     0.52  |
   17000 |  5.31    0.44    1.16    0.37    5.65    1.35     0.52  |
   18000 |  5.63    0.42    1.54    0.34    5.00    1.38     0.52  |
   19000 |  5.87    0.42    0.80    0.45    5.28    1.36     0.52  |
   20000 |  6.12    0.43    1.99    0.44    5.13    1.35     0.89  |
   21000 |  6.44    0.43    1.66    0.45    5.67    1.50     0.53  |
   22000 |  6.65    0.43    2.06    0.43    6.16    1.35     0.52  |
   23000 |  6.92    0.43    1.89    0.41    5.90    1.37     0.53  |
   24000 |  7.24    0.45    1.26    0.39    6.12    1.35     0.52  |
   25000 |  7.38    0.43    1.52    0.49    6.10    1.66     0.53  |
   26000 |  7.81    1.22    1.02    0.48    5.83    1.36     0.52  |
   27000 |  7.92    0.44    1.28    0.39    6.28    1.36     0.53  |
   28000 |  8.25    0.44    1.23    0.50    7.55    1.73     0.52  |
   29000 |  8.51    0.44    0.73    0.34    7.43    1.36     1.16  |
   30000 |  8.75    0.44    1.32    0.41    6.79    1.36     0.53  |
   31000 |  9.04    0.45    1.13    0.36    6.84    1.36     0.52  |
   32000 |  9.21    0.43    1.87    0.38    6.88    1.91     0.53  |
   33000 |  9.41    0.39    1.06    0.42    6.28    1.36     0.53  |
   34000 |  9.73    0.48    0.83    0.36    6.42    1.35     0.53  |
   35000 |  9.99    1.01    1.03    0.41    7.26    1.92     0.53  |
   36000 | 10.26    0.46    1.19    0.42    6.97    1.36     0.53  |
   37000 | 10.61    1.38    1.18    0.45    6.81    1.36     1.29  |
   38000 | 10.84    0.45    1.62    0.43    6.64    2.08     0.53  |
   39000 | 11.11    6.35    2.60    0.45    6.78    1.36     0.53  |
   40000 | 11.37    1.93    1.08    0.40    7.54    1.36     0.53  |
   41000 | 11.64    1.24    1.98    0.37    6.30    2.18     0.53  |
   42000 | 11.85    0.77    1.72    0.40    7.13    1.36     0.53  |
   43000 | 12.18    0.48    1.18    0.43    7.00    1.36     0.53  |
   44000 | 12.44    0.45    0.69    0.35    6.89    1.36     0.53  |
   45000 | 12.72    3.26    1.98    0.40    7.74    2.17     1.58  |
   46000 | 12.84    1.80    1.29    0.40    8.30    1.36     0.53  |
   47000 | 13.16    4.09   22.10    0.36    7.15    1.36     0.53  |
   48000 | 13.47    1.27    1.66    0.35    8.01    2.16     0.52  |
   49000 | 13.75    0.54    0.95    0.37    8.22    1.36     0.53  |
---------+--------+-------+-------+-------+-------+-------+--------+
If anyone wants to play with my benchmark scripts let me know and I'll
post them.
Regards,
Jim
--
dharana