I know that you guys still have some ongoing discussion about the
long-range design for tracking handles, but I have another item about
the current implementation that might be of interest.
Most of the remaining startup performance problem (after Sam's
optimization patches) appears to be a result of how the db is ordered.
If I modify the attr db's comparison function so that it has a "<"
rather than ">", then all of the preads during startup go in order
through the db rather than backwards. This takes the startup time on a
cold db down to just 34 seconds. Previously it was 2 minutes 22 seconds.
It could still be faster, but that seems to be the biggest part of the
time. I imagine the rest is mostly due to the access size (4 KB at a
time), which might be tunable through some Berkeley DB settings.
The downside of making that particular change to the comparison method
is that it breaks storage space compatibility.
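
To make the "<" versus ">" point concrete, here is a minimal sketch of how
flipping a comparison function reverses the order in which keys are laid
out and visited. It is not the actual attr db comparison code: handle_t,
compare_desc(), compare_asc() and sort_handles() are hypothetical names,
and qsort() stands in for the db's own ordering.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a handle key; the real key layout may differ. */
typedef uint64_t handle_t;

/* ">" style: larger handles sort first (descending order). */
static int compare_desc(const void *a, const void *b)
{
    handle_t x = *(const handle_t *)a;
    handle_t y = *(const handle_t *)b;
    if (x > y) return -1;
    if (x < y) return 1;
    return 0;
}

/* "<" style: smaller handles sort first (ascending order). */
static int compare_asc(const void *a, const void *b)
{
    /* Swapping the arguments flips the sign of the result. */
    return compare_desc(b, a);
}

/* Sort n handles in place with either ordering. */
static void sort_handles(handle_t *h, size_t n, int ascending)
{
    assert(h != NULL || n == 0);
    if (n == 0) return;
    qsort(h, n, sizeof h[0], ascending ? compare_asc : compare_desc);
}
```

Because the two orderings are exact mirrors of each other, keys written
under one comparison function are not in valid order under the other,
which is why the change breaks compatibility with existing storage spaces.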
I wonder if it might be possible to accomplish the same thing in the
current db format by modifying iterate_handles() to just run the cursor
backwards (using DB_PREV instead of DB_NEXT)? That wouldn't hurt
storage space compatibility (if it works), but I don't know if it makes
any difference to callers of that function what order the handles come
out in.
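
Here is a rough model of why the cursor direction could matter. It does
not call Berkeley DB; it only assumes, as the strace output suggests, that
records visited in DB_NEXT order sit at decreasing file offsets. The enum
walk_dir and count_backward_seeks() are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for walking a cursor with DB_NEXT or DB_PREV. */
enum walk_dir { WALK_NEXT, WALK_PREV };

/*
 * offsets[i] is the assumed file offset of the i-th record in comparison
 * order. Returns how many steps of the walk land on a lower offset than
 * the read before them, i.e. how many reads go backwards in the file.
 */
static size_t count_backward_seeks(const long *offsets, size_t n,
                                   enum walk_dir dir)
{
    size_t backward = 0;
    assert(offsets != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        long prev, cur;
        if (dir == WALK_NEXT) {
            /* First record towards the last one. */
            prev = offsets[i - 1];
            cur = offsets[i];
        } else {
            /* Last record towards the first one. */
            prev = offsets[n - i];
            cur = offsets[n - i - 1];
        }
        if (cur < prev)
            backward++;
    }
    return backward;
}
```

With offsets of 16384, 12288, 8192, 4096 and 0, the WALK_NEXT pass goes
backwards on every step and the WALK_PREV pass on none. Whether the real
db pages are laid out this way is exactly what would need to be tested.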
-Phil
Phil Carns wrote:
Phil Carns wrote:
Yeah that is odd. Setting the cursor for each call to
iterate_handles may be the reason for it starting over. Do you know
how many times it starts over? The number of times iterate_handles
is called will be (# of files / 4096).
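
As a quick worked version of that estimate, assuming each call returns up
to 4096 handles and a final partial batch still costs one call (the
function name iterate_calls() is made up for this sketch):

```c
#include <assert.h>
#include <stdint.h>

/* Estimated number of calls needed to return nfiles handles, batch at a time. */
static uint64_t iterate_calls(uint64_t nfiles, uint64_t batch)
{
    assert(batch > 0);
    /* Integer division rounded up, so a partial last batch counts. */
    return nfiles / batch + (nfiles % batch != 0);
}
```

For example, one million files at 4096 per call works out to 245 calls.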
It only goes through the file twice if I am looking at the log
correctly. Also, I just realized that on both passes (the one jumping
backwards 40KB at a time and the one jumping backwards 4KB at a time)
it is only reading 4KB per pread. I don't know what it is doing from
a db point of view, but from an access point of view it looks like it
goes backwards with a strided pattern and then goes backwards reading
the entire thing. There are some other reads scattered here and
there, but those two cycles represent the overwhelming majority of the
total preads in the strace file. By spot checking I don't really see
any significant divergence from the patterns.
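
Rather than spot checking, the stride pattern could be counted directly.
This is a small sketch that assumes the pread offsets have already been
pulled out of the strace log into an array (that parsing step is not
shown, and count_stride() is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count consecutive reads whose offset changes by exactly `stride` bytes.
 * A negative stride matches reads that move backwards through the file.
 */
static size_t count_stride(const long *offsets, size_t n, long stride)
{
    size_t hits = 0;
    assert(offsets != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (offsets[i] - offsets[i - 1] == stride)
            hits++;
    }
    return hits;
}
```

Calling it with a stride of -40960 and then -4096 and comparing the hits
against the total number of preads would show how much of the trace the
two backward cycles really account for.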
It also just occurred to me that maybe I should repeat the strace and
try to capture it with timestamps; I'm not really sure if both of
these pread cycles are actually during the scan or not.
I just double checked: both of those big pread cycles are happening
after this message is logged:
[D 13:06:53.916769] dbpf collection 752900094 - Setting collection
handle ranges to 4-536870914,4294967292-4831838202
... but before the next message. So they do appear to both be a result
of the handle scanning on startup.
-Phil