On Feb 22, 2007, at 5:54 PM, Sam Lang wrote:
....
I'm not sure it's time to replace the ledger code. It seems to work
OK, and fixing the slowness you're seeing would mean switching to
some kind of range tree that could be serialized to disk, so that we
wouldn't have to iterate through the entire dspace db on startup.
That opens up the possibility of the dspace db and the ledger-on-
disk getting out of sync, which I'd rather avoid.
We could hand out new handles by choosing one at random and then
checking whether it's in the DB, getting rid of the need for a ledger
entirely, but I assume this idea was already scratched to avoid the
potential cost at creation time, especially as the filesystem grows.
Actually, as I think about this some more, maybe it's worth
considering. Right now genconfig only uses the first 2^32 handles,
dividing them up equally amongst the servers. That's obviously
nowhere near the possible limit. If genconfig allocated even half of
the 2^64 handles to the servers, that would really decrease the
likelihood of selecting an already-used handle at random, even for a
filesystem with millions of files.
Also, the ledger could still be used to keep track of the handles
that are created during the lifetime of that particular server
process, as well as the ones that already exist if a randomly chosen
handle gets a hit. If genconfig allocates over the 1 to 2^63 range,
each of 64 servers gets a range of 2^63/64 = 2^57 handles, so the
chance of a random pick colliding with any one existing handle is 1
in 2^57. Even with 16 million (about 2^24) files packed into a
single server's range, the odds of a collision are 2^24/2^57, or
about 1 in 8 billion per attempt.
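Here's a rough sketch of the allocation loop I have in mind;
handle_in_ledger(), handle_in_dspace_db(), and ledger_add() are just
stand-ins (stubbed out below), not the real ledger or trove calls,
and the range bounds would come from the server's config:

    #include <stdint.h>
    #include <stdlib.h>

    /* stand-ins for the real ledger and dspace db lookups */
    static int handle_in_ledger(uint64_t h) { (void)h; return 0; }
    static int handle_in_dspace_db(uint64_t h) { (void)h; return 0; }
    static void ledger_add(uint64_t h) { (void)h; }

    /* Pick an unused handle at random from this server's configured
     * range [range_first, range_last].  With a 2^57-wide range and
     * ~2^24 existing handles, retries are vanishingly rare. */
    uint64_t alloc_handle_random(uint64_t range_first,
                                 uint64_t range_last)
    {
        uint64_t span = range_last - range_first + 1;

        for (;;)
        {
            /* crude 62-bit random value from two 31-bit random()
             * draws; real code would want a proper 64-bit PRNG */
            uint64_t r = ((uint64_t)random() << 31)
                         | (uint64_t)random();
            uint64_t handle = range_first + (r % span);

            /* the ledger catches handles created by this server
             * process; the dspace db catches ones that predate it */
            if (handle_in_ledger(handle) ||
                handle_in_dspace_db(handle))
                continue;

            ledger_add(handle);
            return handle;
        }
    }

The point is that the dspace db lookup only happens at create time,
once or twice per create, instead of a full iteration at startup.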
The interfaces do allow the client to specify a specific handle or
a range of handles when doing a create, but we always just get the
range directly from the config file. Are there use cases out there
where the client requests more limited ranges (or specific handles)?
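For reference, my mental model of what the interface lets a client
express is something like the following; this is a guess at the
shape of the extent structures, not a copy of the actual headers:

    #include <stdint.h>

    typedef uint64_t PVFS_handle;

    /* assumed shape: a server's usable handles are described as an
     * array of inclusive [first, last] extents */
    typedef struct {
        PVFS_handle first;
        PVFS_handle last;
    } handle_extent;

    typedef struct {
        uint32_t extent_count;
        handle_extent *extent_array;
    } handle_extent_array;

    /* a client requesting a restricted range (or a single handle,
     * with first == last) would pass a one-element array instead of
     * the full range from the config file */
    static handle_extent restricted = { 0x100000ULL, 0x1FFFFFULL };
    static handle_extent_array create_extents = { 1, &restricted };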
-sam
<mult.patch>
<server-start.patch>
On Feb 20, 2007, at 11:23 AM, Phil Carns wrote:
Robert Latham wrote:
On Tue, Feb 20, 2007 at 07:29:16AM -0500, Phil Carns wrote:
Oh, and one other detail: the memory usage of the servers looks
fine during startup, so this doesn't appear to be a memory
leak. There is quite a bit of CPU work, but I am guessing that
is just Berkeley DB keeping busy in the iteration function.
How long does it take to scan 1.4 million files on startup?
==rob
That's an interesting issue :)
A few observations:
- we were looking at this on a SAN; the results may be different on
local disks
- the db files are on the order of 500 MB for this particular setup
- the time to scan varies depending on whether the db files are hot
in the Linux buffer cache
If we start the daemon right after killing another one that just
did the same scan, then the process is CPU intensive, but fast
(about 5 seconds). If we unmount/mount the SAN between the two
runs so that the buffer cache is cleared, then it is very slow
(about 5 minutes).
An interesting trick is to use dd with a healthy buffer size to
read the .db files and throw the output into /dev/null before
starting the servers. This only takes a few seconds, and it makes
the scan consistently finish in just a few seconds as well. I think
the reason is simply that it forces the db data into the Linux
buffer cache using an efficient access pattern, so that Berkeley DB
doesn't have to wait on disk latency for whatever small accesses it
is performing.
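For what it's worth, the trick boils down to a sequential
whole-file pre-read; a tiny C version (the db file path is just a
placeholder) looks like:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Pre-read a db file with large sequential reads so its pages
     * land in the Linux buffer cache before the server starts --
     * the same effect as dd'ing the file to /dev/null. */
    int main(int argc, char **argv)
    {
        const char *path = (argc > 1) ? argv[1]
                         : "/path/to/storage/dataspace_attributes.db";
        size_t bufsz = 4 << 20;              /* 4 MB reads */
        char *buf = malloc(bufsz);
        int fd = open(path, O_RDONLY);
        ssize_t n = 0;

        if (fd < 0 || buf == NULL)
        {
            perror("warm-cache");
            return 1;
        }

        /* read the whole file sequentially; data is discarded */
        while ((n = read(fd, buf, bufsz)) > 0)
            ;

        close(fd);
        free(buf);
        return (n < 0) ? 1 : 0;
    }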
This seems to indicate that the access pattern Berkeley DB generates
for PVFS2 in this case isn't very friendly, at least to SANs that
aren't specifically tuned for it.
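My guess at what the startup scan boils down to is a plain cursor
walk like the sketch below (generic Berkeley DB API, not the actual
trove-dbpf code); each c_get can fault in another btree page, so on
a cold cache it turns into a long series of small, latency-bound
reads:

    #include <db.h>
    #include <string.h>

    /* Walk every record in a db file the way a startup handle scan
     * would; 1.4 million records means 1.4 million small accesses. */
    int count_records(const char *dbfile, unsigned long *count_out)
    {
        DB *dbp;
        DBC *cursor;
        DBT key, data;
        unsigned long count = 0;
        int ret;

        if ((ret = db_create(&dbp, NULL, 0)) != 0)
            return ret;
        if ((ret = dbp->open(dbp, NULL, dbfile, NULL,
                             DB_UNKNOWN, DB_RDONLY, 0)) != 0)
            goto out;
        if ((ret = dbp->cursor(dbp, NULL, &cursor, 0)) != 0)
            goto out;

        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));

        /* one record per c_get; each may wait on a disk read */
        while ((ret = cursor->c_get(cursor, &key, &data,
                                    DB_NEXT)) == 0)
            count++;

        cursor->c_close(cursor);
    out:
        dbp->close(dbp, 0);
        *count_out = count;
        return (ret == DB_NOTFOUND) ? 0 : ret;
    }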
The 5-minute scan time is a problem because it makes it hard to
tell when you will actually be able to mount the file system after
the daemons appear to have started. We would be happy to try out
any optimizations here :)
-Phil