As Trevor cautioned, you run the risk of seriously impacting the other
applications running on the system, and the system itself. As you fill
up the Windows cache manager with your file, you will push out other
relevant data, including data that the OS itself may be using. Applications
will run slower until that data is reloaded, and new applications will
start slower. The cache manager drops cached data as it loads your file.
Assuming the cache manager is not already using memory at its allowed
limit, it will ask the memory manager for more pages, which causes the
memory manager to (on an LRU basis) drop pages that are backed by a file
mapping, and to write pages out to the paging file where they are not
backed by a mapping. These writes, if they happen, are relatively
expensive and will now be mixed in with your reads. If the OS is not
under stress and has sufficient resources, you probably won't see any
of this.
Given that you can't write your own VFS, there is not much to suggest,
but one question to ask: is the sum of the time for the pre-read you
perform and the subsequent database operation(s) smaller than doing the
database operation(s) without the pre-read? I see that in the 3.5.x
source Dr. Hipp gives the file system the proper random-access hint
when opening databases. This is just a hint to the cache manager, which
is not obligated to honor it, but it will effectively shut down most
read-ahead and large block reads, which are exactly what you are getting
when you sequentially pre-read.
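The timing question above can be sketched as follows (a minimal Python sketch; the file, its contents, and the trivial query are made-up stand-ins for your real database and operations, and a fair measurement needs a genuinely cold OS cache before each variant, e.g. after a reboot):

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical stand-in: build a small database so the sketch is
# self-contained; in practice db_path is your real database file.
db_path = os.path.join(tempfile.mkdtemp(), "example.db")
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload BLOB)")
con.executemany("INSERT INTO t (payload) VALUES (?)",
                [(os.urandom(1024),) for _ in range(1000)])
con.commit()
con.close()

def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def pre_read():
    # Sequential pre-read: pull the whole file through the OS cache.
    with open(db_path, "rb") as f:
        while f.read(1 << 20):
            pass

def db_ops():
    # Stand-in for your real database operation(s).
    c = sqlite3.connect(db_path)
    c.execute("SELECT count(*), sum(length(payload)) FROM t").fetchone()
    c.close()

# Run back to back, the second variant benefits from the first one's
# caching, so compare these numbers only across cold-cache runs.
t_plain = timed(db_ops)
t_with_preread = timed(pre_read) + timed(db_ops)
print(f"ops only: {t_plain:.4f}s  pre-read + ops: {t_with_preread:.4f}s")
```

If the pre-read plus the operations is not measurably smaller than the operations alone, the pre-read is only moving cost around, not removing it.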
One more thing: did raising the limit on the number of pages SQLite can
cache internally have any effect?
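For reference, that internal limit is controlled with PRAGMA cache_size; a minimal sketch (the value 20000 is an arbitrary example, not a recommendation):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# SQLite's internal page cache limit, in pages (default 2000 in the
# 3.5.x era; newer versions also accept negative values meaning KiB).
default_cache = con.execute("PRAGMA cache_size").fetchone()[0]

# Raise it so more hot pages stay in SQLite's own cache instead of
# being re-read through the OS on every access.
con.execute("PRAGMA cache_size = 20000")
new_cache = con.execute("PRAGMA cache_size").fetchone()[0]

print(default_cache, new_cache)
con.close()
```

Note the pragma applies per connection, so it has to be issued each time the database is opened.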
Trevor Talbot wrote:
On 11/8/07, Julien Renggli <[EMAIL PROTECTED]> wrote:
As I said, the "nasty" trick works and we'll live with it (writing our
own FileSystem is not an option). We would just like to understand
better what's happening, to know why and when the "first run" is so much
slower. And should we read the whole file beforehand, or only parts of
it?
Does it depend on how much RAM is installed, ...? If you have any hints
they are welcome. I guess I should ask NTFS experts as well.
What you've discovered is pretty accurate: when the delay is due to
disk I/O, pre-reading the file will load it into the OS's disk cache,
so subsequent accesses are faster. It will depend on available RAM,
not only what is physically installed but also memory pressure from
other running applications, other disk I/O in progress, various OS
settings, etc. There isn't any way to accurately predict it.
Reading the entire file when the OS is unwilling to cache all of it
will simply result in only part of the file being cached. It may also
cause other data to be pushed into the pagefile, slowing down
applications when they later access their own stale data. In the
worst case, it could slow everything down for a short time.
Reading the entire file from start to finish is pretty much the only
effective way to pull it into cache. An ifstream is not the most
efficient way to do that, since it does its own buffering, but that's
not important as far as the disk caching effects are concerned.
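A minimal sketch of such a sequential pre-read in Python (the file path and size are made-up stand-ins; large unbuffered reads avoid the extra user-space copy that ifstream-style buffering adds, though as noted above that overhead doesn't change the caching effect):

```python
import os
import tempfile

# Hypothetical stand-in for the database file.
path = os.path.join(tempfile.mkdtemp(), "example.db")
with open(path, "wb") as f:
    f.write(os.urandom(4 * 1024 * 1024))

CHUNK = 1 << 20  # 1 MiB reads keep the syscall count low and the I/O sequential
total = 0
with open(path, "rb", buffering=0) as f:  # unbuffered: no extra user-space copy
    while True:
        block = f.read(CHUNK)
        if not block:
            break
        total += len(block)

print(total, os.path.getsize(path))
```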
VACUUM removes internal fragmentation and writes rows in order, which
helps make disk I/O more sequential. You may be able to one-up it by
inserting data in the order you intend to access it (but create any
indexes afterward). Beyond that, I'm not aware of anything that would
help.
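The load-in-access-order, index-afterward, then VACUUM sequence can be sketched like this (table and index names are hypothetical):

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "ordered.db")
con = sqlite3.connect(db_path, isolation_level=None)  # autocommit

# Hypothetical table: insert rows in the order you intend to access
# them, so the pages end up laid out roughly in access order.
con.execute("CREATE TABLE items (key TEXT, value INTEGER)")
rows = [("k%05d" % i, i) for i in range(10000)]
con.executemany("INSERT INTO items VALUES (?, ?)", rows)

# Create indexes only after the bulk load: the build is faster and the
# table pages stay contiguous.
con.execute("CREATE INDEX idx_items_key ON items(key)")

# VACUUM rewrites the file, removing internal fragmentation so later
# scans are more sequential.
con.execute("VACUUM")

count = con.execute("SELECT count(*) FROM items").fetchone()[0]
print(count)
con.close()
```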
-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------