On Tue, Apr 20, 2010 at 10:42 AM, Erik Ask erik...@maths.lth.se wrote:
Tobias Ivarsson wrote:
The speedup you are seeing is because of caching. Items that are used are
loaded into an in-memory structure that does not need to go through any
filesystem API, memory-mapped or not. The best way to load things into the
cache is to run the query once, touching everything that needs to be loaded.
Pre-populating the memory-maps as you suggest would give some speedup to the
first query itself, but that time would be spent at startup
instead, meaning that the time from cold start to completed first query
would be exactly the same.
Cheers,
Tobias
On Mon, Apr 19, 2010 at 6:31 PM, Erik Ask ask.e...@gmail.com wrote:
Hello
I'm getting really slow performance when working against the HD. A
given set of queries can take up to 10 minutes the first time it is
performed, while repeating the same set of queries a second time takes
only seconds (2-5). As far as I can tell from watching in jconsole, the
heap behaves in almost exactly the same manner (a slowly rising slope)
in both runs (each set of queries has its own transaction), so it seems
the speedup is due to memory mapping. I've tinkered with the settings,
but is there a way of explicitly forcing the I/O mapper to preload all
or part of the node store and relationship store? Am I right to assume
that initially nothing is I/O mapped and that these buffers build up at
runtime as requests are made? Is there any way of tuning access to the HD?
greetz
Then I don't understand the purpose of loading files into memory. I
thought it was used to copy as much of a file as possible into memory,
do all subsequent lookups there, and, if needed, replace loaded parts
when non-loaded parts of the file are more frequently requested. This
would result in one disk read per node/relationship (assuming it fits
into memory and no replacement is needed), as opposed to searching for
entries in the file, which would require lots of reads and comparisons.
The amount of data that needs to be loaded into memory just doesn't seem
to warrant that much time being spent. I could easily copy files several
times the size of my complete DB in less time than it takes to run my
query sets.
Hi,
Tobias is right about the caching part, but there are issues with
memory-mapped I/O in play here too. If you turn off memory-mapped I/O
and use normal buffers (use_memory_mapped_buffers=false) you will
probably see a speedup in initial query time. This is because
memory-mapped I/O results in lots of seeks, since most
OS/configurations implement it in such a (maybe not ideal) way.
Non-memory-mapped buffers will do sequential reads to fill the entire
buffer and (depending on how much of the graph the first search
touches) will likely be faster.
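For reference, this is roughly what that setting would look like in the
database's configuration properties (a sketch; the exact file name and the
way the configuration is passed in depend on your Neo4j version and setup):

```properties
# Turn off memory-mapped store buffers and use plain buffers instead,
# which fill with sequential reads (setting name taken from this thread).
use_memory_mapped_buffers=false
```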
To explain further: if you request to map a region of a file into
memory, the call will return almost instantly. The contents of the
file are, however, not loaded into memory at that point; instead, pages
are loaded lazily as you start to read bytes from the buffer, resulting
in more random I/O and seeks. This in turn results in slow searches and
a long warmup time on mechanical disks. (Note that the behavior
described here may vary depending on OS, JVM implementation and so on.)
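You can see this lazy-loading behavior with plain java.nio, outside of Neo4j
entirely (a minimal, standalone sketch; the file is a scratch file created by
the program, and the timings are illustrative, not guaranteed on every OS/JVM):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapLazy {
    public static void main(String[] args) throws IOException {
        // Create a 4 MB scratch file full of zero bytes.
        Path tmp = Files.createTempFile("store", ".db");
        Files.write(tmp, new byte[4 * 1024 * 1024]);
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            long t0 = System.nanoTime();
            // map() only establishes the mapping; it returns almost instantly
            // and does not read the file's contents yet.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long mapNanos = System.nanoTime() - t0;

            t0 = System.nanoTime();
            long sum = 0;
            // The actual loads happen here, page by page, as bytes are read.
            for (int i = 0; i < buf.capacity(); i++) {
                sum += buf.get(i);
            }
            long readNanos = System.nanoTime() - t0;

            if (sum != 0) {
                throw new IllegalStateException("file was all zeros, sum must be 0");
            }
            System.out.println("map(): " + mapNanos + " ns, reading: " + readNanos + " ns");
        } finally {
            Files.delete(tmp);
        }
    }
}
```

On a cold file the bulk of the time shows up in the read loop, not in the
map() call, which is exactly why the first query looks so slow.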
You are right about the purpose of loading regions of files into
memory, so we don't have to do lookups on disk (and we dynamically
change those regions depending on access patterns). The problem is that
the initial access pattern, when nothing has been loaded yet, will look
random. Then, to further hurt performance, memory-mapped regions will
not do a sequential read of the data (this is very bad in your
scenario but is better when the server is warm).
A workaround for this is to pre-fill the OS file-system cache before
you start searching. Write a script that sequentially reads the node
and relationship store files (and the property store file, if your
searches access properties). The memory-mapped regions will then map
against the file-system cache, and the contents of the files will
already be in memory.
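A minimal sketch of such a warm-up pass in Java (the store file names below
are the usual ones found in a Neo4j 1.x database directory, but check your own
directory; treat both the names and the chunk size as assumptions):

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class WarmStores {
    // Sequentially read a file once so its pages end up in the OS
    // file-system cache; the data itself is discarded.
    static long warm(File f) throws IOException {
        byte[] chunk = new byte[1 << 20]; // 1 MB sequential reads
        long total = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File dbDir = new File(args[0]); // path to the graph database directory
        // Store file names as seen in a Neo4j 1.x db directory (assumption).
        String[] stores = {
            "neostore.nodestore.db",
            "neostore.relationshipstore.db",
            "neostore.propertystore.db",
        };
        for (String name : stores) {
            File f = new File(dbDir, name);
            if (f.exists()) {
                System.out.println(name + ": " + warm(f) + " bytes read");
            }
        }
    }
}
```

Run it against the database directory before opening the database; the
subsequent memory-mapped reads then hit the file-system cache instead of the
disk.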
Regards,
Johan
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user