Re: [Neo] force preloading into memory

2010-04-20 Thread Johan Svensson
On Tue, Apr 20, 2010 at 10:42 AM, Erik Ask erik...@maths.lth.se wrote:
 Tobias Ivarsson wrote:
 The speedup you are seeing is because of caching. Items that are used
 are loaded into an in-memory structure that does not need to go
 through any filesystem API, memory-mapped or not. The best way to load
 things into the cache is to run the query once to touch everything
 that needs to be loaded.

 Pre-populating the memory maps as you suggest would give some speedup
 to the first query itself, but that time would be spent at startup
 instead, meaning that the time from cold start to completed first
 query would be exactly the same.

 Cheers,
 Tobias

 On Mon, Apr 19, 2010 at 6:31 PM, Erik Ask ask.e...@gmail.com wrote:


 Hello

 I'm getting really slow performance when working against the HD. A
 given set of queries can take up to 10 minutes when performed the
 first time; repeating the same set of queries a second time executes
 in seconds (2-5). As far as I can tell from watching in jconsole, the
 heap behaves in almost exactly the same manner (a slowly rising slope)
 for both transactions (each set of queries has its own transaction),
 so it seems the speedup is due to memory mapping. I've tinkered with
 the settings, but is there a way of explicitly forcing the IO mapper
 to preload all or part of the node store and relationship store? Am I
 right to assume that initially nothing is IO mapped and these buffers
 build up during runtime as requests are made? Is there any way of
 tuning access to the HD?

 greetz




 Then I don't understand the purpose of loading files into memory. I
 thought the point was to copy as much of a file as possible into
 memory, do all subsequent lookups there, and, if needed, replace
 loaded parts when non-loaded parts of the file are requested more
 frequently. This would result in one HD read per node/rel (assuming
 everything fits into memory and no replacement is needed), as opposed
 to searching for entries in the file, which would require lots of
 reads and comparisons. The amount of data that needs to be loaded into
 memory just doesn't seem to warrant that much time being spent: I
 could easily copy files several times the size of my complete DB in
 less time than it takes to run my query sets.

Hi,

Tobias is right about the caching part, but there are issues with
memory-mapped I/O in play here too. If you turn off memory-mapped I/O
and use normal buffers (use_memory_mapped_buffers=false) you will
probably see a speedup in initial query time. This is because
memory-mapped I/O results in lots of seeks, since most
OS/configurations implement it in a lazy (and maybe not ideal) way.
Non memory-mapped buffers will do sequential reads to fill the entire
buffer and (depending on how much of the graph the first search
touches) will likely be faster.
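
For example, something like this (an untested sketch, assuming the
1.x embedded API and a hypothetical store directory) turns memory
mapping off when starting an embedded database:

    import java.util.HashMap;
    import java.util.Map;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.kernel.EmbeddedGraphDatabase;

    public class PlainBuffers
    {
        public static void main( String[] args )
        {
            Map<String,String> config = new HashMap<String,String>();
            // use plain buffers instead of memory-mapped ones
            config.put( "use_memory_mapped_buffers", "false" );
            GraphDatabaseService db =
                new EmbeddedGraphDatabase( "var/graphdb", config );
            try
            {
                // run the cold-start queries here
            }
            finally
            {
                db.shutdown();
            }
        }
    }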

To explain further: if you request to map a region of a file into
memory, the call will return almost instantly. The contents of the
file are, however, not loaded into memory; instead, bytes are loaded
lazily as you start reading from the buffer, resulting in more random
I/O and seeks. This in turn results in slow searches and a long warmup
time on mechanical disks. (Note: the behavior described here may vary
depending on OS, JVM implementation, and so on.)
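
To illustrate (a plain-JDK sketch, not Neo4j code): map() returns
immediately and pages are faulted in only when touched, while
MappedByteBuffer.load() asks the OS to bring the whole region into
physical memory (best effort):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MmapLazy
    {
        public static void main( String[] args ) throws Exception
        {
            RandomAccessFile file = new RandomAccessFile( args[0], "r" );
            FileChannel channel = file.getChannel();
            // map() returns almost instantly; no file contents read yet
            MappedByteBuffer buffer = channel.map(
                FileChannel.MapMode.READ_ONLY, 0, channel.size() );
            // touching an unloaded page triggers a page fault and a seek
            System.out.println( buffer.get( 0 ) );
            // hint the OS to fault the entire region in up front
            buffer.load();
            file.close();
        }
    }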

You are right about the purpose of loading regions of files into
memory, so we don't have to do lookups on disk (and about dynamically
changing those regions depending on access patterns). The problem is
that initial access patterns, when nothing has been loaded yet, will
look random. Then, to further hurt performance, memory-mapped regions
will not do a sequential read of the data (this is very bad in your
scenario but is better once the server is warm).

A workaround is to pre-fill the OS file-system cache before you start
searching: write a script that sequentially reads the node store,
relationship store (and property store, if your searches access
properties) files. The memory-mapped regions will then map against the
file-system cache, and the contents of the files will already be in
memory.
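
A minimal sketch of such a script (assuming a 1.x store layout with
the default store file names; adjust the names to your setup):

    import java.io.File;
    import java.io.FileInputStream;

    public class WarmCaches
    {
        public static void main( String[] args ) throws Exception
        {
            String storeDir = args[0];
            String[] names = {
                "neostore.nodestore.db",
                "neostore.relationshipstore.db",
                "neostore.propertystore.db" // only if you read properties
            };
            byte[] chunk = new byte[1024 * 1024];
            for ( String name : names )
            {
                FileInputStream in =
                    new FileInputStream( new File( storeDir, name ) );
                try
                {
                    // sequential reads pull the file into the OS page cache
                    while ( in.read( chunk ) != -1 )
                    {
                        // discard the data; populating the cache is the point
                    }
                }
                finally
                {
                    in.close();
                }
            }
        }
    }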

Regards,
Johan
_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

