>
> What about the "IOException Operation not permitted" ?
> Can you check the access rights on your store?
>

They look fine (644 and 755). It would also seem strange for the access
rights to change in the middle of a run: the database is written to
continuously and successfully for about 5 hours before this error appears. I
also note that I have 20GB of free space, so running out of disk space seems
unlikely. Having said that, I will do another run with a parallel check on
disk space as well.
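
For that check I'm thinking of nothing fancier than logging the usable
space on the store's partition from a background thread, roughly like this
(the store path and interval are just placeholders):

    // Background logger for free disk space on the store's partition
    final File storeDir = new File( "target/neo4j-db" );
    new Thread( new Runnable()
    {
        public void run()
        {
            try
            {
                while ( true )
                {
                    long freeMB = storeDir.getUsableSpace() / ( 1024 * 1024 );
                    System.out.println( "Free disk space: " + freeMB + "MB" );
                    Thread.sleep( 60 * 1000 ); // once a minute
                }
            }
            catch ( InterruptedException e )
            {
                // stop logging when the importer interrupts the thread
            }
        }
    } ).start();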

> While googling I saw that you had a similar problem in November, that Johan
> answered.
> From the answer it seems that the kernel adapts its memory usage and
> segmentation from the store size.
> So as the store size before the import was zero, probably some of the
> adjustments that normally
> take place for such a large store won't be done.
>

I create both the batch inserter and the graph database service with a fixed
configuration, as shown at the top of the file at
https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/Neo4jTestCase.java
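
For reference, the wiring is roughly the following, sketched against the
1.x-era classes ("target/neo4j-db" and the config map initialization are
placeholders here; the real values are in that file):

    import java.util.HashMap;
    import java.util.Map;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.kernel.EmbeddedGraphDatabase;
    import org.neo4j.kernel.impl.batchinsert.BatchInserter;
    import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

    public class TwoPhaseImport
    {
        public static void main( String[] args )
        {
            String storeDir = "target/neo4j-db";
            Map<String, String> config = new HashMap<String, String>();
            // ... same keys and values as NORMAL_CONFIG further down ...

            // Phase 1: batch insertion with the fixed configuration
            BatchInserter inserter = new BatchInserterImpl( storeDir, config );
            // ... OSM import ...
            inserter.shutdown();

            // Phase 2: normal API on the same store, same configuration
            GraphDatabaseService graphDb = new EmbeddedGraphDatabase( storeDir, config );
            // ... layer traversals, statistics ...
            graphDb.shutdown();
        }
    }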

> So your suggestion to run the batch insert in a first VM run and the API
> work in a second one makes a lot of
> sense to me, because the kernel is then able to optimize memory usage at
> startup (if you didn't supply a config file).
>

I will perhaps try that tomorrow. I would first need to extract the test
code to a place I can use from a console app. But I also noticed that
Mattias thought that two JVMs would not help.

> Regarding the test-issue. I would really love to have this code elsewhere
> and just used in the tests, then it could be used
> by other people too, and that would perhaps also make it easier to reproduce
> your problem just with the data file.
>

I can do that. I'm short of time right now, but will see if I can get to
that soon. It should be relatively simple to extract to the OSMDataset so
other users can call it. Basically the code traverses both the GIS (layers)
view of the OSM data model and the OSM view (ways, nodes, changesets,
users), and produces some statistics on what is found. That could be
generally interesting. The one messy part is that the code also makes a
number of assertions for expected patterns, and these only make sense in the
JUnit test. I would need to save the stats to a map and return that to the
JUnit code so it can make the assertions later.
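
In other words, something along these lines, where the method and key names
are only illustrative and not the real OSMDataset API:

    // Illustrative only: "analyzeModel" and the stat keys are made-up names.
    // The traversal would live in OSMDataset and just count what it finds,
    // while the JUnit test decides afterwards what to assert on.
    Map<String, Long> stats = dataset.analyzeModel();
    assertTrue( "expected at least one way to be imported", stats.get( "ways" ) > 0 );
    assertTrue( "expected at least one changeset", stats.get( "changesets" ) > 0 );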

> Can you point me to the data file used and attach the test case that you
> probably modified locally? Then I'd try this at my machine.
>

I've just pushed the code to GitHub. The test class is TestOSMImport.
Currently it skips a test if the test data is missing, and there is only
data for two specific test cases in the code base (Billesholm and Malmö). To
get the big tests to run, simply download denmark.osm and/or croatia.osm
from downloads.cloudmade.com. At the moment croatia.osm imports fine, at
reasonable performance, but denmark.osm is the one causing the problems.
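
(For reference, the skip is nothing more than a guard on the data file; a
minimal version of that pattern, not necessarily the exact code in
TestOSMImport:)

    // Skip the test rather than fail it when the OSM file is not present
    File osmFile = new File( "denmark.osm" );
    if ( !osmFile.exists() )
    {
        System.out.println( "Skipping test, missing data file: " + osmFile.getAbsolutePath() );
        return;
    }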

Looks like the memory mapped buffer configuration needs to be tweaked.
>

From Johan's previous answer, combined with something I read on the wiki, it
seems that the batch inserter needs different mmap settings than the normal
API. I read that the batch inserter uses the heap for its mmap buffers, while
the normal API does not. If I understand correctly, this means that when
using the batch inserter we have to use smaller mmap settings, otherwise we
might fill the heap too soon?

In any case, it seems that keeping the mmap settings relatively small should
avoid this problem, although it might not give the best performance? Have I
understood that correctly?
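
If that reading is right, then a second, smaller config used only during
batch insertion would be the cautious setup. A sketch, where BATCH_CONFIG is
a hypothetical second map alongside NORMAL_CONFIG and the values are only a
guess for a 2GB heap, not recommendations:

    static {
        // Hypothetical, smaller settings for the batch-insertion phase, since
        // its mapped buffers would come out of the heap
        BATCH_CONFIG.put( "neostore.nodestore.db.mapped_memory", "20M" );
        BATCH_CONFIG.put( "neostore.relationshipstore.db.mapped_memory", "50M" );
        BATCH_CONFIG.put( "neostore.propertystore.db.mapped_memory", "50M" );
        BATCH_CONFIG.put( "neostore.propertystore.db.strings.mapped_memory", "100M" );
        BATCH_CONFIG.put( "neostore.propertystore.db.arrays.mapped_memory", "10M" );
        BATCH_CONFIG.put( "dump_configuration", "false" );
    }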

On Windows heap buffers are used by default and auto configuration
> will look at how much heap is available. Getting out of memory exceptions
> is an indication that the configuration passed in is using more memory
> than available heap.
>

I am currently using -Xmx2048m on a 4GB RAM machine with 32-bit Java, and
these settings:

    static {
        NORMAL_CONFIG.put( "neostore.nodestore.db.mapped_memory", "50M" );
        NORMAL_CONFIG.put( "neostore.relationshipstore.db.mapped_memory", "150M" );
        NORMAL_CONFIG.put( "neostore.propertystore.db.mapped_memory", "200M" );
        NORMAL_CONFIG.put( "neostore.propertystore.db.strings.mapped_memory", "300M" );
        NORMAL_CONFIG.put( "neostore.propertystore.db.arrays.mapped_memory", "10M" );
        NORMAL_CONFIG.put( "dump_configuration", "false" );
    }


These settings do not seem too high, but if the normal graph database
service allocates memory outside the heap, and the heap has already been
filled by the batch inserter, perhaps that is where the problem lies?
Perhaps we do need a way of freeing memory more aggressively after the
batch-insertion phase, to get the heap down before the normal API starts
using the memory?
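
If the two phases do have to stay in one JVM, perhaps something as blunt as
this between them would already help (just a sketch; System.gc() is only a
hint to the JVM, of course):

    // End of the batch-insertion phase: shut the inserter down and drop the
    // reference so its heap-based mapped buffers become collectable, then
    // hint a GC before the normal EmbeddedGraphDatabase is started on the
    // same store
    inserter.shutdown();
    inserter = null;
    System.gc();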

Regards, Craig
