Re: [Neo4j] BatchInserter improvement with 1.4M04 but still got relationship building bottleneck [was Re: Speeding up initial import of graph...]
Paul,

I tried to use your implementation, but it added quite a high performance penalty: profiling showed it consumed 3/4 of the time for the relationship creation. The rest of the time was spent acquiring persistence windows for the node store. I also profiled the version using the simple map. We found one issue (IdGenerator) that could benefit from the assumption that batch insertion runs single-threaded. Lots of time was spent acquiring persistence windows for nodes when creating the relationships.

My results with no batch-buffer configuration were not so bad initially, but I sped them up tremendously by increasing the memory_buffer size for the node store (so that it more often finds the persistence window in the cache and doesn't have to look it up). So my configuration looks like this:

neostore.nodestore.db.mapped_memory=250M
neostore.relationshipstore.db.mapped_memory=200M (could also be lower, 50M-100M)
neostore.propertystore.db.mapped_memory=50M
neostore.propertystore.db.strings.mapped_memory=50M
neostore.propertystore.db.arrays.mapped_memory=0M

Then I get the following (with the faster map-cache):

Physical mem: 16384MB, Heap size: 3055MB
use_memory_mapped_buffers=false
neostore.propertystore.db.index.keys.mapped_memory=1M
neostore.propertystore.db.strings.mapped_memory=50M
neostore.propertystore.db.arrays.mapped_memory=0M
neo_store=/Users/mh/java/neo/import/target/hepper/neostore
neostore.relationshipstore.db.mapped_memory=200M
neostore.propertystore.db.index.mapped_memory=1M
neostore.propertystore.db.mapped_memory=50M
dump_configuration=true
cache_type=weak
neostore.nodestore.db.mapped_memory=250M

100 nodes created. Took 2358
200 nodes created. Took 2090
300 nodes created. Took 2082
400 nodes created. Took 2054
500 nodes created. Took 2245
600 nodes created. Took 2100
700 nodes created. Took 2117
800 nodes created. Took 2076
900 nodes created. Took 2391
1000 nodes created. Took 2214
Creating nodes took 21
100 relationships created. Took 4302
200 relationships created. Took 4176
300 relationships created. Took 4260
400 relationships created. Took 6448
500 relationships created. Took 7645
600 relationships created. Took 7290
700 relationships created. Took 8627
800 relationships created. Took 7907
900 relationships created. Took 8292
1000 relationships created. Took 8563
Creating relationships took 67
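For reference, configuration like the above can be handed to the batch inserter as a plain string map; a minimal sketch against the 1.4-era API (the store path and exact values are illustrative, not from the runs above):

import java.util.HashMap;
import java.util.Map;

import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public class ConfiguredInserter {
    public static void main(String[] args) {
        Map<String, String> config = new HashMap<String, String>();
        // Generous node-store buffer so relationship creation finds node
        // persistence windows in the cache instead of re-mapping them.
        config.put("neostore.nodestore.db.mapped_memory", "250M");
        config.put("neostore.relationshipstore.db.mapped_memory", "200M");
        config.put("neostore.propertystore.db.mapped_memory", "50M");
        config.put("neostore.propertystore.db.strings.mapped_memory", "50M");
        config.put("neostore.propertystore.db.arrays.mapped_memory", "0M");
        config.put("dump_configuration", "true");

        BatchInserter inserter = new BatchInserterImpl("target/hepper", config); // path illustrative
        try {
            // ... create nodes and relationships here ...
        } finally {
            inserter.shutdown(); // flushes the store files to disk
        }
    }
}

The rough sizing intuition: with 1.4-era store formats (on the order of 9 bytes per node record), a 10M-node store is around 90MB, so a node-store buffer far smaller than that forces constant persistence-window replacement once relationships touch nodes in non-sequential order.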
Cheers,
Michael

Am 13.06.2011 um 14:58 schrieb Paul Bandler:
> [...]
[Neo4j] BatchInserter improvement with 1.4M04 but still got relationship building bottleneck [was Re: Speeding up initial import of graph...]
Having noticed a mention in the 1.4M04 release notes that:

"Also, the BatchInserterIndex now keeps its memory usage in-check with batched commits of indexed data using a configurable batch commit size."

I re-ran this test using M04 and sure enough, node creation no longer eats up the heap linearly, so that is good - I should be able to remove the periodic resetting of the BatchInserter during import.

So I returned to the issue of removing the index creation and later access bottleneck using an application-managed data structure, as Michael illustrated. Needing a solution with a smaller memory footprint, I wrote a CompactNodeIndex class for mapping integer 'id' key values to long node ids; it keeps the memory footprint to a minimum by overlaying a binary-choppable table onto a byte array. Watching heap in jconsole while this ran, I could see it had the desired effect of releasing huge amounts of heap once the CompactNodeIndex is loaded and the source data structure gc'd.

However, when I attempted to scale the test program back up to the 10M nodes Michael had been testing, it appears to run into something of a brick wall, becoming massively I/O bound when creating the relationships. With 1M nodes it ran ok, with 2M nodes not too bad, but much beyond that it crawls along using just about 1% of CPU while having loads of heap spare. I re-ran on a more generously configured iMac (giving the test 4G of heap) and it did much better, in that it actually showed some progress building relationships over a 10M node-set, but it still exhibited massive slow-down once past 7M relationships.

Below are the test results - the question now is whether there are any Neo4j parameters that might relieve this I/O bottleneck that appears when building relationships over node-sets of this size with the BatchInserter...? I note the section in the manual on performance parameters, but, not being familiar enough with the Neo4j internals, I'm afraid I don't feel it gives enough clear information on how to set them to improve the performance of this use-case.

Thanks,
Paul

Run 1 - Windows m/c, REPORT_COUNT = MILLION/10

Physical mem: 1535MB, Heap size: 1016MB
use_memory_mapped_buffers=false
neostore.propertystore.db.index.keys.mapped_memory=1M
neostore.propertystore.db.strings.mapped_memory=52M
neostore.propertystore.db.arrays.mapped_memory=60M
neo_store=N:\TradeModel\target\hepper\neostore
neostore.relationshipstore.db.mapped_memory=76M
neostore.propertystore.db.index.mapped_memory=1M
neostore.propertystore.db.mapped_memory=62M
dump_configuration=true
cache_type=weak
neostore.nodestore.db.mapped_memory=17M

10 nodes created. Took 2906
20 nodes created. Took 2688
30 nodes created. Took 2828
40 nodes created. Took 2953
50 nodes created. Took 2672
60 nodes created. Took 2766
70 nodes created. Took 2687
80 nodes created. Took 2703
90 nodes created. Took 2719
100 nodes created. Took 2641
Creating nodes took 27
MY_SIZE: 12
CompactNodeIndex slot count: 100
10 relationships created. Took 4125
20 relationships created. Took 3953
30 relationships created. Took 3937
40 relationships created. Took 3610
50 relationships created. Took 3719
60 relationships created. Took 4328
70 relationships created. Took 3750
80 relationships created. Took 3609
90 relationships created. Took 4125
100 relationships created. Took 3781
110 relationships created. Took 4125
120 relationships created. Took 3750
130 relationships created. Took 3907
140 relationships created. Took 4297
150 relationships created. Took 3703
160 relationships created. Took 3687
170 relationships created. Took 4328
180 relationships created. Took 3907
190 relationships created. Took 3718
200 relationships created. Took 3891
Creating relationships took 78

2M Nodes on Windows m/c:-
Creating data took 68 seconds

Physical mem: 1535MB, Heap size: 1016MB
use_memory_mapped_buffers=false
neostore.propertystore.db.index.keys.mapped_memory=1M
neostore.propertystore.db.strings.mapped_memory=52M
neostore.propertystore.db.arrays.mapped_memory=60M
neo_store=N:\TradeModel\target\hepper\neostore
neostore.relationshipstore.db.mapped_memory=76M
neostore.propertystore.db.index.mapped_memory=1M
neostore.propertystore.db.mapped_memory=62M
dump_configuration=true
cache_type=weak
neostore.nodestore.db.mapped_memory=17M

10 nodes created. Took 3188
20 nodes created. Took 3094
30 nodes created. Took 3062
40 nodes created. Took 2813
50 nodes created. Took 2718
60 nodes created. Took 3000
70 nodes created. Took 2938
80 nodes created. Took 2828
90 nodes created. Took 4172
100 nodes created. Took 2859
110 nodes created. Took 3625
120 nodes created. Took 3235
130 nodes created. Took 2781
140 nodes created. Took 2891
150 nodes created. Took 2922
160 nodes created. Took 2968
170 nodes created. Took 3438
180 nodes created.
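For anyone wanting to reproduce these measurements, the driver is roughly the following shape (a sketch, not the actual test source; the relationship pattern, property names and store path are illustrative, and NodeIdPair/CompactNodeIndex are the classes shared later in this thread):

import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public class RestoreTest {
    static final int MILLION = 1000 * 1000;

    public static void main(String[] args) {
        int nodeCount = 10 * MILLION;
        int reportCount = MILLION / 10;
        BatchInserter inserter = new BatchInserterImpl("target/hepper",
                new HashMap<String, String>());
        try {
            // Phase 1: create nodes, collecting id -> node mappings sorted by id.
            TreeSet<NodeIdPair> pairs = new TreeSet<NodeIdPair>();
            long t = System.currentTimeMillis();
            for (int id = 0; id < nodeCount; id++) {
                Map<String, Object> props = new HashMap<String, Object>();
                props.put("id", id);
                pairs.add(new NodeIdPair(inserter.createNode(props), id));
                if ((id + 1) % reportCount == 0) {
                    System.out.println((id + 1) / reportCount + " nodes created. Took "
                            + (System.currentTimeMillis() - t));
                    t = System.currentTimeMillis();
                }
            }
            // Phase 2: pack the mapping into the compact index, free the TreeSet,
            // then look nodes up by application id while creating relationships.
            CompactNodeIndex index = new CompactNodeIndex(pairs);
            pairs = null; // allow the source structure to be gc'd
            for (int id = 1; id < nodeCount; id++) {
                long from = index.findNodeForId(id - 1).getNode();
                long to = index.findNodeForId(id).getNode();
                inserter.createRelationship(from, to,
                        DynamicRelationshipType.withName("NEXT"), null);
            }
        } finally {
            inserter.shutdown();
        }
    }
}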
Re: [Neo4j] BatchInserter improvement with 1.4M04 but still got relationship building bottleneck [was Re: Speeding up initial import of graph...]
Paul,

can you share your test and the CompactIndex you wrote? That would be great. Also the memory settings (Xmx) you used for the different runs.

Thanks so much

Michael

Am 13.06.2011 um 14:15 schrieb Paul Bandler:
> [...]
Re: [Neo4j] BatchInserter improvement with 1.4M04 but still got relationship building bottleneck [was Re: Speeding up initial import of graph...]
> can you share your test and the CompactIndex you wrote? That would be great.

See below...

> Also the memory settings (Xmx) you used for the different runs.

The heap size is displayed by neo4j, is it not, with console entries such as:

Physical mem: 1535MB, Heap size: 1016MB

So that one came from -Xmx1024M, and

Physical mem: 4096MB, Heap size: 2039MB

came from -Xmx2048M.

regards,
Paul

package com.xxx.neo4j.restore;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;

public class NodeIdPair implements Comparable<NodeIdPair> {

    private long _node;
    private int _id;

    static final NodeIdPair _prototype = new NodeIdPair(Long.MAX_VALUE, Integer.MAX_VALUE);
    static Integer MY_SIZE = null;

    // Size in bytes of one serialised pair (a long plus an int, i.e. 12).
    public static int size() {
        if (MY_SIZE == null) {
            MY_SIZE = (new NodeIdPair(Long.MAX_VALUE, Integer.MAX_VALUE))
                    .toByteArray().length;
            System.out.println("MY_SIZE: " + MY_SIZE);
        }
        return MY_SIZE;
    }

    public NodeIdPair(long node, int id) {
        _node = node;
        _id = id;
    }

    public NodeIdPair(byte fromByteArray[]) {
        ByteArrayInputStream bais = new ByteArrayInputStream(fromByteArray);
        DataInputStream dis = new DataInputStream(bais);
        try {
            _node = dis.readLong();
            _id = dis.readInt();
        } catch (Exception e) {
            throw new Error("Unexpected exception. byte[] len " + fromByteArray.length, e);
        }
    }

    byte[] toByteArray() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(
                MY_SIZE != null ? MY_SIZE : 12);
        DataOutputStream dos = new DataOutputStream(bos);
        try {
            dos.writeLong(_node);
            dos.writeInt(_id);
            dos.flush();
        } catch (Exception e) {
            throw new Error("Unexpected exception: ", e);
        }
        return bos.toByteArray();
    }

    @Override
    public int compareTo(NodeIdPair arg0) {
        return _id - arg0._id;
    }

    public long getNode() {
        return _node;
    }

    public int getId() {
        return _id;
    }
}

package com.xxx.neo4j.restore;

import java.util.Arrays;
import java.util.TreeSet;

public class CompactNodeIndex {

    private int _offSet = 0;
    private byte _extent[];
    private int _slotCount;

    // Serialise the sorted pairs into one contiguous byte array of fixed-size
    // slots, so lookups can binary-chop over the array directly.
    public CompactNodeIndex(TreeSet<NodeIdPair> sortedPairs) {
        _extent = new byte[sortedPairs.size() * NodeIdPair.size()];
        _slotCount = sortedPairs.size();
        for (NodeIdPair pair : sortedPairs) {
            byte pairBytes[] = pair.toByteArray();
            copyToExtent(pairBytes);
        }
        System.out.println("CompactNodeIndex slot count: " + _slotCount);
    }

    public NodeIdPair findNodeForId(int id) {
        return search(id, 0, _slotCount - 1);
    }

    // A hit is signalled by compareAt throwing FoundIt.
    @SuppressWarnings("serial")
    static class FoundIt extends Exception {
        NodeIdPair _result;

        FoundIt(NodeIdPair result) {
            _result = result;
        }
    }

    private NodeIdPair search(int soughtId, int lowerBound, int upperBound) {
        try {
            while (true) {
                if ((upperBound - lowerBound) > 1) {
                    int compareSlot = lowerBound + ((upperBound - lowerBound) / 2);
                    int comparison = compareAt(soughtId, compareSlot);
                    if (comparison > 0) {
                        lowerBound = compareSlot;
                        continue;
                    } else {
                        upperBound = compareSlot;
                    }
                } else {
                    compareAt(soughtId, upperBound);
                    compareAt(soughtId, lowerBound);
                    // not found
                    return null;
                }
            }
        } catch (FoundIt result) {
            return result._result;
        }
    }

    private int compareAt(int soughtId, int compareSlot) throws FoundIt {
        NodeIdPair candidate = get(compareSlot);
        int diff = soughtId - candidate.getId();
        if (diff == 0)
            throw new FoundIt(candidate);
        return diff;
    }

    private NodeIdPair get(int compareSlot) {
        int startPos = compareSlot * NodeIdPair.size();
        byte serialisedPair[] = Arrays.copyOfRange(_extent, startPos,
                startPos + NodeIdPair.size());
        return new NodeIdPair(serialisedPair);
    }

    private void copyToExtent(byte[] pairBytes) {
        for (byte b : pairBytes) {
            if (_offSet >= _extent.length)
                throw new Error("Unexpected extent overflow: " + _offSet);
            _extent[_offSet++] = b;
        }
    }
}

On 13 Jun 2011, at 13:23, Michael Hunger wrote:
> Paul, can you share your test [...]
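To make the intended use concrete, here is a minimal usage sketch of the two classes above (the ids and node values are illustrative):

package com.xxx.neo4j.restore;

import java.util.TreeSet;

public class CompactNodeIndexExample {
    public static void main(String[] args) {
        // Collect (node, id) pairs while creating nodes; the TreeSet keeps them
        // sorted by id (NodeIdPair.compareTo), which the binary chop requires.
        TreeSet<NodeIdPair> pairs = new TreeSet<NodeIdPair>();
        pairs.add(new NodeIdPair(1001L, 7));
        pairs.add(new NodeIdPair(1002L, 3));
        pairs.add(new NodeIdPair(1003L, 42));

        // Pack into one byte array at ~12 bytes per entry; the TreeSet
        // can then be released to the garbage collector.
        CompactNodeIndex index = new CompactNodeIndex(pairs);
        pairs = null;

        NodeIdPair hit = index.findNodeForId(42);
        if (hit != null) {
            System.out.println("node for id 42 = " + hit.getNode()); // prints 1003
        }
    }
}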