Re: [Neo4j] BatchInserter improvement with 1.4M04 but still got relationship building bottleneck [was Re: Speeding up initial import of graph...]

2011-06-14 Thread Michael Hunger
Paul,

I tried to use your implementation, but it added quite a high performance penalty:
when profiling it, it consumed about three quarters of the time spent on relationship creation.
The rest of the time was spent acquiring persistence windows for the node store.

I also profiled the version using the simple map.

We found one issue (the IdGenerator) that could benefit from the assumption that 
batch insertion runs single-threaded.

Lots of time was spent in acquiring persistence-windows for nodes when creating 
the relationships.

My results with no batch-buffer configuration were not so bad initially, but I 
sped them up tremendously by increasing the mapped-memory buffer size for the 
node store (so that it more often finds the persistence window in the cache and 
doesn't have to look it up).

So, my configuration looks like this:
neostore.nodestore.db.mapped_memory=250M
neostore.relationshipstore.db.mapped_memory=200M   // could also be lower, 50M-100M
neostore.propertystore.db.mapped_memory=50M
neostore.propertystore.db.strings.mapped_memory=50M
neostore.propertystore.db.arrays.mapped_memory=0M
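
For reference, a minimal sketch of passing such settings to the batch inserter as a plain 
string map, assuming the 1.x org.neo4j.kernel.impl.batchinsert API (the store path and class 
name here are just examples, not the actual test code):

import java.util.HashMap;
import java.util.Map;

import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public class BatchConfigSketch {
    public static void main(String[] args) {
        // Store settings are plain strings, keyed by the names shown above.
        Map<String, String> config = new HashMap<String, String>();
        config.put("neostore.nodestore.db.mapped_memory", "250M");
        config.put("neostore.relationshipstore.db.mapped_memory", "200M");
        config.put("neostore.propertystore.db.mapped_memory", "50M");
        config.put("neostore.propertystore.db.strings.mapped_memory", "50M");
        config.put("neostore.propertystore.db.arrays.mapped_memory", "0M");
        config.put("dump_configuration", "true"); // print the effective settings at startup

        BatchInserter inserter = new BatchInserterImpl("target/hepper", config); // example path
        try {
            // ... createNode(...) / createRelationship(...) calls go here ...
        } finally {
            inserter.shutdown(); // flushes and closes the store files
        }
    }
}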

Then I get the following (with the faster map-cache):
Physical mem: 16384MB, Heap size: 3055MB
use_memory_mapped_buffers=false
neostore.propertystore.db.index.keys.mapped_memory=1M
neostore.propertystore.db.strings.mapped_memory=50M
neostore.propertystore.db.arrays.mapped_memory=0M
neo_store=/Users/mh/java/neo/import/target/hepper/neostore
neostore.relationshipstore.db.mapped_memory=200M
neostore.propertystore.db.index.mapped_memory=1M
neostore.propertystore.db.mapped_memory=50M
dump_configuration=true
cache_type=weak
neostore.nodestore.db.mapped_memory=250M

100 nodes created. Took 2358 
200 nodes created. Took 2090 
300 nodes created. Took 2082 
400 nodes created. Took 2054 
500 nodes created. Took 2245 
600 nodes created. Took 2100 
700 nodes created. Took 2117 
800 nodes created. Took 2076 
900 nodes created. Took 2391 
1000 nodes created. Took 2214 
Creating nodes took 21
100 relationships created. Took 4302 
200 relationships created. Took 4176 
300 relationships created. Took 4260 
400 relationships created. Took 6448 
500 relationships created. Took 7645 
600 relationships created. Took 7290 
700 relationships created. Took 8627 
800 relationships created. Took 7907 
900 relationships created. Took 8292 
1000 relationships created. Took 8563 
Creating relationships took 67

 use_memory_mapped_buffers=false
 neostore.propertystore.db.index.keys.mapped_memory=1M
 neostore.propertystore.db.strings.mapped_memory=52M
 neostore.propertystore.db.arrays.mapped_memory=60M
 neo_store=N:\TradeModel\target\hepper\neostore
 neostore.relationshipstore.db.mapped_memory=76M
 neostore.propertystore.db.index.mapped_memory=1M
 neostore.propertystore.db.mapped_memory=62M
 dump_configuration=true
 cache_type=weak


Cheers

Michael

[Neo4j] BatchInserter improvement with 1.4M04 but still got relationship building bottleneck [was Re: Speeding up initial import of graph...]

2011-06-13 Thread Paul Bandler
Having noticed a mention in the 1.4M04 release notes that:

 Also, the BatchInserterIndex now keeps its memory usage in-check with batched 
 commits of indexed data using a configurable batch commit size.

I re-ran this test using M04 and, sure enough, node creation no longer eats up 
the heap linearly, which is good - I should be able to remove the periodic 
resetting of the BatchInserter during import.
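
(For context, the BatchInserterIndex usage this refers to is roughly of the shape below. This is 
an illustrative sketch, not the actual test code: the index name, key and value are made up, and 
the package names are those of the 1.4 line.)

import java.util.Map;

import org.neo4j.graphdb.index.BatchInserterIndex;
import org.neo4j.graphdb.index.BatchInserterIndexProvider;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.index.impl.lucene.LuceneBatchInserterIndexProvider;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public class IndexUsageSketch {
    public static void main(String[] args) {
        BatchInserter inserter = new BatchInserterImpl("target/hepper"); // example path
        BatchInserterIndexProvider provider = new LuceneBatchInserterIndexProvider(inserter);
        BatchInserterIndex ids = provider.nodeIndex("ids", MapUtil.stringMap("type", "exact"));

        Map<String, Object> props = MapUtil.map("id", 42);
        long node = inserter.createNode(props);
        ids.add(node, props);   // as of 1.4.M04 indexed data is committed in batches
        ids.flush();            // make pending additions visible to get()

        Long found = ids.get("id", 42).getSingle(); // null if nothing was indexed for that key
        System.out.println("found node: " + found);

        provider.shutdown();
        inserter.shutdown();
    }
}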

So I returned to the issue of removing the index-creation and later index-access 
bottleneck by using an application-managed data structure, as Michael illustrated. 
Needing a solution with a smaller memory footprint, I wrote a CompactNodeIndex 
class for mapping integer 'id' key values to long node ids; it keeps its memory 
footprint to a minimum by overlaying a binary-choppable table onto a byte array.  
Watching the heap in jconsole while this ran, I could see it had the desired 
effect of releasing huge amounts of heap once the CompactNodeIndex is loaded and 
the source data structure is gc'd.  However, when I attempted to scale the test 
program back up to the 10M nodes Michael had been testing, it appears to run into 
something of a brick wall, becoming massively I/O-bound when creating the 
relationships.  With 1M nodes it ran OK, with 2M nodes not too badly, but much 
beyond that it crawls along using just about 1% of CPU while having loads of heap 
spare.
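
For clarity, the import is roughly of the following shape. This is a sketch reconstructed from 
the description rather than the actual test code: the source-record types, accessor names and 
relationship type are made up, and CompactNodeIndex and NodeIdPair are the classes listed later 
in this thread (imports and the surrounding class are omitted).

// Phase 1: create nodes with the BatchInserter, remembering (node, id) pairs.
TreeSet<NodeIdPair> pairs = new TreeSet<NodeIdPair>();
for (SourceRecord rec : records) {                 // hypothetical source data
    long node = inserter.createNode(MapUtil.map("id", rec.getId()));
    pairs.add(new NodeIdPair(node, rec.getId()));
}

// Phase 2: pack the pairs into the compact index and drop the TreeSet so it can be gc'd.
CompactNodeIndex index = new CompactNodeIndex(pairs);
pairs = null;

// Phase 3: create relationships, resolving both endpoints through the index.
RelationshipType RELATED_TO = DynamicRelationshipType.withName("RELATED_TO"); // made-up type
for (SourceLink link : links) {                    // hypothetical source data
    long from = index.findNodeForId(link.getFromId()).getNode();
    long to = index.findNodeForId(link.getToId()).getNode();
    inserter.createRelationship(from, to, RELATED_TO, null);
}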

I re-ran it on a more generously configured iMac (giving the test 4G of heap) and 
it did much better, in that it actually showed some progress building 
relationships over a 10M node set, but it still exhibited a massive slowdown once 
past 7M relationships.

Below are the test results - the question now is whether there are any Neo4j 
parameters that might relieve this I/O bottleneck that appears when building 
relationships over node sets of this size with the BatchInserter...?  I note the 
section in the manual on performance parameters, but I'm afraid that, not being 
familiar enough with the Neo4j internals, I don't feel it gives enough clear 
information on how to set them to improve the performance of this use case.

Thanks,

Paul

Run 1 - Windows m/c. REPORT_COUNT = MILLION/10
Physical mem: 1535MB, Heap size: 1016MB
use_memory_mapped_buffers=false
neostore.propertystore.db.index.keys.mapped_memory=1M
neostore.propertystore.db.strings.mapped_memory=52M
neostore.propertystore.db.arrays.mapped_memory=60M
neo_store=N:\TradeModel\target\hepper\neostore
neostore.relationshipstore.db.mapped_memory=76M
neostore.propertystore.db.index.mapped_memory=1M
neostore.propertystore.db.mapped_memory=62M
dump_configuration=true
cache_type=weak
neostore.nodestore.db.mapped_memory=17M
10 nodes created. Took 2906
20 nodes created. Took 2688
30 nodes created. Took 2828
40 nodes created. Took 2953
50 nodes created. Took 2672
60 nodes created. Took 2766
70 nodes created. Took 2687
80 nodes created. Took 2703
90 nodes created. Took 2719
100 nodes created. Took 2641
Creating nodes took 27
MY_SIZE: 12
CompactNodeIndex slot count: 100
10 relationships created. Took 4125
20 relationships created. Took 3953
30 relationships created. Took 3937
40 relationships created. Took 3610
50 relationships created. Took 3719
60 relationships created. Took 4328
70 relationships created. Took 3750
80 relationships created. Took 3609
90 relationships created. Took 4125
100 relationships created. Took 3781
110 relationships created. Took 4125
120 relationships created. Took 3750
130 relationships created. Took 3907
140 relationships created. Took 4297
150 relationships created. Took 3703
160 relationships created. Took 3687
170 relationships created. Took 4328
180 relationships created. Took 3907
190 relationships created. Took 3718
200 relationships created. Took 3891
Creating relationships took 78
 
2M Nodes on Windows m/c:-
 
Creating data took 68 seconds
Physical mem: 1535MB, Heap size: 1016MB
use_memory_mapped_buffers=false
neostore.propertystore.db.index.keys.mapped_memory=1M
neostore.propertystore.db.strings.mapped_memory=52M
neostore.propertystore.db.arrays.mapped_memory=60M
neo_store=N:\TradeModel\target\hepper\neostore
neostore.relationshipstore.db.mapped_memory=76M
neostore.propertystore.db.index.mapped_memory=1M
neostore.propertystore.db.mapped_memory=62M
dump_configuration=true
cache_type=weak
neostore.nodestore.db.mapped_memory=17M
10 nodes created. Took 3188
20 nodes created. Took 3094
30 nodes created. Took 3062
40 nodes created. Took 2813
50 nodes created. Took 2718
60 nodes created. Took 3000
70 nodes created. Took 2938
80 nodes created. Took 2828
90 nodes created. Took 4172
100 nodes created. Took 2859
110 nodes created. Took 3625
120 nodes created. Took 3235
130 nodes created. Took 2781
140 nodes created. Took 2891
150 nodes created. Took 2922
160 nodes created. Took 2968
170 nodes created. Took 3438
180 nodes created. 

Re: [Neo4j] BatchInserter improvement with 1.4M04 but still got relationship building bottleneck [was Re: Speeding up initial import of graph...]

2011-06-13 Thread Michael Hunger
Paul,

can you share your test and the CompactIndex you wrote?

That would be great. 

Also the memory settings (Xmx) you used for the different runs.

Thanks so much

Michael


Re: [Neo4j] BatchInserter improvement with 1.4M04 but still got relationship building bottleneck [was Re: Speeding up initial import of graph...]

2011-06-13 Thread Paul Bandler
 can you share your test and the CompactIndex you wrote?
 
 That would be great. 

See below...

 Also the memory settings (Xmx) you used for the different runs.

The heap size is displayed by Neo4j, is it not, with console entries such as:-

 Physical mem: 1535MB, Heap size: 1016MB

So that one came from -Xmx1024M, and 

 Physical mem: 4096MB, Heap size: 2039MB
 

came from -Xms2048M


regards,

Paul


package com.xxx.neo4j.restore;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;

public class NodeIdPair implements Comparable<NodeIdPair> {
    private long _node;
    private int _id;
    static final NodeIdPair _prototype = new NodeIdPair(Long.MAX_VALUE,
            Integer.MAX_VALUE);
    static Integer MY_SIZE = null;

    // Size in bytes of one serialised pair.
    public static int size() {
        if (MY_SIZE == null) {
            MY_SIZE = (new NodeIdPair(Long.MAX_VALUE, Integer.MAX_VALUE))
                    .toByteArray().length;
            System.out.println("MY_SIZE: " + MY_SIZE);
        }
        return MY_SIZE;
    }

    public NodeIdPair(long node, int id) {
        _node = node;
        _id = id;
    }

    // Reconstruct a pair from its serialised form.
    public NodeIdPair(byte fromByteArray[]) {
        ByteArrayInputStream bais = new ByteArrayInputStream(fromByteArray);
        DataInputStream dis = new DataInputStream(bais);
        try {
            _node = dis.readLong();
            _id = dis.readInt();
        } catch (Exception e) {
            throw new Error("Unexpected exception. byte[] len " +
                    fromByteArray.length, e);
        }
    }

    // Serialise the pair as a long followed by an int.
    byte[] toByteArray() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(
                MY_SIZE != null ? MY_SIZE : 12);
        DataOutputStream dos = new DataOutputStream(bos);
        try {
            dos.writeLong(_node);
            dos.writeInt(_id);
            dos.flush();
        } catch (Exception e) {
            throw new Error("Unexpected exception: ", e);
        }

        return bos.toByteArray();
    }

    @Override
    public int compareTo(NodeIdPair arg0) {
        return _id - arg0._id;
    }

    public long getNode() {
        return _node;
    }

    public int getId() {
        return _id;
    }

}
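
(Each serialised pair is 12 bytes - an 8-byte long node id plus a 4-byte int key - which is 
where the "MY_SIZE: 12" line in the test output comes from.)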
 




package com.xxx.neo4j.restore;

import java.util.Arrays;
import java.util.TreeSet;

public class CompactNodeIndex {
    private int  _offSet = 0;
    private byte _extent[];
    private int  _slotCount;

    // Pack the sorted pairs into one contiguous byte array, one fixed-size slot per pair.
    public CompactNodeIndex(TreeSet<NodeIdPair> sortedPairs) {
        _extent = new byte[sortedPairs.size() * NodeIdPair.size()];
        _slotCount = sortedPairs.size();
        for (NodeIdPair pair : sortedPairs) {
            byte pairBytes[] = pair.toByteArray();
            copyToExtent(pairBytes);
        }
        System.out.println("CompactNodeIndex slot count: " + _slotCount);
    }

    public NodeIdPair findNodeForId(int id) {
        return search(id, 0, _slotCount - 1);
    }

    // Exception used to short-circuit the binary search when a match is found.
    @SuppressWarnings("serial")
    static class FoundIt extends Exception {
        NodeIdPair _result;

        FoundIt(NodeIdPair result) {
            _result = result;
        }

    }

    // Binary chop over the packed slots.
    private NodeIdPair search(int soughtId, int lowerBound, int upperBound) {
        try {
            while (true) {
                if ((upperBound - lowerBound) > 1) {
                    int compareSlot = lowerBound
                            + ((upperBound - lowerBound) / 2);
                    int comparison = compareAt(soughtId, compareSlot);

                    if (comparison > 0) {
                        lowerBound = compareSlot;
                        continue;
                    } else {
                        upperBound = compareSlot;
                    }
                } else {
                    compareAt(soughtId, upperBound);
                    compareAt(soughtId, lowerBound);
                    // not found
                    return null;
                }
            }
        } catch (FoundIt result) {
            return result._result;
        }
    }

    private int compareAt(int soughtId, int compareSlot) throws FoundIt {
        NodeIdPair candidate = get(compareSlot);
        int diff = soughtId - candidate.getId();
        if (diff == 0)
            throw new FoundIt(candidate);
        return diff;
    }

    // Deserialise the pair stored in the given slot.
    private NodeIdPair get(int compareSlot) {
        int startPos = compareSlot * NodeIdPair.size();
        byte serialisedPair[] = Arrays.copyOfRange(_extent, startPos, startPos
                + NodeIdPair.size());

        return new NodeIdPair(serialisedPair);
    }

    private void copyToExtent(byte[] pairBytes) {
        for (byte b : pairBytes) {
            if (_offSet >= _extent.length)
                throw new Error("Unexpected extent overflow: " + _offSet);
            _extent[_offSet++] = b;
        }
    }
}
 
