2010/9/30 Garrett Barton garrett.bar...@gmail.com
Thanks for the reply!
Root nodes are found via:
private Node getNewNode(NeoTypes entity) {
    Node n = graphDb.createNode();
    n.createRelationshipTo(getRootNode(entity), entity);
    return n;
}
private Node getRootNode(NeoTypes entity) {
    Node root = rootMap.get(entity);
    if (root == null) {
        root = indexService.getSingleNode(eid, entity.toString());
        if (root == null) {
            Node entityNode = graphDb.createNode();
            entityNode.createRelationshipTo(graphDb.getReferenceNode(), entity);
            root = entityNode;
            indexService.index(root, eid, entity.toString());
        }
        rootMap.put(entity, root);
    }
    return root;
}
Where rootMap = new HashMap<NeoTypes, Node>();
Thus I create the root entity once, attach it to the reference node, and
then look it up via the rootMap. My loads insert one entity at a time,
which, from what you said, will single-thread me, since every
relationship that attaches to one of my root nodes (the same one per
run) locks that node until the transaction completes.
I was creating root nodes in order to provide entry points, but this
may be undesirable now that I think about it, since each root entity
could easily have 500M nodes hanging off of it. Neo probably would not
be able to traverse through that very well, correct? If I remove this
restriction and load nodes individually, I should be able to thread
out again. Only when I create the relationships will I occasionally
run into locks, which I can try to mitigate with more threads and
smaller transaction sizes (10k maybe?). Is there any documentation on
which operations take out a lock?
It depends on what kind of traversal you are doing... could you give some
examples?
I know it's not the db (Postgres), as the same code that reads this
result set also drives my full Lucene indexing layer, and with it I can
pull well over 100k/s per thread. (My Lucene implementation indexes on
average 400-500k/s with 4 threads and once in a while peaks over
1mil/s.) HUGE hack right now, but instead of just calling tx.finish() I
am also shutting the db down and starting it back up again every 150k
inserts. This has brought the rates (including start/stop time) up to
about 15k/s, and they stay at that level now. I need to figure out why
I run out of RAM so I can avoid doing this.
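In code, the restart hack described above amounts to something like this inside the load loop (just a sketch; storeDir is an assumed field holding the database path, and the 150k interval comes from the description above):

```java
if (counter % 150000 == 0) {
    tx.success();
    tx.finish();
    graphDb.shutdown();                            // throws away Neo4j's caches
    graphDb = new EmbeddedGraphDatabase(storeDir); // reopen and keep loading
    tx = graphDb.beginTx();
}
```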
(See my answer below on batch insertion as well). If no Neo4j-specific
configuration is supplied, Neo4j will try to cache pretty much everything
in your heap, so if your other database also caches stuff your heap will
pretty soon be full. You can try to set down the caching levels. You can
fiddle around with the cache settings; there's an example config at
http://dist.neo4j.org/neo_default.props , look at f.ex.
adaptive_cache_heap_ratio=0.77, maybe lower it a bit.
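A minimal sketch of wiring such a setting in, assuming the Neo4j 1.x EmbeddedGraphDatabase constructor that takes a config map (the store path and the lowered ratio value here are just illustrations):

```java
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.kernel.EmbeddedGraphDatabase;

// Lower the adaptive cache ratio so Neo4j claims less of the shared heap.
Map<String, String> config = new HashMap<String, String>();
config.put("adaptive_cache_heap_ratio", "0.5"); // example default is 0.77

GraphDatabaseService graphDb = new EmbeddedGraphDatabase("path/to/db", config);
```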
In general, I assume creating nodes is expensive? Can I create a batch
of Nodes, close the transaction, update all of their properties, and
then reopen a transaction to attach relationships? What is the
bottleneck when doing insertions?
If this is a one-time batch insertion then you should really use the batch
inserter http://wiki.neo4j.org/content/Batch_Insert which is optimized for
these things. It's much faster for imports of this sort, and you don't have
to have (in fact, you cannot have) multiple threads inserting your data.
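For reference, a rough sketch of what that looks like with the Neo4j 1.x batch inserter API (the store path, property name, and relationship type here are made-up examples):

```java
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

BatchInserter inserter = new BatchInserterImpl("path/to/db");
try {
    Map<String, Object> props = new HashMap<String, Object>();
    props.put("name", "example");
    long node = inserter.createNode(props);   // returns the new node's id
    long root = inserter.createNode(null);
    inserter.createRelationship(node, root,
            DynamicRelationshipType.withName("MY_TYPE"), null);
} finally {
    inserter.shutdown(); // flushes everything to disk; no transactions involved
}
```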
Message: 1
Date: Thu, 30 Sep 2010 09:54:31 +0200
From: Mattias Persson matt...@neotechnology.com
Subject: Re: [Neo4j] Using EmbeddedGraphDatabase, possible to stop
node caching eating ram?
To: Neo4j user discussions user@lists.neo4j.org
Message-ID:
aanlktinkcdufqrjszxmyosgotj_fak+jnp-638cf5...@mail.gmail.com
Content-Type: text/plain; charset=UTF-8
2010/9/29 Garrett Barton garrett.bar...@gmail.com
Hey all,
I have an issue similar to this post
http://www.mail-archive.com/user@lists.neo4j.org/msg04942.html
I am following the advice under the Big Transactions page on the wiki
and my code looks something like this:
public void doBigBatchJob(NeoType entity) {
    Transaction tx = null;
    try {
        int counter = 0;
        while (rs.next()) {
            if (tx == null)
                tx = graphDb.beginTx();
            Node n = getNewNode(entity);
            for (String col : columnList)
                if (rs.getString(col) != null)
                    n.setProperty(col, rs.getString(col));
            counter++;
            if (counter % 10000 == 0) { // commit in batches, not per row
                tx.success();
                tx.finish();
                tx = null;
            }
        }
    }
    finally {
        if (tx != null) {
            tx.success();
            tx.finish();
        }
    }
}
It looks correct to me.
Where getNewNode creates a node and gives it a relationship to the
parent entity. Parent nodes are cached, which helped a whole bunch.
How are you looking