2010/9/30 Garrett Barton <garrett.bar...@gmail.com>

> Thanks for the reply!
>
> New nodes are created, and root nodes looked up, via:
>
> private Node getNewNode(NeoType entity) {
>   Node n = graphDb.createNode();
>   n.createRelationshipTo(getRootNode(entity), entity);
>   return n;
> }
>
> private Node getRootNode(NeoType entity) {
>   Node root = rootMap.get(entity);
>   if (root == null) {
>     root = indexService.getSingleNode("eid", entity.toString());
>     if (root == null) {
>       // First time we see this entity type: create its root node,
>       // hang it off the reference node and index it.
>       Node entityNode = graphDb.createNode();
>       entityNode.createRelationshipTo(graphDb.getReferenceNode(), entity);
>       root = entityNode;
>       indexService.index(root, "eid", entity.toString());
>     }
>     rootMap.put(entity, root);
>   }
>   return root;
> }
>
> Where rootMap = new HashMap<NeoType,Node>();
> Thus I create each root node once, attach it to the reference node and
> then look it up via the rootMap. My loads process one entity type at a
> time, which from what you said will single-thread me, since every
> relationship that attaches to one of my root nodes (the same one per
> run) locks that node until the transaction completes.
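>
> To make the serialization concrete, here is a minimal sketch (assuming
> the Neo4j 1.x embedded API; attachToRoot is a hypothetical helper, not
> code from my loader) of where the blocking happens:
>
> private void attachToRoot(NeoType entity) {
>   Transaction tx = graphDb.beginTx();
>   try {
>     Node n = graphDb.createNode();
>     // createRelationshipTo write-locks both end nodes, so every other
>     // thread attaching to the same root waits here until this
>     // transaction calls finish().
>     n.createRelationshipTo(getRootNode(entity), entity);
>     tx.success();
>   } finally {
>     tx.finish();
>   }
> }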
>

> I was creating root nodes in order to provide entry points, but this
> may be undesirable now that I think about it, since each root entity
> could easily have 500M nodes hanging off of it. Neo probably would not
> handle a traversal over that very well, correct? If I remove this
> restriction and load nodes individually (see the sketch below), I
> should be able to thread out again. Only when I create the
> relationships will I occasionally run into locks, which I can try to
> mitigate with more threads and smaller transaction sizes (10k maybe?).
> Is there any documentation on what operations take out a lock?
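>
> A minimal sketch of what I mean, assuming the 1.x embedded API
> (loadNodesOnly and columnList are hypothetical stand-ins for my loader
> code): each thread creates free-standing nodes in 10k-row transactions
> and never touches a shared root, so no locks are contended.
>
> private void loadNodesOnly(ResultSet rs) throws SQLException {
>   Transaction tx = graphDb.beginTx();
>   int counter = 0;
>   try {
>     while (rs.next()) {
>       Node n = graphDb.createNode(); // no relationship to a root node
>       for (String col : columnList)
>         if (rs.getString(col) != null)
>           n.setProperty(col, rs.getString(col));
>       if (++counter % 10000 == 0) {
>         tx.success();
>         tx.finish();
>         tx = graphDb.beginTx(); // smaller tx, shorter lock hold times
>       }
>     }
>     tx.success();
>   } finally {
>     tx.finish();
>   }
> }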
>
It depends on what kind of traversal you are doing... could you give some
examples?
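
For instance, with the 1.x traversal API a depth-one pass over a dense root
streams relationships lazily rather than materializing all children up
front. A sketch (MyRelTypes.ENTITY is a hypothetical relationship type):

Traverser t = root.traverse(Traverser.Order.BREADTH_FIRST,
        StopEvaluator.DEPTH_ONE,
        ReturnableEvaluator.ALL_BUT_START_NODE,
        MyRelTypes.ENTITY, Direction.OUTGOING);
for (Node child : t) {
    // each step fetches the next relationship record on demand
}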

>
> I know it's not the db (postgres), as the same code that drives this
> ResultSet can also drive my full Lucene indexing layer, and there I can
> pull well over 100k/s per thread. (My Lucene implementation indexes
> 400-500k/s on average with 4 threads and occasionally peaks over
> 1mil/s.) HUGE hack right now, but instead of just calling tx.finish()
> I am also shutting the db down and starting it back up every 150k rows
> (see the sketch below). This has brought the rates (including
> start/stop time) up to about 15k/s, and it stays at that level now. I
> need to figure out why I run out of RAM so I can avoid doing this.
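>
> For reference, the hack looks roughly like this (a sketch; storeDir is
> a hypothetical name, and the real loader tracks its place in the
> ResultSet across restarts):
>
> if (counter % 150000 == 0) {
>   tx.success();
>   tx.finish();
>   graphDb.shutdown();                            // drop all cached nodes
>   graphDb = new EmbeddedGraphDatabase(storeDir); // cold-start the db
>   tx = graphDb.beginTx();
> }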
>
(See my answer below on batch insertion as well.) With the default
configuration, Neo4j will try to cache pretty much everything in your heap,
so if your other database also caches things your heap will pretty soon be
full. You can try to lower the caching levels: fiddle around with the cache
settings, for which http://dist.neo4j.org/neo_default.props is an example.
Look at e.g. adaptive_cache_heap_ratio=0.77 and maybe lower it a bit.
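
A minimal sketch of passing such a setting to an embedded database, assuming
the 1.x API (the store path and the ratio value 0.5 are just examples to
try):

Map<String, String> config = new HashMap<String, String>();
// Let the node/relationship cache use less of the heap than the
// default 0.77, leaving room for the JDBC layer.
config.put("adaptive_cache_heap_ratio", "0.5");
GraphDatabaseService graphDb =
    new EmbeddedGraphDatabase("path/to/db", config);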

>
> In general creating nodes I assume is expensive?  Can I create batch
> of Nodes, close the transaction, update all of their properties and
> then reopen a transaction to attach relationships?  What is the
> bottleneck when doing insertions?
>

If this is a one-time batch insertion then you should really use the batch
inserter <http://wiki.neo4j.org/content/Batch_Insert>, which is optimized
for these things. It's much faster for imports of this sort, and you don't
need (in fact, cannot have) multiple threads inserting your data.
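A minimal sketch of what that looks like, assuming the 1.x batch inserter
API (the store path and MyRelTypes.ENTITY are hypothetical placeholders):

BatchInserter inserter = new BatchInserterImpl("path/to/db");
// Single-threaded, no transactions: the inserter writes straight to the
// store files, which is what makes it fast.
Map<String, Object> props = new HashMap<String, Object>();
props.put("name", "example");
long node = inserter.createNode(props);
long root = inserter.createNode(null); // null = no properties
inserter.createRelationship(node, root, MyRelTypes.ENTITY, null);
inserter.shutdown(); // must be called, or the store is left inconsistent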

>
> > Date: Thu, 30 Sep 2010 09:54:31 +0200
> > From: Mattias Persson <matt...@neotechnology.com>
> > Subject: Re: [Neo4j] Using EmbeddedGraphDatabase, possible to stop
> >        node caching eating ram?
> >
> > 2010/9/29 Garrett Barton <garrett.bar...@gmail.com>
> >
> >> Hey all,
> >>
> >> I have an issue similar to this post
> >> http://www.mail-archive.com/user@lists.neo4j.org/msg04942.html
> >>
> >> I am following the advice under the Big Transactions page on the wiki
> >> and my code looks something like this:
> >>
> >> public void doBigBatchJob(NeoType entity) {
> >>    Transaction tx = null;
> >>    try  {
> >>        int counter = 0;
> >>        while(rs.next()) {
> >>            if(tx == null)
> >>                tx = graphDb.beginTx();
> >>
> >>            Node n = getNewNode(entity);
> >>            for(String col: columnList)
> >>                if(rs.getString(col) != null)
> >>                    n.setProperty(col,rs.getString(col));
> >>
> >>            counter++;
> >>
> >>            if ( counter % 10000 == 0 ) {
> >>                tx.success();
> >>                tx.finish();
> >>                tx = null;
> >>            }
> >>        }
> >>    }
> >>    finally {
> >>        if(tx != null) {
> >>            tx.success();
> >>            tx.finish();
> >>        }
> >>    }
> >> }
> >>
> >
> > It looks correct to me.
> >
> >>
> >> Where getNewNode creates a node and gives it a relationship to the
> >> parent entity. Parent nodes are cached, that helped a whole bunch.
> >>
> > How are you looking up parent nodes?
> >
> >>
> >> I have timers throughout the code as well. I know I eat some time
> >> pulling from the db, but if I take out the node creation and do a pull
> >> test of the db alone I can sustain 100k/s rates easily.  When I start
> >> this process up, I get an initial 12-14k/s rate that holds for the
> >> first 500k or so, then the drop-off is huge.  By the time it's done the
> >> next 500k it's down to under 3k/s.
> >>
> >> Watching in JProfiler, I see the RAM I gave the VM max out and stay
> >> there; as soon as it peaks, the rates tank.
> >> Current setup is:
> >> -Xms2048m -Xmx2048m -XX:+UseConcMarkSweepGC
> >>
> >> The box has about 8GB of RAM free for this and its own storage for the
> >> neo db, and I have already watched nr_dirty and nr_writeback; they
> >> never get over 2k and 10 respectively.
> >>
> >> neo config options:
> >> nodestore.db.mapped_memory= 500M
> >> relationshipstore.db.mapped_memory= 1G
> >> propertystore.db.mapped_memory= 500M
> >> propertystore.db.strings.mapped_memory= 2G
> >> propertystore.db.arrays.mapped_memory= 0M
> >>
> >> I have not run through a complete initial node load, as the first set
> >> of nodes is ~16M, the second set is about 20M, and there's a good 30M
> >> relationships between the two that I haven't gotten to yet.
> >>
> >> Am I configuring something wrong?  I read that neo will cache all the
> >> nodes I create; is that what's hurting me? I do not really want to use
> >> the batch inserter because I think it's bugged (the Lucene part), and I
> >> will be ingesting hundreds of millions of nodes live daily once this
> >> thing works anyway.  (Yes, I have the potential to see what the upper
> >> limits of Neo are.)
> >>
> >
> > It might be that the SQL database you're reading from causes the
> > slowdowns... I've seen this before a couple of times, so try to do this
> > in two steps:
> >
> > 1) Extract data from your SQL database and store in a CSV file or
> something.
> > 2) Import from that file into neo4j.
> >
> > If you do it this way, do you experience these slowdowns?
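> >
> > A minimal sketch of step 1, assuming plain JDBC (query, columnList and
> > the output file name are hypothetical placeholders; quoting/escaping
> > and error handling are omitted):
> >
> > PrintWriter out = new PrintWriter(new FileWriter("nodes.csv"));
> > ResultSet rs = connection.createStatement().executeQuery(query);
> > while (rs.next()) {
> >     StringBuilder line = new StringBuilder();
> >     for (String col : columnList) {
> >         if (line.length() > 0) line.append(',');
> >         String v = rs.getString(col);
> >         line.append(v == null ? "" : v); // naive CSV, no escaping
> >     }
> >     out.println(line);
> > }
> > out.close();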
> >
> >
> >>
> >> Also, is neo based on a single write transaction? My ingest code is
> >> actually threadable, and I noticed in JProfiler that only 1 thread
> >> would be inserting at a time.
> >>
> >
> > It might be that you always create relationships to some parent node(s),
> > so locks are taken on them. Those locks are held until that thread has
> > committed its transaction, which will make it look like only one thread
> > at a time is committing stuff.
> >
> >



-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com