2010/9/29 Garrett Barton <garrett.bar...@gmail.com>

> Hey all,
>
> I have an issue similar to this post
> http://www.mail-archive.com/user@lists.neo4j.org/msg04942.html
>
> I am following the advice under the Big Transactions page on the wiki
> and my code looks something like this:
>
> public void doBigBatchJob(NeoType entity) {
>    Transaction tx = null;
>    try  {
>        int counter = 0;
>        while(rs.next()) {
>            if(tx == null)
>                tx = graphDb.beginTx();
>
>            Node n = getNewNode(entity);
>            for(String col: columnList)
>                if(rs.getString(col) != null)
>                    n.setProperty(col,rs.getString(col));
>
>            counter++;
>
>            if ( counter % 10000 == 0 ) {
>                tx.success();
>                tx.finish();
>                tx = null;
>            }
>        }
>    }
>    finally {
>        if(tx != null) {
>            tx.success();
>            tx.finish();
>        }
>    }
> }
>
> It looks correct to me.
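One small thing, though: the finally block calls tx.success() even when an
exception has been thrown, so a failing chunk gets committed half-way instead
of rolled back. Here's a sketch of a safer shape for the loop, using stub types
so it compiles without Neo4j on the classpath (in the real 1.x API these would
be org.neo4j.graphdb.Transaction etc., where finish() without success() rolls
back):

```java
// Tiny stand-in so the sketch compiles without Neo4j on the classpath;
// in real code this is org.neo4j.graphdb.Transaction.
interface Tx {
    void success();
    void finish(); // commits if success() was called, rolls back otherwise
}

public class BatchLoop {
    static int committed = 0, rolledBack = 0;

    // Stand-in for graphDb.beginTx().
    static Tx beginTx() {
        return new Tx() {
            boolean ok = false;
            public void success() { ok = true; }
            public void finish() { if (ok) committed++; else rolledBack++; }
        };
    }

    // Process items in chunks of batchSize; if an exception escapes,
    // only the current (uncommitted) chunk is rolled back.
    static void run(int items, int batchSize) {
        Tx tx = beginTx();
        try {
            for (int i = 1; i <= items; i++) {
                // ... create node, set properties ...
                if (i % batchSize == 0) {
                    tx.success();
                    tx.finish();
                    tx = beginTx();
                }
            }
            tx.success(); // mark the last, possibly partial chunk for commit
        } finally {
            tx.finish(); // rolls back instead of committing on failure
        }
    }
}
```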

>
> Where getNewNode creates a node and gives it a relationship to the
> parent entity. Parent nodes are cached, that helped a whole bunch.
>
How are you looking up parent nodes?
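If each row does an index lookup to find its parent, that alone can dominate.
Caching the looked-up node id in a plain HashMap is usually enough; a minimal
sketch in plain Java, where lookupInIndex is a hypothetical stand-in for
whatever index call you actually use:

```java
import java.util.HashMap;
import java.util.Map;

public class ParentCache {
    // Maps a parent key (e.g. its business id) to the looked-up node id.
    private final Map<String, Long> cache = new HashMap<String, Long>();
    private int indexLookups = 0; // counts how often we hit the slow path

    // Hypothetical stand-in for the slow index lookup; real code would
    // query the Neo4j index service here.
    private long lookupInIndex(String key) {
        indexLookups++;
        return (long) key.hashCode(); // pretend the hash is the node id
    }

    // Returns the cached node id, hitting the index only on the first miss.
    public long getParentNodeId(String key) {
        Long cached = cache.get(key);
        if (cached == null) {
            cached = lookupInIndex(key);
            cache.put(key, cached);
        }
        return cached;
    }

    public int getIndexLookupCount() {
        return indexLookups;
    }
}
```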

>
> I have timers throughout the code as well. I know I eat some time
> pulling from the db, but if I take out the node creation and do a pull
> test of the db I can sustain 100k/s rates easily.  When I start this
> process up, I get an initial 12-14k/s rate that works well for the
> first 500k or so, then the drop-off is huge.  By the time it's done the
> next 500k it's down to under 3k/s.
>
> Watching in JProfiler, I see the RAM I gave the VM max out and stay
> there; as soon as it peaks, rates tank.
> Current setup is:
> -Xms2048m -Xmx2048m -XX:+UseConcMarkSweepGC
>
> Box has about 8GB of ram free for this, its own storage for the neo
> db, and I have already watched nr_dirty and nr_writeback and they
> never get over 2k/10 respectively.
>
> neo config options:
> nodestore.db.mapped_memory= 500M
> relationshipstore.db.mapped_memory= 1G
> propertystore.db.mapped_memory= 500M
> propertystore.db.strings.mapped_memory= 2G
> propertystore.db.arrays.mapped_memory= 0M
>
> I have not run through a complete initial node load, as the first set
> of nodes is ~16M, the second set is about 20M, and there's a good 30M
> relationships between the two I haven't gotten to yet.
>
> Am I configuring something wrong?  I read that neo will cache all the
> nodes I create, is that what's hurting me? I do not really want to use
> batchinserter because I think it's bugged (the Lucene part), and I will
> be ingesting hundreds of millions of nodes live daily when this thing
> works anyway.  (Yes, I have the potential to see what the upper limits
> of Neo are.)
>

It might be that the SQL database you're reading from is causing the
slowdowns... I've seen this a couple of times before, so try doing it in two
steps:

1) Extract the data from your SQL database and store it in a CSV file or similar.
2) Import from that file into neo4j.

If you do it this way, do you experience these slowdowns?
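The two steps can be sketched like this in plain Java. It's deliberately naive
CSV (no quoting or escaping), just enough to decouple the SQL read from the
Neo4j write so each side can be timed separately; the rows in the test are
stand-ins for your ResultSet:

```java
import java.io.*;
import java.util.*;

public class CsvRoundTrip {
    // Step 1: dump rows (in real code, pulled from the SQL ResultSet) to CSV.
    static void export(File file, List<String[]> rows) throws IOException {
        PrintWriter out = new PrintWriter(new FileWriter(file));
        try {
            for (String[] row : rows) {
                out.println(join(row));
            }
        } finally {
            out.close();
        }
    }

    // Step 2: read the CSV back; each row would become one node in the
    // import pass against Neo4j.
    static List<String[]> importRows(File file) throws IOException {
        List<String[]> rows = new ArrayList<String[]>();
        BufferedReader in = new BufferedReader(new FileReader(file));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(line.split(",", -1));
            }
        } finally {
            in.close();
        }
        return rows;
    }

    // Naive join: assumes values contain no commas or newlines.
    private static String join(String[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(row[i]);
        }
        return sb.toString();
    }
}
```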


>
> Also, is neo single write transaction based? My ingest code is
> actually threadable, and I noticed in JProfiler that only one thread
> would be inserting at a time.
>

It might be that you always create relationships to the same parent node(s),
so that write locks are taken on them. Those locks are held until the owning
thread commits its transaction, which will make it look like only one thread
at a time is committing stuff.
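If that's what's happening, one workaround is to partition the work so that all
rows for a given parent go to the same thread; then no two threads ever
contend for a lock on the same parent node. A hash-partitioning sketch in
plain Java (names hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class ParentPartitioner {
    // Assigns rows (represented here by their parent key) to worker buckets
    // so that every row with the same parent lands in the same bucket, and
    // no two threads would create relationships to the same parent node
    // concurrently.
    static List<List<String>> partitionByParent(List<String> parentKeys, int workers) {
        List<List<String>> buckets = new ArrayList<List<String>>();
        for (int i = 0; i < workers; i++) {
            buckets.add(new ArrayList<String>());
        }
        for (String key : parentKeys) {
            // Mask keeps the hash non-negative before taking the modulus.
            int bucket = (key.hashCode() & 0x7fffffff) % workers;
            buckets.get(bucket).add(key);
        }
        return buckets;
    }
}
```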


> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>



-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com