Hi Guys -

I have been working on loading WordNet (http://wordnet.princeton.edu/) into
Neo4j, and have been using it as an opportunity to tune write performance on
Linux for a Web application I am developing. 

My initial idea was to load WordNet RDF
(http://semanticweb.cs.vu.nl/lod/wn30/) through the Blueprints SailGraph
interface, but then I decided to use NLTK (http://www.nltk.org) and load it
directly from Bulbs into Rexster.

Stephen recently added batch transactions to Rexster
(https://github.com/tinkerpop/rexster-kibbles/tree/master/batch-kibble), but
right now I am not using them because I want to see what type of write
performance you can get in non-batch mode.

The Neo4j performance guides were helpful:

* http://wiki.neo4j.org/content/Performance_Guide
* http://wiki.neo4j.org/content/Linux_Performance_Guide
* http://wiki.neo4j.org/content/Configuration_Settings

So were Peter's and Tobias's recommendations to put Neo4j transactions in
manual mode
(https://groups.google.com/d/msg/gremlin-users/vl4IZO7O8H4/20Yc4rUObNcJ) so
you don't have to flush to disk on every write.

However, manual/batch modes are not practical for writes in a Web
application. It would be cool if there were a tunable parameter that let
Neo4j flush to disk at some interval instead of after every create/update
statement.

Obviously you would lose any writes that had not yet reached disk if the
server crashed, but this could be mitigated through HA redundancy, and
because the interval is tunable, you could dial it up or down depending on
your requirements.
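
To make the idea concrete, here is a minimal sketch of what such a knob
might look like, assuming a simple append-only commit log. `DeferredLog`,
`flush_interval` and `simulate` are hypothetical names for illustration,
not Neo4j settings or APIs. A fake clock replaces real time so the number
of fsyncs is deterministic.

```python
import os
import tempfile
import time


class DeferredLog:
    """Hypothetical append-only commit log with a tunable flush interval.

    Not a Neo4j API. flush_interval=0 fsyncs on every commit (like Neo4j's
    normal mode); larger values trade durability of the most recent writes
    for fewer fsyncs.
    """

    def __init__(self, path, flush_interval=1.0, clock=time.monotonic):
        self._f = open(path, "ab")
        self.flush_interval = flush_interval
        self._clock = clock
        self._last_sync = clock()
        self.syncs = 0  # count of fsync calls, to show the amortization

    def commit(self, record: bytes) -> None:
        # The record is only buffered here (in-process and OS caches), so it
        # can be lost in a crash until the next sync.
        self._f.write(record + b"\n")
        if self._clock() - self._last_sync >= self.flush_interval:
            self.sync()

    def sync(self) -> None:
        self._f.flush()             # push Python's buffer to the OS
        os.fsync(self._f.fileno())  # force the OS to write it to disk
        self._last_sync = self._clock()
        self.syncs += 1

    def close(self) -> None:
        self.sync()
        self._f.close()


def simulate(flush_interval, writes=1000, step=0.125):
    """Replay `writes` commits spaced `step` simulated seconds apart."""
    now = [0.0]
    with tempfile.TemporaryDirectory() as d:
        log = DeferredLog(os.path.join(d, "tx.log"), flush_interval,
                          clock=lambda: now[0])
        for i in range(writes):
            now[0] += step
            log.commit(b"write %d" % i)
        log.close()
    return log.syncs


print(simulate(0.0))  # 1001: one fsync per commit, plus one on close
print(simulate(1.0))  # 126: at most one fsync per simulated second
```

The same 1000 writes cost 1001 fsyncs with the interval at zero and 126
with it at one second; everything written since the last sync is what you
would be betting on HA redundancy to cover.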

MongoDB does something similar, and it is reported that a single server can
do 20,000-30,000 writes per second
(http://www.dbms2.com/2011/04/04/the-mongodb-story/).

Here are some of the things Mongo does to make writes fast:

* A memory-mapped data model.
* Deferred writes: a write might take a couple of seconds to actually
persist.
* Optimism: you don't have to wait for an acknowledgement when you write
something to the database.
* "Upsert in place": update in place without checking whether you're doing
an update or an insert.
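
The last two points can be illustrated together in a few lines. This is a
hypothetical sketch of fire-and-forget writes plus upsert-style assignment,
not how MongoDB actually implements them: `write()` enqueues and returns
immediately with no acknowledgement, and a background thread applies each
write by plain assignment, which inserts or overwrites without a prior
lookup. `UnacknowledgedWriter` is an illustrative name.

```python
import queue
import threading


class UnacknowledgedWriter:
    """Hypothetical fire-and-forget writer (illustration only)."""

    def __init__(self):
        self.store = {}
        self._q = queue.Queue()
        self._t = threading.Thread(target=self._run, daemon=True)
        self._t.start()

    def write(self, key, value):
        # Returns immediately; the caller never learns if the write applied.
        self._q.put((key, value))

    def _run(self):
        while True:
            item = self._q.get()
            if item is None:  # sentinel from close()
                return
            key, value = item
            # Upsert: assignment inserts or overwrites, no existence check.
            self.store[key] = value

    def close(self):
        # Queue is FIFO, so every earlier write is applied before the sentinel.
        self._q.put(None)
        self._t.join()


w = UnacknowledgedWriter()
for i in range(100):
    w.write("synset:%d" % (i % 10), i)  # no ack, no wait
w.close()
print(len(w.store), w.store["synset:3"])  # 10 93
```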

What would it take for Neo4j to approach these levels?

Neo4j does memory-mapped I/O:

  http://wiki.neo4j.org/content/Configuration_Settings#Memory_mapped_I.2FO_settings

There has also been talk of adding optimistic locking:

  http://neo4j.org/forums/#nabble-td2891798
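
For anyone unfamiliar with the term, here is a minimal sketch of
version-based optimistic concurrency control, assuming each record carries
a version number. `VersionedStore`, `ConflictError` and `update_with_retry`
are hypothetical names, not Neo4j APIs. Reads take no lock; a write
succeeds only if the version is unchanged since it was read, and otherwise
the caller re-reads and retries.

```python
import threading


class ConflictError(Exception):
    pass


class VersionedStore:
    """Hypothetical optimistic-locking store (not a Neo4j API)."""

    def __init__(self):
        self._data = {}                 # key -> (version, value)
        self._latch = threading.Lock()  # held only for the compare-and-set

    def read(self, key):
        return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        with self._latch:
            current, _ = self._data.get(key, (0, None))
            if current != expected_version:
                raise ConflictError(
                    f"{key}: expected v{expected_version}, found v{current}")
            self._data[key] = (current + 1, value)
            return current + 1


def update_with_retry(store, key, fn, attempts=5):
    """Read, transform, and write back, retrying on version conflicts."""
    for _ in range(attempts):
        version, value = store.read(key)
        try:
            return store.write(key, version, fn(value))
        except ConflictError:
            continue
    raise ConflictError(f"{key}: gave up after {attempts} attempts")


s = VersionedStore()
version, _ = s.read("dog")                      # (0, None): not stored yet
print(s.write("dog", version, "domestic dog"))  # 1
try:
    s.write("dog", version, "stale edit")       # still holds version 0
except ConflictError as e:
    print("conflict:", e)
print(update_with_retry(s, "dog", lambda old: old + ", Canis familiaris"))  # 2
print(s.read("dog"))  # (2, 'domestic dog, Canis familiaris')
```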

And Peter has said that deferred writes are on the drawing board
(http://lists.neo4j.org/pipermail/user/2011-May/008792.html):

Peter Neubauer wrote:
> However, we are looking into Neo4j normal mode speedups by having a mode
> that drops the JTA dependencies and thus can relax on the logfile flushing
> requirements for each transaction, by that being able to use the underlying
> OS for ordered (deferred) writing, adjustable on a case-by-case level (e.g.
> batch inserting big data). This will give Neo4j insertions in this mode
> comparable performance with the batchinserter, while keeping all other
> semantics and layers in place. I hope this can make it into 1.4, and it will
> speed up the RDF insertion considerably!

Is support for optimistic locking and deferred writes planned for an
upcoming release?

Thanks.

- James

--
View this message in context: 
http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-Write-Performance-tp3323638p3323638.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
_______________________________________________
Neo4j mailing list
https://lists.neo4j.org/mailman/listinfo/user