Hi Guys - I have been working on loading WordNet (http://wordnet.princeton.edu/) into Neo4j, and have been using it as an opportunity to tune write performance on Linux for a Web application I am developing.

My initial idea was to load WordNet RDF (http://semanticweb.cs.vu.nl/lod/wn30/) through the Blueprints SailGraph interface, but then I decided to use NLTK (http://www.nltk.org) and load it directly from Bulbs into Rexster.

Stephen recently added batch transactions to Rexster (https://github.com/tinkerpop/rexster-kibbles/tree/master/batch-kibble), but right now I am not using them because I want to see what kind of write performance you can get in non-batch mode.

The Neo4j performance guides were helpful:

* http://wiki.neo4j.org/content/Performance_Guide
* http://wiki.neo4j.org/content/Linux_Performance_Guide
* http://wiki.neo4j.org/content/Configuration_Settings

So were Peter's and Tobias's recommendations to put Neo4j transactions in manual mode (https://groups.google.com/d/msg/gremlin-users/vl4IZO7O8H4/20Yc4rUObNcJ) so you don't have to flush to disk for each write.

However, manual/batch modes are not practical for writes in a Web application, where each request is typically its own small write. It would be cool if there were a tunable parameter that let you set Neo4j to flush to disk at some interval instead of after every create/update statement. Obviously you would lose the unflushed writes if the server crashed before they were written to disk, but this could be mitigated through HA redundancy, and because it's a tunable parameter, you could dial it up or down depending on your requirements.

MongoDB does something similar, and it is reported that a single server can do 20,000-30,000 writes per second (http://www.dbms2.com/2011/04/04/the-mongodb-story/).

Here are some of the things Mongo does to make writes fast:

* A memory-mapped data model.
* Deferred writes: a write might take a couple of seconds to actually persist.
* Optimism: you don't have to wait for an acknowledgement if you write something to the database.
* "Upsert in place": update in place without checking whether you're doing an update or an insert.

What would it take for Neo4j to approach these levels?
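
To make the interval-flush idea concrete, here is a rough Python sketch of the kind of tunable I have in mind. It is purely illustrative and is not Neo4j or MongoDB code: the class name `DeferredLog` and the `flush_interval` and `clock` parameters are made up for this example. Writes are appended to a log file, but the expensive `fsync` happens at most once per interval rather than once per write.

```python
import os
import time


class DeferredLog:
    """Append-only log that forces data to disk at most once per interval.

    Hypothetical illustration only; not an API of Neo4j or MongoDB.
    """

    def __init__(self, path, flush_interval=1.0, clock=time.monotonic):
        self._file = open(path, "ab")
        self._flush_interval = flush_interval  # seconds between forced syncs
        self._clock = clock                    # injectable so it can be tested
        self._last_sync = clock()
        self.pending = 0      # writes accepted but not yet forced to disk
        self.sync_count = 0   # number of fsync calls made so far

    def write(self, record):
        """Append one record; sync only if the interval has elapsed."""
        self._file.write(record + b"\n")
        self.pending += 1
        # The interval is checked only when a write arrives, so a trailing
        # batch stays unsynced until the next write() or close().
        if self._clock() - self._last_sync >= self._flush_interval:
            self.sync()

    def sync(self):
        """Flush Python's buffer and ask the OS to persist the file."""
        self._file.flush()
        os.fsync(self._file.fileno())
        self._last_sync = self._clock()
        self.pending = 0
        self.sync_count += 1

    def close(self):
        """Persist anything still pending, then close the file."""
        self.sync()
        self._file.close()
```

With `flush_interval=0` every write is synced, which is the flush-per-write behaviour; raising it trades durability for throughput, and `pending` shows how many writes would be lost if the process died at that moment.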

Neo4j does memory-mapped IO:
http://wiki.neo4j.org/content/Configuration_Settings#Memory_mapped_I.2FO_settings

There have been talks about adding optimistic locking:
http://neo4j.org/forums/#nabble-td2891798

And Peter has said that deferred writes are on the drawing board (http://lists.neo4j.org/pipermail/user/2011-May/008792.html):

Peter Neubauer wrote:
> However, we are looking into Neo4j normal mode speedups by having a mode
> that drops the JTA dependencies and thus can relax on the logfile flushing
> requirements for each transaction, by that being able to use the underlying
> OS for ordered (deferred) writing, adjustable on a case-by-case level (e.g.
> batch inserting big data). This will give Neo4j insertions in this mode
> comparable performance with the batchinserter, while keeping all other
> semantics and layers in place. I hope this can make it into 1.4, and it will
> speed up the RDF insertion considerably!

Is support for optimistic locking and deferred writes planned for an upcoming release?

Thanks.

- James

