Re: [Neo4j] Neo4j Write Performance

2011-09-24 Thread Mattias Persson
For the record, that branch is outdated and not working correctly in HA
mode.

2011/9/12 Peter Neubauer pe...@neubauer.se

 James,
 we are experimenting with that feature, namely not forcing a flush()
 at the end of a transaction and letting the OS take care of the actual
 flushing. You may lose data from the most recent transactions, but
 the store will still recover and will not get corrupted.
 Mattias has been testing this in the ordered-writes branch
 (https://github.com/neo4j/community/tree/ordered-writes). It still
 needs to be fleshed out so that these settings can be controlled per
 transaction. I think it will not make it into 1.5 unless someone in
 the community steps up and puts in the effort to expose it. But feel
 free to try it out and give feedback on your findings!

 /peter





-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Neo4j Write Performance

2011-09-12 Thread Peter Neubauer
James,
we are experimenting with that feature, namely not forcing a flush()
at the end of a transaction and letting the OS take care of the actual
flushing. You may lose data from the most recent transactions, but
the store will still recover and will not get corrupted.
Mattias has been testing this in the ordered-writes branch
(https://github.com/neo4j/community/tree/ordered-writes). It still
needs to be fleshed out so that these settings can be controlled per
transaction. I think it will not make it into 1.5 unless someone in
the community steps up and puts in the effort to expose it. But feel
free to try it out and give feedback on your findings!
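A minimal Java sketch of the trade-off Peter describes, assuming a simplified append-only transaction log. The class SketchTxLog, its forceOnCommit flag and the demo helper are hypothetical; this is not the code in the ordered-writes branch. With the flag on, each commit calls FileChannel.force before returning; with it off, the bytes only reach the OS page cache and the OS writes them back later, so an OS crash or power loss (not merely a JVM crash) can lose the most recent commits.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical, simplified transaction log; NOT Neo4j's real log implementation.
class SketchTxLog implements AutoCloseable {
    private final FileChannel channel;
    private final boolean forceOnCommit; // hypothetical per-log setting
    private int forceCount = 0;          // how often we asked the OS to sync the log

    SketchTxLog(Path file, boolean forceOnCommit) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.forceOnCommit = forceOnCommit;
    }

    void commit(String txRecord) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((txRecord + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf); // the bytes land in the OS page cache
        }
        if (forceOnCommit) {
            channel.force(false); // ask the OS to put the data on the storage device now
            forceCount++;
        }
        // Otherwise the OS writes the page cache back whenever it chooses; an OS crash
        // or power loss before that point loses the most recent commits.
    }

    int forceCount() {
        return forceCount;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    // Writes n commits to a forcing and a non-forcing log.
    // Returns {forces (forcing log), forces (non-forcing log), lines in non-forcing log}.
    static int[] demo(int n) {
        try {
            Path dir = Files.createTempDirectory("txlog-sketch");
            Path fastPath = dir.resolve("fast.log");
            int safeForces = 0;
            int fastForces = 0;
            try (SketchTxLog safe = new SketchTxLog(dir.resolve("safe.log"), true);
                 SketchTxLog fast = new SketchTxLog(fastPath, false)) {
                for (int i = 0; i < n; i++) {
                    safe.commit("tx" + i);
                    fast.commit("tx" + i);
                }
                safeForces = safe.forceCount();
                fastForces = fast.forceCount();
            }
            int fastLines = Files.readAllLines(fastPath).size();
            return new int[] {safeForces, fastForces, fastLines};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

SketchTxLog.demo(5) returns {5, 0, 5}: both logs contain all five records (readable through the page cache), but only the first one was forced to disk after each commit.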

/peter



Re: [Neo4j] Neo4j Write Performance

2011-09-11 Thread espeed
I added a ticket for this here...

https://github.com/neo4j/community/issues/18

--
View this message in context: 
http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-Write-Performance-tp3323638p3327618.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.


[Neo4j] Neo4j Write Performance

2011-09-09 Thread espeed
Hi Guys -

I have been working on loading WordNet (http://wordnet.princeton.edu/) into
Neo4j, and have been using it as an opportunity to tune write performance on
Linux for a Web application I am developing. 

My initial idea was to load WordNet RDF
(http://semanticweb.cs.vu.nl/lod/wn30/) through the Blueprints SailGraph
interface, but then I decided to use NLTK (http://www.nltk.org) and load it
directly from Bulbs into Rexster.

Stephen recently added batch transactions to Rexster
(https://github.com/tinkerpop/rexster-kibbles/tree/master/batch-kibble), but
right now I am not using them because I want to see what type of write
performance you can get in non-batch mode.

The Neo4j performance guides were helpful:

* http://wiki.neo4j.org/content/Performance_Guide
* http://wiki.neo4j.org/content/Linux_Performance_Guide
* http://wiki.neo4j.org/content/Configuration_Settings

So were Peter and Tobias's recommendations to put Neo4j transactions in
manual mode
(https://groups.google.com/d/msg/gremlin-users/vl4IZO7O8H4/20Yc4rUObNcJ), so
that you don't have to flush to disk on every write.
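To see why manual mode helps, here is a toy Java model, assuming exactly one forced log flush per committed transaction. SimulatedStore and its methods are hypothetical and perform no real I/O; they only count flushes. In Neo4j's embedded Java API of this era, grouping writes corresponds to doing many operations inside a single beginTx() ... success()/finish() transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a store where every commit costs one forced log flush.
// It only counts flushes; no real disk I/O happens here.
class SimulatedStore {
    private final List<String> pending = new ArrayList<>();
    private int committed = 0;
    private int forces = 0;

    void write(String op) {
        pending.add(op);
    }

    void commit() {
        if (pending.isEmpty()) {
            return; // nothing to make durable, so no flush
        }
        committed += pending.size();
        pending.clear();
        forces++; // stands in for forcing the transaction log to disk
    }

    int forces() {
        return forces;
    }

    int committed() {
        return committed;
    }

    // "Automatic" mode: every write is its own transaction.
    static SimulatedStore loadAutomatic(int writes) {
        SimulatedStore store = new SimulatedStore();
        for (int i = 0; i < writes; i++) {
            store.write("op" + i);
            store.commit();
        }
        return store;
    }

    // "Manual" mode: the caller groups writes into transactions of batchSize.
    static SimulatedStore loadManual(int writes, int batchSize) {
        SimulatedStore store = new SimulatedStore();
        for (int i = 0; i < writes; i++) {
            store.write("op" + i);
            if ((i + 1) % batchSize == 0) {
                store.commit();
            }
        }
        store.commit(); // commit the final partial batch, if any
        return store;
    }
}
```

For 1,000 writes, automatic mode pays for 1,000 flushes while batches of 100 pay for 10. The trade-off is that nothing in a batch is durable until the whole batch commits, which is awkward when each Web request must be acknowledged on its own.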

However, manual/batch modes are not practical for writes in a Web
application. It would be cool if there were a tunable parameter that let
you set Neo4j to flush to disk at some interval instead of after every
create/update statement.

Obviously you would have a problem if the server crashed before the data
was written to disk, but this could be mitigated through HA redundancy, and
because it's a tunable parameter, you could dial it up or down depending on
your requirements.
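The proposed setting could look roughly like this Java sketch, assuming a hypothetical flushIntervalMillis knob (an illustrative name, not an actual Neo4j configuration setting) and an injectable clock so the behaviour is deterministic. Commits always append to the log, but FileChannel.force is only called once the interval has elapsed; commitsAtRisk() reports how many commits an OS crash would currently lose, and an interval of 0 falls back to forcing on every commit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.LongSupplier;

// Hypothetical log with a tunable flush interval. "flushIntervalMillis" is an
// illustrative name, NOT a real Neo4j setting. An interval of 0 forces on every commit.
class IntervalFlushLog implements AutoCloseable {
    private final FileChannel channel;
    private final long flushIntervalMillis;
    private final LongSupplier clock; // injected so the behaviour is deterministic in tests
    private long lastForceMillis;
    private int unforcedCommits = 0;
    private int forceCount = 0;

    IntervalFlushLog(Path file, long flushIntervalMillis, LongSupplier clock) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.flushIntervalMillis = flushIntervalMillis;
        this.clock = clock;
        this.lastForceMillis = clock.getAsLong();
    }

    void commit(String record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf); // cheap: goes to the OS page cache
        }
        unforcedCommits++;
        flushIfDue();
    }

    // Forces the log only if the interval has elapsed since the last force.
    void flushIfDue() throws IOException {
        long now = clock.getAsLong();
        if (unforcedCommits > 0 && now - lastForceMillis >= flushIntervalMillis) {
            channel.force(false);
            forceCount++;
            unforcedCommits = 0;
            lastForceMillis = now;
        }
    }

    int forceCount() {
        return forceCount;
    }

    // Commits that an OS crash or power loss right now could lose.
    int commitsAtRisk() {
        return unforcedCommits;
    }

    @Override
    public void close() throws IOException {
        if (unforcedCommits > 0) {
            channel.force(false); // a clean shutdown flushes whatever is still pending
            forceCount++;
            unforcedCommits = 0;
        }
        channel.close();
    }

    // Returns {forces after 10 commits inside one 1000 ms interval, commits at risk then,
    //          forces after one more commit at t = 1000 ms, forces with interval 0}.
    static int[] demo() {
        try {
            Path dir = Files.createTempDirectory("interval-flush-sketch");
            long[] now = {0L};
            int[] result = new int[4];
            try (IntervalFlushLog log = new IntervalFlushLog(dir.resolve("a.log"), 1000, () -> now[0])) {
                for (int i = 0; i < 10; i++) {
                    now[0] = i * 100L; // commits at t = 0, 100, ..., 900 ms
                    log.commit("tx" + i);
                }
                result[0] = log.forceCount();
                result[1] = log.commitsAtRisk();
                now[0] = 1000L;
                log.commit("tx10");
                result[2] = log.forceCount();
            }
            try (IntervalFlushLog log = new IntervalFlushLog(dir.resolve("b.log"), 0, () -> now[0])) {
                for (int i = 0; i < 10; i++) {
                    log.commit("tx" + i);
                }
                result[3] = log.forceCount();
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real server, flushIfDue() would also need to be driven by a background timer; otherwise a quiet period after a burst of writes would leave the last commits unflushed indefinitely.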

MongoDB does something similar, and it is reported that a single server can
do 20,000 to 30,000 writes per second
(http://www.dbms2.com/2011/04/04/the-mongodb-story/).

Here are some of the things Mongo does to make writes fast:

* A memory-mapped data model.
* Deferred writes — a write might take a couple of seconds to actually
persist.
* Optimism — you don’t have to wait for an acknowledgement if you write
something to the database.
* “Upsert in place” – update in place without checking whether you’re doing
a write or insert.
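The "upsert in place" idea from the list above, sketched in Java over a hypothetical in-memory document map. UpsertSketch is illustrative only and is neither MongoDB's nor Neo4j's API; the ids such as "dog.n.01" are just WordNet-style example keys. The caller issues one operation and the store decides whether that means an insert or an update.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory document store keyed by id; not MongoDB's or Neo4j's API.
class UpsertSketch {
    private final Map<String, Map<String, Object>> docs = new HashMap<>();

    // Set a field on document `id`, creating the document if it does not exist yet.
    // The caller never checks first whether this is an insert or an update.
    void upsert(String id, String field, Object value) {
        docs.computeIfAbsent(id, k -> new HashMap<>()).put(field, value);
    }

    // Counter-style upsert: a missing counter starts at delta, an existing one is incremented.
    void increment(String id, String field, int delta) {
        docs.computeIfAbsent(id, k -> new HashMap<>())
            .merge(field, delta, (oldValue, d) -> (Integer) oldValue + (Integer) d);
    }

    Object get(String id, String field) {
        Map<String, Object> doc = docs.get(id);
        return doc == null ? null : doc.get(field);
    }

    int size() {
        return docs.size();
    }
}
```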

What would it take for Neo4j to approach these levels?

Neo4j does memory-mapped IO:

http://wiki.neo4j.org/content/Configuration_Settings#Memory_mapped_I.2FO_settings
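For readers unfamiliar with memory-mapped I/O, here is a minimal Java sketch of the general technique, assuming a hypothetical fixed-size record layout. MappedRecordFile and its 9-byte records are illustrative and are not Neo4j's store format. Reads and writes become plain memory accesses on a MappedByteBuffer; the OS pages data in and out, and force() asks it to write dirty pages back now.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical fixed-size record file accessed through a memory mapping.
// Illustrates the general technique only; this is NOT Neo4j's store format.
class MappedRecordFile implements AutoCloseable {
    static final int RECORD_SIZE = 9; // 1 "in use" byte + one 8-byte long (illustrative layout)
    private final RandomAccessFile file;
    private final MappedByteBuffer map;

    MappedRecordFile(Path path, int records) throws IOException {
        long size = (long) records * RECORD_SIZE;
        file = new RandomAccessFile(path.toFile(), "rw");
        if (file.length() < size) {
            file.setLength(size); // make sure the whole mapped region exists in the file
        }
        map = file.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, size);
    }

    void write(int id, long value) {
        int offset = id * RECORD_SIZE;
        map.put(offset, (byte) 1);      // absolute puts: plain memory stores, no write() call
        map.putLong(offset + 1, value);
    }

    long read(int id) {
        int offset = id * RECORD_SIZE;
        return map.get(offset) == 1 ? map.getLong(offset + 1) : -1L; // -1 means "not in use"
    }

    void flush() {
        map.force(); // ask the OS to write dirty mapped pages back to disk now
    }

    @Override
    public void close() throws IOException {
        file.close(); // the mapping itself stays valid until the buffer is garbage collected
    }

    // Writes two records, closes, reopens, and reads them back plus one unused record.
    static long[] demo() {
        try {
            Path path = Files.createTempFile("mapped-sketch", ".db");
            try (MappedRecordFile f = new MappedRecordFile(path, 100)) {
                f.write(3, 42L);
                f.write(7, 1234567890123L);
                f.flush();
            }
            try (MappedRecordFile f = new MappedRecordFile(path, 100)) {
                return new long[] {f.read(3), f.read(7), f.read(4)};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reopening the file in the same process would show the data even without force(), because both mappings share the OS page cache; force() is what matters for surviving an OS crash or power loss.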

There have been discussions about adding optimistic locking:

  http://neo4j.org/forums/#nabble-td2891798
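Optimistic locking generally means reading a version, doing the work without holding a lock, and committing only if the version is still unchanged. A minimal Java sketch of that pattern, using AtomicReference.compareAndSet as the commit-time check; the OptimisticCell and Versioned classes are hypothetical and not a Neo4j API.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical immutable (value, version) pair; not a Neo4j API.
final class Versioned<T> {
    final T value;
    final long version;

    Versioned(T value, long version) {
        this.value = value;
        this.version = version;
    }
}

// Hypothetical optimistically-locked value: no lock is held between read and commit.
class OptimisticCell<T> {
    private final AtomicReference<Versioned<T>> ref;

    OptimisticCell(T initial) {
        ref = new AtomicReference<>(new Versioned<>(initial, 0L));
    }

    Versioned<T> read() {
        return ref.get();
    }

    // Commits only if nobody else committed since `seen` was read (reference comparison);
    // returns false on a conflict so the caller can re-read and retry.
    boolean tryUpdate(Versioned<T> seen, T newValue) {
        return ref.compareAndSet(seen, new Versioned<>(newValue, seen.version + 1));
    }

    // Typical usage pattern: read, compute, try to commit, retry on conflict.
    T updateWithRetry(UnaryOperator<T> change) {
        while (true) {
            Versioned<T> seen = read();
            T next = change.apply(seen.value);
            if (tryUpdate(seen, next)) {
                return next;
            }
        }
    }
}
```

The design choice is that conflicts are detected at commit time rather than prevented up front, which pays off when concurrent writers rarely touch the same data.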

And Peter has said that deferred writes are on the drawing board
(http://lists.neo4j.org/pipermail/user/2011-May/008792.html):


Peter Neubauer wrote:

 However, we are looking into Neo4j normal mode speedups by having a mode
 that drops the JTA dependencies and thus can relax on the logfile flushing
 requirements for each transaction, by that being able to use the
 underlying OS for ordered (deferred) writing, adjustable on a
 case-by-case level (e.g. batch inserting big data). This will give Neo4j
 insertions in this mode comparable performance with the batchinserter,
 while keeping all other semantics and layers in place. I hope this can
 make it into 1.4, and it will speed up the RDF insertion considerably!

Is support for optimistic locking and deferred writes planned for an
upcoming release?

Thanks.

- James
