Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?
Hi all,

Accessing a remote database for read-centric purposes should be done through HA. Even if we could bind a read-only local instance of EGD to a data store on disk, its caches would become out of sync with the on-disk store. HA avoids this because it is a proper protocol for synchronising databases.

Jim

On 2 Aug 2011, at 21:01, Utility Mail wrote:
[...]
Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?
Thank you all very much for your answers and ideas on this, especially for Tobias's detailed explanation. On the mailing list I sometimes see EROGD being recommended, e.g. a few days ago in another thread about multiple processes, so I think it would be good to mention alongside such recommendations that it is planned to be deprecated.

Regards,
Mathias

--
View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-Synchronization-of-EmbeddedReadOnlyGraphDatabase-Bug-tp3174626p3219409.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?
I agree! In my opinion, remote access to a live instance of a GD would be very welcome. Let me explain my current test case with Neo4j: I created an instance of EmbeddedGraphDatabase that continuously ingests CSV files coming from a polling service. At the same time I need a second service, independent of the first one, that queries the GD (to retrieve data, not to modify it). I tried EROGD, but the active index segment became corrupted! Even though EGD is thread-safe and I can create multiple threads sharing the same GD instance, what should I do when, as in my case, I need independent services (apps) accessing the EGD at the same time?

Paolo Forte

P.S. I'm not sure this is related, and it is certainly a gap in my knowledge of webadmin, but how can I monitor my EGD's status (number of nodes, edges, etc.) via webadmin while it is ingesting new data?

On 01/Aug/2011, at 20:52, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote:
[...]
Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?
Hi Mathias,

EmbeddedReadOnlyGraphDatabase is not quite what it seems, and I think it should be deprecated/removed. The correct way for database instances to become consistent is through the HA protocol.

Jim
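To make the HA suggestion a little more concrete, here is a toy Java model of the general idea behind master/slave replication: each instance keeps its own private copy of the data and catches up by pulling committed transactions from the master in commit order, instead of several processes sharing one set of store files. This is a minimal sketch under stated assumptions, not Neo4j's actual HA protocol; Master, Replica and HaToyDemo are hypothetical names, and a transaction is modelled as a plain key/value map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy master: applies each committed transaction locally and
// appends it to an ordered transaction log.
final class Master {
    private final Map<Long, String> store = new HashMap<>();
    private final List<Map<Long, String>> txLog = new ArrayList<>();

    void commit(Map<Long, String> writes) {
        store.putAll(writes);
        txLog.add(new HashMap<>(writes));
    }

    String read(long key) {
        return store.get(key);
    }

    int lastTxId() {
        return txLog.size();
    }

    // All transactions committed after txId, in commit order.
    List<Map<Long, String>> txsSince(int txId) {
        return new ArrayList<>(txLog.subList(txId, txLog.size()));
    }
}

// Hypothetical toy replica: owns a private copy of the data and catches up
// by pulling and applying the transactions it has not seen yet.
final class Replica {
    private final Map<Long, String> store = new HashMap<>();
    private int appliedTxId = 0;

    void pullUpdates(Master master) {
        for (Map<Long, String> tx : master.txsSince(appliedTxId)) {
            store.putAll(tx);
            appliedTxId++;
        }
    }

    String read(long key) {
        return store.get(key);
    }

    int appliedTxId() {
        return appliedTxId;
    }
}

final class HaToyDemo {
    public static void main(String[] args) {
        Master master = new Master();
        Replica replica = new Replica();
        master.commit(Map.of(9180L, "Relationship[9180]"));
        System.out.println(replica.read(9180L)); // null: not pulled yet
        replica.pullUpdates(master);
        System.out.println(replica.read(9180L)); // Relationship[9180]
    }
}
```

In this toy the replica never reads the master's files, so there is no shared on-disk state for it to misread; at worst it lags behind until its next pull.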
Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?
I think a bit of elaboration might be in order.

EmbeddedReadOnlyGraphDatabase was created for one specific purpose: being able to interactively introspect a graph without having to shut down the application that uses it. Specifically, the tools we wanted to support with this were the Neo4j shell and Neoclipse.

EmbeddedReadOnlyGraphDatabase (EROGD) has two major issues with the way caching is done internally in Neo4j (one issue with each cache):

- When the EROGD reads data from the file system it will, like a normal EGD, cache the node and relationship objects. If a normal EGD modifies the graph under the feet of the EROGD, there is no way for the EROGD to know that the data in its cache is now stale, which leads to an inconsistent view of the graph. If, for example, the EROGD has cached Node[15] with the information that it is connected to some other node through Relationship[344], and Relationship[344] is deleted, you will get an InvalidRecordException (as you described). And of course, if relationships are added to Node[15], these will not be seen at all by the EROGD (until Node[15] is evicted from the cache due to not being used for a while).
- Neo4j also caches data at the file system level by memory mapping (mmap) hot regions of the store files. Writes to these regions will not be flushed to the actual file until the mmapped window is evicted for being less hot than other windows, or until the Neo4j transaction log is rotated. This means that, from the point of view of the EROGD, the data actually written to disk will look inconsistent, which would also lead to InvalidRecordException. The situation is made even more complicated by the fact that Unix operating systems will attempt to share memory-mapped data from the same file between multiple processes, but the normal EGD and the EROGD will not make the same decisions on which regions to mmap; they might not even decide on the same size for mmap windows. We haven't tested how well different operating systems deal with reading data that was written to an mmap region through non-mmap syscalls from a different process; most likely this varies from OS to OS.

The second of these problems is of course the worst, since it cannot be worked around. The first one can be mitigated by configuring Neo4j not to use the object cache, by passing the cache_type=none parameter to the constructor of the EROGD. This should really be made the default for EROGD, unless we decide to remove EROGD completely.

I hope that sheds some light on why you experience these problems with EmbeddedReadOnlyGraphDatabase, and on what it was originally created for.

As a side note, at the time EmbeddedReadOnlyGraphDatabase was created I had a different idea for how to solve the introspection-of-a-live-graph problem: create a network-based implementation of the GraphDatabaseService API and connect directly to the running instance. This would completely avoid the cache staleness problem, at the cost of network overhead for each graph operation, which is probably fine for tooling purposes. With the JVM agent attach protocol it would be possible to inject such a server into a running graph database that wasn't originally configured for it. I in fact implemented this as the RemoteGraphDatabase subproject. Since my colleagues did not share my vision for that idea, the project didn't receive much attention after its initial inception. It was also never really used for these purposes, but was instead misused for building applications, which led us to deprecate the project. When we later discovered a severe bug in the remote transaction handling logic, we removed the project completely. I still believe this to be a superior model for tools, but I would build it differently if I were to build it today.
-tobias

On Mon, Aug 1, 2011 at 4:48 PM, Jim Webber j...@neotechnology.com wrote:
[...]

--
Tobias Ivarsson tobias.ivars...@neotechnology.com
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
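The first problem Tobias describes, and why cache_type=none helps with it, can be illustrated with a small self-contained Java toy. ToyStore, ToyReader and StaleCacheDemo are hypothetical names; the shared ToyStore stands in for store files that are assumed to always be fully up to date (the toy deliberately ignores the second, mmap-related problem), and IllegalStateException stands in for Neo4j's InvalidRecordException. A reader with an object cache keeps serving Node[15]'s old relationship list, while a reader that always reads through (analogous to running with cache_type=none) sees the current state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the store files, assumed to be always up to date.
final class ToyStore {
    final Map<Long, List<Long>> nodeRels = new HashMap<>();
    final Set<Long> relsInUse = new HashSet<>();

    void addRel(long node, long rel) {
        nodeRels.computeIfAbsent(node, n -> new ArrayList<>()).add(rel);
        relsInUse.add(rel);
    }

    void deleteRel(long node, long rel) {
        List<Long> rels = nodeRels.get(node);
        if (rels != null) {
            rels.remove(Long.valueOf(rel)); // remove by value, not by index
        }
        relsInUse.remove(rel);
    }
}

// Hypothetical reader; useObjectCache=false plays the role of cache_type=none.
final class ToyReader {
    private final ToyStore store;
    private final boolean useObjectCache;
    private final Map<Long, List<Long>> objectCache = new HashMap<>();

    ToyReader(ToyStore store, boolean useObjectCache) {
        this.store = store;
        this.useObjectCache = useObjectCache;
    }

    List<Long> relationshipsOf(long node) {
        if (useObjectCache && objectCache.containsKey(node)) {
            return objectCache.get(node); // may be stale: nothing tells us about writes
        }
        List<Long> rels = List.copyOf(store.nodeRels.getOrDefault(node, List.of()));
        if (useObjectCache) {
            objectCache.put(node, rels);
        }
        return rels;
    }

    // Loading a relationship record that is no longer in use fails,
    // mimicking "Record[n] not in use".
    String loadRelationship(long rel) {
        if (!store.relsInUse.contains(rel)) {
            throw new IllegalStateException("Record[" + rel + "] not in use");
        }
        return "Relationship[" + rel + "]";
    }
}

final class StaleCacheDemo {
    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.addRel(15L, 344L);
        ToyReader cached = new ToyReader(store, true);
        ToyReader uncached = new ToyReader(store, false);
        cached.relationshipsOf(15L); // warm the object cache

        store.deleteRel(15L, 344L); // the writer deletes Relationship[344]
        System.out.println(cached.relationshipsOf(15L));   // [344] (stale)
        System.out.println(uncached.relationshipsOf(15L)); // []
        try {
            cached.loadRelationship(344L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Record[344] not in use
        }
    }
}
```

Because the toy assumes the store itself is always consistent, the read-through reader looks fully correct here; with real mmapped store files it could still observe half-flushed data, which is why the message above describes cache_type=none as a mitigation rather than a fix.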
Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?
FWIW, I really like the idea of a remote API, not only for tooling but also as an alternative to bulkier and more abstracted REST APIs. One way to think of it could be as a high-performance, binary-formatted REST API, provided the overhead of HTTP isn't too much (and I seriously doubt that it would be: I implemented a solution at my previous company, Lighthammer, using HTTP plus a binary stream protocol, and it was considerably faster than native binary protocols for databases and other similar scenarios in most if not all use cases). This API would let you view a running instance from the same point of view as another thread accessing the same embedded instance.

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Tobias Ivarsson
Sent: Monday, August 01, 2011 2:53 PM
To: Neo4j user discussions
Subject: Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?

[...]
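A binary request/response format of the kind suggested above could look roughly like the following Java sketch, which uses only DataOutputStream and DataInputStream from the JDK. The wire format (a one-byte opcode plus a node id, answered by a count plus relationship ids), the opcode value and the ToyWire/ToyWireDemo names are all hypothetical and do not correspond to RemoteGraphDatabase or to any Neo4j protocol; the transport (HTTP or otherwise) is left out. What it shows is that every read is answered by the single live instance, so no client process reads the store files itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical wire format, for illustration only:
//   request  = [opcode: 1 byte][nodeId: 8 bytes]
//   response = [count: 4 bytes][relationshipId: 8 bytes] x count
final class ToyWire {
    static final byte OP_GET_RELATIONSHIPS = 1;

    static byte[] encodeRequest(long nodeId) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(OP_GET_RELATIONSHIPS);
            out.writeLong(nodeId);
        }
        return bytes.toByteArray();
    }

    static long decodeRequest(byte[] frame) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame))) {
            byte opcode = in.readByte();
            if (opcode != OP_GET_RELATIONSHIPS) {
                throw new IOException("unknown opcode " + opcode);
            }
            return in.readLong();
        }
    }

    static byte[] encodeResponse(List<Long> relationshipIds) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(relationshipIds.size());
            for (long id : relationshipIds) {
                out.writeLong(id);
            }
        }
        return bytes.toByteArray();
    }

    static List<Long> decodeResponse(byte[] frame) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame))) {
            int count = in.readInt();
            if (count < 0) {
                throw new IOException("negative count " + count);
            }
            List<Long> ids = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                ids.add(in.readLong());
            }
            return ids;
        }
    }

    // Server side: every request is answered from the single live instance,
    // represented here by a plain map from node id to relationship ids.
    static byte[] handle(byte[] request, Map<Long, List<Long>> liveGraph) throws IOException {
        long nodeId = decodeRequest(request);
        return encodeResponse(liveGraph.getOrDefault(nodeId, List.of()));
    }
}

final class ToyWireDemo {
    public static void main(String[] args) throws IOException {
        Map<Long, List<Long>> liveGraph = Map.of(15L, List.of(344L, 9180L));
        byte[] request = ToyWire.encodeRequest(15L);           // 1 + 8 = 9 bytes
        byte[] response = ToyWire.handle(request, liveGraph);  // 4 + 2 * 8 = 20 bytes
        System.out.println(ToyWire.decodeResponse(response)); // [344, 9180]
    }
}
```

A fixed-width binary layout like this keeps per-call overhead small, which matters if, as in the RemoteGraphDatabase idea, every graph operation becomes a network round trip.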
Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?
Hello Neo4j team,

has anyone looked into this issue? Any insights?

Thanks again,
Mathias
Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?
Hello Jim,

thanks for your reply. Yes, this is absolutely correct.

Mathias

Date: Sat, 16 Jul 2011 08:46:57 -0600
From: Jim Webber j...@neotechnology.com
Subject: Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?
[...]
[Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?
Hello,

I am trying to use Neo4j in a Ruby on Rails application (MRI Ruby, not JRuby). Because of Rails' process-based model, I run one instance of EmbeddedGraphDatabase in a separate process. All write updates coming in from user actions are delegated to this process; the EmbeddedGraphDatabase there serves purely as the writable database. All reads are handled directly in the web app through multiple instances of EmbeddedReadOnlyGraphDatabase (one instance per web server process).

Unfortunately, I have encountered the following problem: when I add a new relationship to a node (via the EmbeddedGraphDatabase), this relationship is not visible to the EmbeddedReadOnlyGraphDatabase. I can reopen the EmbeddedReadOnlyGraphDatabase from time to time, or even on every request, but this ends in an InvalidRecordException: Record[9180] not in use when trying to traverse the node or get its relationships. 9180 is the newly created relationship. Only when I restart the EmbeddedGraphDatabase does this relationship become visible to the EmbeddedReadOnlyGraphDatabase without any exceptions, but restarting shouldn't be necessary.

Is this a bug, or is there an explicit way to synchronize both types of database instances?

Thank you very much!

Regards,
Mathias
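The Record[9180] not in use symptom is consistent with the mmap issue Tobias describes elsewhere in this thread. The toy Java model below assumes, purely for illustration, that each store file consists of what has reached disk plus the writer process's unflushed mmap window, and that a freshly reopened read-only instance (with an empty object cache) sees only what has reached disk. ToyStoreFile, PartialFlushDemo and the record layout (a node record that stores just its first relationship id) are hypothetical; IllegalStateException stands in for InvalidRecordException. If the node store's window happens to be flushed while the relationship store's is not, the reader follows a pointer to a relationship record that is not yet in use on disk.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of one store file: records the writer has put into its
// mmap window, and the subset that has actually been flushed to disk.
final class ToyStoreFile<V> {
    private final Map<Long, V> disk = new HashMap<>();
    private final Map<Long, V> dirtyWindow = new HashMap<>();

    // Writer process: the write lands in its mmap window only.
    void write(long id, V record) {
        dirtyWindow.put(id, record);
    }

    // Window eviction or log rotation: dirty records reach the file.
    void flushWindow() {
        disk.putAll(dirtyWindow);
        dirtyWindow.clear();
    }

    // Assumption: another process only sees what has been flushed.
    V readFromDisk(long id) {
        return disk.get(id);
    }
}

final class PartialFlushDemo {
    // A freshly opened reader with no object cache: node record -> relationship record.
    static String firstRelationshipOf(ToyStoreFile<Long> nodeStore,
                                      ToyStoreFile<String> relStore, long nodeId) {
        Long relId = nodeStore.readFromDisk(nodeId);
        if (relId == null) {
            throw new IllegalStateException("Record[" + nodeId + "] not in use");
        }
        String relationship = relStore.readFromDisk(relId);
        if (relationship == null) {
            throw new IllegalStateException("Record[" + relId + "] not in use");
        }
        return relationship;
    }

    public static void main(String[] args) {
        ToyStoreFile<Long> nodeStore = new ToyStoreFile<>();
        ToyStoreFile<String> relStore = new ToyStoreFile<>();

        // Writer: Node[15] now points at the new Relationship[9180].
        nodeStore.write(15L, 9180L);
        relStore.write(9180L, "Relationship[9180]");

        // Only the node store's window happens to be flushed.
        nodeStore.flushWindow();
        try {
            firstRelationshipOf(nodeStore, relStore, 15L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Record[9180] not in use
        }

        // Once the relationship store is flushed too, the read succeeds.
        relStore.flushWindow();
        System.out.println(firstRelationshipOf(nodeStore, relStore, 15L)); // Relationship[9180]
    }
}
```

In this toy, reopening the reader does not help, because the inconsistency is in what has reached disk rather than in the reader's cache; only a flush on the writer side (here, flushWindow) makes the read succeed.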
Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?
Hi Mathias,

If I understand you correctly, you're pointing two database instances (one being read-only) at the same on-disk location. Is that correct?

Jim

On 16 Jul 2011, at 07:37, Mathias Hensel wrote:
[...]