Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?

2011-08-03 Thread Jim Webber
Hi all,

Accessing a remote database for read-centric purposes should be done through 
HA. Even if we could bind a read-only local instance of EGD to a data store on 
disk, the caching will become out of sync with respect to the on-disk store. 

HA avoids this because it's a proper protocol for synchronising databases.

Jim



Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?

2011-08-02 Thread Mathias Hensel
Thank you all very much for your answers and ideas on this, and especially
Tobias for his detailed elaboration. On the mailing list I sometimes see
EROGD recommended, for example a few days ago in another thread about
multiple processes, so I think it would be good to mention at the same time
that it is planned to be deprecated.

Regards,
Mathias

--
View this message in context: 
http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-Synchronization-of-EmbeddedReadOnlyGraphDatabase-Bug-tp3174626p3219409.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?

2011-08-02 Thread Utility Mail
I agree!
In my opinion, remote access to a live instance of a graph database (GD) would
be very welcome. Let me explain my current test case with Neo4j: I created an
instance of EmbeddedGraphDatabase that continuously ingests CSV files coming
from a polling service. At the same time I need a separate, independent
service that queries the GD (to retrieve data, not to modify it). I tried
EROGD, but the active index segment became corrupted! EGD is thread safe, so I
can have multiple threads sharing the same instance of the GD, but what should
I do when, as in my case, independent services (applications) need to access
the EGD at the same time?


Paolo Forte

P.S.
I'm not sure this is related, and it is certainly down to my limited knowledge
of webadmin, but how can I monitor the status of my EGD (number of nodes,
edges, etc.) via webadmin while it is ingesting new data?




Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?

2011-08-01 Thread Jim Webber
Hi Mathias,

EmbeddedReadOnlyGraphDatabase is not quite what it seems, and I think should be 
deprecated/removed. The correct way for database instances to become consistent 
is through the HA protocol.

Jim


Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?

2011-08-01 Thread Tobias Ivarsson
I think a bit of elaboration might be in order.

EmbeddedReadOnlyGraphDatabase was created for one specific purpose:

Being able to interactively introspect a graph without having to shut down
the application that uses it.

Specifically the tools that we wanted to support with this were the Neo4j
shell and Neoclipse.

EmbeddedReadOnlyGraphDatabase (EROGD) has two major issues with the way caching
is done internally in Neo4j (one issue with each cache):

   - When the EROGD reads data from the file system it will, like a normal
   EGD, cache the node and relationship objects. If a normal EGD modifies the
   graph under the feet of the EROGD, there is no way for the EROGD to know
   that the data in cache is now stale, which will lead to an inconsistent view
   of the graph. If, for example, the EROGD has cached Node[15] with the
   information that it is connected to some other node through
   Relationship[344], and Relationship[344] is then deleted, you will get an
   InvalidRecordException (as you described). And of course, if relationships
   are added to Node[15], these will not be seen at all by the EROGD (until
   Node[15] is evicted from the cache due to not being used for a while).
   - Neo4j also caches data at the filesystem level by memory mapping (mmap)
   hot regions of the store files. Writes to these regions will not be flushed
   to the actual file until the mmapped window is evicted for being less hot
   than other windows, or until the transaction log for Neo4j is rotated. This
   means that, from the point of view of the EROGD, the data actually written
   to disk will look inconsistent, which would also lead to an
   InvalidRecordException. The situation is made even more complicated by the
   fact that Unix operating systems will attempt to share memory-mapped data
   from the same file between multiple processes, but the normal EGD and the
   EROGD will not make the same decisions about which regions to mmap, and
   they might not even decide on the same size for mmap windows. We haven't
   tested how well different operating systems deal with reading data that was
   written to an mmap region through non-mmap syscalls from a different
   process; most likely this varies from OS to OS.
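
To make the first issue concrete, here is a minimal toy model in plain Java.
It is only a sketch and not Neo4j code: the Store and Reader classes and the
map of relationship ids are made-up stand-ins for the shared store files and
the object cache. It shows a caching reader that keeps returning
Relationship[344] for Node[15] after the writer has deleted it and added
another relationship, while a reader without a cache sees the current state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these classes are part of Neo4j.
class StaleCacheDemo {

    /** Stands in for the store files that both instances point at. */
    static class Store {
        final Map<Long, List<Long>> relsByNode = new HashMap<>();
    }

    /** A reader that optionally caches the relationship list of each node. */
    static class Reader {
        private final Store store;
        private final boolean useCache;
        private final Map<Long, List<Long>> cache = new HashMap<>();

        Reader(Store store, boolean useCache) {
            this.store = store;
            this.useCache = useCache;
        }

        List<Long> relationshipsOf(long nodeId) {
            if (!useCache) {
                return snapshot(nodeId);
            }
            // Nothing ever invalidates this entry, which is the whole problem.
            return cache.computeIfAbsent(nodeId, id -> snapshot(id));
        }

        private List<Long> snapshot(long nodeId) {
            return new ArrayList<>(store.relsByNode.getOrDefault(nodeId, List.of()));
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.relsByNode.put(15L, new ArrayList<>(List.of(344L)));

        Reader cached = new Reader(store, true);
        Reader uncached = new Reader(store, false);
        cached.relationshipsOf(15L); // warms the cache with [344]

        // The "writer" deletes Relationship[344] and adds Relationship[345].
        store.relsByNode.get(15L).remove(Long.valueOf(344L));
        store.relsByNode.get(15L).add(345L);

        System.out.println("cached:   " + cached.relationshipsOf(15L));   // [344]
        System.out.println("uncached: " + uncached.relationshipsOf(15L)); // [345]
    }
}
```

Following the stale id 344 into the store is the step that would fail in the
real system; the toy model stops at showing the stale view.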

The second of these problems is of course the worst, since it cannot be
worked around. The first one can be mitigated by configuring Neo4j not to
use the object cache, by passing the cache_type=none parameter to the
constructor of the EROGD. This should really be made the default for EROGD,
unless we decide to remove EROGD completely.
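
As a rough sketch of that mitigation, the settings can be built as an ordinary
string map. The snippet below is plain Java so that it runs without the Neo4j
jars; the constructor call is shown only as a comment, and both its assumed
shape (store directory plus a Map of settings) and the "path/to/store"
placeholder are assumptions that should be checked against the Neo4j version
in use.

```java
import java.util.HashMap;
import java.util.Map;

class ReadOnlyConfigSketch {

    /** Settings for a read-only instance with the object cache switched off. */
    static Map<String, String> readOnlyParams() {
        Map<String, String> params = new HashMap<>();
        params.put("cache_type", "none"); // the setting named in the message above
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = readOnlyParams();

        // With the Neo4j 1.x jars on the classpath, the map would be handed to
        // the constructor, assumed here to take (String storeDir, Map params):
        //   GraphDatabaseService db =
        //       new EmbeddedReadOnlyGraphDatabase("path/to/store", params);

        System.out.println(params); // {cache_type=none}
    }
}
```

This only addresses the object cache; the mmap issue described above is not
affected by it.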

I hope that sheds some light on the reasons why you experience these
problems with EmbeddedReadOnlyGraphDatabase, and what the intention of
creating it was.

As a side note, I had a different idea for how to solve the
introspection-of-live-graph problem at the time
EmbeddedReadOnlyGraphDatabase was created: create a network-based
implementation of the GraphDatabaseService API and connect directly to the
running instance. This would completely avoid the cache staleness problem,
but at the cost of network overhead for each graph operation, which is
probably fine for tooling purposes. With the JVM agent attach protocol it
would be possible to inject such a server into a running graph database that
wasn't originally configured for it. In fact, I implemented this as the
RemoteGraphDatabase subproject.
Since my colleagues did not share my vision for that idea, the project
didn't receive much attention after its inception. It was also never
really used for these purposes, but rather misused for building
applications, leading us to deprecate the project. When we later
discovered a severe bug in the implementation of the remote transaction
handling logic, we removed the project completely.
I still believe this to be a superior model for tools, but I would build it
differently if I were to build it today.

-tobias





-- 
Tobias Ivarsson tobias.ivars...@neotechnology.com
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857


Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?

2011-08-01 Thread Rick Bullotta
FWIW, I really like the idea of a remote API, not only for tooling but also as
an alternative to bulkier and more abstracted REST APIs. One way to think of
it is as a high-performance, binary-formatted REST API, provided the overhead
of HTTP isn't too much. I seriously doubt that it would be: at my previous
company, Lighthammer, I implemented a solution using HTTP plus a binary stream
protocol, and it was considerably faster than native binary protocols for
databases and other similar scenarios in most if not all use cases. This API
would let you view a running instance as if you were another thread accessing
the same embedded instance.



Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?

2011-07-31 Thread Mathias Hensel
Hello Neo4j team,

has anyone looked into this issue? Any insights?

Thanks again, 
Mathias

--
View this message in context: 
http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-Synchronization-of-EmbeddedReadOnlyGraphDatabase-Bug-tp3174626p3213450.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.


Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?

2011-07-19 Thread Mathias Hensel

Hello Jim, 

thanks for your reply. Yes, this is absolutely correct.

Mathias



[Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?

2011-07-16 Thread Mathias Hensel

Hello, 

I am trying to use Neo4j in a Ruby on Rails application (MRI Ruby, not JRuby).
Because of Rails' process-based model, I run one instance of
EmbeddedGraphDatabase in a separate process. All writes coming in from user
actions are delegated to this process, so this EmbeddedGraphDatabase serves
purely as the writable database. All reads are handled directly in the web app
through multiple instances of EmbeddedReadOnlyGraphDatabase (one instance for
each web server process).

Unfortunately I encountered the following problem: when I add a new
relationship to a node (via the EmbeddedGraphDatabase), this relationship is
not visible to the EmbeddedReadOnlyGraphDatabase. I can reopen the
EmbeddedReadOnlyGraphDatabase from time to time, or even on each request, but
this ends with an "InvalidRecordException: Record[9180] not in use" when I
try to traverse the node or get its relationships. 9180 is the newly created
relationship.
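
One way to picture this symptom, in line with the unflushed-mmap explanation
given elsewhere in this thread, is the toy model below. It is a sketch in
plain Java and not Neo4j's implementation: DiskFile, WriteBehindWriter and the
"in use" flag are hypothetical stand-ins for the store file, the writer's
memory-mapped window and the record header. A second process can only see
what has reached the file, so the new record looks unused until the writer
flushes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: a write-behind buffer standing in for an unflushed mmap window.
class UnflushedWindowDemo {

    /** What is actually in the file: record id -> "in use" flag. */
    static class DiskFile {
        final Map<Long, Boolean> inUse = new HashMap<>();
    }

    /** Writer that keeps its changes in memory until flush() is called. */
    static class WriteBehindWriter {
        private final DiskFile disk;
        private final Map<Long, Boolean> dirty = new HashMap<>();

        WriteBehindWriter(DiskFile disk) {
            this.disk = disk;
        }

        void markInUse(long recordId) {
            dirty.put(recordId, true); // visible to the writer only
        }

        void flush() {
            disk.inUse.putAll(dirty);
            dirty.clear();
        }
    }

    /** A reader in another process: it sees only what reached the file. */
    static String read(DiskFile disk, long recordId) {
        boolean used = disk.inUse.getOrDefault(recordId, false);
        return "Record[" + recordId + "] " + (used ? "ok" : "not in use");
    }

    public static void main(String[] args) {
        DiskFile disk = new DiskFile();
        WriteBehindWriter writer = new WriteBehindWriter(disk);

        writer.markInUse(9180L);
        System.out.println(read(disk, 9180L)); // Record[9180] not in use

        writer.flush();
        System.out.println(read(disk, 9180L)); // Record[9180] ok
    }
}
```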

Only when I restart the EmbeddedGraphDatabase does this relationship become
visible to the EmbeddedReadOnlyGraphDatabase without any exceptions, but
restarting shouldn't be necessary. Is this a bug, or is there an explicit way
to synchronize both types of database instances?

Thank you very much!

Regards,
Mathias


  


Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?

2011-07-16 Thread Jim Webber
Hi Mathias,

If I understand you correctly, you're pointing two database instances (one
being read-only) at the same on-disk location. Is that correct?

Jim
