Our approach is very application-specific, but it can be summarized as follows:

- We keep our model database on one server and our "run time" data (somewhat like activity streams) on another server.
- A long-valued (node id) "source" property on data nodes identifies a model node in the other graph.
- A long-valued (node id) "server" property on data nodes identifies a node in the same graph that holds "logical server" information as properties (logical name + domain name/IP address + port + protocol).
- Lucene indices on the data nodes index the data by tag(s), source, and time.
- Relationships in the "model" graph describe inter-entity model relationships (inheritance, reference), dependencies, usage references, etc.
- Lucene indices on the model nodes index the model entities by type and tag(s).
- Lucene indices cover the "tagging" vocabularies on both the model and data graph(s).
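Below is a minimal sketch of how the `source` and `server` node-id properties might be resolved. It uses plain Java collections in place of Neo4j and Lucene. Only the `source` and `server` property names come from the list above. Everything else is hypothetical: the `time`, `name`, `host`, `port`, and `protocol` property names, the `Graph`, `Node`, and `SourceTimeIndex` classes, and `resolveSource`. A `TreeMap` stands in for the Lucene index on source and time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-ins for two graphs; this is NOT the Neo4j API.
public class CrossGraphRefs {

    /** "Logical server" info: logical name + host + port + protocol. */
    record LogicalServer(String name, String host, int port, String protocol) {}

    /** A bare node: an id plus a property map. */
    static final class Node {
        final long id;
        final Map<String, Object> props = new HashMap<>();
        Node(long id) { this.id = id; }
    }

    /** One graph (one database) whose nodes are addressed by id. */
    static final class Graph {
        private final Map<Long, Node> nodes = new HashMap<>();
        private long nextId = 0;
        Node create() { Node n = new Node(nextId++); nodes.put(n.id, n); return n; }
        Node get(long id) { return nodes.get(id); }
        void delete(long id) { nodes.remove(id); }
    }

    /** Stand-in for the index on data nodes: source id -> (time -> data node id).
     *  Simplification: assumes at most one data item per source per timestamp. */
    static final class SourceTimeIndex {
        private final Map<Long, NavigableMap<Long, Long>> bySource = new HashMap<>();

        void add(Node data) {
            long source = (Long) data.props.get("source");
            long time = (Long) data.props.get("time");
            bySource.computeIfAbsent(source, k -> new TreeMap<>()).put(time, data.id);
        }

        /** Data node ids for one model node with time in [from, to), ordered by time. */
        List<Long> query(long source, long from, long to) {
            NavigableMap<Long, Long> times = bySource.get(source);
            return times == null ? List.of()
                    : new ArrayList<>(times.subMap(from, true, to, false).values());
        }
    }

    /** Reads the "server" node (in the same graph) referenced by a data node. */
    static LogicalServer serverOf(Graph dataGraph, Node data) {
        Node s = dataGraph.get((Long) data.props.get("server"));
        return new LogicalServer((String) s.props.get("name"), (String) s.props.get("host"),
                (Integer) s.props.get("port"), (String) s.props.get("protocol"));
    }

    /** Follows the "source" property into the model graph registered under the logical server name. */
    static Node resolveSource(Graph dataGraph, Map<String, Graph> graphsByServerName, Node data) {
        Graph model = graphsByServerName.get(serverOf(dataGraph, data).name());
        return model == null ? null : model.get((Long) data.props.get("source"));
    }
}
```

The data graph stores only ids, so `resolveSource` returns `null` once the referenced model node has been deleted. Nothing in the model graph prevents that, which is why reference cleanup has to live in the application.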
We avoided using relationships in the data graph because we are constantly adding and deleting potentially thousands of items per second. That could create concurrency and performance issues when a single node may have millions of relationships.

We didn't originally design it this way. The original approach was a single (embedded) database, using relationships for all node<->node connections. We're moving to the new design in phases. The first phase was a logical separation of model and data, though still in the same graph, plus a switch from relationships to the "node id property" approach for some specific scenarios.

I would expect substantial performance implications *if* you are trying to do complex cross-shard or cross-graph traversals, which we generally do not need to do. Rather, we can deal with this at the application layer.

-----Original Message-----
From: Aliabbas Petiwala [mailto:aliabba...@gmail.com]
Sent: Sunday, July 03, 2011 2:54 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] reify links with other neo4j databases located on different distributed servers

Thanks a lot, Rick. Can you please provide more details on the issues you faced while using this approach, and share some code with us? Did you decide on this at design time and design your graph DB schema accordingly? Are there significant performance penalties if there are a large number of such references spanning physical boundaries?

On 7/2/11, Rick Bullotta <rick.bullo...@thingworx.com> wrote:
> We are using node-id property references (the node id as a property),
> qualified with a "logical server" reference, to provide this type of
> binding across graphs. If you combine these with an index, you can
> actually get a lot of the functionality of relationships "cross graph",
> spanning physical boundaries.
> Of course, as Craig points out, this all has to be done at the
> application level, including dealing with cascading deletes: when a node
> is removed from one graph, references to it in another graph must be
> removed or redirected.
>
> -----Original Message-----
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On
> Behalf Of Craig Taverner
> Sent: Saturday, July 02, 2011 6:03 AM
> To: Neo4j user discussions
> Subject: Re: [Neo4j] reify links with other neo4j databases located on
> different distributed servers
>
> As far as I know there is no internal support for transparent traversals
> across shards. Generally people are doing that in the application layer.
> However, I think there might be a middle ground of sorts. If we modify the
> relationship expander, I could imagine that relationships between shards
> could be made to return the node on the other shard. This would make the
> traversal return nodes across shards, but since I've not tried this
> myself, I am uncertain whether there are other consequences.
>
> On Sat, Jul 2, 2011 at 4:03 AM, Aliabbas Petiwala
> <aliabba...@gmail.com> wrote:
>
>> Hi,
>>
>> I cannot figure out how my application logic can reify links with
>> other Neo4j databases located on different distributed servers.
>> Hence, how can I make traversals and graph algorithms transparent
>> to the location of the different databases?
>>
>> --
>> Aliabbas Petiwala
>> M.Tech CSE
>> _______________________________________________
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
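Rick describes an application-level cascading delete: when a node is removed from one graph, references to it in the other graph are removed or redirected. It might look roughly like the hypothetical Java sketch below. Plain maps stand in for both graphs and for the index by source. `ModelNodeRemover`, `addData`, and `deleteModelNode` are illustrative names, not part of Neo4j. A real deployment would also have to cope with the two graphs living on separate servers, where the updates are presumably not covered by a single transaction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an application-level cascading delete across two graphs.
// Plain maps stand in for the model graph, the data graph, and the "source" index.
public class ModelNodeRemover {

    /** Model graph: node id -> name (stand-in for a real model node). */
    final Map<Long, String> modelNodes = new HashMap<>();

    /** Data graph: data node id -> value of its "source" property (a model node id). */
    final Map<Long, Long> dataSource = new HashMap<>();

    /** Stand-in for the index on data nodes by source: model node id -> data node ids. */
    final Map<Long, Set<Long>> bySource = new HashMap<>();

    void addData(long dataId, long sourceId) {
        dataSource.put(dataId, sourceId);
        bySource.computeIfAbsent(sourceId, k -> new HashSet<>()).add(dataId);
    }

    /**
     * Removes a model node and fixes up references to it from the data graph.
     * If replacementId is null, referencing data nodes are deleted; otherwise their
     * "source" property is redirected to the replacement model node.
     * Returns how many data nodes were touched.
     */
    int deleteModelNode(long modelId, Long replacementId) {
        if (replacementId != null
                && (replacementId == modelId || !modelNodes.containsKey(replacementId))) {
            throw new IllegalArgumentException("invalid replacement model node " + replacementId);
        }
        Set<Long> referrers = bySource.remove(modelId);
        int touched = 0;
        if (referrers != null) {
            for (long dataId : referrers) {
                if (replacementId == null) {
                    dataSource.remove(dataId);                  // cascade: delete the data node
                } else {
                    dataSource.put(dataId, replacementId);      // redirect the reference
                    bySource.computeIfAbsent(replacementId, k -> new HashSet<>()).add(dataId);
                }
                touched++;
            }
        }
        // Fix references first, then delete the target, so a failure part-way
        // leaves references pointing at a node that still exists.
        modelNodes.remove(modelId);
        return touched;
    }
}
```

The index by source is what makes this cheap. Without it, finding the referrers would mean scanning every data node, because there are no relationships to follow back from the model node.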