Our approach is very application-specific, but it can be summarized as follows:

- We keep our model database on one server and our "run time" data (somewhat 
like activity streams) on another server
- A long-valued (node id) "source" property on data nodes that identifies a model 
node in the other graph
- A long-valued (node id) "server" property on data nodes that identifies a node in 
the same graph, which contains "logical server" information stored as 
properties (logical name + domain name/IP address + port + protocol)
- Lucene indices on the data nodes that index the data by tag(s), source, and 
time
- Relationships in the "model" graph that describe inter-entity model 
relationships (inheritance, reference), dependencies and usage references, etc.
- Lucene indices on the model nodes that index the model entities by type and 
tag(s)
- Lucene indices on the "tagging" vocabularies on both the model and data 
graph(s)
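The property-based references in the list above can be sketched as plain data. This is a minimal sketch, assuming nodes are identified by long ids and that the data graph is stood in for by an in-memory map; `LogicalServer`, `NodeRef`, `DataNode`, `DataGraph` and the property names `name`, `host`, `port` and `protocol` are hypothetical illustrations, not Neo4j API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: the "logical server" information stored as properties on a node
// in the data graph (logical name + host/IP + port + protocol).
record LogicalServer(String logicalName, String host, int port, String protocol) {
    String address() {
        return protocol + "://" + host + ":" + port;
    }
}

// Hypothetical: a fully qualified pointer to a node that may live in another graph.
record NodeRef(LogicalServer server, long nodeId) {}

// Hypothetical stand-in for a data-graph node: an id plus a property map.
final class DataNode {
    final long id;
    final Map<String, Object> props = new HashMap<>();

    DataNode(long id) {
        this.id = id;
    }
}

// Hypothetical in-memory stand-in for the data graph.
final class DataGraph {
    final Map<Long, DataNode> nodes = new HashMap<>();

    DataNode create(long id) {
        DataNode n = new DataNode(id);
        nodes.put(id, n);
        return n;
    }

    // Follows the "server" property (a node id in this same graph) and turns the
    // properties of that node into a LogicalServer.
    LogicalServer serverOf(DataNode n) {
        DataNode s = nodes.get((Long) n.props.get("server"));
        return new LogicalServer(
                (String) s.props.get("name"),
                (String) s.props.get("host"),
                (Integer) s.props.get("port"),
                (String) s.props.get("protocol"));
    }

    // Qualifies the "source" property (a node id in the model graph) with the
    // logical server, giving a reference the application can resolve remotely.
    NodeRef sourceOf(DataNode n) {
        return new NodeRef(serverOf(n), (Long) n.props.get("source"));
    }
}
```

Keeping the server details on one shared node means many data nodes can point at it through a single long property, and a server's address can change in one place.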

We avoided using relationships in the data graph because we are 
constantly adding and deleting potentially thousands of items per second, and 
this could create concurrency and performance issues when there are potentially 
millions of relationships on a single node.
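Without relationships, the data items belonging to a model node are found through an index keyed by source and time instead. The sketch below is a hypothetical in-memory `SourceTimeIndex` built on `ConcurrentSkipListSet`, swapped in for the Lucene indices described above; it only illustrates the lookup pattern, not how Lucene or Neo4j store their indices.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical stand-in for a (source, time) index over data nodes.
final class SourceTimeIndex {
    record Entry(long time, long nodeId) {}

    // Order by time, then by node id so items with equal timestamps coexist.
    private static final Comparator<Entry> ORDER =
            Comparator.comparingLong(Entry::time).thenComparingLong(Entry::nodeId);

    // Model node id ("source") -> sorted entries for its data nodes.
    private final ConcurrentMap<Long, ConcurrentSkipListSet<Entry>> bySource =
            new ConcurrentHashMap<>();

    void add(long source, long time, long nodeId) {
        bySource.computeIfAbsent(source, k -> new ConcurrentSkipListSet<>(ORDER))
                .add(new Entry(time, nodeId));
    }

    void remove(long source, long time, long nodeId) {
        ConcurrentSkipListSet<Entry> set = bySource.get(source);
        if (set != null) {
            set.remove(new Entry(time, nodeId));
        }
    }

    // Ids of all data nodes for one model node with from <= time <= to, oldest first.
    List<Long> query(long source, long from, long to) {
        List<Long> ids = new ArrayList<>();
        ConcurrentSkipListSet<Entry> set = bySource.get(source);
        if (set == null) {
            return ids;
        }
        for (Entry e : set.subSet(new Entry(from, Long.MIN_VALUE), true,
                                  new Entry(to, Long.MAX_VALUE), true)) {
            ids.add(e.nodeId());
        }
        return ids;
    }
}
```

Adding or deleting an item touches one index entry rather than the relationship list of a heavily connected model node.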

We didn't originally design it this way.  The original approach was a single 
(embedded) database, using relationships for all node<->node connections. We're 
moving to our new design in phases. The first phase was a logical separation of 
model and data (still in the same graph), plus a switch from relationships to 
the "node id property" approach for some specific scenarios.
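One consequence of replacing relationships with node-id properties is that the database no longer keeps those references consistent; as the July 2 message quoted below points out, cascading deletes become the application's job. A minimal sketch, assuming a hypothetical reverse index from a model node id to the data nodes whose "source" property points at it; `ReferenceStore` and `onModelNodeDeleted` are illustrative names, and the class is not thread-safe.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical: data nodes hold a "source" property pointing at a model node id,
// and a reverse index (model id -> referring data node ids) makes cleanup cheap.
// Not thread-safe; a real version would need synchronization or transactions.
final class ReferenceStore {
    final Map<Long, Map<String, Object>> dataNodes = new HashMap<>();
    private final Map<Long, Set<Long>> referrers = new HashMap<>();

    void createData(long id, long source) {
        Map<String, Object> props = new HashMap<>();
        props.put("source", source);
        dataNodes.put(id, props);
        referrers.computeIfAbsent(source, k -> new HashSet<>()).add(id);
    }

    // Called after a model node is deleted in the other graph. With a null
    // replacement the referring data nodes are deleted (cascade); otherwise
    // their "source" property is redirected to the replacement model node.
    void onModelNodeDeleted(long modelId, Long replacement) {
        Set<Long> ids = referrers.remove(modelId);
        if (ids == null) {
            return;
        }
        for (long id : ids) {
            if (replacement == null) {
                dataNodes.remove(id);
            } else {
                dataNodes.get(id).put("source", replacement);
                referrers.computeIfAbsent(replacement, k -> new HashSet<>()).add(id);
            }
        }
    }
}
```

With relationships, Neo4j would refuse to delete a node that still has relationships attached; with property references, nothing stops a dangling id, so the reverse index is what lets the application find and fix referrers.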

I suspect there are substantial performance implications *if* you are 
trying to do complex cross-shard or cross-graph traversals, which we generally 
do not need to do.  Instead, we can deal with this at the application layer.
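Dealing with cross-graph hops at the application layer amounts to a traversal that, on reaching a node-id reference, looks up which graph to continue in. A minimal sketch, assuming every graph is reachable through a hypothetical `Graph` interface keyed by logical server name, with `Ref` and `CrossGraphTraversal` also illustrative; in a real deployment each hop into another server would be a network call, which is why deep cross-graph traversals are likely to be expensive.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical global address of a node: logical server name + node id.
record Ref(String server, long nodeId) {}

// Hypothetical per-graph access; a remote implementation would make network calls.
interface Graph {
    // Neighbours of a node, either in this graph or, via a node-id property
    // qualified by a logical server, in another one.
    List<Ref> neighbours(long nodeId);
}

final class CrossGraphTraversal {
    private final Map<String, Graph> graphs;

    CrossGraphTraversal(Map<String, Graph> graphs) {
        this.graphs = graphs;
    }

    // Breadth-first traversal up to maxDepth hops; each Ref is expanded by the
    // graph registered under its server name, so hops cross graph boundaries.
    List<Ref> reachable(Ref start, int maxDepth) {
        List<Ref> out = new ArrayList<>();
        Set<Ref> seen = new HashSet<>();
        Deque<Ref> frontier = new ArrayDeque<>();
        frontier.add(start);
        seen.add(start);
        for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Deque<Ref> next = new ArrayDeque<>();
            for (Ref r : frontier) {
                out.add(r);
                if (depth == maxDepth) {
                    continue; // record the node but do not expand past the limit
                }
                for (Ref n : graphs.get(r.server()).neighbours(r.nodeId())) {
                    if (seen.add(n)) {
                        next.add(n);
                    }
                }
            }
            frontier = next;
        }
        return out;
    }
}
```

Bounding the depth keeps the number of remote calls predictable, which matters more here than in a single embedded graph.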




-----Original Message-----
From: Aliabbas Petiwala [mailto:aliabba...@gmail.com] 
Sent: Sunday, July 03, 2011 2:54 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] reify links with other neo4j databases located on 
different distributed servers

Thanks a lot, Rick.

Can you please provide more details on the issues you faced while
using this approach, and share some code with us?
Did you decide on this at design time and design your
graph DB schema accordingly?
Are there significant performance penalties if there are a large
number of such references spanning physical boundaries?

On 7/2/11, Rick Bullotta <rick.bullo...@thingworx.com> wrote:
> We are using node-id property references (the node id as a property),
> qualified with a "logical server" reference, to provide this type of binding
> across graphs. If you combine these with an index, you can actually get a
> lot of the functionality of relationships "cross graph", spanning physical
> boundaries.  Of course, as Craig points out, this all has to be done at the
> application level, including dealing with cascading deletes when a node is
> removed from one graph, ensuring that references to it in another graph are
> removed/redirected.
>
> -----Original Message-----
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On
> Behalf Of Craig Taverner
> Sent: Saturday, July 02, 2011 6:03 AM
> To: Neo4j user discussions
> Subject: Re: [Neo4j] reify links with other neo4j databases located on
> different distributed servers
>
> As far as I know there is no internal support for transparent traversals
> across shards. Generally people are doing that in the application layer.
> However, I think there might be a middle ground of sorts. If we modify the
> relationship expander, I could imagine that relationships between
> shards could be made to return the node on the other shard. This would make
> the traversal return nodes across shards, but since I've not tried this
> myself, I am uncertain whether there are other consequences.
>
> On Sat, Jul 2, 2011 at 4:03 AM, Aliabbas Petiwala
> <aliabba...@gmail.com> wrote:
>
>> Hi,
>>
>> I cannot figure out how my application logic can reify links with
>> other Neo4j databases located on different distributed servers.
>> How can I make the traversals and graph algorithms transparent
>> to the location of the different databases?
>> --
>> Aliabbas Petiwala
>> M.Tech CSE
>> _______________________________________________
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>>
>


-- 
Aliabbas Petiwala
M.Tech CSE