Re: [Neo4j] distributed neo4j

Atle Osmoen-Prange Tue, 07 Sep 2010 00:59:14 -0700

Hi, I have been thinking about this for a while, but I am not very smart, so
the results aren't that interesting.

One way to solve this is to model the graph on top of a key-value store
and store pointers to nodes and relationships in the values. The pro is that
it's easy to implement and it scales easily in size: most distributed
key-value stores scale to hundreds of JVMs without problems. The
downside to this approach is that all traversing is done on the client side,
so all data must be moved from the store to the client. It also does not
scale well when a node gets many relationships, since more and more data
must be moved whenever the node is loaded. Indexed search is also a problem;
most stores solve this with map-reduce searches. This approach is almost
like storing stuff to disk: robust, but not very innovative. You gain high
uptime and decent performance. I guess this is what you call
"up-front-sharding".

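To make the first approach concrete, here is a minimal sketch in Python. The
KeyValueStore class, the key layout and the helper names are all made up for
illustration (a plain dict stands in for the distributed store), so treat it
as one possible reading of the idea rather than how any real store does it.
The read counter shows that every node and relationship touched has to be
fetched by the client.

```python
from collections import deque


class KeyValueStore:
    """Hypothetical stand-in for a distributed key-value store.

    It counts reads so the amount of data moved to the client is visible.
    """

    def __init__(self):
        self._data = {}
        self.reads = 0

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        self.reads += 1
        return self._data[key]


def add_node(store, node_id, **props):
    # A node value holds its properties plus pointers (ids) to its relationships.
    store.put(("node", node_id), {"props": props, "rels": []})


def add_relationship(store, rel_id, start, end, rel_type):
    # A relationship value holds pointers (ids) to its two end nodes.
    store.put(("rel", rel_id), {"start": start, "end": end, "type": rel_type})
    for node_id in (start, end):
        node = store.get(("node", node_id))
        node["rels"].append(rel_id)
        store.put(("node", node_id), node)


def traverse(store, start_id):
    """Breadth-first traversal done entirely on the client side."""
    order = [start_id]
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        node = store.get(("node", current))      # whole node value is fetched
        for rel_id in node["rels"]:
            rel = store.get(("rel", rel_id))     # one more fetch per relationship
            other = rel["end"] if rel["start"] == current else rel["start"]
            if other not in order:
                order.append(other)
                queue.append(other)
    return order


store = KeyValueStore()
for name in ("a", "b", "c"):
    add_node(store, name)
add_relationship(store, "r1", "a", "b", "KNOWS")
add_relationship(store, "r2", "b", "c", "KNOWS")

store.reads = 0                    # only count the reads made by the traversal
order = traverse(store, "a")
print(order, store.reads)
```

Note how a node with many relationships makes both the node value and the
number of fetches grow, which is the scaling problem mentioned above.
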
The other approach would be to store graph neighborhoods on the same JVM
and traverse the graph by sending the traverser to the machine where the
nodes are stored, passing the traverser instead of the data. This would give
a huge performance gain, since most operations are done locally and network
transfers only occur when the traverser "hops" from one machine to the next
by following edges that leave a neighborhood. The drawback with this
approach, as I see it, is complexity, for example when rebalancing the data
when one JVM is "full". Backups and robustness are also harder to achieve.
This must be the "real-time-sharding".
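
A rough sketch of the traverser-passing idea, again in Python and again with
made-up names (Partition, run_traverser, the owner map). Each partition plays
the role of one JVM holding a neighborhood. The traverser state (seen nodes
and pending work) is what moves between partitions, and a hop is counted each
time it does. Draining all local work before hopping is my own assumption
here, not something a real implementation is known to do.

```python
from collections import deque


class Partition:
    """Hypothetical machine (JVM) holding one graph neighborhood."""

    def __init__(self, name, adjacency):
        self.name = name
        self.adjacency = adjacency  # node id -> list of neighbor ids


def run_traverser(partitions, owner, start):
    """Visit every reachable node, moving the traverser instead of the data.

    partitions: partition name -> Partition
    owner:      node id -> name of the partition that stores the node
    Returns the visit order and the number of network hops.
    """
    seen = {start}
    visited = []
    pending = {name: deque() for name in partitions}  # work queued per machine
    current = owner[start]
    pending[current].append(start)
    hops = 0
    while any(pending.values()):
        if not pending[current]:
            # No local work left: ship the traverser to a machine that has some.
            current = next(name for name, queue in pending.items() if queue)
            hops += 1
        node = pending[current].popleft()
        visited.append(node)
        # Adjacency is read locally; only the traverser ever crosses the network.
        for neighbor in partitions[current].adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                pending[owner[neighbor]].append(neighbor)
    return visited, hops


partitions = {
    "A": Partition("A", {"a": ["b", "c"], "b": ["x"], "c": []}),
    "B": Partition("B", {"x": ["y"], "y": []}),
}
owner = {"a": "A", "b": "A", "c": "A", "x": "B", "y": "B"}

visited, hops = run_traverser(partitions, owner, "a")
print(visited, hops)
```

In this toy graph only the edge from b to x leaves the first neighborhood, so
five nodes are visited with a single hop. Rebalancing would mean changing the
owner map and moving adjacency between partitions, which is where the
complexity comes in.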

An interesting project that does something similar (although with objects,
not with data structures) is swarm [1]. It is an experimental distributed
object model, and they have some interesting findings regarding their
distribution model.

[1] http://code.google.com/p/swarm-dpl/

-atle

On Sat, Sep 4, 2010 at 11:21 PM, Peter Neubauer <
[email protected]> wrote:

> Jonathan,
> right now the focus is on getting Master-Slave replication up. That
> is, run a cluster of distributed machines with the same graph and
> master failover. After that, sharding is one of the areas where it
> would be VERY interesting to start experimenting with different
> approaches both from the Insert-time-sharding (upfront-sharding of
> your domain much like in document- key/value and other approaches) and
> the runtime sharding (the redistribution of data to better reflect
> runtime characteristics of e.g. traversals etc).
>
> To start with, a simple application that uses different neo4j
> instances to hold different parts of a domain would be a great
> contribution in order to show and explore "domain space" sharding.
>
> Do you have any special thoughts on good approaches?
>
> Cheers,
>
> /peter neubauer
>
> COO and Sales, Neo Technology
>
> GTalk:      neubauer.peter
> Skype       peter.neubauer
> Phone       +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter      http://twitter.com/peterneubauer
>
> http://www.neo4j.org               - Your high performance graph database.
> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>
>
>
> On Sat, Sep 4, 2010 at 10:41 PM, Jonathan Leibiusky <[email protected]>
> wrote:
> > Hi! Just wondering if you're still working on being able to distribute
> > neo4j over several JVMs.
> > Based on this answer
> > http://lists.neo4j.org/pipermail/user/2008-September/000758.html it
> > seems like maybe something like this should be done by now, but I saw
> > that neo4j hasn't reach 2.0 so maybe it is not there yet.
> >
> > I am willing to help developing this.
> >
> > Thanks,
> >
> > Jonathan
> > _______________________________________________
> > Neo4j mailing list
> > [email protected]
> > https://lists.neo4j.org/mailman/listinfo/user
> >
