Hi, I have been thinking about this for a while, but I am not very smart, so the results aren't that interesting.

One way to solve this is to model the graph on top of a key-value store, and store pointers to nodes and relationships in the values. The pro is that it is easy to implement and it scales easily in size: most distributed key-value stores scale to hundreds of JVMs without problems. The downside is that all traversing is done on the client side, so all data must be moved from the store to the client. It also does not scale well when a node gets many relationships, since more and more data must be moved each time the node is loaded. Indexed search is a problem too; most stores solve this by doing map-reduce searches. This approach is almost like storing stuff to disk: robust, but not very innovative. You gain high uptime and decent performance. I guess this is what you call "upfront-sharding".

The other approach would be to store graph neighborhoods on the same JVM, and traverse the graph by sending the traverser to the machine where the nodes are stored, passing the traverser instead of the data. This would give a huge performance gain, since most operations are done locally and network transfers only occur when the traverser "hops" from one machine to the next by following edges that leave a neighborhood. The drawback with this approach, as I see it, is complexity, for example when rebalancing the data once one JVM is "full". Backups and robustness are also harder to achieve. This must be the "runtime sharding".

An interesting project that does something similar (although with objects, not with data structures) is swarm [1]. It is an experimental distributed object model, and they have some interesting findings regarding their distribution model.

[1] http://code.google.com/p/swarm-dpl/

-atle

On Sat, Sep 4, 2010 at 11:21 PM, Peter Neubauer <[email protected]> wrote:
> Jonathan,
> right now the focus is on getting Master-Slave replication up. That
> is, run a cluster of distributed machines with the same graph and
> master failover.
> After that, sharding is one of the areas where it
> would be VERY interesting to start experimenting with different
> approaches both from the Insert-time-sharding (upfront-sharding of
> your domain much like in document- key/value and other approaches) and
> the runtime sharding (the redistribution of data to better reflect
> runtime characteristics of e.g. traversals etc).
>
> To start with, a simple application that uses different neo4j
> instances to hold different parts of a domain would be a great
> contribution in order to show and explore "domain space" sharding.
>
> Do you have any special thoughts on good approaches?
>
> Cheers,
>
> /peter neubauer
>
> COO and Sales, Neo Technology
>
> GTalk: neubauer.peter
> Skype peter.neubauer
> Phone +46 704 106975
> LinkedIn http://www.linkedin.com/in/neubauer
> Twitter http://twitter.com/peterneubauer
>
> http://www.neo4j.org - Your high performance graph database.
> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>
> On Sat, Sep 4, 2010 at 10:41 PM, Jonathan Leibiusky <[email protected]> wrote:
> > Hi! Just wondering if you're still working on being able to distribute neo4j
> > over several JVMs.
> > Based on this answer
> > http://lists.neo4j.org/pipermail/user/2008-September/000758.html it seems
> > like maybe something like this should be done by now, but I saw that neo4j
> > hasn't reach 2.0 so maybe it is not there yet.
> >
> > I am willing to help developing this.
> >
> > Thanks,
> >
> > Jonathan
> > _______________________________________________
> > Neo4j mailing list
> > [email protected]
> > https://lists.neo4j.org/mailman/listinfo/user
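
PS: A rough sketch of the difference between the two approaches described in my reply, in case it helps the discussion. This is not Neo4j code and uses no real store; the toy graph, the shard assignment and both function names are made up for illustration. It only counts how often something has to cross the network: one fetch per visited node when the client traverses on top of a key-value store, versus one hop each time the traverser has to move to another machine.

```python
from collections import deque

# Hypothetical toy graph: node -> outgoing neighbours.
EDGES = {"a": ["b", "c"], "b": ["c"], "c": ["d"], "d": ["e"], "e": []}
# Hypothetical placement: the neighbourhood a, b, c on machine 0; d, e on machine 1.
SHARD_OF = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}


def traverse_client_side(start):
    """Approach 1: the client pulls every visited node out of the key-value store."""
    fetches = 0
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        neighbours = EDGES[node]  # stands in for a remote GET of the node's value
        fetches += 1
        for n in neighbours:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return seen, fetches


def traverse_by_shipping(start):
    """Approach 2: the traverser runs where the data is and moves only when needed."""
    hops = 0
    seen = {start}
    current = SHARD_OF[start]
    pending = {current: deque([start])}  # per-machine work queues
    while pending:
        queue = pending.pop(current)
        while queue:
            node = queue.popleft()
            for n in EDGES[node]:  # local read, no network involved
                if n in seen:
                    continue
                seen.add(n)
                target = SHARD_OF[n]
                if target == current:
                    queue.append(n)
                else:
                    # The edge leaves the neighbourhood: remember it for later.
                    pending.setdefault(target, deque()).append(n)
        if pending:
            current = next(iter(pending))  # ship the traverser to a machine with work
            hops += 1
    return seen, hops


if __name__ == "__main__":
    print(traverse_client_side("a"))
    print(traverse_by_shipping("a"))
```

On this toy graph both functions visit the same five nodes, but the first needs five fetches while the second needs a single hop. The sketch ignores everything that makes the second approach hard in practice, such as rebalancing and failure handling.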

