"Cache sharding" = super nice iterative/interim improvement. It makes use of aggregate resources (RAM & CPU) across multiple servers (as would be the case with a "truly sharded" Neo4j) without bothering about partitioning (even "basic" consistent hashing) algorithms.
You get a viable "partitioning" solution by moving the logic to the client-side. Neo4j doesn't need to worry about some of the engineering headaches that true partitioning would bring - like performing transactions between multiple different partitions, or load balancing partition replicas across available servers - but clients still get some of the benefits of true partitioning. Like! Some other thoughts... As mentioned, you don't get around the difficult partitioning problem: achieving locality in a big graph of unknown topology/access. Also, you don't make the best use of aggregate resources. For example, if you wanted to run one very large (read: forbidden/not-graphy) traversal like page-rank on the entire graph you (1) would hit disk (2) would not make use of aggregate resources (CPU,RAM). I think that's an orthogonal problem though... regardless of how you cut your graph, the way you then traverse it has to be "resource-aware": it has to know about, and be capable of using, the compute resources on other machines. E.g. Neo4j "graph walking" vs Pregel "vertex gossiping"... but maybe that can be left to future discussions. On Thu, Feb 24, 2011 at 3:26 PM, Mark Harwood <[email protected]> wrote: > > That's a really fantastic and useful design metric. Can paraphrase it a > bit and write it up on the Neo4j blog/my blog? > > I'd be honoured. > _______________________________________________ > Neo4j mailing list > [email protected] > https://lists.neo4j.org/mailman/listinfo/user > _______________________________________________ Neo4j mailing list [email protected] https://lists.neo4j.org/mailman/listinfo/user

