"Cache sharding" = super nice iterative/interim improvement.

It makes use of aggregate resources (RAM & CPU) across multiple servers (as
would be the case with a "truly sharded" Neo4j) without bothering about
partitioning (even "basic" consistent hashing) algorithms.

You get a viable "partitioning" solution by moving the logic to the
client-side. Neo4j doesn't need to worry about some of the engineering
headaches that true partitioning would bring - like performing transactions
between multiple different partitions, or load balancing partition replicas
across available servers - but clients still get some of the benefits of
true partitioning.

Like!

Some other thoughts...

As mentioned, you don't get around the difficult partitioning problem:
achieving locality in a big graph of unknown topology/access.

Also, you don't make the best use of aggregate resources. For example, if
you wanted to run one very large (read: forbidden/not-graphy) traversal like
page-rank on the entire graph you (1) would hit disk (2) would not make use
of aggregate resources (CPU,RAM). I think that's an orthogonal problem
though... regardless of how you cut your graph, the way you then traverse it
has to be "resource-aware": it has to know about, and be capable of using,
the compute resources on other machines. E.g. Neo4j "graph walking" vs
Pregel "vertex gossiping"... but maybe that can be left to future
discussions.

On Thu, Feb 24, 2011 at 3:26 PM, Mark Harwood <[email protected]> wrote:

> > That's a really fantastic and useful design metric. Can paraphrase it a
> bit and write it up on the Neo4j blog/my blog?
>
> I'd be honoured.
> _______________________________________________
> Neo4j mailing list
> [email protected]
> https://lists.neo4j.org/mailman/listinfo/user
>
_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user

Reply via email to