Hmm, I hadn't considered the Apache approach, but it still kind of goes against the grain - perhaps I just want too much, or it's my innate laziness ... hehe ;)
It's not just about data size, it's more about not wanting to have to
re-engineer/re-factor as things grow - whether that growth is in
concurrent access or in data quantity.

On Mon, Feb 21, 2011 at 11:47 PM, Michael Hunger <[email protected]> wrote:

> Hi J.T.,
>
> Of course you can have the cache sharding taken care of by the server
> side, e.g. use an Apache proxy for client sticky routing, redirecting
> according to URL patterns etc. But that doesn't cover your "domain".
>
> The problem is that, unlike simple kv stores, where sharding by the key
> is pretty easy, sharding graphs is much more demanding. You would like
> to have traversal locality (so that you don't have to cross servers for
> a single traversal). That means something that keeps (and also updates)
> your subgraphs in just one server.
> And deciding which subgraphs should be put together is either a purely
> domain-driven thing or something that could be achieved by having lots
> of (long-running?) clients (and their request URLs), looking at their
> traversal / query statistics, and optimizing the data held permanently
> (or even "mastered") on the specific node for a certain set of requests.
>
> It would also mean that the occasional cross-server traversal should
> result in local caches being updated for the remote data.
>
> Is the problem we're talking about just data size? You can already
> store pretty big graphs in a single neo4j node (esp. when you go for
> big machines).
>
> Michael
>
> On 22.02.2011 at 00:15, J T wrote:
>
> > I realise that there are different qualities that can come into play
> > with the labels 'scalability' & 'performance', and I can see how your
> > strategy would help with some of those qualities, but it relies on
> > custom logic in the client application to do the sharding and load
> > spreading and doesn't address scaling the underlying persistent
> > storage engine.
> >
> > One of the things that attracted me to Riak and Cassandra (for the
> > use cases I can apply them to) is that sharding, load balancing and
> > persistence scaling were available out-of-the-box and pretty much
> > invisible to the client application. The client app didn't have to do
> > anything special. I appreciate that, perhaps because they have
> > different semantics, it's easier for them to solve.
> >
> > I had a read of this page you wrote the other day:
> >
> > http://jim.webber.name/2011/02/16/3b8f4b3d-c884-4fba-ae6b-7b75a191fa22.aspx
> >
> > It was your comment "it's hard to achieve in practice" that prompted
> > me to post my initial message yesterday to enquire further.
> >
> > I'm no specialist in the field, I just know what I want hehe :)
> >
> > The only player in the field I've been able to find that might have
> > more of the qualities I am interested in is InfiniteGraph. It's a
> > shame that it doesn't have a 'server' version like neo does for me to
> > do a proper comparison.
> >
> > I'll stick with neo for now, and see how the marketplace matures in
> > the coming months - I'm amazed at how much movement there has been in
> > the last year.
> >
> > On Mon, Feb 21, 2011 at 3:09 PM, Jim Webber <[email protected]> wrote:
> >
> >> Yup, you nailed it better than I did Rick.
> >>
> >> Though your partition strategy might not be just "per user." For
> >> example in the geo domain, it makes sense to route requests for
> >> particular cities to specific nodes. It'll depend on your
> >> application how you generate your routing rules.
> >>
> >> Jim
> >>
> >> On 21 Feb 2011, at 14:51, Michael Hunger wrote:
> >>
> >>> You shouldn't be confused because you got it right :)
> >>>
> >>> Cheers
> >>>
> >>> Michael
> >>>
> >>> On 21.02.2011 at 15:40, Rick Otten wrote:
> >>>
> >>>> Ok, I'm following this discussion, and now I'm confused.
> >>>>
> >>>> My understanding was that the (potentially very large) database
> >>>> is replicated across all instances.
> >>>>
> >>>> If someone needed to traverse to something that wasn't cached,
> >>>> they'd take a performance hit, but still be able to get to it.
> >>>>
> >>>> I had understood the idea behind the load balancing is to minimize
> >>>> traversals out of cache by grouping similar sets of users on a
> >>>> particular server. (That way you don't need a ton of RAM to stash
> >>>> everything in the database, just the most frequently accessed
> >>>> nodes and relationships associated with a subset of the users.)
> >>>>
> >>>>> Hello JT,
> >>>>>
> >>>>>> One thing, when you say route requests to specific instances ..
> >>>>>> does that imply that node relationships can't span instances ?
> >>>>>
> >>>>> Yes that's right. What I'm suggesting here is that each instance
> >>>>> is a full replica that works on a subset of requests which are
> >>>>> likely to keep the caches warm.
> >>>>>
> >>>>> So if you can split your requests (e.g. all customers beginning
> >>>>> with "A" go to instance "1" ... all customers beginning with "Z"
> >>>>> go to instance "26"), they will benefit from having warm caches
> >>>>> for reading, while the HA infrastructure deals with updates
> >>>>> across instances transactionally.
> >>>>>
> >>>>> Jim
> >>>>> _______________________________________________
> >>>>> Neo4j mailing list
> >>>>> [email protected]
> >>>>> https://lists.neo4j.org/mailman/listinfo/user
> >>>>
> >>>> --
> >>>> Rick Otten
> >>>> [email protected]
> >>>> O=='=+
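
For what it's worth, the routing rule Jim describes in the thread (customers beginning with "A" go to instance "1" ... customers beginning with "Z" go to instance "26") could be sketched roughly like this in Python. The instance names and the fallback for non-A-Z names are made up for illustration. In practice this logic would more likely live in whatever proxy or load balancer sits in front of the instances, not in application code.

```python
import string

# Hypothetical instance names: one full-replica instance per letter,
# "instance-1" .. "instance-26", mirroring the example in the thread.
INSTANCES = {
    letter: f"instance-{i}"
    for i, letter in enumerate(string.ascii_uppercase, start=1)
}

# Assumed fallback for names that don't start with A-Z (not from the thread).
DEFAULT_INSTANCE = "instance-1"


def route(customer_name: str) -> str:
    """Pick the instance whose cache should already be warm for this customer."""
    first = customer_name.strip()[:1].upper()
    return INSTANCES.get(first, DEFAULT_INSTANCE)
```

Because every instance is a full replica, a request that lands on the "wrong" instance still works. It just traverses a colder cache and takes the performance hit Rick mentions.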

