2011/2/22 J T <jt4websi...@googlemail.com>:

> Hmm, I hadn't considered the Apache approach, but it still kind of goes
> against the grain - perhaps I just want too much, or it's my innate
> laziness... hehe ;)
>
> It's not just about data size; it's more about not wanting to have to
> re-engineer/refactor as things grow - whether that growth is in
> concurrent access or in data quantity.

There are not that many cases (fewer than you might imagine) where you'd
need to scale/shard Neo4j out to multiple machines just to handle the load
put on it. It's great to think ahead and be aware of limitations, but
there's a pretty high chance you just won't run into them. And if/when you
do, Neo4j will probably have evolved to handle that load for you anyway,
maybe even sharding :)
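The "Apache approach" mentioned above - pinning each client's requests to a fixed replica so that replica's caches stay warm - could be sketched roughly like this. This is only an illustration of the idea, not anything from the thread: the instance URLs, the URL layout, and the hashing rule are all assumptions.

```python
import re

# Hypothetical pool of Neo4j HA replicas; each holds the full graph.
INSTANCES = [
    "http://neo4j-1:7474",
    "http://neo4j-2:7474",
    "http://neo4j-3:7474",
]

def route(request_path: str) -> str:
    """Pick a replica for a request like '/customers/acme/orders'."""
    match = re.match(r"/customers/([^/]+)", request_path)
    if match:
        key = match.group(1)
        # A stable hash keeps a given customer pinned to one instance,
        # so repeated requests hit the same warm cache.
        index = sum(ord(c) for c in key) % len(INSTANCES)
        return INSTANCES[index]
    # Requests with no routing key could go anywhere (e.g. round-robin);
    # here we just fall back to the first instance.
    return INSTANCES[0]
```

In practice the same rule would live in the proxy configuration (URL-pattern rewrites) rather than in client code, which is exactly the server-side variant Michael describes below.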
> On Mon, Feb 21, 2011 at 11:47 PM, Michael Hunger
> <michael.hun...@neotechnology.com> wrote:
>
> > Hi J.T.,
> >
> > Of course you can have the cache sharding taken care of on the server
> > side, e.g. use an Apache proxy for client-sticky routing, redirecting
> > according to URL patterns, etc. But that doesn't cover your "domain".
> >
> > The problem is that, unlike simple key-value stores, where sharding by
> > key is pretty easy, sharding graphs is much more demanding. You would
> > like to have traversal locality (so that you don't have to cross
> > servers for a single traversal). That means something that keeps (and
> > also updates) your subgraphs on just one server.
> >
> > Deciding which subgraphs should be put together is either a purely
> > domain-driven thing, or something that could be achieved by taking
> > lots of (long-running?) clients (and their request URLs), looking at
> > their traversal/query statistics, and optimizing which data is held
> > permanently (or even "mastered") on a specific node for a certain set
> > of requests.
> >
> > It would also mean that the occasional cross-server traversal should
> > result in local caches being updated with the remote data.
> >
> > Is the problem we're talking about just data size? You can already
> > store pretty big graphs on a single Neo4j node (esp. when you go for
> > big machines).
> >
> > Michael
> >
> > On 22.02.2011, at 00:15, J T wrote:
> >
> > > I realise that there are different qualities that can come into
> > > play with the labels 'scalability' and 'performance', and I can see
> > > how your strategy would help with some of those qualities, but it
> > > relies on custom logic in the client application to do the sharding
> > > and load spreading, and doesn't address scaling the underlying
> > > persistent storage engine.
> > > One of the things that attracted me to Riak and Cassandra (for the
> > > use cases I can apply them to) is that sharding, load balancing,
> > > and persistence scaling were available out of the box and pretty
> > > much invisible to the client application. The client app didn't
> > > have to do anything special. I appreciate that, perhaps because
> > > they have different semantics, it's easier for them to solve.
> > >
> > > I had a read of this page you wrote the other day:
> > >
> > > http://jim.webber.name/2011/02/16/3b8f4b3d-c884-4fba-ae6b-7b75a191fa22.aspx
> > >
> > > It was your comment "it's hard to achieve in practice" that
> > > prompted me to post my initial message yesterday to enquire
> > > further.
> > >
> > > I'm no specialist in the field, I just know what I want hehe :)
> > >
> > > The only player in the field I've been able to find that might have
> > > more of the qualities I'm interested in is InfiniteGraph; it's a
> > > shame that it doesn't have a 'server' version like Neo does for me
> > > to do a proper comparison.
> > >
> > > I'll stick with Neo for now, and see how the marketplace matures in
> > > the coming months - I'm amazed at how much movement there has been
> > > in the last year.
> > >
> > > On Mon, Feb 21, 2011 at 3:09 PM, Jim Webber <j...@neotechnology.com>
> > > wrote:
> > >
> > > > Yup, you nailed it better than I did, Rick.
> > > >
> > > > Though your partition strategy might not be just "per user". For
> > > > example, in the geo domain it makes sense to route requests for
> > > > particular cities to specific nodes. It'll depend on your
> > > > application how you generate your routing rules.
> > > > Jim
> > > >
> > > > On 21 Feb 2011, at 14:51, Michael Hunger wrote:
> > > >
> > > > > You shouldn't be confused, because you got it right :)
> > > > >
> > > > > Cheers
> > > > >
> > > > > Michael
> > > > >
> > > > > On 21.02.2011, at 15:40, Rick Otten wrote:
> > > > >
> > > > > > Ok, I'm following this discussion, and now I'm confused.
> > > > > >
> > > > > > My understanding was that the (potentially very large)
> > > > > > database is replicated across all instances.
> > > > > >
> > > > > > If someone needed to traverse to something that wasn't
> > > > > > cached, they'd take a performance hit, but still be able to
> > > > > > get to it.
> > > > > >
> > > > > > I had understood the idea behind the load balancing is to
> > > > > > minimize traversals out of cache by grouping similar sets of
> > > > > > users on a particular server. (That way you don't need a ton
> > > > > > of RAM to stash everything in the database, just the most
> > > > > > frequently accessed nodes and relationships associated with
> > > > > > a subset of the users.)
> > > > > >
> > > > > > > Hello JT,
> > > > > > >
> > > > > > > > One thing, when you say route requests to specific
> > > > > > > > instances... does that imply that node relationships
> > > > > > > > can't span instances?
> > > > > > >
> > > > > > > Yes, that's right. What I'm suggesting here is that each
> > > > > > > instance is a full replica that works on a subset of
> > > > > > > requests which are likely to keep the caches warm.
> > > > > > >
> > > > > > > So if you can split your requests (e.g. all customers
> > > > > > > beginning with "A" go to instance "1" ... all customers
> > > > > > > beginning with "Z" go to instance "26"), they will benefit
> > > > > > > from having warm caches for reading, while the HA
> > > > > > > infrastructure deals with updates across instances
> > > > > > > transactionally.
> > > > > > > Jim
> > > > > > > _______________________________________________
> > > > > > > Neo4j mailing list
> > > > > > > User@lists.neo4j.org
> > > > > > > https://lists.neo4j.org/mailman/listinfo/user
> > > > > >
> > > > > > --
> > > > > > Rick Otten
> > > > > > rot...@windfish.net
> > > > > > O=='=+

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
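For concreteness, the A-to-Z request split Jim describes in the thread (each instance a full replica, with the split only deciding which replica serves a request so each keeps a warm cache for its slice of customers) could be sketched as a tiny routing function. The function name and error handling are illustrative assumptions, not anything from the thread:

```python
import string

def instance_for(customer_name: str) -> int:
    """Map a customer to a replica: 'Acme' -> 1, ..., 'Zebra' -> 26.

    Every instance holds the whole graph; this rule only decides which
    one handles the request, keeping its caches warm for that slice.
    """
    first = customer_name[0].upper()
    if first not in string.ascii_uppercase:
        # Customers outside A-Z would need their own rule or a fallback.
        raise ValueError(f"no routing rule for {customer_name!r}")
    return string.ascii_uppercase.index(first) + 1
```

A first-letter split is easy to reason about but can be badly skewed (far more customers start with "S" than with "X"); hashing the whole name over the instance pool spreads the load more evenly, at the cost of a less obvious mapping.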