2011/2/22 J T <jt4websi...@googlemail.com>

> Hmm, I hadn't considered the apache approach, but it still kind of goes
> against the grain - perhaps I just want too much, or it's my innate
> laziness ... hehe ;)
>
> It's not just about data size; it's more about not wanting to have to
> re-engineer/re-factor as things grow - whether that growth is in
> concurrent access or in data quantity.
>
>
There are not that many cases (fewer than you might imagine) where you'd
need to scale/shard Neo4j out to multiple machines just to handle the load
put on it. It's great to think ahead and be aware of limitations, but
there's a pretty high chance you just won't run into those. And if/when you
do, Neo4j will probably have evolved to handle that load for you anyway,
maybe even with sharding :)

>
>
>
> On Mon, Feb 21, 2011 at 11:47 PM, Michael Hunger <michael.hun...@neotechnology.com> wrote:
>
> > Hi J.T.,
> >
> > Of course you can have cache sharding taken care of on the server side,
> > e.g. by using an apache proxy for client-sticky routing, redirecting
> > according to URL patterns etc. But that doesn't cover your "domain".
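> >
> > To make that concrete, the kind of rule such a proxy would encode can
> > be sketched in plain Java (an untested sketch; host names and URL
> > patterns are invented purely for illustration):
> >
> > import java.util.LinkedHashMap;
> > import java.util.Map;
> > import java.util.regex.Pattern;
> >
> > public class StickyRouter {
> >     // URL pattern -> backend instance. Requests matching a pattern
> >     // always go to the same instance, so its caches stay warm for
> >     // that slice of the graph.
> >     private final Map<Pattern, String> routes =
> >             new LinkedHashMap<Pattern, String>();
> >
> >     public StickyRouter() {
> >         routes.put(Pattern.compile("/customers/[a-m].*"), "http://neo1:7474");
> >         routes.put(Pattern.compile("/customers/[n-z].*"), "http://neo2:7474");
> >     }
> >
> >     public String backendFor(String path) {
> >         for (Map.Entry<Pattern, String> e : routes.entrySet()) {
> >             if (e.getKey().matcher(path).matches()) {
> >                 return e.getValue();
> >             }
> >         }
> >         return "http://neo1:7474"; // default when nothing matches
> >     }
> > }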
> >
> > The problem is that, unlike with simple k/v stores, where sharding by
> > key is pretty easy, sharding graphs is much more demanding. You would
> > like to have traversal locality (so that you don't have to cross servers
> > for a single traversal). That means something that keeps (and also
> > updates) each of your subgraphs on just one server.
> > And deciding which subgraphs should be put together is either a purely
> > domain-driven thing, or something that could be achieved by watching
> > lots of (long-running?) clients (and their request URLs), looking at
> > their traversal/query statistics, and optimizing which data is held
> > permanently (or even "mastered") on a specific node for a certain set
> > of requests.
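> >
> > A toy sketch of that statistics-driven idea (all names invented, nothing
> > like this exists in Neo4j today): count which server each client's
> > traversals actually hit, and treat the server it touches most often as
> > the natural home for that client's subgraph:
> >
> > import java.util.HashMap;
> > import java.util.Map;
> >
> > public class TraversalStats {
> >     // clientId -> (serverId -> number of traversals hitting that server)
> >     private final Map<String, Map<String, Integer>> hits =
> >             new HashMap<String, Map<String, Integer>>();
> >
> >     public void record(String clientId, String serverId) {
> >         Map<String, Integer> perServer = hits.get(clientId);
> >         if (perServer == null) {
> >             perServer = new HashMap<String, Integer>();
> >             hits.put(clientId, perServer);
> >         }
> >         Integer count = perServer.get(serverId);
> >         perServer.put(serverId, count == null ? 1 : count + 1);
> >     }
> >
> >     // The server this client traverses most often - the natural place
> >     // to keep (or even "master") its subgraph.
> >     public String preferredServer(String clientId) {
> >         Map<String, Integer> perServer = hits.get(clientId);
> >         if (perServer == null) {
> >             return null;
> >         }
> >         String best = null;
> >         int bestCount = -1;
> >         for (Map.Entry<String, Integer> e : perServer.entrySet()) {
> >             if (e.getValue() > bestCount) {
> >                 best = e.getKey();
> >                 bestCount = e.getValue();
> >             }
> >         }
> >         return best;
> >     }
> > }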
> >
> > It would also mean that the occasional cross-server traversal should
> > result in local caches being updated for the remote data.
> >
> > Is the problem we're talking about just data size? You can already store
> > pretty big graphs in a single neo4j node (esp. when you go for big
> > machines).
> >
> > Michael
> >
> > On 22.02.2011, at 00:15, J T wrote:
> >
> > > I realise that there are different qualities that can come into play
> > > with the labels "scalability" & "performance", and I can see how your
> > > strategy would help with some of those qualities, but it relies on
> > > custom logic in the client application to do the sharding and load
> > > spreading, and doesn't address scaling the underlying persistent
> > > storage engine.
> > >
> > > One of the things that attracted me to Riak and Cassandra (for the use
> > > cases I can apply them to) is that sharding, load balancing and
> > > persistence scaling were available out of the box, and pretty much
> > > invisible to the client application. The client app didn't have to do
> > > anything special. I appreciate that, perhaps because they have
> > > different semantics, it's an easier problem for them to solve.
> > >
> > > I had a read of this page you wrote the other day:
> > > http://jim.webber.name/2011/02/16/3b8f4b3d-c884-4fba-ae6b-7b75a191fa22.aspx
> > >
> > > It was your comment "it's hard to achieve in practice" that prompted
> > > me to post my initial message yesterday to enquire further.
> > >
> > > I'm no specialist in the field; I just know what I want hehe :)
> > >
> > > The only player in the field I've been able to find that might have
> > > more of the qualities I'm interested in is InfiniteGraph; it's a shame
> > > that it doesn't have a "server" version like neo does, for me to do a
> > > proper comparison.
> > >
> > > I'll stick with neo for now and see how the marketplace matures in the
> > > coming months - I'm amazed at how much movement there has been in the
> > > last year.
> > >
> > >
> > >
> > >
> > > On Mon, Feb 21, 2011 at 3:09 PM, Jim Webber <j...@neotechnology.com> wrote:
> > >
> > >> Yup, you nailed it better than I did, Rick.
> > >>
> > >> Though your partition strategy might not be just "per user". For
> > >> example, in the geo domain it makes sense to route requests for
> > >> particular cities to specific nodes. How you generate your routing
> > >> rules will depend on your application.
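> > >>
> > >> For instance (a made-up sketch, not anything Neo4j ships), the
> > >> routing rules could be as simple as a table from city to instance,
> > >> generated from whatever your domain knows about request distribution:
> > >>
> > >> import java.util.HashMap;
> > >> import java.util.Map;
> > >>
> > >> public class CityRouter {
> > >>     // invented hosts; in practice this table would be generated
> > >>     private static final Map<String, String> CITY_TO_INSTANCE =
> > >>             new HashMap<String, String>();
> > >>     static {
> > >>         CITY_TO_INSTANCE.put("london", "http://neo1:7474");
> > >>         CITY_TO_INSTANCE.put("berlin", "http://neo2:7474");
> > >>         CITY_TO_INSTANCE.put("malmo", "http://neo3:7474");
> > >>     }
> > >>
> > >>     public static String instanceFor(String city) {
> > >>         String instance = CITY_TO_INSTANCE.get(city.toLowerCase());
> > >>         // unknown cities fall back to a default instance
> > >>         return instance != null ? instance : "http://neo1:7474";
> > >>     }
> > >> }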
> > >>
> > >> Jim
> > >>
> > >> On 21 Feb 2011, at 14:51, Michael Hunger wrote:
> > >>
> > >>> You shouldn't be confused because you got it right :)
> > >>>
> > >>> Cheers
> > >>>
> > >>> Michael
> > >>>
> > >>> On 21.02.2011, at 15:40, Rick Otten wrote:
> > >>>
> > >>>> Ok, I'm following this discussion, and now I'm confused.
> > >>>>
> > >>>> My understanding was that the (potentially very large) database is
> > >>>> replicated across all instances.
> > >>>>
> > >>>> If someone needed to traverse to something that wasn't cached,
> > >>>> they'd take a performance hit, but still be able to get to it.
> > >>>>
> > >>>> I had understood that the idea behind the load balancing is to
> > >>>> minimize traversals out of cache by grouping similar sets of users
> > >>>> on a particular server.  (That way you don't need a ton of RAM to
> > >>>> stash everything in the database, just the most frequently accessed
> > >>>> nodes and relationships associated with a subset of the users.)
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>> Hello JT,
> > >>>>>
> > >>>>>> One thing, when you say route requests to specific instances ...
> > >>>>>> does that imply that node relationships can't span instances?
> > >>>>>
> > >>>>> Yes, that's right. What I'm suggesting here is that each instance
> > >>>>> is a full replica that works on a subset of requests which are
> > >>>>> likely to keep the caches warm.
> > >>>>>
> > >>>>> So if you can split your requests (e.g. all customers beginning
> > >>>>> with "A" go to instance "1" ... all customers beginning with "Z"
> > >>>>> go to instance "26"), they will benefit from having warm caches
> > >>>>> for reading, while the HA infrastructure deals with updates across
> > >>>>> instances transactionally.
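> > >>>>>
> > >>>>> That letter-based split is trivial to express; a minimal sketch
> > >>>>> (the instance numbering is invented to match the example):
> > >>>>>
> > >>>>> public class FirstLetterRouter {
> > >>>>>     // "Alice" -> 1 ... "Zoe" -> 26; anything else falls back to 1
> > >>>>>     public static int instanceFor(String customerName) {
> > >>>>>         char c = Character.toLowerCase(customerName.charAt(0));
> > >>>>>         return (c >= 'a' && c <= 'z') ? (c - 'a') + 1 : 1;
> > >>>>>     }
> > >>>>> }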
> > >>>>>
> > >>>>> Jim
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Rick Otten
> > >>>> rot...@windfish.net
> > >>>> O=='=+
> > >>>>
> > >>>>



-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
