Hmm, I hadn't considered the Apache approach but it still kind of goes
against the grain - perhaps I just want too much or it's my innate laziness
... hehe ;)

It's not just about data size, it's more about not wanting to have to
re-engineer/refactor as things grow - whether that growth is in concurrent
access or in data quantity.




On Mon, Feb 21, 2011 at 11:47 PM, Michael Hunger <
[email protected]> wrote:

> Hi J.T.,
>
> Of course you can have the cache sharding taken care of on the server side,
> e.g. use an Apache proxy for
> client sticky routing, redirecting according to URL patterns etc. But that
> doesn't cover your "domain".
>
> The problem is that, unlike simple KV stores, where sharding by key
> is pretty easy, sharding graphs is much more
> demanding. You would like to have traversal locality (so that you don't
> have to cross servers for a single traversal).
> That means something that keeps (and also updates) each of your subgraphs on
> just one server.
> Deciding which subgraphs should be put together is either a purely
> domain-driven decision or something that could be achieved by observing lots
> of (long-running?)
> clients (and their request URLs), looking at their traversal / query
> statistics, and optimizing the data held permanently (or even "mastered") on
> the specific node for a certain set of requests.
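
One way to make "traversal locality" measurable is to replay logged traversals against a candidate placement and count how many hops cross servers. This is only a minimal sketch under my own assumptions: the function name, the log format (each traversal as a list of node ids) and the placement map are all hypothetical, nothing here is a Neo4j API.

```python
from typing import Dict, Hashable, Sequence


def cross_server_ratio(
    traversals: Sequence[Sequence[Hashable]],
    placement: Dict[Hashable, int],
) -> float:
    """Fraction of traversal hops whose two endpoints live on different servers.

    traversals: hypothetical query log, one list of visited node ids per request.
    placement:  hypothetical map from node id to the server holding that node.
    """
    hops = 0
    crossings = 0
    for path in traversals:
        # Walk consecutive pairs of nodes along the path.
        for a, b in zip(path, path[1:]):
            hops += 1
            if placement[a] != placement[b]:
                crossings += 1
    return crossings / hops if hops else 0.0


log = [["a", "b", "c"], ["c", "d"]]
split = {"a": 0, "b": 0, "c": 1, "d": 1}
together = {"a": 0, "b": 0, "c": 0, "d": 0}
print(cross_server_ratio(log, split))
print(cross_server_ratio(log, together))
```

A lower ratio means fewer cross-server hops for the observed workload, so two candidate placements can be compared against the same log.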
>
> It would also mean that the occasional cross-server traversal should result
> in local caches being updated for the remote data.
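
The "local caches being updated for the remote data" idea reads to me like a read-through cache. A rough sketch, assuming a hypothetical fetch_remote callable that stands in for whatever cross-server call would actually be made; it deliberately ignores invalidation and updates, which is the hard part.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Keeps a local copy of node data fetched from another server.

    fetch_remote is a hypothetical stand-in for the cross-server lookup.
    No invalidation is attempted in this sketch.
    """

    def __init__(self, fetch_remote: Callable[[str], dict]) -> None:
        self._fetch_remote = fetch_remote
        self._local: Dict[str, dict] = {}
        self.remote_fetches = 0  # counts how often we had to leave this server

    def get(self, node_id: str) -> dict:
        if node_id not in self._local:
            # Cache miss: pay for the cross-server hop once, then keep the result.
            self._local[node_id] = self._fetch_remote(node_id)
            self.remote_fetches += 1
        return self._local[node_id]


cache = ReadThroughCache(lambda node_id: {"id": node_id})
cache.get("n1")
cache.get("n1")
print(cache.remote_fetches)
```

The second lookup of the same node is served locally, so only the first occasional cross-server traversal pays the remote cost.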
>
> Is the problem we're talking about just data size? You can already store
> pretty big graphs in a single Neo4j node (esp. when you go for big
> machines).
>
> Michael
>
> On 22.02.2011 at 00:15, J T wrote:
>
> > I realise that there are different qualities that can come into play with
> > the labels 'scalability' & 'performance', and I can see how your strategy
> > would help with some of those qualities, but it relies on custom logic in the
> > client application to do the sharding and load spreading and doesn't address
> > scaling the underlying persistent storage engine.
> >
> > One of the things that attracted me to Riak and Cassandra (for the use cases
> > I can apply them to) is that sharding, load balancing and persistence
> > scaling were available out-of-the-box and pretty much invisible to the
> > client application. The client app didn't have to do anything special. I
> > appreciate that, perhaps because they have different semantics, it's an
> > easier problem for them to solve.
> >
> > I had a read of this page you wrote the other day :
> >
> http://jim.webber.name/2011/02/16/3b8f4b3d-c884-4fba-ae6b-7b75a191fa22.aspx
> >
> > It was your comment "it's hard to achieve in practice" that prompted me to
> > post my initial message yesterday to enquire further.
> >
> > I'm no specialist in the field, I just know what I want hehe :)
> >
> > The only player in the field I've been able to find that might have more of
> > the qualities I am interested in is InfiniteGraph; it's a shame that it doesn't
> > have a 'server' version like Neo does for me to do a proper comparison.
> >
> > I'll stick with Neo for now, and see how the marketplace matures in the
> > coming months - I'm amazed at how much movement there has been in the last
> > year.
> >
> >
> >
> >
> > On Mon, Feb 21, 2011 at 3:09 PM, Jim Webber <[email protected]> wrote:
> >
> >> Yup, you nailed it better than I did Rick.
> >>
> >> Though your partition strategy might not be just "per user." For example, in
> >> the geo domain, it makes sense to route requests for particular cities to
> >> specific nodes. It'll depend on your application how you generate your
> >> routing rules.
> >>
> >> Jim
> >>
> >> On 21 Feb 2011, at 14:51, Michael Hunger wrote:
> >>
> >>> You shouldn't be confused because you got it right :)
> >>>
> >>> Cheers
> >>>
> >>> Michael
> >>>
> >>> On 21.02.2011 at 15:40, Rick Otten wrote:
> >>>
> >>>> Ok, I'm following this discussion, and now I'm confused.
> >>>>
> >>>> My understanding was that the (potentially very large) database is
> >>>> replicated across all instances.
> >>>>
> >>>> If someone needed to traverse to something that wasn't cached, they'd take
> >>>> a performance hit, but still be able to get to it.
> >>>>
> >>>> I had understood the idea behind the load balancing is to minimize
> >>>> traversals out of cache by grouping similar sets of users on a particular
> >>>> server.  (That way you don't need a ton of RAM to stash everything in the
> >>>> database, just the most frequently accessed nodes and relationships
> >>>> associated with a subset of the users.)
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> Hello JT,
> >>>>>
> >>>>>> One thing, when you say route requests to specific instances .. does that
> >>>>>> imply that node relationships can't span instances ?
> >>>>>
> >>>>> Yes that's right. What I'm suggesting here is that each instance is a full
> >>>>> replica that works on a subset of requests which are likely to keep the
> >>>>> caches warm.
> >>>>>
> >>>>> So if you can split your requests (e.g. all customers beginning with "A" go
> >>>>> to instance "1" ... all customers beginning with "Z" go to instance "26"),
> >>>>> they will benefit from having warm caches for reading, while the HA
> >>>>> infrastructure deals with updates across instances transactionally.
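
To make the "A" goes to instance "1" ... "Z" goes to instance "26" rule concrete, here is a small sketch of the routing decision a load balancer or client could make. The function name, the fallback for names that don't start with A-Z, and the wrap-around for fewer than 26 instances are my own assumptions, not anything from the Neo4j HA setup.

```python
import string


def instance_for(customer_name: str, instance_count: int = 26) -> int:
    """Return the 1-based instance number that should serve this customer.

    Routing by first letter keeps requests for the same customers on the
    same replica, so that replica's cache stays warm for them.
    """
    first = customer_name.strip()[:1].upper()
    if first and first in string.ascii_uppercase:
        index = string.ascii_uppercase.index(first)  # "A" -> 0 ... "Z" -> 25
    else:
        index = 0  # hypothetical fallback: everything else goes to instance 1
    # Wrap around when fewer than 26 instances are available (assumption).
    return index % instance_count + 1


print(instance_for("Alice"))
print(instance_for("Zoe"))
```

Since every instance is a full replica, a request that lands on the "wrong" instance still works; it just misses the warm cache.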
> >>>>>
> >>>>> Jim
> >>>>> _______________________________________________
> >>>>> Neo4j mailing list
> >>>>> [email protected]
> >>>>> https://lists.neo4j.org/mailman/listinfo/user
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Rick Otten
> >>>> [email protected]
> >>>> O=='=+
> >>>>
> >>>>
> >>>
> >>
> >>
>
>
