Re: [Neo4j] Social Networks And Graph Databases

Michael Hunger Mon, 21 Feb 2011 15:48:06 -0800

Hi J.T.,

Of course you can have the cache sharding taken care of on the server side,
e.g. by using an Apache proxy for sticky client routing, redirecting according
to URL patterns, etc. But that doesn't cover your "domain".
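As a rough illustration of the proxy-side routing mentioned above, the Python sketch below models the decision a sticky, URL-pattern-based proxy would make. It is not Apache configuration, and the backend URLs, the `/geo/...` patterns and the `route` function are all hypothetical. Requests that match a rule always land on the same instance, and everything else sticks to an instance chosen from the session id.

```python
import re
import zlib

# Hypothetical backend instances behind the proxy (not from the thread).
BACKENDS = ["http://neo1:7474", "http://neo2:7474", "http://neo3:7474"]

# Hypothetical URL-pattern rules: a request whose path matches a pattern
# always goes to the same backend, so that backend's cache stays warm.
URL_RULES = [
    (re.compile(r"^/geo/london/"), 0),
    (re.compile(r"^/geo/paris/"), 1),
]


def route(path: str, session_id: str) -> str:
    """Return the backend a proxy would forward this request to."""
    for pattern, index in URL_RULES:
        if pattern.match(path):
            return BACKENDS[index]
    # Sticky fallback: the same session id always maps to the same backend.
    # zlib.crc32 is used because Python's built-in hash() of a str is salted
    # per process and would not be stable across proxy restarts.
    return BACKENDS[zlib.crc32(session_id.encode("utf-8")) % len(BACKENDS)]


print(route("/geo/paris/cafes", "abc"))  # rule match: second backend
```

Rules like these only see URLs, which is the limitation pointed out above: the proxy knows nothing about which parts of the graph a request will actually traverse.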


The problem is that, unlike simple key-value stores where sharding by key is
pretty easy, sharding graphs is much more demanding. You would like to have
traversal locality (so that you don't have to cross servers for a single
traversal). That means something that keeps (and also updates) your subgraphs
on just one server.
And deciding which subgraphs should be put together is either a purely
domain-driven thing, or something that could be achieved by having lots of
(long-running?) clients, looking at their request URLs and their traversal /
query statistics, and optimizing the data held permanently (or even
"mastered") on a specific cluster node for a certain set of requests.
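To make the statistics-driven option a little more concrete, here is a toy Python heuristic; it is not anything Neo4j provides. It assumes traversals are logged as lists of node ids and that every server has the same hypothetical `capacity`. The hypothetical `greedy_placement` simply co-locates the nodes of the most frequent traversals, standing in for a real graph-partitioning algorithm, and `local_fraction` measures how many logged traversals would stay on one server.

```python
from collections import Counter
from typing import Dict, List, Sequence


def local_fraction(traversals: Sequence[Sequence[str]],
                   placement: Dict[str, str]) -> float:
    """Fraction of logged traversals whose nodes all live on a single server."""
    if not traversals:
        return 1.0
    local = sum(1 for t in traversals if len({placement[n] for n in t}) == 1)
    return local / len(traversals)


def greedy_placement(traversals: Sequence[Sequence[str]],
                     servers: List[str], capacity: int) -> Dict[str, str]:
    """Co-locate nodes that are frequently traversed together (toy heuristic)."""
    counts = Counter(tuple(t) for t in traversals)
    placement: Dict[str, str] = {}
    load = {s: 0 for s in servers}
    # The most frequent traversal patterns get first pick of a server.
    for path, _ in counts.most_common():
        new_nodes = [n for n in dict.fromkeys(path) if n not in placement]
        if not new_nodes:
            continue
        # Prefer servers already holding part of this path, then the least loaded.
        already = Counter(placement[n] for n in path if n in placement)
        candidates = [s for s, _ in already.most_common()]
        candidates += sorted(servers, key=lambda s: load[s])
        target = next(
            (s for s in candidates if load[s] + len(new_nodes) <= capacity),
            min(servers, key=lambda s: load[s]),  # nothing fits: least loaded
        )
        for n in new_nodes:
            placement[n] = target
        load[target] += len(new_nodes)
    return placement
```

Re-running such a heuristic periodically on fresh statistics would be one way to approximate "optimizing the data held permanently", at the cost of moving data between servers.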

It would also mean that the occasional cross-server traversal should result in 
local caches being updated for the remote data.
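A minimal read-through cache shows what updating local caches for remote data could look like. The class and its `fetch_remote` callback are hypothetical stand-ins for a call to whichever server owns the node; there is no eviction policy, and `invalidate` assumes the owning server somehow announces updates.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """Serve nodes locally when possible; fetch remote ones once and keep them."""

    def __init__(self, local: Dict[str, dict],
                 fetch_remote: Callable[[str], Optional[dict]]) -> None:
        self.local = local                # nodes held (mastered) on this server
        self.fetch_remote = fetch_remote  # hypothetical call to the owning server
        self.cache: Dict[str, dict] = {}  # copies of remote nodes
        self.remote_fetches = 0

    def get(self, node_id: str) -> Optional[dict]:
        if node_id in self.local:
            return self.local[node_id]
        if node_id in self.cache:
            return self.cache[node_id]
        # Cross-server step: pay the remote cost once, then serve locally.
        self.remote_fetches += 1
        data = self.fetch_remote(node_id)
        if data is not None:
            self.cache[node_id] = data
        return data

    def invalidate(self, node_id: str) -> None:
        """Drop a cached copy, e.g. when the owning server reports an update."""
        self.cache.pop(node_id, None)
```

Only the first cross-server access to a node pays the remote cost; repeated traversals through it are then served from the local copy until it is invalidated.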

Is the problem we're talking about just data size? You can already store pretty
big graphs in a single Neo4j instance (especially when you go for big machines).

Michael

On 22.02.2011 at 00:15, J T wrote:

> I realise that different qualities can come into play with the labels
> 'scalability' and 'performance', and I can see how your strategy would help
> with some of those qualities, but it relies on custom logic in the client
> application to do the sharding and load spreading, and it doesn't address
> scaling the underlying persistent storage engine.
> 
> One of the things that attracted me to Riak and Cassandra (for the use cases
> I can apply them to) is that sharding, load balancing and persistence
> scaling were available out of the box and pretty much invisible to the
> client application. The client app didn't have to do anything special. I
> appreciate that, perhaps because they have different semantics, it's easier
> for them to solve.
> 
> I had a read of this page you wrote the other day :
> http://jim.webber.name/2011/02/16/3b8f4b3d-c884-4fba-ae6b-7b75a191fa22.aspx
> 
> It was your comment "it's hard to achieve in practice" that prompted me to
> post my initial message yesterday to enquire further.
> 
> I'm no specialist in the field, I just know what I want hehe :)
> 
> The only player in the field I've been able to find that might have more of
> the qualities I am interested in is InfiniteGraph; it's a shame that it
> doesn't have a 'server' version like neo does, so I could do a proper
> comparison.
> 
> I'll stick with neo for now and see how the marketplace matures in the
> coming months - I'm amazed at how much movement there has been in the last
> year.
> 
> 
> 
> 
> On Mon, Feb 21, 2011 at 3:09 PM, Jim Webber <[email protected]> wrote:
> 
>> Yup, you nailed it better than I did Rick.
>> 
>> Though your partition strategy might not be just "per user." For example in
>> the geo domain, it makes sense to route requests for particular cities to
>> specific nodes. It'll depend on your application how you generate your
>> routing rules.
>> 
>> Jim
>> 
>> On 21 Feb 2011, at 14:51, Michael Hunger wrote:
>> 
>>> You shouldn't be confused because you got it right :)
>>> 
>>> Cheers
>>> 
>>> Michael
>>> 
>>> On 21.02.2011 at 15:40, Rick Otten wrote:
>>> 
>>>> Ok, I'm following this discussion, and now I'm confused.
>>>> 
>>>> My understanding was that the (potentially very large) database is
>>>> replicated across all instances.
>>>> 
>>>> If someone needed to traverse to something that wasn't cached, they'd take
>>>> a performance hit, but still be able to get to it.
>>>> 
>>>> I had understood that the idea behind the load balancing is to minimize
>>>> traversals out of cache by grouping similar sets of users on a particular
>>>> server. (That way you don't need a ton of RAM to stash everything in the
>>>> database, just the most frequently accessed nodes and relationships
>>>> associated with a subset of the users.)
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> Hello JT,
>>>>> 
>>>>>> One thing: when you say route requests to specific instances, does that
>>>>>> imply that node relationships can't span instances?
>>>>> 
>>>>> Yes, that's right. What I'm suggesting here is that each instance is a full
>>>>> replica that works on a subset of requests which are likely to keep the
>>>>> caches warm.
>>>>> 
>>>>> So if you can split your requests (e.g. all customers beginning with "A" go
>>>>> to instance "1" ... all customers beginning with "Z" go to instance "26"),
>>>>> they will benefit from having warm caches for reading, while the HA
>>>>> infrastructure deals with updates across instances transactionally.
>>>>> 
>>>>> Jim
>>>>> _______________________________________________
>>>>> Neo4j mailing list
>>>>> [email protected]
>>>>> https://lists.neo4j.org/mailman/listinfo/user
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Rick Otten
>>>> [email protected]
>>>> O=='=+
>>>> 
>>>> 
>>> 
>> 
>> 

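Jim's two examples of routing rules in the thread above (customers split by the first letter of their name across instances 1 to 26, and requests for particular cities sent to specific instances in the geo domain) can be sketched as small Python functions that a router would call before forwarding a request. The function names, the fallback to instance 1 for names that don't start with A to Z, and the city table are hypothetical.

```python
import string
from typing import Dict


def instance_for_customer(name: str) -> int:
    """Map a customer name to instance 1..26 by first letter (A -> 1, Z -> 26)."""
    first = name.strip()[:1].upper()
    # The emptiness check matters: "" is a substring of every string.
    if first and first in string.ascii_uppercase:
        return string.ascii_uppercase.index(first) + 1
    return 1  # assumption: anything not starting with A-Z goes to instance 1


def instance_for_city(city: str, city_rules: Dict[str, int], default: int) -> int:
    """Route a geo request by city using an explicit rule table, else a default."""
    return city_rules.get(city.strip().lower(), default)
```

Because every instance in Jim's scheme is a full replica, a missing or poorly chosen rule costs cache warmth rather than correctness: as Rick put it, the request can still get to the data, just with a performance hit.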
