Re: [Neo4j] server with big (huge?) graph

Mattias Persson Mon, 09 May 2011 02:22:22 -0700

2011/2/23 Kiss Miklós <[email protected]>

> Thanks for the response.
>
> Then my idea of a server plugin wasn't a bad idea, great.
> My next question is then: how do I traverse only a part of the possible
> sub-graph?
> I mean: let's suppose I start traversing from node 'A' and want to get
> all paths of length 2 over relationships 'TYPE_X' and 'TYPE_Y'. Let's say
> node 'A' has 5000 'TYPE_X' relationships and all connected nodes have 10
> 'TYPE_Y' relationships. If I start my traversal from node 'A' on the server
> side, how can I stop after fetching 1000 paths and later continue from
> where I stopped? Or am I missing something very important?
>
> I have an idea to put extra nodes into my graph that collect (let's say)
> 100 same typed connections and stand as an intermediate node between 'A'
> and 'B'. Using this would allow me to collect all paths between 'A' and
> 'B' in multiple steps, each step returning at most 100 paths only.
> However, this scheme is harder to maintain and makes the connections
> harder to read and also has a performance penalty (direct connections
> become indirect 2-step connections).
>


Seems awfully complex to maintain, as you mentioned. There should be another
way of doing this; however, I think you'll have to roll your own. Have you
made any progress here?

I just thought about it: could it be done by writing your own server
plugin which does the path calculation and returns a URI or ID from which you
can start fetching results? The plugin would merely do the calculation,
assign the result a new ID and return. Another GET request could then read
that result and iterate over only N items, and the next GET request to that
ID could continue the iteration where the previous one stopped.

I don't know, just thinking out loud.
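
To make the idea a bit more concrete, here is a rough sketch in plain Python.
It is not Neo4j plugin code: the dictionary-based graph, `two_step_paths`,
`ResultStore` and `build_example_graph` are all hypothetical names made up
for illustration. It also assumes the paths are produced lazily rather than
computed up front, so only one page is held in memory per request. The
example graph mirrors the numbers from the question (5000 'TYPE_X'
relationships from 'A', 10 'TYPE_Y' relationships on each neighbour).

```python
import itertools
import uuid


def two_step_paths(graph, start, first_type, second_type):
    """Lazily yield every path start -[first_type]-> mid -[second_type]-> end.

    `graph` is a hypothetical stand-in for the real store: a dict mapping
    (node, relationship_type) to a list of neighbour nodes.
    """
    for mid in graph.get((start, first_type), []):
        for end in graph.get((mid, second_type), []):
            yield (start, mid, end)


class ResultStore:
    """Keeps unfinished result iterators on the server, keyed by an opaque ID."""

    def __init__(self):
        self._results = {}

    def start(self, iterator):
        """Register a result and return the ID a client would use in later GETs."""
        result_id = uuid.uuid4().hex
        self._results[result_id] = iter(iterator)
        return result_id

    def next_page(self, result_id, page_size):
        """Return up to page_size further items; an empty list means 'done'."""
        iterator = self._results.get(result_id)
        if iterator is None:
            return []
        page = list(itertools.islice(iterator, page_size))
        if len(page) < page_size:
            # The iterator is exhausted, so the stored state can be dropped.
            del self._results[result_id]
        return page


def build_example_graph(fan_out=5000, second_fan_out=10):
    """Build the toy graph from the question: 'A' -> x_i -> y_i_j."""
    graph = {("A", "TYPE_X"): [f"x{i}" for i in range(fan_out)]}
    for i in range(fan_out):
        graph[(f"x{i}", "TYPE_Y")] = [f"y{i}_{j}" for j in range(second_fan_out)]
    return graph
```

A client would call `start` once (the POST to the plugin) and then
`next_page(result_id, 1000)` repeatedly (the GETs) until it receives an empty
page. In a real server the stored iterators would also need some expiry so
that abandoned results do not pile up.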

>
> Or is this something very similar that can be achieved with proper
> indexing (like you mentioned)?
>

Doing path algorithms with mixed indexing can be rather slow, and limiting
the index results wouldn't solve your problem, would it?

>
> Am I on the right way?
>
> Miklós Kiss
>
> On 2011.02.23 at 14:20, Michael Hunger wrote:
> > First - you should perhaps write a Server-Plugin that does your heavy
> > lifting on the server and provides a REST endpoint to get the results.
> > Not sure if non-GET verbs are supported yet (otherwise you can always
> > go for an unmanaged extension defining your own resources).
> >
> > You can do indexing for certain start nodes and then use the traversal
> > facilities to update your graph (if this is fitting). E.g. you can use
> > the javascript evaluators not only to evaluate/query but also to
> > update the graph.
> >
> > Hope that helps
> >
> > Michael
> >
> > We're also thinking about a more terse or binary API that would make server
> > interaction more efficient, but I think that is the wrong direction for
> > your use case. Rather move into the server what belongs there and
> > expose appropriate resources for your clients to interact with.
> >
> > 2011/2/23 Kiss Miklós<[email protected]>:
> >> Hi all,
> >>
> >> I'd like to get ideas on how to handle a (relatively) big graph. My
> >> graph is stored in a neo4j server. The structure is simple but highly
> >> interconnected:
> >> - I have nodes containing longer texts
> >> - and I have many nodes containing tokens of those texts.
> >> Relationships connect tokens to texts so I have many relationships. The
> >> actual graph does have many other nodes too but this is irrelevant now.
> >> The graph contains 300k nodes, 2.5 million properties and 1 million
> >> relationships (and is still growing).
> >>
> >> My question is how to execute queries on the graph. I have to execute
> >> operations that usually require querying huge parts of the graph. I
> >> mean: get all the tokens for some of the texts; or even get all the
> >> tokens. (I'm creating a text processing system that is learning and the
> >> teaching process involves manipulation of all tokens - I think it's much
> >> faster executed in memory rather than querying each token separately).
> >>
> >> The naive solution (traverse the graph from the root node with depth 1 to
> >> get all the nodes of a certain type) is now unusable since my graph is
> >> too big. The server simply runs out of memory (I gave it 1024 MB - this
> >> is around the maximum until the server gets separate hardware).
> >>
> >> So my question is: how do I query the graph correctly and efficiently?
> >> Should I create custom extensions that traverse and return only a part
> >> of the graph in such a scenario? Or should I insert
> >> additional "control" nodes to the graph which can be used as reference
> >> points for querying? The main problem is that I have many same typed
> >> relationships. I don't know how to manage traversing the graph partially
> >> if it is only accessible through the REST protocol.
> >>
> >> Any help would be appreciated!
> >>
> >> Thanks in advance,
> >> Miklós Kiss
> >> _______________________________________________
> >> Neo4j mailing list
> >> [email protected]
> >> https://lists.neo4j.org/mailman/listinfo/user
> >>
> >
> >
>
>



-- 
Mattias Persson, [[email protected]]
Hacker, Neo Technology
www.neotechnology.com
