Hi all,

I'd like to get ideas on how to handle a (relatively) big graph. My 
graph is stored in a neo4j server. The structure is simple but highly 
interconnected:
- I have nodes containing longer texts
- and I have many nodes containing tokens of those texts.
Relationships connect tokens to texts, so I have many relationships. The 
actual graph has many other nodes too, but that is irrelevant here. 
The graph contains 300k nodes, 2.5 million properties and 1 million 
relationships (and is still growing).

My question is how to run queries against the graph. The operations I 
need usually require reading huge parts of it, for example getting all 
the tokens for some of the texts, or even getting all the tokens. (I'm 
creating a text processing system that learns, and the teaching process 
involves manipulating all tokens. I think this is much faster when done 
in memory than when querying each token separately.)
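
To make the two access patterns concrete, here is a minimal in-memory 
sketch in Python. It is only a stand-in for the real graph: the class 
TextGraph and its methods are hypothetical names invented for this 
illustration and are not part of any neo4j API.

```python
from collections import defaultdict


class TextGraph:
    """Hypothetical stand-in for the graph: maps a text id to its tokens."""

    def __init__(self):
        # text id -> set of token strings linked to that text
        self._tokens_by_text = defaultdict(set)

    def add_token(self, text_id, token):
        """Record a token-to-text relationship."""
        self._tokens_by_text[text_id].add(token)

    def tokens_for(self, text_ids):
        """Return the union of tokens linked to the given texts."""
        result = set()
        for text_id in text_ids:
            # .get avoids creating empty entries for unknown text ids
            result |= self._tokens_by_text.get(text_id, set())
        return result

    def all_tokens(self):
        """Return every token in the graph (the 'get all tokens' case)."""
        return self.tokens_for(list(self._tokens_by_text))
```

The first query touches only the selected texts, while the second one 
has to visit every text, which is the case that becomes expensive as 
the graph grows.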

The naive solution (traversing the graph from the root node to depth 1 
to get all the nodes of a certain type) is now unusable because my graph 
is too big. The server simply runs out of memory (I gave it 1024 MB, 
which is about the maximum until the server gets separate hardware).

So my question is: how do I implement querying of the graph correctly 
and efficiently? Should I create custom extensions that traverse and 
return only a part of the graph in such a scenario? Or should I insert 
additional "control" nodes into the graph that can be used as reference 
points for querying? The main problem is that I have many relationships 
of the same type. I don't know how to traverse the graph partially if 
it is only accessible through the REST protocol.
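
One way to picture partial traversal is to walk the texts in 
fixed-size batches, so that only one batch of tokens is held in memory 
at a time. The Python sketch below shows the idea under stated 
assumptions: fetch_tokens is a hypothetical callable that would wrap 
whatever server request is available (a custom extension or a REST 
call), and the names chunks and tokens_in_batches are made up for this 
example.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def tokens_in_batches(text_ids, fetch_tokens, batch_size=1000):
    """Yield (text_id, tokens) pairs, requesting one batch of texts at a time.

    `fetch_tokens` is an assumed callable: it takes a list of text ids and
    returns a dict mapping each text id to its list of tokens.
    """
    for chunk in chunks(text_ids, batch_size):
        # Only the tokens of the current batch are in memory here.
        for text_id, tokens in fetch_tokens(chunk).items():
            yield text_id, tokens
```

Because the function is a generator, the caller can process each text 
as it arrives instead of collecting the whole result first. The batch 
size would have to be tuned against the memory available to the server.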

Any help would be appreciated!

Thanks in advance,
Miklós Kiss
_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user
