Hi all,

I'd like to get ideas on how to handle a (relatively) big graph. My graph is stored in a Neo4j server. The structure is simple but highly interconnected:

- I have nodes containing longer texts
- and I have many nodes containing tokens of those texts.

Relationships connect tokens to texts, so I have many relationships. The actual graph has many other nodes too, but they are irrelevant here. The graph contains 300k nodes, 2.5 million properties and 1 million relationships (and is still growing).

My question is how to execute queries against the graph. I have to run operations that usually require querying huge parts of the graph, for example: get all the tokens for some of the texts, or even get all the tokens. (I'm creating a text processing system that learns, and the teaching process involves manipulating all tokens. I think this is much faster done in memory than by querying each token separately.)

The naive solution (traverse the graph from the root node with depth 1 to get all the nodes of a certain type) is now unusable since my graph is too big. The server simply runs out of memory (I gave it 1024 MB, which is around the maximum until the server gets separate hardware).

So my question is how to query the graph correctly and efficiently. Should I create custom extensions that traverse and return only a part of the graph in such a scenario? Or should I insert additional "control" nodes into the graph which can be used as reference points for querying? The main problem is that I have many relationships of the same type. I don't know how to manage traversing the graph partially if it is only accessible through the REST protocol.

Any help would be appreciated!

Thanks in advance,
Miklós Kiss

_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user
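
To make the "traverse the graph partially" idea above more concrete, here is a minimal client-side sketch in Python. It is not a Neo4j API: `fetch_page`, `iter_tokens` and `make_fake_server` are hypothetical names, and the sketch assumes the server can return tokens in a stable order by offset (through a custom extension or any paged query it supports). The point is only that the client asks for one bounded page at a time, so neither side has to hold all tokens at once.

```python
from typing import Callable, Iterator, List

# Hypothetical signature: fetch_page(skip, limit) returns at most `limit`
# tokens starting at offset `skip`. Against a real server this would wrap
# one REST call; that call is an assumption and is not shown here.
FetchPage = Callable[[int, int], List[str]]


def iter_tokens(fetch_page: FetchPage, page_size: int = 1000) -> Iterator[str]:
    """Yield every token while keeping only one page in memory at a time."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        if not page:
            return  # nothing left on the server
        yield from page
        if len(page) < page_size:
            return  # a short page means the end was reached
        skip += len(page)


def make_fake_server(tokens: List[str]) -> FetchPage:
    """In-memory stand-in for the server, only to make the sketch runnable."""
    def fetch_page(skip: int, limit: int) -> List[str]:
        return tokens[skip:skip + limit]
    return fetch_page


if __name__ == "__main__":
    fake = make_fake_server([f"tok{i}" for i in range(2500)])
    count = sum(1 for _ in iter_tokens(fake, page_size=1000))
    print(count)
```

Offset-based paging like this only works if the server returns the tokens in the same order on every call and the set does not change between calls. With a growing graph, paging by a key (for example "tokens with id greater than the last one seen") may be the safer variant.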

