I have recently been working on loading the content of DbPedia into my
database and ran into a performance issue.
My application has a meta-layer, inspired by the meta model component but
rewritten in Scala.
All DbPedia resources are said to be an instance of "topic", 
creating a relationship from that resource node to the node that describes the 
topic class.
This of course makes the "topic class" node densely populated.
The "topic class" node has relationships other than "HAS_INSTANCE", 
for example "SUB_CLASS_OF", which states that the "topic class" node is a 
subclass of "typable". 
When I try to retrieve the "SUB_CLASS_OF" relationships of the "topic class"
node, performance degrades enormously.

It looks (please correct me if I am wrong in my assumption) as if all
relationships are being scanned
to filter out the "SUB_CLASS_OF" relationships (of which there are very few,
especially compared to the "HAS_INSTANCE" relationships).
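
To make concrete what I suspect is happening, here is a toy model in Python.
It is not Neo4j code and I am not claiming this is how Neo4j stores
relationships; FlatNode, its "visited" counter and the count of 100,000
instances are made up purely for illustration.

```python
from collections import namedtuple

# Toy model only: a node that keeps every relationship in one flat list.
# This illustrates my assumption; it is not Neo4j's actual storage layout.
Rel = namedtuple("Rel", ["rel_type", "direction", "other_node"])


class FlatNode:
    def __init__(self):
        self.rels = []
        self.visited = 0  # how many relationships lookups had to touch

    def add(self, rel_type, direction, other_node):
        self.rels.append(Rel(rel_type, direction, other_node))

    def get_relationships(self, rel_type):
        # Every relationship is inspected, whatever its type.
        found = []
        for rel in self.rels:
            self.visited += 1
            if rel.rel_type == rel_type:
                found.append(rel)
        return found


topic_class = FlatNode()
topic_class.add("SUB_CLASS_OF", "OUT", "typable")
for i in range(100_000):  # hypothetical number of DbPedia resources
    topic_class.add("HAS_INSTANCE", "IN", f"resource-{i}")

sub_classes = topic_class.get_relationships("SUB_CLASS_OF")
print(len(sub_classes), topic_class.visited)  # prints: 1 100001
```

In this model, finding the single "SUB_CLASS_OF" relationship costs as much as
walking every "HAS_INSTANCE" relationship, which matches what I observe.
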
I ended up placing all "HAS_INSTANCE" relationships into the Timeline index
from Neo4j-graph-collections for two reasons: it's nice to know when a
resource became an instance of a class (a bonus), and it ensures that no
single node becomes heavily populated.
So far so good, but delving deeper into the Timeline index, I notice that the
relationship between an entry node and the root of the tree is partially
established by the use of a property on the "entry node" which names the
timeline index.
The simplest way to establish the relationship between an "entry node" and the 
tree root is by means of a Lucene index lookup.
This is of course not a very fast solution and would actually amount to the
same thing as adding a property to the "resource node" that lists the classes
a resource is an instance of.
Adding a relationship from "entry node" to "tree root" in the Timeline
component would create yet another densely populated node in the database (in
this case the tree root).
Is there a way out of this situation? 
Would it be possible to partition the relationships in the database per
relationship type and per direction, so that densely populated nodes can be
traversed quickly for those relationship types that are sparsely populated?
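
As a rough sketch of the kind of partitioning I mean (again a toy model in
Python, not a proposal for Neo4j's actual implementation; PartitionedNode and
the "visited" counter are hypothetical names), relationships could be grouped
per (type, direction) pair so that a lookup only touches its own group:

```python
from collections import defaultdict


class PartitionedNode:
    """Toy node whose relationships are grouped per (type, direction)."""

    def __init__(self):
        # (rel_type, direction) -> list of neighbouring node ids
        self.partitions = defaultdict(list)
        self.visited = 0  # how many relationships lookups had to touch

    def add(self, rel_type, direction, other_node):
        self.partitions[(rel_type, direction)].append(other_node)

    def get_relationships(self, rel_type, direction):
        # Only the requested partition is read; .get avoids creating
        # empty partitions for keys that were never added.
        bucket = self.partitions.get((rel_type, direction), [])
        self.visited += len(bucket)
        return list(bucket)


topic_class = PartitionedNode()
topic_class.add("SUB_CLASS_OF", "OUT", "typable")
for i in range(100_000):  # hypothetical number of DbPedia resources
    topic_class.add("HAS_INSTANCE", "IN", f"resource-{i}")

parents = topic_class.get_relationships("SUB_CLASS_OF", "OUT")
print(parents, topic_class.visited)  # prints: ['typable'] 1
```

Here the cost of reading "SUB_CLASS_OF" depends only on how many
"SUB_CLASS_OF" relationships exist, not on the size of the "HAS_INSTANCE"
group on the same node.
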
Niels
                                          
_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user
