Hello,

I'm writing to ask whether I am using Neo4j correctly for loading and
storing RDF datasets.
So far my performance results have been quite poor, and it seems to me
that I haven't fully understood how to use the BatchInserter for what
I want to do.

I have RDF datasets ranging from 1K to 20M triples, and I want to
store them in an empty Neo4j graph.

The method I use for the insertion is the following:

- For each triple of my RDF data:
-- Check whether the subject node already exists in the graph. If yes,
find it; if not, create it.
-- Check whether the object node already exists in the graph. If yes,
find it; if not, create it.
-- Create an edge labelled with the predicate between the subject and
object nodes.
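For reference, here is a minimal sketch of that loop in plain Java. The
Neo4j calls are stubbed out (creating a node just hands back a fresh
long id, as the BatchInserter does), and the class and method names are
hypothetical, not taken from my actual code. It also shows the kind of
in-memory id cache I am wondering about, as opposed to going through
the index for every lookup:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the per-triple get-or-create loop described above.
// Neo4j BatchInserter calls are stubbed; names are hypothetical.
public class TripleLoaderSketch {
    // URI -> node id cache, consulted before any create.
    private final Map<String, Long> nodeCache = new HashMap<>();
    private long nextId = 0;
    private long relationshipCount = 0;

    // Stand-in for BatchInserter.createNode(properties): returns a new id.
    private long createNode(String uri) {
        return nextId++;
    }

    // Look the URI up in the in-memory map first; create only when absent.
    public long getOrCreateNode(String uri) {
        return nodeCache.computeIfAbsent(uri, this::createNode);
    }

    public void insertTriple(String subject, String predicate, String object) {
        long s = getOrCreateNode(subject);
        long o = getOrCreateNode(object);
        // Stand-in for inserter.createRelationship(s, o, type(predicate), null).
        relationshipCount++;
    }

    public int nodeCount() {
        return nodeCache.size();
    }

    public long relationshipCount() {
        return relationshipCount;
    }
}
```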

This method is quite simple and generic, but it also carries a big
problem: it spends more time reading and searching than inserting.

Profiling shows that almost 90% of the execution time is spent
checking whether a given node exists.

So far, I have tried Neo4j with simple transactions, then switched to
the BatchInserter + Lucene index, but I still think there is room to
improve my program.

That said, my questions are:
- Knowing how Neo4j works, can anyone tell me how to improve my
insertion process, or whether there is a better approach?
- Are there any big errors in my code? It is not yet very well
documented, but it is available here:
https://bitbucket.org/bplsilva/alaska-project/src/e7fdf2e9341b/src/fr/lirmm/graphik/alaska/impl/graph/neo4j/Neo4jFact.java

Thank you very much,

-- 
*PAIVA LIMA DA SILVA Bruno*
PhD Student in Informatics @ Univ. Montpellier 2
[ GraphIK Research Team: LIRMM, Montpellier (France) ]
Website: http://bplsilva.com
_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user
