Hello, I'm writing to ask whether I am using Neo4j correctly for loading and storing RDF datasets. So far my performance results have been quite bad, and I suspect I haven't fully understood how to use the BatchInserter for what I want to do.
So, I have RDF datasets ranging from 1K to 20M triples, and I want to store them in an empty Neo4j graph. The method I use for the insertion is the following:

- For each triple of my RDF data:
-- Check whether the subject node already exists in the graph. If yes, find it; if not, create it.
-- Check whether the object node already exists in the graph. If yes, find it; if not, create it.
-- Create an edge labelled with the predicate between the subject and the object.

This method is simple and generic, but it also carries a big problem: it spends more time reading and searching than inserting. Having profiled its execution, almost 90% of the time goes into checking whether a given node already exists. A simplified sketch of this loop is included at the end of this message.

So far I have tried Neo4j with simple transactions, then switched to the BatchInserter + a Lucene index, but I still think there is room to improve my program. That said, my questions are:

- Knowing how Neo4j works, can anyone tell me how to improve my insertion process, or point me to a better solution?
- Are there any big errors in my code? It is not yet very well documented, but it is available here:
https://bitbucket.org/bplsilva/alaska-project/src/e7fdf2e9341b/src/fr/lirmm/graphik/alaska/impl/graph/neo4j/Neo4jFact.java

Thank you very much,

--
*PAIVA LIMA DA SILVA Bruno*
PhD Student in Informatics @ Univ. Montpellier 2
[ GraphIK Research Team: LIRMM, Montpellier (France) ]
Website: http://bplsilva.com
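PS: To make the description above concrete, here is a minimal sketch of the get-or-create loop written against the Neo4j 1.x BatchInserter and Lucene batch index APIs. It is not my actual code: the store directory, index name, "uri" property key and the in-memory URI cache are illustrative (the cache is there because a BatchInserterIndex does not see newly added entries until flush() is called).

import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.index.BatchInserterIndex;
import org.neo4j.graphdb.index.BatchInserterIndexProvider;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.index.lucene.LuceneBatchInserterIndexProvider;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public class TripleLoader {

    private final BatchInserter inserter;
    private final BatchInserterIndexProvider indexProvider;
    private final BatchInserterIndex nodeIndex;
    // URIs created during this run: a BatchInserterIndex only returns
    // entries added before the last flush(), so we remember new nodes here.
    private final Map<String, Long> uriCache = new HashMap<String, Long>();

    public TripleLoader(String storeDir) {
        inserter = new BatchInserterImpl(storeDir);
        indexProvider = new LuceneBatchInserterIndexProvider(inserter);
        nodeIndex = indexProvider.nodeIndex("resources",
                MapUtil.stringMap("type", "exact"));
        // keep frequently used keys cached by the index itself
        nodeIndex.setCacheCapacity("uri", 1000000);
    }

    /** Insert one (subject, predicate, object) triple. */
    public void addTriple(String subject, String predicate, String object) {
        long s = getOrCreateNode(subject);
        long o = getOrCreateNode(object);
        inserter.createRelationship(s, o,
                DynamicRelationshipType.withName(predicate), null);
    }

    private long getOrCreateNode(String uri) {
        // 1) nodes created earlier in this run
        Long id = uriCache.get(uri);
        if (id != null) {
            return id;
        }
        // 2) nodes already present in the store (visible through Lucene)
        Long found = nodeIndex.get("uri", uri).getSingle();
        if (found != null) {
            uriCache.put(uri, found);
            return found;
        }
        // 3) not found anywhere: create and index the node
        Map<String, Object> props = MapUtil.map("uri", uri);
        long created = inserter.createNode(props);
        nodeIndex.add(created, props);
        uriCache.put(uri, created);
        return created;
    }

    public void shutdown() {
        nodeIndex.flush();
        indexProvider.shutdown();
        inserter.shutdown();
    }
}

With this shape, the Lucene lookup is only hit for URIs not seen in the current run; the HashMap and setCacheCapacity() are the two places where I imagine the lookup cost can be reduced, but I may be missing a better approach.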

