Bruno, I think using the LuceneBatchInserter extensively for lookups during insertion is not a good idea performance-wise; see http://wiki.neo4j.org/content/Indexing_with_BatchInserter
I would suggest first adding the nodes, and then doing a second pass for the relationships. Maybe that involves massaging the incoming data into a better insert format?

Cheers,

/peter neubauer

GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Thu, Oct 6, 2011 at 7:50 AM, Bruno Paiva Lima da Silva <[email protected]> wrote:
> Hello,
>
> I'm writing to ask whether I am using Neo4j correctly for loading and
> storing RDF datasets.
> For now my performance results have been quite bad. However, it seems
> to me that I haven't understood well how to use the BatchInserter for
> what I want to do.
>
> So, I have RDF datasets that can go from 1K to 20M triples, and I want
> to store them into an empty Neo4j graph.
>
> The method I use for the insertion is the following:
>
> - For each triple of my RDF data:
> -- Check if there is a subject node in the graph. If yes, find it; if
> not, create it.
> -- Check if there is an object node in the graph. If yes, find it; if
> not, create it.
> -- Create an edge with a label "predicate" between subject and object.
>
> This method is quite simple and generic, but it also carries a quite
> big problem:
> It spends more time reading and searching than inserting.
>
> Having profiled its execution, I found that it spends almost 90% of the
> time checking whether a given node exists.
>
> For now, I have tried to use Neo4j with simple transactions, then I
> switched to BatchInserter + LuceneIndex, but I still think there is
> room to improve my program.
>
> That said, my questions are:
> - Can anyone tell me, knowing how Neo4j works, how to improve my
> insertion process or tell me if there is a better solution?
> - Are there any big errors in my code? It's not yet very well
> documented, but it is available here:
> https://bitbucket.org/bplsilva/alaska-project/src/e7fdf2e9341b/src/fr/lirmm/graphik/alaska/impl/graph/neo4j/Neo4jFact.java
>
> Thank you very much,
>
> --
> *PAIVA LIMA DA SILVA Bruno*
> PhD Student in Informatics @ Univ. Montpellier 2
> [ GraphIK Research Team: LIRMM, Montpellier (France) ]
> Website: http://bplsilva.com
> _______________________________________________
> Neo4j mailing list
> [email protected]
> https://lists.neo4j.org/mailman/listinfo/user
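
A minimal sketch of the two-pass idea suggested in the reply (nodes first, then relationships), in case it helps to see it spelled out. It is not Neo4j code: `InMemoryInserter`, `create_node`, `create_relationship` and `load_triples` are hypothetical names standing in for whatever batch inserter is actually used, and the sketch assumes that the map from node URI to node id fits in memory. The point it tries to show is that each node is created once in the first pass, so the second pass resolves subject and object through a plain dictionary instead of an index lookup per triple.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class InMemoryInserter:
    """Hypothetical stand-in for a batch inserter; NOT the Neo4j API."""

    def __init__(self) -> None:
        self.nodes: List[str] = []
        self.relationships: List[Tuple[int, int, str]] = []

    def create_node(self, uri: str) -> int:
        self.nodes.append(uri)
        return len(self.nodes) - 1  # the node id is its position in the list

    def create_relationship(self, start: int, end: int, label: str) -> None:
        self.relationships.append((start, end, label))


def load_triples(triples: Iterable[Triple],
                 inserter: InMemoryInserter) -> Dict[str, int]:
    """Insert all nodes first, then all relationships (assumed helper)."""
    triples = list(triples)  # the input is traversed twice

    # Pass 1: create every distinct subject/object exactly once and
    # remember its id in a dictionary.
    node_ids: Dict[str, int] = {}
    for subject, _predicate, obj in triples:
        for uri in (subject, obj):
            if uri not in node_ids:
                node_ids[uri] = inserter.create_node(uri)

    # Pass 2: create the relationships only; both endpoints are resolved
    # through the dictionary, with no index search.
    for subject, predicate, obj in triples:
        inserter.create_relationship(node_ids[subject], node_ids[obj], predicate)

    return node_ids
```

With three triples such as `("ex:a", "ex:knows", "ex:b")`, `("ex:b", "ex:knows", "ex:c")` and `("ex:a", "ex:likes", "ex:c")`, the first pass creates three nodes and the second pass creates three relationships. For datasets too large for an in-memory map, the data would probably need to be pre-sorted or split into a node file and a relationship file beforehand, which is one reading of "massaging the data into a better insert format".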

