Bruno,
I think using the LuceneBatchInserter extensively for lookups during
insertion is not a good idea performance-wise; see
http://wiki.neo4j.org/content/Indexing_with_BatchInserter

I would suggest adding the nodes first, and then doing a second pass
for the relationships. Maybe that involves massaging the incoming data
into a better insert format?
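
As a rough illustration of the two-pass idea, here is a minimal sketch in
plain Java. It does not use the Neo4j API at all: the class TwoPassLoader,
the createNode stand-in and the String[] {subject, predicate, object} triple
layout are all hypothetical, and the in-memory HashMap is an assumption about
how node ids could be remembered between passes instead of querying the index
for every triple. For 20M triples that map may need a lot of heap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an in-memory stand-in for the batch inserter, so the
// two-pass structure can be shown without depending on the Neo4j API.
class TwoPassLoader {
    // Maps each RDF term (subject or object) to the node id it was given.
    final Map<String, Long> nodeIds = new HashMap<>();
    // Each edge is {subjectId, objectId}; its predicate sits at the same
    // position in edgeLabels.
    final List<long[]> edges = new ArrayList<>();
    final List<String> edgeLabels = new ArrayList<>();
    private long nextId = 0;

    // Stand-in for the real node-creation call (an assumption, not Neo4j API).
    private long createNode(String term) {
        return nextId++;
    }

    // Pass 1: create exactly one node per distinct subject or object.
    void insertNodes(List<String[]> triples) {
        for (String[] t : triples) {
            nodeIds.computeIfAbsent(t[0], this::createNode);
            nodeIds.computeIfAbsent(t[2], this::createNode);
        }
    }

    // Pass 2: every endpoint already exists, so each triple needs only two
    // map lookups and no "does this node exist?" search.
    void insertRelationships(List<String[]> triples) {
        for (String[] t : triples) {
            edges.add(new long[] { nodeIds.get(t[0]), nodeIds.get(t[2]) });
            edgeLabels.add(t[1]);
        }
    }

    static TwoPassLoader load(List<String[]> triples) {
        TwoPassLoader loader = new TwoPassLoader();
        loader.insertNodes(triples);
        loader.insertRelationships(triples);
        return loader;
    }

    public static void main(String[] args) {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] { "ex:alice", "ex:knows", "ex:bob" });
        triples.add(new String[] { "ex:bob", "ex:knows", "ex:carol" });
        TwoPassLoader loader = load(triples);
        System.out.println(loader.nodeIds.size() + " nodes, "
                + loader.edges.size() + " relationships");
    }
}
```

In a real loader, createNode and the edges.add call would be replaced by the
corresponding batch inserter calls; the point is only that the existence
check moves out of the per-triple loop.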

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Thu, Oct 6, 2011 at 7:50 AM, Bruno Paiva Lima da Silva
<[email protected]> wrote:
> Hello,
>
> I'm writing to ask whether I am using Neo4j correctly for loading and
> storing RDF datasets.
> So far my performance results have been quite bad. However, it seems
> to me that I haven't properly understood how to use the BatchInserter
> for what I want to do.
>
> So, I have RDF datasets ranging from 1K to 20M triples, and I want
> to store them in an empty Neo4j graph.
>
> The method I use for the insertion is the following:
>
> - For each triple of my RDF data:
> -- Check if there is a subject node in the graph. If yes, find it; if
> not, create it.
> -- Check if there is an object node in the graph. If yes, find it; if
> not, create it.
> -- Create an edge with a label "predicate" between subject and object.
>
> This method is quite simple and generic, but it also has quite a
> big problem:
> It spends more time reading and searching than inserting.
>
> Having profiled its execution, I found that it spends almost 90% of
> the time checking whether a given node exists.
>
> So far, I have tried using Neo4j with simple transactions, and then
> switched to BatchInserter + LuceneIndex, but I still think there is
> room to improve my program.
>
> That said, my questions are:
> - Can anyone tell me, knowing how Neo4j works, how to improve my
> insertion process, or whether there is a better solution?
> - Are there any big errors in my code? It's not yet very well
> documented, but it is available here:
> https://bitbucket.org/bplsilva/alaska-project/src/e7fdf2e9341b/src/fr/lirmm/graphik/alaska/impl/graph/neo4j/Neo4jFact.java
>
> Thank you very much,
>
> --
> *PAIVA LIMA DA SILVA Bruno*
> PhD Student in Informatics @ Univ. Montpellier 2
> [ GraphIK Research Team: LIRMM, Montpellier (France) ]
> Website: http://bplsilva.com
> _______________________________________________
> Neo4j mailing list
> [email protected]
> https://lists.neo4j.org/mailman/listinfo/user
>