Re: [Neo4j] Loading RDF data questions

Bruno Paiva Lima da Silva Thu, 06 Oct 2011 08:13:46 -0700

Thanks for the quick answer, Peter.

I don't know if you remember my talk in Hannover, but for my PhD thesis 
project, my research team and I translate all the RDF data we receive as 
input into first-order logic (basically to maintain semantic equivalence 
with the Datalog and Conceptual Graphs families).


That said, we don't insert an RDF file directly into Neo4J; we insert 
Java objects representing its contents. (By the way, we also use Sail in 
order to compare the efficiency and effectiveness of graph databases 
against relational databases and triple stores for our problem.)

But these objects aren't very complicated. For now, they just encapsulate 
Strings containing the subject, predicate and object names.
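
As a rough illustration only, such an object could look like the sketch 
below. The class name `Atom`, its accessors and the `predicate(subject, 
object)` string form are hypothetical and chosen for this example; they are 
not taken from our actual code.

```java
import java.util.Objects;

// Hypothetical value class: one FOL atom, i.e. one RDF triple or one graph edge.
final class Atom {
    private final String subject;
    private final String predicate;
    private final String object;

    Atom(String subject, String predicate, String object) {
        // Reject nulls early so that a malformed triple fails at construction time.
        this.subject = Objects.requireNonNull(subject, "subject");
        this.predicate = Objects.requireNonNull(predicate, "predicate");
        this.object = Objects.requireNonNull(object, "object");
    }

    String subject() { return subject; }
    String predicate() { return predicate; }
    String object() { return object; }

    // Two atoms are equal when all three names are equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Atom)) return false;
        Atom a = (Atom) other;
        return subject.equals(a.subject)
                && predicate.equals(a.predicate)
                && object.equals(a.object);
    }

    @Override
    public int hashCode() {
        return Objects.hash(subject, predicate, object);
    }

    // Rendered in the usual atom notation: predicate(subject, object).
    @Override
    public String toString() {
        return predicate + "(" + subject + ", " + object + ")";
    }
}
```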

That's why I asked the question this morning:

After parsing the RDF with Jena, I obtain a big list of atoms (in FOL, 
an atom represents an edge in a graph), which I try to store using the 
method I described earlier.

I see people on the mailing list working with very big datasets, and I 
wonder what is going wrong on our side, since we haven't got further than 
200k triples (which is not big at all) using our method.
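
To make the bottleneck concrete: one direction that may be worth trying is 
to keep the name-to-node-id mapping in an in-memory map during the load, so 
that the "does this node exist?" check from the insertion method quoted 
below never touches the store or the index. The sketch below is only an 
illustration under assumptions: `GraphSink`, `CachedTripleLoader` and 
`CountingSink` are hypothetical names, `GraphSink` is a stand-in for 
whatever performs the actual writes and is not a Neo4j API, and the approach 
assumes that all distinct node names fit in the heap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the component that performs the real writes.
// This is NOT a Neo4j interface; it only exists to keep the sketch runnable.
interface GraphSink {
    long createNode(String name);
    void createEdge(long from, long to, String label);
}

final class CachedTripleLoader {
    // name -> node id, held in memory so existence checks need no store lookup
    private final Map<String, Long> nodeIds = new HashMap<>();
    private final GraphSink sink;

    CachedTripleLoader(GraphSink sink) {
        this.sink = sink;
    }

    // Return the id of the node with this name, creating the node on first sight.
    private long nodeFor(String name) {
        Long id = nodeIds.get(name);
        if (id == null) {
            id = sink.createNode(name);
            nodeIds.put(name, id);
        }
        return id;
    }

    // Each triple is a String[3]: {subject, predicate, object}.
    void load(Iterable<String[]> triples) {
        for (String[] t : triples) {
            long subject = nodeFor(t[0]);
            long object = nodeFor(t[2]);
            sink.createEdge(subject, object, t[1]);
        }
    }

    int distinctNodes() {
        return nodeIds.size();
    }
}

// Sink that only counts calls, for trying the loader without a database.
final class CountingSink implements GraphSink {
    int nodes;
    int edges;

    @Override
    public long createNode(String name) {
        return nodes++;
    }

    @Override
    public void createEdge(long from, long to, String label) {
        edges++;
    }
}
```

With a real store, `GraphSink` would be implemented on top of whichever 
insertion API is in use. Whether the map fits in memory for a 20M-triple 
dataset depends on the number of distinct names and on the heap size.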

Bruno

On 06/10/2011 12:17, Peter Neubauer wrote:
> Bruno,
>
> RDF support is provided via Josh Shinavier's SAIL implementation on
> top of Neo4j already.
>
> Look at the SPARQL-plugin-in-the-making,
> https://github.com/peterneubauer/sparql-plugin/blob/master/src/test/java/org/neo4j/server/plugin/sparql/BerlinDatasetTest.java
> for how to load a file into Neo4j as an RDF store, and how to query
> it. This is using a subset of the Berlin RDF dataset and queries,
> http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/ExploreUseCase/index.html,
> for instance.
>
> Does that help? I hope to get this into shape very soon, so you can
> use the Neo4j Server with the SPARQL plugin in order to load and query
> RDF and essentially turn the Neo4j Server into a Triple Store.
>
> Cheers,
>
> /peter neubauer
>
> GTalk:      neubauer.peter
> Skype       peter.neubauer
> Phone       +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter      http://twitter.com/peterneubauer
>
> http://www.neo4j.org               - Your high performance graph database.
> http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>
>
>
> On Thu, Oct 6, 2011 at 7:50 AM, Bruno Paiva Lima da Silva
> <[email protected]>  wrote:
>> Hello,
>>
>> I'm writing to ask whether I am using Neo4J correctly for loading and
>> storing RDF datasets.
>> So far my performance results have been quite bad. However, it seems
>> to me that I haven't properly understood how to use the BatchInserter
>> for what I want to do.
>>
>> So, I have RDF datasets that can go from 1K to 20M triples, and I want
>> to store them into an empty Neo4J graph.
>>
>> The method I use for the insertion is the following:
>>
>> - For each triple of my RDF data:
>> -- Check if there is a subject node in the graph. If yes, find it, if
>> not, create it.
>> -- Check if there is an object node in the graph. If yes, find it, if
>> not, create it.
>> -- Create an edge with a label "predicate" between subject and object.
>>
>> This method is quite simple and generic, but it also carries a quite
>> big problem:
>> it spends more time reading and searching than inserting.
>>
>> Having profiled its execution, I see that it spends almost 90% of the
>> time checking whether a given node exists.
>>
>> So far, I have tried to use Neo4J with simple transactions, then I
>> switched to BatchInserter + LuceneIndex, but I still think there is
>> room to improve my program.
>>
>> That said, my questions are:
>> - Can anyone tell me, knowing how Neo4J works, how to improve my
>> insertion process or tell me if there is a better solution?
>> - Are there any big errors in my code? It's not yet very well
>> documented, but it is available here:
>> https://bitbucket.org/bplsilva/alaska-project/src/e7fdf2e9341b/src/fr/lirmm/graphik/alaska/impl/graph/neo4j/Neo4jFact.java
>>
>> Thank you very much,
>>
>> --
>> *PAIVA LIMA DA SILVA Bruno*
>> PhD Student in Informatics @ Univ. Montpellier 2
>> [ GraphIK Research Team: LIRMM, Montpellier (France) ]
>> Website: http://bplsilva.com
>> _______________________________________________
>> Neo4j mailing list
>> [email protected]
>> https://lists.neo4j.org/mailman/listinfo/user
>>
