Sorry for the late answer. I'm aware of the downside of autocommit, which I never use. I did wrap the call to removeGraph in a transaction. I'll make the measurements you asked for, to determine the respective CPU and elapsed times for loading the RDF and for indexing the text.

But for the time being, I had to solve my issue of loading data without stopping my SPARQL + HTML server. So I wrote a client RDF uploader that talks to the SPARQL Graph Store Protocol:
https://www.w3.org/TR/sparql11-http-rdf-update/
It splits the given RDF file into chunks of 10000 triples for sending:
https://github.com/jmvanel/semantic_forms/blob/master/scala/clients/src/main/scala/deductions/runtime/clients/RDFuploader.scala#L66

For the first time I used the RIOT parser with callbacks (org.apache.jena.riot.system.StreamRDFBase), which I'll also test for performance. It is understandable that it can be slow, since the input was a Turtle file, not N-Triples.

On the server side, I modularized my code, so that several TDB(1) instances are now created on the same directory, which is not a problem for TDB. But apparently this is a problem for Lucene: there is a LockObtainFailedException, "Lock held by this virtual machine: ../LUCENE/write.lock", when creating the second TDB instance connected to Lucene. So I'll ensure that only one TDB database is instantiated. Or maybe I am misusing the API (the setup is configured through the API, not through an RDF configuration).

NOTES
- I'm not sure whether LUCENE/write.lock is deleted in all cases when closing the TDB, although this was specified at text index creation: TextDatasetFactory.create(... closeIndexOnDSGClose = true)
- using the Luke GUI in lucene-8.5.2 is useful for inspecting the Lucene index

Jean-Marc Vanel
<http://semantic-forms.cc:9112/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me>
+33 (0)6 89 16 29 52

On Sat, 6 June 2020 at 11:45, Andy Seaborne <[email protected]> wrote:

> On 04/06/2020 10:25, Jean-Marc Vanel wrote:
> > Hi
> >
> > It took hours loading a TTL document with text indexing (in TDB 3.15.0).
> > The TTL document is Taxrefld_taxonomy_classes.ttl (size: 2_676_428 triples)
> > in zip taxref12-core.zip
> > <https://github.com/frmichel/taxref-ld/blob/master/dataset/12.0/taxref12-core.zip>.
>
> Have you tried with and without the text index to get information
> about where the time is going?
>
> This is a combination setup so it is harder to say where time is going
> without an experiment.
>
> >
> > This method in DatasetGraph is called:
> > public void add(Node g, Node s, Node p, Node o) ;
> >
> > With logging at debug level, it appeared that most of the elapsed time is
> > taken by removing the graph, one entity at a time.
> >
> > In fact I explicitly call *removeGraph()* before, because the data is
> > stored in provenance specific graphs in this database.
>
> The text index has to be updated as well, and I think there is nothing
> special about removeGraph for a text index so it undoes all the indexing.
>
> Also - lucene indexing may be slower than the TDB part.
>
> >
> > Is there a way to accelerate things ?
> > I wondered if wrapping removeGraph() operation in a transaction is mandatory
> > or useful.
>
> useful - If you don't have a transaction, TDB1 is going to be less safe
> for your data.
>
> > At runtime Jena does not protest about that ...
>
> TDB1 does not ... but it is better to use a transaction and it's
> mandatory for TDB2.
>
> Adding an autocommit mode is not as good as it may seem. Like in SQL,
> autocommit is nothing more than an automatic transaction around each
> step and very easily becomes extremely slow.
>
> Andy
>
> >
> > A typical block in the data:
> > <http://taxref.mnhn.fr/lod/taxon/629656/12.0>
> >     a owl:Class ;
> >     rdfs:isDefinedBy <http://taxref.mnhn.fr/lod/taxref-ld/12.0> ;
> >     *rdfs:label "Eranthemum pulchellum" ;*
> >     rdfs:subClassOf <http://taxref.mnhn.fr/lod/taxon/452421/12.0> ;
> >     schema:mainEntityOfPage <https://inpn.mnhn.fr/espece/cd_nom/629656?lg=en> ;
> >     taxrefprop:habitat taxrefhab:FreshWater , taxrefhab:Terrestrial ;
> >     taxrefprop:hasRank taxrefrk:Species ;
> >     taxrefprop:hasReferenceName <http://taxref.mnhn.fr/lod/name/629656> ;
> >     taxrefprop:hasSynonym <http://taxref.mnhn.fr/lod/name/633029> ,
> >         <http://taxref.mnhn.fr/lod/name/637984> ,
> >         <http://taxref.mnhn.fr/lod/name/634312> ;
> >     foaf:homepage <https://inpn.mnhn.fr/espece/cd_nom/629656?lg=en> .
> >
> > Jean-Marc Vanel
> > <http://semantic-forms.cc:9112/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me>
> > +33 (0)6 89 16 29 52
> > Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
> > Chroniques jardin
> > <http://semantic-forms.cc:1952/history?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FChronicle>
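
PS: for readers who want to try the chunked upload described at the top of this message, here is a minimal sketch of the idea in plain Java. It is not the code of RDFuploader.scala. It assumes the input is already N-Triples with exactly one triple per line (no comments or blank lines), so chunking needs no RDF parser, whereas the real uploader parses Turtle through RIOT callbacks. The class and method names (ChunkedUploader, chunk, graphStoreUri, upload) are made up for this illustration, and the endpoint and graph URI are whatever your server exposes.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; requires Java 11+ (java.net.http).
final class ChunkedUploader {

    /** Split the triples (one N-Triples line each) into chunks of at most chunkSize lines. */
    static List<List<String>> chunk(List<String> triples, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < triples.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, triples.size());
            chunks.add(new ArrayList<>(triples.subList(i, end)));
        }
        return chunks;
    }

    /** Graph Store Protocol: address a named graph indirectly with the ?graph= parameter. */
    static URI graphStoreUri(String endpoint, String graphUri) {
        return URI.create(endpoint + "?graph="
                + URLEncoder.encode(graphUri, StandardCharsets.UTF_8));
    }

    /** POST each chunk in turn; in the Graph Store Protocol, POST merges into the target graph. */
    static void upload(String endpoint, String graphUri, List<String> triples, int chunkSize)
            throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        URI target = graphStoreUri(endpoint, graphUri);
        for (List<String> part : chunk(triples, chunkSize)) {
            HttpRequest request = HttpRequest.newBuilder(target)
                    .header("Content-Type", "application/n-triples")
                    .POST(HttpRequest.BodyPublishers.ofString(String.join("\n", part) + "\n"))
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() / 100 != 2) {
                throw new IllegalStateException("Upload failed with HTTP " + response.statusCode());
            }
        }
    }
}
```

Each POST is handled by the server as a separate request, so a failure in the middle leaves the earlier chunks already merged into the graph; a caller that removes the graph first would have to start over or resume from the failed chunk.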
