On 04/06/2020 10:25, Jean-Marc Vanel wrote:
Hi

It took hours to load a TTL document with text indexing (in TDB 3.15.0).
The TTL document is Taxrefld_taxonomy_classes.ttl (size: 2_676_428 triples)
in zip taxref12-core.zip
<https://github.com/frmichel/taxref-ld/blob/master/dataset/12.0/taxref12-core.zip>.

Have you tried with and without the text index, to get some information about where the time is going?

This is a combined setup, so it is hard to say where the time is going without an experiment.


This method in DatasetGraph is called:
     public void add(Node g, Node s, Node p, Node o) ;

With logging at debug level, it appeared that most of the elapsed time is
taken by removing the graph, one entity at a time.

In fact I explicitly call *removeGraph()* beforehand, because the data is
stored in provenance-specific graphs in this database.

The text index has to be updated as well, and I think there is nothing special about removeGraph for a text index, so it has to undo all the indexing.

Also, Lucene indexing may be slower than the TDB part.


Is there a way to accelerate things?
I wondered if wrapping the removeGraph() operation in a transaction is mandatory
or useful.

Useful. If you don't have a transaction, TDB1 is going to be less safe for your data.

At runtime Jena does not protest about that ...

TDB1 does not, but it is better to use a transaction, and it is mandatory for TDB2.
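
A minimal sketch of what "use a transaction" could look like for the remove-then-reload step described above. The class name GraphReloader, the method reloadGraph and its parameters are hypothetical names chosen for illustration; the sketch assumes the Dataset (for example a text-indexed TDB dataset) has already been created elsewhere. It only groups the removal and the reload into one write transaction and does not by itself make the text index update cheaper.

```java
import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.query.Dataset;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.system.Txn;

/** Hypothetical helper, not part of Jena. */
final class GraphReloader {

    /**
     * Replace the contents of one named graph inside a single write transaction.
     * Assumption: 'dataset' supports transactions (TDB1, TDB2 or a transactional
     * in-memory dataset) and 'ttlPathOrUrl' points to a Turtle document.
     */
    static void reloadGraph(Dataset dataset, String graphUri, String ttlPathOrUrl) {
        Node graphName = NodeFactory.createURI(graphUri);
        Txn.executeWrite(dataset, () -> {
            // Drop the old contents of the provenance-specific graph ...
            dataset.asDatasetGraph().removeGraph(graphName);
            // ... then parse the new data into the same named graph.
            RDFDataMgr.read(dataset.getNamedModel(graphUri), ttlPathOrUrl);
        });
        // Txn.executeWrite commits on normal return and aborts if the body throws.
    }
}
```

Either both steps become visible together at commit, or neither does if an exception is thrown part way through.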

Adding an autocommit mode is not as good as it may seem. As in SQL, autocommit is nothing more than an automatic transaction around each step, and it very easily becomes extremely slow.
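
To illustrate the autocommit point, here is a toy model in plain Java. ToyStore is a hypothetical stand-in and is not Jena's API: it only counts how many commits each loading strategy performs, on the assumption that the commit is the expensive step in a real store.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a transactional store; NOT Jena's API. */
final class ToyStore {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private int commits = 0;

    /** Stage one change; nothing is durable until commit(). */
    void add(String quad) {
        pending.add(quad);
    }

    /** Make staged changes durable; in a real store this is the costly part. */
    void commit() {
        committed.addAll(pending);
        pending.clear();
        commits++;
    }

    int commitCount() { return commits; }

    int size() { return committed.size(); }

    /** Autocommit: an implicit transaction around every single add. */
    static ToyStore loadAutocommit(List<String> quads) {
        ToyStore store = new ToyStore();
        for (String q : quads) {
            store.add(q);
            store.commit();
        }
        return store;
    }

    /** One explicit transaction around the whole batch. */
    static ToyStore loadInOneTransaction(List<String> quads) {
        ToyStore store = new ToyStore();
        for (String q : quads) {
            store.add(q);
        }
        store.commit();
        return store;
    }
}
```

Both strategies end with the same data, but the autocommit variant pays for one commit per added item while the batch variant pays for one commit in total.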

    Andy


A typical block in the data:
<http://taxref.mnhn.fr/lod/taxon/629656/12.0>
         a                            owl:Class ;
         rdfs:isDefinedBy             <http://taxref.mnhn.fr/lod/taxref-ld/12.0> ;
*        rdfs:label                   "Eranthemum pulchellum" ;*
         rdfs:subClassOf              <http://taxref.mnhn.fr/lod/taxon/452421/12.0> ;
         schema:mainEntityOfPage      <https://inpn.mnhn.fr/espece/cd_nom/629656?lg=en> ;
         taxrefprop:habitat           taxrefhab:FreshWater , taxrefhab:Terrestrial ;
         taxrefprop:hasRank           taxrefrk:Species ;
         taxrefprop:hasReferenceName  <http://taxref.mnhn.fr/lod/name/629656> ;
         taxrefprop:hasSynonym        <http://taxref.mnhn.fr/lod/name/633029> , <http://taxref.mnhn.fr/lod/name/637984> , <http://taxref.mnhn.fr/lod/name/634312> ;
         foaf:homepage                <https://inpn.mnhn.fr/espece/cd_nom/629656?lg=en> .

Jean-Marc Vanel
<http://semantic-forms.cc:9112/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me>
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
  Chroniques jardin
<http://semantic-forms.cc:1952/history?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FChronicle>
