Code Ferret created JENA-1489:
---------------------------------

             Summary: models written twice on RDFConnection
                 Key: JENA-1489
                 URL: https://issues.apache.org/jira/browse/JENA-1489
             Project: Apache Jena
          Issue Type: Bug
          Components: Fuseki, Jena, TDB
    Affects Versions: Jena 3.7.0
         Environment: Jena 3.7.0-Snapshot, Java 1.8.0_131 on Mac OS 10.13.3, 
Java 1.8.0_151-8u151-b12-1~deb9u1-b12 on Debian Stretch
            Reporter: Code Ferret


*Problem*: I am transferring models via {{RDFConnection}} to {{TDB}} and seeing 
doubling of blank nodes in _some_ graphs as though the same model is written a 
second time *after* a commit during the transfer. I apologize in advance for 
the length of this report.

*Details*: We have a collection of entity types: Persons, Items, Works and so 
on. Each entity is a graph in a ttl file in a per-type git repo. For each type, 
the ttl files are read from the corresponding repo into models and the models 
are added to a {{Dataset}} until the number of triples in the dataset exceeds a 
threshold, e.g., 50,000 triples. Once the threshold is exceeded, the dataset 
is loaded to Fuseki via an {{RDFConnection}}:
{code:java}
fuConn = RDFConnectionFactory.connect(baseUrl, baseUrl+"/query", 
baseUrl+"/update", baseUrl+"/data");
{code}
which is opened once at the beginning of loading all entity types. The kernel 
of loading is performed via:
{code:java}
    private static void loadDatasetSimple(final Dataset ds) {
        if (!fuConn.isInTransaction()) {
            fuConn.begin(ReadWrite.WRITE);
        }
        fuConn.loadDataset(ds);
        fuConn.commit();
    }
{code}
{{loadDatasetSimple}} is called repeatedly until all of the entities of a given 
type have been loaded from the corresponding repo. Since some models may not 
yet have been transferred after all of the entities of a given type have been 
read, a finish method is then called:
{code:java}
    static void finishDatasetTransfers() {
        // if map is not empty, transfer the last one
        if (currentDataset != null) {
            loadDatasetSimple(currentDataset);
        }
    }
{code}
After loading a given type of entity the next type in a list of types to 
transfer is processed as described above and this is when the problem is 
noticed.
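
To make the batching scheme concrete, here is a minimal sketch of it in plain 
Java, without Jena. It is not the actual transfer code: the class name 
{{BatchSketch}}, the use of strings as stand-ins for triples, and the 
{{Consumer}} sink standing in for {{loadDatasetSimple}} are all invented for 
illustration. It also assumes that a fresh batch is started after every 
transfer, which is the behavior the description above intends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for the batching loop; strings play the role of triples. */
final class BatchSketch {
    private final int threshold;                // e.g. 50,000 in the report
    private final Consumer<List<String>> sink;  // stands in for loadDatasetSimple
    private List<String> current = new ArrayList<>();

    BatchSketch(int threshold, Consumer<List<String>> sink) {
        this.threshold = threshold;
        this.sink = sink;
    }

    /** Add one model's triples; transfer once the batch exceeds the threshold. */
    void add(List<String> modelTriples) {
        current.addAll(modelTriples);
        if (current.size() > threshold) {
            flush();
        }
    }

    /** Transfer whatever is pending, then start a fresh batch (assumed behavior). */
    void flush() {
        if (current.isEmpty()) {
            return;
        }
        sink.accept(current);
        current = new ArrayList<>();
    }
}
```

In this sketch {{add}} would be called once per ttl file, and {{flush}} plays 
the role of {{finishDatasetTransfers}} at the end of each entity type. If a 
batch were ever handed to the sink twice, every model in it would be written 
twice.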

Once enough models of the next type have been added to the transfer dataset and 
that dataset is transferred via {{loadDatasetSimple}}, _some_ of the previously 
transferred graphs exhibit doubled blank nodes. Here is the result of 
{{describe bdr:P58}} to illustrate the doubling:
{code:java}
@prefix :      <http://purl.bdrc.io/ontology/core/> .
@prefix bdr:   <http://purl.bdrc.io/resource/> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix adm:   <http://purl.bdrc.io/ontology/admin/> .

bdr:P58  a                :Person ;
        adm:gitRevision   "e5e094dd8803f851448aac6ff3a800205ff8ef00" ;
        adm:status        bdr:StatusReleased ;
        :hasFather        bdr:P4342 ;
        :hasMother        bdr:P4343 ;
        :personEvent      [ a                  :PersonOccupiesSeat ;
                            :personEventPlace  bdr:G227
                          ] ;
        :personEvent      [ a                  :PersonOccupiesSeat ;
                            :personEventPlace  bdr:G227
                          ] ;
        :personEvent      [ a                  :PersonBirth ;
                            :onOrAbout         "1402" ;
                            :personEventPlace  bdr:G547
                          ] ;
        :personEvent      [ a                  :PersonOccupiesSeat ;
                            :personEventPlace  bdr:G235
                          ] ;
        :personEvent      [ a                  :PersonOccupiesSeat ;
                            :personEventPlace  bdr:G235
                          ] ;
        :personEvent      [ a           :PersonDeath ;
                            :onOrAbout  "1472"
                          ] ;
        :personEvent      [ a           :PersonDeath ;
                            :onOrAbout  "1472"
                          ] ;
        :personEvent      [ a                  :PersonBirth ;
                            :onOrAbout         "1402" ;
                            :personEventPlace  bdr:G547
                          ] ;
        :personGender     bdr:GenderMale ;
        :personName       [ a           :PersonPrimaryTitle ;
                            rdfs:label  "spyan snga blo gros rgyal mtshan/"@bo-x-ewts
                          ] ;
        :personName       [ a           :PersonPrimaryTitle ;
                            rdfs:label  "spyan snga blo gros rgyal mtshan/"@bo-x-ewts
                          ] ;
        :personName       [ a           :PersonChineseName ;
                            rdfs:label  "金厄·洛卓坚赞"@zh
                          ] ;
        :personName       [ a           :PersonTitle ;
                            rdfs:label  "rgya ma spyan snga ba blo gros rgyal mtshan/"@bo-x-ewts
                          ] ;
        :personName       [ a           :PersonPrimaryName ;
                            rdfs:label  "blo gros rgyal mtshan/"@bo-x-ewts
                          ] ;
        :personName       [ a           :PersonTitle ;
                            rdfs:label  "rgya ma spyan snga ba blo gros rgyal mtshan/"@bo-x-ewts
                          ] ;
        :personName       [ a           :PersonPrimaryName ;
                            rdfs:label  "blo gros rgyal mtshan/"@bo-x-ewts
                          ] ;
        :personName       [ a           :PersonFirstOrdinationName ;
                            rdfs:label  "blo gros rgyal mtshan/"@bo-x-ewts
                          ] ;
        :personName       [ a           :PersonChineseName ;
                            rdfs:label  "金厄·洛卓坚赞"@zh
                          ] ;
        :personName       [ a           :PersonFirstOrdinationName ;
                            rdfs:label  "blo gros rgyal mtshan/"@bo-x-ewts
                          ] ;
        skos:prefLabel    "blo gros rgyal mtshan/"@bo-x-ewts .
{code}
This doubling is completely reproducible and the same graphs exhibit doubling 
on each trial.
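
The shape of the output is what a second write of the same model would 
produce. Blank node labels are scoped to the document being parsed, so each 
load of a serialized graph creates new blank nodes, while triples made only of 
IRIs and literals are merged by set semantics. That matches the listing above, 
where {{adm:gitRevision}} and {{adm:status}} appear once but every blank node 
structure appears twice. The following toy model in plain Java (not Jena code; 
the class {{BlankNodeDoubling}} and its string-based triples are invented for 
illustration) shows the effect:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of an RDF graph as a set of "s p o" strings (not Jena code). */
final class BlankNodeDoubling {
    private final Set<String> triples = new HashSet<>();
    private int loads = 0;

    /**
     * Load a serialized graph given as {subject, predicate, object} arrays.
     * Terms starting with "_:" are blank nodes, so each load renames them to
     * labels unique to that load, much as a parser allocates fresh nodes.
     */
    void load(List<String[]> serialized) {
        loads++;
        for (String[] t : serialized) {
            triples.add(rename(t[0]) + " " + t[1] + " " + rename(t[2]));
        }
    }

    private String rename(String term) {
        return term.startsWith("_:") ? term + "_load" + loads : term;
    }

    int size() {
        return triples.size();
    }
}
```

Loading one ground triple plus a two-triple blank node structure once gives 3 
triples in this model. Loading the same data a second time gives 5, not 3 and 
not 6: the ground triple is merged and only the blank node structure is 
duplicated.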

Varying the threshold changes which graphs and how many graphs exhibit 
doubling. If the threshold is set higher, e.g., to 100,000 triples per call to 
{{loadDatasetSimple}} then many more graphs exhibit doubling. If the threshold 
is set lower, say to 20,000 triples, then fewer graphs exhibit doubling. If 
only a single model at a time is transferred then there is no doubling.

Also, if each type of entity is transferred separately, opening the connection, 
transferring all models of the type, then closing down via:
{code:java}
    public static void closeConnections() {
        TransferHelpers.logger.info("closeConnections fuConn.commit, end, close");
        FusekiHelpers.fuConn.commit();
        FusekiHelpers.fuConn.end();
        FusekiHelpers.fuConn.close();
    }
{code}
There is no doubling.

It appears that models that have already been transferred and committed are 
being written a second time when switching to a new type and upon the first 
transfer via {{loadDatasetSimple}} of the new type.

I'm hoping there's enough information in this report to identify what sort of 
error in usage of {{RDFConnection}} and/or {{TDB}} would account for this 
behavior. If this appears to be a bug in Jena then I will have to expend more 
effort to create a relatively self-contained test case.

Here is the relevant portion of the Fuseki configuration:
{code:java}
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix :        <http://base/#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .

[] rdf:type fuseki:Server ;
   fuseki:services (
     :bdrcrw
   ) .

:bdrcrw rdf:type fuseki:Service ;
    fuseki:name                       "bdrcrw" ;   # name of the dataset in the url
    fuseki:serviceQuery               "query" ;    # SPARQL query service
    fuseki:serviceUpdate              "update" ;   # SPARQL update service
    fuseki:serviceUpload              "upload" ;   # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore "data" ;     # SPARQL Graph store protocol (read and write)
    fuseki:dataset                    :bdrc_text_dataset ;
    .

:bdrc_text_dataset rdf:type     text:TextDataset ;
    text:dataset   :dataset_bdrc ;
    text:index     :bdrc_lucene_index ;
    .

:dataset_bdrc rdf:type      tdb:DatasetTDB ;
     tdb:location "/etc/fuseki/databases/bdrc" ;
     tdb:unionDefaultGraph true ;
     .
{code}


