RE: Tried to compact a live TDB2 dataset using Lucene engine but failed

李惠玲 Tue, 08 Dec 2020 21:48:12 -0800

Thank you, Rob. We tried what you suggested and it worked!

We are also glad to know that an enhancement for this kind of issue is on the
way; it will definitely be helpful.


Thank you all for taking the time to read and answer this question. We are
grateful to have this mailing list for support.

Thanks again!
Huiling Lee

-----Original Message-----
From: Rob Vesse <[email protected]> 
Sent: Tuesday, December 8, 2020 5:42 PM
To: [email protected]
Subject: Re: Tried to compact a live TDB2 dataset using Lucene engine but failed

Hi

In your configuration file you currently define a single service that exposes 
your #dataset_fulltext dataset. What Andy and I were suggesting is that you 
also define another service that exposes #tdb_dataset_readwrite itself. 
Compact requests submitted to that second service would then work, i.e. you 
would call /$/compact/<other-dataset>, where <other-dataset> is the name of 
the additional service you define in your configuration.

:service_tdb_direct  a                       fuseki:Service ;
                rdfs:label                        "TDB2 direct access" ;
                fuseki:name                       "tdb" ;      
                fuseki:dataset                    <#tdb_dataset_readwrite> .

So the above added to your configuration would allow you to POST to 
/$/compact/tdb to compact the TDB2 dataset.
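
As a rough illustration of that request, here is a minimal Python sketch using only the standard library. The function names and the server base URL (http://localhost:3030) are assumptions for illustration, not something from your setup; with the URL quoted earlier in this thread the base would be http://location:8080/fuseki. It also assumes that an empty POST body is sufficient and that no authentication is required on the admin endpoint.

```python
import urllib.request


def compact_url(server_base: str, service_name: str) -> str:
    """Build the admin URL <server_base>/$/compact/<service_name>."""
    return f"{server_base.rstrip('/')}/$/compact/{service_name}"


def build_compact_request(server_base: str, service_name: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request with an empty body."""
    return urllib.request.Request(
        compact_url(server_base, service_name),
        data=b"",
        method="POST",
    )


def request_compaction(server_base: str, service_name: str) -> str:
    """Send the POST and return the raw response body as text."""
    request = build_compact_request(server_base, service_name)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling request_compaction("http://localhost:3030", "tdb") would target the "tdb" service defined above, whereas using "lod" would still hit the text dataset and fail as before.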

I would also note that Andy has already started on an improvement (JENA-2010 
[1]) that would allow compaction to work on text datasets, and other similar 
dataset wrappers, when the underlying dataset is TDB2.  Once we add that 
enhancement you will not need to use the suggested workaround in future.

Rob

[1] https://github.com/apache/jena/pull/883

On 08/12/2020, 07:31, "李惠玲" <[email protected]> wrote:

    Thank you, Rob, for your answer and suggestion.

    But what exactly does "define another service that exposes the TDB2
    dataset directly" mean? Could you please share more details or an example?

    I apologize if this sounds rude. We set up this service based on our own
    reading of the online docs at jena.apache.org, so we may have
    misinterpreted something.

    We hope you can give us a hint. Thank you.


    Regards,
    Huiling Lee

    -----Original Message-----
    From: Rob Vesse <[email protected]> 
    Sent: Monday, December 7, 2020 7:08 PM
    To: [email protected]
    Subject: Re: Tried to compact a live TDB2 dataset using Lucene engine but failed

    Online compaction will only work if the dataset is directly a TDB2
    dataset. If it's wrapped in another dataset, e.g. a TextDataset as in this
    example, then it won't work.

    You would need to define another service that exposes the TDB2 dataset
    directly and then call /$/compact/<other-dataset> instead.

    Rob

    On 07/12/2020, 01:04, "李惠玲" <[email protected]> wrote:

        Hi,

        To Lorenz:
        Yes, we are using the SNAPSHOT version.

        To Andy:
        We tried using "http://location:8080/fuseki/$/compact/lod" to run the
        online compaction.

        And here is the config file:
        
================================================================================
        @prefix :      <http://base/#> .
        @prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
        @prefix tdb2:  <http://jena.apache.org/2016/tdb#> .
        @prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
        @prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
        @prefix fuseki: <http://jena.apache.org/fuseki#> .
        @prefix text:  <http://jena.apache.org/text#> .
        @prefix madsrdf: <http://www.loc.gov/mads/rdf/v1#> .

        tdb2:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .

        ja:DatasetTxnMem  rdfs:subClassOf  ja:RDFDataset .

        <http://jena.hpl.hp.com/2008/tdb#DatasetTDB>
                rdfs:subClassOf  ja:RDFDataset .

        <http://jena.hpl.hp.com/2008/tdb#GraphTDB>
                rdfs:subClassOf  ja:Model .

        tdb2:GraphTDB2  rdfs:subClassOf  ja:Model .

        ja:MemoryDataset  rdfs:subClassOf  ja:RDFDataset .

        ja:RDFDatasetZero  rdfs:subClassOf  ja:RDFDataset .

        <http://jena.apache.org/text#TextDataset>
                rdfs:subClassOf  ja:RDFDataset .

        :service_tdb_all  a                       fuseki:Service ;
                rdfs:label                        "TDB2 lod" ;
                fuseki:name                       "lod" ;      

                fuseki:endpoint [
                    fuseki:operation fuseki:query ;
                    fuseki:name "query" ;
                ] ;
                fuseki:endpoint [
                    fuseki:operation fuseki:gsp-r ;
                    fuseki:name "get" ;
                ] ;
                fuseki:endpoint [ 
                    fuseki:operation fuseki:gsp-rw ; 
                    fuseki:name "data" ;
                    #fuseki:allowedUsers "*" ;
                ] ; 
                fuseki:endpoint [
                    fuseki:operation fuseki:update ;
                    fuseki:name "update" ;
                    #fuseki:allowedUsers "*" ;
                ] ; 
                fuseki:endpoint [ 
                    fuseki:operation fuseki:upload ;
                    fuseki:name "upload" ;
                    #fuseki:allowedUsers "*" ;
                ] ;

                fuseki:dataset                    <#dataset_fulltext> .

        <#dataset_fulltext> rdf:type     text:TextDataset ;
            text:dataset     <#tdb_dataset_readwrite> ;
            text:index       <#indexLucene> .

        <#indexLucene> a text:TextIndexLucene ;
            text:directory  <file:luceneIndex/lod> ;    
            text:entityMap <#entMap> ;
            text:storeValues true ; 
            text:analyzer [ 
                a text:StandardAnalyzer 
            ] ;    
            text:queryAnalyzer [
                a text:StandardAnalyzer 
            ] ;
            text:queryParser text:AnalyzingQueryParser ;    
            text:multilingualSupport true . # optional

        <#entMap> a text:EntityMap ;
            text:defaultField     "authoritativeLabel" ; 
            text:entityField      "uri" ;
            text:uidField         "uid" ;
            text:langField        "lang" ;
            text:graphField       "graph" ;
            text:map (
                [ text:field "authoritativeLabel" ; text:predicate madsrdf:authoritativeLabel]
                [ text:field "variantLabel" ; text:predicate madsrdf:variantLabel]
                [ text:field "citation-note" ; text:predicate madsrdf:citation-note]
                [ text:field "citation-source" ; text:predicate madsrdf:citation-source]
            ) .

        <#tdb_dataset_readwrite>
                a              tdb2:DatasetTDB2 ;
                tdb2:unionDefaultGraph true ;
                tdb2:location  "apache-jena-fuseki/databases/lod" .

        tdb2:GraphTDB  rdfs:subClassOf  ja:Model .

        ja:RDFDatasetOne  rdfs:subClassOf  ja:RDFDataset .

        ja:RDFDatasetSink  rdfs:subClassOf  ja:RDFDataset .

        tdb2:DatasetTDB2  rdfs:subClassOf  ja:RDFDataset .

        
==============================================================================

        Thank you for your kind help.

        Regards,
        Huiling Lee

        -----Original Message-----
        From: Andy Seaborne <[email protected]> 
        Sent: Friday, December 4, 2020 6:20 PM
        To: [email protected]
        Subject: Re: Tried to compact a live TDB2 dataset using Lucene engine but failed

        Hi - without a config file it's not possible to be definitive.

        A guess - are you applying compact to the text dataset (which may use
        TDB2 for storage)? A text dataset may have various different storages.

        I think you will need to directly expose the TDB2 dataset and send the
        "compact" request to that.

             Andy

        On 04/12/2020 10:08, Lorenz Buehmann wrote:
        > without seeing the config file it's just a guess, but did you use TDB 
        > or TDB2?
        >
        > Please share the config file.
        >
        > Minor: there is no 3.18.0 release unless you're using the SNAPSHOT 
        > version for whatever reason.
        >
        > On 04.12.20 10:45, 李惠玲 wrote:
        >> Our project uses a Jena Fuseki server (3.18.0) with Lucene (7.7.x)
        >> as the full-text search engine.
        >> According to the doc
        >> <https://jena.apache.org/documentation/tdb2/tdb2_admin.html> on
        >> jena.apache.org, it is possible to run a live compaction on a TDB2
        >> dataset. We tried to do that, but got an error message saying
        >> "Not a TDB2 dataset: Compact only applies to TDB2".
        >> We are confused and don't know where the problem may be. Could
        >> someone help us figure it out?
        >>
        >> Thanks!
