On 22/12/15 18:22, Andy Seaborne wrote:
JENA-1104 suggests there is an ordering/timing issue and that it is not Fuseki1/Fuseki2, except that things happen in a different order.
I have investigated this further and I think I understand what is happening.

If we have a configuration with the same dataset+text-index shared between two services, then when the first service is built, TextIndexLuceneAssembler is called to create a TextIndexLucene object. When the second service is built, TextIndexLuceneAssembler is called again and creates another TextIndexLucene object.

Both of these TextIndexLucene objects create a Lucene IndexWriter object on the same directory. That doesn't work because they both try to grab the same lock and one fails.
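
To illustrate the kind of failure described above, here is a minimal sketch using only plain java.nio file locks. It is not Lucene or Jena code; the class name WriteLockDemo and the temporary directory are made up for the example. The log shows a NativeFSLock, which as far as I know is built on java.nio file locks, so the behaviour should be analogous, although Lucene reports the problem as a LockObtainFailedException after a timeout rather than the exception caught here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustration only: plain java.nio locking, not Lucene or Jena code.
class WriteLockDemo {

    // Returns true if a second attempt to lock the same file is refused
    // while the first lock is still held inside this JVM.
    public static boolean secondLockRefused() {
        try {
            Path dir = Files.createTempDirectory("ds-lucene-demo");
            Path lockFile = dir.resolve("write.lock");
            try (FileChannel first = FileChannel.open(lockFile,
                         StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 FileChannel second = FileChannel.open(lockFile,
                         StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 FileLock held = first.lock()) {        // "first service" takes the lock
                try {
                    FileLock again = second.tryLock();  // "second service" tries as well
                    if (again != null) {
                        again.release();
                        return false;                   // both locks granted
                    }
                    return true;                        // refused: held elsewhere
                } catch (OverlappingFileLockException e) {
                    return true;                        // refused: this JVM already holds it
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second lock refused: " + secondLockRefused());
    }
}
```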

I am happy to offer a pull request to change this behaviour. There are broadly two strategies that I can see, and I'm wondering if there is a preferred approach from the Jena team.

The first approach is to change the way the assemblers work so that only one TextIndexLucene object is created per node in the configuration graph.
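
A rough sketch of the idea behind this first approach, assuming a cache keyed by the configuration node's identifier. PerNodeCache and getOrBuild are hypothetical names invented for the illustration and are not part of the Jena assembler API; a real change would have to fit into the existing assembler classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: none of these names are Jena classes.
class PerNodeCache<T> {
    private final Map<String, T> built = new ConcurrentHashMap<>();

    // Build the object for a configuration node at most once; later calls
    // with the same node identifier return the instance built first.
    public T getOrBuild(String nodeId, Function<String, T> builder) {
        return built.computeIfAbsent(nodeId, builder);
    }

    public static void main(String[] args) {
        PerNodeCache<Object> cache = new PerNodeCache<>();
        // Two services referring to the same node get the same object back.
        Object a = cache.getOrBuild("#indexLucene", id -> new Object());
        Object b = cache.getOrBuild("#indexLucene", id -> new Object());
        System.out.println("same instance: " + (a == b));
    }
}
```

With something like this in place, the second service would reuse the object built for the first, so only one writer would ever be opened on the directory.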

A second approach is to modify TextIndexLucene so that two or more objects can operate on the same directory.
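
One way this second approach might look, sketched under the assumption that the objects share a single writer per directory and count their users. SharedWriterRegistry and SharedWriter are hypothetical; SharedWriter only stands in for a Lucene IndexWriter and no real Lucene calls are made.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: SharedWriter stands in for a Lucene IndexWriter.
class SharedWriterRegistry {

    static final class SharedWriter {
        final String directory;
        int users = 0;
        boolean closed = false;

        SharedWriter(String directory) {
            this.directory = directory;
        }
    }

    private final Map<String, SharedWriter> open = new HashMap<>();

    // Hand out the single writer for a directory, opening it on first use.
    public synchronized SharedWriter acquire(String directory) {
        SharedWriter w = open.computeIfAbsent(directory, SharedWriter::new);
        w.users++;
        return w;
    }

    // Close the writer only when its last user releases it.
    public synchronized void release(SharedWriter w) {
        w.users--;
        if (w.users == 0) {
            open.remove(w.directory);
            w.closed = true;
        }
    }

    public static void main(String[] args) {
        SharedWriterRegistry registry = new SharedWriterRegistry();
        SharedWriter first = registry.acquire("/var/lib/fuseki/databases/ds-lucene");
        SharedWriter second = registry.acquire("/var/lib/fuseki/databases/ds-lucene");
        System.out.println("shared: " + (first == second));
        registry.release(first);
        System.out.println("closed after one release: " + second.closed);
        registry.release(second);
        System.out.println("closed after both releases: " + second.closed);
    }
}
```

This adds lifecycle bookkeeping that the assembler-level change avoids.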

My default approach would be to make the change in the assembler code.

Brian

I'm not sure that a shared index across two different datasets will work if updates are involved. Maybe someone else can help with that.
The configuration I'm looking at is not an index shared across two datasets; there is only one index+tdb-dataset pair in the configuration.

What's fuseki:allowTimeoutOverride? Is this a local build with the code for that uncommented?

    Andy

On 21/12/15 14:53, Brian McBride wrote:
The fuseki configuration below sets up two services with a shared
dataset.  The dataset has a lucene text index.

This configuration works on Fuseki 1.3.1.  Fuseki 2.3.1 fails to start.
The log output is shown below.  Looks like the lucene index may be
trying to grab a lock for the dataset twice.

If I change the second fuseki:dataset line to:

[[
     fuseki:dataset                        <#ds> ;
]]

then it works on Fuseki 2.3.1 and, unexpectedly, both services have
access to the text index, which doesn't seem right, though it suits me for
the moment as I need both services to have access to the index.

Is there some configuration change I need to make between Fuseki 1 and
Fuseki 2?

Brian



Fuseki 2.3.1 log output

[[
2015-12-21 14:42:20.940 WARN  Config               :: Fuseki v2: Management functions are always on the same port as the server. --mgtPort ignored.
2015-12-21 14:42:21.062 INFO  Server               :: Fuseki 2.3.1 2015-12-08T09:24:07+0000
2015-12-21 14:42:21.229 INFO  Config               :: FUSEKI_HOME=/usr/share/fuseki
2015-12-21 14:42:21.230 INFO  Config               :: FUSEKI_BASE=/etc/fuseki
2015-12-21 14:42:21.233 INFO  Servlet              :: Initializing Shiro environment
2015-12-21 14:42:21.233 INFO  EnvironmentLoader    :: Starting Shiro environment initialization.
2015-12-21 14:42:21.242 INFO  Config               :: Shiro file: file:///etc/fuseki/shiro.ini
2015-12-21 14:42:21.415 INFO  EnvironmentLoader    :: Shiro environment initialized in 181 ms.
2015-12-21 14:42:21.415 INFO  Config               :: Configuration file: /etc/fuseki/config.ttl
2015-12-21 14:42:22.193 WARN  AssemblerHelp        :: ja:loadClass: Migration to Jena3: Converting com.hp.hpl.jena.tdb.TDB to org.apache.jena.tdb.TDB
2015-12-21 14:42:23.557 ERROR Server               :: Exception in initialization: caught: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/lib/fuseki/databases/ds-lucene/write.lock
2015-12-21 14:42:23.577 INFO  Server               :: Started 2015/12/21 14:42:23 UTC on port 3030

]]



Fuseki configuration.

[[

# Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .

[] rdf:type fuseki:Server ;

    fuseki:services (
      <#service_ds>
      <#service_ds_timeout_override>
    ) .

# TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .



<#service_ds> rdf:type fuseki:Service ;
     rdfs:label                             "TDB Service (RW)" ;
     fuseki:name                            "ds" ;
     fuseki:serviceQuery                    "query" ;
     fuseki:dataset <#ds-with-lucene> ;
     .

<#service_ds_timeout_override>
     rdfs:label                            "TDB Service Query with timeout override" ;
     fuseki:name                           "ds_to" ;
     fuseki:allowTimeoutOverride           true;
     fuseki:serviceQuery                   "query" ;
     fuseki:dataset <#ds-with-lucene> ;
     .

<#ds> rdf:type      tdb:DatasetTDB ;
                       tdb:location "/var/lib/fuseki/databases/ds" ;
      .


@prefix text:    <http://jena.apache.org/text#> .

[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .


<#ds-with-lucene>
     rdf:type     text:TextDataset;
     text:dataset   <#ds> ;
     text:index     <#indexLucene> ;
     .

<#indexLucene> a text:TextIndexLucene ;
     text:directory <file:///var/lib/fuseki/databases/ds-lucene>;
     text:entityMap <#entMap> ;
     .

<#entMap> a text:EntityMap ;
     text:entityField      "uri" ;
     text:defaultField     "text" ;
     text:map (
          [
            text:field "text" ;
            text:predicate rdfs:label ;
          ]
          ) .
]]



--
Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Epimorphics Ltd. is a limited company registered in England (number 7016688)
