rdfs:seeAlso JENA-1104

Can I suggest a 3rd option?

A static cache in TextDatasetFactory remembers the text datasets it has created and returns the same one on each call for the same location/text index (cf. string interning).

TDB does this in StoreConnection where there must be one DatasetGraphTDB per storage location or else chaos results.

A single intern table has the same effect as assembler caching, but it also applies to Java code and across multiple assembler files (Fuseki can have multiple separate configurations). The entry key can include the Lucene Directory - not sure what else is needed.
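
A minimal sketch of the intern-table idea, for illustration only. The class InternTable and its methods are hypothetical names, not existing Jena code, and the key here is just a location string; a real key would probably need more than that, as noted above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical intern table: at most one value is kept per key. */
class InternTable<K, V> {
    private final Map<K, V> table = new ConcurrentHashMap<>();

    /** Return the value already registered for key, or create and register one. */
    V intern(K key, Function<K, V> maker) {
        // computeIfAbsent is atomic on ConcurrentHashMap, so the maker
        // is applied at most once per key even with concurrent callers.
        return table.computeIfAbsent(key, maker);
    }

    /** Forget a key, e.g. after the underlying resource has been closed. */
    void release(K key) {
        table.remove(key);
    }

    public static void main(String[] args) {
        InternTable<String, Object> datasets = new InternTable<>();
        String location = "/var/lib/fuseki/databases/ds-lucene";
        // Stand-in objects; a real table would hold the text dataset.
        Object first = datasets.intern(location, loc -> new Object());
        Object second = datasets.intern(location, loc -> new Object());
        System.out.println(first == second); // prints "true"
    }
}
```

Because the table is static in spirit (one per JVM), it does not matter whether the caller is an assembler, a second assembler file, or plain Java code: the same key gives the same object.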

There is one issue which can't be solved: two differently configured text indexes over one Directory (I have never found a way to get the full configuration back out of a Lucene index).

The second option might work for some cases - one case (not here) is two different datasets trying to share one text index. Update will break - the same text index is inside two transaction regimes.

        Andy


On 18/01/16 14:45, Rob Vesse wrote:
I would prefer the assembler option as that is only fixing the cause of
the specific bug and it in my mind fixes the assembler API semantics to
what I mentally expect

Rob

On 18/01/2016 14:09, "Brian McBride" <[email protected]> wrote:



On 22/12/15 18:22, Andy Seaborne wrote:
JENA-1104 suggests there is an ordering/timing issue and that it is not
Fuseki1/Fuseki2, except that things happen in a different order.
I have investigated this further and I think I understand what is
happening.

If we have a configuration with the same dataset+text-index shared
between two services, then when the first service is built,
TextIndexLuceneAssembler is called to create a TextIndexLucene object.
When the second service is built, TextIndexLuceneAssembler is called
again and creates another TextIndexLucene object.

Both of these TextIndexLucene objects create a Lucene IndexWriter object
on the same directory.  That doesn't work because they both try to grab
the same lock and one fails.
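
The clash can be illustrated with plain java.nio file locks. This is only an analogy under stated assumptions: Lucene's own locking code is more involved, and the class WriteLockDemo and its methods are made up for this sketch. It takes an exclusive lock on a file named write.lock and then tries to take it a second time from the same JVM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical demo: two "writers" competing for one lock file. */
class WriteLockDemo {
    /** Lock the file once, try again, and report whether the second attempt failed. */
    static boolean secondLockFails(Path lockFile) {
        try (FileChannel first = FileChannel.open(lockFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(lockFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock held = first.tryLock();        // first writer gets the lock
            try {
                FileLock again = second.tryLock();  // second writer tries the same file
                if (again != null) again.release();
                return again == null;               // null: held by another process
            } catch (OverlappingFileLockException e) {
                return true;                        // already held inside this JVM
            } finally {
                if (held != null) held.release();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Run the demo against a fresh temporary directory. */
    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("ds-lucene");
            return secondLockFails(dir.resolve("write.lock"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second lock attempt failed: " + demo());
    }
}
```

In the Fuseki case the second attempt is retried until it times out, which matches the LockObtainFailedException in the log further down.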

I am happy to offer a pull request to change this behaviour.  There are
broadly two strategies that I can see, and I'm wondering if there is a
preferred approach from the Jena team.

The first approach is to change the way the assemblers work to
only create one TextIndexLucene object per node in the configuration
graph.

A second approach is to modify the TextIndexLucene so that two or more
objects can operate on the same directory.

My default approach would be to make the change in the assembler code.

Brian

I'm not sure that a shared index across two different datasets will
work if updates are involved.  Maybe someone else can help with that.
The configuration I'm looking at is not an index shared across two data
sets - there is one index+tdb-dataset pair in the configuration.

What's fuseki:allowTimeoutOverride?  Is this a local build with the
code for that uncommented?

     Andy

On 21/12/15 14:53, Brian McBride wrote:
The fuseki configuration below sets up two services with a shared
dataset.  The dataset has a lucene text index.

This configuration works on Fuseki 1.3.1.  Fuseki 2.3.1 fails to start.
The log output is shown below.  Looks like the lucene index may be
trying to grab a lock for the dataset twice.

If I change the second fuseki:dataset line to:

[[
      fuseki:dataset                        <#ds> ;
]]

then it works on Fuseki 2.3.1 and, unexpectedly, both services have
access to the text index.  That doesn't seem right, though it suits me
for the moment as I need both services to have access to the index.

Is there some configuration change I need to make between Fuseki 1 and
Fuseki 2?

Brian



Fuseki 2.3.1 log output

[[
2015-12-21 14:42:20.940 WARN  Config               :: Fuseki v2: Management functions are always on the same port as the server. --mgtPort ignored.
2015-12-21 14:42:21.062 INFO  Server               :: Fuseki 2.3.1 2015-12-08T09:24:07+0000
2015-12-21 14:42:21.229 INFO  Config               :: FUSEKI_HOME=/usr/share/fuseki
2015-12-21 14:42:21.230 INFO  Config               :: FUSEKI_BASE=/etc/fuseki
2015-12-21 14:42:21.233 INFO  Servlet              :: Initializing Shiro environment
2015-12-21 14:42:21.233 INFO  EnvironmentLoader    :: Starting Shiro environment initialization.
2015-12-21 14:42:21.242 INFO  Config               :: Shiro file: file:///etc/fuseki/shiro.ini
2015-12-21 14:42:21.415 INFO  EnvironmentLoader    :: Shiro environment initialized in 181 ms.
2015-12-21 14:42:21.415 INFO  Config               :: Configuration file: /etc/fuseki/config.ttl
2015-12-21 14:42:22.193 WARN  AssemblerHelp        :: ja:loadClass: Migration to Jena3: Converting com.hp.hpl.jena.tdb.TDB to org.apache.jena.tdb.TDB
2015-12-21 14:42:23.557 ERROR Server               :: Exception in initialization: caught: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/lib/fuseki/databases/ds-lucene/write.lock
2015-12-21 14:42:23.577 INFO  Server               :: Started 2015/12/21 14:42:23 UTC on port 3030

]]



Fuseki configuration.

[[

# Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .

[] rdf:type fuseki:Server ;

     fuseki:services (
       <#service_ds>
       <#service_ds_timeout_override>
     ) .

# TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .



<#service_ds> rdf:type fuseki:Service ;
      rdfs:label                             "TDB Service (RW)" ;
      fuseki:name                            "ds" ;
      fuseki:serviceQuery                    "query" ;
      fuseki:dataset <#ds-with-lucene> ;
      .

<#service_ds_timeout_override>
      rdfs:label                            "TDB Service Query with timeout override" ;
      fuseki:name                           "ds_to" ;
      fuseki:allowTimeoutOverride           true;
      fuseki:serviceQuery                   "query" ;
      fuseki:dataset <#ds-with-lucene> ;
      .

<#ds> rdf:type      tdb:DatasetTDB ;
                        tdb:location "/var/lib/fuseki/databases/ds" ;
       .


@prefix text:    <http://jena.apache.org/text#> .

[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .


<#ds-with-lucene>
      rdf:type     text:TextDataset;
      text:dataset   <#ds> ;
      text:index     <#indexLucene> ;
      .

<#indexLucene> a text:TextIndexLucene ;
      text:directory <file:///var/lib/fuseki/databases/ds-lucene>;
      text:entityMap <#entMap> ;
      .

<#entMap> a text:EntityMap ;
      text:entityField      "uri" ;
      text:defaultField     "text" ;
      text:map (
           [
             text:field "text" ;
             text:predicate rdfs:label ;
           ]
           ) .
]]



--
Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol
BS20 6PT
Epimorphics Ltd. is a limited company registered in England (number
7016688)





