Hi Lorenz,

I am using only the index configuration file to start my Fuseki server. Below is the content of my index file.

@prefix :       <http://localhost/jena_example/#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:    <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:     <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:   <http://jena.apache.org/text#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix tdb2:   <http://jena.apache.org/2016/tdb#> .
@prefix star:   <http://stardog.com/tutorial/> .
@prefix movie:  <http://data.linkedmdb.org/resource/movie/> .

## Initialize text query
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset rdfs:subClassOf ja:RDFDataset .

## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.

:text_dataset rdf:type text:TextDataset ;
    text:dataset <#dataset> ;
    text:index <#indexES>;
    .

# A TDB datset used for RDF storage
<#dataset> rdf:type tdb2:DatasetTDB2 ;
    tdb2:location "run/databases/SampleIndexDataset" ;
    tdb2:unionDefaultGraph false ; # Optional
    .

<#indexES> a text:TextIndexES ;
    # A comma-separated list of Host:Port values of the ElasticSearch Cluster nodes.
    text:serverList "127.0.0.1:9300" ;
    # Name of the ElasticSearch Cluster. If not specified defaults to 'elasticsearch'
    text:clusterName "elasticsearch" ;
    # The number of shards for the index. Defaults to 1
    text:shards "1" ;
    # The number of replicas for the index. Defaults to 1
    text:replicas "1" ;
    # Name of the Index. defaults to jena-text
    text:indexName "jena-text" ;
    text:entityMap <#entMap> ;
    .

# Mapping in the index
# URI stored in field "uri"
# rdfs:label is mapped to field "text"
<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:defaultField "label" ;
    text:map (
        [ text:field "label" ; text:predicate rdfs:label]
    ) .

[] rdf:type fuseki:Server ;
    # Server-wide context parameters can be given here.
    # For example, to set query timeouts: on a server-wide basis:
    # Format 1: "1000" -- 1 second timeout
    # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout to for rest of query.
    # See java doc for ARQ.queryTimeout
    # ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "10000" ] ;

    # Load custom code (rarely needed)
    # ja:loadClass "your.code.Class" ;

    # Services available. Only explicitly listed services are configured.
    # If there is a service description not linked from this list, it is ignored.
    fuseki:services (
        <#service>
        #<#service_text_tdb>
    ) .

<#service> rdf:type fuseki:Service ;
    fuseki:name "FusekiIndex" ;                  # http://host:port/tdb
    fuseki:serviceQuery "query" ;                # SPARQL query service
    fuseki:serviceQuery "sparql" ;               # SPARQL query service
    fuseki:serviceUpdate "update" ;              # SPARQL query service
    fuseki:serviceUpload "upload" ;              # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore "data" ;   # SPARQL Graph store protocol (read and write)
    #fuseki:dataset <#dataset> ;
    fuseki:dataset :text_dataset ;
    .

Regards,
Deepali

On Thu, Jan 7, 2021 at 12:31 PM Lorenz Buehmann <
[email protected]> wrote:

> no, I meant the whole content of the file not just the Fuseki part which
> by the way as you can see just contains comments
>
> On 06.01.21 15:28, Deepali Singhavi wrote:
> > Hi Lorenz,
> >
> > Please find the content of my configuration file and hope this is what you
> > are looking for.
> >
> > But I am using the same index.ttl file to start my fuseki server using
> > below command.
> >
> > java -Xmx1200M -jar fuseki-server.jar --config=*LunceneIndex.ttl*
> >
> > # Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0
> >
> > ## Fuseki Server configuration file.
> >
> > @prefix : <#> .
> > @prefix fuseki: <http://jena.apache.org/fuseki#> .
> > @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> > @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> > @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> >
> > [] rdf:type fuseki:Server ;
> >    # Example::
> >    # Server-wide query timeout.
> >    #
> >    # Timeout - server-wide default: milliseconds.
> >    # Format 1: "1000" -- 1 second timeout
> >    # Format 2: "10000,60000" -- 10s timeout to first result,
> >    #           then 60s timeout for the rest of query.
> >    #
> >    # See javadoc for ARQ.queryTimeout for details.
> >    # This can also be set on a per dataset basis in the dataset assembler.
> >    #
> >    # ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "30000" ] ;
> >
> >    # Add any custom classes you want to load.
> >    # Must have a "public static void init()" method.
> >    # ja:loadClass "your.code.Class" ;
> >
> >    # End triples.
> >
> >
> >
> > Regards,
> > Deepali
> >
> > .
> >
> >
> > On Wed, Jan 6, 2021 at 6:49 PM Lorenz Buehmann <
> > [email protected]> wrote:
> >
> >> On 06.01.21 13:33, Deepali Singhavi wrote:
> >>> Hi,
> >>>
> >>> Please find the requested details as below:
> >>>
> >>> Dataset - TDB2 Dataset
> >>> Fuseki configuration- I am using the same index config file to start fuseki
> >>> server. What do you mean by fuseki configuration sorry I am not getting it.
> >> The config file for Fuseki which contains your text index config. In a
> >> first glance this is the Fuseki config, not a Lucene config. The
> >> App-Assembler file. Please post it here as content if the attachment
> >> doesn't work.
> >>> number of results of the query - There are 11 triples getting returned from
> >>> above query
> >>>
> >>> Thanks and Regards,
> >>> Deepali
> >>>
> >>> On Tue, Jan 5, 2021 at 5:02 PM Lorenz Buehmann <
> >>> [email protected]> wrote:
> >>>
> >>>> Ok, thanks for sharing the spreadsheet.
> >>>>
> >>>> We need more configuration infos: dataset, Fuseki configuration, number
> >>>> of results of the query.
> >>>>
> >>>> We didn't get the attachment of the assembler config.
> >>>>
> >>>> With no optimizer used, the text:query triple pattern should be
> >>>> evaluated first - and depending on the number of matching literals,
> >>>> faster than a scan with filter. But it depends. Also not sure if
> >>>> text:query is preferred in query optimization, but I think so. Andy
> >>>> knows better indeed
> >>>>
> >>>> On 04.01.21 12:11, Deepali Singhavi wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Sample size means number of triples?
> >>>>>
> >>>>> I have tried with 6000,40000,50000 and even with 1,00,000 triples.
> >>>>> Please find the performance report attached with this email.
> >>>>>
> >>>>> Regards,
> >>>>> Deepali
> >>>>>
> >>>>> On Mon, Jan 4, 2021 at 1:03 PM Lorenz Buehmann
> >>>>> <[email protected]
> >>>>> <mailto:[email protected]>> wrote:
> >>>>>
> >>>>> What is the sample size here? I mean, for a low number of literals
> >>>>> it's obvious that String containment check in Java isn't that slow.
> >>>>> The difference will most likely come from a large scan over literals
> >>>>> with containment check whereas with a Lucene index - which is
> >>>>> basically an inverted index - it's obviously more efficient to lookup
> >>>>> terms for the documents.
> >>>>>
> >>>>> On 04.01.21 05:56, Deepali Singhavi wrote:
> >>>>> > Hi,
> >>>>> >
> >>>>> > I am trying to implement indexing for Fuseki using
> >>>>> > Lucene/ElasticSearch using an assembler configuration file
> >>>>> > (attaching file for reference) but there is no improvement in
> >>>>> > performance (performance without index is better than with index).
> >>>>> >
> >>>>> > I am using sample data from *films.ttl* file.
> >>>>> >
> >>>>> > *Sample Query *
> >>>>> > PREFIX text: <http://jena.apache.org/text#>
> >>>>> > PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> >>>>> > select ?subject ?object
> >>>>> > WHERE {
> >>>>> > # Without Index
> >>>>> > #?subject rdfs:label ?object .
> >>>>> > #FILTER contains(?object,"City")
> >>>>> > #With Index
> >>>>> > ?subject text:query (rdfs:label "city").
> >>>>> > ?subject rdfs:label ?object .
> >>>>> > }
> >>>>> >
> >>>>> > *Performance:*
> >>>>> >
> >>>>> > No of Triples   No of Runs   Without Index   Lucene Index   ElasticSearch Index
> >>>>> > 6918            1            16ms            18ms           19ms
> >>>>> >                 2            29ms            32ms           32ms
> >>>>> >                 3            22ms            23ms           21ms
> >>>>> >                 4            22ms            14ms           53ms
> >>>>> >                 5            15ms            19ms           18ms
> >>>>> >
> >>>>> > Please let me know if any other information is required from my side
> >>>>> > and please suggest how I can improve performance.
> >>>>> >
> >>>>> > Regards,
> >>>>> > Deepali
> >>>>> >
> >>>>> >
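
The difference described in the thread between a filtered scan over literals and a lookup in an inverted index can be sketched in plain Python. This is only a toy illustration and not how Jena, Lucene, or ElasticSearch is implemented: the sample labels, the `ex:` subject names, the function names, and the whitespace tokenizer are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample data standing in for (subject, rdfs:label) pairs.
LABELS = {
    "ex:film1": "Sin City",
    "ex:film2": "City of God",
    "ex:film3": "Metropolis",
    "ex:film4": "Dark city lights",
}


def scan_contains(labels, needle):
    """Visit every literal and apply a case-sensitive substring check,
    roughly what FILTER contains(?object, "City") has to do."""
    return sorted(s for s, label in labels.items() if needle in label)


def build_inverted_index(labels):
    """Map each lowercased, whitespace-separated token to the set of
    subjects whose label contains that token."""
    index = defaultdict(set)
    for subject, label in labels.items():
        for token in label.lower().split():
            index[token].add(subject)
    return index


def index_lookup(index, term):
    """Answer a single-term query with one dictionary access."""
    return sorted(index.get(term.lower(), set()))


INDEX = build_inverted_index(LABELS)

print(scan_contains(LABELS, "City"))
print(index_lookup(INDEX, "city"))
```

The scan touches every literal, so its cost grows with the number of labels, while the lookup cost depends mainly on the number of matches. With only a few thousand triples both approaches finish within a few milliseconds, which would be consistent with the timings in the table above being dominated by fixed overhead. The sketch also shows that the two query forms need not return the same rows: the substring check is case-sensitive, whereas a term lookup (assuming an analyzer that lowercases tokens) also matches "Dark city lights".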
