Re: No Improvement In Performance with indexing in Jena Fuseki

Deepali Singhavi Thu, 07 Jan 2021 02:30:09 -0800

Hi Lorenz,

I am using only the index configuration file to start my Fuseki server.
Below is the content of that file.


@prefix :        <http://localhost/jena_example/#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix tdb2:  <http://jena.apache.org/2016/tdb#> .
@prefix star:   <http://stardog.com/tutorial/> .
@prefix movie: <http://data.linkedmdb.org/resource/movie/> .

## Initialize text query
[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .

# A TextDataset is a regular dataset with a text index.
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .

## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.

:text_dataset rdf:type     text:TextDataset ;
    text:dataset   <#dataset> ;
    text:index     <#indexES> ;
    .

# A TDB2 dataset used for RDF storage
<#dataset> rdf:type      tdb2:DatasetTDB2 ;
    tdb2:location "run/databases/SampleIndexDataset" ;
    tdb2:unionDefaultGraph false ; # Optional
    .
<#indexES> a text:TextIndexES ;
      # A comma-separated list of Host:Port values of the ElasticSearch Cluster nodes.
    text:serverList "127.0.0.1:9300" ;
      # Name of the ElasticSearch Cluster. If not specified defaults to 'elasticsearch'
    text:clusterName "elasticsearch" ;
      # The number of shards for the index. Defaults to 1
    text:shards "1" ;
      # The number of replicas for the index. Defaults to 1
    text:replicas "1" ;
      # Name of the Index. defaults to jena-text
    text:indexName "jena-text" ;
    text:entityMap <#entMap> ;
    .

# Mapping in the index
# URI stored in field "uri"
# rdfs:label is mapped to field "label"
<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "label" ;
    text:map (
         [ text:field "label" ; text:predicate rdfs:label ]
         ) .

[] rdf:type fuseki:Server ;
   # Server-wide context parameters can be given here.
   # For example, to set query timeouts on a server-wide basis:
   # Format 1: "1000" -- 1 second timeout
   # Format 2: "10000,60000" -- 10s timeout to first result, then 60s
   #                            timeout for the rest of the query.
   # See javadoc for ARQ.queryTimeout
   # ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "10000" ] ;

   # Load custom code (rarely needed)
   # ja:loadClass "your.code.Class" ;

   # Services available.  Only explicitly listed services are configured.
   #  If there is a service description not linked from this list, it is
   #  ignored.
   fuseki:services (
     <#service>
     #<#service_text_tdb>
   ) .

<#service>  rdf:type fuseki:Service ;
    fuseki:name              "FusekiIndex" ;       # http://host:port/FusekiIndex
    fuseki:serviceQuery               "query" ;    # SPARQL query service
    fuseki:serviceQuery               "sparql" ;   # SPARQL query service
    fuseki:serviceUpdate              "update" ;   # SPARQL update service
    fuseki:serviceUpload              "upload" ;   # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore "data" ;     # SPARQL Graph Store Protocol (read and write)
    #fuseki:dataset           <#dataset> ;
    fuseki:dataset                  :text_dataset ;
.

Regards,
Deepali

On Thu, Jan 7, 2021 at 12:31 PM Lorenz Buehmann <
[email protected]> wrote:

> No, I meant the whole content of the file, not just the Fuseki part, which,
> by the way, contains only comments, as you can see.
>
> On 06.01.21 15:28, Deepali Singhavi wrote:
> > Hi Lorenz,
> >
> > Please find the content of my configuration file below; I hope this is
> > what you are looking for.
> >
> > But I am using the same index.ttl file to start my Fuseki server with
> > the command below.
> >
> > java -Xmx1200M -jar fuseki-server.jar --config=LunceneIndex.ttl
> >
> > # Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0
> >
> > ## Fuseki Server configuration file.
> >
> > @prefix :        <#> .
> > @prefix fuseki:  <http://jena.apache.org/fuseki#> .
> > @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> > @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
> > @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
> >
> > [] rdf:type fuseki:Server ;
> >    # Example::
> >    # Server-wide query timeout.
> >    #
> >    # Timeout - server-wide default: milliseconds.
> >    # Format 1: "1000" -- 1 second timeout
> >    # Format 2: "10000,60000" -- 10s timeout to first result,
> >    #                            then 60s timeout for the rest of query.
> >    #
> >    # See javadoc for ARQ.queryTimeout for details.
> >    # This can also be set on a per dataset basis in the dataset assembler.
> >    #
> >    # ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "30000" ] ;
> >
> >    # Add any custom classes you want to load.
> >    # Must have a "public static void init()" method.
> >    # ja:loadClass "your.code.Class" ;
> >
> >    # End triples.
> >    .
> >
> > Regards,
> > Deepali
> >
> >
> >
> > On Wed, Jan 6, 2021 at 6:49 PM Lorenz Buehmann <
> > [email protected]> wrote:
> >
> >> On 06.01.21 13:33, Deepali Singhavi wrote:
> >>> Hi,
> >>>
> >>> Please find the requested details as below:
> >>>
> >>> Dataset - TDB2 Dataset
> >>> Fuseki configuration - I am using the same index config file to start
> >>> the Fuseki server. Sorry, I do not understand what you mean by Fuseki
> >>> configuration.
> >> The config file for Fuseki which contains your text index config. At
> >> first glance this is the Fuseki config, not a Lucene config. The
> >> App-Assembler file. Please post it here as content if the attachment
> >> doesn't work.
> >>> Number of results of the query - 11 triples are returned by the
> >>> above query.
> >>>
> >>> Thanks and Regards,
> >>> Deepali
> >>>
> >>> On Tue, Jan 5, 2021 at 5:02 PM Lorenz Buehmann <
> >>> [email protected]> wrote:
> >>>
> >>>> Ok, thanks for sharing the spreadsheet.
> >>>>
> >>>> We need more configuration info: dataset, Fuseki configuration, and
> >>>> number of results of the query.
> >>>>
> >>>> We didn't get the attachment of the assembler config.
> >>>>
> >>>> With no optimizer used, the text:query triple pattern should be
> >>>> evaluated first - and depending on the number of matching literals,
> >>>> faster than a scan with filter. But it depends. Also not sure if
> >>>> text:query is preferred in query optimization, but I think so. Andy
> >>>> knows better indeed
> >>>>
> >>>> On 04.01.21 12:11, Deepali Singhavi wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Sample size means number of triples?
> >>>>>
> >>>>> I have tried with 6000, 40000, 50000 and even 100000 triples.
> >>>>> Please find the performance report attached with this email.
> >>>>>
> >>>>> Regards,
> >>>>> Deepali
> >>>>>
> >>>>> On Mon, Jan 4, 2021 at 1:03 PM Lorenz Buehmann
> >>>>> <[email protected]> wrote:
> >>>>>
> >>>>>     What is the sample size here? For a small number of literals, a
> >>>>>     string containment check in Java is not that slow. The difference
> >>>>>     will most likely show up with a large scan over literals with a
> >>>>>     containment check, whereas a Lucene index, which is basically an
> >>>>>     inverted index, can look up the documents for a term much more
> >>>>>     efficiently.
> >>>>>
> >>>>>     On 04.01.21 05:56, Deepali Singhavi wrote:
> >>>>>     > Hi,
> >>>>>     >
> >>>>>     > I am trying to implement indexing for Fuseki with
> >>>>>     > Lucene/ElasticSearch using an assembler configuration file
> >>>>>     > (attached for reference), but there is no improvement in
> >>>>>     > performance (performance without the index is better than with it).
> >>>>>     >
> >>>>>     > I am using sample data from *films.ttl* file.
> >>>>>     >
> >>>>>     > *Sample Query *
> >>>>>     > PREFIX text: <http://jena.apache.org/text#>
> >>>>>     > PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> >>>>>     > select ?subject ?object
> >>>>>     > WHERE {
> >>>>>     > # Without Index
> >>>>>     > #?subject rdfs:label ?object .
> >>>>>     > #FILTER contains(?object,"City")
> >>>>>     > #With Index
> >>>>>     > ?subject text:query (rdfs:label "city").
> >>>>>     > ?subject rdfs:label ?object .
> >>>>>     > }
> >>>>>     >
> >>>>>     > *Performance:*
> >>>>>     >
> >>>>>     > No of Triples  Run No.  Without Index  Lucene Index  ElasticSearch Index
> >>>>>     > 6918           1        16ms           18ms          19ms
> >>>>>     >                2        29ms           32ms          32ms
> >>>>>     >                3        22ms           23ms          21ms
> >>>>>     >                4        22ms           14ms          53ms
> >>>>>     >                5        15ms           19ms          18ms
> >>>>>     >
> >>>>>     > Please let me know if any other information is required from my
> >>>>>     > side, and please suggest how I can improve performance.
> >>>>>     >
> >>>>>     > Regards,
> >>>>>     > Deepali
> >>>>>     >
> >>>>>
>
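
The point made in the thread about a scan with a containment check versus an
inverted index can be sketched roughly as follows. This is only an
illustration in Python, not how Jena or Lucene are implemented: the labels
and the ex: identifiers are made up (they are not taken from films.ttl), and
the tokenizer simply lowercases and splits on word boundaries, which is
assumed to approximate what a standard Lucene analyzer does.

```python
import re
from collections import defaultdict

# Hypothetical sample data; subjects and labels are invented for illustration.
LABELS = {
    "ex:film1": "City of God",
    "ex:film2": "Sin City",
    "ex:film3": "The Matrix",
    "ex:film4": "Electricity",
}


def scan_contains(labels, needle):
    """Linear scan: test every label for a case-sensitive substring,
    similar in spirit to FILTER contains(?object, "City")."""
    return sorted(s for s, label in labels.items() if needle in label)


def build_inverted_index(labels):
    """Map each lowercased token to the set of subjects whose label has it."""
    index = defaultdict(set)
    for subject, label in labels.items():
        for token in re.findall(r"\w+", label.lower()):
            index[token].add(subject)
    return index


def index_lookup(index, term):
    """Single dictionary lookup instead of touching every label."""
    return sorted(index.get(term.lower(), set()))


if __name__ == "__main__":
    index = build_inverted_index(LABELS)
    print(scan_contains(LABELS, "City"))
    print(index_lookup(index, "city"))
```

The scan has to visit every label on every query, while the lookup cost does
not grow with the number of labels once the index is built. On a few thousand
literals both are fast, so the difference may not be visible. Note also that
the two approaches do not match the same things: the substring test is
case-sensitive and matches inside words, the token lookup is neither.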
