Re: No Improvement In Performance with indexing in Jena Fuseki

Andy Seaborne Wed, 06 Jan 2021 06:11:37 -0800

The figures for the no-index case (the only ones I can make sense of, because the rest are environment-dependent) look strange.


  ?subject rdfs:label ?object .
  FILTER contains(?object,"City")


is a simple query.

A Fuseki request, with client and server on the same machine, might be expensive on first use because of Java class loading and starting the database (slower still if it is on a spinning disk rather than an SSD).

But a second call will be all in-memory, with the TDB caches already populated by the first call, and is not going to take more than 10 ms. About 1 ms is more likely, just for Java + HTTP before the JIT has triggered, and it will drop further once the JIT optimizes the bytecode. (For any timing, you have to warm the system to get meaningful results; more so for Java.)
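The warm-up point can be illustrated with a small harness. The sketch below is plain Java (no JMH, no Jena). `WarmTimer` and its methods are hypothetical names. The workload is any `Supplier`; in practice it would wrap a single query request to Fuseki, for example via `java.net.http.HttpClient`, and that wiring is left to the caller. The harness runs the workload a number of times untimed, then records per-run timings and reports their median.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical warm-up-then-measure harness; a sketch, not a replacement for JMH. */
public final class WarmTimer {

    /** Runs the workload `warmup` times untimed, then `runs` times timed; returns per-run milliseconds. */
    public static <T> List<Double> time(Supplier<T> workload, int warmup, int runs) {
        if (warmup < 0 || runs <= 0) {
            throw new IllegalArgumentException("warmup must be >= 0 and runs > 0");
        }
        // Warm-up: give class loading, caches and the JIT a chance to settle; results are discarded.
        for (int i = 0; i < warmup; i++) {
            workload.get();
        }
        List<Double> millis = new ArrayList<>(runs);
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.get();
            millis.add((System.nanoTime() - start) / 1_000_000.0);
        }
        return millis;
    }

    /** Median of the timings; less sensitive than the mean to one slow outlier run. */
    public static double median(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no values");
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return (n % 2 == 1)
                ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload; replace with a real query request.
        Supplier<Long> work = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            return sum;
        };
        List<Double> ms = time(work, 20, 10);
        System.out.printf("median %.3f ms over %d runs%n", median(ms), ms.size());
    }
}
```

For reference, the medians of the five runs reported further down the thread are 22 ms without an index, 19 ms with Lucene and 21 ms with Elasticsearch. These values are within a few milliseconds of each other. For serious measurements, a dedicated harness such as JMH is the usual choice.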

ES (Elasticsearch) is a separate server, so for small, short queries the network costs will be significant.


But

  ?subject text:query (rdfs:label "city").
  ?subject rdfs:label ?object .

is a single text index request.
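The difference between the two query forms can be sketched with a toy model in plain Java. This is not how Jena or Lucene are implemented. In the model, `FILTER contains` behaves like a scan that tests every `rdfs:label` literal, and `text:query` behaves like one lookup in an inverted index that maps terms to subjects. The class name, the sample labels and the tokenizer (lowercase, then split on non-alphanumerics) are assumptions for illustration. Lucene's analyzers do more than this tokenizer.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical toy comparison of a FILTER-style scan and an inverted-index lookup. */
public final class LabelSearch {

    /** Scan: test every label, like FILTER contains(?object, "City"); case-sensitive substring. */
    public static Set<String> scan(Map<String, String> labels, String needle) {
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, String> e : labels.entrySet()) {
            if (e.getValue().contains(needle)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    /** Inverted index: lowercase token -> subjects whose label contains that token. */
    public static Map<String, Set<String>> buildIndex(Map<String, String> labels) {
        Map<String, Set<String>> index = new HashMap<>();
        for (Map.Entry<String, String> e : labels.entrySet()) {
            // Assumed tokenizer: lowercase, split on anything that is not a letter or digit.
            for (String token : e.getValue().toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    index.computeIfAbsent(token, k -> new TreeSet<>()).add(e.getKey());
                }
            }
        }
        return index;
    }

    /** Lookup: one map access instead of a pass over every label. */
    public static Set<String> lookup(Map<String, Set<String>> index, String term) {
        return index.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from films.ttl.
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put(":f1", "Sin City");
        labels.put(":f2", "City Lights");
        labels.put(":f3", "Electricity");
        labels.put(":f4", "Metropolis");
        labels.put(":f5", "the city of lost children");

        System.out.println("scan   : " + scan(labels, "City"));
        System.out.println("lookup : " + lookup(buildIndex(labels), "city"));
    }
}
```

Here the scan returns `[:f1, :f2]` and the lookup returns `[:f1, :f2, :f5]`. Apart from cost, the two forms also match differently. `contains` is a case-sensitive substring test. A lowercasing analyzer instead matches whole tokens regardless of case. As a result, neither approach matches "Electricity", but only the lookup finds "the city of lost children". The two queries in this thread are therefore not guaranteed to return the same rows. With a few thousand labels both approaches are cheap; the index pays off when the scan is large and the number of matches is small.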

    Andy

On 06/01/2021 13:19, Lorenz Buehmann wrote:


On 06.01.21 13:33, Deepali Singhavi wrote:

Hi,

Please find the requested details as below:

Dataset - TDB2 Dataset
Fuseki configuration - I am using the same index config file to start the Fuseki
server. What do you mean by Fuseki configuration? Sorry, I am not getting it.

The config file for Fuseki which contains your text index config. At first
glance, this is the Fuseki config, not a Lucene config: the assembler file.
Please post its content here if the attachment doesn't work.

Number of results of the query - the above query returns 11 triples.

Thanks and Regards,
Deepali

On Tue, Jan 5, 2021 at 5:02 PM Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> wrote:

Ok, thanks for sharing the spreadsheet.

We need more configuration info: dataset, Fuseki configuration, number
of results of the query.

We didn't get the attachment of the assembler config.

With no optimizer used, the text:query triple pattern should be
evaluated first and, depending on the number of matching literals, be
faster than a scan with a filter. But it depends. I'm also not sure whether
text:query is preferred in query optimization, but I think so. Andy
knows better, indeed.

On 04.01.21 12:11, Deepali Singhavi wrote:

Hi,

Sample size means number of triples?

I have tried with 6000, 40000, 50000 and even 100000 triples.
Please find the performance report attached with this email.

Regards,
Deepali

On Mon, Jan 4, 2021 at 1:03 PM Lorenz Buehmann
<buehm...@informatik.uni-leipzig.de
<mailto:buehm...@informatik.uni-leipzig.de>> wrote:

     What is the sample size here? I mean, for a low number of literals
     it's obvious that a String containment check in Java isn't that slow.
     The difference will most likely come from a large scan over literals
     with a containment check, whereas with a Lucene index (which is
     basically an inverted index) it's obviously more efficient to look up
     terms for the documents.

     On 04.01.21 05:56, Deepali Singhavi wrote:
     > Hi,
     >
     > I am trying to implement indexing for Fuseki with Lucene/Elasticsearch,
     > using an assembler configuration file (attaching the file for reference),
     > but there is no improvement in performance (performance without the index
     > is better than with the index).
     >
     > I am using sample data from *films.ttl* file.
     >
     > *Sample Query*
     > PREFIX text: <http://jena.apache.org/text#>
     > PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
     > select ?subject ?object
     > WHERE {
     > # Without Index
     > #?subject rdfs:label ?object .
     > #FILTER contains(?object,"City")
     > #With Index
     > ?subject text:query (rdfs:label "city").
     > ?subject rdfs:label ?object .
     > }
     >
     > *Performance:*
     >
     > No. of triples | Run | Without index | Lucene index | Elasticsearch index
     > -------------- | --- | ------------- | ------------ | -------------------
     > 6918           | 1   | 16 ms         | 18 ms        | 19 ms
     >                | 2   | 29 ms         | 32 ms        | 32 ms
     >                | 3   | 22 ms         | 23 ms        | 21 ms
     >                | 4   | 22 ms         | 14 ms        | 53 ms
     >                | 5   | 15 ms         | 19 ms        | 18 ms
     >
     > Please let me know if any other information is required from my side,
     > and please suggest how I can improve performance.
     >
     > Regards,
     > Deepali
     >
