The figures for the no-index case (the only ones I can make sense of,
because the rest are environment-dependent) look strange.
?subject rdfs:label ?object .
FILTER contains(?object,"City")
is a simple query.
A Fuseki request, with client and server on the same machine, might be
expensive on first use because of Java class loading and starting the
database (slower still on a spinning disk than on an SSD).
But a second call will be all in-memory, with the TDB caches already
filled by the first call, and is not going to take >10ms. 1ms is more
likely, just for Java+HTTP before the JIT has kicked in, and it will drop
further once the JIT optimizes the bytecode. (For any timing, you have to
warm the system to get meaningful results; more so for Java.)
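As a minimal sketch of "warm the system first", assuming the calls are timed in-process from Java, the harness below runs a task a number of times untimed, then reports the median of several timed runs, and also times one cold call for comparison. `WarmTiming`, `timeNanos` and the string-matching workload are hypothetical placeholders, not Fuseki or Jena API; for rigorous Java benchmarks a dedicated harness such as JMH is the usual choice.

```java
import java.util.Arrays;
import java.util.function.Supplier;

/** Warm-up-then-measure sketch (hypothetical helper); not a substitute for JMH. */
public class WarmTiming {

    /** Median of the samples (upper median for an even count); the input array is not modified. */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /** Calls task `warmups` times untimed, then returns the median of `runs` timed calls, in nanoseconds. */
    static long timeNanos(Supplier<?> task, int warmups, int runs) {
        if (runs < 1) throw new IllegalArgumentException("runs must be >= 1");
        for (int i = 0; i < warmups; i++) {
            task.get(); // untimed: gives class loading, caches and the JIT a chance to settle
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.get();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload; in practice this would send the SPARQL query to Fuseki.
        Supplier<Integer> task = () -> {
            int hits = 0;
            for (int i = 0; i < 50_000; i++) {
                if (("Film " + i).contains("City")) hits++;
            }
            return hits;
        };
        long cold = timeNanos(task, 0, 1);   // measured before any warm-up
        long warm = timeNanos(task, 50, 11); // measured after 50 untimed calls
        System.out.printf("first call: %.3f ms, warm median: %.3f ms%n", cold / 1e6, warm / 1e6);
    }
}
```

Taking the median of several warm runs, rather than a single run, also damps one-off outliers such as a GC pause.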
ES is a separate server, so for small, short queries the network costs
will be significant.
But
?subject text:query (rdfs:label "city").
?subject rdfs:label ?object .
is a single text index request.
Andy
On 06/01/2021 13:19, Lorenz Buehmann wrote:
On 06.01.21 13:33, Deepali Singhavi wrote:
Hi,
Please find the requested details below:
Dataset - TDB2 Dataset
Fuseki configuration - I am using the same index config file to start the
Fuseki server. What do you mean by Fuseki configuration? Sorry, I am not
getting it.
The config file for Fuseki which contains your text index config. At
first glance this is the Fuseki config, not a Lucene config: the
assembler file. Please post its content here in the message if the
attachment doesn't work.
Number of results of the query - the above query returns 11 triples.
Thanks and Regards,
Deepali
On Tue, Jan 5, 2021 at 5:02 PM Lorenz Buehmann <buehm...@informatik.uni-leipzig.de> wrote:
Ok, thanks for sharing the spreadsheet.
We need more configuration info: dataset, Fuseki configuration, number
of results of the query.
The attachment with the assembler config didn't come through.
With no optimizer used, the text:query triple pattern should be
evaluated first and, depending on the number of matching literals,
should be faster than a scan with a filter. But it depends. I'm also not
sure whether text:query is preferred in query optimization, but I think
so. Andy knows better, indeed.
On 04.01.21 12:11, Deepali Singhavi wrote:
Hi,
By sample size, do you mean the number of triples?
I have tried with 6,000, 40,000, 50,000 and even 100,000 triples.
Please find the performance report attached to this email.
Regards,
Deepali
On Mon, Jan 4, 2021 at 1:03 PM Lorenz Buehmann <buehm...@informatik.uni-leipzig.de> wrote:
What is the sample size here? I mean, for a low number of literals it's
obvious that a String containment check in Java isn't that slow. The
difference will most likely come from a large scan over literals with a
containment check, whereas with a Lucene index, which is basically an
inverted index, it's obviously more efficient to look up terms for the
documents.
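To illustrate the scan-versus-inverted-index difference, here is a toy Java sketch, assuming labels are held in a simple in-memory map from subject to label: `scan` mimics `FILTER contains(...)` by testing every literal, while `buildIndex`/`lookup` build a term → subjects inverted index and answer a term query with a single hash lookup. The class, method names, sample labels and the lower-case/split tokenizer are all hypothetical stand-ins; Lucene's real analyzers and index structures are far more elaborate.

```java
import java.util.*;

/** Toy comparison of a FILTER-style scan with an inverted-index lookup (illustrative only). */
public class ScanVsIndex {

    /** Like FILTER contains(?object, needle): visits every literal, case-sensitive substring test. */
    static List<String> scan(Map<String, String> labels, String needle) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : labels.entrySet()) {
            if (e.getValue().contains(needle)) out.add(e.getKey());
        }
        return out;
    }

    /** Builds term -> subjects; lower-cases and splits on non-letters/digits (a crude, assumed analyzer). */
    static Map<String, Set<String>> buildIndex(Map<String, String> labels) {
        Map<String, Set<String>> index = new HashMap<>();
        for (Map.Entry<String, String> e : labels.entrySet()) {
            for (String term : e.getValue().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, k -> new TreeSet<>()).add(e.getKey());
                }
            }
        }
        return index;
    }

    /** text:query-style lookup: one hash probe; work is proportional to the number of hits. */
    static Set<String> lookup(Map<String, Set<String>> index, String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        // Hypothetical sample labels, not taken from films.ttl.
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put(":f1", "Mexico City");
        labels.put(":f2", "City Lights");
        labels.put(":f3", "Citizen Kane");
        labels.put(":f4", "Casablanca");

        System.out.println(scan(labels, "City"));              // prints [:f1, :f2]
        System.out.println(lookup(buildIndex(labels), "city")); // prints [:f1, :f2]
    }
}
```

The two are not strictly equivalent: in this sketch `contains` is a case-sensitive substring test, whereas the index matches whole lower-cased terms, so a label like "Citywide" would be found by the scan for "City" but not by the index lookup for "city".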
On 04.01.21 05:56, Deepali Singhavi wrote:
> Hi,
>
> I am trying to implement indexing for Fuseki using
> Lucene/ElasticSearch with an assembler configuration file (attaching the
> file for reference), but there is no improvement in performance
> (performance without the index is better than with the index).
>
> I am using sample data from the *films.ttl* file.
>
> *Sample Query*
> PREFIX text: <http://jena.apache.org/text#>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> select ?subject ?object
> WHERE {
> # Without Index
> #?subject rdfs:label ?object .
> #FILTER contains(?object,"City")
> #With Index
> ?subject text:query (rdfs:label "city").
> ?subject rdfs:label ?object .
> }
>
> *Performance:*
>
> No. of triples: 6918
>
> Run | Without index | Lucene index | ElasticSearch index
>   1 | 16ms          | 18ms         | 19ms
>   2 | 29ms          | 32ms         | 32ms
>   3 | 22ms          | 23ms         | 21ms
>   4 | 22ms          | 14ms         | 53ms
>   5 | 15ms          | 19ms         | 18ms
>
> Please let me know if any other information is required from my side,
> and please suggest how I can improve performance.
>
> Regards,
> Deepali
>