SPARQL performance (new to the tech)

Steve Vestal Thu, 06 May 2021 11:55:20 -0700

I asked about this topic a while ago and received some very helpful pointers from this forum (thanks again!). Here is the list I collected during some explorations:


http://www.lotico.com/index.php/SPARQL_Query_Optimization_with_Pavel_Klinov


https://www.dropbox.com/s/knudzewbiuqkqvy/SPARQL%20Optimisation%20101%20Tutorial.pptx?dl=0

https://events.static.linuxfound.org/sites/events/files/slides/SPARQL%20Optimisation%20101%20Tutorial.pdf

https://openproceedings.org/2014/conf/edbt/Gubichev014.pdf

http://sites.fas.harvard.edu/~cs265/papers/neumann-2008.pdf



On 5/6/2021 1:10 PM, Andy Seaborne wrote:

Hi there,

Showing the query would be helpful but some general remarks:
1/ If the query or the Fuseki setup needs more than the default heap size, the JVM may be getting into a state of heap exhaustion. This manifests as very high CPU load while nothing appears to be happening (the client is just waiting for a response).
2/ The query may be expensive.

Things to look for:
* cross products - two parts of the query pattern that are not connected.

  { ?s ?p ?o . ?a ?b ?c } produces N-squared results, where N is the number of triples in the database.

* sort - spilling to disk, or combined with a cross product in the query.
3/ If no results are coming back, then the query is of a form that does not stream results back - a sort, or CONSTRUCT maybe.
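
To make the cross-product remark above concrete, here is a small Python sketch. It is not how ARQ or TDB evaluates queries; it is a naive nested-loop matcher over a made-up four-triple dataset (the names `TRIPLES`, `match` and `solve` are all hypothetical), and it is only meant to show how the number of solutions grows when two triple patterns share no variable.

```python
# Hypothetical toy dataset of (subject, predicate, object) triples.
TRIPLES = [
    ("doc1", "author", "alice"),
    ("doc2", "author", "bob"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            # Constant term that does not match this triple.
            return None
    return new


def solve(patterns, triples):
    """Naive nested-loop evaluation of a list of triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return solutions


# Two patterns with no shared variable: every triple pairs with every triple.
disconnected = solve([("?s", "?p", "?o"), ("?a", "?b", "?c")], TRIPLES)

# Two patterns joined on ?person: only consistent bindings survive.
connected = solve(
    [("?doc", "author", "?person"), ("?person", "name", "?name")], TRIPLES
)

print(len(disconnected), "solutions for the disconnected pattern")
print(len(connected), "solutions for the connected pattern")
```

With N triples the disconnected pattern yields N * N solutions, so on a dataset of around 2 million triples the same shape would be in the order of 10^12 rows, which would easily look like a never-ending query.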
There was a useful presentation recently that talks about the principles of query efficiency.
SPARQL Query Optimization with Pavel Klinov
https://www.youtube.com/watch?v=16eMswT2x2Y

More inline:

On 06/05/2021 09:54, Martin Van Aken wrote:
Hi!
I'm Martin, a software developer new to the triples/SPARQL world. I'm currently building queries against a Fuseki/TDB backend (that I can work on too) and I'm running into significant performance problems (including never-ending queries).

Are updates also happening at the same time?

Despite what I thought was a good search of the Apache Jena website, I could not find much insight into performance investigation, so I'm trying here.
Most of my data experience comes from the relational world (e.g. PG), so I'm sometimes drawing comparisons there.

To give some context, my data set has around 15 linked concepts, with the number of triples for each ranging from a few hundred to 500K, fewer than 2 million in total (documents/authors/publications kind of data).

On to the questions:
- When I'm facing a slow query, what are my investigation options? Is there an equivalent of SQL's "explain plan" that points to the slow parts of a specific query? What's the advised way to do performance checks in SPARQL?

qparse --print=opt --file query.rq

- Are there any performance setups to be aware of on the server side, such as ways to check that indexes are correctly built (apart from text search, which I'm not working with for the moment)?
- We're currently using TDB1. I've seen the transactional benefits of TDB2. Are there performance improvements too that would warrant a migration?

Not on the query side.
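
If it helps to call the qparse command above from a script, for example to dump the optimized algebra for several query files, a minimal Python wrapper might look like the sketch below. It assumes the Jena command-line tools are installed and `qparse` is on the PATH; the function names `qparse_command` and `show_algebra` are hypothetical, and only the flags quoted in the reply above are used.

```python
import shutil
import subprocess


def qparse_command(query_file):
    """Build the qparse invocation quoted in the thread for one query file."""
    return ["qparse", "--print=opt", "--file", query_file]


def show_algebra(query_file):
    """Run qparse and return its output (the optimized algebra as text).

    Assumes Jena's command-line tools are installed and on the PATH.
    """
    if shutil.which("qparse") is None:
        raise FileNotFoundError("qparse not found on PATH")
    result = subprocess.run(
        qparse_command(query_file),
        capture_output=True,
        text=True,
        check=True,  # raise if qparse reports a parse error
    )
    return result.stdout
```

Calling `show_algebra("query.rq")` would then return the same text that the command line prints, which can be inspected for the join order and for patterns that share no variables.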

   Andy

Thanks a lot already!

Martin
