Thanks a lot, Andy & Steve, for the advice and material provided. This is going to be invaluable.

Martin

On Thu, 6 May 2021 at 20:55, Steve Vestal <steve.ves...@adventiumlabs.com> wrote:

> I asked about this topic a while ago and received some very helpful
> pointers from this forum (thanks again!). Here is the list I collected
> during some explorations:
>
> http://www.lotico.com/index.php/SPARQL_Query_Optimization_with_Pavel_Klinov
>
> https://www.dropbox.com/s/knudzewbiuqkqvy/SPARQL%20Optimisation%20101%20Tutorial.pptx?dl=0
>
> https://events.static.linuxfound.org/sites/events/files/slides/SPARQL%20Optimisation%20101%20Tutorial.pdf
>
> https://openproceedings.org/2014/conf/edbt/Gubichev014.pdf
>
> http://sites.fas.harvard.edu/~cs265/papers/neumann-2008.pdf
>
> On 5/6/2021 1:10 PM, Andy Seaborne wrote:
> > Hi there,
> >
> > Showing the query would be helpful but some general remarks:
> >
> > 1/ If the query or the setup for Fuseki needs more than the
> > default heap size, then it might be that the JVM is getting into
> > a state of heap exhaustion. This manifests as the CPU load getting
> > very high. It will seem like nothing is happening (waiting for response).
> >
> > 2/ The query may be expensive.
> >
> > Things to look for:
> > * cross products - two parts of the query pattern that are not
> >   connected.
> >
> >   { ?s ?p ?o . ?a ?b ?c } is N-squared in the size of the database.
> >
> > * sorting, when it spills to disk or is combined with a cross
> >   product in the query.
> >
> > 3/ If no results are coming back, then the query may be of a form that
> > does not stream results back - a sort, or maybe CONSTRUCT.
> >
> > There was a useful presentation recently that talks about the
> > principles of query efficiency.
> >
> > SPARQL Query Optimization with Pavel Klinov
> > https://www.youtube.com/watch?v=16eMswT2x2Y
> >
> > More inline:
> >
> > On 06/05/2021 09:54, Martin Van Aken wrote:
> >> Hi!
> >> I'm Martin, I'm a software developer new to the Triples/SPARQL world.
> >> I'm currently building queries against a Fuseki/TDB backend (that I
> >> can work on too) and I'm running into significant performance problems
> >> (including never-ending queries).
> >
> > Are updates also happening at the same time?
> >
> >> Despite what I thought was a good search on the Apache Jena website,
> >> I could not find a lot of insight about performance investigation,
> >> so I'm trying it here.
> >>
> >> Most of my data experience comes from the relational world (ex: PG),
> >> so I'm sometimes drawing comparisons there.
> >>
> >> To give some context, my data set is around 15 linked concepts, with the
> >> number of triples for each ranging from some hundreds to 500K - total
> >> less than 2 million (documents/authors/publication kind of data).
> >>
> >> On to questions:
> >>
> >>    - When I'm facing a slow query, what are my investigation options?
> >>    Is there an equivalent of an "explain plan" in SQL pointing to the
> >>    query-specific slow points? What's the advised way for performance
> >>    checks in SPARQL?
> >
> > qparse --print=opt --file query.rq
> >
> >>    - Are there any performance setups to be aware of on the server
> >>    side? Like ways to check indexes are correctly built (outside of
> >>    text search, which I'm not working with for the moment).
> >>    - We're currently using TDB1. I've seen the transactional benefits
> >>    of TDB2 - are there performance improvements too that would warrant
> >>    a migration there?
> >
> > Not on the query side.
> >
> >     Andy
> >
> >>
> >> Thanks a lot already!
> >>
> >> Martin
> >>
> >

--
Martin Van Aken - Freelance Enthusiast Developer
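
To make the cross-product remark in Andy's reply concrete, here is a minimal sketch in plain Python. It is not Jena or Fuseki code: the `triples` list, its contents, and both function names are made up for illustration, and the nested-loop matching is only a stand-in for how a real store evaluates patterns. It only shows why two unconnected triple patterns yield N-squared solutions, while a shared variable cuts the result down.

```python
from itertools import product

# Hypothetical toy data: (subject, predicate, object) tuples standing in
# for the contents of a triple store.
triples = [
    ("doc1", "author", "alice"),
    ("doc2", "author", "bob"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
]

def cross_product_matches(data):
    """Solutions for { ?s ?p ?o . ?a ?b ?c }.

    The two patterns share no variable, so every pair of triples matches.
    """
    return list(product(data, repeat=2))

def joined_matches(data):
    """Solutions for { ?s ?p ?o . ?o ?b ?c }.

    The shared variable ?o connects the patterns: the object of the first
    triple must be the subject of the second.
    """
    return [(t1, t2) for t1, t2 in product(data, repeat=2) if t1[2] == t2[0]]

n = len(triples)
cross = cross_product_matches(triples)
joined = joined_matches(triples)

print(f"triples: {n}")
print(f"unconnected patterns: {len(cross)} solutions")
print(f"connected patterns:   {len(joined)} solutions")
```

With a data set of close to 2 million triples, as in the original question, the unconnected form would imply on the order of 10^12 solutions, which fits the never-ending queries described.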