Jena / Fuseki / SPARQL performance (new to the tech)

Martin Van Aken Tue, 11 May 2021 08:04:32 -0700

Thanks a lot Andy & Steve for the advice & material provided. This is
going to be invaluable.


Martin

On Thu, 6 May 2021 at 20:55, Steve Vestal <steve.ves...@adventiumlabs.com>
wrote:

> I asked about this topic a while ago and received some very helpful
> pointers from this forum (thanks again!). Here is the list I collected
> during some explorations:
>
> http://www.lotico.com/index.php/SPARQL_Query_Optimization_with_Pavel_Klinov
>
> https://www.dropbox.com/s/knudzewbiuqkqvy/SPARQL%20Optimisation%20101%20Tutorial.pptx?dl=0
>
> https://events.static.linuxfound.org/sites/events/files/slides/SPARQL%20Optimisation%20101%20Tutorial.pdf
>
> https://openproceedings.org/2014/conf/edbt/Gubichev014.pdf
>
> http://sites.fas.harvard.edu/~cs265/papers/neumann-2008.pdf
>
> On 5/6/2021 1:10 PM, Andy Seaborne wrote:
> > Hi there,
> >
> > Showing the query would be helpful, but here are some general remarks:
> >
> > 1/ If the query or the setup for Fuseki needs more than the default
> > heap size, the JVM may be running into heap exhaustion. This shows up
> > as very high CPU load (the garbage collector keeps running) while it
> > seems like nothing is happening (you are just waiting for a response).
> >
> > 2/ The query may be expensive.
> >
> > Things to look for:
> > * cross products - two parts of the query pattern that are not
> > connected (they share no variables).
> >
> > { ?s ?p ?o . ?a ?b ?c } produces N-squared results, where N is the
> > size of the database.
> >
> > * sorts, especially large ones that spill to disk, or a sort combined
> > with a cross product in the query.
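A rough way to check a query for the cross products described above: the Python sketch below (stdlib only) treats a basic graph pattern as a list of (subject, predicate, object) strings and groups the patterns that are linked, directly or through a chain, by shared variables. More than one group means those groups get combined as a cross product. The helper `connected_groups` is hypothetical and not part of Jena; it only looks at triple patterns, treats anything not starting with `?` or `$` as a constant, and ignores FILTER, OPTIONAL, UNION and property paths.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def var_name(term: str) -> Optional[str]:
    """Return the variable name for ?x / $x (the same variable in SPARQL), else None."""
    if term[:1] in ("?", "$"):
        return term[1:]
    return None


def connected_groups(patterns: List[Triple]) -> List[List[Triple]]:
    """Group triple patterns that are linked, directly or via a chain, by shared variables.

    More than one group means the groups are joined as a cross product.
    """
    parent = list(range(len(patterns)))  # union-find forest over pattern indexes

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    first_seen: Dict[str, int] = {}  # variable name -> first pattern that used it
    for i, pattern in enumerate(patterns):
        for term in pattern:
            name = var_name(term)
            if name is None:
                continue  # constants (IRIs, literals) do not link patterns
            if name in first_seen:
                parent[find(i)] = find(first_seen[name])  # merge the two groups
            else:
                first_seen[name] = i

    groups: Dict[int, List[Triple]] = {}
    for i, pattern in enumerate(patterns):
        groups.setdefault(find(i), []).append(pattern)
    return list(groups.values())


if __name__ == "__main__":
    # Andy's example: two unconnected patterns -> 2 groups (a cross product)
    print(len(connected_groups([("?s", "?p", "?o"), ("?a", "?b", "?c")])))
```

For Andy's example this prints 2. Adding a pattern such as ("?o", "ex:p", "?a") that mentions a variable from each side merges everything into a single group, which is the shape you want to see.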
> >
> > 3/ If no results are coming back at all, the query may be of a form
> > that does not stream results back, such as one with a sort (ORDER BY)
> > or, possibly, a CONSTRUCT.
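To see why a sort holds back every result, here is a small Python analogy (not Jena's actual engine): a source that counts how many input rows have been pulled from it, fed either to a streaming filter or to a sort. All names are hypothetical, for illustration only.

```python
from typing import Callable, Iterable, Iterator, Tuple


class CountingSource:
    """Wraps an input iterator and counts how many rows have been pulled from it."""

    def __init__(self, rows: Iterable[int]) -> None:
        self._it = iter(rows)
        self.pulled = 0

    def __iter__(self) -> Iterator[int]:
        return self

    def __next__(self) -> int:
        row = next(self._it)  # raises StopIteration when the input is exhausted
        self.pulled += 1
        return row


def streaming_filter(rows: Iterable[int]) -> Iterator[int]:
    # Like a FILTER: each row can be passed on as soon as it has been checked.
    for row in rows:
        if row % 2 == 0:
            yield row


def sorted_rows(rows: Iterable[int]) -> Iterator[int]:
    # Like ORDER BY: nothing can be emitted until every input row has been read.
    yield from sorted(rows)


def cost_of_first_result(
    operator: Callable[[Iterable[int]], Iterator[int]], n: int
) -> Tuple[int, int]:
    """Return (first result, number of input rows read before it appeared)."""
    source = CountingSource(range(n, 0, -1))  # rows n, n-1, ..., 1
    first = next(operator(source))
    return first, source.pulled


if __name__ == "__main__":
    print(cost_of_first_result(streaming_filter, 100_000))  # (100000, 1)
    print(cost_of_first_result(sorted_rows, 100_000))  # (1, 100000)
```

The filter hands back its first row after reading a single input row, while the sort has to read all 100,000 before it can return anything. ORDER BY in SPARQL has the same property, which is why a query with a large sort can look as if it is returning nothing for a long time.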
> >
> > There was a useful presentation recently that talks about the
> > principles of query efficiency.
> >
> > SPARQL Query Optimization with Pavel Klinov
> > https://www.youtube.com/watch?v=16eMswT2x2Y
> >
> > More inline:
> >
> > On 06/05/2021 09:54, Martin Van Aken wrote:
> >> Hi!
> >> I'm Martin, a software developer new to the Triples/SPARQL world. I'm
> >> currently building queries against a Fuseki/TDB backend (which I can
> >> work on too) and I'm running into significant performance problems
> >> (including never-ending queries).
> >
> > Are updates also happening at the same time?
> >
> >> Despite what I thought was a thorough search of the Apache Jena
> >> website, I could not find much insight into performance
> >> investigation, so I'm asking here.
> >>
> >> Most of my data experience comes from the relational world (e.g.
> >> PostgreSQL), so I sometimes draw comparisons with it.
> >>
> >> To give some context, my data set has around 15 linked concepts, with
> >> the number of triples for each ranging from a few hundred to 500K, for
> >> a total of less than 2 million (documents/authors/publications kind of
> >> data).
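For a sense of scale, here are rough upper bounds using the sizes given above. The first line assumes two disconnected patterns that each match about 500K triples (the largest concept); the second is { ?s ?p ?o . ?a ?b ?c } over the whole data set of just under 2 million triples. Real queries usually match fewer triples, so treat these as worst cases.

```latex
\begin{aligned}
500{,}000 \times 500{,}000 &= 2.5 \times 10^{11} \ \text{rows} \\
\left(2 \times 10^{6}\right)^{2} &= 4 \times 10^{12} \ \text{rows}
\end{aligned}
```

Even a modest data set like this one can therefore produce a result that no server will finish in reasonable time once a cross product slips into the query.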
> >>
> >> On to questions:
> >>
> >>     - When I'm facing a slow query, what are my options for
> >>     investigating it? Is there an equivalent of an "explain plan" in
> >>     SQL pointing to the specific slow points of a query? What's the
> >>     advised way to check performance in SPARQL?
> >
> > qparse --print=opt --file query.rq
> >
> >>     - Are there any performance setups to be aware of on the server
> >>     side? For example, ways to check that indexes are correctly built
> >>     (apart from text search, which I'm not working with for the
> >>     moment)?
> >>     - We're currently using TDB1. I've seen the transactional benefits
> >>     of TDB2; are there performance improvements too that would
> >>     warrant a migration?
> >
> > Not on the query side.
> >
> >    Andy
> >
> >>
> >> Thanks a lot already!
> >>
> >> Martin
> >>
>
>

-- 
Martin Van Aken - Freelance Enthusiast Developer

Mobile : +32 486 899 652

Follow me on Twitter : @martinvanaken <http://twitter.com/martinvanaken>
Call me on Skype : vanakenm
Hang out with me : mar...@joyouscoding.com
Contact me on LinkedIn : http://www.linkedin.com/in/martinvanaken
Company website : www.joyouscoding.com
