On 18/06/15 10:34, Neubert, Joachim wrote:

> The culprit for the long execution time on the remote server seems to
> be the SERVICE clause. Even when it points to an in-memory endpoint
> on the same server, execution took more than 400 sec. After loading
> the very same data into a named GRAPH on the same endpoint, and
> referencing that instead of SERVICE, execution was quick (less than 2
> sec). So that seems to be the way to go.

I've noticed the same - SERVICE queries are not terribly efficient with joins.

A query such as this:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT * {
  ?s skos:closeMatch ?o .
  SERVICE <http://another-endpoint-nearby/sparql> {
    ?o skos:prefLabel ?label .
  }
}

will usually be slow.

What happens is that the initial pattern (?s skos:closeMatch ?o) is first executed on the local endpoint. Then, for each of its solutions (there could be, e.g., 1000 of these), the SERVICE pattern is sent as a separate query to the other endpoint. So overall there will be 1001 queries, which will be slow even if both endpoints are on the same server.
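
To make that concrete, here is roughly what each of those per-solution requests to the other endpoint would look like, assuming the engine substitutes the current binding of ?o into the SERVICE pattern. The concept URI is made up for illustration, and the exact query text a given engine sends may differ:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# One request like this per local solution.
# <http://example.org/concept/1> is a hypothetical binding of ?o.
SELECT * {
  <http://example.org/concept/1> skos:prefLabel ?label .
}

With 1000 local solutions, 1000 such requests go out, each paying its own request and query-parsing overhead.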

It would be much smarter for the first query processor to add a VALUES block of the 1000 possible values of ?o to the SERVICE query and then send that as a single query to the other endpoint. Jena doesn't seem to do that. Neither does TopBraid Composer. I haven't tested others.
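
A minimal sketch of such a batched query, as it could be sent to the other endpoint, is below. The three URIs are made-up stand-ins for the values of ?o found locally; in practice there would be one entry per distinct value:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# All candidate values of ?o travel in one request.
# The URIs below are hypothetical placeholders.
SELECT * {
  VALUES ?o {
    <http://example.org/concept/1>
    <http://example.org/concept/2>
    <http://example.org/concept/3>
  }
  ?o skos:prefLabel ?label .
}

The labels then come back in a single response and can be joined with the local solutions on ?o. For a very large set of values the list would probably have to be split into a few batches, but that would still be far fewer requests than one per solution.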

-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi
