Dave,

Changing the order of the parts of the query can change the number of SERVICE calls. Sometimes it is better to grab more data once than to make many small calls, and not only for performance: fewer calls also help when the remote endpoint is across the unreliable internet.

As Rob says, batching for SERVICE calls would be good to have.
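
To make the idea concrete, here is a minimal sketch of the batching step in Python. It is not Jena code and does not use any Jena API. The function names (batched, values_block, service_queries) are made up for illustration, and it assumes each binding is a dict mapping a variable name to an already-serialized SPARQL term. It groups the local bindings into fixed-size batches and renders each batch as a VALUES block, so one remote query is sent per batch rather than per binding.

```python
from itertools import islice


def batched(bindings, size):
    """Yield lists of up to `size` bindings, pulling lazily from the input."""
    it = iter(bindings)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def values_block(variables, batch):
    """Render one batch of bindings as a SPARQL VALUES block.

    Assumption: terms are already valid SPARQL syntax (e.g. "<http://...>").
    A variable that is unbound in a row is written as UNDEF.
    """
    header = " ".join("?" + v for v in variables)
    rows = []
    for binding in batch:
        terms = " ".join(binding.get(v) or "UNDEF" for v in variables)
        rows.append("  (" + terms + ")")
    return "VALUES (" + header + ") {\n" + "\n".join(rows) + "\n}"


def service_queries(pattern, variables, bindings, batch_size):
    """Yield one remote query string per batch instead of one per binding."""
    for batch in batched(bindings, batch_size):
        yield (
            "SELECT * WHERE {\n"
            + values_block(variables, batch)
            + "\n"
            + pattern
            + "\n}"
        )


if __name__ == "__main__":
    local = [{"s": "<http://example.org/a>"},
             {"s": "<http://example.org/b>"},
             {"s": "<http://example.org/c>"}]
    for q in service_queries("?s ?p ?o .", ["s"], local, batch_size=2):
        print(q)
        print("----")
```

With three bindings and a batch size of 2 this produces two remote queries instead of three. Because the batches are pulled lazily, the first remote call can go out before all local bindings have been computed; choosing the batch size is still the open question.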

    Andy

On 01/05/2019 09:40, Rob Vesse wrote:
Dave

Yes, this is what is happening.  This stems from the fact that ARQ is designed 
as a lazy streaming evaluation engine, i.e. it tries to do the least work 
possible to answer the query. This is why the underlying implementation is all 
iterator driven.  In some cases the engine does have to batch up everything in 
order to proceed, e.g. DISTINCT/aggregation.

Introducing some degree of batching for SERVICE blocks might be a nice 
optimisation. I think this would definitely be valuable to the community, 
and contributions are always appreciated.

Thanks,

Rob

On 30/04/2019, 18:31, "Dave Griffith" <[email protected]> wrote:

     I'm tracking down an issue with a very slow federated query.  Looking
     through logs, Jena appears to be doing one call to the remote endpoint for
     every set of values that match locally.  This struck me as odd, since the
     SPARQL federation specs suggest that implementations may create "batched"
     queries to remote endpoints using VALUES blocks to pass multiple bindings.
     Looking through the source, it appears that Jena isn't doing that, but
     instead actually is issuing one remote call per binding.

     Am I correct in assuming that this optimization isn't being done, or am I
     missing something?  Looking through the source, it looks like it wouldn't
     be _too_ difficult to change the QueryIterService class to batch up some
     number of results into an OpTable.  OpAsQuery.asQuery would then render
     that as a VALUES block before calling to the remote endpoint.  There are a
     variety of issues to be resolved, most especially around batch size, but
     those don't appear insurmountable.  I haven't found any discussion of this
     possible optimization, but it's entirely possible I just didn't know where
     to look.  I'd be happy to do the work and submit a patch, but if there's a
     reason that people think this optimization shouldn't be done, I'd love to
     hear it before I start.

     Thanks for reading, and I'd love to hear any thoughts on the matter.

     Dave Griffith
     Principal Engineer
     data.world


