I'm tracking down an issue with a very slow federated query. Looking through the logs, Jena appears to make one call to the remote endpoint for every set of values that matches locally. This struck me as odd, since the SPARQL federation specs suggest that implementations may create "batched" queries to remote endpoints, using VALUES blocks to pass multiple bindings in a single request. Looking through the source, it appears that Jena isn't doing that, and really is issuing one remote call per binding.
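
To make the difference concrete, here is a rough sketch of the kind of batching I have in mind. It is plain Java with no Jena dependency: the class name ValuesBatchSketch and its methods are made up for illustration and are not part of Jena, and terms are assumed to be already serialized as SPARQL strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Jena code and not Jena's API.
final class ValuesBatchSketch {

    /** Splits the local bindings into consecutive batches of at most batchSize rows. */
    public static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(i, end)));
        }
        return batches;
    }

    /**
     * Renders one batch as a SPARQL VALUES block.
     * Each row maps a variable name to an already-serialized term;
     * a variable missing from a row is written as UNDEF.
     */
    public static String valuesBlock(List<String> vars, List<Map<String, String>> batch) {
        StringBuilder sb = new StringBuilder("VALUES (");
        for (String v : vars) {
            sb.append(" ?").append(v);
        }
        sb.append(" ) {");
        for (Map<String, String> row : batch) {
            sb.append(" (");
            for (String v : vars) {
                sb.append(' ').append(row.getOrDefault(v, "UNDEF"));
            }
            sb.append(" )");
        }
        return sb.append(" }").toString();
    }

    /** Builds the query text that would be sent to the remote endpoint for one batch. */
    public static String remoteQuery(String pattern, List<String> vars,
                                     List<Map<String, String>> batch) {
        return "SELECT * WHERE { " + valuesBlock(vars, batch) + " " + pattern + " }";
    }
}
```

With five local bindings for ?s and a batch size of 2, this would produce three remote queries instead of five. The real change would of course work on Jena's own binding and algebra types rather than strings.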
Am I correct in assuming that this optimization isn't being done, or am I missing something? From the source, it looks like it wouldn't be _too_ difficult to change the QueryIterService class to batch up some number of results into an OpTable. OpAsQuery.asQuery would then render that as a VALUES block before calling the remote endpoint. There are a variety of issues to be resolved, most especially around batch size, but those don't appear insurmountable.

I haven't found any discussion of this possible optimization, but it's entirely possible I just didn't know where to look. I'd be happy to do the work and submit a patch, but if there's a reason that people think this optimization shouldn't be done, I'd love to hear it before I start.

Thanks for reading, and I'd love to hear any thoughts on the matter.

Dave Griffith
Principal Engineer
data.world
