Dave,

Thanks for continuing to look at this. We don't have strict code standards: with such an old code base there is a wide variety of code styles, so the general rule is to follow the surrounding code style. https://jena.apache.org/getting_involved/reviewing_contributions.html details our reviewing guidelines, but these are fairly flexibly enforced.

Yes, good test coverage for something like this would be a must. If the tests are particularly slow they can always be put in a separate module and excluded from the faster "dev" profile.

You may want to create a separate tests module anyway, because then you'd be able to depend on Fuseki embedded and bring up Fuseki servers as part of your tests. Since the SERVICE logic lives in ARQ, trying to depend on Fuseki from there would create a circular dependency.

See https://jena.apache.org/documentation/fuseki2/fuseki-run.html#fuseki-main, particularly the bit on Fuseki as a Configurable and Embeddable SPARQL Server.

Rob

On 07/06/2019, 19:19, "Dave Griffith" wrote:

    After a bit of work, I have what appears to be a working version of
    Jena with batching SERVICE calls. It's sort of complex, so I'll be
    adding more tests and documentation before submitting a pull request
    to y'all. Are there any contributor docs I should read, particularly
    around coding standards, configurability, or level of testing
    expected? I'd hate to get the etiquette wrong here. Around level of
    testing in particular, this is as I say a pretty complex feature and
    deserves to be fully tested, but I'd hate to slow down your (pretty
    darn fast) build.

    Thanks. It's been a delight extending your work.

    Dave Griffith
    Principal Engineer
    data.world

    On Wed, May 1, 2019 at 4:34 AM Andy Seaborne wrote:

    > Dave,
    >
    > By changing the order of parts of the query, the number of SERVICE
    > calls can change. Sometimes it is better to grab more data, once,
    > than many small calls. And not just for performance if the remote
    > endpoint is across the unreliable internet.
    >
    > As Rob says, batching for SERVICE calls would be good to have.
    >
    > Andy
    >
    > On 01/05/2019 09:40, Rob Vesse wrote:
    > > Dave
    > >
    > > Yes this is what is happening. This stems from the fact that ARQ
    > > is designed as a lazy streaming evaluation engine, i.e. it tries
    > > to do the least work possible to answer the query. This is why
    > > the underlying implementation is all iterator driven. In some
    > > cases the engine does have to batch up everything in order to
    > > proceed, e.g. DISTINCT/aggregation.
    > >
    > > Introducing some degree of batching for SERVICE blocks might be a
    > > nice optimisation. I think this will definitely be valuable to
    > > the community; contributions are always appreciated.
    > >
    > > Thanks,
    > >
    > > Rob
    > >
    > > On 30/04/2019, 18:31, "Dave Griffith" wrote:
    > >
    > >     I'm tracking down an issue with a very slow federated query.
    > >     Looking through logs, Jena appears to be doing one call to
    > >     the remote endpoint for every set of values that match
    > >     locally. This struck me as odd, since the SPARQL federation
    > >     specs suggest that implementations may create "batched"
    > >     queries to remote endpoints using VALUES blocks to pass
    > >     multiple bindings. Looking through the source, it appears
    > >     that Jena isn't doing that, but instead actually is issuing
    > >     one remote call per binding.
    > >
    > >     Am I correct in assuming that this optimization isn't being
    > >     done, or am I missing something? Looking through the source,
    > >     it looks like it wouldn't be _too_ difficult to change the
    > >     QueryIterService class to batch up some number of results
    > >     into an OpTable. OpAsQuery.asQuery would then render that as
    > >     a VALUES block before calling to the remote endpoint. There
    > >     are a variety of issues to be resolved, most especially
    > >     around batch size, but those don't appear insurmountable. I
    > >     haven't found any discussion of this possible optimization,
    > >     but it's entirely possible I just didn't know where to look.
    > >     I'd be happy to do the work and submit a patch, but if
    > >     there's a reason that people think this optimization
    > >     shouldn't be done, I'd love to hear it before I start.
    > >
    > >     Thanks for reading, and I'd love to hear any thoughts on the
    > >     matter.
    > >
    > >     Dave Griffith
    > >     Principal Engineer
    > >     data.world
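
For anyone following the thread, the batching idea discussed here can be sketched roughly as below. This is a minimal, self-contained Java illustration only; it is not Jena's implementation and not the patch described in this thread. The names ServiceBatcher, partition, valuesBlock and batchQueries are hypothetical, bindings are modelled as plain maps from variable name to an already-serialised SPARQL term, and the batch size is an arbitrary parameter. A real change would presumably work on ARQ's own types inside QueryIterService, as the original message suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of Jena): turns local bindings into batched remote queries. */
final class ServiceBatcher {

    /** Splits rows into consecutive batches of at most batchSize rows each. */
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(i, end)));
        }
        return batches;
    }

    /** Renders one batch as a SPARQL VALUES block; unbound variables become UNDEF. */
    static String valuesBlock(List<String> vars, List<Map<String, String>> batch) {
        StringJoiner header = new StringJoiner(" ", "VALUES (", ") {");
        for (String v : vars) {
            header.add("?" + v);
        }
        StringBuilder sb = new StringBuilder(header.toString()).append('\n');
        for (Map<String, String> row : batch) {
            StringJoiner line = new StringJoiner(" ", "  (", ")");
            for (String v : vars) {
                line.add(row.getOrDefault(v, "UNDEF"));
            }
            sb.append(line).append('\n');
        }
        return sb.append('}').toString();
    }

    /** Builds one remote query string per batch instead of one per binding. */
    static List<String> batchQueries(List<String> vars, List<Map<String, String>> rows,
                                     String pattern, int batchSize) {
        List<String> queries = new ArrayList<>();
        for (List<Map<String, String>> batch : partition(rows, batchSize)) {
            queries.add("SELECT * WHERE {\n" + valuesBlock(vars, batch) + "\n" + pattern + "\n}");
        }
        return queries;
    }

    public static void main(String[] args) {
        // Three local bindings for ?s, sent in batches of two: two remote queries, not three.
        List<Map<String, String>> rows = List.of(
                Map.of("s", "<http://example.org/a>"),
                Map.of("s", "<http://example.org/b>"),
                Map.of("s", "<http://example.org/c>"));
        for (String q : batchQueries(List.of("s"), rows, "?s ?p ?o .", 2)) {
            System.out.println(q);
            System.out.println();
        }
    }
}
```

The batch size is left as a parameter because, as noted in the thread, choosing it is one of the open issues: larger batches mean fewer round trips but bigger requests to the remote endpoint.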
