On Fri, Jul 6, 2012 at 2:19 PM, Neubert Joachim <[email protected]> wrote:
> Hi Paul,
>
> Thank you very much for your enlightening explanation. I now understand
> why it could not work.
>
> But how could I achieve the aim of getting the right bits out of the
> remote dataset? In the use case which is analogous to the example given
> here there are just a dozen labels in the VALUES clause (which could be
> transformed to a UNION statement). They select about 5000 uris out of an
> unrestricted set of about 1 million. In another use case I would ask for
> about 100,000 uris out of 10,000,000 in the remote dataset. So fetching
> all the data and applying the restriction locally is not an option.
That was my reasoning for the VALUES statement. However, I was
surprised by the following:
> The in my naive eyes most logical statement
>
> CONSTRUCT { $book dc:title $title }
> WHERE {
> SERVICE <http://sparql.org/books/sparql>
> { SELECT ?book ?title
> WHERE { $book dc:title $title }
> VALUES ?book {
> <http://example.org/book/book1>
> <http://example.org/book/book2>
> }
> }
> }
>
> which I had tried first gave me an Error 500: Server Error. In the
> server log, I found
>
> 19:41:35 WARN Fuseki :: [81] RC = 500 : null
> Not implemented
> at com.hp.hpl.jena.sparql.graph.NodeTransformOp.transform(NodeTransformOp.java:154)
> at com.hp.hpl.jena.sparql.algebra.op.OpTable.apply(OpTable.java:63)
> at com.hp.hpl.jena.sparql.algebra.Transformer$ApplyTransformVisitor.visit0(Transformer.java:270)
> ...
>
> - so perhaps - hopefully - the syntax above will be implemented
> eventually?
It should be. I'm not sure if it's the "local" engine complaining, or
the remote one (which happens to be the local one in this case), but I
think it's the local one. If it *is* the local engine, then I'm
surprised the values aren't just passed through to the remote service.
> VALUES/ex-BINDINGS is one of my favorite SPARQL 1.1 statements, because
> it allows restrictions with possibly long lists of values collected
> out-of-band. I was happy that in the current Fuseki implementation it is
> evaluated quite efficiently. My feeling is that it should be possible to
> pass a VALUES clause explicitly as a part of a SERVICE subclause.
Agreed. When it was proposed I advocated it for this purpose.
> Not sure, what you intended with appending a VALUES clause silently.
> Does this refer to prior bindings within the main clause?
Yes.
> This would
> make a lot of sense, but perhaps it could be better achieved with some
> explicit syntax under the user's responsibility. In any case, an implicit
> VALUES clause should not interfere with an explicit one given by the
> user.
No, the engine should have the ability to figure this out for itself.
For instance, consider the following WHERE clause:
?localBook dc:title ?title .
SERVICE <http://sparql.org/books/sparql>
{ SELECT ?book ?title WHERE { ?book dc:title ?title } }
If ?localBook comes to just 10 books, whereas the remote service
contains 100,000, then you really want to send those 10 books as a
VALUES clause. However, if there are 100,000 local books, and only 10
remotely, then you really DON'T want to send those 100,000 books along
with the SERVICE request.
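To make the first case concrete, here is a rough sketch of the rewrite an engine might do before sending the request. It is only an illustration, not what Jena or Fuseki actually does. The function name and its parameters are made up for this example. It also assumes the bindings are sent for ?title, since that is the only variable the local pattern shares with the SERVICE block above, and that the terms are already serialized as SPARQL terms.

```python
def service_with_values(endpoint, var, terms):
    """Render the SERVICE block from the example with a trailing VALUES clause.

    Hypothetical helper: `terms` must already be valid SPARQL terms,
    e.g. '<http://example.org/book/book1>' or '"A Title"'.
    No escaping or validation is done here.
    """
    # One term per line inside the VALUES block.
    values = "\n".join(f"      {t}" for t in terms)
    return (
        f"SERVICE <{endpoint}>\n"
        f"  {{ SELECT ?book ?title\n"
        f"    WHERE {{ ?book dc:title ?title }}\n"
        f"    VALUES ?{var} {{\n"
        f"{values}\n"
        f"    }}\n"
        f"  }}"
    )


if __name__ == "__main__":
    # Two placeholder titles standing in for the locally bound ?title values.
    print(service_with_values(
        "http://sparql.org/books/sparql",
        "title",
        ['"Title One"', '"Title Two"'],
    ))
```

With 10 local titles this adds 10 short lines to the request; with 100,000 it adds 100,000, which is exactly the case where you would rather not do the rewrite.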
In general, you probably want to send along VALUES to bind variables
that are found in the remote service request so long as the size of
the bindings is less than the size of the returned data. How much
less? Well, that's a heuristic, and is based on the overhead of
sending the extra data vs the reduced return size. Count queries are
great to help work this out, but they add a lot of overhead for small
datasets. Another one of the many problems is that even with sizes of
individual BGP resolutions, you can't know the size of the returned
data when the bindings are included unless you do the join, so you
have to guess the expected sizes after a join.
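As a very rough sketch of that trade-off, assuming you already have (or have guessed) the counts: the function below compares the cost of shipping the bindings plus the reduced result against the cost of pulling everything back. The name, the cost parameters and the default guess for the joined size are all assumptions made for illustration; a real engine would need better estimates.

```python
def should_send_values(n_local, n_remote, est_joined=None,
                       cost_per_binding=1.0, cost_per_row=1.0):
    """Decide whether to attach local bindings as VALUES to a SERVICE call.

    n_local    -- number of local bindings that would be sent
    n_remote   -- estimated rows the remote pattern returns unrestricted
    est_joined -- guessed rows returned once the bindings are applied
    The cost parameters are arbitrary relative weights (assumptions).
    """
    if est_joined is None:
        # Crude guess: each binding matches at most one remote row.
        est_joined = min(n_local, n_remote)
    # Without VALUES we pay for every remote row coming back.
    cost_plain = n_remote * cost_per_row
    # With VALUES we pay to send the bindings, then for the smaller result.
    cost_with_values = n_local * cost_per_binding + est_joined * cost_per_row
    return cost_with_values < cost_plain


if __name__ == "__main__":
    print(should_send_values(10, 100_000))   # few local, many remote
    print(should_send_values(100_000, 10))   # many local, few remote
```

The hard part is not this comparison but getting n_remote and est_joined without paying more for the COUNT queries than you save.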
My point is that using COUNT/VALUES to make federated queries more
efficient is a complex task that must eventually get done, but hasn't
been addressed yet. Everything has to be made correct with respect to
the still-evolving spec before it can be optimized.
Paul