Re: ARQ Service join strategy

Andy Seaborne Tue, 03 Sep 2013 07:27:47 -0700

On 02/09/13 19:03, Diogo FC Patrao wrote:

Hi


I'm running a query with join results from two endpoints:

SELECT * {
   SERVICE <s1> { ?a a :Class1 } # q1
   SERVICE <s2> { ?a a :Class2 } # q2
}

I noticed (by running ARQ 2.8.8 with -v) that it first dispatches query q1
on s1 and then one query q2 on s2 *for each result* in the previous query.

Is there any other way of doing it? I mean, I'm getting like 1M results
from q1, maybe it would be better to get all results from q2 and then join
them in memory, or pass more than one value at a time for q2 (the VALUES
clause allows that).

Without programming, there is not a way to control the execution. You can implement an OpExecutor and provide your own implementation of join.

There are many issues, such as how to know the sizes from a remote service and how to send data to a remote service from s1 to s2 -- VALUES assumes looping back through the query engine, but does s2 allow huge SPARQL queries? See bind joins from the IBM Garlic papers.
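
As a rough illustration of the bind-join idea (not ARQ code), the results of q1 could be sent to s2 in batches through VALUES rather than one query per result. The sketch below only builds the query strings. The function names, the batch size and the example IRIs are hypothetical, and it assumes ?a is always bound to an IRI.

```python
from typing import Iterable, Iterator, List


def values_query(pattern: str, var: str, iris: List[str]) -> str:
    """Build one SPARQL query restricting ?var to the given IRIs.

    Assumption: every value is an IRI; literals and blank nodes would
    need different serialisation and are not handled here.
    """
    rows = " ".join(f"<{iri}>" for iri in iris)
    return f"SELECT * {{ VALUES ?{var} {{ {rows} }} {pattern} }}"


def bind_join_queries(
    left_iris: Iterable[str],
    pattern: str,
    var: str = "a",
    batch_size: int = 1000,  # hypothetical; s2 may cap query length
) -> Iterator[str]:
    """Yield one query per batch of bindings taken from the q1 results."""
    batch: List[str] = []
    for iri in left_iris:
        batch.append(iri)
        if len(batch) == batch_size:
            yield values_query(pattern, var, batch)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield values_query(pattern, var, batch)
```

With 1M results from q1 and a batch size of 1000, this would mean about 1000 requests to s2 instead of 1M, provided s2 accepts queries of that size.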

ARQ provides the basics - remote execution - but isn't a federated query optimizer.

There is a not-yet-ready query engine, called 'quack', that provides merge and hash joins - only on BGPs currently but the code, in principle, is general.
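
For reference, a hash join over two already-fetched result sets can be sketched as follows. This is a generic illustration and not how quack or ARQ implements it. Solutions are modelled as plain dicts, the function name is hypothetical, and it assumes the join key is the only variable the two sides share.

```python
from collections import defaultdict
from typing import Any, Dict, List

Solution = Dict[str, Any]


def hash_join(left: List[Solution], right: List[Solution], key: str) -> List[Solution]:
    """Join two lists of solutions on one shared variable.

    Assumption: `key` is the only variable common to both sides, so no
    further compatibility check between rows is needed.
    """
    # Build phase: index the right-hand side by the join variable.
    index: Dict[Any, List[Solution]] = defaultdict(list)
    for row in right:
        if key in row:
            index[row[key]].append(row)

    # Probe phase: look up each left-hand row in the index.
    joined: List[Solution] = []
    for lrow in left:
        for rrow in index.get(lrow.get(key), []):
            joined.append({**lrow, **rrow})
    return joined
```

In practice the index would normally be built on the smaller of the two sides, which is one reason knowing the result sizes in advance matters.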


        Andy


Cheers!


--
diogo patrão
