Re: Achieving reasonably performing federated queries

Olivier Rossel Fri, 26 Jul 2013 01:09:47 -0700

Usually the author knows how the data is distributed and can give the query
planner hints.
Maybe such hints could be keywords placed before the SERVICE keyword.


Example of chaining queries:
SELECT...
WHERE {
SERVICE <foo> {....}
THEN SERVICE <bar> { ... }
}
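
One plausible reading of THEN is a bind join. It evaluates SERVICE <foo> once, then calls SERVICE <bar> once per row that foo returns, with that row's values substituted in. This resembles the per-row approach the thread below describes as ARQ's default. The Java sketch below makes some assumptions: bindings are plain `Map<String, String>` values, and `foo` and `bar` are hypothetical in-memory stand-ins for the two endpoints. No Jena API is used. The sketch counts remote calls to show the cost: one request for foo plus one request per foo row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch only: a binding is a map from variable name to value, and a remote
// SERVICE is modelled as a function from an input binding (the variables
// already fixed by earlier patterns) to the rows it returns.
final class ThenSketch {
    // Counts simulated remote requests.
    static int remoteCalls = 0;

    // Hypothetical stand-in for SERVICE <foo>: returns three ?s values.
    static List<Map<String, String>> foo(Map<String, String> input) {
        remoteCalls++;
        return List.of(Map.of("s", "a"), Map.of("s", "b"), Map.of("s", "c"));
    }

    // Hypothetical stand-in for SERVICE <bar>: given ?s, returns its ?label.
    // Only "a" and "c" have labels.
    static List<Map<String, String>> bar(Map<String, String> input) {
        remoteCalls++;
        String s = input.get("s");
        if ("b".equals(s)) {
            return List.of();
        }
        return List.of(Map.of("s", s, "label", "label-" + s));
    }

    // Chained ("THEN") evaluation: run the first service once, then call the
    // second once per row, passing that row in so only compatible matches return.
    static List<Map<String, String>> thenJoin(
            Function<Map<String, String>, List<Map<String, String>>> first,
            Function<Map<String, String>, List<Map<String, String>>> second) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> left : first.apply(Map.of())) {
            for (Map<String, String> right : second.apply(left)) {
                Map<String, String> merged = new HashMap<>(left);
                merged.putAll(right);
                out.add(merged);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = thenJoin(ThenSketch::foo, ThenSketch::bar);
        rows.forEach(System.out::println);
        System.out.println("remote calls: " + remoteCalls);
    }
}
```

Running `main` prints the joined rows for `a` and `c` and reports 4 remote calls. Memory use stays small, but the number of requests grows with the size of foo's result.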

Example of parallel queries:
SELECT...
WHERE {
SERVICE <foo> {....}
ALSO SERVICE <bar> { ... }
}
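
ALSO could instead mean the following: send both requests at once, wait for both, and then join the results locally with a hash join. This is roughly what Andy Seaborne suggests below. The Java sketch below assumes the same `Map<String, String>` bindings. `fetchFoo` and `fetchBar` are hypothetical stand-ins for the two endpoints, and `CompletableFuture` handles the concurrent fetch. The trade-off is memory. The hash table holds one side's results in full, whereas a THEN-style per-row approach keeps little in memory but issues one request per row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Sketch only: no Jena API is used; both "services" are in-memory stand-ins.
final class AlsoSketch {
    // Hypothetical SERVICE <foo>: all ?s values.
    static List<Map<String, String>> fetchFoo() {
        return List.of(Map.of("s", "a"), Map.of("s", "b"), Map.of("s", "c"));
    }

    // Hypothetical SERVICE <bar>: all (?s, ?label) pairs, fetched without any
    // bindings from foo, so it may return rows that never join ("z").
    static List<Map<String, String>> fetchBar() {
        return List.of(
                Map.of("s", "a", "label", "label-a"),
                Map.of("s", "c", "label", "label-c"),
                Map.of("s", "z", "label", "label-z"));
    }

    // Hash join on one shared variable: build a table from the right side,
    // then probe it with each left row. Memory grows with the build side.
    static List<Map<String, String>> hashJoin(
            List<Map<String, String>> left, List<Map<String, String>> right, String key) {
        Map<String, List<Map<String, String>>> table = new HashMap<>();
        for (Map<String, String> row : right) {
            table.computeIfAbsent(row.get(key), k -> new ArrayList<>()).add(row);
        }
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : table.getOrDefault(l.get(key), List.of())) {
                Map<String, String> merged = new HashMap<>(l);
                merged.putAll(r);
                out.add(merged);
            }
        }
        return out;
    }

    // "ALSO" evaluation: start both fetches concurrently, then join when both finish.
    static List<Map<String, String>> also() {
        CompletableFuture<List<Map<String, String>>> foo =
                CompletableFuture.supplyAsync(AlsoSketch::fetchFoo);
        CompletableFuture<List<Map<String, String>>> bar =
                CompletableFuture.supplyAsync(AlsoSketch::fetchBar);
        return foo.thenCombine(bar, (l, r) -> hashJoin(l, r, "s")).join();
    }

    public static void main(String[] args) {
        for (Map<String, String> row : also()) {
            System.out.println(row);
        }
    }
}
```

Running `main` prints the two joined rows for `a` and `c`. `b` has no label, and `z` has no matching `?s` on the foo side.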

... and more.

Sounds silly?


On Thu, Jul 25, 2013 at 5:12 PM, Andy Seaborne <[email protected]> wrote:

> It may help if ARQ did a hash join in this case; the data from the two
> SERVICEs could even be fetched in parallel (though that, in turn, may be
> unacceptable).
>
> The advantage of the current approach is that it does not run out of
> memory: it does not consume temporary RAM in proportion to the data size.
> But it's not a free choice, and it may be slower (quite apart from being
> inappropriate for this SERVICE situation).
>
> There isn't code in TDB to do that, currently.
>
>         Andy
>
>
> On 25/07/13 14:33, Rob Vesse wrote:
>
>> Yes, you should be able to add the following:
>>
>> --set arq:optIndexJoinStrategy=false
>>
>> I'm not 100% sure that the short form will work; you may need to use the
>> fully expanded form:
>>
>> --set http://jena.hpl.hp.com/ARQ#optIndexJoinStrategy=false
>>
>
> It should work - it's a bug if it doesn't.
>
>
>
>> However, as noted in my email, this is new in 2.10.2-SNAPSHOT builds, so
>> unless you are using the latest SNAPSHOTs it will have no effect. In all
>> previous releases this particular optimization was always on.
>>
>> Rob
>>
>>
>> On 7/25/13 1:56 PM, "Diogo FC Patrao" <[email protected]> wrote:
>>
>>> Hello
>>>
>>>>> The better plan for the query you posted would be (1), simply because of
>>>>> the cost of accessing a remote service. But if the first SERVICE query
>>>>> returned just a few rows, it might be better to run the same query a
>>>>> couple of times, as in (2), than to fetch all the results.
>>>>>
>>>>>
>>>> I agree. I started out with (2) because ARQ did that by default. However,
>>>> it soon became clear that it wasn't going to work out, so I explored a way
>>>> to do (1). I'm now doing (1), but I'm trying to get more out of it. I
>>>> need to take a closer look at Rob Vesse's suggestion:
>>>>
>>>> ARQ.getContext().set(ARQ.optIndexJoinStrategy, false);
>>>>
>>>
>>>
>>> Yes; it is a great feature that we can turn certain optimizations on and
>>> off!
>>>
>>> Rob, can we turn that on and off from the ARQ command line?
>>>
>>>
>>>>> As for optimizing the query, I would try splitting each query into a
>>>>> UNION: one part with the OPTIONAL, the other without it. Getting the
>>>>> subproperties can also be expensive, depending on which triplestore
>>>>> you're querying. If it's Fuseki+TDB and you have access to the server
>>>>> configuration, you could turn on RDFS inference. Also, the order of the
>>>>> triple patterns can have a large influence on overall query performance:
>>>>> put the patterns that return fewer results before the others.
>>>>>
>>>>> Good luck!
>>>>>
>>>>>
>>>> I'm not sure I see how UNION can be used as you suggest such that the
>>>> results contain values for each field. Only one of the variables in the
>>>> OPTIONAL is used in the final output. Duplicating the earlier pattern
>>>> plus what was in the OPTIONAL is probably not ideal. Did I misunderstand
>>>> you?
>>>>
>>>>
>>> Yes, but that idea was based solely on my experience with relational
>>> databases. Writing
>>>
>>> SELECT * FROM A WHERE type_id in (1,2)
>>>
>>> can be slower than
>>>
>>> SELECT * FROM A WHERE type_id = 1
>>> UNION ALL
>>> SELECT * FROM A WHERE type_id = 2
>>>
>>> Believe it or not. I've never really worked with OPTIONALs, so I'm
>>> guessing out of thin air, but I think it's worth a shot.
>>>
>>>
>>>> I'll test it with only RDFS inference.
>>>>
>>>>
>>> The SPARQL will look better too.
>>>
>>> cheers!
>>>
>>> dfcp
>>>
>>>
>>>
>>>> Based on my tests, the order of the statements is as good as it gets.
>>>>
>>>> Thanks for the suggestions.
>>>>
>>>> -Sarven
>>>>
>>>>
>>>>
>>>>
>>
>
