Re: Achieving reasonably performing federated queries

Rob Vesse Thu, 25 Jul 2013 06:35:24 -0700

Yes you should be able to add the following:

--set arq:optIndexJoinStrategy=false


I'm not 100% sure that the short form will work; you may need to use the
fully expanded form:

--set http://jena.hpl.hp.com/ARQ#optIndexJoinStrategy=false

However, as noted in my email, this is new in 2.10.2-SNAPSHOT builds, so
unless you are using the latest SNAPSHOTs this will have no effect.  In
all previous releases this particular optimization was always on.

Rob


On 7/25/13 1:56 PM, "Diogo FC Patrao" <[email protected]> wrote:

>Hello
>
>>> The better plan for the query you posted would be (1), simply because of
>>> the cost of accessing a remote service. But, if the first SERVICEd query
>>> would return just a few lines, maybe it would be better to run a couple
>>> of times the same query as in (2) than to get all results.
>>>
>>
>> I agree. I started out with (2) because ARQ by default did that. However,
>> soon after, that wasn't going to work out and so explored a way to do (1).
>> Now doing (1) but I'm trying to get more out of it. I have to take a
>> closer look at Rob Vesse's suggestion:
>> ARQ.getContext().set(ARQ.optIndexJoinStrategy, false);
>
>
>Yes; it is a great feature that we can turn on and off certain
>optimizations!
>
>Rob, can we turn that on and off by the ARQ command line?
>
>
>>> As for optimizing the query, I would try separating each query into a
>>> UNION, one part with the OPTIONAL, the other without it. Getting the
>>> subproperties, depending on which triplestore you're querying, can be
>>> expensive too. If it's Fuseki+TDB and you have access to the server
>>> configuration, you could turn on RDFS inference. Also, the order of the
>>> triples can influence the overall query performance a lot - put the
>>> triples that return fewer results before the others.
>>>
>>> Good luck!
>>>
>>
>> I'm not sure I see how UNION can be used as per your suggestion such that
>> the results contain values for each field. Only one of the variables in
>> OPTIONAL is used towards the final output. Duplicating the earlier pattern
>> plus what was in OPTIONAL is probably not ideal. Did I misunderstand you?
>>
>
>Yes, but that was an idea based solely on my experience with RDB. Writing
>
>SELECT * FROM A WHERE type_id in (1,2)
>
>can be slower than
>
>SELECT * FROM A WHERE type_id = 1
>UNION ALL
>SELECT * FROM A WHERE type_id = 2
>
>, believe it or not. I never really worked with OPTIONALs so I'm guessing
>it out of thin air. But I think it's worth a shot.
>
>
>> I'll test it with only RDFS inference.
>>
>
>The SPARQL will look better too.
>
>cheers!
>
>dfcp
>
>
>
>> Based on my tests, the order of the statements is as good as it gets.
>>
>> Thanks for the suggestions.
>>
>> -Sarven
>>
>>
>>
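
The IN versus UNION ALL comparison quoted above can be tried out with a small sketch. This is only an illustration, assuming an in-memory SQLite database and a made-up table A with hypothetical rows (none of this comes from the thread). It checks that the two forms return the same rows; it does not measure which one is faster, since that depends on the database engine and its indexes.

```python
import sqlite3

# Hypothetical in-memory table standing in for "A" from the quoted SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, type_id INTEGER)")
conn.executemany(
    "INSERT INTO A (id, type_id) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2)],  # made-up sample rows
)

# Form 1: a single query with an IN list.
in_rows = conn.execute(
    "SELECT * FROM A WHERE type_id IN (1, 2)"
).fetchall()

# Form 2: one query per value, concatenated with UNION ALL.
union_rows = conn.execute(
    "SELECT * FROM A WHERE type_id = 1 "
    "UNION ALL "
    "SELECT * FROM A WHERE type_id = 2"
).fetchall()

# Row order may differ between the two forms, so compare sorted lists.
same_rows = sorted(in_rows) == sorted(union_rows)
print(same_rows)
```

Because the two type_id values are disjoint, UNION ALL cannot introduce duplicates here, which is why the rewrite is safe in this particular case.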
