Re: Best practice to combine simple queries

Lorenz Buehmann Tue, 14 Apr 2020 22:53:27 -0700

Looking into the code [1], I think it's not possible to have multiple
VALUES clauses, so it might be a limitation of the SelectBuilder.


Claude Warren and others should know better than me though.


[1]
https://github.com/apache/jena/blob/master/jena-extras/jena-querybuilder/src/main/java/org/apache/jena/arq/querybuilder/handlers/ValuesHandler.java#L62
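
As a stopgap until someone confirms what SelectBuilder can do, here is a minimal sketch that assembles the query text with two independent VALUES blocks using plain string handling. It uses no Jena API at all. The class name ValuesQuerySketch, its method names, and the example.org IRIs are made up for illustration, and it assumes the IRIs are trusted, absolute, and need no escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Jena: builds the query text only.
final class ValuesQuerySketch {

    // Renders one single-variable block, e.g. VALUES ?s { <a> <b> }.
    // Assumes each IRI is absolute and contains no characters that
    // would need escaping inside <...>.
    static String valuesBlock(String var, List<String> iris) {
        String terms = iris.stream()
                .map(iri -> "<" + iri + ">")
                .collect(Collectors.joining(" "));
        return "VALUES ?" + var + " { " + terms + " }";
    }

    // Two separate VALUES blocks, so the two lists may differ in length.
    static String build(List<String> subjects, List<String> properties) {
        return String.join("\n",
                "SELECT ?s ?p ?o",
                "WHERE {",
                "  " + valuesBlock("s", subjects),
                "  " + valuesBlock("p", properties),
                "  ?s ?p ?o .",
                "}");
    }

    public static void main(String[] args) {
        String query = build(
                List.of("http://example.org/subj1",
                        "http://example.org/subj2",
                        "http://example.org/subj3"),
                List.of("http://example.org/prop1",
                        "http://example.org/prop2"));
        System.out.println(query);
    }
}
```

The resulting string can then be passed to whatever you already use to parse or parameterize queries, such as the ParameterizedSparqlString mentioned below. For untrusted input, prefer an API that handles escaping over string concatenation.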

On 14.04.20 17:10, Jim Balhoff wrote:
> I use ParameterizedSparqlString to build my SPARQL queries.
>
>> On Apr 12, 2020, at 5:54 PM, Mike Welch <[email protected]> wrote:
>>
>> Thanks Jim, I did a bit of perf comparison (detail below).  Now my question
>> is: how do I use the Jena SelectBuilder interfaces to formulate such a
>> query?  Calling addWhereValueVar(..) twice, once for ?s and once for ?p,
>> fails since the two lists are not the same length.  It tries to build a
>> single table like VALUES (?s ?p) { ... } rather than two separate VALUES blocks.
>>
>> Thanks,
>> - Mike
>>
>> Simple perf test on a 48-core machine with 512 GB RAM and SSDs.
>>
>> Empirically, when batching requests by subject, the VALUES and UNION
>> approaches show negligible performance differences for a reasonable
>> number of props.  If you start to increase the parallelism / throughput
>> target by batching multiple subject IDs, however, VALUES does seem to
>> outperform UNION.  For those interested:
>>
>> VALUES (1 subj * 8 props) * 5 parallel requests ~ 19ms at 95%
>> VALUES (1 subj * 8 props) * 20 parallel requests ~ 144ms at 95%
>> VALUES (4 subj * 8 props) * 5 parallel requests ~ 35ms at 95%
>> VALUES (4 subj * 8 props) * 10 parallel requests ~ 75ms at 95%
>>
>> UNION (1 subj * 8 props) * 5 parallel requests ~ 17ms at 95%
>> UNION (1 subj * 8 props) * 20 parallel requests ~ 136ms at 95%
>> UNION (4 subj * 8 props) * 5 parallel requests ~ 52ms at 95%
>> UNION (4 subj * 8 props) * 10 parallel requests ~ 124ms at 95%
>>
>>
>> On Sun, Mar 29, 2020 at 3:10 AM Balhoff, Jim <[email protected]> wrote:
>>
>>> I usually do this sort of thing in one query using VALUES. For example
>>>
>>> SELECT ?s ?p ?o
>>> WHERE {
>>> VALUES ?s { ns:subj1 ns:subj2 ns:subj3 }
>>> VALUES ?p { ns:prop1 ns:prop2 }
>>> ?s ?p ?o .
>>> }
>>>
>>> Best regards,
>>> Jim
>>>
>>>
>>>> On Mar 28, 2020, at 7:30 PM, Mike Welch <[email protected]> wrote:
>>>> Hi everyone,
>>>>
>>>> We have a use case like the following:  given a set of M known URIs
>>>> (typically 10s to a few hundred), fetch the same N (typically  2-10)
>>>> properties for each of them via Fuseki (TDB2).
>>>>
>>>> Does anyone have any benchmarks, rules of thumb, hunches, etc. on what
>>>> would be the best approach for packaging and/or parallelizing requests?
>>>> We currently send 1 query per subject URI, so M parallel requests, each
>>>> request containing a UNION of N individual simple patterns:
>>>>
>>>> { ns:subj1 ns:prop1 ?a }
>>>> UNION
>>>> { ns:subj1 ns:prop2 ?b }
>>>> ...
>>>>
>>>> This seems to work pretty well - better than M parallel queries with all
>>>> OPTIONAL patterns, M*N parallel single pattern requests, or 1 giant UNION
>>>> for all M*N things at once, but it was still a somewhat arbitrary choice
>>>> amongst many, many other possibilities.  Does anyone have
>>>> any suggestions of what might work better or be more friendly to the
>>>> internal optimizers, with the main goal of minimizing overall latency to
>>>> fetch all M*N things?
>>>>
>>>> Thanks!
>>>> - Mike
>>>
>
