Re: Best practice to combine simple queries

Jim Balhoff Tue, 14 Apr 2020 08:11:04 -0700
I use ParameterizedSparqlString to build my SPARQL queries.
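
As a minimal sketch of the two-VALUES query shape quoted below (not necessarily how the queries are built in practice), the query text can be assembled with plain string joining, so the ?s and ?p lists may differ in length. The class and method names here (ValuesQuerySketch, valuesClause, buildQuery) are hypothetical and not part of Jena. The sketch assumes every term is already a valid SPARQL term (a prefixed name or an <IRI>) and does no escaping.

```java
import java.util.List;

// Hypothetical helper, not part of Jena: builds query text with two
// independent VALUES blocks, so ?s and ?p may have different lengths.
final class ValuesQuerySketch {

    // Render one single-variable clause, e.g. "VALUES ?s { ns:a ns:b }".
    // Assumption: each term is already a valid SPARQL term; no escaping is done.
    static String valuesClause(String var, List<String> terms) {
        return "VALUES ?" + var + " { " + String.join(" ", terms) + " }";
    }

    // Assemble the SELECT query in the same shape as the example in this thread.
    static String buildQuery(List<String> subjects, List<String> props) {
        return String.join("\n",
            "SELECT ?s ?p ?o",
            "WHERE {",
            "  " + valuesClause("s", subjects),
            "  " + valuesClause("p", props),
            "  ?s ?p ?o .",
            "}");
    }

    public static void main(String[] args) {
        String query = buildQuery(
            List.of("ns:subj1", "ns:subj2", "ns:subj3"),
            List.of("ns:prop1", "ns:prop2"));
        System.out.println(query);
    }
}
```

The resulting text could then be handed to the ParameterizedSparqlString(String) constructor, with the ns: prefix declared through setNsPrefix before the query is parsed.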

> On Apr 12, 2020, at 5:54 PM, Mike Welch <[email protected]> wrote:
> 
> Thanks Jim, I did a bit of perf comparison (detail below).  Now my question
> is: how do I use the Jena SelectBuilder interfaces to formulate such a
> query?  Calling addWhereValueVar(..) twice, once for ?s and once for ?p,
> fails since the two lists are not the same length.  It's trying to build a
> single table like VALUES (?s ?p) { ... } vs. two separate VALUES statements.
> 
> Thanks,
> - Mike
> 
> Simple perf test on a 48-core machine with 512 GB RAM and SSDs.
> 
> Empirically it seems like if batching requests by subject, the VALUES
> approach and UNION have negligible performance differences for a reasonable
> number of props.  If you start to increase the parallelism / thruput target
> by batching multiple subjectIds, however, it does seem like VALUES
> outperforms UNION.  For those interested:
> 
> VALUES (1 subj * 8 props) * 5 parallel requests ~ 19ms at 95%
> VALUES (1 subj * 8 props) * 20 parallel requests ~ 144ms at 95%
> VALUES (4 subj * 8 props) * 5 parallel requests ~ 35ms at 95%
> VALUES (4 subj * 8 props) * 10 parallel requests ~ 75ms at 95%
> 
> UNION (1 subj * 8 props) * 5 parallel requests ~ 17ms at 95%
> UNION (1 subj * 8 props) * 20 parallel requests ~ 136ms at 95%
> UNION (4 subj * 8 props) * 5 parallel requests ~ 52ms at 95%
> UNION (4 subj * 8 props) * 10 parallel requests ~ 124ms at 95%
> 
> 
> On Sun, Mar 29, 2020 at 3:10 AM Balhoff, Jim <[email protected]> wrote:
> 
>> I usually do this sort of thing in one query using VALUES. For example
>> 
>> SELECT ?s ?p ?o
>> WHERE {
>> VALUES ?s { ns:subj1 ns:subj2 ns:subj3 }
>> VALUES ?p { ns:prop1 ns:prop2 }
>> ?s ?p ?o .
>> }
>> 
>> Best regards,
>> Jim
>> 
>> 
>>> On Mar 28, 2020, at 7:30 PM, Mike Welch <[email protected]> wrote:
>>> 
>>> Hi everyone,
>>> 
>>> We have a use case like the following:  given a set of M known URIs
>>> (typically 10s to a few hundred), fetch the same N (typically  2-10)
>>> properties for each of them via Fuseki (TDB2).
>>> 
>>> Does anyone have any benchmarks, rules of thumb, hunches, etc of what would
>>> be the most optimal approach for packaging and/or parallelizing requests?
>>> We currently send 1 query per subject URI, so M parallel requests, each
>>> request containing a UNION of N individual simple patterns:
>>> 
>>> { ns:subj1 ns:prop1 ?a }
>>> UNION
>>> { ns:subj1 ns:prop2 ?b }
>>> ...
>>> 
>>> This seems to work pretty well - better than M parallel queries with all
>>> OPTIONAL patterns, M*N parallel single pattern requests, or 1 giant UNION
>>> for all M*N things at once, but it was still a somewhat arbitrary choice
>>> amongst many, many other possibilities.  Does anyone have
>>> any suggestions of what might work better or be more friendly to the
>>> internal optimizers, with the main goal of minimizing overall latency to
>>> fetch all M*N things?
>>> 
>>> Thanks!
>>> - Mike
>> 
>>