I use ParameterizedSparqlString to build my SPARQL queries.
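
Here is a minimal sketch of the query shape in question (two independent single-variable VALUES blocks), put together with plain Java string handling only. The class and method names are hypothetical, nothing here is a Jena API, and it assumes the inputs are trusted absolute IRIs that need no escaping. The resulting text could then be passed to the ParameterizedSparqlString constructor or to QueryFactory.create.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: ValuesQuerySketch and its methods are hypothetical helpers,
// not part of Jena. Assumes every input is a trusted absolute IRI.
final class ValuesQuerySketch {

    // Renders one single-variable block, e.g. VALUES ?s { <a> <b> }.
    static String valuesBlock(String var, List<String> iris) {
        String terms = iris.stream()
                .map(iri -> "<" + iri + ">")
                .collect(Collectors.joining(" "));
        return "VALUES ?" + var + " { " + terms + " }";
    }

    // Two separate VALUES blocks, so the lists may differ in length;
    // the engine joins them, yielding every subject/property pair.
    static String buildQuery(List<String> subjects, List<String> properties) {
        return String.join("\n",
                "SELECT ?s ?p ?o",
                "WHERE {",
                "  " + valuesBlock("s", subjects),
                "  " + valuesBlock("p", properties),
                "  ?s ?p ?o .",
                "}");
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(
                List.of("http://example.org/subj1",
                        "http://example.org/subj2",
                        "http://example.org/subj3"),
                List.of("http://example.org/prop1",
                        "http://example.org/prop2")));
    }
}
```

Because each VALUES block binds a single variable, the two lists do not have to be the same length, unlike a single two-column VALUES table.
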
> On Apr 12, 2020, at 5:54 PM, Mike Welch <[email protected]>
> wrote:
>
> Thanks Jim, I did a bit of perf comparison (details below). Now my question
> is: how do I use the Jena SelectBuilder interfaces to formulate such a
> query? Calling addWhereValueVar(..) twice, once for ?s and once for ?p,
> fails since the two lists are not the same length. It's trying to build a
> single table like VALUES (?s ?p) { ... } rather than two separate VALUES
> blocks.
>
> Thanks,
> - Mike
>
> Simple perf test on a 48-core machine with 512 GB of RAM and SSDs.
>
> Empirically, when batching requests by subject, the VALUES and UNION
> approaches show negligible performance differences for a reasonable
> number of props. Once you increase the parallelism / throughput target
> by batching multiple subjectIds, however, VALUES does seem to
> outperform UNION. For those interested:
>
> VALUES (1 subj * 8 props) * 5 parallel requests ~ 19ms at 95%
> VALUES (1 subj * 8 props) * 20 parallel requests ~ 144ms at 95%
> VALUES (4 subj * 8 props) * 5 parallel requests ~ 35ms at 95%
> VALUES (4 subj * 8 props) * 10 parallel requests ~ 75ms at 95%
>
> UNION (1 subj * 8 props) * 5 parallel requests ~ 17ms at 95%
> UNION (1 subj * 8 props) * 20 parallel requests ~ 136ms at 95%
> UNION (4 subj * 8 props) * 5 parallel requests ~ 52ms at 95%
> UNION (4 subj * 8 props) * 10 parallel requests ~ 124ms at 95%
>
>
> On Sun, Mar 29, 2020 at 3:10 AM Balhoff, Jim <[email protected]> wrote:
>
>> I usually do this sort of thing in one query using VALUES. For example
>>
>> SELECT ?s ?p ?o
>> WHERE {
>> VALUES ?s { ns:subj1 ns:subj2 ns:subj3 }
>> VALUES ?p { ns:prop1 ns:prop2 }
>> ?s ?p ?o .
>> }
>>
>> Best regards,
>> Jim
>>
>>
>>> On Mar 28, 2020, at 7:30 PM, Mike Welch <[email protected]> wrote:
>>>
>>> Hi everyone,
>>>
>>> We have a use case like the following: given a set of M known URIs
>>> (typically 10s to a few hundred), fetch the same N (typically 2-10)
>>> properties for each of them via Fuseki (TDB2).
>>>
>>> Does anyone have any benchmarks, rules of thumb, hunches, etc. on what
>>> would be the best approach for packaging and/or parallelizing requests?
>>> We currently send 1 query per subject URI, so M parallel requests, each
>>> request containing a UNION of N individual simple patterns:
>>>
>>> { ns:subj1 ns:prop1 ?a }
>>> UNION
>>> { ns:subj1 ns:prop2 ?b }
>>> ...
>>>
>>> This seems to work pretty well - better than M parallel queries with all
>>> OPTIONAL patterns, M*N parallel single pattern requests, or 1 giant UNION
>>> for all M*N things at once, but it was still a somewhat arbitrary choice
>>> amongst many, many other possibilities. Does anyone have
>>> any suggestions of what might work better or be more friendly to the
>>> internal optimizers, with the main goal of minimizing overall latency to
>>> fetch all M*N things?
>>>
>>> Thanks!
>>> - Mike
>>
>>