Looking into the code [1], I think it's not possible to have multiple VALUES clauses, so it might be a limitation of the SelectBuilder.
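
In the meantime, one possible workaround is to skip the builder for this part and assemble the query text by hand. The following is only a minimal sketch in plain Java, not anything provided by Jena: the class and method names (ValuesQuerySketch, valuesClause, buildQuery) are made up for illustration. It assumes every term passed in is already a valid SPARQL term (a prefixed name or an <IRI>) and that a PREFIX declaration for ns: is prepended elsewhere. No escaping or validation is done.

```java
import java.util.List;

// Hypothetical helper, not part of Jena: builds the query text by hand.
final class ValuesQuerySketch {

    // Renders one single-variable VALUES block, e.g. "VALUES ?s { ns:a ns:b }".
    // Assumes each term is already a valid SPARQL term; nothing is escaped.
    static String valuesClause(String var, List<String> terms) {
        return "VALUES ?" + var + " { " + String.join(" ", terms) + " }";
    }

    // Emits two separate VALUES blocks, so the subject and property lists
    // may have different lengths. The ns: prefix must be declared elsewhere.
    static String buildQuery(List<String> subjects, List<String> props) {
        return String.join("\n",
            "SELECT ?s ?p ?o",
            "WHERE {",
            "  " + valuesClause("s", subjects),
            "  " + valuesClause("p", props),
            "  ?s ?p ?o .",
            "}");
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(
            List.of("ns:subj1", "ns:subj2", "ns:subj3"),
            List.of("ns:prop1", "ns:prop2")));
    }
}
```

The resulting string could then be handed to whatever query API is already in use. Plain concatenation like this is only reasonable when the terms come from a trusted source.
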
Claude Warren and others should know better than me though.

[1] https://github.com/apache/jena/blob/master/jena-extras/jena-querybuilder/src/main/java/org/apache/jena/arq/querybuilder/handlers/ValuesHandler.java#L62

On 14.04.20 17:10, Jim Balhoff wrote:
> I use ParameterizedSparqlString to build my SPARQL queries.
>
>> On Apr 12, 2020, at 5:54 PM, Mike Welch <[email protected]> wrote:
>>
>> Thanks Jim, I did a bit of perf comparison (detail below). Now my question
>> is: how do I use the Jena SelectBuilder interfaces to formulate such a
>> query? Calling addWhereValueVar(..) twice, once for ?s and once for ?p,
>> fails since the two lists are not the same length. It's trying to build a
>> single table like VALUES { ?s ?p } vs. two separate VALUES statements.
>>
>> Thanks,
>> - Mike
>>
>> Simple perf test on a 48 core, 512gb ram machine with SSDs.
>>
>> Empirically it seems like if batching requests by subject, the VALUES
>> approach and UNION have negligible performance differences for a reasonable
>> number of props. If you start to increase the parallelism / thruput target
>> by batching multiple subjectIds, however, it does seem like VALUES
>> outperforms UNION. For those interested:
>>
>> VALUES (1 subj * 8 props) * 5 parallel requests ~ 19ms at 95%
>> VALUES (1 subj * 8 props) * 20 parallel requests ~ 144ms at 95%
>> VALUES (4 subj * 8 props) * 5 parallel requests ~ 35ms at 95%
>> VALUES (4 subj * 8 props) * 10 parallel requests ~ 75ms at 95%
>>
>> UNION (1 subj * 8 props) * 5 parallel requests ~ 17ms at 95%
>> UNION (1 subj * 8 props) * 20 parallel requests ~ 136ms at 95%
>> UNION (4 subj * 8 props) * 5 parallel requests ~ 52ms at 95%
>> UNION (4 subj * 8 props) * 10 parallel requests ~ 124ms at 95%
>>
>>
>> On Sun, Mar 29, 2020 at 3:10 AM Balhoff, Jim <[email protected]> wrote:
>>
>>> I usually do this sort of thing in one query using VALUES. For example
>>>
>>> SELECT ?s ?p ?o
>>> WHERE {
>>>   VALUES ?s { ns:subj1 ns:subj2 ns:subj3 }
>>>   VALUES ?p { ns:prop1 ns:prop2 }
>>>   ?s ?p ?o .
>>> }
>>>
>>> Best regards,
>>> Jim
>>>
>>>
>>>> On Mar 28, 2020, at 7:30 PM, Mike Welch <[email protected]> wrote:
>>>> Hi everyone,
>>>>
>>>> We have a use case like the following: given a set of M known URIs
>>>> (typically 10s to a few hundred), fetch the same N (typically 2-10)
>>>> properties for each of them via Fuseki (TDB2).
>>>>
>>>> Does anyone have any benchmarks, rules of thumb, hunches, etc of what would
>>>> be the most optimal approach for packaging and/or parallelizing requests?
>>>> We currently send 1 query per subject URI, so M parallel requests, each
>>>> request containing a UNION of N individual simple patterns:
>>>>
>>>> { ns:subj1 ns:prop1 ?a }
>>>> UNION
>>>> { ns:subj1 ns:prop2 ?b }
>>>> ...
>>>>
>>>> This seems to work pretty well - better than M parallel queries with all
>>>> OPTIONAL patterns, M*N parallel single pattern requests, or 1 giant UNION
>>>> for all M*N things at once, but it was still a somewhat arbitrary choice
>>>> amongst many, many other possibilities. Does anyone have
>>>> any suggestions of what might work better or be more friendly to the
>>>> internal optimizers, with the main goal of minimizing overall latency to
>>>> fetch all M*N things?
>>>>
>>>> Thanks!
>>>> - Mike
>>>
>
