Thanks Jim, I did a bit of perf comparison (details below). Now my question
is: how do I use the Jena SelectBuilder interface to formulate such a
query? Calling addWhereValueVar(..) twice, once for ?s and once for ?p,
fails because the two value lists are not the same length. It seems to be
building a single two-column table, like VALUES (?s ?p) { ... }, rather than
two separate VALUES blocks.
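In the meantime, here is a minimal sketch of a fallback that sidesteps the
builder: assemble the query text with one VALUES block per variable, so the
two lists can have different lengths. It uses only the Java standard library.
The class ValuesQuerySketch, its methods, and the ns: namespace URI are
made-up names for illustration, not part of Jena.

```java
import java.util.List;

// Hypothetical helper (not a Jena class): renders one VALUES block per
// variable so the subject and property lists may differ in length.
final class ValuesQuerySketch {

    // Renders a single-variable block, e.g. "VALUES ?s { ns:subj1 ns:subj2 }".
    static String valuesBlock(String var, List<String> terms) {
        return "VALUES ?" + var + " { " + String.join(" ", terms) + " }";
    }

    // Builds the full SELECT query text. The terms are inserted as-is, so
    // they must already be valid SPARQL (prefixed names or <IRIs>).
    static String build(String prefixDecl, List<String> subjects, List<String> props) {
        return String.join("\n",
            prefixDecl,
            "SELECT ?s ?p ?o",
            "WHERE {",
            "  " + valuesBlock("s", subjects),
            "  " + valuesBlock("p", props),
            "  ?s ?p ?o .",
            "}");
    }

    public static void main(String[] args) {
        // The namespace URI below is an assumed placeholder.
        String query = build(
            "PREFIX ns: <http://example.org/ns#>",
            List.of("ns:subj1", "ns:subj2", "ns:subj3"),
            List.of("ns:prop1", "ns:prop2"));
        System.out.println(query);
    }
}
```

The resulting string should be parseable with QueryFactory.create(..), but
that loses the type safety of the builder, so I would still prefer a
SelectBuilder-native way if one exists.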
Thanks,
- Mike

Simple perf test on a 48-core machine with 512 GB of RAM and SSDs.
Empirically, when batching requests by subject, the VALUES and UNION
approaches show negligible performance differences for a reasonable
number of props. Once you increase the parallelism / throughput target
by batching multiple subjects per request, however, VALUES does seem to
outperform UNION. For those interested:
VALUES (1 subj * 8 props) *  5 parallel requests ~  19ms at 95%
VALUES (1 subj * 8 props) * 20 parallel requests ~ 144ms at 95%
VALUES (4 subj * 8 props) *  5 parallel requests ~  35ms at 95%
VALUES (4 subj * 8 props) * 10 parallel requests ~  75ms at 95%

UNION  (1 subj * 8 props) *  5 parallel requests ~  17ms at 95%
UNION  (1 subj * 8 props) * 20 parallel requests ~ 136ms at 95%
UNION  (4 subj * 8 props) *  5 parallel requests ~  52ms at 95%
UNION  (4 subj * 8 props) * 10 parallel requests ~ 124ms at 95%
On Sun, Mar 29, 2020 at 3:10 AM Balhoff, Jim <[email protected]> wrote:
> I usually do this sort of thing in one query using VALUES. For example
>
> SELECT ?s ?p ?o
> WHERE {
> VALUES ?s { ns:subj1 ns:subj2 ns:subj3 }
> VALUES ?p { ns:prop1 ns:prop2 }
> ?s ?p ?o .
> }
>
> Best regards,
> Jim
>
>
> > On Mar 28, 2020, at 7:30 PM, Mike Welch <[email protected]> wrote:
> >
> > Hi everyone,
> >
> > We have a use case like the following: given a set of M known URIs
> > (typically 10s to a few hundred), fetch the same N (typically 2-10)
> > properties for each of them via Fuseki (TDB2).
> >
> > Does anyone have any benchmarks, rules of thumb, hunches, etc of what
> > would be the most optimal approach for packaging and/or parallelizing
> > requests? We currently send 1 query per subject URI, so M parallel
> > requests, each request containing a UNION of N individual simple
> > patterns:
> >
> > { ns:subj1 ns:prop1 ?a }
> > UNION
> > { ns:subj1 ns:prop2 ?b }
> > ...
> >
> > This seems to work pretty well - better than M parallel queries with all
> > OPTIONAL patterns, M*N parallel single pattern requests, or 1 giant UNION
> > for all M*N things at once, but it was still a somewhat arbitrary choice
> > amongst many, many other possibilities. Does anyone have
> > any suggestions of what might work better or be more friendly to the
> > internal optimizers, with the main goal of minimizing overall latency to
> > fetch all M*N things?
> >
> > Thanks!
> > - Mike
>
>