Hi Adam,

It would be useful to also know:

    Which version of Jena this is
    What the storage is: in-memory or TDB
        TDB1 or TDB2?
        If TDB: is the hardware disk or SSD?
    What the times actually are and what the count result is
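
One way to collect the times and counts is sketched below. It is only a minimal sketch: `run_query` is a hypothetical callable, not part of Jena, that you would write to send a query string to your store and return the count. The predicate URIs are the placeholders used in this thread. Each query is run a few times because the first run against a disk-backed store may include cold-cache effects.

```python
import time

# Placeholder predicate URIs taken from the thread; substitute the real ones.
P1 = "http://someURI1"
P2 = "http://someURI2"

# The property-path ("|") form of the query.
PATH_QUERY = f"SELECT (count(*) AS ?total) WHERE {{ ?s <{P1}>|<{P2}> ?o . }}"

# The UNION form of the query.
UNION_QUERY = (
    f"SELECT (count(*) AS ?total) WHERE {{ "
    f"{{ ?s <{P1}> ?o . }} UNION {{ ?s <{P2}> ?o . }} }}"
)


def time_query(run_query, query, repeats=3):
    """Run `query` `repeats` times; return (last count, elapsed seconds per run).

    `run_query` is a hypothetical, user-supplied callable that executes the
    query string against the store and returns the count result.
    """
    timings = []
    count = None
    for _ in range(repeats):
        start = time.perf_counter()
        count = run_query(query)
        timings.append(time.perf_counter() - start)
    return count, timings


def report(run_query, repeats=3):
    """Print the count and per-run timings for both query forms."""
    for label, query in (("path", PATH_QUERY), ("union", UNION_QUERY)):
        count, timings = time_query(run_query, query, repeats)
        formatted = ", ".join(f"{t:.3f}s" for t in timings)
        print(f"{label}: count={count} times=[{formatted}]")
```

Calling `report(my_runner)` with your own runner prints one line per query form, which is the kind of detail that helps when comparing the two.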

Count is handled specially in TDB and maybe that interacts with the "|" usage.

    Andy

On 22/02/2021 13:18, Adam K wrote:
Hi all, I executed two simple, equivalent queries that show a big performance
difference on a large dataset:


    1. First, matching by two alternative predicates using the pipe operator:
    SELECT (count(*) as ?total) WHERE {
      { ?s <http://someURI1> | <http://someURI2> ?o . }
    }
    This one is very slow, and the query plan shows the following matching
    pattern:
    (path ?subject (alt  <http://someURI1>  <http://someURI2> ) ?object)))))
    2. If I use the UNION operator instead of the pipe, the query becomes fast:
    SELECT (count(*) as ?total) WHERE {
      { ?s <http://someURI1> ?o . } UNION { ?s <http://someURI2> ?o . }
    }
    The query plan here is different and shows a UNION of two BGP matches:
    (union (bgp (triple ?s <http://someURI1> ?o )) (bgp (triple ?s <http://someURI2> ?o ))))))


The documentation at
https://jena.apache.org/documentation/query/property_paths.html says:

    1. "Paths are “simple” if they involve only operators / (sequence), ^
    (reverse, unary or binary) and the form {n}, for some single integer n."
    2. "A path is “complex”  if it involves one or more of the operators
    *,?, + and {}."

These statements do not define the implications of |. It should act like a
union, but the query plan is different. Is this a bug or a feature? Is there a
general recommendation to use UNION instead of the pipe?

Thanks for help!
