Hi Adam,
It would also be useful to know:
- which version of Jena this is;
- what the storage is: in-memory or TDB;
- if TDB, whether it is TDB1 or TDB2;
- if TDB, what the hardware is: disk or SSD;
- what the actual times are, and what the count result is.
Count is handled specially in TDB, and maybe that interacts with the "|"
usage.
Andy
On 22/02/2021 13:18, Adam K wrote:
Hi all,
I executed two simple, equivalent queries that show a big performance
difference on a large dataset:
1. First, matching two alternative predicates with the pipe operator:

    SELECT (count(*) as ?total) WHERE {
      { ?s <http://someURI1> | <http://someURI2> ?o . }
    }
   This one is very slow, and the query plan shows the following matching
   pattern:

    (path ?subject (alt <http://someURI1> <http://someURI2> ) ?object)))))
2. If I use the UNION operator instead of the pipe, the query becomes fast:

    SELECT (count(*) as ?total) WHERE {
      { ?s <http://someURI1> ?o . } UNION { ?s <http://someURI2> ?o . }
    }
   The query plan here is different and shows a UNION of two BGP matches:

    (union (bgp (triple ?s <http://someURI1> ?o )) (bgp (triple ?s <http://someURI2> ?o ))))))
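As far as I understand the SPARQL 1.1 spec, an alternative path is defined as the union of its two branches with duplicates kept, so both queries should return the same count and only the execution should differ. The toy Python model below illustrates that equivalence. The data, the function names and the two "strategies" are all made up for illustration; nothing here reflects how Jena actually executes either plan.

```python
from collections import Counter

# Hypothetical toy data: (subject, predicate, object) triples held in a list.
P1 = "http://someURI1"
P2 = "http://someURI2"
TRIPLES = [
    ("a", P1, "x"),
    ("a", P2, "x"),  # same (s, o) pair via both predicates -> two rows either way
    ("b", P1, "y"),
    ("c", P2, "z"),
    ("d", "http://other", "w"),  # matches neither predicate
]


def bgp(triples, pred):
    """Rows (s, o) for the basic graph pattern { ?s <pred> ?o }."""
    return [(s, o) for (s, p, o) in triples if p == pred]


def union_query(triples, p, q):
    """{ ?s <p> ?o } UNION { ?s <q> ?o }: two separate lookups, results concatenated."""
    return bgp(triples, p) + bgp(triples, q)


def alt_path_scan(triples, p, q):
    """?s (p|q) ?o evaluated as one pass that tests each triple's predicate.

    This agrees with the spec's union-of-branches result only when p != q:
    for p == q the union keeps both copies, while this single scan does not.
    """
    return [(s, o) for (s, pred, o) in triples if pred in (p, q)]


def count_star(rows):
    """SELECT (count(*) AS ?total): number of rows, duplicates included."""
    return len(rows)


if __name__ == "__main__":
    a = union_query(TRIPLES, P1, P2)
    b = alt_path_scan(TRIPLES, P1, P2)
    print(count_star(a), count_star(b))  # prints: 4 4
    print(Counter(a) == Counter(b))      # prints: True
```

Comparing with Counter rather than list equality checks that both return the same bag of solutions, which is what count(*) depends on.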
The documentation at
https://jena.apache.org/documentation/query/property_paths.html says:
1. "Paths are “simple” if they involve only operators / (sequence), ^
   (reverse, unary or binary) and the form {n}, for some single integer n."
2. "A path is “complex” if it involves one or more of the operators
   *, ?, + and {}."
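To make the gap concrete, here is a toy encoding of the two quoted rules. The helper classify_path and the operator-token sets are hypothetical: a path is represented only by the set of operators it uses, not by a real parsed Jena path.

```python
# Toy encoding of the two rules quoted from the Jena property-path docs.
# A path is represented only by the set of operator tokens it uses (hypothetical).
SIMPLE_OPS = {"/", "^", "{n}"}
COMPLEX_OPS = {"*", "?", "+", "{}"}


def classify_path(ops):
    """Return 'complex', 'simple' or 'unclassified' for a set of operator tokens."""
    ops = set(ops)
    if ops & COMPLEX_OPS:
        return "complex"       # any of *, ?, +, {} makes the path complex
    if ops <= SIMPLE_OPS:
        return "simple"        # only /, ^ and {n} are used
    return "unclassified"      # e.g. "|": neither quoted rule mentions it


print(classify_path({"/"}))       # prints: simple
print(classify_path({"+", "/"}))  # prints: complex
print(classify_path({"|"}))       # prints: unclassified
```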
These statements do not define the implications of "|". It should act like
UNION, but the query plan is different. Is this a bug or a feature? Is there
a general recommendation to use UNION instead of the pipe?
Thanks for your help!