On 28/02/2023 03:11, Paul Tyson wrote:
I maintain an old Jena/Fuseki application that has happily been using Jena v2.13 and TDB v1.1.2 for several years. It loads 1b+ triples into a TDB database and runs a couple dozen queries, some not so trivial, against it.

Now it is time to update things. I first went to 3.17, to stay on Java 8. Many of the queries work fine, but a few have abysmal performance. A query that took maybe 10 minutes with v2.13 now runs for hours without finishing.

I am now trying v4.7 with Java 11. Testing is still in progress, but it doesn't look promising.

The troublesome queries have several FILTER EXISTS and FILTER NOT EXISTS clauses, some of which contain UNION patterns. They are rather complicated, but they are also a fairly literal translation of the applicable business rules. I took a closer look at them and reordered the patterns to put the more specific ones earlier, but that didn't help. I discovered that eliminating the UNION alternatives lets the query return some results, but obviously not the ones that are wanted.

Did anything in particular change in the query processing since v2 that would cause this performance degradation?

v2.13 was March 2015. A lot has changed since then, including fixes for cases where optimization produced wrong answers. Some of these concern EXISTS directly; others don't, but if you have complex EXISTS patterns, they can be affected. These changes are not about pattern order.

(Most of them will be recorded in JIRA.)

Should I expect any difference between TDB and TDB2? I've tried both, and neither gives satisfactory results.

Unlikely. TDB2 is preferred.


Thanks in advance,
--Paul

