Replying to myself, as I did some follow-up tests.

Osma Suominen wrote on 4.12.2020 at 18:42:
Now this turned into a rather interesting exercise in using git bisect. I was able to track down the change that caused the slowdown. It's this merge commit:

[f93fdbad7aa8d6ddb46693395e3bfb5ea487bf16] JENA-1648: Merge commit 'refs/pull/507/head' of https://github.com/apache/jena

which refers to this pull request:

https://github.com/apache/jena/pull/507

I don't have time for a very deep analysis right now, but it doesn't surprise me that a substantial change to the query result serialization slows down the queries.

Things to check: (mostly as a TODO list for myself)

1. Does this depend on the query result format? For example, is only the text format (default) slower than before?
2. Is there something suspicious in the PR 507 code that would explain why it's so much slower?

This affects at least the CSV format too, so it's not just the text output format.

But I figured out that the real change here is simply that the warmup performed when using the --repeat parameter with two arguments has become less effective starting with Jena 3.10.0. When no warmup is used, the performance is the same for the different Jena versions.
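
To make the warmup point concrete, here is a minimal sketch of the general pattern: run the task a number of times without timing it, then time the following runs. This is not Jena's code; the names timed_runs and sample_task are hypothetical, and the task is only a stand-in for executing a query and serializing its results. It is written in Python for brevity, so it shows the structure of a warmup-then-measure run rather than the JVM JIT effects themselves.

```python
import time


def timed_runs(task, warmup, repeats):
    """Call task `warmup` times untimed, then `repeats` times timed.

    Returns the list of measured durations in seconds.
    """
    # Warmup phase: results and timings are discarded.
    for _ in range(warmup):
        task()

    # Measurement phase: only these runs are reported.
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return durations


def sample_task():
    # Hypothetical stand-in for "run the query and serialize the results".
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    without_warmup = timed_runs(sample_task, warmup=0, repeats=5)
    with_warmup = timed_runs(sample_task, warmup=20, repeats=5)
    print("no warmup:  ", sum(without_warmup) / len(without_warmup))
    print("with warmup:", sum(with_warmup) / len(with_warmup))
```

If the warmup phase does not exercise the same code path as the measured phase (for example, if it skips result serialization), the measured runs still pay the startup cost. That would look like a regression even though the non-warmed-up performance is unchanged.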

And now that Andy has implemented JENA-2007, which improves the warmup, I think the problem has already been solved.

Case closed.

-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi
