Re: Java 11 vs Java 17

2023-08-30 Thread Andy Seaborne




On 29/08/2023 12:26, Andy Seaborne wrote:


Which result format is this? JSON? XML?


Thanks - the fact that both the JSON and XML results writers are 
affected suggests that the difference is in that area.


Andy








Re: Java 11 vs Java 17

2023-08-30 Thread Dave Reynolds




On 29/08/2023 12:26, Andy Seaborne wrote:



On 29/08/2023 08:46, Dave Reynolds wrote:

Hi Andy,

On 27/08/2023 10:36, Andy Seaborne wrote:


On 25/08/2023 15:18, Dave Reynolds wrote: [1]
 > We've been testing some of our troublesome queries on 4.9.0 on java
 > 11 vs java 17 and see a 10-15% performance hit on java 17 (even after
 > we take control of the GC by forcing both to use the old parallel GC
 > instead of G1). No idea why, seems wrong! Makes us inclined to stick
 > with java 11 and thus jena 4.x series as long as we can.

Dave,

Is this 4.9.0 specific or across multiple Jena versions?


Seems to be multiple versions (at least 4.8.0 and 4.9.0), but not 
tested exhaustively.



Is G1 worse than the old parallel GC on Java17?


It is definitely worse on Java11 for a particular narrow type of query 
that is an issue for us. Believe the same is true on Java17 but 
haven't collected definitive data on this.


It may be possible to tune G1 to better match our particular test case 
but the testing and tuning is time consuming and the parallel GC does 
the trick.


Our aim was to replace a system running on 3.x era fuseki with a 4.x 
era one without significant loss of performance. Out of the box there 
was a 20% hit. Switching GC recovered much of that, and switching to 
java11 instead of 17 brought us basically to parity - for this special 
case. This is a case where legitimate queries get close to the timeout 
threshold we run at, so a 20% performance drop is particularly visible: 
queries that currently work time out on a newer version.


The query itself is trivial - return large numbers of resources 
(10k-1m) found by a simple lucene query along with a few (~15) 
properties of each. Performance in this case seems to be dominated by 
the time to render the large results stream rather than lucene or TDB 
query performance. So it makes some sense that in this specific case a 
GC tuned for throughput rather than pause time would help.


Which result format is this? JSON? XML?


XML. Also tested JSON which is around 10% slower.

Dave







Re: Java 11 vs Java 17

2023-08-29 Thread Andy Seaborne




On 29/08/2023 08:46, Dave Reynolds wrote:

Hi Andy,

On 27/08/2023 10:36, Andy Seaborne wrote:


On 25/08/2023 15:18, Dave Reynolds wrote: [1]
 > We've been testing some of our troublesome queries on 4.9.0 on java
 > 11 vs java 17 and see a 10-15% performance hit on java 17 (even after
 > we take control of the GC by forcing both to use the old parallel GC
 > instead of G1). No idea why, seems wrong! Makes us inclined to stick
 > with java 11 and thus jena 4.x series as long as we can.

Dave,

Is this 4.9.0 specific or across multiple Jena versions?


Seems to be multiple versions (at least 4.8.0 and 4.9.0), but not tested 
exhaustively.



Is G1 worse than the old parallel GC on Java17?


It is definitely worse on Java11 for a particular narrow type of query 
that is an issue for us. Believe the same is true on Java17 but haven't 
collected definitive data on this.


It may be possible to tune G1 to better match our particular test case 
but the testing and tuning is time consuming and the parallel GC does 
the trick.


Our aim was to replace a system running on 3.x era fuseki with a 4.x era 
one without significant loss of performance. Out of the box there was a 
20% hit. Switching GC recovered much of that, and switching to java11 
instead of 17 brought us basically to parity - for this special case. 
This is a case where legitimate queries get close to the timeout 
threshold we run at, so a 20% performance drop is particularly visible: 
queries that currently work time out on a newer version.


The query itself is trivial - return large numbers of resources (10k-1m) 
found by a simple lucene query along with a few (~15) properties of 
each. Performance in this case seems to be dominated by the time to 
render the large results stream rather than lucene or TDB query 
performance. So it makes some sense that in this specific case a GC 
tuned for throughput rather than pause time would help.


Which result format is this? JSON? XML?



No suggestion that our case is representative of any broader pattern.

Dave


Andy


Re: Java 11 vs Java 17

2023-08-29 Thread Dave Reynolds

Hi Andy,

On 27/08/2023 10:36, Andy Seaborne wrote:


On 25/08/2023 15:18, Dave Reynolds wrote: [1]
 > We've been testing some of our troublesome queries on 4.9.0 on java
 > 11 vs java 17 and see a 10-15% performance hit on java 17 (even after
 > we take control of the GC by forcing both to use the old parallel GC
 > instead of G1). No idea why, seems wrong! Makes us inclined to stick
 > with java 11 and thus jena 4.x series as long as we can.

Dave,

Is this 4.9.0 specific or across multiple Jena versions?


Seems to be multiple versions (at least 4.8.0 and 4.9.0), but not tested 
exhaustively.



Is G1 worse than the old parallel GC on Java17?


It is definitely worse on Java11 for a particular narrow type of query 
that is an issue for us. Believe the same is true on Java17 but haven't 
collected definitive data on this.


It may be possible to tune G1 to better match our particular test case 
but the testing and tuning is time consuming and the parallel GC does 
the trick.
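
When comparing collectors it helps to confirm which one the JVM actually 
selected, since a flag set in a wrapper script is easy to lose. The sketch 
below is only an illustration using the standard java.lang.management API; 
the class name GcReport is made up for this example and is not part of Jena 
or Fuseki.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jena: reports the collectors in use.
public class GcReport {
    /** Names of the collectors registered by the running JVM. */
    static List<String> activeCollectors() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Accumulated GC time in ms, summed over collectors (undefined -1 values are skipped). */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + activeCollectors());
        System.out.println("GC time so far (ms): " + totalGcMillis());
    }
}
```

Running it as "java -XX:+UseParallelGC GcReport.java" and again with 
"-XX:+UseG1GC" should show different collector names (typically the "PS ..." 
pair for parallel and the "G1 ..." pair for G1), on both Java 11 and Java 17.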


Our aim was to replace a system running on 3.x era fuseki with a 4.x era 
one without significant loss of performance. Out of the box there was a 
20% hit. Switching GC recovered much of that, and switching to java11 
instead of 17 brought us basically to parity - for this special case. 
This is a case where legitimate queries get close to the timeout 
threshold we run at, so a 20% performance drop is particularly visible: 
queries that currently work time out on a newer version.


The query itself is trivial - return large numbers of resources (10k-1m) 
found by a simple lucene query along with a few (~15) properties of 
each. Performance in this case seems to be dominated by the time to 
render the large results stream rather than lucene or TDB query 
performance. So it makes some sense that in this specific case a GC 
tuned for throughput rather than pause time would help.
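
To separate rendering cost from query cost, a synthetic harness along these 
lines could be run under each JVM and GC setting. It is a rough sketch under 
stated assumptions: it does not use Jena's results writers, the class name 
RenderTiming and the generated values are invented for illustration, and it 
only mimics the shape of the workload (many rows, about 15 bindings each, 
written as XML-like text to a discarding sink).

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a results writer; not Jena code.
public class RenderTiming {
    /** Writes `rows` synthetic results with `props` bindings each; returns chars written. */
    static long render(Writer out, int rows, int props) throws IOException {
        long chars = 0;
        for (int r = 0; r < rows; r++) {
            StringBuilder sb = new StringBuilder("<result>");
            for (int p = 0; p < props; p++) {
                sb.append("<binding name=\"p").append(p).append("\"><literal>")
                  .append("value-").append(r).append('-').append(p)
                  .append("</literal></binding>");
            }
            sb.append("</result>\n");
            out.write(sb.toString());
            chars += sb.length();
        }
        out.flush();
        return chars;
    }

    /** Times one render pass to a discarding sink, in milliseconds. */
    static long timeMillis(int rows, int props) throws IOException {
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                OutputStream.nullOutputStream(), StandardCharsets.UTF_8))) {
            long start = System.nanoTime();
            render(out, rows, props);
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) throws IOException {
        int rows = 1_000_000;
        int props = 15;
        timeMillis(rows / 10, props); // warm-up pass before the timed runs
        for (int i = 0; i < 5; i++) {
            System.out.println("run " + i + ": " + timeMillis(rows, props) + " ms");
        }
    }
}
```

If a harness like this shows no Java 11 vs Java 17 gap while the real 
endpoint does, that would point away from generic string and stream handling 
and towards something specific to the actual writers.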


No suggestion that our case is representative of any broader pattern.

Dave