On 07/03/2024 13:24, Gaspar Bartalus wrote:
Dear Jena support team,

We would like to ask for your help in configuring memory for our jena-fuseki instance running in Kubernetes.

*We have the following setup:*

* Jena-fuseki deployed as a StatefulSet to a k8s cluster with the resource config:

Limits:
  cpu:     2
  memory:  16Gi
Requests:
  cpu:     100m
  memory:  11Gi

* The JVM_ARGS has the following value: -Xmx10G
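
In the StatefulSet manifest this looks roughly as follows (container name and image are placeholders):

    containers:
      - name: fuseki
        image: <fuseki-image>
        env:
          - name: JVM_ARGS
            value: "-Xmx10G"
        resources:
          requests:
            cpu: 100m
            memory: 11Gi
          limits:
            cpu: "2"
            memory: 16Gi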

* Our main dataset of type TDB2 contains ~1 million triples.
A million triples doesn't take up much RAM, even in an in-memory dataset.

In Java, the heap will grow until it is close to the -Xmx figure. A major GC will then free up a lot of memory, but the JVM does not give that memory back to the kernel.
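
That said, whether unused heap is ever uncommitted depends on the JDK version and collector. A sketch of JVM_ARGS that asks G1 to return memory periodically (G1PeriodicGCInterval needs JDK 12+; treat this as a starting point, not a recommendation):

    -Xmx10G -XX:+UseG1GC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30 -XX:G1PeriodicGCInterval=60000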

TDB2 does not only use heap space. A heap of 2-4G per dataset is usually enough, sometimes less (data-shape dependent - e.g. many large literals use more space).

Use a profiler to examine the in-use heap; you'll probably see a saw-tooth shape.
Force a GC and see the level of in-use memory afterwards.
Add some safety margin and working space for requests, and try that as the heap size. A quick way to check this without a full profiler is sketched below.
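
A minimal sketch, assuming you can run code inside the same JVM (for a live Fuseki process, "jcmd <pid> GC.run" followed by "jcmd <pid> GC.heap_info" gives similar numbers from outside):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    public class HeapCheck {
        public static void main(String[] args) {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            MemoryUsage before = mem.getHeapMemoryUsage();
            System.out.printf("before: used=%dM committed=%dM max=%dM%n",
                    before.getUsed() >> 20, before.getCommitted() >> 20, before.getMax() >> 20);
            mem.gc();  // request a full GC; this is a hint, like System.gc()
            MemoryUsage after = mem.getHeapMemoryUsage();
            // The post-GC "used" figure is the live-data level to size the heap against.
            System.out.printf("after:  used=%dM committed=%dM%n",
                    after.getUsed() >> 20, after.getCommitted() >> 20);
        }
    }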

* We execute the following types of UPDATE operations (a sketch of the nightly compact call follows this list):
   - There are triggers in the system (e.g. users of the application changing the data) which start ~50 other update operations containing up to ~30K triples. Most of them run in parallel; some are delayed by seconds or minutes.
   - There are scheduled UPDATE operations (executed on an hourly basis) containing 30K-500K triples.
   - These UPDATE operations usually delete and insert the same number of triples in the dataset. We run the compact API as a nightly job.
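
The nightly job calls the Fuseki compact admin endpoint. A minimal sketch in Java (host and dataset name are placeholders; recent Fuseki versions also accept deleteOld=true, which removes the previous data generation instead of leaving it on the volume):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CompactJob {
        public static void main(String[] args) throws Exception {
            // Placeholder host and dataset name - adjust to the deployment.
            URI uri = URI.create("http://localhost:3030/$/compact/mydataset?deleteOld=true");
            HttpRequest req = HttpRequest.newBuilder(uri)
                    .POST(HttpRequest.BodyPublishers.noBody())
                    .build();
            HttpResponse<String> resp = HttpClient.newHttpClient()
                    .send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.statusCode() + " " + resp.body());
        }
    }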

*We are noticing the following behaviour:*

* Fuseki consumes 5-10G of heap memory continuously, as configured in the JVM_ARGS.

* There are points in time when the volume usage of the k8s container starts to increase suddenly. This does not drop even though compaction is successfully executed and the dataset size (triple count) does not increase. See attachment below.

*Our suspicions:*

* Garbage collection in Java is often delayed; memory is not freed as quickly as we would expect, and the heap limit is reached quickly when multiple parallel queries run.
* Long-running database queries can promote objects to the old generation ("Gen2"), which is not actively cleaned by the garbage collector.
* Memory-mapped files are also garbage-collected (and perhaps they could end up in the old generation as well, using more and more storage space).

Could you please explain the possible reasons behind this behaviour?
And finally, could you suggest a more appropriate configuration for our use case?

Thanks in advance and best wishes,
Gaspar Bartalus
