If it helps, I have a setup I have used to profile Fuseki in VisualVM: https://github.com/AtomGraph/fuseki-docker
On Thu, 7 Mar 2024 at 22.55, Andy Seaborne <a...@apache.org> wrote:
>
> On 07/03/2024 13:24, Gaspar Bartalus wrote:
> > Dear Jena support team,
> >
> > We would like to ask you to help us configure the memory for our
> > jena-fuseki instance running in Kubernetes.
> >
> > *We have the following setup:*
> >
> > * Jena-Fuseki deployed as a StatefulSet to a k8s cluster with the
> >   resource config:
> >
> >   Limits:
> >     cpu: 2
> >     memory: 16Gi
> >   Requests:
> >     cpu: 100m
> >     memory: 11Gi
> >
> > * The JVM_ARGS has the following value: -Xmx10G
> >
> > * Our main dataset of type TDB2 contains ~1 million triples.
>
> A million triples doesn't take up much RAM, even in an in-memory dataset.
>
> In Java, the JVM will grow until it is close to the -Xmx figure. A major
> GC will then free up a lot of memory, but the JVM does not give the
> memory back to the kernel.
>
> TDB2 does not only use heap space. A heap of 2-4G is usually enough per
> dataset, sometimes less (it is data-shape dependent - e.g. many large
> literals use more space).
>
> Use a profiler to examine the heap in use; you'll probably see a
> saw-tooth shape. Force a GC and see the level of in-use memory
> afterwards. Add some safety margin and working space for requests, and
> try that as the heap size.
>
> > * We execute the following types of UPDATE operations:
> >   - There are triggers in the system (e.g. users of the application
> >     changing the data) which start ~50 other update operations
> >     containing up to ~30K triples. Most of them run in parallel; some
> >     are delayed by seconds or minutes.
> >   - There are scheduled UPDATE operations (executed on an hourly
> >     basis) containing 30K-500K triples.
> >   - These UPDATE operations usually delete and insert the same amount
> >     of triples in the dataset. We use the compact API as a nightly job.
> >
> > *We are noticing the following behaviour:*
> >
> > * Fuseki continuously consumes 5-10G of heap memory, as configured in
> >   the JVM_ARGS.
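To follow the sizing advice above, one way to make the saw-tooth and the post-GC floor visible is to run with a smaller heap and JDK unified GC logging enabled. A minimal sketch, assuming a 4G heap and a writable log path - both values are placeholders to tune after profiling, not recommendations:

```shell
# Sketch, not a drop-in: the heap size and log path are assumptions to be
# tuned against your own workload after profiling.
# -Xms == -Xmx keeps the heap fixed; -Xlog:gc* (JDK 9+) records each GC so
# the saw-tooth and the post-GC in-use level show up in the log file.
JVM_ARGS="-Xmx4G -Xms4G -Xlog:gc*:file=/fuseki/logs/gc.log"
echo "$JVM_ARGS"
```

Inside the running container, `jcmd <pid> GC.run` followed by `jcmd <pid> GC.heap_info` can be used to force the GC and read the in-use level afterwards, as suggested above.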
> >
> > * There are points in time when the volume usage of the k8s container
> >   starts to increase suddenly. This does not drop even though
> >   compaction is successfully executed and the dataset size (triple
> >   count) does not increase. See attachment below.
> >
> > *Our suspicions:*
> >
> > * Garbage collection in Java is often delayed; memory is not freed as
> >   quickly as we would expect, and the heap limit is reached quickly
> >   if multiple parallel queries are run.
> > * Long-running database queries can promote objects to Gen2, which
> >   is not actively cleaned by the garbage collector.
> > * Memory-mapped files are also garbage-collected (and perhaps they
> >   could go to Gen2 as well, using more and more storage space).
> >
> > Could you please explain the possible reasons behind such behaviour?
> > And finally, could you please suggest a more appropriate configuration
> > for our use case?
> >
> > Thanks in advance and best wishes,
> > Gaspar Bartalus
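One thing worth checking for the "volume grows even though compaction succeeds" symptom: TDB2 compaction writes a new generation and, unless asked to, leaves the previous Data-NNNN directory on disk, so the volume keeps the old files even though the triple count is stable. A sketch of the admin call, assuming a dataset named "ds" on the default port - adjust both to your deployment:

```shell
# Sketch assuming dataset name "ds" and the default Fuseki port; both are
# assumptions - substitute your own deployment's values.
# deleteOld=true asks Fuseki to remove the previous TDB2 generation
# (Data-NNNN directory) after compacting; without it the old generation
# stays on the volume.
COMPACT_URL='http://localhost:3030/$/compact/ds?deleteOld=true'
# The actual nightly-job invocation would be:
# curl -X POST "$COMPACT_URL"
echo "$COMPACT_URL"
```

If the nightly job currently compacts without `deleteOld=true`, the accumulated old generations would match the sudden, non-decreasing volume growth described above.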