If it helps, I have a setup I have used to profile Fuseki in VisualVM: https://github.com/AtomGraph/fuseki-docker
On Thu, 7 Mar 2024 at 22.55, Andy Seaborne <a...@apache.org> wrote:
>
> On 07/03/2024 13:24, Gaspar Bartalus wrote:
> > Dear Jena support team,
> >
> > We would like to ask you to help us configure the memory for our
> > jena-fuseki instance running in Kubernetes.
> >
> > *We have the following setup:*
> >
> > * Jena-Fuseki deployed as a StatefulSet to a k8s cluster with the
> >   resource config:
> >
> >   Limits:
> >     cpu: 2
> >     memory: 16Gi
> >   Requests:
> >     cpu: 100m
> >     memory: 11Gi
> >
> > * The JVM_ARGS has the following value: -Xmx10G
> >
> > * Our main dataset of type TDB2 contains ~1 million triples.
>
> A million triples doesn't take up much RAM, even in an in-memory dataset.
>
> In Java, the JVM will grow until it is close to the -Xmx figure. A major
> GC will then free up a lot of memory, but the JVM does not give the
> memory back to the kernel.
>
> TDB2 does not only use heap space. A heap of 2-4G is usually enough per
> dataset, sometimes less (it is data-shape dependent - e.g. many large
> literals use more space).
>
> Use a profiler to examine the heap in use; you'll probably see a
> saw-tooth shape. Force a GC and see the level of in-use memory
> afterwards. Add some safety margin and working space for requests, and
> try that as the heap size.
>
> > * We execute the following types of UPDATE operations:
> >   - There are triggers in the system (e.g. users of the application
> >     changing the data) which start ~50 other update operations
> >     containing up to ~30K triples. Most of them run in parallel; some
> >     are delayed by seconds or minutes.
> >   - There are scheduled UPDATE operations (executed on an hourly
> >     basis) containing 30K-500K triples.
> >   - These UPDATE operations usually delete and insert the same amount
> >     of triples in the dataset. We use the compact API as a nightly job.
> >
> > *We are noticing the following behaviour:*
> >
> > * Fuseki continuously consumes 5-10G of heap memory, as configured in
> >   the JVM_ARGS.
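To follow the sizing advice above, one way to make the saw-tooth and the post-GC floor visible is to run with a smaller heap and JDK unified GC logging enabled. A minimal sketch, assuming a 4G heap and a writable log path - both values are placeholders to tune after profiling, not recommendations:

```shell
# Sketch, not a drop-in: the heap size and log path are assumptions to be
# tuned against your own workload after profiling.
# -Xms == -Xmx keeps the heap fixed; -Xlog:gc* (JDK 9+) records each GC so
# the saw-tooth and the post-GC in-use level show up in the log file.
JVM_ARGS="-Xmx4G -Xms4G -Xlog:gc*:file=/fuseki/logs/gc.log"
echo "$JVM_ARGS"
```

Inside the running container, `jcmd <pid> GC.run` followed by `jcmd <pid> GC.heap_info` can be used to force the GC and read the in-use level afterwards, as suggested above.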
> >
> > * There are points in time when the volume usage of the k8s container
> >   starts to increase suddenly. This does not drop even though
> >   compaction is successfully executed and the dataset size (triple
> >   count) does not increase. See attachment below.
> >
> > *Our suspicions:*
> >
> > * Garbage collection in Java is often delayed; memory is not freed as
> >   quickly as we would expect, and the heap limit is reached quickly
> >   if multiple parallel queries are run.
> > * Long-running database queries can promote objects to Gen2, which
> >   is not actively cleaned by the garbage collector.
> > * Memory-mapped files are also garbage-collected (and perhaps they
> >   could go to Gen2 as well, using more and more storage space).
> >
> > Could you please explain the possible reasons behind such behaviour?
> > And finally, could you please suggest a more appropriate configuration
> > for our use case?
> >
> > Thanks in advance and best wishes,
> > Gaspar Bartalus
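One thing worth checking for the "volume grows even though compaction succeeds" symptom: TDB2 compaction writes a new generation and, unless asked to, leaves the previous Data-NNNN directory on disk, so the volume keeps the old files even though the triple count is stable. A sketch of the admin call, assuming a dataset named "ds" on the default port - adjust both to your deployment:

```shell
# Sketch assuming dataset name "ds" and the default Fuseki port; both are
# assumptions - substitute your own deployment's values.
# deleteOld=true asks Fuseki to remove the previous TDB2 generation
# (Data-NNNN directory) after compacting; without it the old generation
# stays on the volume.
COMPACT_URL='http://localhost:3030/$/compact/ds?deleteOld=true'
# The actual nightly-job invocation would be:
# curl -X POST "$COMPACT_URL"
echo "$COMPACT_URL"
```

If the nightly job currently compacts without `deleteOld=true`, the accumulated old generations would match the sudden, non-decreasing volume growth described above.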