Re: Requesting advice on Fuseki memory settings

2024-03-07 Thread Martynas Jusevičius
If it helps, I have a setup I have used to profile Fuseki in VisualVM:
https://github.com/AtomGraph/fuseki-docker



On Thu, 7 Mar 2024 at 22:55, Andy Seaborne wrote:

>
>
> On 07/03/2024 13:24, Gaspar Bartalus wrote:
> > Dear Jena support team,
> >
> > We would like to ask you to help us in configuring the memory for our
> > jena-fuseki instance running in Kubernetes.
> >
> > *We have the following setup:*
> >
> > * Jena-fuseki deployed as a StatefulSet to a k8s cluster with the
> > resource config:
> >
> > Limits:
> >   cpu: 2
> >   memory:  16Gi
> > Requests:
> >   cpu: 100m
> >   memory:  11Gi
> >
> > * The JVM_ARGS has the following value: -Xmx10G
> >
> > * Our main dataset of type TDB2 contains ~1 million triples.
> A million triples doesn't take up much RAM even in a memory dataset.
>
> In Java, the JVM will grow until it is close to the -Xmx figure. A major
> GC will then free up a lot of memory. But the JVM does not give the
> memory back to the kernel.
>
> TDB2 does not only use heap space. A heap of 2-4G is usually enough per
> dataset, sometimes less (data shape dependent - e.g. many large
> literals use more space).
>
> Use a profiler to examine the heap in use; you'll probably see a
> saw-tooth shape.
> Force a GC and see the level of in-use memory afterwards.
> Add some safety margin and work space for requests and try that as the
> heap size.
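Andy's suggestion - force a GC and read the in-use level afterwards - can be sketched in plain Java as a rough stand-in for a profiler. Note that `System.gc()` is only a hint to the JVM, and the allocation sizes here are purely illustrative:

```java
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // Create some short-lived garbage, as request processing would.
        for (int i = 0; i < 500; i++) {
            byte[] chunk = new byte[1024 * 1024]; // 1 MiB, immediately unreachable
        }

        long usedBefore = rt.totalMemory() - rt.freeMemory();
        System.gc(); // request a major collection; the JVM may ignore this
        long usedAfter = rt.totalMemory() - rt.freeMemory();

        // totalMemory() is the heap the JVM has claimed from the OS;
        // it stays close to -Xmx even after a collection frees most objects,
        // which is why the container's RSS does not shrink.
        System.out.println("used before GC: " + (usedBefore >> 20) + " MiB");
        System.out.println("used after GC:  " + (usedAfter >> 20) + " MiB");
        System.out.println("committed heap: " + (rt.totalMemory() >> 20) + " MiB");
    }
}
```

The post-GC "used" figure, plus a safety margin for concurrent requests, is a reasonable candidate for -Xmx.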
>
> > *  We execute the following types of UPDATE operations:
> >    - There are triggers in the system (e.g. users of the application
> > changing the data) which start ~50 other update operations containing
> > up to ~30K triples. Most of them run in parallel; some are delayed
> > by seconds or minutes.
> >    - There are scheduled UPDATE operations (executed on an hourly basis)
> > containing 30K-500K triples.
> >    - These UPDATE operations usually delete and insert the same amount
> > of triples in the dataset. We use the compact API as a nightly job.
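For reference, the nightly compaction mentioned above goes through Fuseki's HTTP administration protocol (`POST /$/compact/{name}`). A minimal sketch with `java.net.http`, assuming the server is at `localhost:3030` and the dataset is named `ds` (adjust both to your deployment); the request is only built and printed here, since sending it needs a running server:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class CompactRequest {
    // Builds the admin request that asks Fuseki to compact a TDB2 dataset.
    // "?deleteOld=true" removes the pre-compaction generation on disk;
    // without it, old Data-NNNN directories accumulate in the database
    // volume even though the live triple count stays flat.
    public static HttpRequest build(String baseUrl, String dataset) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/$/compact/" + dataset + "?deleteOld=true"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = build("http://localhost:3030", "ds");
        System.out.println(req.method() + " " + req.uri());
        // To actually run it: HttpClient.newHttpClient().send(req, ...)
    }
}
```

One thing worth checking against the volume-growth symptom described below: whether the nightly job passes `deleteOld=true`, since compaction without it leaves each old database generation on disk.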
> >
> > *We are noticing the following behaviour:*
> >
> > * Fuseki consumes 5-10G of heap memory continuously, as configured in
> > the JVM_ARGS.
> >
> > * There are points in time when the volume usage of the k8s container
> > starts to increase suddenly. This does not drop even though compaction
> > is successfully executed and the dataset size (triple count) does not
> > increase. See attachment below.
> >
> > *Our suspicions:*
> >
> > * garbage collection in Java is often delayed; memory is not freed as
> > quickly as we would expect, and the heap limit is reached quickly
> > if multiple parallel queries are run
> > * long-running database queries can send regular memory to Gen2, which
> > is not actively cleaned by the garbage collector
> > * memory-mapped files are also garbage-collected (and perhaps they
> > could go to Gen2 as well, using more and more storage space).
> >
> > Could you please explain the possible reasons behind such behaviour?
> > And finally, could you please suggest a more appropriate configuration
> > for our use case?
> >
> > Thanks in advance and best wishes,
> > Gaspar Bartalus
> >
>
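On the memory-mapped-file suspicion quoted above: the pages of a mapped file are file-backed memory managed by the kernel, not Java heap objects, so they count against the container's RSS but not against -Xmx (only the small `MappedByteBuffer` handle lives on the heap and is garbage-collected). A small sketch of that distinction, using an illustrative 256 MiB mapping of a temp file:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-demo", ".bin");
        Runtime rt = Runtime.getRuntime();
        long heapBefore = rt.totalMemory() - rt.freeMemory();

        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Map 256 MiB: address space is reserved and shows up in the
            // process RSS as pages are touched, but almost none of it is
            // Java heap - the kernel owns and reclaims these pages.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 256L << 20);
            buf.put(0, (byte) 1); // touch a page
        }

        long heapAfter = rt.totalMemory() - rt.freeMemory();
        System.out.println("heap growth: " + ((heapAfter - heapBefore) >> 10) + " KiB");
        Files.deleteIfExists(tmp);
    }
}
```

The heap growth printed is negligible compared to the mapping size, which is why a TDB2 server's container memory can legitimately sit well above the -Xmx figure.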


