Hi Andy,

On Fri, Mar 8, 2024 at 4:41 PM Andy Seaborne <[email protected]> wrote:

>
>
> On 08/03/2024 10:40, Gaspar Bartalus wrote:
> > Hi,
> >
> > Thanks for the responses.
> >
> > We were actually curious whether you'd have an explanation for the
> > linear increase in storage, and for the differences we are seeing
> > between the actual size of our dataset and the space it uses on disk
> > (i.e. the discrepancy between `df -h` and `du -lh` output)?
>
> Linear increase between compactions or across compactions? The latter
> sounds like the previous version hasn't been deleted.
>

Across compactions, increasing linearly over several days, with compactions
running every day. Compaction is invoked with the "deleteOld" parameter,
and there is only one Data- folder in the volume, so I assume compaction
itself works as expected.
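
For reference, this is roughly how we trigger it (a sketch: the service
name "ds", the port and the on-disk path are placeholders for our actual
setup):

  # compact the TDB2 dataset and delete the previous generation
  curl -X POST 'http://localhost:3030/$/compact/ds?deleteOld=true'

  # afterwards only the newest Data- generation should remain
  ls -d /fuseki/databases/ds/Data-*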

>
> TDB uses sparse files. It allocates 8M chunks per index but that isn't
> used immediately. Sparse files are reported differently by different
> tools and also differently by different operating systems. I don't know
> how k3s is managing the storage.
>
> Sometimes it's the size of the file, sometimes it's the amount of space
> in use. For small databases, there is quite a difference.
>
> An empty database is around 220kbytes but you'll see many 8Mbyte files
> with "ls -l".
>
> If you zip the database up and unpack it, the unpacked files are no
> longer sparse and it occupies 193Mbytes.
>
> After a compaction, the previous version of the storage can be deleted:
> of the "Data-..." directories, only the highest-numbered one is in use.
> A previous one can be zipped up for backup.
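>
> A minimal sketch of that, assuming Data-0001 is the old generation:
>
>    zip -r ds-backup-Data-0001.zip Data-0001 && rm -rf Data-0001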
>
> > The heap memory shows some very small saw-tooth peaks, but otherwise
> > it's flat.
>
> At what amount of memory?
>

At ~7GB.

>
> >
> > Regards,
> > Gaspar
> >
> > On Thu, Mar 7, 2024 at 11:55 PM Andy Seaborne <[email protected]> wrote:
> >
> >>
> >>
> >> On 07/03/2024 13:24, Gaspar Bartalus wrote:
> >>> Dear Jena support team,
> >>>
> >>> We would like to ask for your help in configuring the memory for our
> >>> jena-fuseki instance running in Kubernetes.
> >>>
> >>> *We have the following setup:*
> >>>
> >>> * Jena-fuseki deployed as StatefulSet to a k8s cluster with the
> >>> resource config:
> >>>
> >>> Limits:
> >>>    cpu:     2
> >>>    memory:  16Gi
> >>> Requests:
> >>>    cpu:     100m
> >>>    memory:  11Gi
> >>>
> >>> * The JVM_ARGS has the following value: -Xmx10G
> >>>
> >>> * Our main dataset of type TDB2 contains ~1 million triples.
> >> A million triples doesn't take up much RAM, even in an in-memory dataset.
> >>
> >> In Java, the JVM will grow until it is close to the -Xmx figure. A major
> >> GC will then free up a lot of memory. But the JVM does not give the
> >> memory back to the kernel.
> >>
> >> TDB2 does not only use heap space. A heap of 2-4G is usually enough per
> >> dataset, sometimes less (data-shape dependent - e.g. many large
> >> literals use more space).
> >>
> >> Use a profiler to examine the heap in use; you'll probably see a
> >> saw-tooth shape.
> >> Force a GC and see the level of in-use memory afterwards.
> >> Add some safety margin and working space for requests, and try that as
> >> the heap size.
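> >>
> >> A sketch of doing that without a full profiler, using jcmd (the pid
> >> and the 4G figure below are placeholders, not a recommendation):
> >>
> >>    jcmd <pid> GC.run          # force a full GC
> >>    jcmd <pid> GC.heap_info    # inspect heap usage after collection
> >>
> >> and then restart with a correspondingly smaller heap, e.g.
> >> JVM_ARGS="-Xmx4G".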
> >>
> >>> *  We execute the following types of UPDATE operations:
> >>>     - There are triggers in the system (e.g. users of the application
> >>> changing the data) which start ~50 further update operations, each
> >>> containing up to ~30K triples. Most of them run in parallel; some are
> >>> delayed by seconds or minutes.
> >>>     - There are scheduled UPDATE operations (executed on an hourly
> >>> basis) containing 30K-500K triples.
> >>>     - These UPDATE operations usually delete and insert the same
> >>> number of triples in the dataset. We use the compact API as a nightly
> >>> job. (A sketch of such an update request follows below.)
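> >>>
> >>> The shape of such a request, roughly (the dataset name "ds" and the
> >>> property are placeholders, not our real data):
> >>>
> >>>    curl -X POST http://localhost:3030/ds/update \
> >>>         --data-urlencode 'update=
> >>>           DELETE { ?s <http://example/version> ?old }
> >>>           INSERT { ?s <http://example/version> "v2" }
> >>>           WHERE  { ?s <http://example/version> ?old }'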
> >>>
> >>> *We are noticing the following behaviour:*
> >>>
> >>> * Fuseki consumes 5-10G of heap memory continuously, as configured in
> >>> the JVM_ARGS.
> >>>
> >>> * There are points in time when the volume usage of the k8s container
> >>> starts to increase suddenly. This does not drop even though compaction
> >>> is successfully executed and the dataset size (triple count) does not
> >>> increase. See attachment below.
> >>>
> >>> *Our suspicions:*
> >>>
> >>> * garbage collection in Java is often delayed; memory is not freed as
> >>> quickly as we would expect, and the heap limit is reached quickly
> >>> when multiple parallel queries run
> >>> * long-running database queries can promote objects to the old
> >>> generation (Gen2), which is not actively cleaned by the garbage
> >>> collector
> >>> * memory-mapped files are also garbage-collected (and perhaps they
> >>> could go to Gen2 as well, using more and more storage space); a way
> >>> to check this is sketched below.
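> >>>
> >>> To verify this we could enable GC logging (assuming a JDK 9+ JVM with
> >>> unified logging; the log path is a placeholder):
> >>>
> >>>    JVM_ARGS="-Xmx10G -Xlog:gc*:file=/fuseki/logs/gc.log"
> >>>
> >>> and watch whether full GCs actually reclaim heap.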
> >>>
> >>> Could you please explain the possible reasons behind this behaviour?
> >>> And finally, could you please suggest a more appropriate configuration
> >>> for our use case?
> >>>
> >>> Thanks in advance and best wishes,
> >>> Gaspar Bartalus
> >>>
> >>
> >
>
