On 21/03/2024 09:52, Rob @ DNR wrote:
Gaspar
This probably relates to https://access.redhat.com/solutions/2316
Deleting a file removes it from the file table but doesn’t immediately free the
space if a process is still accessing it. That could be another process inside
the container, or, in a containerised environment where the disk space is
mounted, it could be host processes on the K8s node that are monitoring the
storage.
There are some suggested debugging steps in the Red Hat article for figuring
out which processes might still be holding onto the old database files.
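For example, something like the following (assuming lsof is available inside
the container) lists files that have been deleted but are still held open by
a process:

    # Open files with a link count of zero, i.e. deleted but still open:
    lsof +L1
    # Or grep the full listing for the kernel's "(deleted)" marker:
    lsof | grep deleted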
Rob
Fuseki does close the database connections after compact, but only after
all read transactions on the old database have completed. That can hold
the database open for a while.
Another source of delay is the ext4 file system. Deletes will be in the
journal, and only when the journal operations are performed is the space
actually released. Usually this happens quickly, but I've seen it take an
appreciable length of time occasionally.
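If you want to rule the journal out, forcing a flush and re-checking is
cheap; a sketch, with the mount point as a placeholder:

    # ext4 commits its journal every few seconds by default (commit=5);
    # sync pushes pending operations through so df reflects the deletes.
    sync && df -h /fuseki/databases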
Gaspar wrote:
> then we start fresh where du -sh and df -h return the same numbers.
This indicates the file space has been released. Restarting clears any
outstanding read transactions and likely gives the ext4 journal time to run
through.
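In a K8s setup that restart can be as simple as the following, assuming (as
a guess at the deployment) Fuseki runs as a Deployment named fuseki:

    kubectl rollout restart deployment/fuseki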
Just about any layer (K8s, VMs) adds delay to the real release of the space,
but it should happen eventually.
Andy
From: Gaspar Bartalus <[email protected]>
Date: Wednesday, 20 March 2024 at 11:41
To: [email protected] <[email protected]>
Subject: Re: Requesting advice on Fuseki memory settings
Hi Andy
On Sat, Mar 16, 2024 at 8:58 PM Andy Seaborne <[email protected]> wrote:
On 12/03/2024 13:17, Gaspar Bartalus wrote:
On Mon, Mar 11, 2024 at 6:28 PM Andy Seaborne<[email protected]> wrote:
On 11/03/2024 14:35, Gaspar Bartalus wrote:
Hi Andy,
On Fri, Mar 8, 2024 at 4:41 PM Andy Seaborne<[email protected]> wrote:
On 08/03/2024 10:40, Gaspar Bartalus wrote:
Hi,
Thanks for the responses.
We were actually curious whether you'd have some explanation for the linear
increase in the storage, and for why we are seeing differences between the
actual size of our dataset and the size it uses on disk (i.e. between
`df -h` and `du -lh`).
Linear increase between compactions or across compactions? The latter
sounds like the previous version hasn't been deleted.
Across compactions, increasing linearly over several days, with compactions
running every day. The compaction is used with the "deleteOld" parameter,
and there is only one Data- folder in the volume, so I assume compaction
itself works as expected.
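For reference, we trigger it through the Fuseki administration API; a minimal
sketch, with the port and dataset name as placeholders for our actual setup:

    # Compact the dataset and delete the old Data-NNNN directory.
    curl -XPOST 'http://localhost:3030/$/compact/mydataset?deleteOld=true'
    # Compaction runs as an async task; its progress can be polled:
    curl 'http://localhost:3030/$/tasks'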
Strange - I can't explain that. Could you check that there is only one
Data-NNNN directory inside the database directory?
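Something along these lines, with the path adjusted to wherever the database
volume is mounted (the path here is only illustrative):

    ls -d /fuseki/databases/DB/Data-*
    du -sh /fuseki/databases/DB /fuseki/databases/DB/Data-*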
Yes, there is definitely just one Data-NNNN folder in the database directory.
What's the disk storage setup? e.g filesystem type.
We have an Azure disk of type Standard SSD LRS with a filesystem of type
Ext4.
Hi Gaspar,
I still can't explain what you're seeing, I'm afraid.
Can we get some more details?
When the server has Data-N, how big (as reported by 'du -sh') is that
directory, and how big is the whole directory for the database? They
should be nearly equal.
When a compaction is done, and the server is at Data-(N+1), what are the
sizes of Data-(N+1) and the database directory?
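A quick way to capture both numbers around a compaction, with the paths again
only illustrative:

    # Before the compaction:
    du -sh /fuseki/databases/DB/Data-* /fuseki/databases/DB
    df -h /fuseki/databases
    # ...run the compaction, then repeat the same two commands.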
What we see with respect to compaction is usually the following:
- We start with the Data-N folder of ~210MB
- After compaction we have a Data-(N+1) folder of size ~185MB, the old
Data-N being deleted.
- The sizes of the database directory and the Data-* directory are equal.
However, when we check with df -h, we sometimes see that volume usage is not
dropping; on the contrary, it goes up by ~140MB after each compaction.
Does stopping/starting the server change those numbers?
Yes, then we start fresh where du -sh and df -h return the same numbers.
Andy