On 21/03/2024 09:52, Rob @ DNR wrote:
Gaspar
This probably relates to https://access.redhat.com/solutions/2316
Deleting a file removes it from the file table but doesn’t immediately free the
space if a process is still accessing it. That could be another process inside
the container, or, in a containerised environment where the disk space is
mounted, it could be host processes on the K8s node that are monitoring the
storage.
There are some suggested debugging steps in the Red Hat article for figuring
out which processes might still be holding onto the old database files.
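For example, something like the following (assuming lsof is available inside
the container) lists files that have been deleted but are still held open by
a process:

    # Open files with a link count of zero, i.e. deleted but still open:
    lsof +L1
    # Or grep the full listing for the kernel's "(deleted)" marker:
    lsof | grep deleted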
Rob
Fuseki does close the database connections after compact, but only after
all read transactions on the old database have completed. That can hold
the database open for a while.
Another source of delay is the ext4 file system. Deletes will be in the
journal, and only when the journal operations are performed is the space
actually released. Usually this happens quickly, but I've seen it take an
appreciable length of time occasionally.
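If you want to rule the journal out, forcing a flush and re-checking is
cheap; a sketch, with the mount point as a placeholder:

    # ext4 commits its journal every few seconds by default (commit=5);
    # sync pushes pending operations through so df reflects the deletes.
    sync && df -h /fuseki/databases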
Gaspar wrote:
> then we start fresh where du -sh and df -h return the same numbers.
This indicates the file space has been released. Restarting clears any
outstanding read transactions and likely gives the ext4 journal time to run
through.
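In a K8s setup that restart can be as simple as the following, assuming (as
a guess at the deployment) Fuseki runs as a Deployment named fuseki:

    kubectl rollout restart deployment/fuseki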
Just about any layer (K8s, VMs) adds delay to the real release of the space,
but it should happen eventually.
Andy
From: Gaspar Bartalus <[email protected]>
Date: Wednesday, 20 March 2024 at 11:41
To: [email protected] <[email protected]>
Subject: Re: Requesting advice on Fuseki memory settings
Hi Andy
On Sat, Mar 16, 2024 at 8:58 PM Andy Seaborne <[email protected]> wrote:
On 12/03/2024 13:17, Gaspar Bartalus wrote:
On Mon, Mar 11, 2024 at 6:28 PM Andy Seaborne<[email protected]> wrote:
On 11/03/2024 14:35, Gaspar Bartalus wrote:
Hi Andy,
On Fri, Mar 8, 2024 at 4:41 PM Andy Seaborne<[email protected]> wrote:
On 08/03/2024 10:40, Gaspar Bartalus wrote:
Hi,
Thanks for the responses.
We were actually curious whether you'd have some explanation for the linear
increase in the storage, and for why we are seeing differences between the
actual size of our dataset and the size it uses on disk (i.e. between
`df -h` and `du -lh`).
Linear increase between compactions or across compactions? The latter
sounds like the previous version hasn't been deleted.
Across compactions, increasing linearly over several days, with compactions
running every day. The compaction is used with the "deleteOld" parameter,
and there is only one Data- folder in the volume, so I assume compaction
itself works as expected.
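For reference, we trigger it through the Fuseki administration API; a minimal
sketch, with the port and dataset name as placeholders for our actual setup:

    # Compact the dataset and delete the old Data-NNNN directory.
    curl -XPOST 'http://localhost:3030/$/compact/mydataset?deleteOld=true'
    # Compaction runs as an async task; its progress can be polled:
    curl 'http://localhost:3030/$/tasks'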
Strange - I can't explain that. Could you check that there is only one
Data-NNNN directory inside the database directory?
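Something along these lines, with the path adjusted to wherever the database
volume is mounted (the path here is only illustrative):

    ls -d /fuseki/databases/DB/Data-*
    du -sh /fuseki/databases/DB /fuseki/databases/DB/Data-*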
Yes, there is definitely just one Data-NNNN folder in the database directory.
What's the disk storage setup? e.g filesystem type.
We have an Azure disk of type Standard SSD LRS with a filesystem of type
Ext4.
Hi Gaspar,
I still can't explain what you're seeing, I'm afraid.
Can we get some more details?
When the server has Data-N, how big (as reported by 'du -sh') is that
directory, and how big is the whole directory for the database? They
should be nearly equal.
When a compaction is done, and the server is at Data-(N+1), what are the
sizes of Data-(N+1) and the database directory?
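A quick way to capture both numbers around a compaction, with the paths again
only illustrative:

    # Before the compaction:
    du -sh /fuseki/databases/DB/Data-* /fuseki/databases/DB
    df -h /fuseki/databases
    # ...run the compaction, then repeat the same two commands.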
What we see with respect to compaction is usually the following:
- We start with the Data-N folder of ~210MB
- After compaction we have a Data-(N+1) folder of size ~185MB, the old
Data-N being deleted.
- The sizes of the database directory and the Data-* directory are equal.
However, when we check with df -h, we sometimes see that volume usage is not
dropping; on the contrary, it goes up by ~140MB after each compaction.
Does stopping/starting the server change those numbers?
Yes, then we start fresh where du -sh and df -h return the same numbers.
Andy