Hi Rob,

Thanks for your advice! We performed the compaction on the database. The data
was loaded in separate parts: we first loaded a batch of 18 of the 32 parts,
each about 3-4 GB, using a Bash loop with s-post to load the files into
Fuseki. After compaction, the database size came down to about 90 GB, which
seems like a good result.
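
For reference, the loading loop was along these lines (a simplified sketch;
the endpoint URL and file paths are placeholders for our actual setup):

    # POST each TTL part into the default graph of the dataset
    for f in /data/parts/part-*.ttl; do
        s-post http://localhost:3030/ds default "$f"
    done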

The first part contains 43,318,090 triples, and the other parts are similar in 
size and content.

We’ll keep monitoring the database, but for now, it looks much better. Do you 
think there are any other areas we should focus on, or any additional steps we 
could take for further optimization?

Best,

Maria Pereira & Francesco Bruno

>>> "Rob @ DNR" <rve...@dotnetrdf.org> 02/10/25 11:53 AM >>>
Without knowing anything about the contents of those files it is hard to say
whether those numbers are expected, as there aren’t any general rules of thumb
about how big the database should be relative to the input data. It depends
heavily on the contents of the input data, how it was loaded, and so on.

Was each of these files uploaded separately, i.e. as a separate transaction?

You could try compacting 
(https://jena.apache.org/documentation/tdb2/tdb2_admin.html) the database to 
see if that helps.
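
For example, for a dataset named ds (the name and paths here are placeholders;
adjust them to your setup):

    # offline, with the Fuseki server stopped
    tdb2.tdbcompact --loc /fuseki/databases/ds

    # or online, through the Fuseki administration endpoint
    curl -XPOST 'http://localhost:3030/$/compact/ds?deleteOld=true'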

TDB2 is implemented using copy-on-write data structures, so each new write
transaction expands the size of the database: it takes copies of existing data
blocks before modifying them, because ongoing read transactions may still need
the original blocks.  A compaction rewrites the database to keep only current
blocks, discarding all the old blocks that are no longer referenced by the
current state of the database.  This requires an exclusive write lock on the
database, so it can only be done during server downtime or quiet periods.

Given 47GB of total data there was probably a lot of copy-on-write churn
during the data load, and I’d expect compaction to bring that size down
substantially.
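
A quick sanity check is to compare the size of the database directory before
and after compacting (the path is a placeholder):

    du -sh /fuseki/databases/ds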

Hope this helps,

Rob

From: Francesco Bruno <francesco.br...@bsb-muenchen.de>
Date: Monday, 10 February 2025 at 10:22
To: users@jena.apache.org <users@jena.apache.org>
Subject: Question Regarding Large Index Size in Fuseki
Dear Apache Jena Team,

We recently uploaded 18 TTL files totaling 47GB to our Fuseki instance.
However, we noticed that the resulting index size is significantly larger,
around 296GB. We have deactivated the GSPO, GPOS, and GOSP indexes, yet the
size remains quite large.

Could you confirm if this is expected behavior? Are there any
optimizations or configurations we could apply to reduce the index size?

Thank you for your time and support.

Best regards,
Maria Pereira & Francesco Bruno
