Hi Rob,

Thanks for your advice! We performed the compaction on the database. The data was loaded in separate parts: we loaded the first batch of 18 parts out of 32, each part being about 3-4 GB, using a Bash loop with s-post to load the files into Fuseki. After the compaction, we managed to reduce the database size to about 90 GB, which seems like a good result.
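For reference, the loading loop looked roughly like the sketch below; the server URL, dataset name, and file pattern are placeholders rather than our exact setup:

    # Post each Turtle part into the default graph of the dataset.
    # http://localhost:3030/ds and part-*.ttl are placeholders.
    for f in part-*.ttl; do
        echo "Loading $f ..."
        s-post http://localhost:3030/ds/data default "$f"
    done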
The first part contains 43,318,090 triples, and the other parts are similar in size and content. We'll keep monitoring the database, but for now it looks much better. Do you think there are any other areas we should focus on, or any additional steps we could take for further optimization?

Best,
Maria Pereira & Francesco Bruno

>>> "Rob @ DNR" <rve...@dotnetrdf.org> 02/10/25 11:53 AM >>>

Without knowing anything about the contents of those files it is hard to say whether those numbers are expected, as there are no general rules of thumb about how big the database should be relative to the input data. It depends heavily on the contents of the input data, how it was loaded, etc. Was each of these files uploaded separately, i.e. as a separate transaction?

You could try compacting the database (https://jena.apache.org/documentation/tdb2/tdb2_admin.html) to see if that helps. TDB2 is implemented using copy-on-write data structures, so each new write transaction expands the size of the database: it copies existing data blocks before modifying them, because ongoing read transactions may still need the original blocks. A compaction rewrites the database to keep only the current blocks, discarding all the old blocks that are no longer referenced by the current state of the database. This requires an exclusive write lock on the database, so it can only be done during server downtime or quiet periods.

Given 47 GB of total data, there was probably a lot of copy-on-write churn during the data load, and I'd expect the compaction to bring that size down substantially.

Hope this helps,

Rob

From: Francesco Bruno <francesco.br...@bsb-muenchen.de>
Date: Monday, 10 February 2025 at 10:22
To: users@jena.apache.org <users@jena.apache.org>
Subject: Question Regarding Large Index Size in Fuseki

Dear Apache Jena Team,

We recently uploaded 18 TTL files totaling 47 GB to our Fuseki instance. However, we noticed that the resulting index size is significantly larger, around 296 GB. We have deactivated the GSPO, GPOS, and GOSP indexes, yet the size remains quite large.

Could you confirm whether this is expected behavior? Are there any optimizations or configurations we could apply to reduce the index size?

Thank you for your time and support.

Best regards,
Maria Pereira & Francesco Bruno
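The compaction Rob describes can be run either offline with the TDB2 command-line tools or, for a live server, through the Fuseki administration protocol. A minimal sketch, assuming a local server and a dataset named ds (both placeholders); check the admin documentation linked above for the options available in your Jena version:

    # Offline: compact the database directory directly (server stopped).
    tdb2.tdbcompact --loc=/path/to/databases/ds

    # Online: ask a running Fuseki server to compact the dataset "ds".
    # deleteOld=true removes the superseded storage generation afterwards.
    curl -XPOST 'http://localhost:3030/$/compact/ds?deleteOld=true'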