Hello,

I am using Apache Jena 4.3.2 with Fuseki and TDB2 persistent disk storage, and I connect to the Fuseki server with Jena RDFConnection. I send 50k triples in one update. This is mostly new data (only a few triples match existing data), and the triples are instances based on an ontology. Please have a look at the attached file, which shows how much disk space grows with each update. For 1.5 million triples it took around 1.2 GB, i.e. roughly 700 bytes of net growth per triple. We want to store a few billion triples, so that bytes-per-triple ratio won't be good for our use case.
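Roughly, each batch goes to Fuseki like this (simplified sketch: the dataset URL is a placeholder and the construction of the 50k-triple batch Model is omitted):

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFuseki;

public class BatchLoader {
    // Simplified sketch: the batch Model is built elsewhere from our ontology instances.
    static void sendBatch(Model batch) {
        try (RDFConnection conn = RDFConnectionFuseki.create()
                .destination("http://localhost:3030/mydataset")  // placeholder dataset URL
                .build()) {
            // One HTTP request per ~50k-triple batch, added to the default graph.
            conn.load(batch);
        }
    }
}

Each call like this corresponds to one of the updates in the attached measurements.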
When I ran the tdb2.tdbcompact tool (exact command below), the data volume shrank to about 400 MB, but this extra step currently has to be performed manually to optimise the storage.

My questions are:

1. Why do 30 update queries of 50k triples each take roughly 3 times more disk space than a single update query of 1500k triples? The stored data is the same, yet the space consumed is much higher in the first case.
2. Is there any other way to solve this disk-space problem?
3. What existing strategies can be used to optimise disk usage while writing data?
4. Is there any development in progress to make write/update queries use less space?

Thanks,
Vinay Mahamuni
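P.S. For reference, the manual compaction step was the command-line tool shipped with Jena, along the lines of:

tdb2.tdbcompact --loc=<path to the TDB2 database directory>

(the path is a placeholder for our actual database location).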
Attached data (disk usage after each update):

triples (thousands),disk increase (MB),total disk used (MB)
0,0,201.3
57.04,1.24,202.54
106.95,0.58,203.12
156.86,0.59,203.71
206.77,17.36,221.07
256.68,25.75,246.82
306.59,8.97,255.79
356.50,17.37,273.16
406.41,25.75,298.91
456.32,8.97,307.88
506.23,34.14,342.02
556.14,25.75,367.77
606.05,25.75,393.52
655.96,25.75,419.27
705.87,42.53,461.8
755.78,34.14,495.94
805.69,34.14,530.08
855.60,34.14,564.22
905.51,34.14,598.36
955.42,50.91,649.27
1005.33,42.53,691.8
1055.24,50.92,742.72
1105.15,42.53,785.25
1155.06,50.91,836.16
1204.97,50.92,887.08
1254.88,50.92,938
1304.79,59.3,997.3
1354.70,50.92,1048.22
1404.61,50.91,1099.13
1454.52,67.7,1166.83
1504.43,59.3,1226.13
