Hello,

I am using Apache Jena 4.3.2 with Fuseki and TDB2 persistent disk storage, and I connect to the Fuseki server with Jena RDFConnection. I send 50k triples in one update. This is mostly new data (only a few triples match existing data), and the triples are instances based on an ontology. Please have a look at the attached file, which shows how much disk space grows with each update. For 1.5 million triples it took around 1.2 GB, i.e. roughly 700 bytes of net growth per triple. We want to store a few billion triples, so that bytes-per-triple ratio won't be good for our use case.
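Roughly, each batch goes to Fuseki like this (simplified sketch: the dataset URL is a placeholder and the construction of the 50k-triple batch Model is omitted):

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFuseki;

public class BatchLoader {
    // Simplified sketch: the batch Model is built elsewhere from our ontology instances.
    static void sendBatch(Model batch) {
        try (RDFConnection conn = RDFConnectionFuseki.create()
                .destination("http://localhost:3030/mydataset")  // placeholder dataset URL
                .build()) {
            // One HTTP request per ~50k-triple batch, added to the default graph.
            conn.load(batch);
        }
    }
}

Each call like this corresponds to one of the updates in the attached measurements.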
When I ran the tdb2.tdbcompact tool (exact command below), the data volume shrank to about 400 MB, but this extra step currently has to be performed manually to optimise the storage.

My questions are:

1. Why do 30 update queries of 50k triples each take roughly 3 times more disk space than a single update query of 1500k triples? The stored data is the same, yet the space consumed is much higher in the first case.
2. Is there any other way to solve this disk-space problem?
3. What existing strategies can be used to optimise disk usage while writing data?
4. Is there any development in progress to make write/update queries use less space?

Thanks,
Vinay Mahamuni
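P.S. For reference, the manual compaction step was the command-line tool shipped with Jena, along the lines of:

tdb2.tdbcompact --loc=<path to the TDB2 database directory>

(the path is a placeholder for our actual database location).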
Attached data (disk usage after each update):

triples (thousands),disk increase (MB),total disk used (MB)
0,0,201.3
57.04,1.24,202.54
106.95,0.58,203.12
156.86,0.59,203.71
206.77,17.36,221.07
256.68,25.75,246.82
306.59,8.97,255.79
356.50,17.37,273.16
406.41,25.75,298.91
456.32,8.97,307.88
506.23,34.14,342.02
556.14,25.75,367.77
606.05,25.75,393.52
655.96,25.75,419.27
705.87,42.53,461.8
755.78,34.14,495.94
805.69,34.14,530.08
855.60,34.14,564.22
905.51,34.14,598.36
955.42,50.91,649.27
1005.33,42.53,691.8
1055.24,50.92,742.72
1105.15,42.53,785.25
1155.06,50.91,836.16
1204.97,50.92,887.08
1254.88,50.92,938
1304.79,59.3,997.3
1354.70,50.92,1048.22
1404.61,50.91,1099.13
1454.52,67.7,1166.83
1504.43,59.3,1226.13
