The three .dat files that are growing significantly are SPO.dat, OSP.dat and POS.dat, listed in order of size.
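In case it helps to track the growth over time, here is a minimal sketch of how the sizes could be collected. It assumes the database lives in a local directory that the script can read; the function name `dat_sizes` and the example path in the comment are made up for illustration and are not part of Jena.

```python
from pathlib import Path


def dat_sizes(db_dir):
    """Return {relative path: size in bytes} for all *.dat files under db_dir.

    Entries are ordered largest first, so the fastest-growing index files
    show up at the top when the result is printed.
    """
    root = Path(db_dir)
    sizes = {
        str(p.relative_to(root)): p.stat().st_size
        for p in root.rglob("*.dat")
        if p.is_file()
    }
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))


# Example call (the path is an assumption, adjust to your installation):
#   for name, size in dat_sizes("run/databases/db_name").items():
#       print(f"{size / 1024**2:10.1f} MiB  {name}")
```

Running this periodically (e.g. from cron) and logging the output should show whether the growth correlates with the update queries.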
> On 6 Jul 2022, at 11:36, Bartalus Gáspár <[email protected]> wrote:
>
> Hi Lorenz,
>
> Thanks for quick feedback and clarification on lucene indexes.
>
> Here are my answers to your questions:
> - We are uploading 7 ttl files to our dataset, where 1 is larger 6Mb, the
>   others are below 200Kb.
> - The overall number of triples after data upload is ~150000.
> - We have around 10 SPARQL UPDATE queries that are executed on a regular and
>   frequent basis, i.e. every 5 seconds. We also have 5 such queries that are
>   executed each minute. But most of the time they do not produce any outcome,
>   i.e. the dataset is not altered, and when they do, there are just a couple of
>   triples that are added to the dataset.
> - These *.dat files start from ~10Mb in size, and after a day or so some of
>   them grow to ~10Gb.
>
> We have ~300 blank nodes, and ~half of the triples have a literal in the
> object position, so ~75000.
>
> Best regards,
> Gaspar
>
>> On 6 Jul 2022, at 10:55, Lorenz Buehmann <[email protected]> wrote:
>>
>> Hi and welcome Gaspar.
>>
>> Those files do contain the node tables.
>>
>> A Lucene index is never computed by default and would be contained in Lucene
>> specific index files.
>>
>> Can you give some details about the
>>
>> - size of the files
>> - the number of triples
>> - the number triples added/removed/changed
>> - the frequency of updates
>> - how much the files grow
>> - what kind of data you insert? Lots of blank nodes? Or literals?
>>
>> Also, did you try a compact operation during time?
>>
>> Lorenz
>>
>> On 06.07.22 09:40, Bartalus Gáspár wrote:
>>> Hi Jena support team,
>>>
>>> We are experiencing an issue with Jena Fuseki databases. In the databases
>>> folder we see some files called SPO.dat, OSP.dat, etc., and the size of
>>> these files are growing quickly. From our understanding these files are
>>> containing the Lucene indexes. We would have two questions:
>>>
>>> 1. Why are these files growing rapidly, although the underlying data
>>>    (triples) are not being changed, or only slightly changed?
>>> 2. Can we disable indexing easily, since we are not using full text
>>>    searches in our SPARQL queries?
>>>
>>> Our usage of Jena Fuseki:
>>>
>>> * Start the server with `fuseki-server --port 3030`
>>> * Create databases with HTTP POST to
>>>   `/$/datasets?state=active&dbType=tdb2&dbName=db_name`
>>> * Upload ttl files with HTTP POST to /db_name/data
>>>
>>> Thanks in advance for your feedback, and if you’d require more input from
>>> our side, please let me know.
>>>
>>> Best regards,
>>> Gaspar Bartalus
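Regarding the compact operation asked about in the quoted thread: a rough sketch of how it could be triggered over HTTP, in the same style as the dataset creation call. This assumes the Fuseki admin endpoint `/$/compact/{name}` and its `deleteOld` parameter are available in the server version in use and that the admin area is reachable without extra authentication; the helper name `compact_request`, the host and the dataset name are placeholders.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def compact_request(base_url, db_name, delete_old=False):
    """Build (but do not send) a POST request asking Fuseki to compact a dataset.

    base_url   -- server root, e.g. "http://localhost:3030" (assumption)
    db_name    -- dataset name as used when the database was created
    delete_old -- if True, add deleteOld=true so the previous storage
                  generation is removed after compaction (assumed to be
                  supported by the running Fuseki version)
    """
    url = f"{base_url.rstrip('/')}/$/compact/{quote(db_name, safe='')}"
    if delete_old:
        url += "?" + urlencode({"deleteOld": "true"})
    # An empty body plus method="POST" makes this an explicit POST request.
    return Request(url, data=b"", method="POST")


# To actually send it against a running server:
#   from urllib.request import urlopen
#   with urlopen(compact_request("http://localhost:3030", "db_name")) as resp:
#       print(resp.status, resp.read().decode())
```

Whether compaction brings the files back to roughly their initial size is something we would still have to verify on our side.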
