Re: [MASSMAIL]Re: Large *.dat files in Fuseki

Bartalus Gáspár Wed, 06 Jul 2022 01:55:00 -0700

The 3 dat files that are growing significantly are SPO.dat, OSP.dat and POS.dat 
ordered by size.


> On 6 Jul 2022, at 11:36, Bartalus Gáspár 
> <[email protected]> wrote:
> 
> Hi Lorenz,
> 
> Thanks for the quick feedback and clarification on Lucene indexes.
> 
> Here are my answers to your questions:
> - We are uploading 7 ttl files to our dataset, where one is larger, at 6 MB, 
> and the others are below 200 KB.
> - The overall number of triples after data upload is  ~150000.
> - We have around 10 SPARQL UPDATE queries that are executed on a regular and 
> frequent basis, i.e. every 5 seconds. We also have 5 such queries that are 
> executed each minute. But most of the time they do not produce any outcome, 
> i.e. the dataset is not altered, and when they do, there are just a couple of 
> triples that are added to the dataset.
> - These *.dat files start from ~10 MB in size, and after a day or so some of 
> them grow to ~10 GB.
> 
> We have ~300 blank nodes, and ~half of the triples have a literal in the 
> object position, so ~75000.
> 
> Best regards,
> Gaspar
> 
> 
> 
>> On 6 Jul 2022, at 10:55, Lorenz Buehmann 
>> <[email protected]> wrote:
>> 
>> Hi and welcome Gaspar.
>> 
>> 
>> Those files do contain the node tables.
>> 
>> A Lucene index is never computed by default and would be contained in Lucene 
>> specific index files.
>> 
>> 
>> Can you give some details about the
>> 
>> - size of the files
>> - the number of triples
>> - the number of triples added/removed/changed
>> - the frequency of updates
>> - how much the files grow
>> - what kind of data you insert? Lots of blank nodes? Or literals?
>> 
>> Also, did you try running a compact operation in the meantime?
>> 
>> Lorenz
>> 
>> On 06.07.22 09:40, Bartalus Gáspár wrote:
>>> Hi Jena support team,
>>> 
>>> We are experiencing an issue with Jena Fuseki databases. In the databases 
>>> folder we see some files called SPO.dat, OSP.dat, etc., and the size of 
>>> these files is growing quickly. From our understanding these files 
>>> contain the Lucene indexes. We have two questions:
>>> 
>>> 1. Why are these files growing rapidly, although the underlying data 
>>> (triples) are not being changed, or only slightly changed?
>>> 2. Can we disable indexing easily, since we are not using full text 
>>> searches in our SPARQL queries?
>>> 
>>> Our usage of Jena Fuseki:
>>> 
>>> * Start the server with `fuseki-server --port 3030`
>>> * Create databases with HTTP POST to 
>>> `/$/datasets?state=active&dbType=tdb2&dbName=db_name`
>>> * Upload ttl files with HTTP POST to /db_name/data
>>> 
>>> Thanks in advance for your feedback, and if you need more input from 
>>> our side, please let me know.
>>> 
>>> Best regards,
>>> Gaspar Bartalus
>>> 
>
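
For readers following the thread, the usage steps quoted above (create a TDB2 dataset, upload Turtle) and the compact operation Lorenz asks about can be sketched as plain HTTP requests. This is only a rough sketch, not the setup used by the poster: the base URL `http://localhost:3030`, the helper names, and the file name `data.ttl` are assumptions for illustration. The compaction call assumes a Fuseki version whose admin API exposes `POST /$/compact/{name}` and accepts the `deleteOld` parameter; without it, the old `Data-NNNN` directory stays on disk after compaction.

```python
import urllib.parse
import urllib.request

# Assumption: Fuseki runs locally on the port used in the thread.
BASE = "http://localhost:3030"


def create_dataset_request(name, base=BASE):
    """Build the POST that creates an active TDB2 dataset (as in the thread)."""
    query = urllib.parse.urlencode(
        {"state": "active", "dbType": "tdb2", "dbName": name}
    )
    return urllib.request.Request(f"{base}/$/datasets?{query}", data=b"", method="POST")


def upload_request(name, turtle_bytes, base=BASE):
    """Build the POST that uploads Turtle data to /{name}/data."""
    url = f"{base}/{urllib.parse.quote(name)}/data"
    return urllib.request.Request(
        url, data=turtle_bytes, method="POST", headers={"Content-Type": "text/turtle"}
    )


def compact_request(name, delete_old=True, base=BASE):
    """Build the POST that asks Fuseki to compact a TDB2 dataset.

    deleteOld=true is assumed to be supported by the running Fuseki version.
    """
    url = f"{base}/$/compact/{urllib.parse.quote(name)}"
    if delete_old:
        url += "?deleteOld=true"
    return urllib.request.Request(url, data=b"", method="POST")


if __name__ == "__main__":
    # Hypothetical dataset and file names; requires a running Fuseki server.
    with open("data.ttl", "rb") as f:
        payload = f.read()
    for req in (
        create_dataset_request("db_name"),
        upload_request("db_name", payload),
        compact_request("db_name"),
    ):
        with urllib.request.urlopen(req) as resp:
            print(req.get_method(), req.full_url, resp.status)
```

The requests are built separately from being sent so that the compaction call can be scheduled on its own, for example once a day, while the frequent SPARQL UPDATE jobs keep running.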
