Thank you Rob for the confirmation. A monthly graph export could be an option, to get a second opinion.

Br

On 4.6.2018 16:01, Rob Vesse wrote:
That's usually what I see done in the literature

Accounting for the exact amount of disk usage is difficult for a number of 
reasons:

- Terms are dictionary encoded, so each URI, literal and blank node identifier is stored 
only once and mapped to an internal constant-size identifier (64 bits for TDB1). So 
however many times a term is used, its storage is its encoded size plus N times the 
identifier size. How that "shared" disk usage contributes to an individual graph 
is therefore subject to interpretation.
- Similarly, there is no reference counting for terms. So if data is deleted 
from a graph, some of the disk usage is never reclaimed, and there is no way to 
track this. Likewise, if you want to know how many times a given term 
is used, you need to query the database to find that out.
- Index size will vary depending upon the data, including how it was loaded and 
how many updates have happened. For example, tdbloader2 will produce maximally 
packed indexes, but as soon as you start running updates the indexes will expand 
as the B+Trees get rebalanced. And again, how do you account for the overhead of 
the on-disk index data structures?
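
To make the first point concrete, here is a toy model in Python. It is not TDB's actual storage layout: the function name and the three-way split are made up for illustration, each triple is counted once (real indexes keep several copies), and block overhead is ignored. It only shows why the bytes for shared terms cannot be cleanly assigned to one graph.

```python
ID_BYTES = 8  # 64-bit identifier, as mentioned above for TDB1


def toy_usage(graphs):
    """Toy accounting for dictionary-encoded storage.

    graphs maps a graph name to a list of (s, p, o) term strings.
    Returns (id_bytes_per_graph, exclusive_term_bytes_per_graph, shared_term_bytes).
    """
    term_size = {}   # term -> encoded size, stored only once
    term_users = {}  # term -> set of graphs that use it
    for name, triples in graphs.items():
        for triple in triples:
            for term in triple:
                term_size.setdefault(term, len(term.encode("utf-8")))
                term_users.setdefault(term, set()).add(name)

    # Every use of a term costs one fixed-size identifier.
    id_bytes = {name: len(triples) * 3 * ID_BYTES for name, triples in graphs.items()}

    # Term bytes that belong to exactly one graph.
    exclusive = {
        name: sum(size for term, size in term_size.items() if term_users[term] == {name})
        for name in graphs
    }

    # Term bytes used by more than one graph: whose are they?
    shared = sum(size for term, size in term_size.items() if len(term_users[term]) > 1)
    return id_bytes, exclusive, shared


graphs = {
    "g1": [("ex:a", "ex:p", "ex:b")],
    "g2": [("ex:a", "ex:p", "ex:c")],
}
print(toy_usage(graphs))
```

In this example "ex:a" and "ex:p" are stored once but used by both graphs, so any per-graph figure depends on how you choose to split the shared part.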

One "hack" might be to export the graph in question, import it into a separate 
TDB instance and get the disk size of that. However, as explained above, you would end up 
overestimating to some extent.
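
A rough sketch of the bookkeeping around that hack, in Python. The "/data" service name is an assumption (it depends on how the Fuseki dataset is configured), and both helper names are made up for illustration. The load step itself is left as a comment because it runs outside Python.

```python
import os
from urllib.parse import quote


def graph_export_url(dataset_url, graph_uri):
    """Build a Graph Store Protocol URL for fetching one named graph.

    Assumes the dataset exposes a GSP service called "data"; adjust to
    match your Fuseki configuration.
    """
    return f"{dataset_url.rstrip('/')}/data?graph={quote(graph_uri, safe='')}"


def dir_size_bytes(path):
    """Sum the apparent sizes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


# Intended flow (steps 1 and 2 are done outside this script):
#   1. GET graph_export_url(...) and save the response to a file
#   2. load that file into an empty directory, e.g. with tdbloader --loc=DIR FILE
#   3. call dir_size_bytes("DIR") on the scratch database
print(graph_export_url("http://localhost:3030/ds", "http://example.org/g"))
```

Note that this sums apparent file sizes, which may differ from what the filesystem has actually allocated.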

Rob

On 04/06/2018, 13:18, "Mikael Pesonen" <[email protected]> wrote:

Hi, what would be the best way to estimate how much disk space (bytes) a single
graph is using in Fuseki?
The only option that came to mind is to get the entire db disk usage with a Linux
system call and take the same proportion as there are triples in the
graph vs. in all graphs. That would be a rough estimate.
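
A minimal sketch of that proportional estimate in Python. The function name and the two query strings are illustrative only; the total size is assumed to come from something like du on the database directory, and the counts from running the queries against the dataset. The second query counts named graphs only, so add the default graph separately if it holds data.

```python
# SPARQL queries that could supply the two counts (run them against the dataset).
COUNT_IN_GRAPH = "SELECT (COUNT(*) AS ?n) WHERE { GRAPH <%s> { ?s ?p ?o } }"
COUNT_IN_NAMED_GRAPHS = "SELECT (COUNT(*) AS ?n) WHERE { GRAPH ?g { ?s ?p ?o } }"


def estimate_graph_bytes(db_bytes, graph_triples, total_triples):
    """Attribute a share of the database size to one graph by triple count."""
    if total_triples <= 0:
        raise ValueError("total_triples must be positive")
    if not 0 <= graph_triples <= total_triples:
        raise ValueError("graph_triples must be between 0 and total_triples")
    # Integer arithmetic keeps the result exact for large databases.
    return db_bytes * graph_triples // total_triples


# Example with made-up numbers: a 1,000,000 byte database in which the
# graph holds 250 of 1000 triples.
print(estimate_graph_bytes(1_000_000, 250, 1000))  # prints 250000
```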
Thank you

--
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's
Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: [email protected]
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND




