Thank you, Rob, for the confirmation. A monthly graph export could be
an option, to get a second opinion.
Br
On 4.6.2018 16:01, Rob Vesse wrote:
That's usually what I see done in the literature.
Accounting for the exact amount of disk usage is difficult for a number of
reasons:
- Terms are dictionary encoded, so each URI, literal and blank node identifier is stored
only once and mapped to an internal constant-size identifier (64 bits for TDB1). So
however many times a term is used, its storage is its encoded size plus N times the
identifier size. How that "shared" disk usage contributes to an individual graph
is therefore subject to interpretation.
- Similarly, there is no reference counting for terms. So if data is deleted
from a graph, some of the disk usage is never reclaimed, and there is no way to
track this. Likewise, if you want to know how many times a given term
is used, you need to query the database to find that out.
- Index size will vary depending upon the data, including how it was loaded and
how many updates have happened. For example, tdbloader2 will produce maximally
packed indexes, but as soon as you start running updates the indexes will expand
as the B+Trees get rebalanced. And again, how do you account for the overhead of
the on-disk index data structures?
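To make the first point concrete, here is a minimal sketch of a toy accounting model. It is not TDB's actual on-disk layout: the function name `storage_breakdown`, the 3-identifiers-per-triple assumption and the example data are all made up for illustration. It only shows why the bytes for a shared term cannot be assigned cleanly to one graph.

```python
# Toy model only: this is NOT how TDB lays data out on disk.
ID_BYTES = 8  # 64-bit internal identifier, as mentioned above


def storage_breakdown(graphs):
    """graphs maps a graph name to a list of (s, p, o) term strings."""
    # Record which graphs use each term at least once.
    graphs_using = {}
    for name, triples in graphs.items():
        for term in {t for triple in triples for t in triple}:
            graphs_using.setdefault(term, set()).add(name)

    report = {}
    for name, triples in graphs.items():
        terms = {t for triple in triples for t in triple}
        # Terms used by this graph alone can be charged to it without argument.
        exclusive = sum(len(t.encode("utf-8")) for t in terms
                        if graphs_using[t] == {name})
        # Terms also used elsewhere are stored once; who "owns" them is a choice.
        shared = sum(len(t.encode("utf-8")) for t in terms
                     if graphs_using[t] != {name})
        report[name] = {
            "id_bytes": len(triples) * 3 * ID_BYTES,  # assumes one index copy
            "exclusive_term_bytes": exclusive,
            "shared_term_bytes": shared,
        }
    return report


# Hypothetical example data.
example = {
    "g1": [("ex:a", "ex:knows", "ex:b"), ("ex:a", "ex:knows", "ex:c")],
    "g2": [("ex:b", "ex:knows", "ex:c")],
}
report = storage_breakdown(example)
```

In this example both graphs report the same 16 shared term bytes, which exist only once in the dictionary, so summing per-graph figures would double count them.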
One "hack" might be to export the graph in question, import it into a separate
TDB instance and get the disk size of that. However, as explained above, you would end up
overestimating to some extent.
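The last step of that hack, measuring the scratch database directory, could be sketched roughly as below. The export and import steps are left out because they depend on your setup. The helper name `directory_size_bytes` is hypothetical, and the temporary directory stands in for a real database location. Note that it sums apparent file sizes, which may differ from the blocks actually allocated if the files are sparse.

```python
import os
import tempfile


def directory_size_bytes(path):
    """Sum the apparent sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so nothing outside the directory is counted.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


# Stand-in for a scratch database directory: two small files, 15 bytes in total.
with tempfile.TemporaryDirectory() as scratch:
    with open(os.path.join(scratch, "a.dat"), "wb") as f:
        f.write(b"x" * 10)
    os.mkdir(os.path.join(scratch, "sub"))
    with open(os.path.join(scratch, "sub", "b.dat"), "wb") as f:
        f.write(b"y" * 5)
    demo_size = directory_size_bytes(scratch)
```

Pointing the same function at the directory of the freshly loaded scratch instance would give the (over)estimate described above.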
Rob
On 04/06/2018, 13:18, "Mikael Pesonen" <[email protected]> wrote:
Hi,
what would be the best way to estimate how much disk space (bytes) a single
graph is using in Fuseki?
The only option that came to mind is to get the entire db disk usage with a Linux
system call and take the same proportion of it as the graph's share of the
triples in all graphs. That would be a rough estimate.
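A minimal sketch of that proportional estimate follows. The function name, the example numbers and the way the counts are obtained are assumptions for illustration; the two SPARQL strings are one possible way to get the counts, and the second one only counts named graphs, not the default graph.

```python
# Possible count queries (assumption: all data lives in named graphs).
COUNT_ONE_GRAPH = "SELECT (COUNT(*) AS ?n) WHERE { GRAPH <%s> { ?s ?p ?o } }"
COUNT_ALL_GRAPHS = "SELECT (COUNT(*) AS ?n) WHERE { GRAPH ?g { ?s ?p ?o } }"


def proportional_estimate(total_db_bytes, graph_triples, all_triples):
    """Share of the database size proportional to the graph's triple count."""
    if all_triples <= 0:
        raise ValueError("all_triples must be positive")
    if not 0 <= graph_triples <= all_triples:
        raise ValueError("graph_triples must be between 0 and all_triples")
    return total_db_bytes * graph_triples / all_triples


# Hypothetical numbers: a 10 GB database, graph holds 250,000 of 1,000,000 triples.
estimate = proportional_estimate(10_000_000_000, 250_000, 1_000_000)
```

With these made-up inputs the graph is charged a quarter of the database size.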
Thank you
--
Lingsoft - 30 years of Leading Language Management
www.lingsoft.fi
Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books
Mikael Pesonen
System Engineer
e-mail: [email protected]
Tel. +358 2 279 3300
Time zone: GMT+2
Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND
Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND