Yes, there is some truth in that.

TDB1 uses a dictionary that maps node IDs to node labels, so that, e.g., a 
literal used as an object doesn't have to be recorded in-line in the indexes, 
which could quickly bloat them. That dictionary isn't "garbage collected", so 
part of what you are seeing may be that a fresh load leaves out mappings that 
are no longer in use. Andy can say more about what might be happening with the 
indexes themselves, and about how this does or doesn't apply to TDB2.
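
To make the dictionary point concrete, here is a toy model in Python. It is 
only a sketch of the general idea, not TDB's actual storage code: the class 
ToyStore and all of its method names are invented for this illustration, and 
it says nothing about how much of your 21 vs. 3 difference the dictionary 
accounts for.

```python
class ToyStore:
    """Toy triple store: a node dictionary plus one index of ID triples.

    Hypothetical illustration only; this is not how TDB is implemented.
    """

    def __init__(self):
        self.label_to_id = {}   # node label -> node ID
        self.id_to_label = []   # node ID -> node label (the "dictionary")
        self.index = set()      # triples stored as (s_id, p_id, o_id)

    def _node_id(self, label):
        # Reuse the ID of a known label, otherwise allocate the next one.
        if label not in self.label_to_id:
            self.label_to_id[label] = len(self.id_to_label)
            self.id_to_label.append(label)
        return self.label_to_id[label]

    def add(self, s, p, o):
        self.index.add((self._node_id(s), self._node_id(p), self._node_id(o)))

    def delete(self, s, p, o):
        # Only the index entry is removed; dictionary entries are never reclaimed.
        ids = tuple(self.label_to_id.get(x) for x in (s, p, o))
        self.index.discard(ids)

    def triples(self):
        # Translate ID triples back into label triples.
        return {tuple(self.id_to_label[i] for i in t) for t in self.index}

    def dump_and_reload(self):
        # Analogue of a backup followed by a fresh load:
        # only labels that are still referenced get a dictionary entry.
        fresh = ToyStore()
        for s, p, o in self.triples():
            fresh.add(s, p, o)
        return fresh


store = ToyStore()
store.add("ex:a", "ex:p", '"a long literal"')
store.add("ex:a", "ex:p", '"another literal"')
store.delete("ex:a", "ex:p", '"a long literal"')

fresh = store.dump_and_reload()
print(len(store.id_to_label))  # 4: the deleted literal is still in the dictionary
print(len(fresh.id_to_label))  # 3: the rebuilt dictionary holds only live nodes
```

Both stores answer queries identically; only the unused dictionary entry 
differs.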

ajs6f

> On Jun 14, 2018, at 10:00 AM, Mikael Pesonen <[email protected]> 
> wrote:
> 
> 
> Just managed to load using tdbloader2, it even reads the gz file directly. 
> Noticed that new database size on disk is quite a bit smaller:
> 
> payload size: 2.8 GB
> old size on disk: 21 GB
> new size on disk: 3 GB
> 
> So it seems that it's good to clean up the db every now and then using 
> the backup?
> 
> 
> 
> On 14.6.2018 16:55, ajs6f wrote:
>> That dataset is just an NQuads file. You can stick it into Fuseki as you 
>> would do with any other NQuads file. You can certainly use tdbloader2, or 
>> you can script individual graph loads using GSP. tdbloader2 will produce an 
>> optimal set of indexes.
>> 
>> ajs6f
>> 
>>> On Jun 14, 2018, at 7:55 AM, Mikael Pesonen <[email protected]> 
>>> wrote:
>>> 
>>> Hi,
>>> 
>>> made backup using Fuseki HTTP Administration Protocol: 
>>> ds_2018-06-14_14-43-32.nq.gz
>>> 
>>> How do I restore it in Linux? Empty existing data and use tdbloader2? How 
>>> exactly?
>>> 
>>> Thank you
> 
