OK, compacting by using a dump is not a problem.
Another question related to backup: when backing up the disk image and
all Jena data files as they are, any data corruption is backed up too,
without warning. But if I export the data with Fuseki's built-in backup
and save that, does Fuseki report an error when the database is
corrupted? In that case the previous backed-up data dump could be
restored.
I guess an error message when trying to dump a corrupted database is
what would make this more useful than a standard image backup.
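A minimal sketch of a sanity check that could run on each dump before an older one is rotated out. It only confirms that the gzip stream reads to the end and that every statement line ends with a full stop; it says nothing about whether Fuseki itself reports corruption while dumping, and it is not an N-Quads parser. The function name `check_nquads_dump` is hypothetical.

```python
import gzip
import zlib


def check_nquads_dump(path):
    """Return the number of statement lines in a gzipped N-Quads dump.

    Raises ValueError if the file cannot be read to the end, contains
    a line that does not end with '.', or holds no statements at all.
    """
    count = 0
    try:
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                stripped = line.strip()
                # Blank lines and comments are allowed and not counted.
                if not stripped or stripped.startswith("#"):
                    continue
                if not stripped.endswith("."):
                    raise ValueError(f"line {lineno} does not end with '.'")
                count += 1
    except (OSError, EOFError, zlib.error, UnicodeDecodeError) as exc:
        # A truncated or damaged gzip stream ends up here.
        raise ValueError(f"cannot read {path}: {exc}") from exc
    if count == 0:
        raise ValueError(f"{path} contains no statements")
    return count
```

Comparing the returned count with the count from the previous dump would be one cheap way to notice a dump that is unexpectedly small.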
On 14.6.2018 19:48, Andy Seaborne wrote:
Inside a TDB2 directory, you'll see "Data-0001". That's the first
database. TDB2 has a "compact" operation which would create
"Data-0002" etc and after that Data-0001 is not used and never
touched. Delete or archive as you choose.
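A minimal sketch of how the generations could be told apart on disk, assuming only what is described above: subdirectories named "Data-NNNN", with the highest number being the one in use. The function names are hypothetical, and nothing here is part of Jena.

```python
import re
from pathlib import Path

# Matches generation directories such as "Data-0001".
_GENERATION = re.compile(r"^Data-(\d+)$")


def list_generations(db_dir):
    """Return the Data-NNNN subdirectories of db_dir, oldest first."""
    found = []
    for child in Path(db_dir).iterdir():
        match = _GENERATION.match(child.name)
        if match and child.is_dir():
            found.append((int(match.group(1)), child))
    found.sort(key=lambda pair: pair[0])
    return [path for _, path in found]


def split_live_and_stale(db_dir):
    """Return (live, stale): the newest generation and all older ones.

    Assumes the highest-numbered generation is the live one.
    """
    generations = list_generations(db_dir)
    if not generations:
        return None, []
    return generations[-1], generations[:-1]
```

The stale list is what could be deleted or archived; the sketch deliberately does neither.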
It's simple at the moment: a copy of the database, so much like
backup-restore, except it can happen on a running database (writers are
locked out, readers can continue until the switchover point). Plenty
of scope to make it more efficient.
Compaction is not available from Fuseki yet.
Andy
On 14/06/18 15:11, ajs6f wrote:
Yes, there is some truth in that.
TDB1 uses a dictionary that maps node IDs to node labels (so that,
e.g. a literal that is used as an object doesn't need to be in-line
recorded in the indexes, which could quickly bloat the indexes). That
dictionary isn't "garbage collected", so part of what you are seeing
may be the absence of mappings that aren't in use. Andy can say more
about what might be happening with the indexes themselves or how this
does or doesn't apply to TDB2.
ajs6f
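A toy model of the effect described above, to show why a dump and reload can come out smaller. It is not how TDB stores data; the class `ToyStore` and its layout are assumptions made purely for illustration. Deleting a triple removes it from the index but leaves its labels in the dictionary, while a reload only interns labels that are still in use.

```python
class ToyStore:
    """Triples stored as ID tuples, with a label dictionary that never shrinks."""

    def __init__(self):
        self.node_ids = {}    # label -> node ID
        self.labels = []      # node ID -> label
        self.triples = set()  # index of (s, p, o) ID tuples

    def _intern(self, label):
        # Allocate a new ID only the first time a label is seen.
        if label not in self.node_ids:
            self.node_ids[label] = len(self.labels)
            self.labels.append(label)
        return self.node_ids[label]

    def add(self, s, p, o):
        self.triples.add((self._intern(s), self._intern(p), self._intern(o)))

    def delete(self, s, p, o):
        # Only the index entry goes away; the dictionary is untouched.
        ids = tuple(self.node_ids.get(label) for label in (s, p, o))
        self.triples.discard(ids)

    def dump(self):
        """Return the current triples as label tuples."""
        return [tuple(self.labels[i] for i in t) for t in sorted(self.triples)]

    @classmethod
    def reload(cls, triples):
        """Build a fresh store from dumped triples."""
        fresh = cls()
        for s, p, o in triples:
            fresh.add(s, p, o)
        return fresh
```

In this model, adding two triples that differ only in a long literal object, deleting one, and then calling `ToyStore.reload(store.dump())` yields a store whose dictionary no longer holds the deleted literal.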
On Jun 14, 2018, at 10:00 AM, Mikael Pesonen
<[email protected]> wrote:
I just managed to load it using tdbloader2; it even reads the gz file
directly. I noticed that the new database size on disk is quite a bit
smaller:
payload size: 2.8 GB
old size on disk: 21 GB
new size on disk: 3 GB
So it seems that it's good to clean up the db every now and then
using the backup?
On 14.6.2018 16:55, ajs6f wrote:
That dataset is just an NQuads file. You can stick it into Fuseki
as you would do with any other NQuads file. You can certainly use
tdbloader2, or you can script individual graph loads using GSP.
tdbloader2 will produce an optimal set of indexes.
ajs6f
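A minimal sketch of the two routes mentioned above, under stated assumptions. The first helper only builds a tdbloader2 command line of the form `tdbloader2 --loc DIR FILE`, assuming the tool is on the PATH and the target directory is new or has been emptied. The second builds, without sending it, a Graph Store Protocol POST that appends N-Triples to one graph. The paths, the endpoint URL and the function names are hypothetical.

```python
import urllib.parse
import urllib.request


def tdbloader2_command(db_location, dump_file):
    """Return the argv list for a bulk load of dump_file into db_location."""
    return ["tdbloader2", "--loc", str(db_location), str(dump_file)]


def gsp_post_request(data_endpoint, ntriples, graph_uri=None):
    """Build a GSP POST request that adds N-Triples to one graph.

    With graph_uri=None the request targets the default graph.
    The request is only constructed here, not sent.
    """
    if graph_uri is None:
        url = f"{data_endpoint}?default"
    else:
        # The graph name travels as a percent-encoded query parameter.
        url = f"{data_endpoint}?graph={urllib.parse.quote(graph_uri, safe='')}"
    return urllib.request.Request(
        url,
        data=ntriples.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/n-triples"},
    )
```

The command list could be handed to `subprocess.run(..., check=True)` and the request to `urllib.request.urlopen`. Since the backup is an NQuads file, the per-graph route would first need the quads grouped by graph name, which this sketch leaves out.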
On Jun 14, 2018, at 7:55 AM, Mikael Pesonen
<[email protected]> wrote:
Hi,
I made a backup using the Fuseki HTTP Administration Protocol:
ds_2018-06-14_14-43-32.nq.gz
How do I restore it on Linux? Should I empty the existing data and use
tdbloader2? How exactly?
Thank you
--
Lingsoft - 30 years of Leading Language Management
www.lingsoft.fi
Speech Applications - Language Management - Translation - Reader's
and Writer's Tools - Text Tools - E-books and M-books
Mikael Pesonen
System Engineer
e-mail: [email protected]
Tel. +358 2 279 3300
Time zone: GMT+2
Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND
Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND