On 10/10/13 10:37, Daniel Gerber wrote:
> Hi,
>
> I'm importing 20Mb of data every day into a Jena TDB store.
> Before insertion, I'm deleting everything (model.removeAll()). But I
> noticed that the size of the index does not shrink, it even increases
> every day (it's now at 11GB and soon will hit physical limits). I
> found this question [1] on stack overflow but could not find any
> mailing list entry (so sorry for re-asking this question). Is there
> any way, except deletion, to reduce the size of a Jena TDB
> directory/index.
>
> Cheers, Daniel
>
> [1]
> http://stackoverflow.com/questions/11088082/how-to-reduce-the-size-of-the-tdb-backed-jena-dataset

Daniel,
Your question is a good one - the full answer depends on the details of
your setup though.
The indexes won't shrink - TDB never gives disk space back to the OS -
but freed space is reused for later allocations within the same JVM. If
you are deleting, stopping, and restarting (hence using different JVMs),
then space can leak, but that does not sound like the case here: the
"leak" in that case can be most of the database and you'd notice!
The other issue is blank nodes - does your data have a significant
amount of blank nodes? If so, each load is creating new blank nodes.
Nodes are not garbage collected, so old blank nodes (and unused URIs and
literals) remain in the node table.
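
To illustrate the effect, here is a toy model in Java. It is only a sketch
and not TDB's actual implementation: the class ToyNodeTable and its methods
are hypothetical names, and the "#load" suffix is just a way to mimic that a
blank node label is local to one file, so every load mints a fresh node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of an append-only node table; NOT how TDB is implemented. */
final class ToyNodeTable {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<int[]> triples = new ArrayList<>();
    private int loads = 0;

    /** Returns the existing id for a node, or appends a new entry. */
    private int idFor(String node) {
        Integer id = ids.get(node);
        if (id == null) {
            id = ids.size();
            ids.put(node, id);
        }
        return id;
    }

    /** Loads triples given as {subject, predicate, object} strings. */
    void load(String[][] data) {
        loads++;
        for (String[] t : data) {
            int[] row = new int[3];
            for (int i = 0; i < 3; i++) {
                // Labels starting with "_:" are blank nodes: each load
                // gets a fresh node, URIs and literals are shared.
                String node = t[i].startsWith("_:") ? t[i] + "#load" + loads : t[i];
                row[i] = idFor(node);
            }
            triples.add(row);
        }
    }

    /** Deletes all triples; node entries are never removed. */
    void removeAll() {
        triples.clear();
    }

    int nodeCount() {
        return ids.size();
    }

    int tripleCount() {
        return triples.size();
    }
}
```

Loading one triple with a blank node subject, calling removeAll(), and
loading the same triple again leaves 1 triple but 4 node entries: the URIs
are reused, the blank node is not.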
If you are clearing out an entire database, then the following may work
for you: close the database (and remove it from the StoreConnection
manager), delete the files, then load the data again, which can be done
with the bulk loader.
[[
Except on MSWindows64, where it is not possible to delete memory-mapped
files while the JVM is running (they don't get deleted).
]]
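
The close / delete / reload cycle could be sketched roughly as below. This
is a minimal sketch using only the Java standard library. The class
TdbDirectoryReset and the two Runnable hooks are hypothetical. With Jena,
the close hook would presumably wrap something like dataset.close() plus
TDBFactory.release(dataset) or StoreConnection.release(location), and the
reload hook a bulk load; check those calls against your Jena version.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Sketch of a "close, delete files, reload" cycle for a database directory. */
final class TdbDirectoryReset {

    /**
     * Runs the three steps in order. closeDatabase and reload are
     * caller-supplied hooks (hypothetical; the Jena calls go in there).
     */
    static void reset(Path dbDir, Runnable closeDatabase, Runnable reload) {
        closeDatabase.run();       // 1. close and release the store
        deleteRecursively(dbDir);  // 2. remove index and node table files
        try {
            Files.createDirectories(dbDir); // start again from an empty directory
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        reload.run();              // 3. load the new data
    }

    /** Deletes root and everything below it; does nothing if root is absent. */
    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order visits children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Files.delete throws if a file cannot be removed, so on 64-bit Windows the
delete step would be expected to fail with an exception while the JVM
still has the files mapped, rather than silently leaving them behind.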
Andy