Hello Jena-Users,

We are using Jena+TDB in production and are looking for an efficient
method to check the validity of the TDB files on disk.

Our situation is as follows.

With Jena 2.6.4 and TDB 0.8.10, each of our servers stores triples in up
to 4000 different TDB stores on its local hard drive. On average, each
store holds 1 million triples (with high variance). To keep our system
running smoothly, we need massively parallel write access to the
different stores, so one huge named graph is not an option. We also
need to have all stores open and accessible.

In order to keep that many TDB stores open in parallel, we customised
the TDB code for our needs. For instance, we introduced read caches
shared between all stores (to avoid memory problems). We also
introduced basic capabilities to roll back transactions (we took
control over all data read from or written to ObjectFile and BlockMgr).

So, in our situation, we can't switch to the new TDB version overnight.

Now, the problem is that we had some disk issues a few days ago and want
to check which stores are broken (we know that some of them are).

Our initial idea is to iterate over all statements in a store and
collect every S, P and O used in it. The second step would be to check
that each such URI is correctly mapped to a node ID, and vice versa.
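
To make the idea concrete, here is a minimal sketch of the two steps in
plain Java. It deliberately does not use the TDB API: the class
NodeTableCheck, its methods, and the two maps (termToId and idToTerm)
are hypothetical stand-ins for the two directions of the node table, and
triples are simplified to String arrays. In a real check the maps would
have to be filled from the store's own files.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Jena or TDB.
final class NodeTableCheck {

    /** Step 1: collect every distinct S, P and O term used in the triples. */
    static Set<String> collectTerms(List<String[]> triples) {
        Set<String> terms = new LinkedHashSet<String>();
        for (String[] triple : triples) {
            for (String term : triple) {
                terms.add(term);
            }
        }
        return terms;
    }

    /**
     * Step 2: check that every term maps to an ID that maps back to the
     * same term, and that every ID maps to a term that maps back to the
     * same ID. Returns one message per inconsistency found.
     */
    static List<String> findProblems(Set<String> terms,
                                     Map<String, Long> termToId,
                                     Map<Long, String> idToTerm) {
        List<String> problems = new ArrayList<String>();

        // Direction term -> ID -> term.
        for (String term : terms) {
            Long id = termToId.get(term);
            if (id == null) {
                problems.add("no node ID for term " + term);
                continue;
            }
            String back = idToTerm.get(id);
            if (!term.equals(back)) {
                problems.add("ID " + id + " of term " + term
                        + " maps back to " + back);
            }
        }

        // Direction ID -> term -> ID ("the other way round").
        for (Map.Entry<Long, String> entry : idToTerm.entrySet()) {
            Long id = termToId.get(entry.getValue());
            if (!entry.getKey().equals(id)) {
                problems.add("term " + entry.getValue() + " of ID "
                        + entry.getKey() + " maps back to " + id);
            }
        }
        return problems;
    }
}
```

An empty result would only mean that the two mappings agree with each
other for the terms that could be read; it says nothing about the state
of the index files themselves.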

Unfortunately, we are not sure whether this would catch every possible
file problem. We also think there could be a more efficient way to check
the internal data structures.


Any ideas (both high- and low-level) are highly appreciated.


Thanks in advance
André

-- 
Dr. André Lanka  *  0178 / 134 44 47  *  http://dr-lanka.de
