Re: Ideas for an efficient TDB check?

Paolo Castagna Fri, 25 May 2012 10:42:51 -0700

Hi André,
I know exactly how you feel and I had exactly the same need at times.

How do you know if your TDB indexes are all fine?

Add the word 'production' to it and everything becomes more 'fun'. :-)
Fortunately, we use replication and have the ability to replay updates going
as far back as we want/need. This makes things more 'relaxing'. But this is
not the answer you are searching for right now.

I do not have *the* answer for you, nor a tool, but in the past I've done
something similar to what you suggested: a sort of TDB index verifier/health
checker. See [1]; it's just a quick and dirty solution (not scalable, it
keeps everything in memory, etc.), but perhaps it will give you some ideas.
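
To make the idea concrete, here is a minimal sketch of that kind of check on a
toy in-memory model. It is not the code at [1] and it does not use any TDB
class: the maps stand in for the node table (node to id and id to node) and
the sets of id triples stand in for the SPO, POS and OSP indexes. The class
name TdbCheckSketch and the method verify are made up for illustration. Like
the quick and dirty version, it keeps everything in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these types are TDB classes.
class TdbCheckSketch {

    /** Returns a list of problems found; an empty list means the toy store is consistent. */
    static List<String> verify(Map<String, Long> nodeToId,
                               Map<Long, String> idToNode,
                               Set<List<Long>> spo,
                               Set<List<Long>> pos,
                               Set<List<Long>> osp) {
        List<String> problems = new ArrayList<>();

        // 1. Node table round trip: node -> id -> node.
        for (Map.Entry<String, Long> e : nodeToId.entrySet()) {
            if (!e.getKey().equals(idToNode.get(e.getValue()))) {
                problems.add("node->id->node mismatch for " + e.getKey());
            }
        }
        // 2. And the other way round: id -> node -> id.
        for (Map.Entry<Long, String> e : idToNode.entrySet()) {
            if (!e.getKey().equals(nodeToId.get(e.getValue()))) {
                problems.add("id->node->id mismatch for " + e.getKey());
            }
        }
        // 3. Every id used in a triple must resolve to a node.
        for (List<Long> t : spo) {
            for (Long id : t) {
                if (!idToNode.containsKey(id)) {
                    problems.add("dangling id " + id + " in SPO " + t);
                }
            }
        }
        // 4. All indexes must hold the same triples: put POS (p,o,s) and
        //    OSP (o,s,p) tuples back into (s,p,o) order and compare with SPO.
        Set<List<Long>> fromPos = new HashSet<>();
        for (List<Long> t : pos) fromPos.add(List.of(t.get(2), t.get(0), t.get(1)));
        Set<List<Long>> fromOsp = new HashSet<>();
        for (List<Long> t : osp) fromOsp.add(List.of(t.get(1), t.get(2), t.get(0)));
        if (!spo.equals(fromPos)) problems.add("SPO and POS disagree");
        if (!spo.equals(fromOsp)) problems.add("SPO and OSP disagree");
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Long> nodeToId = new HashMap<>();
        Map<Long, String> idToNode = new HashMap<>();
        String[] nodes = {"http://example.org/s", "http://example.org/p", "http://example.org/o"};
        for (int i = 0; i < nodes.length; i++) {
            nodeToId.put(nodes[i], (long) i);
            idToNode.put((long) i, nodes[i]);
        }
        Set<List<Long>> spo = new HashSet<>(Set.of(List.of(0L, 1L, 2L)));
        Set<List<Long>> pos = new HashSet<>(Set.of(List.of(1L, 2L, 0L)));
        Set<List<Long>> osp = new HashSet<>(Set.of(List.of(2L, 0L, 1L)));
        System.out.println("healthy store: " + verify(nodeToId, idToNode, spo, pos, osp));

        pos.clear(); // simulate a damaged POS index
        System.out.println("damaged store: " + verify(nodeToId, idToNode, spo, pos, osp));
    }
}
```

A real checker would have to read the ids and tuples from the on-disk files
and stream them rather than hold them in sets, and this sketch says nothing
about problems inside the B+Tree blocks themselves.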

If a TDB health checking utility is useful and feasible, we should probably open
a JIRA issue for it and gather ideas on how to best implement this. It should
not be too much work.

You are still using TDB 0.8.10, but the on-disk format hasn't changed, so it's
reasonable to expect such functionality would work with your indexes as well.

My 2 cents,
Paolo

[1] https://github.com/castagna/tdbloader4/blob/f5363fa49d16a04a362898c1a5084ade620ee81b/src/test/java/dev/TDBVerifier.java

Dr. André Lanka wrote:
> Hello Jena-Users,
> 
> we are using Jena+TDB in production and are looking for an efficient
> method to check the validity of the TDB files on disk.
> 
> Our situation is as follows.
> 
> With Jena 2.6.4 and TDB 0.8.10, each of our servers keeps triples in up
> to 4000 different TDB stores on its local hard drive. On average,
> each store holds 1 million triples (with high variance). To keep our
> system running smoothly, we need massively parallel write access to the
> different stores, so one huge named graph is not an alternative. We also
> need to have all stores open and accessible.
> 
> In order to get that large number of TDB stores opened in parallel, we
> customised the TDB code for our needs. For instance we introduced read
> caches shared between all stores (to avoid memory problems). Also we
> introduced basic capabilities to roll back transactions. (We took
> control over all data read from or written to ObjectFile and BlockMgr).
> 
> So, in our situation we can't switch to the new TDB version overnight.
> 
> Now, the problem is that we had some disk issues a few days ago and want
> to check which stores have been damaged (we know some of them are broken).
> 
> Our initial idea is to iterate over all statements in the store and
> collect every S, P and O used in the store. The second step would be to
> check that each such URI is correctly mapped to a nodeID, and the other
> way round.
> 
> Unfortunately, we are not sure whether this will cover every possible file
> problem. Also, we think there could be a more efficient way to check the
> internal data structures.
> 
> 
> So, any idea (both high and low level) is highly appreciated.
> 
> 
> Thanks in advance
> André
>
