Hi André,

I know exactly how you feel; I have had exactly the same need at times.
How do you know if your TDB indexes are all fine? Add the word 'production' to it and everything becomes more 'fun'. :-)

Fortunately, we use replication and have the ability to replay updates going back as far as we want/need. This makes things more 'relaxing'. But this is not the answer you are searching for right now.

I do not have *the* answer for you, nor a tool, but in the past I've done something similar to what you suggested: a sort of TDB index verifier/health checker. Here it is [1]. It's just a quick and dirty solution (not scalable... it keeps stuff in memory, etc.), but perhaps it provides you with ideas.

If a TDB health checking utility is useful and feasible, we should probably open a JIRA issue for it and gather ideas on how best to implement this. It should not be too much work.

You are still using TDB 0.8.10, but the on-disk format hasn't changed... so it's reasonable to expect such functionality would work with your indexes as well.

My 2 cents,
Paolo

 [1] https://github.com/castagna/tdbloader4/blob/f5363fa49d16a04a362898c1a5084ade620ee81b/src/test/java/dev/TDBVerifier.java

Dr. André Lanka wrote:
> Hello Jena-Users,
>
> we are using Jena+TDB in production and are looking for an efficient
> method to check the validity of the TDB files on disk.
>
> Our situation is as follows.
>
> With Jena 2.6.4 and TDB 0.8.10 each of our servers stores triples in up
> to 4000 different TDB stores stored on its local hard drive. On average
> each store owns 1 million triples (with high variance). To get our
> system working fluently, we need massive parallel write access to the
> different stores, so one huge named graph is no alternative. Also we
> need to have all stores open and accessible.
>
> In order to get that large number of TDB stores opened in parallel, we
> customised the TDB code for our needs. For instance we introduced read
> caches shared between all stores (to avoid memory problems). Also we
> introduced basic capabilities to roll back transactions. (We took
> control over all data read from or written to ObjectFile and BlockMgr).
>
> So, in our situation we can't switch to the new TDB version overnight.
>
> Now, the problem is that we had some disk issues a few days ago and want
> to check which stores have got broken (We know some of them are broken).
>
> Our initial idea is to iterate over all statements in the store and
> collect any S, P and O used in the store. Second step would be to check
> if any such URI is correctly mapped to a nodeID. And the other way round.
>
> Unfortunately we are not sure, if this will cover any possible file
> problem. Also, we think there could be a more efficient way to check the
> internal data structures.
>
>
> So, any idea (both high and low level) is highly appreciated.
>
>
> Thanks in advance
> André
>
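
To make the checks discussed in this thread a bit more concrete, here is a minimal sketch of the idea. It is not the code in [1] and it does not use the TDB API: the Store class, its maps and sets, and the verify method are all hypothetical, in-memory stand-ins for a node table and for the SPO/POS/OSP index orderings. It assumes a verifier would (a) check that the node table maps in both directions, as André suggests, (b) check that every index ordering holds the same set of triples, and (c) check that every node ID found in an index resolves to a node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: an in-memory model of a store, NOT the real TDB classes. */
final class TdbCheckSketch {

    /** Hypothetical stand-in for one store: a node table plus three index orderings. */
    static final class Store {
        final Map<String, Long> nodeToId = new HashMap<>();
        final Map<Long, String> idToNode = new HashMap<>();
        final Set<List<Long>> spo = new HashSet<>(); // stored as (s, p, o)
        final Set<List<Long>> pos = new HashSet<>(); // stored as (p, o, s)
        final Set<List<Long>> osp = new HashSet<>(); // stored as (o, s, p)

        /** Returns the ID of a node, allocating a new one on first use. */
        long intern(String node) {
            Long id = nodeToId.get(node);
            if (id == null) {
                id = (long) nodeToId.size() + 1;
                nodeToId.put(node, id);
                idToNode.put(id, node);
            }
            return id;
        }

        /** Adds one triple to all three orderings. */
        void add(String s, String p, String o) {
            long si = intern(s), pi = intern(p), oi = intern(o);
            spo.add(Arrays.asList(si, pi, oi));
            pos.add(Arrays.asList(pi, oi, si));
            osp.add(Arrays.asList(oi, si, pi));
        }
    }

    /** Returns a list of human-readable problems; empty means no inconsistency was found. */
    static List<String> verify(Store st) {
        List<String> problems = new ArrayList<>();

        // (a) Node table must round-trip in both directions.
        for (Map.Entry<String, Long> e : st.nodeToId.entrySet()) {
            if (!e.getKey().equals(st.idToNode.get(e.getValue()))) {
                problems.add("node->id not reversible: " + e.getKey());
            }
        }
        for (Map.Entry<Long, String> e : st.idToNode.entrySet()) {
            if (!e.getKey().equals(st.nodeToId.get(e.getValue()))) {
                problems.add("id->node not reversible: " + e.getKey());
            }
        }

        // (b) Rotate POS and OSP entries back into (s, p, o) order and compare with SPO.
        Set<List<Long>> fromPos = new HashSet<>();
        for (List<Long> t : st.pos) {
            fromPos.add(Arrays.asList(t.get(2), t.get(0), t.get(1)));
        }
        Set<List<Long>> fromOsp = new HashSet<>();
        for (List<Long> t : st.osp) {
            fromOsp.add(Arrays.asList(t.get(1), t.get(2), t.get(0)));
        }
        if (!st.spo.equals(fromPos)) problems.add("SPO and POS disagree");
        if (!st.spo.equals(fromOsp)) problems.add("SPO and OSP disagree");

        // (c) Every node ID used in SPO must resolve to a node.
        for (List<Long> t : st.spo) {
            for (Long id : t) {
                if (!st.idToNode.containsKey(id)) {
                    problems.add("dangling node ID " + id + " in " + t);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Store st = new Store();
        st.add("ex:a", "ex:knows", "ex:b");
        st.add("ex:b", "ex:knows", "ex:c");
        System.out.println("healthy store: " + verify(st));

        // Simulate damage: lose one POS entry and one node table entry.
        st.pos.remove(st.pos.iterator().next());
        st.idToNode.remove(st.nodeToId.get("ex:c"));
        System.out.println("damaged store: " + verify(st));
    }
}
```

Like the quick and dirty approach mentioned above, this holds everything in memory, so it only illustrates which invariants to test. Against real files the same comparisons would presumably have to be done by scanning the indexes in sorted order rather than loading them into sets, and a clean report here says nothing about damage at the block or file level.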