Hello Jena-Users,

we are using Jena+TDB in production and are looking for an efficient method to check the validity of the TDB files on disk.
Our situation is as follows. With Jena 2.6.4 and TDB 0.8.10, each of our servers stores triples in up to 4000 different TDB stores on its local hard drive. On average each store holds about 1 million triples (with high variance). To keep our system running smoothly, we need massively parallel write access to the different stores, so one huge named graph is not an option. We also need to have all stores open and accessible.

In order to open that many TDB stores in parallel, we customised the TDB code for our needs. For instance, we introduced read caches shared between all stores (to avoid memory problems). We also introduced basic capabilities to roll back transactions. (We took control over all data read from or written to ObjectFile and BlockMgr.) So, in our situation, we can't switch to the new TDB version overnight.

Now, the problem is that we had some disk issues a few days ago and want to check which stores are broken (we know some of them are).

Our initial idea is to iterate over all statements in a store and collect every S, P and O used in it. The second step would be to check that each such URI is correctly mapped to a nodeID, and that each nodeID maps back to the same URI. Unfortunately, we are not sure whether this will cover every possible file problem. We also think there could be a more efficient way to check the internal data structures.

So, any idea (both high and low level) is highly appreciated.

Thanks in advance
André

--
Dr. André Lanka * 0178 / 134 44 47 * http://dr-lanka.de
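
P.S. To make the two-step idea above concrete, here is a minimal sketch in plain Java. It is not TDB code: the class `StoreCheck`, its methods, and the two maps standing in for the node table (term to id, id to term) are all hypothetical names chosen for illustration, and statements are modelled as simple string arrays. It only shows the shape of the round-trip check and says nothing about whether that check would catch every kind of file damage.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: none of these names are TDB classes.
final class StoreCheck {

    // Step 1: collect every distinct S, P and O term used by the statements.
    // Each statement is modelled as a String[3] of {subject, predicate, object}.
    static Set<String> collectTerms(List<String[]> statements) {
        Set<String> terms = new LinkedHashSet<String>();
        for (String[] statement : statements) {
            for (String term : statement) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Step 2: check the mapping in both directions.
    // termToId and idToTerm are assumed stand-ins for the two halves of a node table.
    static List<String> findMappingProblems(Set<String> terms,
                                            Map<String, Long> termToId,
                                            Map<Long, String> idToTerm) {
        List<String> problems = new ArrayList<String>();

        // Forward direction: every term in use must have an id that maps back to it.
        for (String term : terms) {
            Long id = termToId.get(term);
            if (id == null) {
                problems.add("no id for term: " + term);
                continue;
            }
            String back = idToTerm.get(id);
            if (!term.equals(back)) {
                problems.add("id " + id + " maps back to " + back + ", expected " + term);
            }
        }

        // Reverse direction: every id must point at a term that maps to the same id.
        for (Map.Entry<Long, String> entry : idToTerm.entrySet()) {
            Long id = termToId.get(entry.getValue());
            if (!entry.getKey().equals(id)) {
                problems.add("term " + entry.getValue() + " maps to id " + id
                        + ", expected " + entry.getKey());
            }
        }
        return problems;
    }
}
```

An empty result from `findMappingProblems` would mean that the two mappings agree for the terms seen. It would not tell us anything about the index files themselves, which is part of why we are asking for better ideas.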
