On 28/01/13 10:21, Emilio Miguelanez wrote:

On 27 Jan 2013, at 22:04, Andy Seaborne wrote:

If

  select * {
    ?agent
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>
  }

works, it may be your lucky day.  The SPO index is intact so
tdbdump will work.  Maybe.

If you have the original data, then rebuilding is much safer.
There may be other problems not yet encountered.


This query works ... what should I do now?

If I run

tdbdump --loc=tdb > tdb.dump       (question: are tdbdump and tdbbackup
the same command?)

Almost.

I get the same error.

Not your lucky day, I'm afraid. The SPO index is damaged. It does, however, look as if another index is intact.

Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException:
RecordRangeIterator: records not strictly increasing:
0000000006d00261000000000000021c0000000006cfff69 //
0000000006b861a3000000000005233d00000000015a78b5

I would like to see whether the current TDB can be fixed, as rebuilding
could take a long time. The database was created with minimal data,
and it has been populated (dynamically) with data over a long period
of time (> 1 year).

SPO is the index used for iteration of the whole database. This can be changed.

Is this a database of just triples? No named graphs? So far, the corruption looks to be in SPO (an index on the default graph).

It will take some programming to fix this. No guarantees that it will work but I've experimented here.

Take a backup of the database.

A/ (the second way, B/, is better)
If you know all the possible properties, then write code that loops over each property and does

   defaultGraph.find(null, property, null)

This will use the POS index.

Print everything in N-Triples.
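The loop in A/ might look like the following sketch. The Jena calls it would use are shown in comments; to keep the snippet self-contained and runnable, a plain map stands in for the POS index, and the class name and example URIs are hypothetical.

```java
// Sketch of approach A/. The real code would open the TDB store and drive
// the POS index via Graph.find:
//
//   Dataset ds = TDBFactory.createDataset("tdb") ;
//   Graph g = ds.asDatasetGraph().getDefaultGraph() ;
//   ExtendedIterator<Triple> it = g.find(Node.ANY, Node.createURI(p), Node.ANY) ;
//
// Here a map stands in for the POS index; the loop structure is the point.
import java.util.*;

public class DumpByProperty {
    // property URI -> list of (subject, object) pairs, as POS would return them
    static Map<String, List<String[]>> pos = new LinkedHashMap<>();

    static void add(String s, String p, String o) {
        pos.computeIfAbsent(p, k -> new ArrayList<>()).add(new String[]{ s, o });
    }

    // Emit every triple of every *known* property as an N-Triples line.
    // The property list must be complete, or triples are silently missed.
    // (Literal objects would also need proper N-Triples quoting.)
    static List<String> dump(List<String> properties) {
        List<String> lines = new ArrayList<>();
        for (String p : properties)
            for (String[] so : pos.getOrDefault(p, Collections.<String[]>emptyList()))
                lines.add("<" + so[0] + "> <" + p + "> <" + so[1] + "> .");
        return lines;
    }

    public static void main(String[] args) {
        add("http://example/s1", "http://example/p", "http://example/o1");
        add("http://example/s2", "http://example/q", "http://example/o2");
        // prints the triples clustered by property, not by subject
        for (String line : dump(Arrays.asList("http://example/p", "http://example/q")))
            System.out.println(line);
    }
}
```

In the real version each (subject, object) pair comes from the `g.find(Node.ANY, Node.createURI(p), Node.ANY)` iterator, so nothing touches the damaged SPO index.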

B/ A different, better approach is to build a special version of TDB. The changes needed are small but you need to build Jena.

These instructions apply to the code in SVN as it is now, today; not the last release, not last week. It's just easier to set up and explain from the current code base, as a small recent change centralised the point you need to change and also introduced an easy-to-use testing feature.

1/ svn co the Jena code from trunk.

2/ Build Jena
   mvn clean install

It is easier to build and install than just package.

You must use the development builds of the other modules.
I don't think you need to set up maven to use the snapshot builds on Apache, but if you do, add a <repository> entry as described at:
http://jena.apache.org/download/maven.html

3/ mvn eclipse:eclipse to use Eclipse if you plan to use that to edit the code.

4/ Set up this build to be used for tdbdump, e.g. in the apache-jena distribution or Fuseki.

For added ease, use the Fuseki server jar, which has everything in it:

java -cp fuseki-server.jar tdb.tdbdump --version

Check timestamps/version numbers.

5/ Test: create a small text file of a few triples.

--- D.ttl
@prefix : <http://example/> .

:s1 :p 1 .
:s2 :p 2 .
:s3 :q 3 .
:s2 :q 4 .
:s1 :q 5 .

---

tdbdump --data D.ttl should dump the file with triples clustered by subject.

(no - you do not need to load a database - --data is a recent feature for testing)

6/ Edit com.hp.hpl.jena.tdb.index.TupleTable, static method "chooseScanAllIndex"

Change:
-----
        if ( tupleLen != 4 )
            return indexes[0] ;
==>
        if ( tupleLen != 4 )
        {
            if ( indexes.length == 3 )
                return indexes[1] ;
            else
                return indexes[0] ;
        }
-----

7/ Rebuild.

Yes - the tests for TDB should pass!

8/ Check the new version

tdbdump --version

Check the change

tdbdump --data D.ttl

and it should be N-Triples clustered by property, different from the earlier output.

9/ Dump your database.

Hope there is a good index.

You can also try indexes[2] not indexes[1] to use the OSP index.
Each dumps the entire database, but in different triple orders.

10/ Clean up maven to get rid of the temporary build.

rm -r REPO/org/apache/jena/      (REPO = your local maven repository, typically ~/.m2/repository)

11/ Rebuild the database with tdbloader/tdbloader2.

(the load will be slower than it would be from a dump in SPO order)

I tested the change here on that test file - I don't have a large corrupt database to try it on.

Any ideas of how to get it fixed are more than welcome.

Personally, I would adopt a two-stream approach:

do the approach above, and also collect all the original data together and start a fresh load of the database on another machine.

        Good luck
        Andy


Regards, Emilio


-- Emilio Migueláñez Martín [email protected]



