On 28/01/13 10:21, Emilio Miguelanez wrote:

On 27 Jan 2013, at 22:04, Andy Seaborne wrote:

If

  select * {
    ?agent
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>
  }

works, it may be your lucky day.  The SPO index is intact so
tdbdump will work.  Maybe.

If you have the original data, then rebuilding is much safer.
There may be other problems not yet encountered.


This query works ... what should I do now?

If I run

tdbdump --loc=tdb > tdb.dump       (question: are tdbdump and tdbbackup
the same command?)

Almost.

I get the same error.

Not your lucky day, I'm afraid. The SPO index is damaged. It does, however, look as if another index is intact.

Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException:
RecordRangeIterator: records not strictly increasing:
0000000006d00261000000000000021c0000000006cfff69 //
0000000006b861a3000000000005233d00000000015a78b5

I would like to see whether the current TDB can be fixed, as rebuilding
could take a long time. The database was created with minimal data,
and it has been populated (dynamically) with data over a long period
of time (> 1 year).

SPO is the index used for iteration of the whole database. This can be changed.

Is this a database of just triples? No named graphs? So far, the corruption looks to be in SPO (an index on the default graph).

It will take some programming to fix this. No guarantees that it will work but I've experimented here.

Take a backup of the database.

A/ (the second way, B/, is better)
If you know all the possible properties, then write code that loops over each property and does

   defaultGraph.find(null, property, null)

This will use the POS index.

Print everything in N-Triples.
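The loop in A/ might look like the following sketch. The Jena calls it would use are shown in comments; to keep the snippet self-contained and runnable, a plain map stands in for the POS index, and the class name and example URIs are hypothetical.

```java
// Sketch of approach A/. The real code would open the TDB store and drive
// the POS index via Graph.find:
//
//   Dataset ds = TDBFactory.createDataset("tdb") ;
//   Graph g = ds.asDatasetGraph().getDefaultGraph() ;
//   ExtendedIterator<Triple> it = g.find(Node.ANY, Node.createURI(p), Node.ANY) ;
//
// Here a map stands in for the POS index; the loop structure is the point.
import java.util.*;

public class DumpByProperty {
    // property URI -> list of (subject, object) pairs, as POS would return them
    static Map<String, List<String[]>> pos = new LinkedHashMap<>();

    static void add(String s, String p, String o) {
        pos.computeIfAbsent(p, k -> new ArrayList<>()).add(new String[]{ s, o });
    }

    // Emit every triple of every *known* property as an N-Triples line.
    // The property list must be complete, or triples are silently missed.
    // (Literal objects would also need proper N-Triples quoting.)
    static List<String> dump(List<String> properties) {
        List<String> lines = new ArrayList<>();
        for (String p : properties)
            for (String[] so : pos.getOrDefault(p, Collections.<String[]>emptyList()))
                lines.add("<" + so[0] + "> <" + p + "> <" + so[1] + "> .");
        return lines;
    }

    public static void main(String[] args) {
        add("http://example/s1", "http://example/p", "http://example/o1");
        add("http://example/s2", "http://example/q", "http://example/o2");
        // prints the triples clustered by property, not by subject
        for (String line : dump(Arrays.asList("http://example/p", "http://example/q")))
            System.out.println(line);
    }
}
```

In the real version each (subject, object) pair comes from the `g.find(Node.ANY, Node.createURI(p), Node.ANY)` iterator, so nothing touches the damaged SPO index.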

B/ A different, better approach is to build a special version of TDB. The changes needed are small but you need to build Jena.

These instructions apply to the code in SVN as it is now, today; not the last release, not last week. It's just easier to set up and explain from the current code base, as a small recent change centralised the point you need to change and also introduced an easy-to-use testing feature.

1/ svn co the Jena code from trunk.

2/ Build Jena
   mvn clean install

It is easier to build and install than just package.

You must use the development builds of the other modules.
I don't think you need to set up maven to use the snapshot builds on Apache, but if you do, add a <repository> entry as described at:
http://jena.apache.org/download/maven.html

3/ mvn eclipse:eclipse to use Eclipse if you plan to use that to edit the code.

4/ Set up this build to be used for tdbdump, e.g. in the apache-jena distribution or Fuseki.

For added ease, use the Fuseki server jar, which has everything in it:

java -cp fuseki-server.jar tdb.tdbdump --version

Check timestamps/version numbers.

5/ Test: create a small text file of a few triples.

--- D.ttl
@prefix : <http://example/> .

:s1 :p 1 .
:s2 :p 2 .
:s3 :q 3 .
:s2 :q 4 .
:s1 :q 5 .

---

tdbdump --data D.ttl should dump the file with triples clustered by subject.

(no - you do not need to load a database - --data is a recent feature for testing)

6/ Edit com.hp.hpl.jena.tdb.index.TupleTable, static method "chooseScanAllIndex"

Change:
-----
        if ( tupleLen != 4 )
            return indexes[0] ;
==>
        if ( tupleLen != 4 )
        {
            if ( indexes.length == 3 )
                return indexes[1] ;
            else
                return indexes[0] ;
        }
-----

7/ Rebuild.

Yes - the tests for TDB should pass!

8/ Check the new version

tdbdump --version

Check the change

tdbdump --data D.ttl

and it should be N-Triples clustered by property, different from the earlier output.

9/ Dump your database.

Hope there is a good index.

You can also try indexes[2] not indexes[1] to use the OSP index.
Each dumps the entire database, but in different triple orders.

10/ Clean up maven to get rid of the temporary build.

rm -r REPO/org/apache/jena/      (REPO = your local maven repository, typically ~/.m2/repository)

11/ Rebuild the database with tdbloader/tdbloader2.

(the load will be slower than it would be from a dump in SPO order)

I tested the change here on that test file - I don't have a large corrupt database to try it on.

Any ideas of how to get it fixed are more than welcome.

Personally, I would adopt a two-stream approach:

do the approach above, and also collect all the original data together and start a fresh load of the database on another machine.

        Good luck
        Andy


Regards, Emilio


-- Emilio Migueláñez Martín [email protected]



