On 28/01/13 10:21, Emilio Miguelanez wrote:
On 27 Jan 2013, at 22:04, Andy Seaborne wrote:
If select * { ?agent
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>
}
works, it may be your lucky day. The SPO index is intact so
tdbdump will work. Maybe.
If you have the original data, then rebuilding is much safer.
There may be other problems not yet encountered.
This query works .... what should I do now?
If I run
tdbdump --loc=tdb > tdb.dump
(question: are tdbdump and tdbbackup the same command?)
Almost.
I get same error.
Not your lucky day I'm afraid. The SPO index is damaged. It does
however look as if another index is intact.
Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException:
RecordRangeIterator: records not strictly increasing:
0000000006d00261000000000000021c0000000006cfff69 //
0000000006b861a3000000000005233d00000000015a78b5
I would like to try to fix the current TDB first, as rebuilding could
take a long time. The database was created with minimal data, and it
has been populated dynamically with data over a long period of time
(> 1 year).
SPO is the index used for iteration of the whole database. This can be
changed.
Is this a database of just triples? No named graphs? So far, the
corruption looks to be in SPO (an index on the default graph).
It will take some programming to fix this. No guarantees that it will
work but I've experimented here.
Take a backup of the database.
A/ (though the second approach, B, is better)
If you know all the possible properties, then write code that loops on
each of the properties and does
defaultGraph.find(null, property, null)
This will use the POS index.
Print everything in N-Triples.
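As a sketch of approach A using the Jena 2.x API of the time (the com.hp.hpl.jena packages) — the database location and the property URIs below are placeholders you would replace with your own:

```java
import java.util.Arrays;
import java.util.List;

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.RDFNode;
import com.hp.hpl.jena.tdb.TDBFactory;

public class DumpByProperty {
    public static void main(String[] args) {
        // Open the (possibly damaged) TDB database; "tdb" is a placeholder location.
        Model model = TDBFactory.createDataset("tdb").getDefaultModel();

        // Every property known to occur in the data -- placeholders here.
        List<String> properties = Arrays.asList(
                "http://example/p",
                "http://example/q");

        for (String uri : properties) {
            Property p = model.createProperty(uri);
            // A (any, p, any) pattern is answered from the POS index,
            // so the broken SPO index is never touched.
            Model slice = ModelFactory.createDefaultModel();
            slice.add(model.listStatements(null, p, (RDFNode) null));
            // Print this slice in N-Triples.
            slice.write(System.out, "N-TRIPLES");
        }
    }
}
```

One caveat: if the data contains blank nodes, writing each property's slice separately may not keep blank node labels consistent across slices, so you may need to accumulate everything into one model before writing.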
B/ A different, better approach is to build a special version of TDB.
The changes needed are small but you need to build Jena.
These instructions apply to the code in SVN as it is now, today - not
the last release, not last week. It's just easier to set up and explain
from the current code base: a small recent change centralised the point
you need to change and also introduced an easy-to-use testing feature.
1/ svn co the Jena code from trunk.
2/ Build Jena
mvn clean install
It is easier to build and install than just package.
You must use the development releases of the other modules.
I don't think you need to set up Maven to use the snapshot builds on
Apache, but if you do, set a <repository> as described at:
http://jena.apache.org/download/maven.html
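For reference, the repository entry goes in your pom.xml <repositories> section (or in settings.xml); this fragment is a sketch - check the page above for the authoritative details:

```xml
<repository>
  <id>apache-snapshots</id>
  <url>https://repository.apache.org/content/repositories/snapshots/</url>
  <releases><enabled>false</enabled></releases>
  <snapshots><enabled>true</enabled></snapshots>
</repository>
```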
3/ mvn eclipse:eclipse to use Eclipse if you plan to use that to edit
the code.
4/ Set up to use this build for tdbdump, e.g. via apache-jena or Fuseki.
For added ease, use the Fuseki server jar, which has everything in it:
java -cp fuseki-server.jar tdb.tdbdump --version
Check timestamps/version numbers.
5/ Test: create a small text file of a few triples.
--- D.ttl
@prefix : <http://example/> .
:s1 :p 1 .
:s2 :p 2 .
:s3 :q 3 .
:s2 :q 4 .
:s1 :q 5 .
---
tdbdump --data D.ttl should dump the file with triples clustered by subject.
(no - you do not need to load a database - --data is a recent feature
for testing)
6/ Edit com.hp.hpl.jena.tdb.index.TupleTable, static method
"chooseScanAllIndex"
Change:
-----
if ( tupleLen != 4 )
return indexes[0] ;
==>
if ( tupleLen != 4 )
{
if ( indexes.length == 3 )
return indexes[1] ;
else
return indexes[0] ;
}
-----
7/ Rebuild.
Yes - the tests for TDB should pass!
8/ check the new version
tdbdump --version
check the change
tdbdump --data D.ttl
and it should be N-Triples clustered by property, different from the earlier run.
9/ Dump your database.
Hope there is a good index.
You can also try indexes[2] not indexes[1] to use the OSP index.
Each dumps the entire database, but in different triple orders.
10/ Clean up maven to get rid of the temporary build.
rm -r REPO/org/apache/jena/
(where REPO is your local Maven repository, typically ~/.m2/repository)
11/ Rebuild the database with tdbloader/tdbloader2.
(the load is slower than if dumped in SPO order)
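End to end, with illustrative file and directory names (the dump uses the patched build; the reload can use any build):

```shell
# Dump via the patched tdbdump (POS order thanks to the index change).
java -cp fuseki-server.jar tdb.tdbdump --loc=tdb > tdb-pos.nt

# Reload into a fresh database location.
tdbloader --loc=tdb-rebuilt tdb-pos.nt
```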
I tested the change here on that test file - I don't have a large
corrupt database to try it on.
Any ideas of how to get it fixed are more than welcome.
Personally, I would adopt a two-stream approach.
Do approach above and also collect all the data together and start a
fresh load of the database on another machine.
Good luck
Andy
Regards, Emilio
-- Emilio Migueláñez Martín [email protected]