Hi Andy,
I have done some testing.
> On 28/01/13 10:21, Emilio Miguelanez wrote:
>>
>> On 27 Jan 2013, at 22:04, Andy Seaborne wrote:
>>
>>> If select * { ?agent
>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>> <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>
>>> }
>>>
>>> works, it may be your lucky day. The SPO index is intact so
>>> tdbdump will work. Maybe.
>>>
>>> If you have the original data, then rebuilding is much safer.
>>> There may be other problems not yet encountered.
>>
>>
>> This query works .... what should I do now?
>>
>> If I run
>>
>> tdbdump --loc=tdb > tdb.dump (question: are tdbdump and tdbbackup
>> the same command?)
>
> Almost.
>
>> I get same error.
>
> Not your lucky day I'm afraid. The SPO index is damaged. It does however
> look as if another index is intact.
>
>> Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException:
>> RecordRangeIterator: records not strictly increasing:
>> 0000000006d00261000000000000021c0000000006cfff69 //
>> 0000000006b861a3000000000005233d00000000015a78b5
>>
>> I would like to try if the current tdb can be fixed, as rebuilding
>> could take long time. The database was created with minimal data,
>> and it is being populated (dynamically) with data over a long period
>> of time (> 1 year)
>
> SPO is the index used for iteration of the whole database. This can be
> changed.
>
> Is this a database of just triples? No named graphs? So far, the corruption
> looks to be in SPO (an index on the default graph).
The database started with a named graph, and is being populated with triples
over time.
> It will take some programming to fix this. No guarantees that it will work
> but I've experimented here.
>
> Take a backup of the database.
Done
>
> A/ (the second way is better)
I haven't tested this approach.
> If you know all the possible properties, then write code that loops on each
> of the properties and does
>
> defaultGraph.find(null, property, null)
>
> This will use the POS index.
>
> Print everything in N-Triples.
>
> B/ A different, better approach is to build a special version of TDB. The
> changes needed are small but you need to build Jena.
>
> These instructions apply to code in SVN as it is now, today. Not the last
> release, not last week. It's just easier to setup and explain from the
> current code base as a small recent change centralised the point you need to
> change and also introduced an easy to use testing feature.
>
> 1/ svn co the Jena code from trunk.
>
Done
> 2/ Build Jena
> mvn clean install
>
Done
> It is easier to build and install than just package.
>
> You must use the development releases of the other modules.
> I don't think you need to set up maven to use the snapshot builds on Apache
> but if you do:
>
> Set <repository>
> http://jena.apache.org/download/maven.html
>
> 3/ mvn eclipse:eclipse to use Eclipse if you plan to use that to edit the
> code.
Didn't set up maven or use Eclipse.
> 4/ Setup to use this build for tdbdump. e.g. the apache-jena or fuseki.
>
> For added ease - use the Fuseki server jar which has everything in it
>
> java -cp fuseki-server.jar tdb.tdbdump --version
java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump
--version
Jena: VERSION: 2.10.0-SNAPSHOT
Jena: BUILD_DATE: 2013-01-28T21:00:30+0000
ARQ: VERSION: 2.10.0-SNAPSHOT
ARQ: BUILD_DATE: 2013-01-28T21:00:30+0000
TDB: VERSION: 0.10.0-SNAPSHOT
TDB: BUILD_DATE: 2013-01-28T21:00:30+0000
> Check timestamps/version numbers.
>
> 5/ Test: create a small text file of a few triples.
>
> --- D.ttl
> @prefix : <http://example/> .
>
> :s1 :p 1 .
> :s2 :p 2 .
> :s3 :q 3 .
> :s2 :q 4 .
> :s1 :q 5 .
>
> ---
>
> tdbdump --data D.ttl should dump the file with triples clustered by subject.
>
> (no - you do not need to load a database - --data is a recent feature for
> testing)
java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump
--data D.ttl
<http://example/s1> <http://example/p>
"1"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s1> <http://example/q>
"5"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s2> <http://example/p>
"2"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s2> <http://example/q>
"4"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s3> <http://example/q>
"3"^^<http://www.w3.org/2001/XMLSchema#integer> .
> 6/ Edit com.hp.hpl.jena.tdb.index.TupleTable, static method
> "chooseScanAllIndex"
>
> Change:
> -----
> if ( tupleLen != 4 )
>     return indexes[0] ;
> ==>
> if ( tupleLen != 4 )
> {
>     if ( indexes.length == 3 )
>         return indexes[1] ;
>     else
>         return indexes[0] ;
> }
> -----
>
> 7/ Rebuild.
>
> Yes - the tests for TDB should pass!
>
> 8/ check the new version
>
> tdbdump --version
>
> check the change
>
> tdbdump --data D.ttl
>
> and it should be n-triples clustered by property, different to earlier on.
java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump
--data D.ttl
<http://example/s1> <http://example/p>
"1"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s2> <http://example/p>
"2"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s3> <http://example/q>
"3"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s2> <http://example/q>
"4"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s1> <http://example/q>
"5"^^<http://www.w3.org/2001/XMLSchema#integer> .
Is this what you expect?
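
To check my reading of the two dumps, here is a minimal standalone sketch that reproduces both orderings for the D.ttl triples. It uses no Jena classes: the class ScanOrderSketch, the methods patchedChoice and scan, and the plain value comparators are made up for illustration, and real TDB indexes order by internal node ids rather than by the values themselves. It only models the triples case of the step 6 change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ScanOrderSketch {
    /** One triple from D.ttl; the object is kept as an int to keep the sketch short. */
    static final class Triple {
        final String s, p;
        final int o;
        Triple(String s, String p, int o) { this.s = s; this.p = p; this.o = o; }
        @Override public String toString() { return ":" + s + " :" + p + " " + o + " ."; }
    }

    /** Assumed index order for triples: position 0 = SPO, 1 = POS, 2 = OSP. */
    static final String[] INDEX_NAMES = { "SPO", "POS", "OSP" };

    /** Triples-only version of the patched choice: pick position 1 when there are 3 indexes. */
    static int patchedChoice(int indexCount) {
        return (indexCount == 3) ? 1 : 0;
    }

    /** Sort order that stands in for scanning the named index from start to end. */
    static Comparator<Triple> order(String indexName) {
        Comparator<Triple> byS = Comparator.comparing((Triple t) -> t.s);
        Comparator<Triple> byP = Comparator.comparing((Triple t) -> t.p);
        Comparator<Triple> byO = Comparator.comparingInt((Triple t) -> t.o);
        switch (indexName) {
            case "SPO": return byS.thenComparing(byP).thenComparing(byO);
            case "POS": return byP.thenComparing(byO).thenComparing(byS);
            case "OSP": return byO.thenComparing(byS).thenComparing(byP);
            default: throw new IllegalArgumentException("unknown index: " + indexName);
        }
    }

    /** Returns the D.ttl triples in the order a full scan of the named index would give. */
    static List<String> scan(String indexName) {
        List<Triple> data = new ArrayList<>(Arrays.asList(
            new Triple("s1", "p", 1), new Triple("s2", "p", 2), new Triple("s3", "q", 3),
            new Triple("s2", "q", 4), new Triple("s1", "q", 5)));
        data.sort(order(indexName));
        List<String> out = new ArrayList<>();
        for (Triple t : data) out.add(t.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println("before the change (" + INDEX_NAMES[0] + "):");
        scan(INDEX_NAMES[0]).forEach(System.out::println);
        String chosen = INDEX_NAMES[patchedChoice(INDEX_NAMES.length)];
        System.out.println("after the change (" + chosen + "):");
        scan(chosen).forEach(System.out::println);
    }
}
```

On this particular test file the POS and OSP orders come out identical, so D.ttl alone would not tell indexes[1] and indexes[2] apart.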
>
> 9/ Dump your database.
>
> Hope there is a good index.
It works and no errors were reported. However, the dump file is only
84MB, which is considerably smaller than the actual tdb (~1GB).
> You can also try indexes[2] not indexes[1] to use the OSP index.
> Each dumps the entire database, but in different triple orders.
I also tried this change of index, and it gave me the same error:
Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException:
RecordRangeIterator: records not strictly increasing:
00000000021aa0a20000000006cffe6b000000000005233d //
00000000021a2c0a0000000006b85f9f000000000005233d
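
To see what the exception is complaining about, I compared the two records by hand. A minimal sketch of that comparison follows, assuming each 48-hex-digit record is three 8-byte fields compared left to right as unsigned numbers. That layout is my guess from the length of the records, not something confirmed in this thread, and the class and method names are made up for illustration.

```java
public class RecordOrderCheck {
    /** Splits a 48-hex-digit record into three unsigned 64-bit fields (assumed layout). */
    static long[] split(String record) {
        if (record.length() != 48) {
            throw new IllegalArgumentException("expected 48 hex digits, got " + record.length());
        }
        long[] fields = new long[3];
        for (int i = 0; i < 3; i++) {
            fields[i] = Long.parseUnsignedLong(record.substring(i * 16, (i + 1) * 16), 16);
        }
        return fields;
    }

    /** True only if 'next' sorts strictly after 'prev', field by field, unsigned. */
    static boolean strictlyIncreasing(String prev, String next) {
        long[] a = split(prev);
        long[] b = split(next);
        for (int i = 0; i < 3; i++) {
            int c = Long.compareUnsigned(a[i], b[i]);
            if (c != 0) {
                return c < 0;
            }
        }
        return false; // identical records are not strictly increasing
    }

    public static void main(String[] args) {
        // The two records from the exception above.
        String prev = "00000000021aa0a20000000006cffe6b000000000005233d";
        String next = "00000000021a2c0a0000000006b85f9f000000000005233d";
        System.out.println("strictly increasing: " + strictlyIncreasing(prev, next));
    }
}
```

Under that assumption the first field already goes backwards (021aa0a2 followed by 021a2c0a), and the same holds for the pair in the original SPO error (06d00261 followed by 06b861a3).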
> 10/ Clean up maven to get rid of the temporary build.
>
> rm -r REPO/org/apache/jena/
>
> 11/ Rebuild the database with tdbloader/tdbloader2.
java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbloader
--loc=tdb tdb.dump
but the resulting tdb is smaller than the original tdb.
> (the load is slower than if dumped in SPO order)
>
> I tested the change here on that test file - I don't have a large corrupt
> database to try it on.
>
>> Any ideas of how to get it fixed are more than welcome.
>
> Personally, I would adopt a 2 stream approach.
>
> Do approach above and also collect all the data together and start a fresh
> load of the database on another machine.
Doing it already.
Thanks,
Emilio
>
> Good luck
> Andy
>
>>
>> Regards, Emilio
>>
>>
>> -- Emilio Migueláñez Martín [email protected]
>>
>>
>>
>
--
Emilio Migueláñez Martín
[email protected]