Andy,
I have run a couple of query tests to see whether I get the same number of
triples on both instances of TDB:
select count(*) { ?agent <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>}
select count(*) { ?agent <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Job>}
select count(*) { ?agent <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Symptom>}
So far, the results are the same, so no loss of data.
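As a side note, the three counts could be folded into one query for future spot checks. A minimal sketch, assuming the ARQ build in use supports the SPARQL 1.1 VALUES and GROUP BY features:

```sparql
SELECT ?type (COUNT(?agent) AS ?n)
WHERE {
  VALUES ?type {
    <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>
    <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Job>
    <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Symptom>
  }
  ?agent <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .
}
GROUP BY ?type
```

Running this against each store and comparing the per-type counts is equivalent to the three separate queries.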
I have also run the query that used to generate the
StorageException: RecordRangeIterator error, and it works fine in the new TDB.
So, should I assume that the new TDB is fixed now? Should I run any other tests?
Would using transactions prevent this problem from happening again? (Remember, I
was using version 0.8.10.)
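For reference, from TDB 0.9.x onwards updates can be wrapped in transactions via the Dataset API. A minimal sketch, assuming the Jena/TDB jars on the classpath; the store location here is illustrative, not the real path:

```java
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.tdb.TDBFactory;

public class TxnSketch {
    public static void main(String[] args) {
        // "/path/to/tdb" is an illustrative location only.
        Dataset dataset = TDBFactory.createDataset("/path/to/tdb");
        dataset.begin(ReadWrite.WRITE);
        try {
            // ... add/delete triples via dataset.getDefaultModel() ...
            dataset.commit();   // changes become visible atomically
        } finally {
            dataset.end();      // always release the transaction
        }
    }
}
```

Whether this would have avoided the specific corruption seen here is a question for the list; the pattern above is simply the documented transactional usage.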
Thanks for your help.
Regards,
Emilio
On 29 Jan 2013, at 08:26, Andy Seaborne wrote:
>
>>> B/ A different, better approach is to build a special version of TDB. The
>>> changes needed are small but you need to build Jena.
>>>
>>> These instructions apply to code in SVN as it is now, today. Not the last
>>> release, not last week. It's just easier to setup and explain from the
>>> current code base as a small recent change centralised the point you need
>>> to change and also introduced an easy to use testing feature.
>>>
>>> 1/ svn co the Jena code from trunk.
>>>
>> Done
>>> 2/ Build Jena
>>> mvn clean install
>>>
>> Done
>>> It is easier to build and install than just package.
>>>
>>> You must use the development releases of the other modules.
>>> I don't think you need to set up maven to use the snapshot builds on Apache
>>> but if you do:
>>>
>>> Set <repository>
>>> http://jena.apache.org/download/maven.html
>>>
>>> 3/ mvn eclipse:eclipse to use Eclipse if you plan to use that to edit the
>>> code.
>> Didn't set up maven or use Eclipse.
>>
>>> 4/ Setup to use this build for tdbdump. e.g. the apache-jena or fuseki.
>>>
>>> For added ease, use the Fuseki server jar, which has everything in it:
>>>
>>> java -cp fuseki-server.jar tdb.tdbdump --version
>>
>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar
>> tdb.tdbdump --version
>>
>> Jena: VERSION: 2.10.0-SNAPSHOT
>> Jena: BUILD_DATE: 2013-01-28T21:00:30+0000
>> ARQ: VERSION: 2.10.0-SNAPSHOT
>> ARQ: BUILD_DATE: 2013-01-28T21:00:30+0000
>> TDB: VERSION: 0.10.0-SNAPSHOT
>> TDB: BUILD_DATE: 2013-01-28T21:00:30+0000
>>
>>> Check timestamps/version numbers.
>>>
>>> 5/ Test create a small text file of a few triples.
>>>
>>> --- D.ttl
>>> @prefix : <http://example/> .
>>>
>>> :s1 :p 1 .
>>> :s2 :p 2 .
>>> :s3 :q 3 .
>>> :s2 :q 4 .
>>> :s1 :q 5 .
>>>
>>> ---
>>>
>>> tdbdump --data D.ttl should dump the file with triples clustered by subject.
>>>
>>> (no - you do not need to load a database - --data is a recent feature for
>>> testing)
>>
>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar
>> tdb.tdbdump --data D.ttl
>> <http://example/s1> <http://example/p>
>> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s1> <http://example/q>
>> "5"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s2> <http://example/p>
>> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s2> <http://example/q>
>> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s3> <http://example/q>
>> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .
>>
>>> 6/ Edit com.hp.hpl.jena.tdb.index.TupleTable, static method
>>> "chooseScanAllIndex"
>>>
>>> Change:
>>> -----
>>> if ( tupleLen != 4 )
>>> return indexes[0] ;
>>> ==>
>>> if ( tupleLen != 4 )
>>> {
>>> if ( indexes.length == 3 )
>>> return indexes[1] ;
>>> else
>>> return indexes[0] ;
>>> }
>>> -----
>>>
>>> 7/ Rebuild.
>>>
>>> Yes - the tests for TDB should pass!
>>>
>>> 8/ check the new version
>>>
>>> tdbdump --version
>>>
>>> check the change
>>>
>>> tdbdump --data D.ttl
>>>
>>> and it should be N-Triples clustered by property, different from the earlier
>>> output.
>>
>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar
>> tdb.tdbdump --data D.ttl
>> <http://example/s1> <http://example/p>
>> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s2> <http://example/p>
>> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s3> <http://example/q>
>> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s2> <http://example/q>
>> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s1> <http://example/q>
>> "5"^^<http://www.w3.org/2001/XMLSchema#integer> .
>>
>> Is it what you expect?
>
> Yes.
>
>>
>>>
>>> 9/ Dump your database.
>>>
>>> Hope there is a good index.
>>
>> It works and no errors were reported; however, the size of the dump file is
>> just 84MB, which is considerably smaller than the actual TDB (~1GB)
>
> Quite possible - especially if you have also been deleting stuff in the
> database as well as adding.
>
>>
>>> You can also try indexes[2] not indexes[1] to use the OSP index.
>>> Each dumps the entire database, but in different triple orders.
>>
>> I also tried this change of indexes, and it gave me the same error:
>>
>> Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException:
>> RecordRangeIterator: records not strictly increasing:
>> 00000000021aa0a20000000006cffe6b000000000005233d //
>> 00000000021a2c0a0000000006b85f9f000000000005233d
>
> The OSP index is also broken.
>
>>
>>> 10/ Clean up maven to get rid of the temporary build.
>>>
>>> rm -r REPO/org/apache/jena/
>>>
>>> 11/ Rebuild the database with tdbloader/tdbloader2.
>>
>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar
>> tdb.tdbloader --loc=tdb tdb.dump
>>
>> but the size of the tdb is smaller than the original tdb
>
> The loader produces more compact indexes than if the data has been loaded
> incrementally. This is even more the case for tdbloader2.
>
> Also, if you have been deleting and adding with 0.8, the database can
> grow. This is addressed, but not totally fixed, in 0.9.x.
>
>>> (the load is slower than if dumped in SPO order)
>>>
>>> I tested the change here on that test file - I don't have a large corrupt
>>> database to try it on.
>>>
>>>> Any ideas of how to get it fixed are more than welcome.
>>>
>>> Personally, I would adopt a two-stream approach.
>>>
>>> Do the approach above, and also collect all the data together and start a
>>> fresh load of the database on another machine.
>>
>> Doing it already.
>
> Andy
>
>>
>> Thanks,
>> Emilio
>>
>>>
>>> Good luck
>>> Andy
>>>
>>>>
>>>> Regards, Emilio
>>
>
--
Emilio Migueláñez Martín
[email protected]