Andy,
I have run a couple of query tests to see whether I get the same number of
triples on both instances of TDB:
select count(*) { ?agent <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>}
select count(*) { ?agent <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Job>}
select count(*) { ?agent <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Symptom>}
So far, the results are the same, so no loss of data.
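As a side note, the three counts could be folded into one query for future spot checks. A minimal sketch, assuming the ARQ build in use supports the SPARQL 1.1 VALUES and GROUP BY features:

```sparql
SELECT ?type (COUNT(?agent) AS ?n)
WHERE {
  VALUES ?type {
    <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>
    <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Job>
    <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Symptom>
  }
  ?agent <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .
}
GROUP BY ?type
```

Running this against each store and comparing the per-type counts is equivalent to the three separate queries.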
I have also run the query that used to generate the
StorageException: RecordRangeIterator error, and it works fine in the new TDB.
So, should I assume that the new TDB is fixed now? Should I run any other tests?
Would using transactions prevent this problem from happening again? (Remember, I
was using version 0.8.10.)
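For reference, from TDB 0.9.x onwards updates can be wrapped in transactions via the Dataset API. A minimal sketch, assuming the Jena/TDB jars on the classpath; the store location here is illustrative, not the real path:

```java
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.tdb.TDBFactory;

public class TxnSketch {
    public static void main(String[] args) {
        // "/path/to/tdb" is an illustrative location only.
        Dataset dataset = TDBFactory.createDataset("/path/to/tdb");
        dataset.begin(ReadWrite.WRITE);
        try {
            // ... add/delete triples via dataset.getDefaultModel() ...
            dataset.commit();   // changes become visible atomically
        } finally {
            dataset.end();      // always release the transaction
        }
    }
}
```

Whether this would have avoided the specific corruption seen here is a question for the list; the pattern above is simply the documented transactional usage.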
Thanks for your help.
Regards,
Emilio
On 29 Jan 2013, at 08:26, Andy Seaborne wrote:
>
>>> B/ A different, better approach is to build a special version of TDB. The
>>> changes needed are small but you need to build Jena.
>>>
>>> These instructions apply to code in SVN as it is now, today. Not the last
>>> release, not last week. It's just easier to setup and explain from the
>>> current code base as a small recent change centralised the point you need
>>> to change and also introduced an easy to use testing feature.
>>>
>>> 1/ svn co the Jena code from trunk.
>>>
>> Done
>>> 2/ Build Jena
>>> mvn clean install
>>>
>> Done
>>> It is easier to build and install than just package.
>>>
>>> You must use the development releases of the other modules.
>>> I don't think you need to set up maven to use the snapshot builds on Apache
>>> but if you do:
>>>
>>> Set <repository>
>>> http://jena.apache.org/download/maven.html
>>>
>>> 3/ mvn eclipse:eclipse to use Eclipse if you plan to use that to edit the
>>> code.
>> Didn't set up maven or use Eclipse.
>>
>>> 4/ Setup to use this build for tdbdump. e.g. the apache-jena or fuseki.
>>>
>>> For added ease, use the Fuseki server jar, which has everything in it:
>>>
>>> java -cp fuseki-server.jar tdb.tdbdump --version
>>
>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar
>> tdb.tdbdump --version
>>
>> Jena: VERSION: 2.10.0-SNAPSHOT
>> Jena: BUILD_DATE: 2013-01-28T21:00:30+0000
>> ARQ: VERSION: 2.10.0-SNAPSHOT
>> ARQ: BUILD_DATE: 2013-01-28T21:00:30+0000
>> TDB: VERSION: 0.10.0-SNAPSHOT
>> TDB: BUILD_DATE: 2013-01-28T21:00:30+0000
>>
>>> Check timestamps/version numbers.
>>>
>>> 5/ Test create a small text file of a few triples.
>>>
>>> --- D.ttl
>>> @prefix : <http://example/> .
>>>
>>> :s1 :p 1 .
>>> :s2 :p 2 .
>>> :s3 :q 3 .
>>> :s2 :q 4 .
>>> :s1 :q 5 .
>>>
>>> ---
>>>
>>> tdbdump --data D.ttl should dump the file with triples clustered by subject.
>>>
>>> (no - you do not need to load a database - --data is a recent feature for
>>> testing)
>>
>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar
>> tdb.tdbdump --data D.ttl
>> <http://example/s1> <http://example/p>
>> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s1> <http://example/q>
>> "5"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s2> <http://example/p>
>> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s2> <http://example/q>
>> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s3> <http://example/q>
>> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .
>>
>>> 6/ Edit com.hp.hpl.jena.tdb.index.TupleTable, static method
>>> "chooseScanAllIndex"
>>>
>>> Change:
>>> -----
>>> if ( tupleLen != 4 )
>>> return indexes[0] ;
>>> ==>
>>> if ( tupleLen != 4 )
>>> {
>>> if ( indexes.length == 3 )
>>> return indexes[1] ;
>>> else
>>> return indexes[0] ;
>>> }
>>> -----
>>>
>>> 7/ Rebuild.
>>>
>>> Yes - the tests for TDB should pass!
>>>
>>> 8/ check the new version
>>>
>>> tdbdump --version
>>>
>>> check the change
>>>
>>> tdbdump --data D.ttl
>>>
>>> and it should be N-Triples clustered by property, different from the earlier
>>> output.
>>
>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar
>> tdb.tdbdump --data D.ttl
>> <http://example/s1> <http://example/p>
>> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s2> <http://example/p>
>> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s3> <http://example/q>
>> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s2> <http://example/q>
>> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s1> <http://example/q>
>> "5"^^<http://www.w3.org/2001/XMLSchema#integer> .
>>
>> Is it what you expect?
>
> Yes.
>
>>
>>>
>>> 9/ Dump your database.
>>>
>>> Hope there is a good index.
>>
>> It works and no errors were reported; however, the size of the dump file is
>> just 84MB, which is considerably smaller than the actual TDB (~1GB)
>
> Quite possible - especially if you have also been deleting stuff in the
> database as well as adding.
>
>>
>>> You can also try indexes[2] not indexes[1] to use the OSP index.
>>> Each dumps the entire database, but in different triple orders.
>>
>> I also tried this change of indexes, and it gave me the same error:
>>
>> Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException:
>> RecordRangeIterator: records not strictly increasing:
>> 00000000021aa0a20000000006cffe6b000000000005233d //
>> 00000000021a2c0a0000000006b85f9f000000000005233d
>
> The OSP index is also broken.
>
>>
>>> 10/ Clean up maven to get rid of the temporary build.
>>>
>>> rm -r REPO/org/apache/jena/
>>>
>>> 11/ Rebuild the database with tdbloader/tdbloader2.
>>
>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar
>> tdb.tdbloader --loc=tdb tdb.dump
>>
>> but the size of the tdb is smaller than the original tdb
>
> The loader produces more compact indexes than if the data has been loaded
> incrementally. This is even more the case for tdbloader2.
>
> Also, if you have been deleting and adding with 0.8, the database can
> grow. This is addressed, but not totally fixed, in 0.9.x.
>
>>> (the load is slower than if dumped in SPO order)
>>>
>>> I tested the change here on that test file - I don't have a large corrupt
>>> database to try it on.
>>>
>>>> Any ideas of how to get it fixed are more than welcome.
>>>
>>> Personally, I would adopt a two-stream approach.
>>>
>>> Do the approach above, and also collect all the data together and start a
>>> fresh load of the database on another machine.
>>
>> Doing it already.
>
> Andy
>
>>
>> Thanks,
>> Emilio
>>
>>>
>>> Good luck
>>> Andy
>>>
>>>>
>>>> Regards, Emilio
>>
>
--
Emilio Migueláñez Martín
[email protected]