Re: TDB instance corrupted by nulls

Jean-Marc Vanel Tue, 30 Dec 2014 13:00:29 -0800

Thanks Andy,

Indeed, the little program I use to populate the database was missing a
transaction.
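
For the archive, here is a toy model of the mechanism Andy describes below. It is plain Java with no Jena dependency, and ToyStore and every method on it are names invented for this sketch; it only mirrors the explanation quoted below and says nothing more about how TDB is really implemented. The index is treated as always persistent (standing in for the memory-mapped files), while new nodes sit in a cache until they are flushed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: hypothetical names, not TDB's actual internals. */
final class ToyStore {
    // Node table as it exists on disk: NodeId -> node text.
    private final Map<Long, String> nodesOnDisk = new HashMap<>();
    // Non-write-through cache: new nodes stay here until flushed.
    private final Map<Long, String> nodeCache = new HashMap<>();
    // Reverse lookup so that a node seen before reuses its NodeId.
    private final Map<String, Long> ids = new HashMap<>();
    // Index of (s, p, o) NodeIds; every write counts as already persistent.
    private final List<long[]> index = new ArrayList<>();
    private long nextId = 1;

    private long allocate(String node) {
        Long existing = ids.get(node);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        ids.put(node, id);
        nodeCache.put(id, node);
        return id;
    }

    /** Adds a triple to the index without flushing the node cache. */
    void addUnsafe(String s, String p, String o) {
        index.add(new long[] { allocate(s), allocate(p), allocate(o) });
    }

    /** Adds a triple, then flushes the cache: roughly what a commit should guarantee. */
    void addAndCommit(String s, String p, String o) {
        addUnsafe(s, p, o);
        nodesOnDisk.putAll(nodeCache);
        nodeCache.clear();
    }

    /** Simulates an unclean exit: whatever was not flushed is lost. */
    void crash() {
        nodeCache.clear();
    }

    /**
     * Dumps the index by resolving NodeIds against the on-disk table only.
     * With skipBroken=false the first unresolvable node aborts the dump;
     * with skipBroken=true such tuples are silently dropped.
     */
    List<String> dump(boolean skipBroken) {
        List<String> out = new ArrayList<>();
        for (long[] t : index) {
            String s = nodesOnDisk.get(t[0]);
            String p = nodesOnDisk.get(t[1]);
            String o = nodesOnDisk.get(t[2]);
            if (s == null || p == null || o == null) {
                if (skipBroken) {
                    continue;
                }
                throw new UnsupportedOperationException("Triple: node cannot be null");
            }
            out.add(s + " " + p + " " + o + " .");
        }
        return out;
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.addAndCommit("<s1>", "<p>", "<o1>"); // nodes reach the disk
        store.addUnsafe("<s2>", "<p>", "<o2>");    // index updated, nodes only cached
        store.crash();
        System.out.println(store.dump(true));      // only the committed triple survives
        try {
            store.dump(false);
        } catch (UnsupportedOperationException e) {
            System.out.println("strict dump failed: " + e.getMessage());
        }
    }
}
```

In this sketch the second tuple ends up with a null subject and object while its predicate still resolves, because the predicate was already on disk from the first triple; that is the same shape as the bad quad I reported. The skipBroken flag corresponds loosely to the "recover from errors and reload" idea, with the caveat Andy gives: what gets dropped is not predictable in a real store.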


FYI, the rest of the application is a generic RDF manipulation web UI:
https://github.com/jmvanel/semantic_forms/blob/master/scala/forms_play/README.md


2014-12-30 13:20 GMT+01:00 Andy Seaborne <[email protected]>:

> Hi there,
>
> The way I have seen this happen is when the database was updated
> non-transactionally at some time in the past.
>
> The node table didn't get flushed to disk when the index was updated.
> Because indexes are memory mapped files and the node table is a
> non-write-through cache, the index can be ahead of the on-disk node table.
>
> When you come to read the quads, the index says there is a node by NodeId,
> but the system can't retrieve it.  A null ends up in the subject used to
> create the quad to write in the dump which triggers the consistency check.
>
>>     1. how can I avoid in the first place to insert such bad stuff in TDB ?
>>
> Transactions.
>
> Or if bulk loaded, let the bulk loader finish cleanly.
>
>>     2. are there ways to "purge" the database of the bad stuff ?
>>
> Not in any practical way.
>
> It would be possible to write a clean-up utility (roughly - fix tdbdump to
> recover from errors and reload). Fixing in-place would be possible but it
> needs working below the RDF level.
>
> In both cases, it is going to be quite unpredictable as to what data is
> lost.
>
>         Andy
>
>
> On 29/12/14 18:26, Jean-Marc Vanel wrote:
>
>> Hi
>>
>> I prepared, hopefully in a reproducible way,
>> a TDB instance with a quad having a null subject ( read below ) , and
>> tdbdump fails :
>>
>> Exception in thread "main" java.lang.UnsupportedOperationException: Quad: subject cannot be null
>> at com.hp.hpl.jena.sparql.core.Quad.<init>(Quad.java:62)
>> at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:161)
>> at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:152)
>> at com.hp.hpl.jena.tdb.lib.TupleLib.access$100(TupleLib.java:44)
>> at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:86)
>> at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:82)
>> at org.apache.jena.atlas.iterator.Iter$4.next(Iter.java:323)
>> at org.apache.jena.atlas.iterator.IteratorCons.next(IteratorCons.java:97)
>> at org.apache.jena.riot.system.StreamOps.sendQuadsToStream(StreamOps.java:143)
>> at org.apache.jena.riot.writer.NQuadsWriter.write$(NQuadsWriter.java:63)
>> at org.apache.jena.riot.writer.NQuadsWriter.write(NQuadsWriter.java:46)
>> at org.apache.jena.riot.writer.NQuadsWriter.write(NQuadsWriter.java:92)
>> at org.apache.jena.riot.RDFDataMgr.write$(RDFDataMgr.java:1331)
>> at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1205)
>> at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1195)
>> at tdb.tdbdump.exec(tdbdump.java:50)
>> at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
>> at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
>> at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
>> at tdb.tdbdump.main(tdbdump.java:32)
>>
>> Actually the bad quad has all nulls, except the predicate, which is
>> *rdfs:rest*.
>> Certainly some RDF list was badly constructed by my code,
>> but I have 2 questions:
>>
>>
>>     1. how can I avoid in the first place to insert such bad stuff in TDB ?
>>     2. are there ways to "purge" the database of the bad stuff ?
>>
>>
>> Actually this is just test data in my case, but the question is important
>> anyway.
>>
>>
>


-- 
Jean-Marc Vanel
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
http://deductions-software.com/
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
