Hi there,

I have seen this happen when the database was updated non-transactionally at some point in the past.

The node table didn't get flushed to disk when the index was updated. Because the indexes are memory-mapped files and the node table uses a non-write-through cache, the index can get ahead of the on-disk node table.

When the quads are read back, the index refers to a node by NodeId, but the system can't retrieve that node. A null then ends up as the subject of the quad being created for the dump, which triggers the consistency check.
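As a rough illustration of that failure mode, here is a toy model in plain Java. It is not TDB code: `ToyStore`, `addNonTransactional`, `flushNodes`, `simulateCrash` and `resolve` are hypothetical names invented for this sketch, and the real node table and indexes are far more involved. It only shows how an index entry that reached disk can refer to a NodeId whose node never did.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names come from TDB.
final class ToyStore {
    // "On-disk" node table: NodeId -> node.
    private final Map<Long, String> nodesOnDisk = new HashMap<>();
    // In-memory node cache; assumed NOT write-through in this toy.
    private final Map<Long, String> nodeCache = new HashMap<>();
    // Index of NodeIds; every write is treated as already on disk,
    // standing in for a memory-mapped file.
    private final List<Long> indexOnDisk = new ArrayList<>();
    private long nextId = 1;

    // Non-transactional update: the index is durable at once,
    // the node only lives in the cache until flushNodes() is called.
    long addNonTransactional(String node) {
        long id = nextId++;
        nodeCache.put(id, node);
        indexOnDisk.add(id);
        return id;
    }

    // Write the cached nodes through to the "on-disk" table.
    void flushNodes() {
        nodesOnDisk.putAll(nodeCache);
    }

    // Lose everything that was only in memory.
    void simulateCrash() {
        nodeCache.clear();
    }

    // Look a NodeId up; returns null if the node was never flushed.
    String resolve(long id) {
        String cached = nodeCache.get(id);
        return cached != null ? cached : nodesOnDisk.get(id);
    }

    List<Long> index() {
        return indexOnDisk;
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.addNonTransactional("<http://example/a>");
        store.flushNodes();                       // node 1 reaches disk
        store.addNonTransactional("<http://example/b>");
        store.simulateCrash();                    // node 2 never does
        for (long id : store.index()) {
            System.out.println(id + " -> " + store.resolve(id));
        }
        // prints:
        // 1 -> <http://example/a>
        // 2 -> null
    }
}
```

The second index entry survives the simulated crash but resolves to null, which is the shape of the problem described above.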

    1. how can I avoid in the first place to insert such bad stuff in TDB ?

Transactions.

Or, if the data is bulk loaded, let the bulk loader finish cleanly.

    2. are there ways to "purge" the database of the bad stuff ?

Not in any practical way.

It would be possible to write a clean-up utility (roughly: fix tdbdump to recover from errors, then reload). Fixing in place would also be possible, but it requires working below the RDF level.

In both cases, which data is lost is going to be quite unpredictable.
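To make the clean-up idea a little more concrete, below is a minimal sketch of a dump that skips tuples it cannot resolve instead of failing on the first one. It is plain Java over toy data structures, not a patch to tdbdump: `RecoveringDump`, `Result` and `dump` are hypothetical names, and the index is assumed to hold tuples of NodeIds in subject, predicate, object, graph order. Any tuple with an unresolvable NodeId is dropped and counted, which is exactly where the data loss comes from.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch only: not tdbdump, and not TDB's internal API.
final class RecoveringDump {
    // The quads that could be rebuilt, plus a count of dropped tuples.
    static final class Result {
        final List<String> quads = new ArrayList<>();
        int dropped = 0;
    }

    // Rebuild one N-Quads-like line per tuple; skip any tuple in which
    // a NodeId has no entry in the node table.
    static Result dump(List<long[]> index, Map<Long, String> nodeTable) {
        Result result = new Result();
        for (long[] tuple : index) {
            StringBuilder line = new StringBuilder();
            boolean complete = true;
            for (long id : tuple) {
                String node = nodeTable.get(id);
                if (node == null) {       // dangling NodeId
                    complete = false;
                    break;
                }
                line.append(node).append(' ');
            }
            if (complete) {
                result.quads.add(line.append('.').toString());
            } else {
                result.dropped++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, String> nodes = new HashMap<>();
        nodes.put(1L, "<http://example/s>");
        nodes.put(2L, "<http://example/p>");
        nodes.put(3L, "\"o\"");
        nodes.put(4L, "<http://example/g>");

        List<long[]> index = new ArrayList<>();
        index.add(new long[] {1, 2, 3, 4});   // fully resolvable
        index.add(new long[] {9, 2, 9, 9});   // only the predicate resolves

        Result r = dump(index, nodes);
        System.out.println(r.quads.size() + " written, " + r.dropped + " dropped");
        // prints: 1 written, 1 dropped
    }
}
```

The surviving lines could then be reloaded into a fresh database; what the dropped tuples originally said cannot be recovered from this information alone.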

        Andy

On 29/12/14 18:26, Jean-Marc Vanel wrote:
Hi

I prepared, hopefully in a reproducible way,
a TDB instance with a quad having a null subject (read below), and
tdbdump fails:

Exception in thread "main" java.lang.UnsupportedOperationException: Quad: subject cannot be null
at com.hp.hpl.jena.sparql.core.Quad.<init>(Quad.java:62)
at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:161)
at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:152)
at com.hp.hpl.jena.tdb.lib.TupleLib.access$100(TupleLib.java:44)
at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:86)
at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:82)
at org.apache.jena.atlas.iterator.Iter$4.next(Iter.java:323)
at org.apache.jena.atlas.iterator.IteratorCons.next(IteratorCons.java:97)
at org.apache.jena.riot.system.StreamOps.sendQuadsToStream(StreamOps.java:143)
at org.apache.jena.riot.writer.NQuadsWriter.write$(NQuadsWriter.java:63)
at org.apache.jena.riot.writer.NQuadsWriter.write(NQuadsWriter.java:46)
at org.apache.jena.riot.writer.NQuadsWriter.write(NQuadsWriter.java:92)
at org.apache.jena.riot.RDFDataMgr.write$(RDFDataMgr.java:1331)
at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1205)
at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1195)
at tdb.tdbdump.exec(tdbdump.java:50)
at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
at tdb.tdbdump.main(tdbdump.java:32)

Actually the bad quad has all nulls, except the predicate, which is *rdfs:rest*.
Certainly some RDF list was badly constructed by my code,
but I have 2 questions:


    1. how can I avoid in the first place to insert such bad stuff in TDB ?
    2. are there ways to "purge" the database of the bad stuff ?


Actually this is just test data in my case, but the question is important
anyway.

