Thanks Andy; indeed, my little program that populates the database was missing a transaction.
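A minimal sketch of wrapping a loader's writes in a TDB write transaction, assuming the Jena 2.x package names that appear in the stack trace quoted below. The directory name `TDB` and the example.org triple are placeholders, and this is not the actual program from the repository:

```scala
import com.hp.hpl.jena.query.{Dataset, ReadWrite}
import com.hp.hpl.jena.tdb.TDBFactory

object PopulateTdb extends App {
  // "TDB" is a hypothetical database directory; substitute the real location.
  val dataset: Dataset = TDBFactory.createDataset("TDB")

  // All updates happen inside a write transaction.
  dataset.begin(ReadWrite.WRITE)
  try {
    val model = dataset.getDefaultModel
    // Placeholder triple; a real loader would add its own data here.
    val s = model.createResource("http://example.org/s")
    val p = model.createProperty("http://example.org/p")
    s.addProperty(p, "o")
    // Commit so that the update is written as one unit.
    dataset.commit()
  } finally {
    // Always end the transaction; if commit() was not reached, the changes are discarded.
    dataset.end()
  }
}
```

Once a TDB dataset has been used transactionally, it seems safest to keep every later update inside `begin`/`commit`/`end` as well, rather than mixing the two styles.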
FYI the rest is a generic RDF manipulation web UI :
https://github.com/jmvanel/semantic_forms/blob/master/scala/forms_play/README.md

2014-12-30 13:20 GMT+01:00 Andy Seaborne <[email protected]>:

> Hi there,
>
> The way I have seen this happen is when the database was updated
> non-transactionally at some time in the past.
>
> The node table didn't get flushed to disk when the index was updated.
> Because indexes are memory mapped files and the node table is a
> non-write-through cache, the index can be ahead of the on-disk node table.
>
> When you come to read the quads, the index says there is a node by NodeId,
> but the system can't retrieve it. A null ends up in the subject used to
> create the quad to write in the dump which triggers the consistency check.
>
>> 1. how can I avoid in the first place to insert such bad stuff in TDB ?
>
> Transactions.
>
> Or if bulk loaded, let the bulk loader finish cleanly.
>
>> 2. are there ways to "purge" the database of the bad stuff ?
>
> Not in any practical way.
>
> It would be possible to write a clean-up utility (roughly - fix tdbdump to
> recover from errors and reload). Fixing in-place would be possible but it
> needs working below the RDF level.
>
> In both cases, it is going to be quite unpredictable as to what data is
> lost.
>
> Andy
>
>
> On 29/12/14 18:26, Jean-Marc Vanel wrote:
>
>> Hi
>>
>> I prepared, hopefully in a reproductible way,
>> a TDB instance with a quad having a null subject ( read below ) , and
>> tdbdump fails :
>>
>> Exception in thread "main" java.lang.UnsupportedOperationException: Quad: subject cannot be null
>>     at com.hp.hpl.jena.sparql.core.Quad.<init>(Quad.java:62)
>>     at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:161)
>>     at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:152)
>>     at com.hp.hpl.jena.tdb.lib.TupleLib.access$100(TupleLib.java:44)
>>     at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:86)
>>     at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:82)
>>     at org.apache.jena.atlas.iterator.Iter$4.next(Iter.java:323)
>>     at org.apache.jena.atlas.iterator.IteratorCons.next(IteratorCons.java:97)
>>     at org.apache.jena.riot.system.StreamOps.sendQuadsToStream(StreamOps.java:143)
>>     at org.apache.jena.riot.writer.NQuadsWriter.write$(NQuadsWriter.java:63)
>>     at org.apache.jena.riot.writer.NQuadsWriter.write(NQuadsWriter.java:46)
>>     at org.apache.jena.riot.writer.NQuadsWriter.write(NQuadsWriter.java:92)
>>     at org.apache.jena.riot.RDFDataMgr.write$(RDFDataMgr.java:1331)
>>     at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1205)
>>     at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1195)
>>     at tdb.tdbdump.exec(tdbdump.java:50)
>>     at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
>>     at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
>>     at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
>>     at tdb.tdbdump.main(tdbdump.java:32)
>>
>> Actually the bad quad has all null, except the predicate that is
>> *rdfs:rest* .
>> Certainly some RDF list was badly constructed by my code,
>> but I have 2 questions:
>>
>>
>>    1. how can I avoid in the first place to insert such bad stuff in TDB ?
>>    2. are there ways to "purge" the database of the bad stuff ?
>>
>>
>> Actually this is just test data in my case, but the question is important
>> anyway.
>>

--
Jean-Marc Vanel
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
http://deductions-software.com/
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
