Hi Donald,
The bulk loader tdbloader is not transactional, and if it is aborted part way
through, the database is suspect. You *may* find that deleting the
prefix tables sorts things out, but there is a good chance the triple
indexes or the node table are broken as well.
TDB has two bulk loaders; tdbloader2 can be faster for larger datasets.
Whichever one you use, more RAM (not Java heap) improves loading
performance.
Both bulk loaders can only do anything special if the database is
initially empty. tdbloader2 simply refuses to load into an existing
database; tdbloader will load, but without the bulk-loading optimizations.
To load multiple files into an empty database, give all the files in a
single command.
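As a rough sketch of the single-command approach: the wrapper below checks
that the database directory is empty and then passes every file to one
tdbloader run. The function name load_all and the TDBLOADER variable are made
up for this example, and the paths are placeholders; --loc is the option used
in the command quoted further down.

```shell
# Sketch only: load_all and TDBLOADER are hypothetical names for this example.
TDBLOADER="${TDBLOADER:-tdbloader}"

load_all() {
    db="$1"; shift
    mkdir -p "$db" || return 1
    # Stop if the database directory already has content, since the
    # bulk loader only helps when it starts from an empty database.
    if [ -n "$(ls -A "$db")" ]; then
        echo "refusing: $db is not empty" >&2
        return 1
    fi
    # One invocation with all the input files.
    "$TDBLOADER" --loc="$db" "$@"
}

# Usage (placeholder paths):
#   load_all /path/to/db file1.ttl.gz file2.ttl.gz
```

The empty-directory check is an extra guard added here, not something
tdbloader does itself.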
I tried downloading PubChem but failed (the server didn't like the bulk
loader I was using). Are you loading the whole thing, which is about
1.6 billion triples? You will need a large RAM machine to use TDB.
Having an SSD makes a big difference.
Once you have loaded the database, you can move it to another machine
by simply copying the directory when no program is connected to the
database.
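A minimal sketch of such a copy, assuming nothing has the database open at
the time. copy_db is a made-up helper name and the paths are placeholders; it
copies locally, and the same directory could instead be sent to another host
with a tool such as scp -r or rsync.

```shell
# Sketch only: copy_db is a hypothetical helper for this example.
# Run it only when no program is connected to the database.
copy_db() {
    src="$1"; dest="$2"
    [ -d "$src" ] || { echo "no such database: $src" >&2; return 1; }
    # Do not copy into an existing location, to avoid mixing two databases.
    [ ! -e "$dest" ] || { echo "destination exists: $dest" >&2; return 1; }
    # -R copies the whole directory tree, -p preserves modes and timestamps.
    cp -Rp "$src" "$dest"
}

# Usage (placeholder paths):
#   copy_db /path/to/db /path/to/db-copy
```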
Andy
On 24/07/15 15:40, Pellegrino, Donald (DA) wrote:
I attempted to load the NCBI PubChem RDF Compound data
(https://pubchem.ncbi.nlm.nih.gov/rdf/#_Toc421254632) into an Apache Jena TDB database.
Given 18 hours, the load of PubChem RDF Compound data was only 12/109 .ttl.gz files (11%)
complete. Therefore, I hit CTRL+C to cancel the tdbloader operation and try other
approaches. Unfortunately, now when I try to run tdbloader I get "WARN
DatasetPrefixesTDB :: Mangled prefix map: graph name=" followed by a
java.lang.NullPointerException. Partial tdbloader error output is below.
Please let me know if you have any suggestions for debugging this error.
---
tdbloader --verbose --loc=/home/irkmoo/reactionsdb/ pc_compound_type.ttl.gz
Java maximum memory: 1029177344
symbol:http://jena.hpl.hp.com/ARQ#constantBNodeLabels = true
symbol:http://jena.hpl.hp.com/ARQ#regexImpl = symbol:http://jena.hpl.hp.com/ARQ#javaRegex
symbol:http://jena.hpl.hp.com/ARQ#stageGenerator = com.hp.hpl.jena.tdb.solver.StageGeneratorDirectTDB@18078bef
symbol:http://jena.hpl.hp.com/ARQ#strictSPARQL = false
symbol:http://jena.hpl.hp.com/ARQ#enablePropertyFunctions = true
10:26:22 INFO loader :: -- Start triples data phase
10:26:22 INFO loader :: ** Load into triples table with existing data
10:26:22 INFO loader :: -- Start quads data phase
10:26:22 INFO loader :: ** Load empty quads table
10:26:22 INFO loader :: Load: pc_compound_type.ttl.gz -- 2015/07/24 10:26:22 EDT
10:26:22 WARN DatasetPrefixesTDB :: Mangled prefix map: graph name=
java.lang.NullPointerException
at com.hp.hpl.jena.tdb.store.DatasetPrefixesTDB.readPrefixMap(DatasetPrefixesTDB.java:119)
at com.hp.hpl.jena.sparql.graph.GraphPrefixesProjection.getNsPrefixMap(GraphPrefixesProjection.java:62)
at com.hp.hpl.jena.tdb.store.DatasetPrefixesTDB.getPrefixMapping(DatasetPrefixesTDB.java:168)
at com.hp.hpl.jena.tdb.store.DatasetPrefixesTDB.getPrefixMapping(DatasetPrefixesTDB.java:160)
at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$DestinationDSG.prefix(BulkLoader.java:272)
at org.apache.jena.riot.lang.LangTurtleBase.emitPrefix(LangTurtleBase.java:492)
at org.apache.jena.riot.lang.LangTurtleBase.directivePrefix(LangTurtleBase.java:164)
at org.apache.jena.riot.lang.LangTurtleBase.directive(LangTurtleBase.java:140)
at org.apache.jena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:79)
at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:182)
at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:906)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:687)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:666)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:654)
at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:148)
at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:114)
at com.hp.hpl.jena.tdb.TDBLoader.loadDataset$(TDBLoader.java:261)
at com.hp.hpl.jena.tdb.TDBLoader.loadDataset(TDBLoader.java:193)
at com.hp.hpl.jena.tdb.TDBLoader.load(TDBLoader.java:74)
at tdb.tdbloader.loadQuads(tdbloader.java:118)
at tdb.tdbloader.exec(tdbloader.java:86)
at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
at tdb.tdbloader.main(tdbloader.java:44)
10:26:22 WARN DatasetPrefixesTDB :: Mangled prefix map: graph name=
java.lang.NullPointerException
...