Hi Donald,

The bulk loader tdbloader is not transactional, and if it is aborted part way through, the database is suspect. You *may* find that deleting the prefix tables sorts things out, but there is a good chance the triple indexes or the node table are broken as well.

TDB has two bulk loaders; tdbloader2 can be faster for larger datasets. Whichever one you use, more RAM (not Java heap) improves loading performance.

Both bulk loaders can only do anything special if the database is initially empty. tdbloader2 simply refuses to load an existing database, tdbloader,

You can give all the files to load in a single command to load multiple files into an empty database.
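As a rough sketch of that (the directory names are placeholders, not taken from your setup), something like the following checks that the target directory is empty and then names every file in one tdbloader run. It is a dry run by default: it only prints the command unless the hypothetical RUN_LOAD variable is set to 1.

```shell
# Placeholder locations (assumptions): adjust for your own machine.
DB_DIR="${DB_DIR:-./pubchem-db}"
DATA_DIR="${DATA_DIR:-./pubchem-data}"

# The bulk loaders only do their special work on an empty database,
# so check whether the target directory already has files in it.
if [ -d "$DB_DIR" ] && [ -n "$(ls -A "$DB_DIR")" ]; then
    DB_STATE="non-empty"
else
    DB_STATE="empty"
fi
echo "database directory is $DB_STATE"

# One command naming all the input files; the glob is expanded at run time.
LOAD_CMD="tdbloader --loc=$DB_DIR $DATA_DIR/*.ttl.gz"
echo "$LOAD_CMD"

# Dry run by default: only load when explicitly requested.
if [ "$DB_STATE" = "empty" ] && [ "${RUN_LOAD:-0}" = "1" ]; then
    tdbloader --loc="$DB_DIR" "$DATA_DIR"/*.ttl.gz
fi
```

If the directory is reported as non-empty after an aborted load, it is probably safer to point --loc at a fresh directory than to reuse the old one.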

I tried downloading PubChem but failed (the server didn't like the bulk loader I was using). Are you loading the whole thing, which is about 1.6 billion triples? You will need a machine with a lot of RAM to use TDB. Having an SSD makes a big difference.

Once you have loaded the database, you can move it to another machine by simply copying the directory while no program is connected to the database.
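To illustrate the copy step, here is a minimal sketch using a stand-in directory created under a temporary location (the directory and file names are made up for the example; a real database would be whatever you passed to --loc). It copies the directory and then compares the two trees:

```shell
# Stand-in for a loaded database directory (hypothetical names).
WORK="$(mktemp -d)"
SRC_DB="$WORK/reactionsdb"
DEST_DB="$WORK/reactionsdb-copy"
mkdir "$SRC_DB"
echo "placeholder" > "$SRC_DB/example.dat"

# Copy the whole directory; do this only when nothing has the database open.
cp -R "$SRC_DB" "$DEST_DB"

# Check that the copy is complete before relying on it.
if diff -r "$SRC_DB" "$DEST_DB" >/dev/null; then
    COPY_STATUS="identical"
else
    COPY_STATUS="differs"
fi
echo "copy $COPY_STATUS"
```

For a move between machines, the same idea applies with whatever transfer tool you prefer in place of cp.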

        Andy

On 24/07/15 15:40, Pellegrino, Donald (DA) wrote:
I attempted to load the NCBI PubChem RDF Compound data 
(https://pubchem.ncbi.nlm.nih.gov/rdf/#_Toc421254632) into an Apache Jena TDB database. 
Given 18 hours, the load of PubChem RDF Compound data was only 12/109 .ttl.gz files (11%) 
complete. Therefore, I hit CTRL+C to cancel the tdbloader operation and try other 
approaches. Unfortunately, now when I try to run tdbloader I get "WARN  
DatasetPrefixesTDB   :: Mangled prefix map: graph name=" followed by a 
java.lang.NullPointerException. Partial tdbloader error output is below.

Please let me know if you have any suggestions for debugging this error.

---

tdbloader --verbose --loc=/home/irkmoo/reactionsdb/ pc_compound_type.ttl.gz
Java maximum memory: 1029177344
symbol:http://jena.hpl.hp.com/ARQ#constantBNodeLabels = true
symbol:http://jena.hpl.hp.com/ARQ#regexImpl = symbol:http://jena.hpl.hp.com/ARQ#javaRegex
symbol:http://jena.hpl.hp.com/ARQ#stageGenerator = com.hp.hpl.jena.tdb.solver.StageGeneratorDirectTDB@18078bef
symbol:http://jena.hpl.hp.com/ARQ#strictSPARQL = false
symbol:http://jena.hpl.hp.com/ARQ#enablePropertyFunctions = true
10:26:22 INFO  loader               :: -- Start triples data phase
10:26:22 INFO  loader               :: ** Load into triples table with existing data
10:26:22 INFO  loader               :: -- Start quads data phase
10:26:22 INFO  loader               :: ** Load empty quads table
10:26:22 INFO  loader               :: Load: pc_compound_type.ttl.gz -- 2015/07/24 10:26:22 EDT
10:26:22 WARN  DatasetPrefixesTDB   :: Mangled prefix map: graph name=
java.lang.NullPointerException
         at com.hp.hpl.jena.tdb.store.DatasetPrefixesTDB.readPrefixMap(DatasetPrefixesTDB.java:119)
         at com.hp.hpl.jena.sparql.graph.GraphPrefixesProjection.getNsPrefixMap(GraphPrefixesProjection.java:62)
         at com.hp.hpl.jena.tdb.store.DatasetPrefixesTDB.getPrefixMapping(DatasetPrefixesTDB.java:168)
         at com.hp.hpl.jena.tdb.store.DatasetPrefixesTDB.getPrefixMapping(DatasetPrefixesTDB.java:160)
         at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$DestinationDSG.prefix(BulkLoader.java:272)
         at org.apache.jena.riot.lang.LangTurtleBase.emitPrefix(LangTurtleBase.java:492)
         at org.apache.jena.riot.lang.LangTurtleBase.directivePrefix(LangTurtleBase.java:164)
         at org.apache.jena.riot.lang.LangTurtleBase.directive(LangTurtleBase.java:140)
         at org.apache.jena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:79)
         at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
         at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:182)
         at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:906)
         at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:687)
         at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:666)
         at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:654)
         at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:148)
         at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:114)
         at com.hp.hpl.jena.tdb.TDBLoader.loadDataset$(TDBLoader.java:261)
         at com.hp.hpl.jena.tdb.TDBLoader.loadDataset(TDBLoader.java:193)
         at com.hp.hpl.jena.tdb.TDBLoader.load(TDBLoader.java:74)
         at tdb.tdbloader.loadQuads(tdbloader.java:118)
         at tdb.tdbloader.exec(tdbloader.java:86)
         at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
         at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
         at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
         at tdb.tdbloader.main(tdbloader.java:44)
10:26:22 WARN  DatasetPrefixesTDB   :: Mangled prefix map: graph name=
java.lang.NullPointerException
...


