Hello Jena users and developers, please help me. I am trying to load the BTC dataset on low-end hardware (Core 2 Quad Q6600 2.40 GHz, 4 GB RAM, 2x250 GB SATA Barracuda 7200.10 in a RAID 0 stripe).

At first I was using TDB, but the hardware is too weak for this task: I can load only ~100 million quads. So I decided to switch to SDB. However, the sdbload utility is unable to import .nq files:

    egor@egorov:~/semsearch/sdb$ sdbload -v sdb.ttl ../dataset/btc-2009-chunk-115-urified.nq
    Start load: sdb.ttl
    Start load: ../dataset/btc-2009-chunk-115-urified.nq
    WARN Only triples or default graph data expected : named graph data ignored <[email protected]>

So I am using the following Java code to import the N-Quads:

    Store store = SDBFactory.connectStore("/home/egor/semsearch/sdb/sdb.ttl");
    Dataset dataset = SDBFactory.connectDataset(store);
    RDFDataMgr.read(dataset, "/home/egor/semsearch/dataset/btc-2009-chunk-115-urified.nq");

I have the following questions:

1. Approximately what hardware is required to load ~1.3 billion quads into a TDB or SDB backend?
2. Is it realistic to load the BTC dataset with my computer and SDB?
3. Why is the sdbload utility unable to load N-Quads, while RDFDataMgr.read accepts .nq files? I think this would be a very useful feature for sdbload. Could it be implemented in a future version of Jena?

Thank you!

Egor Egorov
