On 20/03/13 09:23, Егор Егоров wrote:
Hello JENA users and developers, please help me.

I am trying to load the BTC dataset on low-end hardware (Core 2 Quad Q6600
2.40 GHz, 4 GB RAM, 2x250 GB SATA Barracuda 7200.10 RAID0 Stripe).

First I tried TDB, but the hardware is too weak for this task: I can load
only ~100 million quads. So I decided to switch to SDB.

SDB is slower than TDB ... and SDB at 100 million triples is pushing it somewhat.

tdbloader will do the job but are you running a 32 bit JVM?
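
The 32-bit question can be checked from Java itself. The following is only a minimal sketch: the class name JvmBitness and the helper is64Bit are made up for illustration, and "sun.arch.data.model" is a HotSpot-specific system property that other JVMs may not set, so the code falls back to a rough guess based on "os.arch". It also prints the maximum heap the JVM is willing to use.

```java
public class JvmBitness {
    // Returns true when the given property values indicate a 64-bit JVM.
    // dataModel: value of "sun.arch.data.model" (HotSpot-specific, may be null).
    // osArch:    value of "os.arch", used only as a fallback heuristic.
    static boolean is64Bit(String dataModel, String osArch) {
        if (dataModel != null && !dataModel.equals("unknown")) {
            return dataModel.equals("64");
        }
        // Assumption: 64-bit architecture names usually contain "64"
        // (amd64, x86_64, aarch64). This is a guess, not a guarantee.
        return osArch != null && osArch.contains("64");
    }

    public static void main(String[] args) {
        String dataModel = System.getProperty("sun.arch.data.model");
        String osArch = System.getProperty("os.arch");
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);

        System.out.println("64-bit JVM: " + is64Bit(dataModel, osArch));
        System.out.println("Max heap (MB): " + maxHeapMb);
    }
}
```

Run it with the same java binary that runs the loader, since a machine can have both 32-bit and 64-bit JVMs installed.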

But the sdbload utility is unable to import .nq files:

egor@egorov:~/semsearch/sdb$ sdbload -v sdb.ttl ../dataset/btc-2009-chunk-115-urified.nq
Start load: sdb.ttl
Start load: ../dataset/btc-2009-chunk-115-urified.nq
WARN  Only triples or default graph data expected : named graph data ignored
So I am using the following Java code to import N-Quads:

Store store = SDBFactory.connectStore("/home/egor/semsearch/sdb/sdb.ttl");
Dataset dataset = SDBFactory.connectDataset(store);
RDFDataMgr.read(dataset, "/home/egor/semsearch/dataset/btc-2009-chunk-115-urified.nq");

I have the following questions:
1. What are the approximate hardware requirements to load ~1.3 billion quads
into a TDB or SDB backend?

1.3 billion ... what sort of queries do you want to ask of the data once loaded? Only simple queries are going to stand much chance of running at a tolerable speed.

If you can borrow a large machine to load the database you'll get on better. Databases are portable - you can copy the database directory around.
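
Any file copy tool will do for moving the database directory. As a rough sketch in Java, assuming the database is not open in any running process while it is copied: the class name CopyDatabaseDir and both method names are hypothetical, and the source and target paths are supplied by the caller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class CopyDatabaseDir {
    // Maps a path under sourceRoot to the corresponding path under targetRoot.
    static Path destinationFor(Path sourceRoot, Path targetRoot, Path file) {
        return targetRoot.resolve(sourceRoot.relativize(file).toString());
    }

    // Recursively copies sourceRoot into targetRoot.
    // Files.walk visits a directory before its contents, so each target
    // directory exists by the time its files are copied.
    static void copyTree(Path sourceRoot, Path targetRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(sourceRoot)) {
            paths.forEach(p -> {
                Path dest = destinationFor(sourceRoot, targetRoot, p);
                try {
                    if (Files.isDirectory(p)) {
                        Files.createDirectories(dest);
                    } else {
                        Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: CopyDatabaseDir <source-dir> <target-dir>");
            return;
        }
        copyTree(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The point of the sketch is only that the database is a plain directory of files: build it on the large machine, then copy the whole directory to the machine that will serve queries.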

2. Is it realistic to load the BTC dataset with my computer and SDB?

No.

3. Why is the sdbload utility unable to load N-Quads when RDFDataMgr.read
accepts .nq files? I think it would be a very useful feature for sdbload.
Can it be implemented in a new version of Jena?

sdbload needs some maintenance - it's a bit old. There is no technical reason in SDB that prevents loading from quads; sdbload just needs updating. It's just old code.


Thank you!

Egor Egorov


        Andy
