Thank you, Andy, for your reply.

> tdbloader will do the job but are you running a 32 bit JVM?
I am using 64-bit Ubuntu 12.04. TDB stagnates after 24 hours of work -- its
throughput slows down from 80k tps to ~500 tps and I think it will never
finish. My PC freezes completely; I can't even open a new terminal tab.

> 1.3 billion ... what sort of queries do you want to ask of the data once
> loaded? Only simple queries are going to stand much chance of running at a
> tolerable speed.

A lot of simple SELECT queries is enough for me; I understand that overall
performance will be low.

> If you can borrow a large machine to load the database you'll get on
> better. Databases are portable - you can copy the database directory
> around.

I am thinking about Amazon EC2 or a new standalone server. How much RAM is
enough to load the entire BTC dataset via TDB?

Also, I have ~30 computers identical to mine right now. Is it possible to
configure a cluster and load the entire dataset that way? Or is it better to
use another store that supports the Jena API?

Thank you!

2013/3/21 Andy Seaborne <[email protected]>

> On 20/03/13 09:23, Egor Egorov wrote:
>
>> Hello JENA users and developers, please help me.
>>
>> I am trying to load the BTC dataset on low-end hardware (Core 2 Quad Q6600
>> 2.40 GHz, 4 GB RAM, 2x250 GB SATA Barracuda 7200.10 RAID0 stripe).
>>
>> First, I was using TDB. But the hardware is too weak for this task -- I
>> can load only ~100 million quads. So I decided to switch to SDB.
>>
>
> SDB is slower than TDB ... and SDB at 100 million triples is pushing it
> somewhat.
>
> tdbloader will do the job but are you running a 32 bit JVM?
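To be concrete about "simple SELECT queries", here is the kind of query I have in mind, run against an already-loaded TDB directory (a sketch using the Jena 2.x API; the database path is a placeholder):

```java
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.tdb.TDBFactory;

public class SimpleSelect {
    public static void main(String[] args) {
        // Open an existing TDB database directory (placeholder path).
        Dataset dataset = TDBFactory.createDataset("/path/to/tdb-db");

        // A simple pattern over the named graphs of the dataset.
        String q = "SELECT * WHERE { GRAPH ?g { ?s ?p ?o } } LIMIT 10";
        QueryExecution qexec = QueryExecutionFactory.create(q, dataset);
        try {
            ResultSet results = qexec.execSelect();
            ResultSetFormatter.out(System.out, results);
        } finally {
            qexec.close();
            dataset.close();
        }
    }
}
```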
>> But the sdbload utility is unable to import .nq files:
>>
>> egor@egorov:~/semsearch/sdb$ sdbload -v sdb.ttl ../dataset/btc-2009-chunk-115-urified.nq
>> Start load: sdb.ttl
>> Start load: ../dataset/btc-2009-chunk-115-urified.nq
>> WARN  Only triples or default graph data expected : named graph data ignored
>> <[email protected]>
>>
>> So I am using the following Java code to import N-Quads:
>>
>> Store store = SDBFactory.connectStore("/home/egor/semsearch/sdb/sdb.ttl");
>> Dataset dataset = SDBFactory.connectDataset(store);
>> RDFDataMgr.read(dataset, "/home/egor/semsearch/dataset/btc-2009-chunk-115-urified.nq");
>>
>> I have the following questions:
>> 1. What are the approximate hardware requirements to load ~1.3 billion
>> quads into a TDB or SDB backend?
>>
>
> 1.3 billion ... what sort of queries do you want to ask of the data once
> loaded? Only simple queries are going to stand much chance of running at a
> tolerable speed.
>
> If you can borrow a large machine to load the database you'll get on
> better. Databases are portable - you can copy the database directory
> around.
>
>> 2. Is it realistic to load the BTC dataset on my computer with SDB?
>>
>
> No.
>
>> 3. Why is the sdbload utility unable to load N-Quads, while RDFDataMgr.read
>> accepts .nq files? I think this would be a very useful feature for the
>> sdbload utility; can it be added in new versions of Jena?
>>
>
> sdbload needs some maintenance - it's a bit old. There is no technical
> reason in SDB that prevents loading from quads; it's just old code.
>
>> Thank you!
>>
>> Egor Egorov
>>
>
> Andy
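P.S. For the archives, here is the quad-loading code from my original mail in a slightly cleaned-up, self-contained form, in case it helps anyone else hitting the same sdbload limitation (a sketch; the assembler and data paths are placeholders):

```java
import org.apache.jena.riot.RDFDataMgr;

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.sdb.SDBFactory;
import com.hp.hpl.jena.sdb.Store;

public class LoadQuads {
    public static void main(String[] args) {
        // Connect to the SDB store described by the assembler file
        // (placeholder paths).
        Store store = SDBFactory.connectStore("/path/to/sdb.ttl");
        Dataset dataset = SDBFactory.connectDataset(store);
        try {
            // RDFDataMgr picks the N-Quads parser from the .nq extension,
            // so named-graph data goes into the dataset rather than being
            // dropped with a warning as sdbload does.
            RDFDataMgr.read(dataset, "/path/to/data.nq");
        } finally {
            store.close();
        }
    }
}
```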
