Thank you, Andy, for your reply.

> tdbloader will do the job but are you running a 32 bit JVM?

I am using 64-bit Ubuntu 12.04.
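To double-check (since a 64-bit OS can still run a 32-bit JVM), I verified the JVM itself with a trivial program like this (the class name is mine, nothing Jena-specific):

```java
// "os.arch" reports the JVM's own architecture (e.g. "amd64" for a
// 64-bit JVM), independent of what the OS is.
public class JvmArchCheck {
    public static void main(String[] args) {
        System.out.println("os.arch = " + System.getProperty("os.arch"));
        System.out.println("VM = " + System.getProperty("java.vm.name"));
    }
}
```

On my machine it reports amd64, so the JVM is 64-bit.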

TDB stalls after about 24 hours of work -- its throughput drops from
80k tps to ~500 tps, and I don't think it will ever finish. My PC freezes
completely; I can't even open a new terminal tab.

> 1.3 billion  ...  what sort of queries do you want to ask of the data
once loaded?  Only simple queries are going to stand much chance of running
at a tolerable speed.

A lot of simple SELECT queries are enough for me; I understand that overall
performance will be low.

> If you can borrow a large machine to load the database you'll get on
better.  Databases are portable - you can copy the database directory
around.

I am thinking about Amazon EC2 or a new standalone server. How much RAM
is needed to load the entire BTC dataset via TDB?
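(When experimenting with -Xmx settings I use a trivial heap check like the one below -- just a sketch with stdlib calls; as I understand it, 64-bit TDB uses memory-mapped files, so the OS file-system cache matters at least as much as the Java heap itself.)

```java
// Print the maximum heap this JVM will use, so -Xmx experiments for the
// loader can be confirmed rather than guessed.
public class HeapCheck {
    public static void main(String[] args) {
        long maxMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxMiB + " MiB");
    }
}
```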

Also, I have ~30 computers identical to mine right now. Is it possible to
configure a cluster and load the entire dataset via TDB? Or is it better to
use another store that supports the Jena API?

Thank you!


2013/3/21 Andy Seaborne <[email protected]>

> On 20/03/13 09:23, Егор Егоров wrote:
>
>> Hello JENA users and developers, please help me.
>>
>> I am trying to load BTC dataset via low-end hardware (Core 2 Quad Q6600
>> 2.40 GHz, 4 GB RAM, 2x250 GB SATA Barracuda 7200.10 RAID0 Stripe)
>>
>> First, I was using TDB. But the hardware is too weak for this task -- I
>> can load only ~100 million quads. So I decided to switch to SDB.
>>
>
> SDB is slower than TDB ... and SDB at 100 million triples is pushing it
> somewhat.
>
> tdbloader will do the job but are you running a 32 bit JVM?
>
>  But sdbload utility is unable to import .nq files:
>>
>> egor@egorov:~/semsearch/sdb$ sdbload -v sdb.ttl
>> ../dataset/btc-2009-chunk-115-urified.nq
>> Start load: sdb.ttl
>> Start load: ../dataset/btc-2009-chunk-115-urified.nq
>> WARN  Only triples or default graph data expected : named graph data
>> ignored
>>
>> So I am using the following java code to import nquads:
>>
>> Store store =
>> SDBFactory.connectStore("/home/egor/semsearch/sdb/sdb.ttl");
>> Dataset dataset = SDBFactory.connectDataset(store);
>> RDFDataMgr.read(dataset,
>> "/home/egor/semsearch/dataset/btc-2009-chunk-115-urified.nq");
>>
>> I have the following questions:
>> 1. What are the approx. hardware requirements to load ~1.3 billion quads
>> into a TDB or SDB backend?
>>
>
> 1.3 billion  ...  what sort of queries do you want to ask of the data once
> loaded?  Only simple queries are going to stand much chance of running at a
> tolerable speed.
>
> If you can borrow a large machine to load the database you'll get on
> better.  Databases are portable - you can copy the database directory
> around.
>
>
>  2. Is it realistic to load the BTC dataset on my computer with SDB?
>>
>
> No.
>
>
>  3. Why is the sdbload utility unable to load N-Quads, while RDFDataMgr.read
>> accepts .nq files? I think this would be a very useful feature for the
>> sdbload utility; could it be added in new versions of Jena?
>>
>
> sdbload needs some maintenance - it's a bit old.  There is no technical
> reason in SDB that prevents loading from quads; it's just old code.
>
>
>> Thank you!
>>
>> Egor Egorov
>>
>>
>         Andy
>
