Re: sparql 1.4 billion triples

Siddhesh Rane Sun, 16 Dec 2018 03:19:20 -0800

An in-memory database has the following limitations:

1) Time to create the database. This is not a problem if you have a
dedicated machine that runs 24/7, where you load the data once and the
process never exits. But it is a huge waste of time if you only get
hardware during certain time slots and have to load the data from scratch
each time.

2) An in-memory database is all or nothing. If your dataset can't fit in
RAM, you are out of luck. I tried using this, but it would often go OOM.
With vmtouch, you can load an index partially, up to however much free RAM
is available. Something is better than nothing.
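
To illustrate the "partial load" idea in point 2, here is a minimal Python
sketch. It is not vmtouch and not part of Jena; it only relies on the fact
that reading a file pulls its pages into the OS page cache as a side effect.
The function name, the byte budget and the chunk size are assumptions made
for the illustration, and the kernel is still free to evict pages later.

```python
import os


def warm_page_cache(db_dir, budget_bytes, chunk_size=1 << 20):
    """Read files under db_dir until budget_bytes have been read.

    Hypothetical helper: reading the bytes is what warms the page cache.
    Returns the number of bytes actually read.
    """
    read_total = 0
    for root, _dirs, files in os.walk(db_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                while read_total < budget_bytes:
                    # Never read past the remaining budget.
                    want = min(chunk_size, budget_bytes - read_total)
                    chunk = f.read(want)
                    if not chunk:
                        break  # end of this file, move on to the next
                    read_total += len(chunk)
            if read_total >= budget_bytes:
                return read_total  # budget exhausted: stop with a partial load
    return read_total
```

You would call it with the database directory and a budget somewhat below
the free RAM on the machine. If the budget is smaller than the database,
only part of the index is warmed, which is the "something is better than
nothing" case.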

Vmtouch is not doing anything magical. TDB already uses mmap. Left on its
own, Linux will eventually bring most of the index into RAM as queries
touch it. But think about how long that takes. If one query takes 50
seconds (I've seen it go to 500-1000s as well), then in 1 hour you will
have run just 72 queries. If instead your speed were 1s/query, you would
have executed 3600 queries, and that would bring more of the index into RAM
so that future queries run fast as well. So it's also the rate of speedup
that matters.
With vmtouch, you run it once at the beginning to get a fast head start,
and after that it's your program maintaining the cache.
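
The throughput arithmetic above can be checked with a few lines of Python.
This is only a back-of-the-envelope sketch: it assumes queries run one after
another at a constant latency, which real workloads will not do exactly.

```python
SECONDS_PER_HOUR = 3600


def queries_per_hour(seconds_per_query):
    """Number of whole queries that complete in one hour, run back to back."""
    return SECONDS_PER_HOUR // seconds_per_query


# Latencies mentioned in this thread, from worst case to a warmed cache.
for latency in (1000, 500, 50, 1):
    print(f"{latency:>5} s/query -> {queries_per_hour(latency)} queries/hour")
```

At 50 s/query this gives 72 queries in the hour, and at 1 s/query it gives
3600, so the warm cache gets 50 times as many chances to pull further pages
of the index into RAM.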

Regards,
Siddhesh

On Sat, 15 Dec 2018, 9:15 pm ajs6f <[email protected]> wrote:

> What is the advantage to doing that as opposed to using Jena's built-in
> in-memory dataset?
>
> ajs6f
>
> > On Dec 15, 2018, at 3:04 AM, Siddhesh Rane <[email protected]> wrote:
> >
> > Bring the entire database in RAM.
> > Use "vmtouch -t <database location>"
> > Get vmtouch from https://hoytech.com/vmtouch/
> >
> > I had used Jena for 150M triples and my performance findings are
> > documented at
> >
> https://lists.apache.org/thread.html/254968eee3cd04370eafa2f9cc586e238f8a7034cf9ab4cbde3dc8e9@%3Cusers.jena.apache.org%3E
> >
> > Regards,
> > Siddhesh
> >
> > On Fri, 7 Dec 2018, 8:23 pm [email protected] <[email protected]> wrote:
> >
> >> Dear jena,
> >> I have built a graph with 1.4 billion triples and store it as a data set
> >> in TDB  through Fuseki upload system.
> >> Now, I try to make some sparql search, the speed is very slow.
> >>
> >> For example, when I make the SPARQL in Fuseki in the following, it takes
> >> 50 seconds.
> >> How can I improve the speed?
> >> ------------------------------
> >> Best wishes!
> >>
> >>
> >> 胡云苹
> >> 浙江大学控制科学与工程学院
> >> 浙江省杭州市浙大路38号浙大玉泉校区CSC研究所
> >> Institute of Cyber-Systems and Control, College of Control Science and
> >> Engineering, Zhejiang University, Hangzhou 310027,P.R.China
> >> Email : [email protected] <[email protected]>;[email protected]
> >>
> >>
>
>
