Re: sparql 1.4 billion triples

ajs6f Sun, 16 Dec 2018 05:30:08 -0800

This seems to be a Linux-only technique that relies on installing and 
maintaining vmtouch, correct?


It doesn't seem that we could support that as a general solution, but would you 
be interested in writing up the essentials for somewhere in the Jena docs? I'll 
admit I'm not sure where it would best go, but it might be very helpful to 
users who can take advantage of it.

ajs6f

> On Dec 16, 2018, at 6:11 AM, Siddhesh Rane <kingsid...@gmail.com> wrote:
> 
> An in-memory database has the following limitations:
> 
> 1) Time to create the database. Not a problem if you have a dedicated
> machine which runs 24/7, where you load the data once and the process never
> exits. But it is a huge waste of time if you only get hardware during certain
> time slots and have to load the data from scratch each time.
> 
> 2) An in-memory database is all or nothing. If your dataset can't fit in RAM,
> you are out of luck. I tried using it, but it would often run out of memory
> (OOM). With vmtouch, you can load an index partially, using as much free RAM
> as is available. Something is better than nothing.
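
The partial-loading idea in point 2 can be sketched in Python. This is only an approximation under stated assumptions, not vmtouch itself: it reads files through the OS so that their pages land in the Linux page cache (the same cache that serves mmap'ed files), and it stops once a byte budget is used up. The function name `warm_page_cache` and its parameters are hypothetical, chosen for illustration.

```python
import os


def warm_page_cache(root, budget_bytes, chunk_size=1 << 20):
    """Read files under `root` until roughly `budget_bytes` have been read.

    Hypothetical helper: reading a file pulls its pages into the OS page
    cache. Returns the number of bytes actually read.
    """
    read_total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                while read_total < budget_bytes:
                    want = min(chunk_size, budget_bytes - read_total)
                    chunk = fh.read(want)
                    if not chunk:
                        break  # end of this file, move on to the next one
                    read_total += len(chunk)
            if read_total >= budget_bytes:
                return read_total  # budget used up: partial warm-up
    return read_total
```

Calling it with the database directory and a budget somewhat below the free RAM would warm as much of the data as fits. Whether the kernel keeps those pages resident afterwards depends on memory pressure from other processes.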
> 
> Vmtouch is not doing anything magical. TDB already uses mmap. Left on its
> own, Linux will eventually bring most of the index into RAM. But think about
> the time it will take for that to happen. If one query takes 50 seconds (I've
> seen it go to 500-1000 s as well), then in 1 hour you would have run just 72
> queries. If instead your speed was 1 s/query, you would have executed 3600
> queries, and that would bring more of the index into RAM so that future
> queries run fast as well. So it's also the rate of speedup that matters.
> With vmtouch, you run vmtouch at the beginning, which gives you a fast head
> start, and then it's your program that maintains the cache.
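
The throughput figures quoted above follow from dividing one hour (3600 seconds) by the time per query. A small Python check, using only the numbers from the message (the function name `queries_per_hour` is made up for this sketch):

```python
def queries_per_hour(seconds_per_query):
    """Number of back-to-back queries that fit into one hour (3600 s)."""
    return 3600 / seconds_per_query


# Figures from the message: 50 s/query on a cold cache, 1 s/query when warm.
cold = queries_per_hour(50)
warm = queries_per_hour(1)
print(cold, warm)
```

At the 500-1000 s per query also mentioned, the same division gives fewer than eight queries per hour, which is why a cold cache warms up so slowly on its own.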
> 
> Regards,
> Siddhesh
> 
> 
> On Sat, 15 Dec 2018, 9:15 pm ajs6f <aj...@apache.org wrote:
> 
>> What is the advantage to doing that as opposed to using Jena's built-in
>> in-memory dataset?
>> 
>> ajs6f
>> 
>>> On Dec 15, 2018, at 3:04 AM, Siddhesh Rane <kingsid...@gmail.com> wrote:
>>> 
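>>> Bring the entire database into RAM.
>>> Use "vmtouch -t <database location>" (the -t option touches the pages into
>>> memory; without it, vmtouch only reports how much is already resident).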
>>> Get vmtouch from https://hoytech.com/vmtouch/
>>> 
>>> I had used Jena for 150M triples and my performance findings are documented
>>> at
>>> https://lists.apache.org/thread.html/254968eee3cd04370eafa2f9cc586e238f8a7034cf9ab4cbde3dc8e9@%3Cusers.jena.apache.org%3E
>>> 
>>> Regards,
>>> Siddhesh
>>> 
>>> On Fri, 7 Dec 2018, 8:23 pm y...@zju.edu.cn <y...@zju.edu.cn wrote:
>>> 
>>>> Dear jena,
>>>> I have built a graph with 1.4 billion triples and stored it as a dataset
>>>> in TDB through the Fuseki upload system.
>>>> Now, when I try to run some SPARQL queries, the speed is very slow.
>>>> 
>>>> For example, when I run the following SPARQL query in Fuseki, it takes
>>>> 50 seconds.
>>>> How can I improve the speed?
>>>> ------------------------------
>>>> Best wishes!
>>>> 
>>>> 
>>>> 胡云苹
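>>>> College of Control Science and Engineering, Zhejiang University
>>>> CSC Institute, Yuquan Campus, Zhejiang University, 38 Zheda Road,
>>>> Hangzhou, Zhejiang Province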
>>>> Institute of Cyber-Systems and Control, College of Control Science and
>>>> Engineering, Zhejiang University, Hangzhou 310027,P.R.China
>>>> Email : y...@zju.edu.cn <y...@iipc.zju.edu.cn>;hyphy...@163.com
>>>> 
>>>> 
>> 
>>
