Re: sparql 1.4 billion triples

Siddhesh Rane Sun, 16 Dec 2018 11:58:25 -0800

I'll be happy to document this. I think the FAQ would be a good place.

I looked into this further and found that the JDK itself provides the
vmtouch functionality.
The java.nio.MappedByteBuffer#load method brings a mapped file's pages
into memory [1].
It works much like vmtouch: it reads one byte from each page, which
triggers a page fault and loads that page into memory [2].
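
The load() approach described above can be sketched roughly as follows. This
is only an illustration, not Jena code: the class name PreloadFile, the
preload helper, the 1 GiB chunk size and the idea of walking every file in
the database directory are my own assumptions. A single mapping cannot
exceed Integer.MAX_VALUE bytes, so large files are mapped in chunks. Note
that load() is documented as best effort, so pages may not all become
resident and may be evicted again later.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PreloadFile {
    // Assumed chunk size; any value up to Integer.MAX_VALUE works for one mapping.
    private static final long CHUNK = 1L << 30; // 1 GiB

    /**
     * Maps the file read-only, chunk by chunk, and asks the OS to page
     * each chunk in. Returns the number of bytes for which a load was requested.
     */
    static long preload(Path file) throws IOException {
        long requested = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += CHUNK) {
                long len = Math.min(CHUNK, size - pos);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                buf.load(); // best effort: touches each page to fault it in
                requested += len;
            }
        }
        return requested;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: args[0] is the database directory.
        Path dir = Paths.get(args[0]);
        List<Path> files;
        try (Stream<Path> s = Files.walk(dir)) {
            files = s.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        long total = 0;
        for (Path f : files) {
            total += preload(f);
        }
        System.out.println("Requested load of " + total + " bytes from " + dir);
    }
}
```

Run it as "java PreloadFile <database location>", the same way vmtouch is
pointed at the database directory. Since the file pages live in the OS page
cache, my expectation is that other mappings of the same files benefit too,
but I have not measured that.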


[1]
https://docs.oracle.com/javase/8/docs/api/java/nio/MappedByteBuffer.html#load--

[2]
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/nio/MappedByteBuffer.java#l156


On Sun, 16 Dec 2018, 6:59 pm ajs6f <aj...@apache.org> wrote:

> This seems to be a Linux-only technique that relies on installing and
> maintaining vmtouch, correct?
>
> It doesn't seem that we could support that as a general solution, but
> would you be interested in writing up the essentials for somewhere in the
> Jena docs? I'll admit I'm not sure where it would best go, but it might be
> very helpful to users who can take advantage of it.
>
> ajs6f
>
> > On Dec 16, 2018, at 6:11 AM, Siddhesh Rane <kingsid...@gmail.com> wrote:
> >
> > An in-memory database has the following limitations:
> >
> > 1) Time to create the database. This is not a problem if you have a
> > dedicated machine that runs 24/7, where you load the data once and the
> > process never exits. But it is a huge waste of time if you only get
> > hardware during certain time slots and have to load the data from
> > scratch each time.
> >
> > 2) An in-memory database is all or nothing. If your dataset can't fit in
> > RAM, you are out of luck. I tried using one, but it often went OOM.
> > With vmtouch, you can load an index partially, up to the amount of free
> > RAM available. Something is better than nothing.
> >
> > Vmtouch is not doing anything magical. TDB already uses mmap. Left on
> > its own, Linux will eventually bring most of the index into RAM. But think
> > about how long that takes. If one query takes 50 seconds (I've seen it go
> > to 500-1000s as well), then in 1 hour you will have run just 72 queries.
> > If instead your speed was 1s/query, you would have executed 3600 queries,
> > and those would bring more of the index into RAM so that future queries
> > run fast as well. So it's also the rate of speedup that matters.
> > With vmtouch, you run it once at the beginning for a fast head start,
> > and after that it's your program that maintains the cache.
> >
> > Regards,
> > Siddhesh
> >
> >
> > On Sat, 15 Dec 2018, 9:15 pm ajs6f <aj...@apache.org> wrote:
> >
> >> What is the advantage to doing that as opposed to using Jena's built-in
> >> in-memory dataset?
> >>
> >> ajs6f
> >>
> > >>> On Dec 15, 2018, at 3:04 AM, Siddhesh Rane <kingsid...@gmail.com> wrote:
> >>>
> > >>> Bring the entire database into RAM.
> > >>> Use "vmtouch <database location>"
> > >>> Get vmtouch from https://hoytech.com/vmtouch/
> > >>>
> > >>> I have used Jena with 150M triples, and my performance findings are
> > >>> documented at
> > >>> https://lists.apache.org/thread.html/254968eee3cd04370eafa2f9cc586e238f8a7034cf9ab4cbde3dc8e9@%3Cusers.jena.apache.org%3E
> >>>
> >>> Regards,
> >>> Siddhesh
> >>>
> > >>> On Fri, 7 Dec 2018, 8:23 pm y...@zju.edu.cn <y...@zju.edu.cn> wrote:
> >>>
> > >>>> Dear Jena,
> > >>>> I have built a graph with 1.4 billion triples and stored it as a
> > >>>> dataset in TDB through the Fuseki upload system.
> > >>>> Now, when I run SPARQL queries, they are very slow.
> > >>>>
> > >>>> For example, the SPARQL query below takes 50 seconds in Fuseki.
> > >>>> How can I improve the speed?
> >>>> ------------------------------
> >>>> Best wishes!
> >>>>
> >>>>
> >>>> 胡云苹
> >>>> 浙江大学控制科学与工程学院
> >>>> 浙江省杭州市浙大路38号浙大玉泉校区CSC研究所
> >>>> Institute of Cyber-Systems and Control, College of Control Science and
> >>>> Engineering, Zhejiang University, Hangzhou 310027,P.R.China
> >>>> Email : y...@zju.edu.cn <y...@iipc.zju.edu.cn>;hyphy...@163.com
> >>>>
> >>>>
> >>
> >>
>
>
