I'd be happy to document this; I think the FAQ would be a good place. I looked into this further and found that what vmtouch does is already available in the JDK itself: the java.nio.MappedByteBuffer#load method makes a best effort to bring the pages of a mapped file into memory [1]. It works much like vmtouch, i.e. it reads a byte from each page to cause a page fault, which loads that page into memory [2].
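
For the FAQ entry, something along these lines might illustrate the idea. It is only a minimal sketch, not anything taken from Jena: the class name Preload, the preload helper and the 1 GiB chunk size are my own choices for illustration, and the files to warm up are simply passed as arguments. A single mapping cannot exceed Integer.MAX_VALUE bytes, so larger files are mapped in chunks. Note that load() is best effort, so the OS may still evict pages later.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class Preload {
    // Hypothetical chunk size; one mapping is limited to Integer.MAX_VALUE bytes.
    private static final long CHUNK = 1L << 30; // 1 GiB

    /**
     * Maps the file read-only, chunk by chunk, and calls load() on each
     * mapping. Returns the number of bytes for which a load was requested.
     */
    static long preload(Path file) throws IOException {
        long done = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            while (done < size) {
                long len = Math.min(CHUNK, size - done);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, done, len);
                buf.load(); // best effort: touches each page of this mapping
                done += len;
            }
        }
        return done;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of one file to warm up.
        for (String arg : args) {
            long n = preload(Paths.get(arg));
            System.out.printf("%s: requested %d bytes%n", arg, n);
        }
    }
}
```

Like vmtouch, this would be run once at startup against the database files. After that, it is the query workload that keeps the pages in the cache.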

[1] https://docs.oracle.com/javase/8/docs/api/java/nio/MappedByteBuffer.html#load--
[2] http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/nio/MappedByteBuffer.java#l156

On Sun, 16 Dec 2018, 6:59 pm ajs6f <aj...@apache.org> wrote:

> This seems to be a Linux-only technique that relies on installing and
> maintaining vmtouch, correct?
>
> It doesn't seem that we could support that as a general solution, but
> would you be interested in writing something that gives the essentials up
> for someplace in the Jena docs? I'll admit I'm not sure where it would best
> go, but it might be very helpful to users who can take advantage of it.
>
> ajs6f
>
> > On Dec 16, 2018, at 6:11 AM, Siddhesh Rane <kingsid...@gmail.com> wrote:
> >
> > In-memory database has following limitations :
> >
> > 1) Time to create the database. Not a problem if you have a dedicated
> > machine which runs 24/7 where you load data once and the process never
> > exits. But a huge waste of time if you get hardware during certain time
> > slots and you have to load data from the start.
> >
> > 2) In-memory database is all or nothing. If your dataset can't fit in RAM,
> > you are out of luck. I had tried using this but many times it would go OOM.
> > With vmtouch, you can load an index partially, until as much free RAM is
> > available. Something is better than nothing.
> >
> > Vmtouch is not doing anything magical. Tdb already uses mmap. When run on
> > its own, Linux will bring most of the index in RAM. But think about the
> > time it will take for that to happen. If one query takes 50 seconds (I've
> > seen it go to 500-1000s as well), then in 1 hour you would have run just 72
> > queries. If instead your speed was 1s/query you would have executed 3600
> > queries and that would bring more of the index in RAM for future queries to
> > run fast as well. So its also the rate of speedup that matters.
> > With vmtouch, you vmtouch at the beginning and it gives you a fast head
> > start and then its your program maintaining the cache.
> >
> > Regards,
> > Siddhesh
> >
> >
> > On Sat, 15 Dec 2018, 9:15 pm ajs6f <aj...@apache.org> wrote:
> >
> >> What is the advantage to doing that as opposed to using Jena's built-in
> >> in-memory dataset?
> >>
> >> ajs6f
> >>
> >>> On Dec 15, 2018, at 3:04 AM, Siddhesh Rane <kingsid...@gmail.com> wrote:
> >>>
> >>> Bring the entire database in RAM.
> >>> Use "vmtouch <database location>"
> >>> Get vmtouch from https://hoytech.com/vmtouch/
> >>>
> >>> I had used jena for 150M triples and my performance findings are documented
> >>> at
> >>> https://lists.apache.org/thread.html/254968eee3cd04370eafa2f9cc586e238f8a7034cf9ab4cbde3dc8e9@%3Cusers.jena.apache.org%3E
> >>>
> >>> Regards,
> >>> Siddhesh
> >>>
> >>> On Fri, 7 Dec 2018, 8:23 pm y...@zju.edu.cn <y...@zju.edu.cn> wrote:
> >>>
> >>>> Dear jena,
> >>>> I have built a graph with 1.4 billion triples and store it as a data set
> >>>> in TDB through Fuseki upload system.
> >>>> Now, I try to make some sparql search, the speed is very slow.
> >>>>
> >>>> For example, when I make the sqarql in Fuseki in the following, it takes
> >>>> 50 seconds.
> >>>> How can I improve the speed?
> >>>> ------------------------------
> >>>> Best wishes!
> >>>>
> >>>>
> >>>> 胡云苹
> >>>> 浙江大学控制科学与工程学院
> >>>> 浙江省杭州市浙大路38号浙大玉泉校区CSC研究所
> >>>> Institute of Cyber-Systems and Control, College of Control Science and
> >>>> Engineering, Zhejiang University, Hangzhou 310027,P.R.China
> >>>> Email : y...@zju.edu.cn <y...@iipc.zju.edu.cn>;hyphy...@163.com