More information:

About Java and containers and sizing:

Summary: things got better at Java 10, so running with Java 11 is a good idea.

https://www.docker.com/blog/improved-docker-container-integration-with-java-10/
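As a quick illustration (a sketch only; the base image and memory limit below are just examples), a Java 10+ JVM sizes its heap from the container's cgroup limit rather than the host's RAM, and -XX:MaxRAMPercentage controls the fraction it takes:

# illustrative only: check whether the JVM sizes its heap from the container limit
docker run --rm -m 512m openjdk:11-jre-slim \
    java -XX:MaxRAMPercentage=50 -XX:+PrintFlagsFinal -version | grep -i MaxHeapSize

# with the 512m limit and MaxRAMPercentage=50, MaxHeapSize should come out
# around 256m rather than a quarter of the host's physical RAM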

    Andy

On 17/04/2020 10:58, Rob Vesse wrote:
Okay, that's very helpful

So one thing that jumps out at me looking at that Dockerfile and its associated 
entrypoint script is that it starts the JVM without any explicit heap size 
settings.  When that is done the JVM picks default heap sizes itself, which 
would normally be fine.  However, in a container the amount of memory the JVM 
thinks is available may not reflect the external limits that the container 
runtime/orchestrator is imposing.  As a practical example, I ran the 
container locally (using docker run to drop into a shell) and ran the same 
basic Java command the entrypoint runs, adding extra arguments to have the JVM 
dump its settings, and I see a max heap size of ~3GB:

bash-4.3$ java -cp "*:/javalibs/*" -XX:+PrintFlagsFinal -version | grep -iE 'HeapSize'
     uintx ErgoHeapSizeLimit                         = 0             {product}
     uintx HeapSizePerGCThread                       = 87241520      {product}
     uintx InitialHeapSize                          := 197132288     {product}
     uintx LargePageHeapSizeThreshold                = 134217728     {product}
     uintx MaxHeapSize                              := 3141533696    {product}

I repeated the same experiment running the container inside a Kubernetes pod 
with a 1GB resource limit, and the JVM still picked a ~3GB max heap.
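To reproduce that check under an explicit limit, something along these lines works (a rough sketch; the 1g figure is just the limit from your setup):

# run the same PrintFlagsFinal check, but with Docker imposing a 1GB memory limit
docker run --rm -m 1g --entrypoint /bin/sh secoresearch/fuseki:latest \
    -c 'java -cp "*:/javalibs/*" -XX:+PrintFlagsFinal -version | grep -i MaxHeapSize'

If the JVM in the image is not container-aware, MaxHeapSize will still be derived from the host's physical memory rather than the 1g limit.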

This is a common problem that can occur in any containerised environment; it 
would be better to modify the Dockerfile to explicitly set desired heap sizes 
to match the resource limits your container orchestrator is going to impose 
upon you.  Be aware when choosing a heap size that a lot of TDB memory usage is 
off heap, so you should set a JVM heap size that takes that into account: 
perhaps try -Xmx512m, leaving half your memory for off-heap usage (assuming the 
1GB resource limit you state).  You'll likely need to experiment to find 
settings that work for your workload.
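For example, the java invocation in the entrypoint could become something like this (a sketch only; exactly where to add the option depends on how the image's entrypoint script assembles the command line):

# the same command the container already runs (see the top output further down),
# just with an explicit max heap set to half of the 1GB pod limit
java -Xmx512m -cp "*:/javalibs/*" org.apache.jena.fuseki.cmd.FusekiCmd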

Hope this helps,

Rob

On 17/04/2020, 09:26, "Luís Moreira de Sousa" 
<[email protected]> wrote:

     Hi all, some answers below to the many questions.
     1. This Fuseki instance is based on the image maintained at DockerHub by the secoresearch account. Copies of the Dockerfile and tdb.cfg files are at the end of this message. There is no other code involved.
     2. The image is deployed to an OpenShift cluster with a default resource base of 1 CPU and 1 GB of RAM. The intention is to use Fuseki as a component of an information system that is easy to deploy by institutions in developing countries, where resources may be limited and know-how lacking. These resources have proven sufficient to run software such as Postgres or MapServer.
     3. OpenShift provides a user interface to easily monitor the resources taken up by a running container (aka pod); no code is involved in this monitoring. It is also possible to launch a shell session into the container and monitor that way. At the end of the message is a printout from top showing that nothing else is running in this particular container. All memory is used either by Fuseki or the system.
     4. The datasets I have been using to test Fuseki were created with rdflib and are saved as RDF/XML. Each contains some dozens of objects of interest and respective relations from a larger database. The largest of these RDF files contains just under 100 000 triples and occupies 20 MB on disk. I uploaded a new graph with more meaningful labels (https://pasteboard.co/J4cfPM9.png). Each point in the graph is a dataset; on the x axis (horizontal) is the number of triples in the dataset, on the y axis (vertical) is the additional memory required by Fuseki once the dataset is added. Again, note that all datasets are uploaded in persistent mode.
     5. Regarding the JVM, the information in the manual simply states that the heap size is somewhat dependent on the kind of queries run. But the problem on this end is with dataset upload. At this stage I do not know what or how to modify in the JVM set-up.

     Thank you for your help.

Dockerfile
     ----------
     FROM secoresearch/fuseki:latest
# Set environment variables
     ENV ADMIN_PASSWORD toto
     ENV ENABLE_DATA_WRITE true
     ENV ENABLE_UPDATE true
     ENV ENABLE_UPLOAD true
# Add in config files
     COPY ./tdb.cfg $FUSEKI_BASE/tdb.cfg
     COPY ./tdb.cfg $FUSEKI_HOME/tdb.cfg
tdb.cfg
     -------
     {
       "tdb.node2nodeid_cache_size" :  50000 ,
       "tdb.nodeid2node_cache_size" :  250000 ,
     }
top
     ---
     Mem: 39251812K used, 26724204K free, 21104K shrd, 58340K buff, 23792776K cached
     CPU:   9% usr   5% sys   0% nic  84% idle   0% io   0% irq   0% sirq
     Load average: 2.02 1.93 1.75 3/4355 114
       PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
          1     0 9008     S   20528m  30%   4   0% java -cp *:/javalibs/* org.apache.jena.fuseki.cmd.FusekiCmd
        109   102 9008     S     1520   0%   7   0% /bin/sh
        102     0 9008     S     1512   0%   1   0% /bin/sh -c TERM="xterm-termite" /bin/sh
        110   109 9008     R     1508   0%   1   0% top
--
     Luís


