Okay, that's very helpful.
So one thing that jumps out at me looking at that Dockerfile and its associated
entrypoint script is that it starts the JVM without any explicit heap size
settings. When that is done the JVM picks its own default heap sizes, which
would normally be fine. However, in a container the amount of memory the JVM
thinks is available may not reflect the external limits that the container
runtime/orchestrator is imposing. As a practical example, I ran the container
locally (using docker run to drop into a shell) and ran the same basic Java
command the entrypoint runs, adding extra arguments to have the JVM dump its
settings, and I see a max heap size of ~3GB:
bash-4.3$ java -cp "*:/javalibs/*" -XX:+PrintFlagsFinal -version | grep -iE 'HeapSize'
    uintx ErgoHeapSizeLimit                    = 0             {product}
    uintx HeapSizePerGCThread                  = 87241520      {product}
    uintx InitialHeapSize                     := 197132288     {product}
    uintx LargePageHeapSizeThreshold           = 134217728     {product}
    uintx MaxHeapSize                         := 3141533696    {product}
I repeated the same experiment running the container inside a Kubernetes pod
with a 1GB resource limit, and the JVM still picked a ~3GB max heap.
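If you want to reproduce that check inside a running pod, something along
these lines should work (the pod name is just a placeholder for whatever your
deployment is actually called):

kubectl exec <fuseki-pod> -- java -XX:+PrintFlagsFinal -version | grep -i MaxHeapSize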
This is a common problem that can occur in any containerised environment. It
would be better to modify the Dockerfile to explicitly set heap sizes that
match the resource limits your container orchestrator is going to impose upon
you. When choosing a heap size, be aware that a lot of TDB memory usage is
off heap, so set a JVM heap size that takes that into account: perhaps try
-Xmx512m, leaving half your memory for off-heap usage (assuming the 1GB
resource limit you state). You'll likely need to experiment to find settings
that work for your workload.
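As a minimal sketch, something along these lines could go in your Dockerfile,
assuming the image's entrypoint script passes a JVM_ARGS environment variable
through to the java command (I have not checked the secoresearch/fuseki
entrypoint, so verify which variable it actually reads, or override the
entrypoint if it reads none):

FROM secoresearch/fuseki:latest
# Cap the heap well below the 1GB container limit so that TDB's off-heap
# usage and general JVM overhead still fit; tune the value to your workload.
ENV JVM_ARGS="-Xms512m -Xmx512m"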
Hope this helps,
Rob
On 17/04/2020, 09:26, "Luís Moreira de Sousa"
<[email protected]> wrote:
Hi all, some answers below to the many questions.
1. This Fuseki instance is based on the image maintained at DockerHub by
the secoresearch account. Copies of the Dockerfile and tdb.cfg files are at the
end of this message. There is no other code involved.
2. The image is deployed to an OpenShift cluster with a default resource
base of 1 CPU and 1 GB of RAM. The intention is to use Fuseki as a component
of an information system that is easy to deploy by institutions in developing
countries, where resources may be limited and know-how lacking. These
resources have proven sufficient to run software such as Postgres or
MapServer.
3. OpenShift provides a user interface to easily monitor the resources taken
up by a running container (aka pod); no code is involved in this monitoring.
It is also possible to launch a shell session into the container and monitor
that way, as in the example below. At the end of the message is a printout
from top showing that nothing else is running in this particular container.
All memory is used either by Fuseki or the system.
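For reference, such a shell session can be opened from the OpenShift command
line with something like the following (the pod name is whatever OpenShift
assigned to the running container), with top then run from inside it:

oc rsh <fuseki-pod-name>
top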
4. The datasets I have been using to test Fuseki were created with rdflib
and are saved as RDF/XML. Each contains some dozens of objects of interest
and their relations, taken from a larger database. The largest of these RDF
files contains just under 100 000 triples and occupies 20 MB on disk. I
uploaded a new graph with more meaningful labels
(https://pasteboard.co/J4cfPM9.png). Each point in the graph is a dataset;
the x axis (horizontal) is the number of triples in the dataset, and the y
axis (vertical) is the additional memory required by Fuseki once the dataset
is added. Again, note that all datasets are uploaded in persistent mode.
5. Regarding the JVM, the information in the manual only says that the heap
size is somewhat dependent on the kind of queries run. But the problem on
this end is with dataset upload. At this stage I do not know what to modify
in the JVM set-up, or how.
Thank you for your help.
Dockerfile
----------
FROM secoresearch/fuseki:latest
# Set environment variables
ENV ADMIN_PASSWORD toto
ENV ENABLE_DATA_WRITE true
ENV ENABLE_UPDATE true
ENV ENABLE_UPLOAD true
# Add in config files
COPY ./tdb.cfg $FUSEKI_BASE/tdb.cfg
COPY ./tdb.cfg $FUSEKI_HOME/tdb.cfg
tdb.cfg
-------
{
"tdb.node2nodeid_cache_size" : 50000 ,
"tdb.nodeid2node_cache_size" : 250000 ,
}
top
---
Mem: 39251812K used, 26724204K free, 21104K shrd, 58340K buff, 23792776K cached
CPU:   9% usr   5% sys   0% nic  84% idle   0% io   0% irq   0% sirq
Load average: 2.02 1.93 1.75 3/4355 114
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    1     0 9008     S    20528m  30%   4   0% java -cp *:/javalibs/* org.apache.jena.fuseki.cmd.FusekiCmd
  109   102 9008     S     1520   0%   7   0% /bin/sh
  102     0 9008     S     1512   0%   1   0% /bin/sh -c TERM="xterm-termite" /bin/sh
  110   109 9008     R     1508   0%   1   0% top
--
Luís