Okay, that's very helpful.
So one thing that jumps out at me looking at that Dockerfile and its associated
entrypoint script is that it starts the JVM without any explicit heap size
settings. When that is done the JVM picks its own default heap sizes, which
would normally be fine. However, in a container the amount of memory the JVM
thinks is available may not reflect the external limits that the container
runtime/orchestrator is imposing. As a practical example, I ran the container
locally (using docker run to drop into a shell) and ran the same basic Java
command the entrypoint runs, adding extra arguments to have the JVM dump its
settings, and I see a max heap size of ~3GB:
bash-4.3$ java -cp "*:/javalibs/*" -XX:+PrintFlagsFinal -version | grep -iE 'HeapSize'
    uintx ErgoHeapSizeLimit                    = 0             {product}
    uintx HeapSizePerGCThread                  = 87241520      {product}
    uintx InitialHeapSize                     := 197132288     {product}
    uintx LargePageHeapSizeThreshold           = 134217728     {product}
    uintx MaxHeapSize                         := 3141533696    {product}
I repeated the same experiment running the container inside a Kubernetes pod
with a 1GB resource limit, and the JVM still picked a ~3GB max heap.
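If you want to reproduce that check inside a running pod, something along
these lines should work (the pod name is just a placeholder for whatever your
deployment is actually called):

kubectl exec <fuseki-pod> -- java -XX:+PrintFlagsFinal -version | grep -i MaxHeapSize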
This is a common problem that can occur in any containerised environment. It
would be better to modify the Dockerfile to explicitly set heap sizes that
match the resource limits your container orchestrator is going to impose upon
you. When choosing a heap size, be aware that a lot of TDB memory usage is
off heap, so set a JVM heap size that takes that into account: perhaps try
-Xmx512m, leaving half your memory for off-heap usage (assuming the 1GB
resource limit you state). You'll likely need to experiment to find settings
that work for your workload.
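As a minimal sketch, something along these lines could go in your Dockerfile,
assuming the image's entrypoint script passes a JVM_ARGS environment variable
through to the java command (I have not checked the secoresearch/fuseki
entrypoint, so verify which variable it actually reads, or override the
entrypoint if it reads none):

FROM secoresearch/fuseki:latest
# Cap the heap well below the 1GB container limit so that TDB's off-heap
# usage and general JVM overhead still fit; tune the value to your workload.
ENV JVM_ARGS="-Xms512m -Xmx512m"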
Hope this helps,
Rob
On 17/04/2020, 09:26, "Luís Moreira de Sousa"
<[email protected]> wrote:
Hi all, some answers below to the many questions.
1. This Fuseki instance is based on the image maintained at DockerHub by
the secoresearch account. Copies of the Dockerfile and tdb.cfg files are at the
end of this message. There is no other code involved.
2. The image is deployed to an OpenShift cluster with a default resource
base of 1 CPU and 1 GB of RAM. The intention is to use Fuseki as a component
of an information system that is easy to deploy by institutions in developing
countries, where resources may be limited and know-how lacking. These
resources have proven sufficient to run software such as Postgres or
MapServer.
3. OpenShift provides a user interface to easily monitor the resources taken
up by a running container (aka pod); no code is involved in this monitoring.
It is also possible to launch a shell session into the container and monitor
that way, as in the example below. At the end of the message is a printout
from top showing that nothing else is running in this particular container.
All memory is used either by Fuseki or the system.
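For reference, such a shell session can be opened from the OpenShift command
line with something like the following (the pod name is whatever OpenShift
assigned to the running container), with top then run from inside it:

oc rsh <fuseki-pod-name>
top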
4. The datasets I have been using to test Fuseki were created with rdflib
and are saved as RDF/XML. Each contains some dozens of objects of interest
and their relations, taken from a larger database. The largest of these RDF
files contains just under 100 000 triples and occupies 20 MB on disk. I
uploaded a new graph with more meaningful labels
(https://pasteboard.co/J4cfPM9.png). Each point in the graph is a dataset;
the x axis (horizontal) is the number of triples in the dataset, and the y
axis (vertical) is the additional memory required by Fuseki once the dataset
is added. Again, note that all datasets are uploaded in persistent mode.
5. Regarding the JVM, the information in the manual only says that the heap
size is somewhat dependent on the kind of queries run. But the problem on
this end is with dataset upload. At this stage I do not know what to modify
in the JVM set-up, or how.
Thank you for your help.
Dockerfile
----------
FROM secoresearch/fuseki:latest
# Set environment variables
ENV ADMIN_PASSWORD toto
ENV ENABLE_DATA_WRITE true
ENV ENABLE_UPDATE true
ENV ENABLE_UPLOAD true
# Add in config files
COPY ./tdb.cfg $FUSEKI_BASE/tdb.cfg
COPY ./tdb.cfg $FUSEKI_HOME/tdb.cfg
tdb.cfg
-------
{
"tdb.node2nodeid_cache_size" : 50000 ,
"tdb.nodeid2node_cache_size" : 250000 ,
}
top
---
Mem: 39251812K used, 26724204K free, 21104K shrd, 58340K buff, 23792776K cached
CPU:   9% usr   5% sys   0% nic  84% idle   0% io   0% irq   0% sirq
Load average: 2.02 1.93 1.75 3/4355 114
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    1     0 9008     S    20528m  30%   4   0% java -cp *:/javalibs/* org.apache.jena.fuseki.cmd.FusekiCmd
  109   102 9008     S     1520   0%   7   0% /bin/sh
  102     0 9008     S     1512   0%   1   0% /bin/sh -c TERM="xterm-termite" /bin/sh
  110   109 9008     R     1508   0%   1   0% top
--
Luís