What do we know so far?
1/ 6 datasets, up to 20Mb each (file size? format? Compressed? Inference?)
(is that datasets or graphs?)
2/ At 1G the system kills processes.
What we don't know:
A/ Heap size
B/ Machine RAM size - TDB uses memory-mapped files, so this matters. It
also means the process size may look large (the database files are
mapped), but this is virtual memory, not real RAM.
C/ Why 1G? That is pretty small for a general-purpose Java program. Java
needs space to load the code and to cover the basic overheads of running
a webserver. (Is this with the built-in Jetty? Not Fuseki in Tomcat?)
I don't understand the graph - what is 120G?
Andy
On 16/04/2020 10:11, Rob Vesse wrote:
I find the implied figures hard to believe. As Lorenz has said, you will
need to share your findings via some other service, since this mailing
list does not permit attachments.
Many people use Fuseki and TDB to host datasets in the hundreds of millions (if
not billions) of triples in production environments e.g. much of UK Open Data
from govt agencies is backed by Fuseki/TDB in one form or another. Also,
the memory usage of Fuseki/TDB cannot realistically be reduced to
something as crude as MB per triple, because the memory management going
on within the JVM and TDB is far more complicated than that; see my
previous reply to your earlier questions [1]
Rob
[1]
https://lists.apache.org/thread.html/rf76be4fba2d9679f346dd7482d9925293eb768bbedce3feff7bb4376%40%3Cusers.jena.apache.org%3E
On 16/04/2020, 08:47, "Lorenz Buehmann" <buehm...@informatik.uni-leipzig.de>
wrote:
No attachments are possible on this mailing list. Please use some
external service to share attachments, or try to embed it as an image
(in case it's just an image), as you did in your other thread. Or just
use a Gist.
On 16.04.20 09:27, Luís Moreira de Sousa wrote:
> Dear all,
>
> I have been tweaking the tdb.node2nodeid_cache_size and
> tdb.nodeid2node_cache_size parameters as Andy suggested. They indeed
> reduce the RAM used by Fuseki, but not to a point where it becomes
> usable. In the attachment you can find a chart plotting the increase in
> memory use against dataset size. There is no visible correlation, but
> on average each additional triple requires upwards of 30 MB of RAM.
>
> The actual datasets I work with count triples in the millions (from
> relational databases with tens of thousands of records). Even if I ever
> convince a data centre to provide the required amount of RAM to a
> single container, the costs will be prohibitive.
>
> Can anyone provide their experiences with Fuseki in production?
> Particularly in micro-services/containerised platforms?
>
> Thank you.
>
> --
> Luís
>
>
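[Editor's note] For readers following along: the two cache parameters discussed above can also be pinned per database. Assuming TDB1's store-parameters mechanism, TDB reads a JSON file named tdb.cfg from the database directory; a sketch with purely illustrative values (not recommendations from this thread):

```json
{
  "tdb.node2nodeid_cache_size": 10000,
  "tdb.nodeid2node_cache_size": 50000
}
```

See the Apache Jena "TDB Store Parameters" documentation for the exact key names, defaults, and rules on when the file is read.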