What do we know so far?

1/ 6 datasets, up to 20MB each (file size? format? Compressed? Inference?)

(is that datasets or graphs?)

2/ At 1G the system kills processes.

What we don't know:

A/ Heap size

B/ Machine RAM size - TDB uses memory-mapped files, so this matters. It also means the process size may look large (the database files are mapped into the address space), but that is virtual memory, not real RAM (see the sketch after this list).

C/ Why 1G? That is pretty small for a general-purpose Java program. Java needs space to load the code plus the basic overheads of running a web server. (Is this with the built-in Jetty, not Fuseki in Tomcat?)
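
To make points B and C concrete, here is a minimal standalone Java sketch (not Fuseki code; the database file path is only a placeholder). It prints the JVM's effective max heap (whatever -Xmx allows) and then memory-maps a file, which grows the process's virtual size but not the Java heap:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class HeapVsMmap {
        public static void main(String[] args) throws Exception {
            Runtime rt = Runtime.getRuntime();
            // Effective -Xmx: the ceiling the JVM heap is allowed to grow to.
            System.out.println("Max heap (bytes): " + rt.maxMemory());
            System.out.println("Heap used before mapping: "
                    + (rt.totalMemory() - rt.freeMemory()));

            // Map a database file read-only, the same mechanism TDB uses for
            // its index files. "DB/SPO.idn" is a placeholder path for this sketch.
            try (RandomAccessFile raf = new RandomAccessFile("DB/SPO.idn", "r");
                 FileChannel ch = raf.getChannel()) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                if (ch.size() > 0)
                    buf.get(0);   // touch the mapping so it is really paged in

                // Heap usage is essentially unchanged: the mapped file lives in
                // virtual memory outside the Java heap. That is why a TDB process
                // can look huge in top/ps without using that much real RAM.
                System.out.println("Heap used after mapping: "
                        + (rt.totalMemory() - rt.freeMemory()));
            }
        }
    }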

I don't understand the graph - what is 120G?

    Andy

On 16/04/2020 10:11, Rob Vesse wrote:
I find the implied figures hard to believe. As Lorenz has said, you will need to 
share your findings via some other service, since this mailing list does not 
permit attachments.

Many people use Fuseki and TDB to host datasets in the hundreds of millions (if 
not billions) of triples in production environments, e.g. much of the UK Open Data 
from government agencies is backed by Fuseki/TDB in one form or another.  Also, the 
memory usage of Fuseki/TDB cannot realistically be reduced to something as 
crude as MB per triple, because the memory management going on within the JVM and 
TDB is far more complicated than that; see my previous reply to your earlier 
questions [1].
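
As a small illustration of that last point (a generic Java sketch, nothing Fuseki-specific): the JVM itself distinguishes used, committed and maximum heap, and the committed figure is what the operating system sees reserved, even when most of it is free.

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    public class HeapFigures {
        public static void main(String[] args) {
            MemoryUsage heap =
                ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            // "used" is live data, "committed" is what the JVM has reserved
            // from the OS, "max" is the -Xmx ceiling. Process size in top
            // tracks committed memory (plus mapped files), not used data.
            System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                    heap.getUsed() >> 20,
                    heap.getCommitted() >> 20,
                    heap.getMax() >> 20);
        }
    }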

Rob

[1] 
https://lists.apache.org/thread.html/rf76be4fba2d9679f346dd7482d9925293eb768bbedce3feff7bb4376%40%3Cusers.jena.apache.org%3E

On 16/04/2020, 08:47, "Lorenz Buehmann" <buehm...@informatik.uni-leipzig.de> 
wrote:

     No attachments are possible on this mailing list. Please use some external
     service to share attachments, or try to embed it as an image (in case it is
     just an image) as you did in your other thread. Or just use a Gist.
     On 16.04.20 09:27, Luís Moreira de Sousa wrote:
     > Dear all,
     >
     > I have been tweaking the tdb.node2nodeid_cache_size and
     > tdb.nodeid2node_cache_size parameters as Andy suggested. They indeed
     > reduce the RAM used by Fuseki, but not to a point where it becomes
     > usable. Attached you can find a chart plotting the increase in memory
     > use against dataset size. There is no visible correlation, but on
     > average each additional triple requires upwards of 30 MB of RAM.
     >
     > The actual datasets I work with count triples in the millions (from
     > relational databases with tens of thousands of records). Even if I
     > ever convince a data centre to provide the required amount of RAM
     > for a single container, the costs will be prohibitive.
     >
     > Can anyone share their experiences with Fuseki in production?
     > Particularly on micro-services/containerised platforms?
     >
     > Thank you.
     >
     > --
     > Luís
     >
     >


