The OP said

> Attached you can find a chart plotting the increase in memory use
> against dataset size. There is no visible correlation, but on average
> each additional triple requires upwards of 30 MB of RAM.
but those numbers can't be correct ...

The y-axis denotes the memory consumption in GB? Are you sure? Not MB?

And the x-axis is the number of triples? Is it a logarithmic scale, or are
those really just 20 000 triples? In general, why would anybody benchmark
such small datasets?

Also, why would the lowest number of triples (~10 000) consume 120 GB?
That looks weird - it is at least not what I would expect from Fuseki
alone, nor from any other tool. If it's true, something else is consuming
the memory or there is a leak.

Again, MB vs. GB?

And how did you estimate "30 MB per triple"? I can't believe that figure.

Please share the code of your experiments as well as the details of the
experimental setup.
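
To make this concrete, below is roughly the kind of minimal probe I have
in mind. It is only a sketch of my own, assuming plain Jena with TDB1 on
a local directory (the directory name and URIs are made up), not your
actual setup:

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ResourceFactory;
    import org.apache.jena.tdb.TDBFactory;

    public class TdbMemoryProbe {
        public static void main(String[] args) {
            // Hypothetical local TDB1 directory - adjust to the real setup.
            Dataset ds = TDBFactory.createDataset("target/tdb-probe");

            long before = usedHeap();

            int n = 10_000;
            ds.begin(ReadWrite.WRITE);
            try {
                Model m = ds.getDefaultModel();
                for (int i = 0; i < n; i++) {
                    m.add(ResourceFactory.createResource("http://example.org/s/" + i),
                          ResourceFactory.createProperty("http://example.org/p"),
                          "object-" + i);
                }
                ds.commit();
            } finally {
                ds.end();
            }

            long after = usedHeap();
            System.out.printf("heap growth: %.1f MB for %d triples (%.2f KB/triple)%n",
                    (after - before) / 1e6, n, (after - before) / 1024.0 / n);
            ds.close();
        }

        // Very rough heap measurement - GC makes this noisy, so treat the
        // numbers as indicative only.
        static long usedHeap() {
            System.gc();
            Runtime rt = Runtime.getRuntime();
            return rt.totalMemory() - rt.freeMemory();
        }
    }

Whatever the absolute numbers, anything even remotely close to 30 MB per
triple would be obvious in a run like this, so please also include the
real loading code and how the memory was read off.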

On 16.04.20 12:16, Andy Seaborne wrote:
> What do we know so far?
>
> 1/ 6 datasets, up to 20Mb each (file size? format? Compressed?
> Inference?)
>
> (is that datasets or graphs?)
>
> 2/ At 1G the system kills processes.
>
> What we don't know:
>
> A/ Heap size
>
> B/ Machine RAM size - TDB uses memory-mapped files so this matters. It
> also means the process size may look large (the database files are
> mapped) but this is virtual memory, not real RAM.
>
> C/ Why 1G? That is pretty small for a general-purpose Java program. Java
> needs space to load the code plus the basic overheads of running a
> webserver. (Is this with the built-in Jetty? Not Fuseki in Tomcat?)
>
> I don't understand the graph - what is 120G?
>
>     Andy
>
> On 16/04/2020 10:11, Rob Vesse wrote:
>> I find the implied figures hard to believe. As Lorenz has said, you
>> will need to share your findings via some other service, since this
>> mailing list does not permit attachments.
>>
>> Many people use Fuseki and TDB to host datasets in the hundreds of
>> millions (if not billions) of triples in production environments, e.g.
>> much of the UK Open Data from govt agencies is backed by Fuseki/TDB in
>> one form or another. Also, the memory usage of Fuseki/TDB cannot
>> realistically be reduced to something as crude as MB per triple,
>> because the memory management going on within the JVM and TDB is far
>> more complicated than that; see my previous reply to your earlier
>> questions [1].
>>
>> Rob
>>
>> [1]
>> https://lists.apache.org/thread.html/rf76be4fba2d9679f346dd7482d9925293eb768bbedce3feff7bb4376%40%3Cusers.jena.apache.org%3E
>>
>> On 16/04/2020, 08:47, "Lorenz Buehmann"
>> <buehm...@informatik.uni-leipzig.de> wrote:
>>
>>      No attachments are possible on this mailing list. Please use some
>>      external service to share attachments, or try to embed it as an
>>      image (in case it's just an image), as you did in your other
>>      thread. Or just use a Gist.
>>
>>      On 16.04.20 09:27, Luís Moreira de Sousa wrote:
>>      > Dear all,
>>      >
>>      > I have been tweaking the tdb.node2nodeid_cache_size and
>>      > tdb.nodeid2node_cache_size parameters as Andy suggested. They
>>      > indeed reduce the RAM used by Fuseki, but not to a point where
>>      > it becomes usable. Attached you can find a chart plotting the
>>      > increase in memory use against dataset size. There is no visible
>>      > correlation, but on average each additional triple requires
>>      > upwards of 30 MB of RAM.
>>      >
>>      > The actual datasets I work with count triples in the millions
>>      > (from relational databases with tens of thousands of records).
>>      > Even if I ever convince a data centre to provide the required
>>      > amounts of RAM to a single container, the costs will be
>>      > prohibitive.
>>      >
>>      > Can anyone share their experiences with Fuseki in production?
>>      > Particularly on micro-services/containerised platforms?
>>      >
>>      > Thank you.
>>      >
>>      > --
>>      > Luís
>>      >
>>      >
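
Coming back to Andy's points A and C above: before reading anything off
the chart, it is worth confirming what heap the Fuseki JVM actually got,
and keeping in mind that TDB's memory-mapped database files inflate the
*virtual* size of the process without living on the Java heap. A small
sketch of such a check, assuming the embedded jena-fuseki-main server
over a TDB1 directory (the path and port are made up):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    import org.apache.jena.fuseki.main.FusekiServer;
    import org.apache.jena.sparql.core.DatasetGraph;
    import org.apache.jena.tdb.TDBFactory;

    public class FusekiHeapCheck {
        public static void main(String[] args) {
            // Hypothetical TDB1 database directory - adjust to your setup.
            DatasetGraph dsg = TDBFactory.createDatasetGraph("/data/tdb");

            FusekiServer server = FusekiServer.create()
                    .port(3330)        // made-up port
                    .add("/ds", dsg)   // expose the dataset at /ds
                    .build();
            server.start();

            // What the JVM was actually given (-Xmx), independent of the
            // process (virtual) size reported by top or container metrics.
            MemoryUsage heap =
                    ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            System.out.printf("heap max: %d MB, heap used: %d MB%n",
                    heap.getMax() / (1024 * 1024),
                    heap.getUsed() / (1024 * 1024));
        }
    }

If the chart is plotting the virtual size of the process rather than the
heap, the memory-mapped database files alone can make it look enormous,
which would go some way towards explaining the confusion about the units.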
