Re: Ever-increasing memory usage in Fuseki

2023-11-04 Thread Dave Reynolds

Hi Hugo,

A data size of 20Mt is not big and shouldn't need anything like 30G of 
heap, let alone 18G of stack. As others have said, you need a much bigger 
container limit if you're going to pack both of those, plus the JVM, plus 
direct buffers in.


For Fuseki in Kubernetes at that scale we would typically run with a 
heap more like 2G and a container limit of 4G (assuming TDB1), with no 
need to increase the stack defaults. For larger sizes in the 100-200Mt 
region, 3-4G of heap and an 8G container limit typically works for us. 
We only get up to needing machine sizes in the 30G range for 600Mt and 
very high request rates (millions a day), and even then we keep the heap 
relatively small (~12G) to leave space for OS caching. For TDB1 the 
heap size, once there's enough to work in, is more critical for updates 
than for queries, and at 20Mt your updates can't be that big.
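
For concreteness, the sizing rules of thumb above can be written down as a tiny helper. This is purely illustrative (nothing like it ships with Fuseki); the thresholds and values are the figures quoted in this paragraph:

```java
// Hypothetical helper, not Fuseki tooling: maps a TDB1 dataset size
// (in millions of triples) to the heap sizes suggested above.
public class HeapSizing {
    /** Suggested -Xmx value in GiB for a dataset of `megaTriples` Mt. */
    static int suggestedHeapGiB(long megaTriples) {
        if (megaTriples <= 20)  return 2;   // "20Mt ... heap more like 2G"
        if (megaTriples <= 200) return 4;   // "100-200Mt ... 3-4G of heap"
        return 12;                          // "600Mt ... keep heap ~12G"
    }

    public static void main(String[] args) {
        // A 20Mt store, per the advice above:
        System.out.println("-Xmx" + suggestedHeapGiB(20) + "g");
    }
}
```

The container limit then goes above that (4G container for a 2G heap, 8G for 3-4G), leaving headroom for the JVM itself, direct buffers, and the OS page cache.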


The progressive Jetty memory leak mentioned in the threads Martynas 
referenced was solved by switching off Jetty's use of direct memory, 
which you can do with configuration files in your container. I don't 
think this is the cause of your problems in any case, unless you are 
dealing with a very high request rate.


Hope this helps.
Dave



RE: Ever-increasing memory usage in Fuseki

2023-11-03 Thread Hugo Mills
Thanks to all who replied. We're trying out all the recommendations (slowly), 
and will update when we've got something to report.

Hugo.

Dr. Hugo Mills
Senior Data Scientist
hugo.mi...@agrimetrics.co.uk 




Re: Ever-increasing memory usage in Fuseki

2023-11-02 Thread Andy Seaborne

Hi Hugo,

On 01/11/2023 19:43, Hugo Mills wrote:

> Hi,
>
> We’ve got an application we’ve inherited recently which uses a Fuseki 
> database. It was originally Fuseki 3.4.0, and has been upgraded to 4.9.0 
> recently. The 3.4.0 server needed regular restarts (once a day) in order 
> to keep working; the 4.9.0 server is even more unreliable, and has been 
> running out of memory and being OOM-killed multiple times a day. This 
> afternoon, it crashed enough times, fast enough, to make Kubernetes go 
> into a back-off loop, and brought the app down for some time.
>
> We’re using OpenJDK 19. The JVM options are: “-Xmx:30g -Xms18g”, and the 
> container we’re running it in has a memory limit of 31 GiB.


Setting Xmx close to the container limit can cause problems.

The JVM itself takes space, and the operating system needs space.
The JVM also has ~1G of extra space for direct memory, which networking 
uses.


The Java heap will almost certainly grow to reach Xmx at some point 
because Java delays running full garbage collections. The occasional 
drops you see are likely incremental garbage collections happening.


If Xmx is very close to the container limit, the heap will naturally grow 
(it does not know about the container limit), then the total in-use 
memory for the machine is reached and the container is killed.
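
A small illustration of why this catches people out (plain Java, not Fuseki code): the only ceiling the JVM reports on is the heap one, and nothing in that figure reflects direct buffers, metaspace, thread stacks, or mapped files.

```java
// Illustration: Runtime.maxMemory() is the heap ceiling (effectively -Xmx).
// Native allocations sit outside it, so total process memory can hit the
// container limit while heap stats still look healthy.
public class HeapView {
    /** Heap ceiling in MiB, as the JVM itself reports it. */
    static long heapCeilingMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("JVM heap ceiling MiB: " + heapCeilingMiB());
    }
}
```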


A 30G heap looks like a very tight setting inside a 31 GiB container. Is 
there anything customized running in Fuseki? Is the server dedicated to 
Fuseki?


As Conal mentioned, TDB uses memory-mapped files - these are not part of 
the heap. They are part of the OS virtual memory.
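
A minimal sketch of that point (illustrative plain Java, not Jena internals): mapping a file gives you a buffer backed by the OS page cache, with no corresponding heap allocation.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustration: a memory-mapped region lives in OS virtual memory,
// outside the Java heap, so -Xmx never accounts for it.
public class MmapDemo {
    /** Maps `bytes` of a scratch file, writes one byte, reads it back. */
    static byte mapAndTouch(int bytes) throws IOException {
        Path p = Files.createTempFile("mmap-demo", ".dat");
        try (FileChannel ch = FileChannel.open(p,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // map() grows the file to `bytes`; none of this is heap memory.
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
            buf.put(0, (byte) 42);
            return buf.get(0);
        } finally {
            Files.deleteIfExists(p);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("first mapped byte: "
                + mapAndTouch(64 * 1024 * 1024));
    }
}
```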


Is this a single database?
One TDB database needs about 4G RAM of heap space. Try a setting of -Xmx4G.

Only if you have a high proportion of very large literals will that 
setting not work.


More is not better from TDB's point of view. Space for memory-mapped 
files is handled elsewhere, and that space will expand and contract as 
needed. If that space is squeezed out, the system will slow down.


> We tried the 
> “-XX:+UserSerialGC” option this evening, but it didn’t seem to help 
> much. We see the RAM usage of the java process rising steadily as 
> queries are made, with occasional small, but insufficient, drops.
>
> The store is somewhere around 20M triples in size.


Is this a TDB database or in-memory? (I'm guessing TDB but could you 
confirm that.)


Query processing can lead to a lot of memory use if the queries are 
inefficient and there is a high, overlapping query load.


What is the query load on the server? Are there many overlapping requests?

> Could anyone suggest any tweaks or options we could do to make this more 
> stable, and not leak memory? We’ve downgraded to 3.4.0 again, and it’s 
> not running out of space every few minutes at least, but it still has an 
> ever-growing memory usage.
>
> Thanks,
>
> Hugo.


Re: Ever-increasing memory usage in Fuseki

2023-11-01 Thread Martynas Jusevičius
There were several long threads about this issue in the past months. I
think the consensus was that it's Jetty-related, but I don't know whether
the issue has been addressed.

https://lists.apache.org/thread/31ytzp3p2zg3gcsm86t1xlh4nsmdcfkc
https://lists.apache.org/thread/b64trj1c9n9rt0xjowqt4j23h9cy3v4c
https://lists.apache.org/thread/m7ypdsndjosxmdsxp9ch437305qw9mwd


Re: Ever-increasing memory usage in Fuseki

2023-11-01 Thread Conal McLaughlin
Hey Hugo,

Hard to say without more information, but we have experienced containers
being killed as you describe, and solved it by setting
-XX:MaxDirectMemorySize.

Jena/TDB storage makes use of memory-mapped files
https://jena.apache.org/documentation/tdb/store-parameters.html#file-access---mapped-and-direct-modes
by default, so this has an effect on overall memory usage inside the
container that is not immediately apparent when observing JVM stats.
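
A small illustration of the off-heap distinction (assumed plain Java, not IOTICS or Jena code): a direct ByteBuffer is allocated from native memory, which is why it escapes -Xmx and needs the separate -XX:MaxDirectMemorySize cap.

```java
import java.nio.ByteBuffer;

// Illustration: direct buffers come from native allocation, outside the
// Java heap; -XX:MaxDirectMemorySize is the flag that bounds them.
public class DirectDemo {
    /** Allocates a direct buffer and reports whether it is off-heap. */
    static boolean allocateOffHeap(int bytes) {
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes);
        return buf.isDirect();   // true: backed by native memory
    }

    public static void main(String[] args) {
        long heapMax = Runtime.getRuntime().maxMemory();
        System.out.println("heap ceiling MiB: " + heapMax / (1024 * 1024));
        System.out.println("off-heap buffer: "
                + allocateOffHeap(32 * 1024 * 1024));
    }
}
```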

Conal
IOTICS




Ever-increasing memory usage in Fuseki

2023-11-01 Thread Hugo Mills
Hi,

We've got an application we've inherited recently which uses a Fuseki database. 
It was originally Fuseki 3.4.0, and has been upgraded to 4.9.0 recently. The 
3.4.0 server needed regular restarts (once a day) in order to keep working; the 
4.9.0 server is even more unreliable, and has been running out of memory and 
being OOM-killed multiple times a day. This afternoon, it crashed enough times, 
fast enough, to make Kubernetes go into a back-off loop, and brought the app 
down for some time.

We're using OpenJDK 19. The JVM options are: "-Xmx:30g -Xms18g", and the 
container we're running it in has a memory limit of 31 GiB. We tried the 
"-XX:+UserSerialGC" option this evening, but it didn't seem to help much. We 
see the RAM usage of the java process rising steadily as queries are made, with 
occasional small, but insufficient, drops.
The store is somewhere around 20M triples in size.

Could anyone suggest any tweaks or options we could do to make this more 
stable, and not leak memory? We've downgraded to 3.4.0 again, and it's not 
running out of space every few minutes at least, but it still has an 
ever-growing memory usage.

Thanks,
Hugo.

Dr. Hugo Mills
Senior Data Scientist
hugo.mi...@agrimetrics.co.uk

NEWS: Visit our Data Marketplace to explore our agrifood data catalogue.
www.agrimetrics.co.uk