Re: Mystery memory leak in fuseki

2023-09-01 Thread Rob @ DNR
Yes and No

The embedded server mode of operation for Fuseki is dependent on Jetty.  But 
the real core of Fuseki is actually just plain Java Servlets and Filters and 
Fuseki’s own dynamic dispatch code.
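
For anyone not familiar with it, this is roughly what the embedded mode looks like (a minimal sketch using the org.apache.jena.fuseki.main.FusekiServer builder; the port and service name are illustrative):

    import org.apache.jena.fuseki.main.FusekiServer;
    import org.apache.jena.sparql.core.DatasetGraph;
    import org.apache.jena.sparql.core.DatasetGraphFactory;

    public class EmbeddedFuseki {
        public static void main(String[] args) {
            // In-memory transactional dataset; a TDB/TDB2 dataset could be used instead.
            DatasetGraph dsg = DatasetGraphFactory.createTxnMem();
            FusekiServer server = FusekiServer.create()
                    .port(3030)        // illustrative port
                    .add("/ds", dsg)   // illustrative service name
                    .build();
            server.start();            // this is where Jetty comes in
        }
    }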

FWIW I am also a big fan of JAX-RS but moving to JAX-RS would probably be a 
much more substantive rewrite.  This would need to be done carefully to 
support Fuseki’s dynamic configuration model but I think it is possible; not 
sure it’s in-scope for the Jena 5 timeframe though.

Rob

From: Martynas Jusevičius 
Date: Thursday, 31 August 2023 at 19:35
To: users@jena.apache.org 
Subject: Re: Mystery memory leak in fuseki
Does Fuseki have direct code dependency on Jetty? Or would it be possible
to try switching to a different servlet container such as Tomcat?

JAX-RS, which I’ve advocated here multiple times, provides such a
higher-level abstraction above servlets that would enable easy switching.

On Fri, 25 Aug 2023 at 16.18, Dave Reynolds 
wrote:

> On 25/08/2023 11:44, Andy Seaborne wrote:
> >
> >
> > On 03/07/2023 14:20, Dave Reynolds wrote:
> >> We have a very strange problem with recent fuseki versions when
> >> running (in docker containers) on small machines. Suspect a jetty
> >> issue but it's not clear.
> >
> >  From the threads here, it does seem to be Jetty related.
>
> Yes.
>
> We've followed up on Rob's suggestions for tuning the jetty settings so
> we can use a stock fuseki. On 4.9.0, if we switch off direct buffer use
> in jetty altogether the problem does seem to go away. The performance
> hit we see is small and barely above noise.
>
> We currently have a soak test of leaving direct buffers on but limiting
> max and retained levels, that looks promising but too early to be sure.
>
> > I haven't managed to reproduce the situation on my machine in any sort
> > of predictable way where I can look at what's going on.
>
> Understood. While we can reproduce some effects in desktop test set ups
> the only real test has been to leave configurations running for days at
> a time in the real dev setting with all its monitoring and
> instrumentation. Which makes testing any changes very painful, let alone
> deeper investigations.
>
> > For Jena5, there will be a switch to a Jetty version that uses jakarta.*
> > packages. That's no more than a rename of imports. The migration
> > EE8->EE9 is only repackaging.  That's Jetty10->Jetty11.
> >
> > There is now Jetty12. It is a major re-architecture of Jetty including
> > its network handling for better HTTP/2 and HTTP/3.
> >
> > If there has been some behaviour of Jetty involved in the memory growth,
> > it is quite unlikely to be carried over to Jetty12.
> >
> > Jetty12 is not a simple switch of artifacts for Fuseki. APIs have
> > changed but it's a step that's going to be needed sometime.
> >
> > If it does not turn out that Fuseki needs a major re-architecture, I
> > think that Jena5 should be based on Jetty12. So far, it looks doable.
>
> Sounds promising. Agreed that jetty12 is enough of a new build that it's
> unlikely to have the same behaviour.
>
> We've been testing some of our troublesome queries on 4.9.0 on java 11
> vs java 17 and see a 10-15% performance hit on java 17 (even after we
> take control of the GC by forcing both to use the old parallel GC
> instead of G1). No idea why, seems wrong! Makes us inclined to stick
> with java 11 and thus jena 4.x series as long as we can.
>
> Dave
>
>


Re: Mystery memory leak in fuseki

2023-08-31 Thread Martynas Jusevičius
Does Fuseki have direct code dependency on Jetty? Or would it be possible
to try switching to a different servlet container such as Tomcat?

JAX-RS, which I’ve advocated here multiple times, provides such a
higher-level abstraction above servlets that would enable easy switching.

On Fri, 25 Aug 2023 at 16.18, Dave Reynolds 
wrote:

> On 25/08/2023 11:44, Andy Seaborne wrote:
> >
> >
> > On 03/07/2023 14:20, Dave Reynolds wrote:
> >> We have a very strange problem with recent fuseki versions when
> >> running (in docker containers) on small machines. Suspect a jetty
> >> issue but it's not clear.
> >
> >  From the threads here, it does seem to be Jetty related.
>
> Yes.
>
> We've followed up on Rob's suggestions for tuning the jetty settings so
> we can use a stock fuseki. On 4.9.0, if we switch off direct buffer use
> in jetty altogether the problem does seem to go away. The performance
> hit we see is small and barely above noise.
>
> We currently have a soak test of leaving direct buffers on but limiting
> max and retained levels, that looks promising but too early to be sure.
>
> > I haven't managed to reproduce the situation on my machine in any sort
> > of predictable way where I can look at what's going on.
>
> Understood. While we can reproduce some effects in desktop test set ups
> the only real test has been to leave configurations running for days at
> a time in the real dev setting with all its monitoring and
> instrumentation. Which makes testing any changes very painful, let alone
> deeper investigations.
>
> > For Jena5, there will be a switch to a Jetty version that uses jakarta.*
> > packages. That's no more than a rename of imports. The migration
> > EE8->EE9 is only repackaging.  That's Jetty10->Jetty11.
> >
> > There is now Jetty12. It is a major re-architecture of Jetty including
> > its network handling for better HTTP/2 and HTTP/3.
> >
> > If there has been some behaviour of Jetty involved in the memory growth,
> > it is quite unlikely to be carried over to Jetty12.
> >
> > Jetty12 is not a simple switch of artifacts for Fuseki. APIs have
> > changed but it's a step that's going to be needed sometime.
> >
> > If it does not turn out that Fuseki needs a major re-architecture, I
> > think that Jena5 should be based on Jetty12. So far, it looks doable.
>
> Sounds promising. Agreed that jetty12 is enough of a new build that it's
> unlikely to have the same behaviour.
>
> We've been testing some of our troublesome queries on 4.9.0 on java 11
> vs java 17 and see a 10-15% performance hit on java 17 (even after we
> take control of the GC by forcing both to use the old parallel GC
> instead of G1). No idea why, seems wrong! Makes us inclined to stick
> with java 11 and thus jena 4.x series as long as we can.
>
> Dave
>
>


Re: Mystery memory leak in fuseki

2023-08-25 Thread Dave Reynolds

On 25/08/2023 11:44, Andy Seaborne wrote:



On 03/07/2023 14:20, Dave Reynolds wrote:
We have a very strange problem with recent fuseki versions when 
running (in docker containers) on small machines. Suspect a jetty 
issue but it's not clear.


 From the threads here, it does seem to be Jetty related.


Yes.

We've followed up on Rob's suggestions for tuning the jetty settings so 
we can use a stock fuseki. On 4.9.0, if we switch off direct buffer use 
in jetty altogether the problem does seem to go away. The performance 
hit we see is small and barely above noise.


We currently have a soak test of leaving direct buffers on but limiting 
max and retained levels, that looks promising but too early to be sure.


I haven't managed to reproduce the situation on my machine in any sort 
of predictable way where I can look at what's going on.


Understood. While we can reproduce some effects in desktop test set ups 
the only real test has been to leave configurations running for days at 
a time in the real dev setting with all its monitoring and 
instrumentation. Which makes testing any changes very painful, let alone 
deeper investigations.


For Jena5, there will be a switch to a Jetty version that uses jakarta.* 
packages. That's no more than a rename of imports. The migration 
EE8->EE9 is only repackaging.  That's Jetty10->Jetty11.


There is now Jetty12. It is a major re-architecture of Jetty including 
its network handling for better HTTP/2 and HTTP/3.


If there has been some behaviour of Jetty involved in the memory growth, 
it is quite unlikely to be carried over to Jetty12.


Jetty12 is not a simple switch of artifacts for Fuseki. APIs have 
changed but it's a step that's going to be needed sometime.


If it does not turn out that Fuseki needs a major re-architecture, I 
think that Jena5 should be based on Jetty12. So far, it looks doable.


Sounds promising. Agreed that jetty12 is enough of a new build that it's 
unlikely to have the same behaviour.


We've been testing some of our troublesome queries on 4.9.0 on java 11 
vs java 17 and see a 10-15% performance hit on java 17 (even after we 
take control of the GC by forcing both to use the old parallel GC 
instead of G1). No idea why, seems wrong! Makes us inclined to stick 
with java 11 and thus jena 4.x series as long as we can.


Dave



Re: Mystery memory leak in fuseki

2023-08-25 Thread Andy Seaborne




On 03/07/2023 14:20, Dave Reynolds wrote:
We have a very strange problem with recent fuseki versions when running 
(in docker containers) on small machines. Suspect a jetty issue but it's 
not clear.


From the threads here, it does seem to be Jetty related.

I haven't managed to reproduce the situation on my machine in any sort 
of predictable way where I can look at what's going on.



For Jena5, there will be a switch to a Jetty version that uses jakarta.* 
packages. That's no more than a rename of imports. The migration 
EE8->EE9 is only repackaging.  That's Jetty10->Jetty11.


There is now Jetty12. It is a major re-architecture of Jetty including 
its network handling for better HTTP/2 and HTTP/3.


If there has been some behaviour of Jetty involved in the memory growth, 
it is quite unlikely to be carried over to Jetty12.


Jetty12 is not a simple switch of artifacts for Fuseki. APIs have 
changed but it's a step that's going to be needed sometime.


If it does not turn out that Fuseki needs a major re-architecture, I 
think that Jena5 should be based on Jetty12. So far, it looks doable.


Andy


Re: Re: Mystery memory leak in fuseki

2023-07-21 Thread Martynas Jusevičius
ng that should be
> explored.
> > >  >
> > >  > Dave
> > >  >
> > >  >
> > >  > On 11/07/2023 09:45, Rob @ DNR wrote:
> > >  > > Dave
> > >  > >
> > >  > > Thanks for the further information.
> > >  > >
> > >  > > Have you experimented with using Jetty 10 but providing more
> > > detailed configuration? Fuseki supports providing detailed Jetty
> > > configuration if needed via the --jetty-config option
> > >  > >
> > >  > > The following section looks relevant:
> > >  > >
> > >  > >
> > >
> https://eclipse.dev/jetty/documentation/jetty-10/operations-guide/index.html#og-module-bytebufferpool
> > >  > >
> > >  > > It looks like the default is that Jetty uses a heuristic to
> > > determine these values, sadly the heuristic in question is not
> detailed
> > > in that documentation.
> > >  > >
> > >  > > Best guess from digging through their code is that the
> “heuristic”
> > > is this:
> > >  > >
> > >  > >
> > >
> https://github.com/eclipse/jetty.project/blob/jetty-10.0.x/jetty-io/src/main/java/org/eclipse/jetty/io/AbstractByteBufferPool.java#L78-L84
> > >  > >
> > >  > > i.e., ¼ of the configured max heap size. This doesn’t necessarily
> > > align with the exact sizes of process growth you see but I note the
> > > documentation does explicitly say that buffers used can go beyond
> these
> > > limits but that those will just be GC’d rather than pooled for reuse.
> > >  > >
> > >  > > Example byte buffer configuration at
> > >
> https://github.com/eclipse/jetty.project/blob/9a05c75ad28ebad4abbe624fa432664c59763747/jetty-server/src/main/config/etc/jetty-bytebufferpool.xml#L4
> > >  > >
> > >  > > Any chance you could try customising this for your needs with
> stock
> > > Fuseki and see if this allows you to make the process size smaller and
> > > sufficiently predictable for your use case?
> > >  > >
> > >  > > Rob
> > >  > >
> > >  > > From: Dave Reynolds 
> > >  > > Date: Tuesday, 11 July 2023 at 08:58
> > >  > > To: users@jena.apache.org 
> > >  > > Subject: Re: Mystery memory leak in fuseki
> > >  > > For interest[*] ...
> > >  > >
> > >  > > This is what the core JVM metrics look like when transitioning
> from a
> > >  > > Jetty10 to a Jetty9.4 instance. You can see the direct buffer
> > > cycling up
> > >  > > to 500MB (which happens to be the max heap setting) on Jetty 10,
> > > nothing
> > >  > > on Jetty 9. The drop in Mapped buffers is just because TDB hadn't
> been
> > >  > > asked any queries yet.
> > >  > >
> > >  > >
> > >
> https://www.dropbox.com/scl/fi/9afhrztbb36fvzqkuw996/fuseki-jetty10-jetty9-transition.png?rlkey=7fpj4x1pn5mjnf3jjwenmp65m=0
> > >  > >
> > >  > > Here's the same metrics around the time of triggering a TDB
> backup.
> > > Shows
> > >  > > the mapped buffer use for TDB but no significant impact on heap
> etc.
> > >  > >
> > >  > >
> > >
> https://www.dropbox.com/scl/fi/0s40vpizf94c4w3m2awna/fuseki-jetty10-backup.png?rlkey=ai31m6z58w0uex8zix8e9ctna=0
> > >  > >
> > >  > > These are all on the same instance as the RES memory trace:
> > >  > >
> > >  > >
> > >
> https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0
> > >  > >
> > >  > > Dave
> > >  > >
> > >  > > [*] I've been staring at metric graphs for so many days I may
> have a
> > >  > > distorted notion of what's interesting :)
> > >  > >
> > >  > > On 11/07/2023 08:39, Dave Reynolds wrote:
> > >  > >> After a 10 hour test of 4.9.0 with Jetty 9.4 on java 17 in the
> > >  > >> production, containerized, environment then it is indeed very
> stable.
> > >  > >>
> > >  > >> Running at less than 6% of memory on a 4GB machine compared to
> peaks of
> > >  > >> ~50% for versions with Jetty 10. RES shows as 240K with 35K
> shared
> > >  > >> (presume mostly libraries).
> > >  > >>
> > >  > 

RE: Re: Mystery memory leak in fuseki

2023-07-20 Thread Conal McLaughlin
 default is that Jetty uses a heuristic to 
> > determine these values, sadly the heuristic in question is not detailed 
> > in that documentation.
> >  > >
> >  > > Best guess from digging through their code is that the “heuristic” 
> > is this:
> >  > >
> >  > > 
> > https://github.com/eclipse/jetty.project/blob/jetty-10.0.x/jetty-io/src/main/java/org/eclipse/jetty/io/AbstractByteBufferPool.java#L78-L84
> >  > >
> >  > > i.e., ¼ of the configured max heap size. This doesn’t necessarily 
> > align with the exact sizes of process growth you see but I note the 
> > documentation does explicitly say that buffers used can go beyond these 
> > limits but that those will just be GC’d rather than pooled for reuse.
> >  > >
> >  > > Example byte buffer configuration at 
> > https://github.com/eclipse/jetty.project/blob/9a05c75ad28ebad4abbe624fa432664c59763747/jetty-server/src/main/config/etc/jetty-bytebufferpool.xml#L4
> >  > >
> >  > > Any chance you could try customising this for your needs with stock 
> > Fuseki and see if this allows you to make the process size smaller and 
> > sufficiently predictable for your use case?
> >  > >
> >  > > Rob
> >  > >
> >  > > From: Dave Reynolds 
> >  > > Date: Tuesday, 11 July 2023 at 08:58
> >  > > To: users@jena.apache.org 
> >  > > Subject: Re: Mystery memory leak in fuseki
> >  > > For interest[*] ...
> >  > >
> >  > > This is what the core JVM metrics look like when transitioning from a
> >  > > Jetty10 to a Jetty9.4 instance. You can see the direct buffer 
> > cycling up
> >  > > to 500MB (which happens to be the max heap setting) on Jetty 10, 
> > nothing
> >  > > on Jetty 9. The drop in Mapped buffers is just because TDB hadn't been
> >  > > asked any queries yet.
> >  > >
> >  > > 
> > https://www.dropbox.com/scl/fi/9afhrztbb36fvzqkuw996/fuseki-jetty10-jetty9-transition.png?rlkey=7fpj4x1pn5mjnf3jjwenmp65m=0
> >  > >
> >  > > Here's the same metrics around the time of triggering a TDB backup. 
> > Shows
> >  > > the mapped buffer use for TDB but no significant impact on heap etc.
> >  > >
> >  > > 
> > https://www.dropbox.com/scl/fi/0s40vpizf94c4w3m2awna/fuseki-jetty10-backup.png?rlkey=ai31m6z58w0uex8zix8e9ctna=0
> >  > >
> >  > > These are all on the same instance as the RES memory trace:
> >  > >
> >  > > 
> > https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0
> >  > >
> >  > > Dave
> >  > >
> >  > > [*] I've been staring at metric graphs for so many days I may have a
> >  > > distorted notion of what's interesting :)
> >  > >
> >  > > On 11/07/2023 08:39, Dave Reynolds wrote:
> >  > >> After a 10 hour test of 4.9.0 with Jetty 9.4 on java 17 in the
> >  > >> production, containerized, environment then it is indeed very stable.
> >  > >>
> >  > >> Running at less than 6% of memory on a 4GB machine compared to peaks of
> >  > >> ~50% for versions with Jetty 10. RES shows as 240K with 35K shared
> >  > >> (presume mostly libraries).
> >  > >>
> >  > >> Copy of trace is:
> >  > >> 
> > https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0
> >  > >>
> >  > >> The high spikes on the left of the image are the prior run with the
> >  > >> out-of-the-box 4.7.0 on the same JVM.
> >  > >>
> >  > >> The small spike at 06:00 is a dump so TDB was able to touch and 
> > scan all
> >  > >> the (modest) data with very minor blip in resident size (as you'd 
> > hope).
> >  > >> JVM stats show the mapped buffers for TDB jumping up but confirm 
> > heap is
> >  > >> stable at < 60M, non-heap 60M.
> >  > >>
> >  > >> Dave
> >  > >>
> >  > >> On 10/07/2023 20:52, Dave Reynolds wrote:
> >  > >>> Since this thread has got complex, I'm posting this update here 
> > at the
> >  > >>> top level.
> >  > >>>
> >  > >>> Thanks to folks, especially Andy and Rob for suggestions and for
> >  > >>> investigating.
> >  > >>>
> >  > >>> Afte

Re: Mystery memory leak in fuseki

2023-07-19 Thread Andy Seaborne

Conal,

Thanks for the information.
Can you see if metaspace is growing as well?

All,

Could someone please try running Fuseki main with no datasets (--empty), 
with some healthcheck ping traffic.
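
A minimal sketch of generating that ping traffic, assuming a server already running on localhost:3030 and the standard /$/ping endpoint (host, port and interval are illustrative):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.concurrent.TimeUnit;

    public class PingLoop {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest ping = HttpRequest.newBuilder(
                    URI.create("http://localhost:3030/$/ping")).GET().build();
            while (true) {
                // One ping then a pause, mimicking a health check from a monitoring system.
                HttpResponse<String> r = client.send(ping, HttpResponse.BodyHandlers.ofString());
                System.out.println("ping -> " + r.statusCode());
                TimeUnit.SECONDS.sleep(30);
            }
        }
    }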


Andy

On 19/07/2023 14:42, Conal McLaughlin wrote:

Hey Dave,

Thank you for providing an in depth analysis of your issues.
We appear to be witnessing the same type of problems with our current 
Fuseki deployment.
We are deploying a containerised Fuseki into an AWS ECS task alongside 
other containers - this may not be ideal but that’s a different story.


I just wanted to add another data point to everything you have described.
Firstly, it does seem like “idle” (or very low traffic) instances are 
the problem, for us (coupled with a larger heap than necessary).
We witness the same increase in the ECS task memory consumption up until 
the whole thing is killed off. Which includes the Fuseki container.


In an attempt to see what was going on beneath the hood, we turned up 
the logging to TRACE in the log4j2.xml file provided to Fuseki.

This appeared to stabilise the increasing memory consumption.
Even just switching the `logger.jetty.level` to TRACE alleviates the issue.


Colour me confused!

A Log4j logger that is active will use a few objects - maybe that's enough 
to trigger a minor GC which in turn is enough to flush some non-heap 
resources.
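
One crude way to test that hypothesis from outside the process (an illustrative sketch only, not something tried in this thread): periodically force a GC in the running Fuseki JVM with jcmd and watch whether the process size responds.

    import java.util.concurrent.TimeUnit;

    // Illustrative: trigger a GC in the target JVM every few minutes via jcmd.
    // Assumes jcmd is on the PATH; pass the Fuseki pid as the first argument.
    public class ForceGcLoop {
        public static void main(String[] args) throws Exception {
            String pid = args[0];
            while (true) {
                new ProcessBuilder("jcmd", pid, "GC.run").inheritIO().start().waitFor();
                TimeUnit.MINUTES.sleep(5);
            }
        }
    }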


How big is the heap?
This is Java17?

We are testing this on Fuseki 4.8.0/TDB2 with close to 0 triples and 
extremely low query traffic / health checks via /ping.

ecs-task-memory - Image on Pasteboard: https://pasteboard.co/KPk7uhH2F9Lp.png


Cheers,
Conal

On 2023/07/11 09:31:25 Dave Reynolds wrote:
 > Hi Rob,
 >
 > Good point. Will try to find time to experiment with that but given the
 > testing cycle time that will take a while and can't start immediately.
 >
 > I'm a little sceptical though. As mentioned before, all the metrics we
 > see show the direct memory pool that Jetty uses cycling up to the max heap
 > size and then being collected but with no long term growth to match the
 > process size growth. This really feels more like a bug (though not sure
 > where) than tuning. The fact that actual behaviour doesn't match the
 > documentation isn't encouraging.
 >
 > It's also pretty hard to figure what the right pool configuration would
 > be. This thing is just being asked to deliver a few metrics (12KB per
 > request) several times a minute but manages to eat 500MB of direct
 > buffer space every 5mins. So what the right pool parameters are to
 > support real usage peaks is not going to be easy to figure out.
 >
 > None the less you are right. That's something that should be explored.
 >
 > Dave
 >
 >
 > On 11/07/2023 09:45, Rob @ DNR wrote:
 > > Dave
 > >
 > > Thanks for the further information.
 > >
 > > Have you experimented with using Jetty 10 but providing more 
detailed configuration? Fuseki supports providing detailed Jetty 
configuration if needed via the --jetty-config option

 > >
 > > The following section looks relevant:
 > >
 > > 
https://eclipse.dev/jetty/documentation/jetty-10/operations-guide/index.html#og-module-bytebufferpool

 > >
 > > It looks like the default is that Jetty uses a heuristic to 
determine these values, sadly the heuristic in question is not detailed 
in that documentation.

 > >
 > > Best guess from digging through their code is that the “heuristic” 
is this:

 > >
 > > 
https://github.com/eclipse/jetty.project/blob/jetty-10.0.x/jetty-io/src/main/java/org/eclipse/jetty/io/AbstractByteBufferPool.java#L78-L84

 > >
 > > i.e., ¼ of the configured max heap size. This doesn’t necessarily 
align with the exact sizes of process growth you see but I note the 
documentation does explicitly say that buffers used can go beyond these 
limits but that those will just be GC’d rather than pooled for reuse.

 > >
 > > Example byte buffer configuration at 
https://github.com/eclipse/jetty.project/blob/9a05c75ad28ebad4abbe624fa432664c59763747/jetty-server/src/main/config/etc/jetty-bytebufferpool.xml#L4

 > >
 > > Any chance you could try customising this for your needs with stock 
Fuseki and see if this allows you to make the process size smaller and 
sufficiently predictable for your use case?

 > >
 > > Rob
 > >
 > > From: Dave Reynolds 
 > > Date: Tuesday, 11 July 2023 at 08:58
 > > To: users@jena.apache.org 
 > > Subject: Re: Mystery memory leak in fuseki
 > > For interest[*] ...
 > >
 > > This is what the core JVM metrics look like when transitioning from a
 > > Jetty10 to a Jetty9.4 instance. You can see the direct buffer 
cycling up

Re: Mystery memory leak in fuseki

2023-07-12 Thread Marco Neumann
Thanks Dave, I am not familiar with Prometheus JVM metrics but I gather
it's an open source solution that you have coupled with grafana for
visualization.  I will have a look into this.

Best,
Marco

On Tue, Jul 11, 2023 at 9:32 AM Dave Reynolds 
wrote:

> Hi Marco,
>
> On 11/07/2023 09:04, Marco Neumann wrote:
> > Dave, can you say a bit more about the profiling methodology? Are you
> using
> > a tool such as VisualVM to collect the data? Or do you just use the
> system
> > monitor?
>
> The JVM metrics here are from prometheus scanning the metrics exposed by
> fuseki via the built-in micrometer (displayed using grafana). They give a
> *lot* of details on things like GC behaviour etc which I'm not showing.
>
> Ironically the only thing this fuseki was doing when it died originally
> was supporting these metric scans, and the health check ping.
>
> The overall memory curve is picked up by telegraph scanning the OS level
> stats for the docker processes (collected via influx DB and again
> displayed in grafana). These are what you would get with e.g. top on the
> machine or a system monitor but means we have longer term records which
> we access remotely. When I quoted 240K RES, 35K shared that was actually
> just top on the machine.
>
> When running locally can also use things like jconsole or visualVM but
> I actually find the prometheus + telegraph metrics we have in our
> production monitoring more detailed and easier to work with. We run lots
> of services so the monitoring and alerting stack, while all industry
> standard, has been a life saver for us.
>
> For doing the debugging locally I also tried setting the JVM flags to
> enable finer grain native memory tracking and use jcmd (in a scripted
> loop) to pull out those more detailed metrics. Though they are not that
> much more detailed than the micrometer/prometheus metrics.
> That use of jcmd and the caution on how to interpret RES came from the
> blog item I mentioned earlier:
> https://poonamparhar.github.io/troubleshooting_native_memory_leaks/
>
> For the memory leak checking I used valgrind but there's lots of others.
>
> Dave
>
> >
> > Marco
> >
> > On Tue, Jul 11, 2023 at 8:57 AM Dave Reynolds  >
> > wrote:
> >
> >> For interest[*] ...
> >>
> >> This is what the core JVM metrics look like when transitioning from a
> >> Jetty10 to a Jetty9.4 instance. You can see the direct buffer cycling up
> >> to 500MB (which happens to be the max heap setting) on Jetty 10, nothing
> >> on Jetty 9. The drop in Mapped buffers is just because TDB hadn't been
> >> asked any queries yet.
> >>
> >>
> >>
> https://www.dropbox.com/scl/fi/9afhrztbb36fvzqkuw996/fuseki-jetty10-jetty9-transition.png?rlkey=7fpj4x1pn5mjnf3jjwenmp65m=0
> >>
> Here's the same metrics around the time of triggering a TDB backup. Shows
> >> the mapped buffer use for TDB but no significant impact on heap etc.
> >>
> >>
> >>
> https://www.dropbox.com/scl/fi/0s40vpizf94c4w3m2awna/fuseki-jetty10-backup.png?rlkey=ai31m6z58w0uex8zix8e9ctna=0
> >>
> >> These are all on the same instance as the RES memory trace:
> >>
> >>
> >>
> https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0
> >>
> >> Dave
> >>
> [*] I've been staring at metric graphs for so many days I may have a
> >> distorted notion of what's interesting :)
> >>
> >> On 11/07/2023 08:39, Dave Reynolds wrote:
> >>> After a 10 hour test of 4.9.0 with Jetty 9.4 on java 17 in the
> >>> production, containerized, environment then it is indeed very stable.
> >>>
> >> Running at less than 6% of memory on a 4GB machine compared to peaks of
> >>> ~50% for versions with Jetty 10. RES shows as 240K with 35K shared
> >>> (presume mostly libraries).
> >>>
> >>> Copy of trace is:
> >>>
> >>
> https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0
> >>>
> >> The high spikes on the left of the image are the prior run with the
> >> out-of-the-box 4.7.0 on the same JVM.
> >>>
> >>> The small spike at 06:00 is a dump so TDB was able to touch and scan
> all
> >>> the (modest) data with very minor blip in resident size (as you'd
> hope).
> >>> JVM stats show the mapped buffers for TDB jumping up but confirm heap
> is
> >>> stable at < 60M, non-heap 60M.
> >>>
> >>> Dave
> >>>
> >>> On 10/07/2023 20:52, Dave Reynolds wrote:
>  Since this thread has got complex, I'm posting this update here at the
>  top level.
> 
>  Thanks to folks, especially Andy and Rob for suggestions and for
>  investigating.
> 
>  After a lot more testing at our end I believe we now have some
>  workarounds.
> 
>  First, at least on java 17, the process growth does seem to level out.
>  Despite what I just said to Rob, having just checked our soak tests, a
>  jena 4.7.0/java 17 test with 500MB max heap has lasted for 7 days.
>  Process size oscillates between 1.5GB and 2GB but hasn't gone above
>  that in a week. The 

Re: Mystery memory leak in fuseki

2023-07-11 Thread Dave Reynolds

Hi Rob,

Good point. Will try to find time to experiment with that but given the 
testing cycle time that will take a while and can't start immediately.


I'm a little sceptical though. As mentioned before, all the metrics we 
see show the direct memory pool that Jetty uses cycling up to the max heap 
size and then being collected but with no long term growth to match the 
process size growth. This really feels more like a bug (though not sure 
where) than tuning. The fact that actual behaviour doesn't match the 
documentation isn't encouraging.


It's also pretty hard to figure what the right pool configuration would 
be. This thing is just being asked to deliver a few metrics (12KB per 
request) several times a minute but manages to eat 500MB of direct 
buffer space every 5mins. So what the right pool parameters are to 
support real usage peaks is not going to be easy to figure out.
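
To put rough numbers on that mismatch (back-of-the-envelope only; the scrape rate is an assumption based on "several times a minute"):

    public class BufferChurnEstimate {
        public static void main(String[] args) {
            // Assume ~3 metric scrapes per minute at ~12 KB each, versus ~500 MB of
            // direct buffer churn every 5 minutes (figures quoted above).
            double payloadKbPer5Min = 3 * 5 * 12;        // ~180 KB of actual payload
            double bufferChurnKbPer5Min = 500 * 1024;    // ~512,000 KB of direct buffers
            System.out.printf("Buffer churn is roughly %.0fx the payload served%n",
                    bufferChurnKbPer5Min / payloadKbPer5Min);   // ~2800x
        }
    }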


None the less you are right. That's something that should be explored.

Dave


On 11/07/2023 09:45, Rob @ DNR wrote:

Dave

Thanks for the further information.

Have you experimented with using Jetty 10 but providing more detailed 
configuration?  Fuseki supports providing detailed Jetty configuration if 
needed via the --jetty-config option

The following section looks relevant:

https://eclipse.dev/jetty/documentation/jetty-10/operations-guide/index.html#og-module-bytebufferpool

It looks like the default is that Jetty uses a heuristic to determine these 
values, sadly the heuristic in question is not detailed in that documentation.

Best guess from digging through their code is that the “heuristic” is this:

https://github.com/eclipse/jetty.project/blob/jetty-10.0.x/jetty-io/src/main/java/org/eclipse/jetty/io/AbstractByteBufferPool.java#L78-L84

i.e., ¼ of the configured max heap size.  This doesn’t necessarily align with 
the exact sizes of process growth you see but I note the documentation does 
explicitly say that buffers used can go beyond these limits but that those will 
just be GC’d rather than pooled for reuse.

Example byte buffer configuration at 
https://github.com/eclipse/jetty.project/blob/9a05c75ad28ebad4abbe624fa432664c59763747/jetty-server/src/main/config/etc/jetty-bytebufferpool.xml#L4

Any chance you could try customising this for your needs with stock Fuseki and 
see if this allows you to make the process size smaller and sufficiently 
predictable for your use case?

Rob

From: Dave Reynolds 
Date: Tuesday, 11 July 2023 at 08:58
To: users@jena.apache.org 
Subject: Re: Mystery memory leak in fuseki
For interest[*] ...

This is what the core JVM metrics look like when transitioning from a
Jetty10 to a Jetty9.4 instance. You can see the direct buffer cycling up
to 500MB (which happens to be the max heap setting) on Jetty 10, nothing
on Jetty 9. The drop in Mapped buffers is just because TDB hadn't been
asked any queries yet.

https://www.dropbox.com/scl/fi/9afhrztbb36fvzqkuw996/fuseki-jetty10-jetty9-transition.png?rlkey=7fpj4x1pn5mjnf3jjwenmp65m=0

Here's the same metrics around the time of triggering a TDB backup. Shows
the mapped buffer use for TDB but no significant impact on heap etc.

https://www.dropbox.com/scl/fi/0s40vpizf94c4w3m2awna/fuseki-jetty10-backup.png?rlkey=ai31m6z58w0uex8zix8e9ctna=0

These are all on the same instance as the RES memory trace:

https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0

Dave

[*] I've been staring at metric graphs for so many days I may have a
distorted notion of what's interesting :)

On 11/07/2023 08:39, Dave Reynolds wrote:

After a 10 hour test of 4.9.0 with Jetty 9.4 on java 17 in the
production, containerized, environment then it is indeed very stable.

Running at less than 6% of memory on a 4GB machine compared to peaks of
~50% for versions with Jetty 10. RES shows as 240K with 35K shared
(presume mostly libraries).

Copy of trace is:
https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0

The high spikes on the left of the image are the prior run with the
out-of-the-box 4.7.0 on the same JVM.

The small spike at 06:00 is a dump so TDB was able to touch and scan all
the (modest) data with very minor blip in resident size (as you'd hope).
JVM stats show the mapped buffers for TDB jumping up but confirm heap is
stable at < 60M, non-heap 60M.

Dave

On 10/07/2023 20:52, Dave Reynolds wrote:

Since this thread has got complex, I'm posting this update here at the
top level.

Thanks to folks, especially Andy and Rob for suggestions and for
investigating.

After a lot more testing at our end I believe we now have some
workarounds.

First, at least on java 17, the process growth does seem to level out.
Despite what I just said to Rob, having just checked our soak tests, a
jena 4.7.0/java 17 test with 500MB max heap has lasted for 7 days.
Process size oscillates between 1.5GB and 2GB but hasn't gone above
that in a week. The oscillation is almost entir

Re: Mystery memory leak in fuseki

2023-07-11 Thread Rob @ DNR
Dave

Thanks for the further information.

Have you experimented with using Jetty 10 but providing more detailed 
configuration?  Fuseki supports providing detailed Jetty configuration if 
needed via the --jetty-config option

The following section looks relevant:

https://eclipse.dev/jetty/documentation/jetty-10/operations-guide/index.html#og-module-bytebufferpool

It looks like the default is that Jetty uses a heuristic to determine these 
values, sadly the heuristic in question is not detailed in that documentation.

Best guess from digging through their code is that the “heuristic” is this:

https://github.com/eclipse/jetty.project/blob/jetty-10.0.x/jetty-io/src/main/java/org/eclipse/jetty/io/AbstractByteBufferPool.java#L78-L84

i.e., ¼ of the configured max heap size.  This doesn’t necessarily align with 
the exact sizes of process growth you see but I note the documentation does 
explicitly say that buffers used can go beyond these limits but that those will 
just be GC’d rather than pooled for reuse.
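
In other words, the default limit works out roughly like this (an illustrative sketch of that heuristic, not Jetty's actual code):

    public class PoolHeuristicSketch {
        public static void main(String[] args) {
            // With no explicit configuration, the pool caps itself at about a quarter
            // of the JVM's max heap; with the 500MB heap discussed here that is ~125MB.
            long maxHeapBytes = Runtime.getRuntime().maxMemory();   // i.e. -Xmx
            long defaultPoolLimit = maxHeapBytes / 4;
            System.out.println("Default buffer pool memory limit ~= " + defaultPoolLimit + " bytes");
        }
    }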

Example byte buffer configuration at 
https://github.com/eclipse/jetty.project/blob/9a05c75ad28ebad4abbe624fa432664c59763747/jetty-server/src/main/config/etc/jetty-bytebufferpool.xml#L4

Any chance you could try customising this for your needs with stock Fuseki and 
see if this allows you to make the process size smaller and sufficiently 
predictable for your use case?

Rob

From: Dave Reynolds 
Date: Tuesday, 11 July 2023 at 08:58
To: users@jena.apache.org 
Subject: Re: Mystery memory leak in fuseki
For interest[*] ...

This is what the core JVM metrics look like when transitioning from a
Jetty10 to a Jetty9.4 instance. You can see the direct buffer cycling up
to 500MB (which happens to be the max heap setting) on Jetty 10, nothing
on Jetty 9. The drop in Mapped buffers is just because TDB hadn't been
asked any queries yet.

https://www.dropbox.com/scl/fi/9afhrztbb36fvzqkuw996/fuseki-jetty10-jetty9-transition.png?rlkey=7fpj4x1pn5mjnf3jjwenmp65m=0

> Here's the same metrics around the time of triggering a TDB backup. Shows
the mapped buffer use for TDB but no significant impact on heap etc.

https://www.dropbox.com/scl/fi/0s40vpizf94c4w3m2awna/fuseki-jetty10-backup.png?rlkey=ai31m6z58w0uex8zix8e9ctna=0

These are all on the same instance as the RES memory trace:

https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0

Dave

> [*] I've been staring at metric graphs for so many days I may have a
distorted notion of what's interesting :)

On 11/07/2023 08:39, Dave Reynolds wrote:
> After a 10 hour test of 4.9.0 with Jetty 9.4 on java 17 in the
> production, containerized, environment then it is indeed very stable.
>
> Running at less than 6% of memory on a 4GB machine compared to peaks of
> ~50% for versions with Jetty 10. RES shows as 240K with 35K shared
> (presume mostly libraries).
>
> Copy of trace is:
> https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0
>
> The high spikes on the left of the image are the prior run with the
> out-of-the-box 4.7.0 on the same JVM.
>
> The small spike at 06:00 is a dump so TDB was able to touch and scan all
> the (modest) data with very minor blip in resident size (as you'd hope).
> JVM stats show the mapped buffers for TDB jumping up but confirm heap is
> stable at < 60M, non-heap 60M.
>
> Dave
>
> On 10/07/2023 20:52, Dave Reynolds wrote:
>> Since this thread has got complex, I'm posting this update here at the
>> top level.
>>
>> Thanks to folks, especially Andy and Rob for suggestions and for
>> investigating.
>>
>> After a lot more testing at our end I believe we now have some
>> workarounds.
>>
>> First, at least on java 17, the process growth does seem to level out.
>> Despite what I just said to Rob, having just checked our soak tests, a
>> jena 4.7.0/java 17 test with 500MB max heap has lasted for 7 days.
>> Process size oscillates between 1.5GB and 2GB but hasn't gone above
>> that in a week. The oscillation is almost entirely the cycling of the
>> direct memory buffers used by Jetty. Empirically those cycle up to
>> something comparable to the set max heap size, at least for us.
>>
>> While this week long test was 4.7.0, based on earlier tests I suspect
>> 4.8.0 (and now 4.9.0) would also level out at least on a timescale of
>> days.
>>
>> The key has been setting the max heap low. At 2GB and even 1GB (the
>> default on a 4GB machine) we see higher peak levels of direct buffers
>> and overall process size grew to around 3GB at which point the
>> container is killed on the small machines. Though java 17 does seem to
> >> be better behaved than java 11, so switching to that probably also
>> helped.
>>
>> Given the actual he

Re: Mystery memory leak in fuseki

2023-07-11 Thread Dave Reynolds

Hi Marco,

On 11/07/2023 09:04, Marco Neumann wrote:

Dave, can you say a bit more about the profiling methodology? Are you using
a tool such as VisualVM to collect the data? Or do you just use the system
monitor?


The JVM metrics here are from prometheus scanning the metrics exposed by 
fuseki via the built-in micrometer (displayed using grafana). They give a 
*lot* of details on things like GC behaviour etc which I'm not showing.
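
For anyone without a Prometheus stack, the same micrometer output can be pulled directly over HTTP; a minimal sketch, assuming the standard /$/metrics endpoint on localhost:3030 (metric names may differ between versions):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class FetchMetrics {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest req = HttpRequest.newBuilder(
                    URI.create("http://localhost:3030/$/metrics")).GET().build();
            String body = client.send(req, HttpResponse.BodyHandlers.ofString()).body();
            // Print only the buffer-related lines (direct / mapped pools) from the
            // Prometheus-format output.
            body.lines()
                .filter(line -> line.contains("jvm_buffer"))
                .forEach(System.out::println);
        }
    }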


Ironically the only thing this fuseki was doing when it died originally 
was supporting these metric scans, and the health check ping.


The overall memory curve is picked up by telegraph scanning the OS level 
stats for the docker processes (collected via influx DB and again 
displayed in grafana). These are what you would get with e.g. top on the 
machine or a system monitor but means we have longer term records which 
we access remotely. When I quoted 240K RES, 35K shared that was actually 
just top on the machine.


When running locally can also use things like jconsole or visualVM but 
I actually find the prometheus + telegraph metrics we have in our 
production monitoring more detailed and easier to work with. We run lots 
of services so the monitoring and alerting stack, while all industry 
standard, has been a life saver for us.


For doing the debugging locally I also tried setting the JVM flags to 
enable finer grain native memory tracking and use jcmd (in a scripted 
loop) to pull out those more detailed metrics. Though they are not that 
much more detailed than the micrometer/prometheus metrics.
That use of jcmd and the caution on how to interpret RES came from the 
blog item I mentioned earlier:

https://poonamparhar.github.io/troubleshooting_native_memory_leaks/
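
A minimal sketch of that scripted loop (assumes the Fuseki JVM was started with -XX:NativeMemoryTracking=summary and that jcmd is on the PATH; the pid and interval are placeholders):

    import java.util.concurrent.TimeUnit;

    public class NativeMemoryPoller {
        public static void main(String[] args) throws Exception {
            String pid = args[0];   // pid of the Fuseki process
            while (true) {
                // Dump the native memory tracking summary; parse or redirect as needed.
                new ProcessBuilder("jcmd", pid, "VM.native_memory", "summary")
                        .inheritIO()
                        .start()
                        .waitFor();
                TimeUnit.MINUTES.sleep(5);
            }
        }
    }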

For the memory leak checking I used valgrind but there's lots of others.

Dave



Marco

On Tue, Jul 11, 2023 at 8:57 AM Dave Reynolds 
wrote:


For interest[*] ...

This is what the core JVM metrics look like when transitioning from a
Jetty10 to a Jetty9.4 instance. You can see the direct buffer cycling up
to 500MB (which happens to be the max heap setting) on Jetty 10, nothing
on Jetty 9. The drop in Mapped buffers is just because TDB hadn't been
asked any queries yet.


https://www.dropbox.com/scl/fi/9afhrztbb36fvzqkuw996/fuseki-jetty10-jetty9-transition.png?rlkey=7fpj4x1pn5mjnf3jjwenmp65m=0

Here's the same metrics around the time of triggering a TDB backup. Shows
the mapped buffer use for TDB but no significant impact on heap etc.


https://www.dropbox.com/scl/fi/0s40vpizf94c4w3m2awna/fuseki-jetty10-backup.png?rlkey=ai31m6z58w0uex8zix8e9ctna=0

These are all on the same instance as the RES memory trace:


https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0

Dave

[*] I've been staring at metric graphs for so many days I may have a
distorted notion of what's interesting :)

On 11/07/2023 08:39, Dave Reynolds wrote:

After a 10 hour test of 4.9.0 with Jetty 9.4 on java 17 in the
production, containerized, environment then it is indeed very stable.

Running at less than 6% of memory on a 4GB machine compared to peaks of
~50% for versions with Jetty 10. RES shows as 240K with 35K shared
(presume mostly libraries).

Copy of trace is:


https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0


The high spikes on the left of the image are the prior run with the
out-of-the-box 4.7.0 on the same JVM.

The small spike at 06:00 is a dump so TDB was able to touch and scan all
the (modest) data with very minor blip in resident size (as you'd hope).
JVM stats show the mapped buffers for TDB jumping up but confirm heap is
stable at < 60M, non-heap 60M.

Dave

On 10/07/2023 20:52, Dave Reynolds wrote:

Since this thread has got complex, I'm posting this update here at the
top level.

Thanks to folks, especially Andy and Rob for suggestions and for
investigating.

After a lot more testing at our end I believe we now have some
workarounds.

First, at least on java 17, the process growth does seem to level out.
Despite what I just said to Rob, having just checked our soak tests, a
jena 4.7.0/java 17 test with 500MB max heap has lasted for 7 days.
Process size oscillates between 1.5GB and 2GB but hasn't gone above
that in a week. The oscillation is almost entirely the cycling of the
direct memory buffers used by Jetty. Empirically those cycle up to
something comparable to the set max heap size, at least for us.

While this week long test was 4.7.0, based on earlier tests I suspect
4.8.0 (and now 4.9.0) would also level out at least on a timescale of
days.

The key has been setting the max heap low. At 2GB and even 1GB (the
default on a 4GB machine) we see higher peak levels of direct buffers
and overall process size grew to around 3GB at which point the
container is killed on the small machines. Though java 17 does seem to
be better behaved than java 11, so switching to that probably also
helped.

Given the actual 

Re: Mystery memory leak in fuseki

2023-07-11 Thread Marco Neumann
Dave, can you say a bit more about the profiling methodology? Are you using
a tool such as VisualVM to collect the data? Or do you just use the system
monitor?

Marco

On Tue, Jul 11, 2023 at 8:57 AM Dave Reynolds 
wrote:

> For interest[*] ...
>
> This is what the core JVM metrics look like when transitioning from a
> Jetty10 to a Jetty9.4 instance. You can see the direct buffer cycling up
> to 500MB (which happens to be the max heap setting) on Jetty 10, nothing
> on Jetty 9. The drop in Mapped buffers is just because TDB hadn't been
> asked any queries yet.
>
>
> https://www.dropbox.com/scl/fi/9afhrztbb36fvzqkuw996/fuseki-jetty10-jetty9-transition.png?rlkey=7fpj4x1pn5mjnf3jjwenmp65m=0
>
> Here's the same metrics around the time of triggering a TDB backup. Shows
> the mapped buffer use for TDB but no significant impact on heap etc.
>
>
> https://www.dropbox.com/scl/fi/0s40vpizf94c4w3m2awna/fuseki-jetty10-backup.png?rlkey=ai31m6z58w0uex8zix8e9ctna=0
>
> These are all on the same instance as the RES memory trace:
>
>
> https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0
>
> Dave
>
> [*] I've been staring at metric graphs for so many days I may have a
> distorted notion of what's interesting :)
>
> On 11/07/2023 08:39, Dave Reynolds wrote:
> > After a 10 hour test of 4.9.0 with Jetty 9.4 on java 17 in the
> > production, containerized, environment then it is indeed very stable.
> >
> > Running at less than 6% of memory on a 4GB machine compared to peaks of
> > ~50% for versions with Jetty 10. RES shows as 240K with 35K shared
> > (presume mostly libraries).
> >
> > Copy of trace is:
> >
> https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0
> >
> > The high spikes on the left of the image are the prior run with the
> > out-of-the-box 4.7.0 on the same JVM.
> >
> > The small spike at 06:00 is a dump so TDB was able to touch and scan all
> > the (modest) data with very minor blip in resident size (as you'd hope).
> > JVM stats show the mapped buffers for TDB jumping up but confirm heap is
> > stable at < 60M, non-heap 60M.
> >
> > Dave
> >
> > On 10/07/2023 20:52, Dave Reynolds wrote:
> >> Since this thread has got complex, I'm posting this update here at the
> >> top level.
> >>
> >> Thanks to folks, especially Andy and Rob for suggestions and for
> >> investigating.
> >>
> >> After a lot more testing at our end I believe we now have some
> >> workarounds.
> >>
> >> First, at least on java 17, the process growth does seem to level out.
> >> Despite what I just said to Rob, having just checked our soak tests, a
> >> jena 4.7.0/java 17 test with 500MB max heap has lasted for 7 days.
> >> Process size oscillates between 1.5GB and 2GB but hasn't gone above
> >> that in a week. The oscillation is almost entirely the cycling of the
> >> direct memory buffers used by Jetty. Empirically those cycle up to
> >> something comparable to the set max heap size, at least for us.
> >>
> >> While this week long test was 4.7.0, based on earlier tests I suspect
> >> 4.8.0 (and now 4.9.0) would also level out at least on a timescale of
> >> days.
> >>
> >> The key has been setting the max heap low. At 2GB and even 1GB (the
> >> default on a 4GB machine) we see higher peak levels of direct buffers
> >> and overall process size grew to around 3GB at which point the
> >> container is killed on the small machines. Though java 17 does seem to
> >> be better behaved than java 11, so switching to that probably also
> >> helped.
> >>
> >> Given the actual heap is low (50MB heap, 60MB non-heap) then needing
> >> 2GB to run in feels high but is workable. So my previously suggested
> >> rule of thumb that, in this low memory regime, allow 4x the max heap
> >> size seems to work.
> >>
> >> Second, we're now pretty confident the issue is jetty 10+.
> >>
> >> We've built a fuseki-server 4.9.0 with Jetty replaced by version
> >> 9.4.51.v20230217. This required some minor source changes to compile
> >> and pass tests. On a local bare metal test where we saw process growth
> >> up to 1.5-2GB this build has run stably using less than 500MB for 4
> >> hours.
> >>
> >> We'll set a longer term test running in the target containerized
> >> environment to confirm things but quite hopeful this will be long term
> >> stable.
> >>
> >> I realise Jetty 9.4.x is out of community support but eclipse say EOL
> >> is "unlikely to happen before 2025". So, while this may not be a
> >> solution for the Jena project, it could give us a workaround at the
> >> cost of doing custom builds.
> >>
> >> Dave
> >>
> >>
> >> On 03/07/2023 14:20, Dave Reynolds wrote:
> >>> We have a very strange problem with recent fuseki versions when
> >>> running (in docker containers) on small machines. Suspect a jetty
> >>> issue but it's not clear.
> >>>
> >>> Wondering if anyone has seen anything like this.
> >>>
> >>> This is a production service but with tiny data (~250k 

Re: Mystery memory leak in fuseki

2023-07-11 Thread Dave Reynolds

For interest[*] ...

This is what the core JVM metrics look like when transitioning from a 
Jetty10 to a Jetty9.4 instance. You can see the direct buffer cycling up 
to 500MB (which happens to be the max heap setting) on Jetty 10, nothing 
on Jetty 9. The drop in Mapped buffers is just because TDB hadn't been 
asked any queries yet.


https://www.dropbox.com/scl/fi/9afhrztbb36fvzqkuw996/fuseki-jetty10-jetty9-transition.png?rlkey=7fpj4x1pn5mjnf3jjwenmp65m=0
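
The direct and mapped buffer figures plotted here are the JDK's BufferPoolMXBean values; a minimal sketch of reading them in-process (it reports the pools of whatever JVM it runs in, so to watch Fuseki it would need to run inside that JVM or be read over JMX):

    import java.lang.management.BufferPoolMXBean;
    import java.lang.management.ManagementFactory;

    public class BufferPoolSnapshot {
        public static void main(String[] args) {
            // The JDK exposes "direct" and "mapped" buffer pools.
            for (BufferPoolMXBean pool :
                    ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                System.out.printf("%s: count=%d used=%d bytes capacity=%d bytes%n",
                        pool.getName(), pool.getCount(),
                        pool.getMemoryUsed(), pool.getTotalCapacity());
            }
        }
    }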

Here's the same metrics around the time of triggering a TDB backup. Shows 
the mapped buffer use for TDB but no significant impact on heap etc.


https://www.dropbox.com/scl/fi/0s40vpizf94c4w3m2awna/fuseki-jetty10-backup.png?rlkey=ai31m6z58w0uex8zix8e9ctna=0

These are all on the same instance as the RES memory trace:

https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0

Dave

[*] I've been staring at metric graphs for so many days I may have a 
distorted notion of what's interesting :)


On 11/07/2023 08:39, Dave Reynolds wrote:
After a 10 hour test of 4.9.0 with Jetty 9.4 on java 17 in the 
production, containerized, environment then it is indeed very stable.


Running at less than 6% of memory on a 4GB machine compared to peaks of 
~50% for versions with Jetty 10. RES shows as 240K with 35K shared 
(presume mostly libraries).


Copy of trace is: 
https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0


The high spikes on the left of the image are the prior run with the 
out-of-the-box 4.7.0 on the same JVM.


The small spike at 06:00 is a dump so TDB was able to touch and scan all 
the (modest) data with very minor blip in resident size (as you'd hope). 
JVM stats show the mapped buffers for TDB jumping up but confirm heap is 
stable at < 60M, non-heap 60M.


Dave

On 10/07/2023 20:52, Dave Reynolds wrote:
Since this thread has got complex, I'm posting this update here at the 
top level.


Thanks to folks, especially Andy and Rob for suggestions and for 
investigating.


After a lot more testing at our end I believe we now have some 
workarounds.


First, at least on java 17, the process growth does seem to level out. 
Despite what I just said to Rob, having just checked our soak tests, a 
jena 4.7.0/java 17 test with 500MB max heap has lasted for 7 days. 
Process size oscillates between 1.5GB and 2GB but hasn't gone above 
that in a week. The oscillation is almost entirely the cycling of the 
direct memory buffers used by Jetty. Empirically those cycle up to 
something comparable to the set max heap size, at least for us.


While this week long test was 4.7.0, based on earlier tests I suspect 
4.8.0 (and now 4.9.0) would also level out at least on a timescale of 
days.


The key has been setting the max heap low. At 2GB and even 1GB (the 
default on a 4GB machine) we see higher peak levels of direct buffers 
and overall process size grew to around 3GB at which point the 
container is killed on the small machines. Though java 17 does seem to 
be better behaved than java 11, so switching to that probably also 
helped.


Given the actual heap is low (50MB heap, 60MB non-heap) then needing 
2GB to run in feels high but is workable. So my previously suggested 
rule of thumb that, in this low memory regime, allow 4x the max heap 
size seems to work.


Second, we're now pretty confident the issue is jetty 10+.

We've built a fuseki-server 4.9.0 with Jetty replaced by version 
9.4.51.v20230217. This required some minor source changes to compile 
and pass tests. On a local bare metal test where we saw process growth 
up to 1.5-2GB this build has run stably using less than 500MB for 4 
hours.


We'll set a longer term test running in the target containerized 
environment to confirm things but quite hopeful this will be long term 
stable.


I realise Jetty 9.4.x is out of community support but eclipse say EOL 
is "unlikely to happen before 2025". So, while this may not be a 
solution for the Jena project, it could give us a workaround at the 
cost of doing custom builds.


Dave


On 03/07/2023 14:20, Dave Reynolds wrote:
We have a very strange problem with recent fuseki versions when 
running (in docker containers) on small machines. Suspect a jetty 
issue but it's not clear.


Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB 
as NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].


We used to run using 3.16 on jdk 8 (AWS Corretto for the long term 
support) with no problems.


Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of 
a day or so to reach ~3GB of memory at which point the 4GB machine 
becomes unviable and things get OOM killed.


The strange thing is that this growth happens when the system is 
answering no Sparql queries at all, just regular health ping checks 
and (prometheus) metrics scrapes from the monitoring systems.


Furthermore 

Re: Mystery memory leak in fuseki

2023-07-11 Thread Dave Reynolds
After a 10 hour test of 4.9.0 with Jetty 9.4 on java 17 in the 
production, containerized, environment then it is indeed very stable.


Running at less than 6% of memory on a 4GB machine compared to peaks of 
~50% for versions with Jetty 10. RES shows as 240K with 35K shared 
(presume mostly libraries).


Copy of trace is: 
https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0


The high spikes on the left of the image are the prior run with the 
out-of-the-box 4.7.0 on the same JVM.


The small spike at 06:00 is a dump so TDB was able to touch and scan all 
the (modest) data with very minor blip in resident size (as you'd hope). 
JVM stats show the mapped buffers for TDB jumping up but confirm heap is 
stable at < 60M, non-heap 60M.


Dave

On 10/07/2023 20:52, Dave Reynolds wrote:
Since this thread has got complex, I'm posting this update here at the 
top level.


Thanks to folks, especially Andy and Rob for suggestions and for 
investigating.


After a lot more testing at our end I believe we now have some workarounds.

First, at least on java 17, the process growth does seem to level out. 
Despite what I just said to Rob, having just checked our soak tests, a 
jena 4.7.0/java 17 test with 500MB max heap has lasted for 7 days. 
Process size oscillates between 1.5GB and 2GB but hasn't gone above that 
in a week. The oscillation is almost entirely the cycling of the direct 
memory buffers used by Jetty. Empirically those cycle up to something 
comparable to the set max heap size, at least for us.


While this week long test was 4.7.0, based on earlier tests I suspect 
4.8.0 (and now 4.9.0) would also level out at least on a timescale of days.


The key has been setting the max heap low. At 2GB and even 1GB (the 
default on a 4GB machine) we see higher peak levels of direct buffers 
and overall process size grew to around 3GB at which point the container 
is killed on the small machines. Though java 17 does seem to be better 
behaved than java 11, so switching to that probably also helped.


Given the actual heap is low (50MB heap, 60MB non-heap) then needing 2GB 
to run in feels high but is workable. So my previously suggested rule of 
thumb that, in this low memory regime, allow 4x the max heap size seems 
to work.


Second, we're now pretty confident the issue is jetty 10+.

We've built a fuseki-server 4.9.0 with Jetty replaced by version 
9.4.51.v20230217. This required some minor source changes to compile and 
pass tests. On a local bare metal test where we saw process growth up to 
1.5-2GB this build has run stably using less than 500MB for 4 hours.


We'll set a longer term test running in the target containerized 
environment to confirm things but quite hopeful this will be long term 
stable.


I realise Jetty 9.4.x is out of community support but eclipse say EOL is 
"unlikely to happen before 2025". So, while this may not be a solution 
for the Jena project, it could give us a workaround at the cost of doing 
custom builds.


Dave


On 03/07/2023 14:20, Dave Reynolds wrote:
We have a very strange problem with recent fuseki versions when 
running (in docker containers) on small machines. Suspect a jetty 
issue but it's not clear.


Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB 
as NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].


We used to run using 3.16 on jdk 8 (AWS Corretto for the long term 
support) with no problems.


Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of 
a day or so to reach ~3GB of memory at which point the 4GB machine 
becomes unviable and things get OOM killed.


The strange thing is that this growth happens when the system is 
answering no Sparql queries at all, just regular health ping checks 
and (prometheus) metrics scrapes from the monitoring systems.


Furthermore the space being consumed is not visible to any of the JVM 
metrics:
- Heap and non-heap are stable at around 100MB total (mostly 
non-heap metaspace).

- Mapped buffers stay at 50MB and remain long term stable.
- Direct memory buffers being allocated up to around 500MB then being 
reclaimed. Since there are no sparql queries at all we assume this is 
jetty NIO buffers being churned as a result of the metric scrapes. 
However, this direct buffer behaviour seems stable, it cycles between 
0 and 500MB on approx a 10min cycle but is stable over a period of 
days and shows no leaks.


Yet the java process grows from an initial 100MB to at least 3GB. This 
can occur in the space of a couple of hours or can take up to a day or 
two with no predictability in how fast.


Presumably there is some low level JNI space allocated by Jetty (?) 
which is invisible to all the JVM metrics and is not being reliably 
reclaimed.


Trying 4.6.0, which we've had less problems with elsewhere, that seems 
to grow to around 1GB (plus up to 0.5GB for the cycling direct 

Re: Mystery memory leak in fuseki

2023-07-10 Thread Dave Reynolds
Since this thread has got complex, I'm posting this update here at the 
top level.


Thanks to folks, especially Andy and Rob for suggestions and for 
investigating.


After a lot more testing at our end I believe we now have some workarounds.

First, at least on java 17, the process growth does seem to level out. 
Despite what I just said to Rob, having just checked our soak tests, a 
jena 4.7.0/java 17 test with 500MB max heap has lasted for 7 days. 
Process size oscillates between 1.5GB and 2GB but hasn't gone above that 
in a week. The oscillation is almost entirely the cycling of the direct 
memory buffers used by Jetty. Empirically those cycle up to something 
comparable to the set max heap size, at least for us.


While this week long test was 4.7.0, based on earlier tests I suspect 
4.8.0 (and now 4.9.0) would also level out at least on a timescale of days.


The key has been setting the max heap low. At 2GB and even 1GB (the 
default on a 4GB machine) we see higher peak levels of direct buffers 
and overall process size grew to around 3GB at which point the container 
is killed on the small machines. Though java 17 does seem to be better 
behaved than java 11, so switching to that probably also helped.


Given the actual heap is low (50MB heap, 60MB non-heap) then needing 2GB 
to run in feels high but is workable. So my previously suggested rule of 
thumb that, in this low memory regime, allow 4x the max heap size seems 
to work.
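
For anyone wanting to try the same regime, a rough sketch of the kind of
invocation we mean (the dataset location and name are illustrative, and the
-XX:MaxDirectMemorySize flag is an optional extra that puts a hard cap on
direct ByteBuffer allocation overall):

  JVM_ARGS="-Xmx500m -XX:MaxDirectMemorySize=500m" ./fuseki-server --loc=/fuseki/databases/ds /ds

Then size the machine or container at roughly 4x that heap figure, so about
2GB for a 500MB heap.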


Second, we're now pretty confident the issue is jetty 10+.

We've built a fuseki-server 4.9.0 with Jetty replaced by version 
9.4.51.v20230217. This required some minor source changes to compile and 
pass tests. On a local bare metal test where we saw process growth up to 
1.5-2GB this build has run stably using less than 500MB for 4 hours.


We'll set a longer term test running in the target containerized 
environment to confirm things but quite hopeful this will be long term 
stable.


I realise Jetty 9.4.x is out of community support but eclipse say EOL is 
"unlikely to happen before 2025". So, while this may not be a solution 
for the Jena project, it could give us a workaround at the cost of doing 
custom builds.
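
As an aside, to check which Jetty a given fuseki-server jar actually bundles,
something like this should work (assuming the shaded jar retains the Jetty
pom.properties entries):

  unzip -p fuseki-server.jar META-INF/maven/org.eclipse.jetty/jetty-server/pom.properties | grep '^version'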


Dave


On 03/07/2023 14:20, Dave Reynolds wrote:
We have a very strange problem with recent fuseki versions when running 
(in docker containers) on small machines. Suspect a jetty issue but it's 
not clear.


Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB as 
NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].


We used to run using 3.16 on jdk 8 (AWS Corretto for the long term 
support) with no problems.


Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of a 
day or so to reach ~3GB of memory at which point the 4GB machine becomes 
unviable and things get OOM killed.


The strange thing is that this growth happens when the system is 
answering no Sparql queries at all, just regular health ping checks and 
(prometheus) metrics scrapes from the monitoring systems.


Furthermore the space being consumed is not visible to any of the JVM 
metrics:
- Heap and non-heap are stable at around 100MB total (mostly 
non-heap metaspace).

- Mapped buffers stay at 50MB and remain long term stable.
- Direct memory buffers being allocated up to around 500MB then being 
reclaimed. Since there are no sparql queries at all we assume this is 
jetty NIO buffers being churned as a result of the metric scrapes. 
However, this direct buffer behaviour seems stable, it cycles between 0 
and 500MB on approx a 10min cycle but is stable over a period of days 
and shows no leaks.


Yet the java process grows from an initial 100MB to at least 3GB. This 
can occur in the space of a couple of hours or can take up to a day or 
two with no predictability in how fast.


Presumably there is some low level JNI space allocated by Jetty (?) 
which is invisible to all the JVM metrics and is not being reliably 
reclaimed.


Trying 4.6.0, which we've had less problems with elsewhere, that seems 
to grow to around 1GB (plus up to 0.5GB for the cycling direct memory 
buffers) and then stays stable (at least on a three day soak test).  We 
could live with allocating 1.5GB to a system that should only need a few 
100MB but concerned that it may not be stable in the really long term 
and, in any case, would rather be able to update to more recent fuseki 
versions.


Trying 4.8.0 on java 17 it grows rapidly to around 1GB again but then 
keeps ticking up slowly at random intervals. We project that it would 
take a few weeks to grow to the scale it did under java 11 but it will 
still eventually kill the machine.


Anyone seen anything remotely like this?

Dave

[1]  500M heap may be overkill but there can be some complex queries and 
that should still leave plenty of space for OS buffers etc in the 
remaining memory on a 4GB machine.






Re: Mystery memory leak in fuseki

2023-07-10 Thread Dave Reynolds

Hi Rob,

On 10/07/2023 14:05, Rob @ DNR wrote:

Dave

Poked around a bit today but not sure I’ve reproduced anything as such or found 
any smoking guns

I ran a Fuseki instance with the same watch command you showed in your last 
message.  JVM Heap stays essentially static even after hours, there’s some 
minor fluctuation up and down in used heap space but the heap itself doesn’t 
grow at all.  Did this with a couple of different versions of 4.x to see if 
there’s any discernible difference but nothing meaningful showed up.  I also 
used 3.17.0 but again couldn’t reproduce the behaviour you are describing.


I too reported the heap (and non-heap) remain stable so not sure in what 
way the behaviour you are seeing is different. The issue is process size.



For reference I’m on OS X 13.4.1 using OpenJDK 17


For reference I have similar behaviour on the follow combinations:

Containerized:
   Amazon Linux 2, Amazon Corretto 11
   Amazon Linux 2, Amazon Corretto 17
   Amazon Linux 2, Eclipse Temurin 17
   Ubuntu 22.04, Eclipse Temurin 17

Bare metal:
   Ubuntu 22.04, OpenJdk 11


The process peak memory (for all versions I tested) seems to peak at about 1.5G 
as reported by the vmmap tool.  Ongoing monitoring, i.e., OS X Activity Monitor 
shows the memory usage of the process fluctuating over time, but I don’t ever 
see the unlimited growth that your original report suggested.  Also, I didn’t 
set heap explicitly at all so I’m getting the default max heap of 4GB, and my 
actual heap usage was around 100 MB.


If I set the heap max to 500M the process size growth seems to largely 
level off around 1.5-2GB over the space of hours. So comparable to 
yours. However, it's not clear that it is absolutely stable by then. Our 
original failures only occurred after several days and the graphs for 
24-hour tests are noisy enough to not be confident it's reached any 
absolute stability by 2GB.


If I set the heap max to 4GB then the process grows larger and we've 
certainly seen instances where it reached 3GB, even though the heap 
size itself is stable (small fluctuations but no trends) and remains 
under 100MB. Not left it going longer than that because that's already 
no good for us.


Note almost all of these tests have been with data in TDB, even though 
not running any queries. If I run a fuseki with just --mem and no data 
loaded at all the growth is slower, and that may be closer to your 
test setup, but the growth is still there.



I see from vmmap that most of the memory appears to be virtual memory related 
to the many shared native libraries that the JVM links against which on a real 
OS is often swapped out as it’s not under active usage.


I don't have vmmap available (that's a BSD tool I think) but clearly 
virtual memory is a different matter. I'm only concerned with the 
resident set size.


To check resident size the original graphs showing the issue were based 
on memory % metrics from docker runtime (via prometheus metrics scrape) 
when testing as a container. Testing bare metal then I've used both RSS 
and the so called PSS mentioned in:

https://poonamparhar.github.io/troubleshooting_native_memory_leaks/

PSS didn't show any noticeably different curve than RSS so while RSS can 
be misleading it seems accurate here.
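
For anyone wanting to sample the same numbers, on a reasonably recent Linux
kernel (4.14+ for smaps_rollup) something like this gives both in one go; the
pgrep pattern is illustrative and needs to match however you launch Fuseki:

  pid=$(pgrep -f fuseki-server)
  grep -E '^(Rss|Pss):' /proc/$pid/smaps_rollup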



In a container, where swap is likely disabled, that’s obviously more 
problematic as everything occupies memory even if much of it might be for 
native libraries that are never needed by anything Fuseki does.  Again, I don’t 
see how that would lead to the apparently unbounded memory usage you’re 
describing.


Not sure swap is relevant, even on the bare metal there's no swapping 
going on.


Certainly agree that virtual memory space will fill up with mappings for 
native libraries but it's not VSS I'm worried about.



You could try using jlink to build a minimal image where you only have the 
parts of the JDK that you need in the image.  I found the following old Jena 
thread - https://lists.apache.org/thread/dmmkndmy2ds8pf95zvqbcxpv84bj7cz6 - 
which actually describes an apparently similar memory issue but also has an 
example of a Dockerfile linked at the start of the thread that builds just such 
a minimal JRE for Fuseki.


Interesting thought but pretty sure at this point the issue is Jetty, and 
that gives us the best workaround so far. I'll post separately about that.



Note that I also ran the leaks tool against the long running Fuseki processes 
and that didn’t find anything of note, 5.19KB of memory leaks over a 3.5 hr run 
so no smoking gun there.


Agreed, we've also run leak testers but didn't find any issue and didn't 
expect to. As we've said several times, throughout all this - heap, 
non-heap, thread count, thread stack and direct memory buffers (at least 
as visible to the JVM) are all stable.


Cheers,
Dave


Regards,

Rob

From: Dave Reynolds 
Date: Friday, 7 July 2023 at 11:11
To: users@jena.apache.org 
Subject: Re: Mystery memory leak in fuseki
Hi Andy

Re: Mystery memory leak in fuseki

2023-07-10 Thread Rob @ DNR
Dave

Poked around a bit today but not sure I’ve reproduced anything as such or found 
any smoking guns

I ran a Fuseki instance with the same watch command you showed in your last 
message.  JVM Heap stays essentially static even after hours, there’s some 
minor fluctuation up and down in used heap space but the heap itself doesn’t 
grow at all.  Did this with a couple of different versions of 4.x to see if 
there’s any discernible difference but nothing meaningful showed up.  I also 
used 3.17.0 but again couldn’t reproduce the behaviour you are describing.

For reference I’m on OS X 13.4.1 using OpenJDK 17

The process peak memory (for all versions I tested) seems to peak at about 1.5G 
as reported by the vmmap tool.  Ongoing monitoring, i.e., OS X Activity Monitor 
shows the memory usage of the process fluctuating over time, but I don’t ever 
see the unlimited growth that your original report suggested.  Also, I didn’t 
set heap explicitly at all so I’m getting the default max heap of 4GB, and my 
actual heap usage was around 100 MB.

I see from vmmap that most of the memory appears to be virtual memory related 
to the many shared native libraries that the JVM links against which on a real 
OS is often swapped out as it’s not under active usage.

In a container, where swap is likely disabled, that’s obviously more 
problematic as everything occupies memory even if much of it might be for 
native libraries that are never needed by anything Fuseki does.  Again, I don’t 
see how that would lead to the apparently unbounded memory usage you’re 
describing.

You could try using jlink to build a minimal image where you only have the 
parts of the JDK that you need in the image.  I found the following old Jena 
thread - https://lists.apache.org/thread/dmmkndmy2ds8pf95zvqbcxpv84bj7cz6 - 
which actually describes an apparently similar memory issue but also has an 
example of a Dockerfile linked at the start of the thread that builds just such 
a minimal JRE for Fuseki.
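
As a very rough illustration only (the module list below is a guess; deriving
the real list with jdeps --print-module-deps against the Fuseki jar, or
starting from the Dockerfile in that thread, would be the safer route), a
jlink invocation looks something like:

  jlink --add-modules java.base,java.xml,java.naming,java.management,jdk.unsupported \
        --strip-debug --no-header-files --no-man-pages \
        --output /opt/fuseki-jre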

Note that I also ran the leaks tool against the long running Fuseki processes 
and that didn’t find anything of note, 5.19KB of memory leaks over a 3.5 hr run 
so no smoking gun there.

Regards,

Rob

From: Dave Reynolds 
Date: Friday, 7 July 2023 at 11:11
To: users@jena.apache.org 
Subject: Re: Mystery memory leak in fuseki
Hi Andy,

Thanks for looking.

Good thought on some issue with stacked requests causing thread leak but
don't think that matches our data.

 From the metrics the number of threads and total thread memory used is
not that great and is stable long term while the process size grows, at
least in our situation.

This is based on both the JVM metrics from the prometheus scrape and by
switching on native memory checking and using jcmd to do various low
level dumps.

In a test set up we can replicate the long term (~3 hours) process
growth (while the heap, non-heap and threads stay stable) by just doing
something like:

watch -n 1 'curl -s http://localhost:3030/$/metrics'

With no other requests at all. So I think that makes it less likely the
root cause is triggered by stacked concurrent requests. Certainly the
curl process has exited completely each time. Though I guess there could
be some connection cleanup still going on in the linux kernel.

 > Is the OOM kill the container runtime or Java exception?

We're not limiting the container memory but the OOM error is from docker
runtime itself:
 fatal error: out of memory allocating heap arena map

We have replicated the memory growth outside a container but not left
that to soak on a small machine to provoke an OOM, so not sure if the
OOM killer would hit first or get a java OOM exception first.

One curiosity we've found on the recent tests is that, when the process
has grown to a dangerous level for the server, we do randomly sometimes
see the JVM (Temurin 17.0.7) spit out a thread dump and heap summary as
if there were a low level exception. However, there's no exception
message at all - just a timestamp, the thread dump and nothing else. The
JVM seems to just carry on and the process doesn't exit. We're not
setting any debug flags and not requesting any thread dump, and there's
no obvious triggering event. This is before the server gets completely
out of memory, causing the docker runtime to barf.

Dave


On 07/07/2023 09:56, Andy Seaborne wrote:
> I tried running without any datasets. I get the same heap effect of
> growing slowly then a dropping back.
>
> Fuseki Main (fuseki-server did the same but the figures are from main -
> there is less going on)
> Version 4.8.0
>
> fuseki -v --ping --empty    # No datasets
>
> 4G heap.
> 71M allocated
> 4 threads (+ Daemon system threads)
> 2 are not parked (i.e. they are blocked)
> The heap grows slowly to 48M then a GC runs then drops to 27M
> This repeats.
>
> Run one ping.
> Heap now 142M, 94M/21M GC cycle
> and 2 more threads at least for a while. They seem to go away a

Re: Mystery memory leak in fuseki

2023-07-07 Thread Dave Reynolds

Hi Andy,

Thanks for looking.

Good thought on some issue with stacked requests causing thread leak but 
don't think that matches our data.


From the metrics the number of threads and total thread memory used is 
not that great and is stable long term while the process size grows, at 
least in our situation.


This is based on both the JVM metrics from the prometheus scrape and by 
switching on native memory checking and using jcmd to do various low 
level dumps.
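
For reference, that kind of check looks roughly like this (heap setting and
pgrep pattern are illustrative):

  JVM_ARGS="-Xmx500m -XX:NativeMemoryTracking=summary" ./fuseki-server
  jcmd $(pgrep -f fuseki-server) VM.native_memory baseline
  # later, once the process has grown:
  jcmd $(pgrep -f fuseki-server) VM.native_memory summary.diff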


In a test set up we can replicate the long term (~3 hours) process 
growth (while the heap, non-heap and threads stay stable) by just doing 
something like:


watch -n 1 'curl -s http://localhost:3030/$/metrics'

With no other requests at all. So I think that makes it less likely the 
root cause is triggered by stacked concurrent requests. Certainly the 
curl process has exited completely each time. Though I guess there could 
be some connection cleanup still going on in the linux kernel.
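
For anyone trying to reproduce, a crude way to log process growth alongside
the scrape traffic is something like the following (PID lookup illustrative;
VmRSS comes from /proc):

  pid=$(pgrep -f fuseki-server)
  while true; do
    curl -s 'http://localhost:3030/$/metrics' > /dev/null
    echo "$(date +%T) $(grep VmRSS /proc/$pid/status)"
    sleep 1
  done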


> Is the OOM kill the container runtime or Java exception?

We're not limiting the container memory but the OOM error is from docker 
runtime itself:

fatal error: out of memory allocating heap arena map

We have replicated the memory growth outside a container but not left 
that to soak on a small machine to provoke an OOM, so not sure if the 
OOM killer would hit first or get a java OOM exception first.


One curiosity we've found on the recent tests is that, when the process 
has grown to a dangerous level for the server, we do randomly sometimes 
see the JVM (Temurin 17.0.7) spit out a thread dump and heap summary as 
if there were a low level exception. However, there's no exception 
message at all - just a timestamp, the thread dump and nothing else. The 
JVM seems to just carry on and the process doesn't exit. We're not 
setting any debug flags and not requesting any thread dump, and there's 
no obvious triggering event. This is before the server gets completely 
out of memory, causing the docker runtime to barf.


Dave


On 07/07/2023 09:56, Andy Seaborne wrote:
I tried running without any datasets. I get the same heap effect of 
growing slowly then a dropping back.


Fuseki Main (fuseki-server did the same but the figures are from main - 
there is less going on)

Version 4.8.0

fuseki -v --ping --empty    # No datasets

4G heap.
71M allocated
4 threads (+ Daemon system threads)
2 are not parked (i.e. they are blocked)
The heap grows slowly to 48M then a GC runs then drops to 27M
This repeats.

Run one ping.
Heap now 142M, 94M/21M GC cycle
and 2 more threads at least for a while. They seem to go away after time.
2 are not parked.

Now pause the JVM process, queue 100 pings and continue the process.
Heap now 142M, 80M/21M GC cycle
and no more threads.

Thread stacks are not heap so there may be something here.

Same except -Xmx500M
RSS is 180M
Heap is 35M actual.
56M/13M heap cycle
and after one ping:
I saw 3 more threads, and one quickly exited.
2 are not parked

100 concurrent ping requests.
Maybe 15 more threads. 14 parked. One is marked "running" by visualvm.
RSS is 273M

With -Xmx250M -Xss170k
The Fuseki command failed below 170k during classloading.

1000 concurrent ping requests.
Maybe 15 more threads. 14 parked. One is marked "running" by visualvm.
The threads aren't being gathered.
RSS is 457M.

So a bit of speculation:

Is the OOM kill the container runtime or Java exception?

There aren't many moving parts.

Maybe under some circumstances, the metrics gatherer or ping caller
causes more threads. This could be bad timing, several operations 
arriving at the same time, or it could be the client end isn't releasing 
the HTTP connection in a timely manner or is delayed/failing to read the 
entire response.  HTTP/1.1. -- HTTP/2 probably isn't at risk.


Together with a dataset, memory mapped files etc, it is pushing the 
process size up and on a small machine that might become a problem 
especially if the container host is limiting RAM.


But speculation.

     Andy



Re: Mystery memory leak in fuseki

2023-07-07 Thread Andy Seaborne
I tried running without any datasets. I get the same heap effect of 
growing slowly then a dropping back.


Fuseki Main (fuseki-server did the same but the figures are from main - 
there is less going on)

Version 4.8.0

fuseki -v --ping --empty    # No datasets

4G heap.
71M allocated
4 threads (+ Daemon system threads)
2 are not parked (i.e. they are blocked)
The heap grows slowly to 48M then a GC runs then drops to 27M
This repeats.

Run one ping.
Heap now 142M, 94M/21M GC cycle
and 2 more threads at least for a while. They seem to go away after time.
2 are not parked.

Now pause the JVM process, queue 100 pings and continue the process.
Heap now 142M, 80M/21M GC cycle
and no more threads.

Thread stacks are not heap so there may be something here.

Same except -Xmx500M
RSS is 180M
Heap is 35M actual.
56M/13M heap cycle
and after one ping:
I saw 3 more threads, and one quickly exited.
2 are not parked

100 concurrent ping requests.
Maybe 15 more threads. 14 parked. One is marked "running" by visualvm.
RSS is 273M

With -Xmx250M -Xss170k
The Fuseki command failed below 170k during classloading.

1000 concurrent ping requests.
Maybe 15 more threads. 14 parked. One is marked "running" by visualvm.
The threads aren't being gathered.
RSS is 457M.
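
A crude way to generate roughly concurrent pings from the shell, for anyone
reproducing this (assumes GNU xargs and the default port):

  seq 1000 | xargs -P 100 -I{} curl -s -o /dev/null 'http://localhost:3030/$/ping'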

So a bit of speculation:

Is the OOM kill the container runtime or Java exception?

There aren't many moving parts.

Maybe under some circumstances, the metrics gatherer or ping caller
causes more threads. This could be bad timing, several operations 
arriving at the same time, or it could be the client end isn't releasing 
the HTTP connection in a timely manner or is delayed/failing to read the 
entire response.  HTTP/1.1. -- HTTP/2 probably isn't at risk.


Together with a dataset, memory mapped files etc, it is pushing the 
process size up and on a small machine that might become a problem 
especially if the container host is limiting RAM.


But speculation.

Andy



Re: Mystery memory leak in fuseki

2023-07-06 Thread Andy Seaborne

Hi Frank,

Was this exactly the same JVM version for each 4.x version?

The ping action hasn't changed but versions of jetty have.

> Can anyone reproduce my observations?

A few hundred pings is a small amount of memory and a minor GC clears it up.

When there is nothing going on, there is slow heap growth, then a 
minor GC runs every so often and the heap drops. The fluctuation is 27M, 
and minor GCs clear it out about every 200 seconds. The allocated 
heap size, initially at 71M, does not get larger.
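
The GC cycle is easy enough to watch from outside with jstat if anyone wants
to compare, e.g. sampling every 10 seconds (PID lookup illustrative):

  jstat -gcutil $(pgrep -f fuseki) 10000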


This is with 4.8.0

Andy

Ubuntu 23.04
openjdk version "17.0.7" 2023-04-18
64G RAM


On 05/07/2023 16:25, Lange, Frank wrote:

Hi folks,

Dave mentioned ping for health checks in his initial post. I have the suspicion 
that the ping endpoint produces a memory leak.

Test setup:
* Debian 12, Linux kernel 6.1.0-9-amd64
* `java -version`:
   openjdk version "17.0.7" 2023-04-18
   OpenJDK Runtime Environment (build 17.0.7+7-Debian-1deb12u1)
* fresh install of apache-jena-fuseki-4.8.0, default configuration (run 
directory is created automatically), no datasets
* Fuseki is started via the `fuseki-server` script; no extra JVM_ARGS (i.e. it 
becomes -Xmx4G)

Test execution:
* Call the ping endpoint a few hundred times, e.g. via `for i in {1..100}; do 
curl http://127.0.0.1:3030/$/ping; done`.

Observation:
* Memory consumption of the java process increases significantly up to around 
4GBs, then some GC steps in and reduces memory consumption to less than 1GB. 
The cycle repeats with more pings.
* Tweaking the JVM arg -Xmx can change when GC steps in.

Can anyone reproduce my observations?

I tried that with all versions down from v4.8.0 down to v4.0.0 and I'm happy to 
give you some clues:
The erratic behaviour starts with version 4.3.0, so it's advisable to check what happened 
between v4.2.0 and v4.3.0. Another impression is that v4.1.0 is even less 
"memory-leaky" than v4.2.0.

I also analyzed with VisualVM in this test setup, but to be honest I don't see 
any suspicious memory leak situation there.


Best regards,
Frank




Von: Dave Reynolds 
Gesendet: Dienstag, 4. Juli 2023 12:16
An: users@jena.apache.org
Betreff: Re: Mystery memory leak in fuseki

Try that again:

For interest this is what the JVM metrics look like. The main
heap/non-heap ones are:

https://www.dropbox.com/s/g1ih98kprnvjvxx/fusdeki-metrics-1.png?dl=0

So stable at around 75MB used, 110MB committed.

Whereas the buffer pools are:

https://www.dropbox.com/s/c77b2oarzxjlsa7/fuseki-buffer-metrics.png?dl=0

So gets up to a size comparable with the allowed max heap size (500MB)
then GC back. Lots of churn just for reporting the metrics but no sign
of the upward trend which dominates the MEM% curves and nothing to
explain the growth to 1.8GB and beyond

Guess could try doing a heap dump anyway in case that gives a clue but
not sure that's the right haystack.

Dave

On 04/07/2023 10:56, Dave Reynolds wrote:

For interest this is what the JVM metrics look like. The main
heap/non-heap ones are:

https://www.dropbox.com/s/8auux5v352ur04m/fusdeki-metrics-1.png?dl=0

So stable at around 75MB used, 110MB committed.

Whereas the buffer pools are:

https://www.dropbox.com/s/c77b2oarzxjlsa7/fuseki-buffer-metrics.png?dl=0

So gets up to a size comparable with the allowed max heap size (500MB)
then GC back. Lots of churn just for reporting the metrics but no sign
of the upward trend which dominates the MEM% curves and nothing to
explain the growth to 1.8GB and beyond

Guess could try doing a heap dump anyway in case that gives a clue but
not sure that's the right haystack.

Dave


On 04/07/2023 10:41, Dave Reynolds wrote:

  >  Does this only happen in a container?  Or can you reproduce it
running locally as well?

Not reproduced locally yet, partly because it's harder to set up the
equivalent metrics monitoring there.

Can try harder at that.

  > If you can reproduce it locally then attaching a profiler like
VisualVM so you can take a heap snapshot and see where the memory is
going that would be useful

Thanks, aware of that option but I thought that would just allow us to
probe the heap, non-heap and buffer JVM memory pools. We have quite
detailed monitoring traces on all the JVM metrics which confirms heap
and non-heap are all fine, sitting stably at a low level and not
reflecting the leak.

That's also what tells us the direct memory buffers are cycling but
being properly collected and not leaking. Assuming the JVM metrics are
accurate then the leak is somewhere in native memory beyond the ken of
the JVM metrics.

Dave


On 04/07/2023 10:11, Rob @ DNR wrote:

Does this only happen in a container?  Or can you reproduce it
running locally as well?

If you can reproduce it locally then attaching a profiler like
VisualVM so you can take a heap snapshot and see where the memory is
going that would be useful

Rob

From: Dave Reynolds 
Date: Tuesday, 4 July 2023 at 09:31

Re: Mystery memory leak in fuseki

2023-07-06 Thread Dave Reynolds
a memory leak.

Test setup:
* Debian 12, Linux kernel 6.1.0-9-amd64
* `java -version`:
   openjdk version "17.0.7" 2023-04-18
   OpenJDK Runtime Environment (build 17.0.7+7-Debian-1deb12u1)
* fresh install of apache-jena-fuseki-4.8.0, default configuration (run 
directory is created automatically), no datasets
* Fuseki is started via the `fuseki-server` script; no extra JVM_ARGS (i.e. it 
becomes -Xmx4G)

Test execution:
* Call the ping endpoint a few hundred times, e.g. via `for i in {1..100}; do 
curl http://127.0.0.1:3030/$/ping; done`.

Observation:
* Memory consumption of the java process increases significantly up to around 
4GBs, then some GC steps in and reduces memory consumption to less than 1GB. 
The cycle repeats with more pings.
* Tweaking the JVM arg -Xmx can change when GC steps in.

Can anyone reproduce my observations?

I tried that with all versions down from v4.8.0 down to v4.0.0 and I'm happy to 
give you some clues:
The erratic behaviour starts with version 4.3.0, so it's advisable to check what happened 
between v4.2.0 and v4.3.0. Another impression is that v4.1.0 is even less 
"memory-leaky" than v4.2.0.

I also analyzed with VisualVM in this test setup, but to be honest I don't see 
any suspicious memory leak situation there.


Best regards,
Frank




Von: Dave Reynolds 
Gesendet: Dienstag, 4. Juli 2023 12:16
An: users@jena.apache.org
Betreff: Re: Mystery memory leak in fuseki

Try that again:

For interest this is what the JVM metrics look like. The main
heap/non-heap ones are:

https://www.dropbox.com/s/g1ih98kprnvjvxx/fusdeki-metrics-1.png?dl=0

So stable at around 75MB used, 110MB committed.

Whereas the buffer pools are:

https://www.dropbox.com/s/c77b2oarzxjlsa7/fuseki-buffer-metrics.png?dl=0

So gets up to a size comparable with the allowed max heap size (500MB)
then GC back. Lots of churn just for reporting the metrics but no sign
of the upward trend which dominates the MEM% curves and nothing to
explain the growth to 1.8GB and beyond

Guess could try doing a heap dump anyway in case that gives a clue but
not sure that's the right haystack.

Dave

On 04/07/2023 10:56, Dave Reynolds wrote:

For interest this is what the JVM metrics look like. The main
heap/non-heap ones are:

https://www.dropbox.com/s/8auux5v352ur04m/fusdeki-metrics-1.png?dl=0

So stable at around 75MB used, 110MB committed.

Whereas the buffer pools are:

https://www.dropbox.com/s/c77b2oarzxjlsa7/fuseki-buffer-metrics.png?dl=0

So gets up to a size comparable with the allowed max heap size (500MB)
then GC back. Lots of churn just for reporting the metrics but no sign
of the upward trend which dominates the MEM% curves and nothing to
explain the growth to 1.8GB and beyond

Guess could try doing a heap dump anyway in case that gives a clue but
not sure that's the right haystack.

Dave


On 04/07/2023 10:41, Dave Reynolds wrote:

  >  Does this only happen in a container?  Or can you reproduce it
running locally as well?

Not reproduced locally yet, partly because it's harder to set up the
equivalent metrics monitoring there.

Can try harder at that.

  > If you can reproduce it locally then attaching a profiler like
VisualVM so you can take a heap snapshot and see where the memory is
going that would be useful

Thanks, aware of that option but I thought that would just allow us to
probe the heap, non-heap and buffer JVM memory pools. We have quite
detailed monitoring traces on all the JVM metrics which confirms heap
and non-heap are all fine, sitting stably at a low level and not
reflecting the leak.

That's also what tells us the direct memory buffers are cycling but
being properly collected and not leaking. Assuming the JVM metrics are
accurate then the leak is somewhere in native memory beyond the ken of
the JVM metrics.

Dave


On 04/07/2023 10:11, Rob @ DNR wrote:

Does this only happen in a container?  Or can you reproduce it
running locally as well?

If you can reproduce it locally then attaching a profiler like
VisualVM so you can take a heap snapshot and see where the memory is
going that would be useful

Rob

From: Dave Reynolds 
Date: Tuesday, 4 July 2023 at 09:31
To: users@jena.apache.org 
Subject: Re: Mystery memory leak in fuseki
Tried 4.7.0 under the most up-to-date java 17 and it acts like 4.8.0. After
16 hours it gets to about 1.6GB and by eye has nearly flattened off
somewhat but not completely.

For interest here's a MEM% curve on a 4GB box (hope the link works).

https://www.dropbox.com/s/xjmluk4o3wlwo0y/fuseki-mem-percent.png?dl=0

The flattish curve from 12:00 to 17:20 is a run using 3.16.0 for
comparison. The curve from then onwards is 4.7.0.

The spikes on the 4.7.0 match the allocation and recovery of the direct
memory buffers. The JVM metrics show those cycling around every 10mins
and being reclaimed each time with no leaking visible at that level.
Heap, non-heap and mapped buffers are all basically unchanging which is
to be expected

Re: Mystery memory leak in fuseki

2023-07-05 Thread Lange, Frank
Hi folks,

Dave mentioned ping for health checks in his initial post. I have the suspicion 
that the ping endpoint produces a memory leak.

Test setup:
* Debian 12, Linux kernel 6.1.0-9-amd64
* `java -version`:
  openjdk version "17.0.7" 2023-04-18
  OpenJDK Runtime Environment (build 17.0.7+7-Debian-1deb12u1)
* fresh install of apache-jena-fuseki-4.8.0, default configuration (run 
directory is created automatically), no datasets
* Fuseki is started via the `fuseki-server` script; no extra JVM_ARGS (i.e. it 
becomes -Xmx4G)

Test execution:
* Call the ping endpoint a few hundred times, e.g. via `for i in {1..100}; do 
curl http://127.0.0.1:3030/$/ping; done`.

Observation:
* Memory consumption of the java process increases significantly up to around 
4GBs, then some GC steps in and reduces memory consumption to less than 1GB. 
The cycle repeats with more pings.
* Tweaking the JVM arg -Xmx can change when GC steps in.

Can anyone reproduce my observations?

I tried that with all versions down from v4.8.0 down to v4.0.0 and I'm happy to 
give you some clues:
The erratic behaviour starts with version 4.3.0, so it's advisable to check 
what happened between v4.2.0 and v4.3.0. Another impression is that v4.1.0 is 
even less "memory-leaky" than v4.2.0.

I also analyzed with VisualVM in this test setup, but to be honest I don't see 
any suspicious memory leak situation there.


Best regards,
Frank




Von: Dave Reynolds 
Gesendet: Dienstag, 4. Juli 2023 12:16
An: users@jena.apache.org
Betreff: Re: Mystery memory leak in fuseki

Try that again:

For interest this is what the JVM metrics look like. The main
heap/non-heap ones are:

https://www.dropbox.com/s/g1ih98kprnvjvxx/fusdeki-metrics-1.png?dl=0

So stable at around 75MB used, 110MB committed.

Whereas the buffer pools are:

https://www.dropbox.com/s/c77b2oarzxjlsa7/fuseki-buffer-metrics.png?dl=0

So gets up to a size comparable with the allowed max heap size (500MB)
then GC back. Lots of churn just for reporting the metrics but no sign
of the upward trend which dominates the MEM% curves and nothing to
explain the growth to 1.8GB and beyond

Guess could try doing a heap dump anyway in case that gives a clue but
not sure that's the right haystack.

Dave

On 04/07/2023 10:56, Dave Reynolds wrote:
> For interest this is what the JVM metrics look like. The main
> heap/non-heap ones are:
>
> https://www.dropbox.com/s/8auux5v352ur04m/fusdeki-metrics-1.png?dl=0
>
> So stable at around 75MB used, 110MB committed.
>
> Whereas the buffer pools are:
>
> https://www.dropbox.com/s/c77b2oarzxjlsa7/fuseki-buffer-metrics.png?dl=0
>
> So gets up to a size comparable with the allowed max heap size (500MB)
> then GC back. Lots of churn just for reporting the metrics but no sign
> of the upward trend which dominates the MEM% curves and nothing to
> explain the growth to 1.8GB and beyond
>
> Guess could try doing a heap dump anyway in case that gives a clue but
> not sure that's the right haystack.
>
> Dave
>
>
> On 04/07/2023 10:41, Dave Reynolds wrote:
>>  >  Does this only happen in a container?  Or can you reproduce it
>> running locally as well?
>>
>> Not reproduced locally yet, partly because it's harder to set up the
>> equivalent metrics monitoring there.
>>
>> Can try harder at that.
>>
>>  > If you can reproduce it locally then attaching a profiler like
>> VisualVM so you can take a heap snapshot and see where the memory is
>> going that would be useful
>>
>> Thanks, aware of that option but I thought that would just allow us to
>> probe the heap, non-heap and buffer JVM memory pools. We have quite
>> detailed monitoring traces on all the JVM metrics which confirms heap
>> and non-heap are all fine, sitting stably at a low level and not
>> reflecting the leak.
>>
>> That's also what tells us the direct memory buffers are cycling but
>> being properly collected and not leaking. Assuming the JVM metrics are
>> accurate then the leak is somewhere in native memory beyond the ken of
>> the JVM metrics.
>>
>> Dave
>>
>>
>> On 04/07/2023 10:11, Rob @ DNR wrote:
>>> Does this only happen in a container?  Or can you reproduce it
>>> running locally as well?
>>>
>>> If you can reproduce it locally then attaching a profiler like
>>> VisualVM so you can take a heap snapshot and see where the memory is
>>> going that would be useful
>>>
>>> Rob
>>>
>>> From: Dave Reynolds 
>>> Date: Tuesday, 4 July 2023 at 09:31
>>> To: users@jena.apache.org 
>>> Subject: Re: Mystery memory leak in fuseki
>>> Tried 4.7.0 under most up to date java 17 and it acts like 4.8.0. After

Re: Mystery memory leak in fuseki

2023-07-04 Thread Dave Reynolds

Try that again:

For interest this is what the JVM metrics look like. The main 
heap/non-heap ones are:


https://www.dropbox.com/s/g1ih98kprnvjvxx/fusdeki-metrics-1.png?dl=0

So stable at around 75MB used, 110MB committed.

Whereas the buffer pools are:

https://www.dropbox.com/s/c77b2oarzxjlsa7/fuseki-buffer-metrics.png?dl=0

So gets up to a size comparable with the allowed max heap size (500MB) 
then GC back. Lots of churn just for reporting the metrics but no sign 
of the upward trend which dominates the MEM% curves and nothing to 
explain the growth to 1.8GB and beyond
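
For reference, those buffer pool numbers can be pulled straight from the
metrics endpoint; something like this filters just the buffer series (the
jvm_buffer_* names are the usual Micrometer ones, so worth checking against
your own scrape output):

  curl -s 'http://localhost:3030/$/metrics' | grep '^jvm_buffer'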


Guess could try doing a heap dump anyway in case that gives a clue but 
not sure that's the right haystack.


Dave

On 04/07/2023 10:56, Dave Reynolds wrote:
For interest this is what the JVM metrics look like. The main 
heap/non-heap ones are:


https://www.dropbox.com/s/8auux5v352ur04m/fusdeki-metrics-1.png?dl=0

So stable at around 75MB used, 110MB committed.

Whereas the buffer pools are:

https://www.dropbox.com/s/c77b2oarzxjlsa7/fuseki-buffer-metrics.png?dl=0

So gets up to a size comparable with the allowed max heap size (500MB) 
then GC back. Lots of churn just for reporting the metrics but no sign 
of the upward trend which dominates the MEM% curves and nothing to 
explain the growth to 1.8GB and beyond


Guess could try doing a heap dump anyway in case that gives a clue but 
not sure that's the right haystack.


Dave


On 04/07/2023 10:41, Dave Reynolds wrote:
 >  Does this only happen in a container?  Or can you reproduce it 
running locally as well?


Not reproduced locally yet, partly because it's harder to set up the 
equivalent metrics monitoring there.


Can try harder at that.

 > If you can reproduce it locally then attaching a profiler like 
VisualVM so you can take a heap snapshot and see where the memory is 
going that would be useful


Thanks, aware of that option but I thought that would just allow us to 
probe the heap, non-heap and buffer JVM memory pools. We have quite 
detailed monitoring traces on all the JVM metrics which confirms heap 
and non-heap are all fine, sitting stably at a low level and not 
reflecting the leak.


That's also what tells us the direct memory buffers are cycling but 
being properly collected and not leaking. Assuming the JVM metrics are 
accurate then the leak is somewhere in native memory beyond the ken of 
the JVM metrics.


Dave


On 04/07/2023 10:11, Rob @ DNR wrote:
Does this only happen in a container?  Or can you reproduce it 
running locally as well?


If you can reproduce it locally then attaching a profiler like 
VisualVM so you can take a heap snapshot and see where the memory is 
going that would be useful


Rob

From: Dave Reynolds 
Date: Tuesday, 4 July 2023 at 09:31
To: users@jena.apache.org 
Subject: Re: Mystery memory leak in fuseki
Tried 4.7.0 under the most up-to-date java 17 and it acts like 4.8.0. After
16 hours it gets to about 1.6GB and by eye has nearly flattened off
somewhat but not completely.

For interest here's a MEM% curve on a 4GB box (hope the link works).

https://www.dropbox.com/s/xjmluk4o3wlwo0y/fuseki-mem-percent.png?dl=0

The flattish curve from 12:00 to 17:20 is a run using 3.16.0 for
comparison. The curve from then onwards is 4.7.0.

The spikes on the 4.7.0 match the allocation and recovery of the direct
memory buffers. The JVM metrics show those cycling around every 10mins
and being reclaimed each time with no leaking visible at that level.
Heap, non-heap and mapped buffers are all basically unchanging which is
to be expected since it's doing nothing apart from reporting metrics.

Whereas this curve (again from 17:20 onwards) shows basically the same
4.7.0 set up on a separate host, showing that despite flattening out
somewhat, usage continues to grow - at least on a 16 hour timescale.

https://www.dropbox.com/s/k0v54yq4kexklk0/fuseki-mem-percent-2.png?dl=0


Both of those runs were using Eclipse Temurin on a base Ubuntu jammy
container. Previous runs used AWS Corretto on an AL2 base container.
Behaviour basically unchanged so eliminates this being some
Corretto-specific issue or a weird base container OS issue.

Dave

On 03/07/2023 14:54, Andy Seaborne wrote:

Hi Dave,

Could you try 4.7.0?

4.6.0 was 2022-08-20
4.7.0 was 2022-12-27
4.8.0 was 2023-04-20

This is an in-memory database?

Micrometer/Prometheus has had several upgrades but if it is not heap 
and

not direct memory (I thought that was a hard bound set at start up), I
don't see how it can be involved.

  Andy

On 03/07/2023 14:20, Dave Reynolds wrote:

We have a very strange problem with recent fuseki versions when
running (in docker containers) on small machines. Suspect a jetty
issue but it's not clear.

Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB
as NQuads). Runs on 4GB machines with java heap allocation of 
500MB[1].


We used to run using 3.16 on jdk 8 (AWS Corretto for the long term
s

Re: Mystery memory leak in fuseki

2023-07-04 Thread Dave Reynolds
For interest this is what the JVM metrics look like. The main 
heap/non-heap ones are:


https://www.dropbox.com/s/8auux5v352ur04m/fusdeki-metrics-1.png?dl=0

So stable at around 75MB used, 110MB committed.

Whereas the buffer pools are:

https://www.dropbox.com/s/c77b2oarzxjlsa7/fuseki-buffer-metrics.png?dl=0

So gets up to a size comparable with the allowed max heap size (500MB) 
then GC back. Lots of churn just for reporting the metrics but no sign 
of the upward trend which dominates the MEM% curves and nothing to 
explain the growth to 1.8GB and beyond


Guess could try doing a heap dump anyway in case that gives a clue but 
not sure that's the right haystack.
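
For reference, a heap dump can be grabbed without extra tooling via jcmd
(output path illustrative):

  jcmd $(pgrep -f fuseki-server) GC.heap_dump /tmp/fuseki-heap.hprof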


Dave


On 04/07/2023 10:41, Dave Reynolds wrote:
 >  Does this only happen in a container?  Or can you reproduce it 
running locally as well?


Not reproduced locally yet, partly because it's harder to set up the 
equivalent metrics monitoring there.


Can try harder at that.

 > If you can reproduce it locally then attaching a profiler like 
VisualVM so you can take a heap snapshot and see where the memory is 
going that would be useful


Thanks, aware of that option but I thought that would just allow us to 
probe the heap, non-heap and buffer JVM memory pools. We have quite 
detailed monitoring traces on all the JVM metrics which confirms heap 
and non-heap are all fine, sitting stably at a low level and not 
reflecting the leak.


That's also what tells us the direct memory buffers are cycling but 
being properly collected and not leaking. Assuming the JVM metrics are 
accurate then the leak is somewhere in native memory beyond the ken of 
the JVM metrics.


Dave


On 04/07/2023 10:11, Rob @ DNR wrote:
Does this only happen in a container?  Or can you reproduce it running 
locally as well?


If you can reproduce it locally then attaching a profiler like 
VisualVM so you can take a heap snapshot and see where the memory is 
going that would be useful


Rob

From: Dave Reynolds 
Date: Tuesday, 4 July 2023 at 09:31
To: users@jena.apache.org 
Subject: Re: Mystery memory leak in fuseki
Tried 4.7.0 under the most up-to-date java 17 and it acts like 4.8.0. After
16 hours it gets to about 1.6GB and by eye has nearly flattened off
somewhat but not completely.

For interest here's a MEM% curve on a 4GB box (hope the link works).

https://www.dropbox.com/s/xjmluk4o3wlwo0y/fuseki-mem-percent.png?dl=0

The flattish curve from 12:00 to 17:20 is a run using 3.16.0 for
comparison. The curve from then onwards is 4.7.0.

The spikes on the 4.7.0 match the allocation and recovery of the direct
memory buffers. The JVM metrics show those cycling around every 10mins
and being reclaimed each time with no leaking visible at that level.
Heap, non-heap and mapped buffers are all basically unchanging which is
to be expected since it's doing nothing apart from reporting metrics.

Whereas this curve (again from 17:20 onwards) shows basically the same
4.7.0 set up on a separate host, showing that despite flattening out
somewhat, usage continues to grow - at least on a 16 hour timescale.

https://www.dropbox.com/s/k0v54yq4kexklk0/fuseki-mem-percent-2.png?dl=0


Both of those runs were using Eclipse Temurin on a base Ubuntu jammy
container. Previous runs used AWS Corretto on an AL2 base container.
Behaviour basically unchanged so eliminates this being some
Corretto-specific issue or a weird base container OS issue.

Dave

On 03/07/2023 14:54, Andy Seaborne wrote:

Hi Dave,

Could you try 4.7.0?

4.6.0 was 2022-08-20
4.7.0 was 2022-12-27
4.8.0 was 2023-04-20

This is an in-memory database?

Micrometer/Prometheus has had several upgrades but if it is not heap and
not direct memory (I thought that was a hard bound set at start up), I
don't see how it can be involved.

  Andy

On 03/07/2023 14:20, Dave Reynolds wrote:

We have a very strange problem with recent fuseki versions when
running (in docker containers) on small machines. Suspect a jetty
issue but it's not clear.

Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB
as NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].

We used to run using 3.16 on jdk 8 (AWS Corretto for the long term
support) with no problems.

Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of
a day or so to reach ~3GB of memory at which point the 4GB machine
becomes unviable and things get OOM killed.

The strange thing is that this growth happens when the system is
answering no Sparql queries at all, just regular health ping checks
and (prometheus) metrics scrapes from the monitoring systems.

Furthermore the space being consumed is not visible to any of the JVM
metrics:
- Heap and non-heap are stable at around 100MB total (mostly
non-heap metaspace).
- Mapped buffers stay at 50MB and remain long term stable.
- Direct memory buffers being allocated up to around 500MB then being
reclaimed. Since there are no sparql queries at all we 

Re: Mystery memory leak in fuseki

2023-07-04 Thread Dave Reynolds
>  Does this only happen in a container?  Or can you reproduce it 
running locally as well?


Not reproduced locally yet, partly because it's harder to set up the 
equivalent metrics monitoring there.


Can try harder at that.

> If you can reproduce it locally then attaching a profiler like 
VisualVM so you can take a heap snapshot and see where the memory is 
going that would be useful


Thanks, aware of that option but I thought that would just allow us to 
probe the heap, non-heap and buffer JVM memory pools. We have quite 
detailed monitoring traces on all the JVM metrics which confirms heap 
and non-heap are all fine, sitting stably at a low level and not 
reflecting the leak.


That's also what tells us the direct memory buffers are cycling but 
being properly collected and not leaking. Assuming the JVM metrics are 
accurate then the leak is somewhere in native memory beyond the ken of 
the JVM metrics.


Dave


On 04/07/2023 10:11, Rob @ DNR wrote:

Does this only happen in a container?  Or can you reproduce it running locally 
as well?

If you can reproduce it locally then attaching a profiler like VisualVM so you 
can take a heap snapshot and see where the memory is going that would be useful

Rob

From: Dave Reynolds 
Date: Tuesday, 4 July 2023 at 09:31
To: users@jena.apache.org 
Subject: Re: Mystery memory leak in fuseki
Tried 4.7.0 under the most up-to-date java 17 and it acts like 4.8.0. After
16 hours it gets to about 1.6GB and by eye has nearly flattened off
somewhat but not completely.

For interest here's a MEM% curve on a 4GB box (hope the link works).

https://www.dropbox.com/s/xjmluk4o3wlwo0y/fuseki-mem-percent.png?dl=0

The flattish curve from 12:00 to 17:20 is a run using 3.16.0 for
comparison. The curve from then onwards is 4.7.0.

The spikes on the 4.7.0 match the allocation and recovery of the direct
memory buffers. The JVM metrics show those cycling around every 10mins
and being reclaimed each time with no leaking visible at that level.
Heap, non-heap and mapped buffers are all basically unchanging which is
to be expected since it's doing nothing apart from reporting metrics.

Whereas this curve (again from 17:20 onwards) shows basically the same
4.7.0 set up on a separate host, showing that despite flattening out
somewhat, usage continues to grow - at least on a 16 hour timescale.

https://www.dropbox.com/s/k0v54yq4kexklk0/fuseki-mem-percent-2.png?dl=0


Both of those runs were using Eclipse Temurin on a base Ubuntu jammy
container. Previous runs used AWS Corretto on an AL2 base container.
Behaviour basically unchanged so eliminates this being some
Corretto-specific issue or a weird base container OS issue.

Dave

On 03/07/2023 14:54, Andy Seaborne wrote:

Hi Dave,

Could you try 4.7.0?

4.6.0 was 2022-08-20
4.7.0 was 2022-12-27
4.8.0 was 2023-04-20

This is an in-memory database?

Micrometer/Prometheus has had several upgrades but if it is not heap and
not direct memory (I thought that was a hard bound set at start up), I
don't see how it can be involved.

  Andy

On 03/07/2023 14:20, Dave Reynolds wrote:

We have a very strange problem with recent fuseki versions when
running (in docker containers) on small machines. Suspect a jetty
issue but it's not clear.

Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB
as NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].

We used to run using 3.16 on jdk 8 (AWS Corretto for the long term
support) with no problems.

Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of
a day or so to reach ~3GB of memory at which point the 4GB machine
becomes unviable and things get OOM killed.

The strange thing is that this growth happens when the system is
answering no Sparql queries at all, just regular health ping checks
and (prometheus) metrics scrapes from the monitoring systems.

Furthermore the space being consumed is not visible to any of the JVM
metrics:
- Heap and non-heap are stable at around 100MB total (mostly
non-heap metaspace).
- Mapped buffers stay at 50MB and remain long term stable.
- Direct memory buffers being allocated up to around 500MB then being
reclaimed. Since there are no sparql queries at all we assume this is
jetty NIO buffers being churned as a result of the metric scrapes.
However, this direct buffer behaviour seems stable, it cycles between
0 and 500MB on approx a 10min cycle but is stable over a period of
days and shows no leaks.

Yet the java process grows from an initial 100MB to at least 3GB. This
can occur in the space of a couple of hours or can take up to a day or
two with no predictability in how fast.

Presumably there is some low level JNI space allocated by Jetty (?)
which is invisible to all the JVM metrics and is not being reliably
reclaimed.

Trying 4.6.0, which we've had less problems with elsewhere, that seems
to grow to around 1GB (plus up to 0.5GB for the cycling direct memory
buffers) and then

Re: Mystery memory leak in fuseki

2023-07-04 Thread Martynas Jusevičius
You can profile it in the container as well :)
https://github.com/AtomGraph/fuseki-docker#profiling

On Tue, 4 Jul 2023 at 11.12, Rob @ DNR  wrote:

> Does this only happen in a container?  Or can you reproduce it running
> locally as well?
>
> If you can reproduce it locally then attaching a profiler like VisualVM so
> you can take a heap snapshot and see where the memory is going that would
> be useful
>
> Rob
>
> From: Dave Reynolds 
> Date: Tuesday, 4 July 2023 at 09:31
> To: users@jena.apache.org 
> Subject: Re: Mystery memory leak in fuseki
> Tried 4.7.0 under the most up-to-date java 17 and it acts like 4.8.0. After
> 16 hours it gets to about 1.6GB and by eye has nearly flattened off
> somewhat but not completely.
>
> For interest here's a MEM% curve on a 4GB box (hope the link works).
>
> https://www.dropbox.com/s/xjmluk4o3wlwo0y/fuseki-mem-percent.png?dl=0
>
> The flattish curve from 12:00 to 17:20 is a run using 3.16.0 for
> comparison. The curve from then onwards is 4.7.0.
>
> The spikes on the 4.7.0 match the allocation and recovery of the direct
> memory buffers. The JVM metrics show those cycling around every 10mins
> and being reclaimed each time with no leaking visible at that level.
> Heap, non-heap and mapped buffers are all basically unchanging which is
> to be expected since it's doing nothing apart from reporting metrics.
>
> Whereas this curve (again from 17:20 onwards) shows basically the same
> 4.7.0 set up on a separate host, showing that despite flattening out
> somewhat, usage continues to grow - at least on a 16 hour timescale.
>
> https://www.dropbox.com/s/k0v54yq4kexklk0/fuseki-mem-percent-2.png?dl=0
>
>
> Both of those runs were using Eclipse Temurin on a base Ubuntu jammy
> container. Previous runs used AWS Corretto on an AL2 base container.
> Behaviour basically unchanged so eliminates this being some
> Corretto-specific issue or a weird base container OS issue.
>
> Dave
>
> On 03/07/2023 14:54, Andy Seaborne wrote:
> > Hi Dave,
> >
> > Could you try 4.7.0?
> >
> > 4.6.0 was 2022-08-20
> > 4.7.0 was 2022-12-27
> > 4.8.0 was 2023-04-20
> >
> > This is an in-memory database?
> >
> > Micrometer/Prometheus has had several upgrades but if it is not heap and
> > not direct memory (I thought that was a hard bound set at start up), I
> > don't see how it can be involved.
> >
> >  Andy
> >
> > On 03/07/2023 14:20, Dave Reynolds wrote:
> >> We have a very strange problem with recent fuseki versions when
> >> running (in docker containers) on small machines. Suspect a jetty
> >> issue but it's not clear.
> >>
> >> Wondering if anyone has seen anything like this.
> >>
> >> This is a production service but with tiny data (~250k triples, ~60MB
> >> as NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].
> >>
> >> We used to run using 3.16 on jdk 8 (AWS Corretto for the long term
> >> support) with no problems.
> >>
> >> Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of
> >> a day or so to reach ~3GB of memory at which point the 4GB machine
> >> becomes unviable and things get OOM killed.
> >>
> >> The strange thing is that this growth happens when the system is
> >> answering no Sparql queries at all, just regular health ping checks
> >> and (prometheus) metrics scrapes from the monitoring systems.
> >>
> >> Furthermore the space being consumed is not visible to any of the JVM
> >> metrics:
> >> - Heap and non-heap are stable at around 100MB total (mostly
> >> non-heap metaspace).
> >> - Mapped buffers stay at 50MB and remain long term stable.
> >> - Direct memory buffers being allocated up to around 500MB then being
> >> reclaimed. Since there are no sparql queries at all we assume this is
> >> jetty NIO buffers being churned as a result of the metric scrapes.
> >> However, this direct buffer behaviour seems stable, it cycles between
> >> 0 and 500MB on approx a 10min cycle but is stable over a period of
> >> days and shows no leaks.
> >>
> >> Yet the java process grows from an initial 100MB to at least 3GB. This
> >> can occur in the space of a couple of hours or can take up to a day or
> >> two with no predictability in how fast.
> >>
> >> Presumably there is some low level JNI space allocated by Jetty (?)
> >> which is invisible to all the JVM metrics and is not being reliably
> >> reclaimed.
> >>
> >> Trying 4.6.0, which we've had less problems 

Re: Mystery memory leak in fuseki

2023-07-04 Thread Rob @ DNR
Does this only happen in a container?  Or can you reproduce it running locally 
as well?

If you can reproduce it locally then attaching a profiler like VisualVM so you 
can take a heap snapshot and see where the memory is going that would be useful

Rob

From: Dave Reynolds 
Date: Tuesday, 4 July 2023 at 09:31
To: users@jena.apache.org 
Subject: Re: Mystery memory leak in fuseki
Tried 4.7.0 under the most up-to-date java 17 and it acts like 4.8.0. After
16 hours it gets to about 1.6GB and by eye has nearly flattened off
somewhat but not completely.

For interest here's a MEM% curve on a 4GB box (hope the link works).

https://www.dropbox.com/s/xjmluk4o3wlwo0y/fuseki-mem-percent.png?dl=0

The flattish curve from 12:00 to 17:20 is a run using 3.16.0 for
comparison. The curve from then onwards is 4.7.0.

The spikes on the 4.7.0 match the allocation and recovery of the direct
memory buffers. The JVM metrics show those cycling around every 10mins
and being reclaimed each time with no leaking visible at that level.
Heap, non-heap and mapped buffers are all basically unchanging which is
to be expected since it's doing nothing apart from reporting metrics.

Whereas this curve (again from 17:20 onwards) shows basically the same
4.7.0 set up on a separate host, showing that despite flattening out
somewhat, usage continues to grow - at least on a 16 hour timescale.

https://www.dropbox.com/s/k0v54yq4kexklk0/fuseki-mem-percent-2.png?dl=0


Both of those runs were using Eclipse Temurin on a base Ubuntu jammy
container. Previous runs used AWS Corretto on an AL2 base container.
Behaviour was basically unchanged, which eliminates this being some
Corretto-specific issue or a weird base container OS issue.

Dave

On 03/07/2023 14:54, Andy Seaborne wrote:
> Hi Dave,
>
> Could you try 4.7.0?
>
> 4.6.0 was 2022-08-20
> 4.7.0 was 2022-12-27
> 4.8.0 was 2023-04-20
>
> This is an in-memory database?
>
> Micrometer/Prometheus has had several upgrades but if it is not heap and
> not direct memory (I thought that was a hard bound set at start up), I
> don't see how it can be involved.
>
>  Andy
>
> On 03/07/2023 14:20, Dave Reynolds wrote:
>> We have a very strange problem with recent fuseki versions when
>> running (in docker containers) on small machines. Suspect a jetty
>> issue but it's not clear.
>>
>> Wondering if anyone has seen anything like this.
>>
>> This is a production service but with tiny data (~250k triples, ~60MB
>> as NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].
>>
>> We used to run using 3.16 on jdk 8 (AWS Corretto for the long term
>> support) with no problems.
>>
>> Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of
>> a day or so to reach ~3GB of memory at which point the 4GB machine
>> becomes unviable and things get OOM killed.
>>
>> The strange thing is that this growth happens when the system is
>> answering no Sparql queries at all, just regular health ping checks
>> and (prometheus) metrics scrapes from the monitoring systems.
>>
>> Furthermore the space being consumed is not visible to any of the JVM
>> metrics:
>> - Heap and non-heap are stable at around 100MB total (mostly
>> non-heap metaspace).
>> - Mapped buffers stay at 50MB and remain long term stable.
>> - Direct memory buffers being allocated up to around 500MB then being
>> reclaimed. Since there are no sparql queries at all we assume this is
>> jetty NIO buffers being churned as a result of the metric scrapes.
>> However, this direct buffer behaviour seems stable, it cycles between
>> 0 and 500MB on approx a 10min cycle but is stable over a period of
>> days and shows no leaks.
>>
>> Yet the java process grows from an initial 100MB to at least 3GB. This
>> can occur in the space of a couple of hours or can take up to a day or
>> two with no predictability in how fast.
>>
>> Presumably there is some low level JNI space allocated by Jetty (?)
>> which is invisible to all the JVM metrics and is not being reliably
>> reclaimed.
>>
>> Trying 4.6.0, which we've had less problems with elsewhere, that seems
>> to grow to around 1GB (plus up to 0.5GB for the cycling direct memory
>> buffers) and then stays stable (at least on a three day soak test).
>> We could live with allocating 1.5GB to a system that should only need
>> a few 100MB but concerned that it may not be stable in the really long
>> term and, in any case, would rather be able to update to more recent
>> fuseki versions.
>>
>> Trying 4.8.0 on java 17 it grows rapidly to around 1GB again but then
>> keeps ticking up slowly at random intervals. We project that it would
> >> take a few weeks to grow to the scale it did under java 11 but it will
>> still eventually kill the machine.
>>
> >> Anyone seen anything remotely like this?
>>
>> Dave
>>
>> [1]  500M heap may be overkill but there can be some complex queries
>> and that should still leave plenty of space for OS buffers etc in the
>> remaining memory on a 4GB machine.
>>
>>
>>


Re: Mystery memory leak in fuseki

2023-07-04 Thread Dave Reynolds
Tried 4.7.0 under the most up-to-date java 17 and it acts like 4.8.0. After
16 hours it gets to about 1.6GB and by eye has nearly flattened off
somewhat but not completely.


For interest here's a MEM% curve on a 4GB box (hope the link works).

https://www.dropbox.com/s/xjmluk4o3wlwo0y/fuseki-mem-percent.png?dl=0

The flattish curve from 12:00 to 17:20 is a run using 3.16.0 for 
comparison. The curve from then onwards is 4.7.0.


The spikes on the 4.7.0 match the allocation and recovery of the direct 
memory buffers. The JVM metrics show those cycling around every 10mins 
and being reclaimed each time with no leaking visible at that level. 
Heap, non-heap and mapped buffers are all basically unchanging which is 
to be expected since it's doing nothing apart from reporting metrics.
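
For anyone wanting to watch the same gauges: those figures come from the
JVM meters Fuseki exposes through Micrometer. A rough sketch, assuming the
default port and the standard /$/metrics endpoint (metric names quoted from
memory, so treat as approximate):

curl -s 'http://localhost:3030/$/metrics' \
  | grep -E 'jvm_memory_used_bytes|jvm_buffer_(memory_used|total_capacity)_bytes'

The MEM% / RSS curves themselves come from the container monitoring, not
from these gauges.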


Whereas this curve (again from 17:20 onwards) shows basically the same 
4.7.0 set up on a separate host, showing that despite flattening out 
somewhat, usage continues to grow - at least on a 16 hour timescale.


https://www.dropbox.com/s/k0v54yq4kexklk0/fuseki-mem-percent-2.png?dl=0


Both of those runs were using Eclipse Temurin on a base Ubuntu jammy 
container. Previous runs used AWS Corretto on an AL2 base container.
Behaviour was basically unchanged, which eliminates this being some
Corretto-specific issue or a weird base container OS issue.


Dave

On 03/07/2023 14:54, Andy Seaborne wrote:

Hi Dave,

Could you try 4.7.0?

4.6.0 was 2022-08-20
4.7.0 was 2022-12-27
4.8.0 was 2023-04-20

This is an in-memory database?

Micrometer/Prometheus has had several upgrades but if it is not heap and 
not direct memory (I thought that was a hard bound set at start up), I
don't see how it can be involved.


     Andy

On 03/07/2023 14:20, Dave Reynolds wrote:
We have a very strange problem with recent fuseki versions when 
running (in docker containers) on small machines. Suspect a jetty 
issue but it's not clear.


Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB 
as NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].


We used to run using 3.16 on jdk 8 (AWS Corretto for the long term 
support) with no problems.


Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of 
a day or so to reach ~3GB of memory at which point the 4GB machine 
becomes unviable and things get OOM killed.


The strange thing is that this growth happens when the system is 
answering no Sparql queries at all, just regular health ping checks 
and (prometheus) metrics scrapes from the monitoring systems.


Furthermore the space being consumed is not visible to any of the JVM 
metrics:
- Heap and non-heap are stable at around 100MB total (mostly
non-heap metaspace).

- Mapped buffers stay at 50MB and remain long term stable.
- Direct memory buffers being allocated up to around 500MB then being 
reclaimed. Since there are no sparql queries at all we assume this is 
jetty NIO buffers being churned as a result of the metric scrapes. 
However, this direct buffer behaviour seems stable, it cycles between 
0 and 500MB on approx a 10min cycle but is stable over a period of 
days and shows no leaks.


Yet the java process grows from an initial 100MB to at least 3GB. This 
can occur in the space of a couple of hours or can take up to a day or 
two with no predictability in how fast.


Presumably there is some low level JNI space allocated by Jetty (?) 
which is invisible to all the JVM metrics and is not being reliably 
reclaimed.


Trying 4.6.0, which we've had less problems with elsewhere, that seems 
to grow to around 1GB (plus up to 0.5GB for the cycling direct memory 
buffers) and then stays stable (at least on a three day soak test).  
We could live with allocating 1.5GB to a system that should only need 
a few 100MB but concerned that it may not be stable in the really long 
term and, in any case, would rather be able to update to more recent 
fuseki versions.


Trying 4.8.0 on java 17 it grows rapidly to around 1GB again but then 
keeps ticking up slowly at random intervals. We project that it would 
take a few weeks to grow to the scale it did under java 11 but it will
still eventually kill the machine.


Anyone seen anything remotely like this?

Dave

[1]  500M heap may be overkill but there can be some complex queries 
and that should still leave plenty of space for OS buffers etc in the 
remaining memory on a 4GB machine.






Re: Mystery memory leak in fuseki

2023-07-04 Thread Dave Reynolds

Thanks for the suggestion, that could be useful.

Not managed to make that work yet. From within the container we get
permission denied, and running it on the host is no use because the
relevant .so's aren't where ltrace expects and it crashes out.


Similarly, strace can't attach to the process in the container, and
running it on the host gives no info.
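
One possibility, untested here, is that the permission denied is just the
container lacking the ptrace capability. Assuming plain Docker, something
like the following might let a tracer attach from inside (container name,
pid and the ltrace/strace install in the image are all placeholders or
assumptions):

# relaunch the fuseki container with ptrace allowed
docker run --cap-add SYS_PTRACE --security-opt seccomp=unconfined ...

# then attach inside it (ltrace/strace need to be installed in the image)
docker exec -it <container> ltrace -S -f -p <fuseki-pid>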


Guess we would have to replicate the set-up without using containers.
Certainly possible, but a fair amount of work, and it loses all the metrics
we get from the container stack. May have to resort to that.


Dave

On 03/07/2023 22:22, Justin wrote:

You might try running `ltrace` to watch the library calls and system calls
the jvm is making.
e.g.
ltrace -S -f -p 

I think the `sbrk` system call is used to allocate memory. It might be
interesting to see if you can catch the jvm invoking that system call and
also see what is happening around it.

On Mon, Jul 3, 2023 at 10:50 AM Dave Reynolds 
wrote:


On 03/07/2023 14:36, Martynas Jusevičius wrote:

There have been a few similar threads:

https://www.mail-archive.com/users@jena.apache.org/msg19871.html

https://www.mail-archive.com/users@jena.apache.org/msg18825.html



Thanks, I've seen those and not sure they quite match our case but maybe
I'm mistaken.

We already have a smallish heap allocation (500MB) which seems to be a
key conclusion of both those threads. Though I guess we could try even
lower.

Furthermore the second thread was related to 3.16.0 which is completely
stable for us at 150MB (rather than the 1.5GB that 4.6.* gets to, let
alone the 3+GB that gets 4.8.0 killed).

Dave




On Mon, 3 Jul 2023 at 15.20, Dave Reynolds 
wrote:


We have a very strange problem with recent fuseki versions when running
(in docker containers) on small machines. Suspect a jetty issue but it's
not clear.

Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB as
NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].

We used to run using 3.16 on jdk 8 (AWS Corretto for the long term
support) with no problems.

Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of a
day or so to reach ~3GB of memory at which point the 4GB machine becomes
unviable and things get OOM killed.

The strange thing is that this growth happens when the system is
answering no Sparql queries at all, just regular health ping checks and
(prometheus) metrics scrapes from the monitoring systems.

Furthermore the space being consumed is not visible to any of the JVM
metrics:
- Heap and non-heap are stable at around 100MB total (mostly
non-heap metaspace).
- Mapped buffers stay at 50MB and remain long term stable.
- Direct memory buffers being allocated up to around 500MB then being
reclaimed. Since there are no sparql queries at all we assume this is
jetty NIO buffers being churned as a result of the metric scrapes.
However, this direct buffer behaviour seems stable, it cycles between 0
and 500MB on approx a 10min cycle but is stable over a period of days
and shows no leaks.

Yet the java process grows from an initial 100MB to at least 3GB. This
can occur in the space of a couple of hours or can take up to a day or
two with no predictability in how fast.

Presumably there is some low level JNI space allocated by Jetty (?)
which is invisible to all the JVM metrics and is not being reliably
reclaimed.

Trying 4.6.0, which we've had less problems with elsewhere, that seems
to grow to around 1GB (plus up to 0.5GB for the cycling direct memory
buffers) and then stays stable (at least on a three day soak test).  We
could live with allocating 1.5GB to a system that should only need a few
100MB but concerned that it may not be stable in the really long term
and, in any case, would rather be able to update to more recent fuseki
versions.

Trying 4.8.0 on java 17 it grows rapidly to around 1GB again but then
keeps ticking up slowly at random intervals. We project that it would
take a few weeks to grow the scale it did under java 11 but it will
still eventually kill the machine.

Anyone seen anything remotely like this?

Dave

[1]  500M heap may be overkill but there can be some complex queries and
that should still leave plenty of space for OS buffers etc in the
remaining memory on a 4GB machine.












Re: Mystery memory leak in fuseki

2023-07-03 Thread Justin
You might try running `ltrace` to watch the library calls and system calls
the jvm is making.
e.g.
ltrace -S -f -p 

I think the `sbrk` system call is used to allocate memory. It might be
interesting to see if you can catch the jvm invoking that system call and
also see what is happening around it.
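
A variant on the same idea: recent glibc/JVMs mostly grow native memory via
mmap/brk rather than sbrk directly, so tracing the whole memory syscall
class may be more telling. A sketch, assuming a reasonably recent strace
and with <pid> as a placeholder:

# follow threads, trace only memory-management syscalls, log to a file
strace -f -e trace=memory -o /tmp/fuseki-mem-syscalls.log -p <pid>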

On Mon, Jul 3, 2023 at 10:50 AM Dave Reynolds 
wrote:

> On 03/07/2023 14:36, Martynas Jusevičius wrote:
> > There have been a few similar threads:
> >
> > https://www.mail-archive.com/users@jena.apache.org/msg19871.html
> >
> > https://www.mail-archive.com/users@jena.apache.org/msg18825.html
>
>
> Thanks, I've seen those and not sure they quite match our case but maybe
> I'm mistaken.
>
> We already have a smallish heap allocation (500MB) which seems to be a
> key conclusion of both those threads. Though I guess we could try even
> lower.
>
> Furthermore the second thread was related to 3.16.0 which is completely
> stable for us at 150MB (rather than the 1.5GB that 4.6.* gets to, let
> alone the 3+GB that gets 4.8.0 killed).
>
> Dave
>
>
> >
> > On Mon, 3 Jul 2023 at 15.20, Dave Reynolds 
> > wrote:
> >
> >> We have a very strange problem with recent fuseki versions when running
> >> (in docker containers) on small machines. Suspect a jetty issue but it's
> >> not clear.
> >>
> >> Wondering if anyone has seen anything like this.
> >>
> >> This is a production service but with tiny data (~250k triples, ~60MB as
> >> NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].
> >>
> >> We used to run using 3.16 on jdk 8 (AWS Corretto for the long term
> >> support) with no problems.
> >>
> >> Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of a
> >> day or so to reach ~3GB of memory at which point the 4GB machine becomes
> >> unviable and things get OOM killed.
> >>
> >> The strange thing is that this growth happens when the system is
> >> answering no Sparql queries at all, just regular health ping checks and
> >> (prometheus) metrics scrapes from the monitoring systems.
> >>
> >> Furthermore the space being consumed is not visible to any of the JVM
> >> metrics:
> >> - Heap and non-heap are stable at around 100MB total (mostly
> >> non-heap metaspace).
> >> - Mapped buffers stay at 50MB and remain long term stable.
> >> - Direct memory buffers being allocated up to around 500MB then being
> >> reclaimed. Since there are no sparql queries at all we assume this is
> >> jetty NIO buffers being churned as a result of the metric scrapes.
> >> However, this direct buffer behaviour seems stable, it cycles between 0
> >> and 500MB on approx a 10min cycle but is stable over a period of days
> >> and shows no leaks.
> >>
> >> Yet the java process grows from an initial 100MB to at least 3GB. This
> >> can occur in the space of a couple of hours or can take up to a day or
> >> two with no predictability in how fast.
> >>
> >> Presumably there is some low level JNI space allocated by Jetty (?)
> >> which is invisible to all the JVM metrics and is not being reliably
> >> reclaimed.
> >>
> >> Trying 4.6.0, which we've had less problems with elsewhere, that seems
> >> to grow to around 1GB (plus up to 0.5GB for the cycling direct memory
> >> buffers) and then stays stable (at least on a three day soak test).  We
> >> could live with allocating 1.5GB to a system that should only need a few
> >> 100MB but concerned that it may not be stable in the really long term
> >> and, in any case, would rather be able to update to more recent fuseki
> >> versions.
> >>
> >> Trying 4.8.0 on java 17 it grows rapidly to around 1GB again but then
> >> keeps ticking up slowly at random intervals. We project that it would
> >> take a few weeks to grow to the scale it did under java 11 but it will
> >> still eventually kill the machine.
> >>
> >> Anyone seen anything remotely like this?
> >>
> >> Dave
> >>
> >> [1]  500M heap may be overkill but there can be some complex queries and
> >> that should still leave plenty of space for OS buffers etc in the
> >> remaining memory on a 4GB machine.
> >>
> >>
> >>
> >>
> >
>


Re: Mystery memory leak in fuseki

2023-07-03 Thread Dave Reynolds

On 03/07/2023 14:36, Martynas Jusevičius wrote:

There have been a few similar threads:

https://www.mail-archive.com/users@jena.apache.org/msg19871.html

https://www.mail-archive.com/users@jena.apache.org/msg18825.html



Thanks, I've seen those and not sure they quite match our case but maybe 
I'm mistaken.


We already have a smallish heap allocation (500MB) which seems to be a
key conclusion of both those threads. Though I guess we could try even 
lower.


Furthermore the second thread was related to 3.16.0 which is completely 
stable for us at 150MB (rather than the 1.5GB that 4.6.* gets to, let 
alone the 3+GB that gets 4.8.0 killed).


Dave




On Mon, 3 Jul 2023 at 15.20, Dave Reynolds 
wrote:


We have a very strange problem with recent fuseki versions when running
(in docker containers) on small machines. Suspect a jetty issue but it's
not clear.

Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB as
NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].

We used to run using 3.16 on jdk 8 (AWS Corretto for the long term
support) with no problems.

Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of a
day or so to reach ~3GB of memory at which point the 4GB machine becomes
unviable and things get OOM killed.

The strange thing is that this growth happens when the system is
answering no Sparql queries at all, just regular health ping checks and
(prometheus) metrics scrapes from the monitoring systems.

Furthermore the space being consumed is not visible to any of the JVM
metrics:
- Heap and non-heap are stable at around 100MB total (mostly
non-heap metaspace).
- Mapped buffers stay at 50MB and remain long term stable.
- Direct memory buffers being allocated up to around 500MB then being
reclaimed. Since there are no sparql queries at all we assume this is
jetty NIO buffers being churned as a result of the metric scrapes.
However, this direct buffer behaviour seems stable, it cycles between 0
and 500MB on approx a 10min cycle but is stable over a period of days
and shows no leaks.

Yet the java process grows from an initial 100MB to at least 3GB. This
can occur in the space of a couple of hours or can take up to a day or
two with no predictability in how fast.

Presumably there is some low level JNI space allocated by Jetty (?)
which is invisible to all the JVM metrics and is not being reliably
reclaimed.

Trying 4.6.0, which we've had less problems with elsewhere, that seems
to grow to around 1GB (plus up to 0.5GB for the cycling direct memory
buffers) and then stays stable (at least on a three day soak test).  We
could live with allocating 1.5GB to a system that should only need a few
100MB but concerned that it may not be stable in the really long term
and, in any case, would rather be able to update to more recent fuseki
versions.

Trying 4.8.0 on java 17 it grows rapidly to around 1GB again but then
keeps ticking up slowly at random intervals. We project that it would
take a few weeks to grow to the scale it did under java 11 but it will
still eventually kill the machine.

Anyone seen anything remotely like this?

Dave

[1]  500M heap may be overkill but there can be some complex queries and
that should still leave plenty of space for OS buffers etc in the
remaining memory on a 4GB machine.








Re: Mystery memory leak in fuseki

2023-07-03 Thread Dave Reynolds

On 03/07/2023 15:07, Andy Seaborne wrote:

A possibility:

https://www.nickebbitt.com/blog/2022/01/26/the-story-of-a-java-17-native-memory-leak/

suggests workaround

-XX:-UseStringDeduplication

https://bugs.openjdk.org/browse/JDK-8277981
https://github.com/openjdk/jdk/pull/6613

which may be in Java 17.0.2


Ah, thanks, hadn't spotted that. Though I was testing with 17.0.7 and, as
you say, they claim that was fixed in 17.0.2.
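
For what it's worth, a quick way to double-check what the running JVM is
actually doing with that flag (a sketch; <pid> is a placeholder for the
Fuseki process):

# effective value for the live process
jcmd <pid> VM.flags -all | grep StringDeduplication

# or the default for the installed JDK
java -XX:+PrintFlagsFinal -version | grep StringDeduplication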


Dave


Re: Mystery memory leak in fuseki

2023-07-03 Thread Dave Reynolds

Hi Andy,

> Could you try 4.7.0?

Will do, though each test takes quite a while :)

> This is an in-memory database?

No, TDB1 - sorry, should have said that.

Though as I say we are leaving the system to soak with absolutely no 
queries arriving so it's not TDB churn and it's RSS that's filling up.
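
For completeness, the RSS figure is easy to track independently of the JVM
metrics; a sketch, with <pid> and <container> as placeholders:

grep VmRSS /proc/<pid>/status         # resident set size in kB, kernel's view
docker stats --no-stream <container>  # per-container view from the host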


FWIW 3.16.0 runs at 150MB with the same max heap setting, completely 
stable. So that's 10x smaller than 4.6.0 stabilizes at. If nothing else 
that confirms that the container set up itself is not to blame.


> Micrometer/Prometheus has had several upgrades but if it is not heap and
> not direct memory (I thought that was a hard bound set at start up), I
> don't see how it can be involved.

Likewise.

Dave

On 03/07/2023 14:54, Andy Seaborne wrote:

Hi Dave,

Could you try 4.7.0?

4.6.0 was 2022-08-20
4.7.0 was 2022-12-27
4.8.0 was 2023-04-20

This is an in-memory database?

Micrometer/Prometheus has had several upgrades but if it is not heap and 
not direct memory (I thought that was a hard bound set at start up), I
don't see how it can be involved.


     Andy

On 03/07/2023 14:20, Dave Reynolds wrote:
We have a very strange problem with recent fuseki versions when 
running (in docker containers) on small machines. Suspect a jetty 
issue but it's not clear.


Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB 
as NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].


We used to run using 3.16 on jdk 8 (AWS Corretto for the long term 
support) with no problems.


Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of 
a day or so to reach ~3GB of memory at which point the 4GB machine 
becomes unviable and things get OOM killed.


The strange thing is that this growth happens when the system is 
answering no Sparql queries at all, just regular health ping checks 
and (prometheus) metrics scrapes from the monitoring systems.


Furthermore the space being consumed is not visible to any of the JVM 
metrics:
- Heap and non-heap are stable at around 100MB total (mostly
non-heap metaspace).

- Mapped buffers stay at 50MB and remain long term stable.
- Direct memory buffers being allocated up to around 500MB then being 
reclaimed. Since there are no sparql queries at all we assume this is 
jetty NIO buffers being churned as a result of the metric scrapes. 
However, this direct buffer behaviour seems stable, it cycles between 
0 and 500MB on approx a 10min cycle but is stable over a period of 
days and shows no leaks.


Yet the java process grows from an initial 100MB to at least 3GB. This 
can occur in the space of a couple of hours or can take up to a day or 
two with no predictability in how fast.


Presumably there is some low level JNI space allocated by Jetty (?) 
which is invisible to all the JVM metrics and is not being reliably 
reclaimed.


Trying 4.6.0, which we've had less problems with elsewhere, that seems 
to grow to around 1GB (plus up to 0.5GB for the cycling direct memory 
buffers) and then stays stable (at least on a three day soak test).  
We could live with allocating 1.5GB to a system that should only need 
a few 100MB but concerned that it may not be stable in the really long 
term and, in any case, would rather be able to update to more recent 
fuseki versions.


Trying 4.8.0 on java 17 it grows rapidly to around 1GB again but then 
keeps ticking up slowly at random intervals. We project that it would 
take a few weeks to grow to the scale it did under java 11 but it will
still eventually kill the machine.


Anyone seen anything remotely like this?

Dave

[1]  500M heap may be overkill but there can be some complex queries 
and that should still leave plenty of space for OS buffers etc in the 
remaining memory on a 4GB machine.






Re: Mystery memory leak in fuseki

2023-07-03 Thread Andy Seaborne

A possibility:

https://www.nickebbitt.com/blog/2022/01/26/the-story-of-a-java-17-native-memory-leak/

suggests workaround

-XX:-UseStringDeduplication

https://bugs.openjdk.org/browse/JDK-8277981
https://github.com/openjdk/jdk/pull/6613

which may be in Java 17.0.2
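
If anyone wants to try that, a sketch of passing the flag through to Fuseki
(assuming the stock fuseki-server script, which picks up JVM_ARGS; the
Docker images generally take a JAVA_OPTIONS environment variable instead -
the dataset path and name below are placeholders):

JVM_ARGS="-Xmx500M -XX:-UseStringDeduplication" ./fuseki-server --loc=/path/to/tdb /ds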

Andy


Re: Mystery memory leak in fuseki

2023-07-03 Thread Andy Seaborne

Hi Dave,

Could you try 4.7.0?

4.6.0 was 2022-08-20
4.7.0 was 2022-12-27
4.8.0 was 2023-04-20

This is an in-memory database?

Micrometer/Prometheus has had several upgrades but if it is not heap and 
not direct memory (I thought that was a hard bound set at start up), I
don't see how it can be involved.


Andy

On 03/07/2023 14:20, Dave Reynolds wrote:
We have a very strange problem with recent fuseki versions when running 
(in docker containers) on small machines. Suspect a jetty issue but it's 
not clear.


Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB as 
NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].


We used to run using 3.16 on jdk 8 (AWS Corretto for the long term 
support) with no problems.


Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of a 
day or so to reach ~3GB of memory at which point the 4GB machine becomes 
unviable and things get OOM killed.


The strange thing is that this growth happens when the system is 
answering no Sparql queries at all, just regular health ping checks and 
(prometheus) metrics scrapes from the monitoring systems.


Furthermore the space being consumed is not visible to any of the JVM 
metrics:
- Heap and non-heap are stable at around 100MB total (mostly
non-heap metaspace).

- Mapped buffers stay at 50MB and remain long term stable.
- Direct memory buffers being allocated up to around 500MB then being 
reclaimed. Since there are no sparql queries at all we assume this is 
jetty NIO buffers being churned as a result of the metric scrapes. 
However, this direct buffer behaviour seems stable, it cycles between 0 
and 500MB on approx a 10min cycle but is stable over a period of days 
and shows no leaks.


Yet the java process grows from an initial 100MB to at least 3GB. This 
can occur in the space of a couple of hours or can take up to a day or 
two with no predictability in how fast.


Presumably there is some low level JNI space allocated by Jetty (?) 
which is invisible to all the JVM metrics and is not being reliably 
reclaimed.


Trying 4.6.0, which we've had less problems with elsewhere, that seems 
to grow to around 1GB (plus up to 0.5GB for the cycling direct memory 
buffers) and then stays stable (at least on a three day soak test).  We 
could live with allocating 1.5GB to a system that should only need a few 
100MB but concerned that it may not be stable in the really long term 
and, in any case, would rather be able to update to more recent fuseki 
versions.


Trying 4.8.0 on java 17 it grows rapidly to around 1GB again but then 
keeps ticking up slowly at random intervals. We project that it would 
take a few weeks to grow to the scale it did under java 11 but it will
still eventually kill the machine.


Anyone seen anything remotely like this?

Dave

[1]  500M heap may be overkill but there can be some complex queries and 
that should still leave plenty of space for OS buffers etc in the 
remaining memory on a 4GB machine.






Re: Mystery memory leak in fuseki

2023-07-03 Thread Martynas Jusevičius
There have been a few similar threads:

https://www.mail-archive.com/users@jena.apache.org/msg19871.html

https://www.mail-archive.com/users@jena.apache.org/msg18825.html

On Mon, 3 Jul 2023 at 15.20, Dave Reynolds 
wrote:

> We have a very strange problem with recent fuseki versions when running
> (in docker containers) on small machines. Suspect a jetty issue but it's
> not clear.
>
> Wondering if anyone has seen anything like this.
>
> This is a production service but with tiny data (~250k triples, ~60MB as
> NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].
>
> We used to run using 3.16 on jdk 8 (AWS Corretto for the long term
> support) with no problems.
>
> Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of a
> day or so to reach ~3GB of memory at which point the 4GB machine becomes
> unviable and things get OOM killed.
>
> The strange thing is that this growth happens when the system is
> answering no Sparql queries at all, just regular health ping checks and
> (prometheus) metrics scrapes from the monitoring systems.
>
> Furthermore the space being consumed is not visible to any of the JVM
> metrics:
> - Heap and non-heap are stable at around 100MB total (mostly
> non-heap metaspace).
> - Mapped buffers stay at 50MB and remain long term stable.
> - Direct memory buffers being allocated up to around 500MB then being
> reclaimed. Since there are no sparql queries at all we assume this is
> jetty NIO buffers being churned as a result of the metric scrapes.
> However, this direct buffer behaviour seems stable, it cycles between 0
> and 500MB on approx a 10min cycle but is stable over a period of days
> and shows no leaks.
>
> Yet the java process grows from an initial 100MB to at least 3GB. This
> can occur in the space of a couple of hours or can take up to a day or
> two with no predictability in how fast.
>
> Presumably there is some low level JNI space allocated by Jetty (?)
> which is invisible to all the JVM metrics and is not being reliably
> reclaimed.
>
> Trying 4.6.0, which we've had less problems with elsewhere, that seems
> to grow to around 1GB (plus up to 0.5GB for the cycling direct memory
> buffers) and then stays stable (at least on a three day soak test).  We
> could live with allocating 1.5GB to a system that should only need a few
> 100MB but concerned that it may not be stable in the really long term
> and, in any case, would rather be able to update to more recent fuseki
> versions.
>
> Trying 4.8.0 on java 17 it grows rapidly to around 1GB again but then
> keeps ticking up slowly at random intervals. We project that it would
> take a few weeks to grow to the scale it did under java 11 but it will
> still eventually kill the machine.
>
> Anyone seen anything remotely like this?
>
> Dave
>
> [1]  500M heap may be overkill but there can be some complex queries and
> that should still leave plenty of space for OS buffers etc in the
> remaining memory on a 4GB machine.
>
>
>
>