On 11/06/18 22:28, Dan Pritts wrote:
Hi all,

We've been having trouble with our production fuseki instance. A few specifics:

fuseki 3.6.0, standalone/Jetty. OpenJDK 1.8.0_171 on RHEL 6, on an m4.2xlarge shared with two other applications.

we have about 21M triples in the database.

For background, could you share a directory listing with file sizes?

We hit fuseki moderately hard, on the order of 1000 hits per minute. 99%+ of the hits are queries. Our code could stand to do some client-side caching; we get lots of repetitive queries. That said, fuseki is normally plenty fast at those - it's rare that a query takes >10ms.

It looks like I'm getting hit by JENA-1516; I will schedule an upgrade to 3.7 ASAP.

The log is full of errors like this.

[2018-06-11 16:15:07] BindingTDB ERROR get1(?s)
org.apache.jena.tdb.base.file.FileException: ObjectFileStorage.read[nodes](488281706)[filesize=569694455][file.size()=569694455]: Failed to read the length : got 0 bytes
        at org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:341)

[2018-06-11 16:15:39] BindingTDB ERROR get1(?identifier)
org.apache.jena.tdb.base.file.FileException: In the middle of an alloc-write
        at org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:311)
        at org.apache.jena.tdb.base.objectfile.ObjectFileWrapper.read(ObjectFileWrapper.java:57)
        at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:78)


Yes, that looks like JENA-1516.

The memory observations look to be unconnected.



The problem that got me looking is that fuseki's memory usage goes nuts, which causes the server to start swapping. Swapping = slow = pager. Total memory + swap in use by fuseki when I investigated was about 32GB; it's configured to use a 16GB heap. Garbage collection logging was not configured properly, so I can't say whether my immediate problem was heap exhaustion.
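For reference, GC logging on a Java 8 HotSpot JVM can be enabled with flags along these lines. The log path and sizes here are illustrative, and the standard fuseki-server script passes JVM_ARGS through to the JVM:

```shell
# Illustrative Java 8 GC-logging flags; adjust heap size and log path to taste.
# Rotation keeps the log bounded (5 files x 20MB).
export JVM_ARGS="-Xmx16G \
  -Xloggc:/var/log/fuseki/gc.log \
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=20M"
# ./fuseki-server --loc=/path/to/tdb /ds
```

With that in place, heap-exhaustion questions like the one above can be answered from the gc.log rather than guessed at.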

I'm monitoring swap usage hourly - sometime within a <1hr window the swap usage increased past 2GB (10%) to about 11GB (10GB of which was freed after I restarted fuseki). So the memory ballooned fairly quickly when it happened.
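For reference, swap in use can be sampled on Linux with a one-liner against /proc/meminfo (a sketch, not necessarily the exact check in use here):

```shell
# Swap in use = SwapTotal - SwapFree (values in /proc/meminfo are in kB).
used=$(awk '/^SwapTotal/{t=$2} /^SwapFree/{f=$2} END{print int((t-f)/1024)}' /proc/meminfo)
echo "$(date '+%Y-%m-%d %H:%M') swap used: ${used} MB"
```

Run from cron, this gives a timestamped record of when the ballooning starts.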

The TDB errors happen much earlier than the memory going nuts. Obviously that could be a delayed effect of this problem, but I'm wondering:

-  if this rings a bell in some other way - how much memory should I expect fuseki to need?
-  if there is any particular debugging I should enable
-  if our traffic level is out of the ordinary

thanks
danno

With TDB, the files are accessed as memory mapped files. This shows up as virtual memory for the Java JVM but it is not swap and not in the heap. It is parts of the OS file system cache mapped to the JVM process.
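One way to see the distinction (illustrative; substitute the Fuseki PID for $$):

```shell
# VmSize includes memory-mapped TDB index files; VmRSS is what actually
# occupies physical RAM. The gap is mostly mapped file pages, which the
# kernel can reclaim - they are not swap and not heap.
pid=$$   # replace with the Fuseki PID, e.g. from pgrep -f fuseki
grep -E '^(VmSize|VmRSS)' /proc/$pid/status
```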

The 16G heap may help the rest of the server: memory is used for query execution and, in TDB1, for transactions. For the file handling, it's used only for the node cache.

The OS file cache lives in the RAM otherwise unused by applications. It flexes up and down based on the space applications leave free. Not allocating all RAM to application heaps improves performance.
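A quick way to see this on Linux (just /proc/meminfo fields, nothing Fuseki-specific):

```shell
# Cached is page cache the kernel can reclaim on demand; MemFree alone
# understates the RAM actually available to applications.
grep -E '^(MemFree|Cached|SwapCached)' /proc/meminfo
```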

When you restart - it looks like that 10G is the mapped file space being dropped. Mapping is done on demand in chunks, so after a restart it is very small and grows over time. It should reach a steady state, and it should not cause swapping or GC.

Each index file is mapped 1-1 this way; some indexes are only touched by unusual queries, so in practice they don't get mapped at all. So the extra virtual memory should be less than the on-disk size (modified by the fact that, in Java, strings take up more space in RAM than on disk).

The heap will probably grow for other uses.

Java monitoring of the heap size should show the heap in use after a major GC to be a different, smaller size.
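One way to check (a sketch: jstat ships with the JDK, and the pgrep pattern is a guess at how the process is named):

```shell
# Take one sample of heap occupancy from a running JVM, if one is found.
# In jstat -gcutil output, the O column (old-gen used %) shortly after a
# full GC approximates the live set.
pid=$(pgrep -f fuseki-server | head -n 1)
if [ -n "$pid" ] && command -v jstat >/dev/null 2>&1; then
  jstat -gcutil "$pid"
else
  echo "no Fuseki JVM or jstat available here"
fi
```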

If that is not how it is, there is something to investigate.

    Andy
