On 11/06/18 22:28, Dan Pritts wrote:
Hi all,

We've been having trouble with our production fuseki instance. A few specifics:

fuseki 3.6.0, standalone/Jetty. OpenJDK 1.8.0_171 on RHEL 6, on an m4.2xlarge shared with two other applications.

we have about 21M triples in the database.

For background, could you share a directory listing with file sizes?

We hit fuseki moderately hard, on the order of 1000 hits per minute. 99%+ of the hits are queries. Our code could stand to do some client-side caching; we get lots of repetitive queries. That said, fuseki is normally plenty fast at those - it's rare that a query takes >10ms.

It looks like I'm getting hit by JENA-1516; I will schedule an upgrade to 3.7 ASAP.

The log is full of errors like this.

[2018-06-11 16:15:07] BindingTDB ERROR get1(?s)
org.apache.jena.tdb.base.file.FileException: ObjectFileStorage.read[nodes](488281706)[filesize=569694455][file.size()=569694455]: Failed to read the length : got 0 bytes
        at org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:341)

[2018-06-11 16:15:39] BindingTDB ERROR get1(?identifier)
org.apache.jena.tdb.base.file.FileException: In the middle of an alloc-write
        at org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:311)
        at org.apache.jena.tdb.base.objectfile.ObjectFileWrapper.read(ObjectFileWrapper.java:57)
        at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:78)


Yes, that looks like JENA-1516.

The memory observations look to be unconnected.



The problem that got me looking is that fuseki's memory usage goes nuts, which causes the server to start swapping. Swapping = slow = pager. Total memory + swap in use by fuseki when I investigated was about 32GB; it's configured to use a 16GB heap. Garbage collection logging was not configured properly, so I can't say whether my immediate problem was heap exhaustion.
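For reference, GC logging on a Java 8 HotSpot JVM can be enabled with flags along these lines. The log path and sizes here are illustrative, and the standard fuseki-server script passes JVM_ARGS through to the JVM:

```shell
# Illustrative Java 8 GC-logging flags; adjust heap size and log path to taste.
# Rotation keeps the log bounded (5 files x 20MB).
export JVM_ARGS="-Xmx16G \
  -Xloggc:/var/log/fuseki/gc.log \
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=20M"
# ./fuseki-server --loc=/path/to/tdb /ds
```

With that in place, heap-exhaustion questions like the one above can be answered from the gc.log rather than guessed at.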

I'm monitoring swap usage hourly - sometime within a <1hr window the swap usage increased past 2GB (10%) to about 11GB (10GB of which was freed after I restarted fuseki). So the memory ballooned fairly quickly when it happened.
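For reference, swap in use can be sampled on Linux with a one-liner against /proc/meminfo (a sketch, not necessarily the exact check in use here):

```shell
# Swap in use = SwapTotal - SwapFree (values in /proc/meminfo are in kB).
used=$(awk '/^SwapTotal/{t=$2} /^SwapFree/{f=$2} END{print int((t-f)/1024)}' /proc/meminfo)
echo "$(date '+%Y-%m-%d %H:%M') swap used: ${used} MB"
```

Run from cron, this gives a timestamped record of when the ballooning starts.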

The TDB errors happen much earlier than the memory going nuts. Obviously that could be a delayed effect of this problem, but I'm wondering:

-  if this rings a bell in some other way - how much memory should I expect fuseki to need?
-  if there is any particular debugging I should enable
-  if our traffic level is out of the ordinary

thanks
danno

With TDB, the files are accessed as memory mapped files. This shows up as virtual memory for the Java JVM but it is not swap and not in the heap. It is parts of the OS file system cache mapped to the JVM process.
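One way to see the distinction (illustrative; substitute the Fuseki PID for $$):

```shell
# VmSize includes memory-mapped TDB index files; VmRSS is what actually
# occupies physical RAM. The gap is mostly mapped file pages, which the
# kernel can reclaim - they are not swap and not heap.
pid=$$   # replace with the Fuseki PID, e.g. from pgrep -f fuseki
grep -E '^(VmSize|VmRSS)' /proc/$pid/status
```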

The 16G heap may help the rest of the server: memory is used for query execution and, in TDB1, for transactions. For the file handling, it's used only for the node cache.

The OS file cache lives in the RAM otherwise unused by applications. It flexes up and down based on the space applications leave free. Not allocating all RAM to application heaps improves performance.
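A quick way to see this on Linux (just /proc/meminfo fields, nothing Fuseki-specific):

```shell
# Cached is page cache the kernel can reclaim on demand; MemFree alone
# understates the RAM actually available to applications.
grep -E '^(MemFree|Cached|SwapCached)' /proc/meminfo
```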

When you restart - it looks like that 10G is the mapped file space being dropped. Mapping is done on demand in chunks, so after a restart it is very small and grows over time. It should reach a steady state, and it should not cause swapping or GC.

Each index file is mapped 1-1 this way; some indexes are only touched by unusual queries, so in practice they don't get mapped at all. So the extra virtual memory should be less than the on-disk size (modified by the fact that, in Java, strings take up more space in RAM than on disk).

The heap will probably grow for other uses.

Java monitoring of the heap size should show the heap in use after a major GC to be a different, smaller size.
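One way to check (a sketch: jstat ships with the JDK, and the pgrep pattern is a guess at how the process is named):

```shell
# Take one sample of heap occupancy from a running JVM, if one is found.
# In jstat -gcutil output, the O column (old-gen used %) shortly after a
# full GC approximates the live set.
pid=$(pgrep -f fuseki-server | head -n 1)
if [ -n "$pid" ] && command -v jstat >/dev/null 2>&1; then
  jstat -gcutil "$pid"
else
  echo "no Fuseki JVM or jstat available here"
fi
```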

If that is not how it is, there is something to investigate.

    Andy
