On 11/06/18 22:28, Dan Pritts wrote:
> Hi all,
> We've been having trouble with our production Fuseki instance. A few
> specifics: Fuseki 3.6.0, standalone/Jetty, OpenJDK 1.8.0_171 on RHEL 6,
> on an m4.2xlarge shared with two other applications.
> We have about 21M triples in the database.
For background, could you share a directory listing with file sizes?
> We hit Fuseki moderately hard, on the order of 1000 hits per minute;
> 99%+ of the hits are queries. Our code could stand to do some
> client-side caching, as we get lots of repetitive queries. That said,
> Fuseki is normally plenty fast at those; it's rare that a query takes
> >10ms.
> It looks like I'm getting hit by JENA-1516; I will schedule an upgrade
> to 3.7 ASAP.
> The log is full of errors like this:
>
> [2018-06-11 16:15:07] BindingTDB ERROR get1(?s)
> org.apache.jena.tdb.base.file.FileException: ObjectFileStorage.read[nodes](488281706)[filesize=569694455][file.size()=569694455]: Failed to read the length : got 0 bytes
>     at org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:341)
>
> [2018-06-11 16:15:39] BindingTDB ERROR get1(?identifier)
> org.apache.jena.tdb.base.file.FileException: In the middle of an alloc-write
>     at org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:311)
>     at org.apache.jena.tdb.base.objectfile.ObjectFileWrapper.read(ObjectFileWrapper.java:57)
>     at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:78)
Yes, that looks like JENA-1516.
The memory observations look to be unconnected.
> The problem that got me looking is that Fuseki memory usage goes nuts,
> which causes the server to start swapping, etc. Swapping = slow =
> pager. Total memory + swap in use by Fuseki when I investigated was
> about 32GB; it's configured to use a 16GB heap. Garbage collection
> logging was not configured properly, so I can't say whether my
> immediate problem was heap exhaustion.
> I'm monitoring swap usage hourly - sometime in a <1hr timeframe the
> swap usage increased past 2GB (10%) to about 11GB (10 of which was
> cleared after I restarted Fuseki). So the memory ballooned fairly
> quickly when it happened.
> The TDB errors happen much earlier than the point where memory goes
> nuts. Obviously, it could be a delayed effect of this problem, but I'm
> wondering:
> - whether this rings a bell in some other way - how much memory should
>   I expect Fuseki to need?
> - whether there is any particular debugging I should enable
> - whether our traffic level is out of the ordinary
> thanks
> danno
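(On the GC-logging point: for OpenJDK 8 the logging flags are along these lines. This is an illustrative sketch only - the heap size, log path, and jar name here are examples, not the poster's actual configuration.)

```shell
# Sketch: enabling GC logging on OpenJDK 8 for a standalone Fuseki.
# Paths and sizes are illustrative.
java -Xmx16G \
     -Xloggc:/var/log/fuseki/gc.log \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=20M \
     -jar fuseki-server.jar
```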
With TDB, the files are accessed as memory-mapped files. This shows up
as virtual memory for the Java JVM, but it is not swap and not in the
heap; it is parts of the OS file system cache mapped into the JVM
process.
The 16G heap may help the rest of the server because of the use of
memory for query execution and (TDB1) memory for transactions. For the
file handling, the heap is used for the node cache only.
The OS file cache lives in the RAM otherwise unused by applications. It
flexes up and down based on the space applications leave free. Not
allocating all RAM to the application heaps improves performance.
When you restart, it looks like that 10G is the mapped-file space being
dropped. Mapping happens on demand in chunks, so on restart the mapped
space is very small and grows over time. It should reach a steady
state, and it should not cause swapping or GC.
Each index file is mapped 1-1 this way; some indexes are only touched
by unusual queries, so in practice they don't get mapped. So the extra
virtual memory should be less than the on-disk size (the heap side is
different: in Java, strings take up more space in RAM than on disk).
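To illustrate the mapping point with a minimal sketch (plain NIO, not Jena code): a `MappedByteBuffer` is backed by the OS page cache, so it counts toward the process's virtual size but not the Java heap.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MapDemo {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("map-demo", ".dat");
        try (FileChannel ch = FileChannel.open(p,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping 8MB extends the file and adds 8MB to the process's
            // virtual size (visible in top/pmap), but nothing to the heap.
            MappedByteBuffer buf =
                ch.map(FileChannel.MapMode.READ_WRITE, 0, 8 * 1024 * 1024);
            buf.putInt(0, 42);               // write through the mapping
            System.out.println(buf.getInt(0)); // prints 42
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```

This is why TDB's virtual-memory footprint grows toward the on-disk index sizes without pressuring the heap or the collector.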
The heap will probably grow for other uses.
Java monitoring of the heap should show the heap in use after a major
GC settling at a different, smaller size.
If that is not what you see, there is something to investigate.
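One way to sanity-check that, sketched with the standard JMX beans (nothing Fuseki-specific): read the heap usage, request a full collection, and read it again - the post-GC "used" figure approximates the live set.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapAfterGC {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage before = mem.getHeapMemoryUsage();
        mem.gc(); // requests a full collection, like System.gc()
        MemoryUsage after = mem.getHeapMemoryUsage();
        // After a major GC, "used" should drop back toward the live set.
        System.out.printf("heap used: %d MB before GC, %d MB after (max %d MB)%n",
                before.getUsed() >> 20, after.getUsed() >> 20,
                after.getMax() >> 20);
    }
}
```

On a running server the same numbers are visible externally with `jstat -gcutil <pid>` or JConsole, without adding code.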
Andy