Ah, I missed the distinction between the mmap'd files and the
bytebuffers. Seems unlikely that in particular is the problem.
It also occurred to me along the way that the repeated incidents
roughly coincide with the JVM 1.8.0_171 release. That said, we've also
had memory use balloon for no apparent reason occasionally in the past.
Andy Seaborne wrote on 6/20/18 11:44 AM:
On 20/06/18 16:04, Dan Pritts wrote:
Andy Seaborne wrote on 6/20/18 6:43 AM:
For a database that does fit in RAM, the actual RAM size is a bit
bigger than the disk size (e.g. nodes are UTF-8 on disk and Java
strings in RAM so x2 bytes + string overheads and this is in-heap).
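
A rough sketch of the "x2 bytes" point above, assuming a Java 8 JVM where every char in a String takes 2 bytes (Java 9+ can store Latin-1-only strings in 1 byte per char). The class name and the example IRI are made up for illustration, and object headers are ignored:

```java
import java.nio.charset.StandardCharsets;

public class NodeSizeEstimate {
    // Bytes needed to store the text as UTF-8 (the on-disk form).
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes of character data at 2 bytes per char (UTF-16 code units),
    // not counting String/array object overheads.
    static int utf16Bytes(String s) {
        return s.length() * 2;
    }

    public static void main(String[] args) {
        String iri = "http://example.org/resource/1234"; // hypothetical node
        System.out.println("UTF-8 bytes on disk: " + utf8Bytes(iri));
        System.out.println("UTF-16 bytes in RAM: " + utf16Bytes(iri));
    }
}
```

For mostly-ASCII IRIs and literals the second number is about twice the first, before per-object overheads are added.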
Yeah, in-heap memory isn't the primary issue here, except that NIO
is apparently using out-of-heap memory to cache everything it puts in
the heap. Crazy.
BufferAllocatorDirect is not used by default - it is used only if
journal spilling to disk for large transactions is enabled, which is
not the default. (aside to all: if you are in this situation, please
consider TDB2)
Is TDB2 considered production ready?
Yes - although it is less used than TDB1 due to being younger.
(TDB1 remains the better choice for many small updates)
On a related note, is Fuseki compatible with Java 9? I looked
through the release notes in Jira and didn't see anything one way or
another, just the bit about being able to build it with Java 9.
Should be.
What does not work is building with Java9+ and running on Java8 even
with a target of Java8. The JDK runtime library for ByteBuffers has
changed some method return signatures.
It is not specific to Jena:
https://stackoverflow.com/questions/48693695/java-nio-buffer-not-loading-clear-method-on-runtime
https://jira.mongodb.org/browse/JAVA-2559
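
For illustration, a minimal sketch of the usual workaround in application code (not necessarily what Jena does; the class and method names are made up). In Java 9+, ByteBuffer overrides methods such as flip() and clear() to return ByteBuffer, so code compiled against that library links to a method signature that does not exist in the Java 8 runtime. Casting to Buffer binds the call to the Buffer method, which exists in both:

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;

public class BufferCompat {
    // Write one int into a fresh buffer and prepare it for reading.
    static ByteBuffer writeAndFlip(int value) {
        ByteBuffer bb = ByteBuffer.allocate(Integer.BYTES);
        bb.putInt(value);
        // The cast makes javac link to Buffer.flip(), whose signature is
        // the same on Java 8 and Java 9+, avoiding NoSuchMethodError.
        ((Buffer) bb).flip();
        return bb;
    }

    public static void main(String[] args) {
        System.out.println(writeAndFlip(42).getInt());
    }
}
```

The other common route is to compile with javac's `--release 8` option, which checks against the Java 8 API rather than only setting the class file target.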
...and a separate BufferAllocatorMapped class. I'm not a java
programmer, so it's not simple for me to track what's getting used
where, but the TODO makes me wonder.
Good find - the TODO is misleading though.
Direct ByteBuffers aren't used normally.
My interpretation of the article I posted is that the direct
bytebuffers are preferable to the in-heap ones.
There are pros and cons.
They are faster but the optimizer gets better and better at avoiding
the overheads.
They require more management - they need to be freed etc because they
are not in the heap (they are malloc space).
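
As a small sketch of the difference (class and method names are made up for illustration): a heap buffer is backed by an ordinary byte[] that the garbage collector manages, while a direct buffer's storage lives outside the Java heap and is only released once the small ByteBuffer object that owns it is collected.

```java
import java.nio.ByteBuffer;

public class HeapVsDirect {
    // Backed by a byte[] inside the Java heap.
    static ByteBuffer heap(int size) {
        return ByteBuffer.allocate(size);
    }

    // Backed by native memory outside the Java heap.
    static ByteBuffer direct(int size) {
        return ByteBuffer.allocateDirect(size);
    }

    public static void main(String[] args) {
        ByteBuffer h = heap(1024);
        ByteBuffer d = direct(1024);
        System.out.println("heap:   isDirect=" + h.isDirect() + " hasArray=" + h.hasArray());
        System.out.println("direct: isDirect=" + d.isDirect());
    }
}
```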
The memory mapped files are like "direct" byte buffers: same lower
overheads and also no copy from file system cache into the JVM.
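
A minimal sketch of reading a file through a memory mapping with plain NIO. This is generic Java, not TDB's own code; the class name, method names and temp file are made up for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedRead {
    // Map the whole file read-only and decode it as UTF-8.
    static String readMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping is served from the OS file system cache;
            // the bytes are not first copied into a heap array.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }

    // Write text to a temp file, then read it back through a mapping.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("mapped-demo", ".txt");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
            return readMapped(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello from a mapped file"));
    }
}
```

The mapped region counts against the process's address space and resident memory as the OS sees it, not against the Java heap.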
Of course, that's just one article, and I don't know what
disadvantages they have. One obvious one is that there aren't knobs
to control the size of the buffers like there are for the heap.
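
For what it's worth, HotSpot does have `-XX:MaxDirectMemorySize` to cap direct buffer allocations; as far as I know there is no comparable JVM limit for memory-mapped files. The JVM also reports both pools through the standard management API. A small sketch (the class name is made up) that prints them from inside the running JVM:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class BufferPools {
    // Buffer pools reported by the JVM; on HotSpot these normally
    // include one named "direct" and one named "mapped".
    static List<BufferPoolMXBean> pools() {
        return ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
    }

    public static void main(String[] args) {
        for (BufferPoolMXBean p : pools()) {
            System.out.printf("%s: buffers=%d used=%d bytes capacity=%d bytes%n",
                    p.getName(), p.getCount(), p.getMemoryUsed(), p.getTotalCapacity());
        }
    }
}
```

The same beans are visible remotely over JMX (for example in JConsole under java.nio), which may be easier than adding code to a running server.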
If you get a chance to dump threads and heap as per Rob's
suggestion, that would be great. I can't see a reason for what you
are seeing at the moment.
I'll send you and Rob some stuff directly; it's pretty big. I was
holding off sending until I had a repeat occurrence.
--
Dan Pritts
ICPSR Computing & Network Services
University of Michigan