So the issue is that memory goes up, that is the heap expands to the
maximum Xmx size set? The JVM does not return any heap back to the OS
(as far as I know), so if all the applications grow their heaps, real
RAM is needed to match, or swapping may result.
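For reference, the committed heap versus the -Xmx ceiling can be watched from inside the JVM. A minimal sketch (the class name is mine): totalMemory() is what the JVM has currently reserved from the OS, maxMemory() is the -Xmx limit it will grow toward.

```java
public class HeapInfo {
    // Heap currently committed from the OS, in MB; grows toward maxMB()
    static long committedMB() { return Runtime.getRuntime().totalMemory() >> 20; }
    // The -Xmx ceiling, in MB; once the heap has grown, it is rarely given back
    static long maxMB()       { return Runtime.getRuntime().maxMemory() >> 20; }

    public static void main(String[] args) {
        System.out.printf("committed=%dMB max=%dMB%n", committedMB(), maxMB());
    }
}
```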
Hi Andy,
thanks for taking the time to help.
The problem is that the NON-HEAP memory usage skyrockets.
I "allocate" memory for the heap. The GC logs suggested that I was
never exceeding 6GB of heap in use, even when things went to hell. So I
set the heap to 10GB.
Now that I know we're using NIO, I "allocate" memory for NIO to hold the
entire index in RAM. The DB is 2.4GB on disk. I don't know NIO well,
but this seems plausible.
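For my own understanding, here is a toy sketch of why NIO-mapped data never shows up in the GC logs (I'm assuming TDB memory-maps its index files; the file path and class name are just illustrative). A mapped region counts against the process's virtual memory, not the Java heap, so it is invisible to -Xmx.

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedIndexDemo {
    // Map a 1MB file region into memory and round-trip a byte through it.
    // The mapping lives outside the Java heap, so neither GC logs nor
    // -Xmx accounting will ever reflect it.
    static byte writeAndRead() throws Exception {
        File f = File.createTempFile("demo", ".dat");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1 << 20);
            buf.put(0, (byte) 42);
            return buf.get(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeAndRead());
    }
}
```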
Let's throw another gig at Java for its own internal use.
That would add up to 10 + 2.4 + 1 = 13.4GB of memory I might expect Java
to use. There's nothing else on the server except Apache, Linux, and a
few system daemons (Postfix, etc.).
I upgraded to 3.7 and put Fuseki on its own AWS instance last night. RAM
was 16GB and swap 10GB.
Once today it filled RAM & swap such that Linux whacked the JVM
process. Two other times today it was swapping heavily (5GB of swap
used), and we restarted Fuseki before the system ran out of swap.
For some reason, the JVM running fuseki+jetty is going nuts with its
memory usage. It *is* using more heap than usual when this happens, but
it's not using more than the 10GB I allocated. At least, not according
to the garbage collection logs.
We have had this problem a few times in the past - memory usage would
spike drastically. We'd always attributed it to a slow memory leak, and
decided we should restart Fuseki regularly. But in the last couple of
weeks it's happened probably a dozen times.
After the third time today, I put it on a 32GB instance. Of course, the
problem hasn't happened since.
A couple of possibilities:
1/ A query does an ORDER BY that involves a large set of results to
sort. This then drives up the heap requirement; the JVM grows the heap,
and now the process is larger. There may well be a CPU spike at this
time.
2/ Updates are building up. The journal isn't flushed to the main
database until there is a quiet moment and with the high query rate
you may get bursts of time when it is not quiet. The updates are safe
in the journal (the commit happened) but also in-memory as an overlay
on the database. The overlays are collapsed when there are no readers
or writers.
What might be happening is that there isn't a quiet moment.
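The overlay idea can be pictured with a toy sketch (this is not TDB's actual code; all names here are illustrative): committed writes accumulate in an in-memory delta over the on-disk base, and only a quiet moment with no active readers or writers lets the delta collapse.

```java
import java.util.HashMap;
import java.util.Map;

public class OverlayStore {
    private final Map<String, String> base = new HashMap<>();    // stand-in for the on-disk DB
    private final Map<String, String> overlay = new HashMap<>(); // committed but uncollapsed updates
    private int activeReaders = 0;

    // A commit is durable (journal) but stays in the in-memory overlay.
    public void put(String k, String v) { overlay.put(k, v); }

    // Reads see the overlay first, then fall through to the base.
    public String get(String k) {
        return overlay.containsKey(k) ? overlay.get(k) : base.get(k);
    }

    public void beginRead() { activeReaders++; }
    public void endRead()   { activeReaders--; maybeCollapse(); }

    // Collapse only at a quiet moment; under constant traffic the overlay
    // (and hence memory use) just keeps growing.
    private void maybeCollapse() {
        if (activeReaders == 0) { base.putAll(overlay); overlay.clear(); }
    }

    public int overlaySize() { return overlay.size(); }

    public static void main(String[] args) {
        OverlayStore s = new OverlayStore();
        s.beginRead();                 // a long-running reader is active
        s.put("city", "Copenhagen");   // commit lands in the overlay
        System.out.println(s.overlaySize());
        s.endRead();                   // quiet moment: overlay collapses into the base
        System.out.println(s.overlaySize());
    }
}
```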
The traffic is certainly steady - it was about 1500 hits/minute today
when we first crashed.
A big, sudden jump would imply a big update as well.
Set the log to INFO (and, yes, under load it does get big).
What you are looking for is overlapping queries/updates, so that the log
shows truly concurrent execution (i.e. [1] starts, [2] starts, [1]
finishes logged after [2] starts) around the time the size grows
quickly; also check the size of the updates.
I will look for this. I am dubious, though. We don't make many writes,
and those we do are not very big. Our dataset is metadata about our
archive. The archive is 50 years old, and grows steadily but slowly.
We had disabled the Fuseki log but left httpd logging enabled, because
each was huge. Unfortunately the updates were all in POSTs, which I
hadn't noticed until I went looking just now. So I will have to wait
until next time.
thanks
danno