So the issue is that memory goes up, that is, the heap expands to the maximum Xmx size set? The JVM does not return any heap back to the OS (as far as I know), so if all the applications grow their heaps, there may not be enough real RAM to match that, or swapping may result.
Hi Andy,

thanks for taking the time to help.

The problem is that the NON-HEAP memory usage skyrockets.

I "allocate" memory for the heap. The gc logs suggested that I was never exceeding 6GB of heap in use, even when things went to hell. So I set the heap to 10GB.

Now that I know we're using NIO, I "allocate" memory for NIO to hold the entire index in RAM. The DB is 2.4GB on disk. I don't know NIO well, but this seems plausible.
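
For what it's worth, memory that NIO maps for the index files doesn't count against the heap at all; it shows up in the JVM's "mapped" buffer pool. Here is a quick, generic JMX sketch to see those pools (nothing Fuseki-specific; to read them out of the running Fuseki JVM I'd attach jconsole or a remote JMX client):

    import java.lang.management.BufferPoolMXBean;
    import java.lang.management.ManagementFactory;

    public class BufferPools {
        public static void main(String[] args) {
            // The "mapped" pool is where NIO memory-mapped files (e.g. the TDB
            // index files) are accounted; the "direct" pool holds direct
            // ByteBuffers. Neither is part of the heap bounded by -Xmx.
            for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                System.out.printf("%-8s count=%d capacity=%dMB used=%dMB%n",
                        pool.getName(), pool.getCount(),
                        pool.getTotalCapacity() / (1024 * 1024),
                        pool.getMemoryUsed() / (1024 * 1024));
            }
        }
    }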

Let's throw another gig at Java for its own internal use.

That would add up to 10 + 2.4 + 1 = 13.4GB of memory I might expect Java to use. There's nothing else on the server except Apache, Linux, and a few system daemons (postfix, etc.).
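
To compare that arithmetic with what the kernel actually charges the process, something like this rough, Linux-only sketch (pass it the Fuseki PID) would print the resident, swapped, and virtual sizes from /proc:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class Rss {
        public static void main(String[] args) throws IOException {
            String pid = args[0];  // PID of the Fuseki JVM
            // VmRSS  = resident set (real RAM in use)
            // VmSwap = amount swapped out
            // VmSize = total virtual size (includes the memory-mapped TDB files)
            for (String line : Files.readAllLines(Paths.get("/proc/" + pid + "/status"))) {
                if (line.startsWith("VmRSS") || line.startsWith("VmSwap") || line.startsWith("VmSize")) {
                    System.out.println(line);
                }
            }
        }
    }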

I upgraded to 3.7 and put Fuseki on its own AWS instance last night. RAM was 16GB and swap 10GB.

Once today it filled RAM and swap such that Linux whacked the JVM process. Two other times today it was swapping heavily (5GB of swap used), and we restarted Fuseki before the system ran out of swap.

For some reason, the JVM running Fuseki+Jetty is going nuts with its memory usage. It *is* using more heap than usual when this happens, but it's not using more than the 10GB I allocated. At least, not according to the garbage collection logs.
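
It would help to see the heap/non-heap split straight from the JVM rather than inferring it from the GC logs. A minimal sketch of reading the memory MXBean (run in-process, or read the same beans remotely over JMX); note that NIO direct/mapped buffers are not included in the non-heap figure, they're in the buffer pools above:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    public class HeapVsNonHeap {
        public static void main(String[] args) {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            MemoryUsage heap = mem.getHeapMemoryUsage();
            MemoryUsage nonHeap = mem.getNonHeapMemoryUsage(); // metaspace, code cache, etc.
            // NIO direct/mapped buffers are *not* in either figure; they are
            // reported separately by the BufferPoolMXBeans.
            System.out.printf("heap     used=%dMB committed=%dMB max=%dMB%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
            System.out.printf("non-heap used=%dMB committed=%dMB%n",
                    nonHeap.getUsed() >> 20, nonHeap.getCommitted() >> 20);
        }
    }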

We have had this problem a few times in the past - memory usage would spike drastically. We'd always attributed it to a slow memory leak, and decided we should restart Fuseki regularly. But in the last couple of weeks it's happened probably a dozen times.

After the third time today, I put it on a 32GB instance. Of course, the problem hasn't happened since.

A couple of possibilities:

1/ A query does an ORDER BY that involves a large set of results to sort. This drives up the heap requirement, the JVM grows the heap, and now the process is larger. There may well be a CPU spike at this time.

2/ Updates are building up. The journal isn't flushed to the main database until there is a quiet moment, and with the high query rate you may get stretches of time when it is not quiet. The updates are safe in the journal (the commit happened) but also held in memory as an overlay on the database. The overlays are collapsed when there are no readers or writers.

What might be happening is that there isn't a quiet moment.
The traffic is certainly steady - it was about 1500 hits/minute today when we first crashed.
A big sudden jump would imply a big update as well.
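
One way I could test possibility 1 is to replay a suspect ORDER BY query against a test instance and watch the server's heap while it runs. A rough sketch with the Jena client API - the endpoint URL and the query itself are just placeholders for whatever turns up in our logs:

    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.ResultSetFormatter;

    public class ReplayQuery {
        public static void main(String[] args) {
            // Placeholder endpoint and query - substitute a real query from the logs.
            String service = "http://localhost:3030/ds/sparql";
            String query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } ORDER BY ?o";

            // An ORDER BY over a large result set has to be gathered and sorted
            // on the server before the first row comes back, which is what
            // would drive the server's heap up.
            try (QueryExecution qe = QueryExecutionFactory.sparqlService(service, query)) {
                long rows = ResultSetFormatter.consume(qe.execSelect());
                System.out.println("rows: " + rows);
            }
        }
    }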

Setting the log to INFO (and, yes, under load it does get big)

What you are looking for is overlaps of queries/updates, so that the log shows true concurrent execution (i.e. [1] starts, [2] starts, [1] finishes logged after [2] starts) around the time the size grows quickly; also check the size of the updates.
I will look for this. I am dubious, though. We don't make many writes, and those we do are not very big. Our dataset is metadata about our archive. The archive is 50 years old, and grows steadily but slowly.

We had disabled the Fuseki log but left httpd logging enabled, because each was huge. Unfortunately the updates were all in POSTs, which I hadn't noticed until I went looking just now. So I will have to wait until next time.
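
Once the Fuseki log is back on, I could script the overlap check rather than eyeball it. A rough sketch; it only assumes that each request is logged with a leading [n] id and that a line carrying the response status code ends that request, so the patterns would need adjusting to whatever the real log lines look like:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.LinkedHashSet;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class OverlapScan {
        // Assumed log shape: a line containing "[123] POST ..." or
        // "[123] Query = ..." starts/continues request 123, and a line like
        // "[123] 200 OK (12 ms)" ends it. Adjust to the real log format.
        private static final Pattern ID = Pattern.compile("\\[(\\d+)\\]");
        private static final Pattern END = Pattern.compile("\\[(\\d+)\\]\\s+\\d{3}\\b");

        public static void main(String[] args) throws IOException {
            Set<String> open = new LinkedHashSet<>();
            for (String line : Files.readAllLines(Paths.get(args[0]))) {  // args[0] = log file
                Matcher end = END.matcher(line);
                if (end.find()) {
                    open.remove(end.group(1));
                    continue;
                }
                Matcher start = ID.matcher(line);
                if (start.find() && open.add(start.group(1)) && open.size() > 1) {
                    // A new request began while others were still in flight.
                    System.out.println("concurrent " + open + " at: " + line);
                }
            }
        }
    }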

thanks
danno

