Hi Andy!
On 05/06/14 14:28, Andy Seaborne wrote:
Osma,
> What are the long running queries?
Here's one (reformatted for readability):

2014-06-04 11:26:42,705 INFO  Fuseki :: [14739587] Query =
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX text: <http://jena.apache.org/text#>
SELECT DISTINCT ?s ?label ?alabel ?hlabel ?graph
                (GROUP_CONCAT(?type) as ?types)
WHERE {
  GRAPH <http://www.yso.fi/onto/puho/> {
    { ?s text:query ('ampumavä*' 100000000) }
    { ?s rdf:type <http://www.w3.org/2004/02/skos/core#Concept> }
    { ?s skos:prefLabel ?label .
      ?s rdf:type ?type .
      ?s ?prop ?match .
      FILTER (
        strstarts(lcase(str(?match)), 'ampumavä')
        && !(?match != ?label && strstarts(lcase(str(?label)), 'ampumavä'))
        && langMatches(lang(?label), lang(?match))
      )
    }
    FILTER NOT EXISTS { ?s owl:deprecated true }
  }
  BIND(IF(?prop = skos:altLabel, ?match, ?unbound) as ?alabel)
  BIND(IF(?prop = skos:hiddenLabel, ?match, ?unbound) as ?hlabel)
  VALUES (?prop) { (skos:prefLabel) (skos:altLabel) (skos:hiddenLabel) }
}
GROUP BY ?match ?s ?label ?alabel ?hlabel ?graph ?prop
ORDER BY lcase(str(?match)) lang(?match)

2014-06-04 11:39:06,670 WARN  Fuseki :: [14739587] RC = 500 : GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
2014-06-04 11:43:54,817 INFO  Fuseki :: [14739587] 500 GC overhead limit exceeded (1,032.111 s)
This type of query normally takes less than one second. Now it took more
than 15 minutes.
> It is possible that updates are unflushed (a bit surprising given the
> length of time since the overnight updates). You can check this by
> looking at the journal file: if it is zero length, there are no
> outstanding commits waiting to be flushed.
It's too late now to check what size the journal was when this happened
yesterday, but I checked it for today, and all three *.jrnl files had a
size of zero and a timestamp of 04:57, which was when the last updates
finished. So at least last night, the flush completed very soon after
the update.
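That check is easy to automate. Here is a minimal sketch in Python; the database directory path in the example is a placeholder for wherever the TDB dataset lives:

```python
from pathlib import Path

def check_journals(db_dir):
    """Classify each TDB journal (*.jrnl) file in db_dir.

    A zero-length journal means all commits have been flushed;
    a non-empty one means there are outstanding commits."""
    results = {}
    for jrnl in Path(db_dir).glob("*.jrnl"):
        if jrnl.stat().st_size == 0:
            results[jrnl.name] = "empty (no outstanding commits)"
        else:
            results[jrnl.name] = "non-empty (unflushed commits)"
    return results

# Example (placeholder path):
# print(check_journals("/fuseki/DB"))
```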
> But the size locked down by unflushed commits does not change due to
> read load.
> Queries can use a lot of memory, and several in parallel could also
> cause an OOME.
It is possible that there happened to be a large number of parallel
queries. However, looking at the log files, I couldn't find any notable
peak.
> How many datasets are on this server?
Only one TDB dataset (no inference), with a jena-text index. Within this
dataset there are about thirty named graphs, and most queries are
limited to a single named graph.
> On AWS, we have seen virtualization hardware "go bad" (I can't explain
> it any better); we have only seen this on old, m1-generation hardware.
> A server, for no reason we can determine [*], simply starts having
> very high load and makes very slow progress, but is functionally fine.
> Since it is randomly going slow, queries build up, which can mean more
> memory in active use at any one point, so an OOME becomes possible.
> This is not a common occurrence.
> [*] We allowed for cycle stealing from co-resident VMs - the slowdown
> is on a 10x scale.
This is not AWS, but a VMware setup. It is possible that there was some
contention for resources from other VMs running on the same hardware
that could have caused queries to run slower and thus build up and use
lots of memory.
As first aid, I will increase the amount of memory from 6GB to 8GB, and
I am investigating ways to detect this situation from outside Fuseki
(which means I must first be able to replicate it - I will try running
Fuseki with a low amount of memory) so that I can give it a kick if it
happens again.
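A sketch of what such an external check could look like (the endpoint URL is a placeholder for this setup): it sends a trivial ASK query with a timeout, so a hung or memory-thrashing server shows up as a failed probe.

```python
import urllib.parse
import urllib.request

def fuseki_alive(endpoint, timeout=10):
    """Probe a SPARQL endpoint with a trivial ASK query.

    Returns False if the server errors out or does not answer
    within the timeout."""
    data = urllib.parse.urlencode({"query": "ASK {}"}).encode()
    try:
        with urllib.request.urlopen(endpoint, data=data, timeout=timeout) as resp:
            return resp.getcode() == 200
    except Exception:
        return False

# Example (placeholder URL); a cron job could restart Fuseki when this
# returns False:
# if not fuseki_alive("http://localhost:3030/ds/sparql"):
#     ...  # give it a kick
```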
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi