Hi Andy!
On 05/06/14 14:28, Andy Seaborne wrote:
Osma,
> What are the long running queries?
Here's one (reformatted for readability):

2014-06-04 11:26:42,705 INFO  Fuseki :: [14739587] Query =
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX text: <http://jena.apache.org/text#>
SELECT DISTINCT ?s ?label ?alabel ?hlabel ?graph
                (GROUP_CONCAT(?type) as ?types)
WHERE {
  GRAPH <http://www.yso.fi/onto/puho/> {
    { ?s text:query ('ampumavä*' 100000000) }
    { ?s rdf:type <http://www.w3.org/2004/02/skos/core#Concept> }
    { ?s skos:prefLabel ?label .
      ?s rdf:type ?type .
      ?s ?prop ?match .
      FILTER (
        strstarts(lcase(str(?match)), 'ampumavä')
        && !(?match != ?label && strstarts(lcase(str(?label)), 'ampumavä'))
        && langMatches(lang(?label), lang(?match))
      )
    }
    FILTER NOT EXISTS { ?s owl:deprecated true }
  }
  BIND(IF(?prop = skos:altLabel, ?match, ?unbound) as ?alabel)
  BIND(IF(?prop = skos:hiddenLabel, ?match, ?unbound) as ?hlabel)
  VALUES (?prop) { (skos:prefLabel) (skos:altLabel) (skos:hiddenLabel) }
}
GROUP BY ?match ?s ?label ?alabel ?hlabel ?graph ?prop
ORDER BY lcase(str(?match)) lang(?match)

2014-06-04 11:39:06,670 WARN  Fuseki :: [14739587] RC = 500 : GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
2014-06-04 11:43:54,817 INFO  Fuseki :: [14739587] 500 GC overhead limit exceeded (1,032.111 s)
This type of query normally takes less than one second. Now it took more
than 15 minutes.
> It is possible that updates are unflushed (a bit surprising given the
> length of time since the overnight updates). You can check this by
> looking at the journal file: if it is zero length, there are no
> outstanding commits waiting to be flushed.
It's too late now to check what size the journal was when this happened
yesterday, but I checked it for today, and all three *.jrnl files had a
size of zero and a timestamp of 04:57, which was when the last updates
finished. So at least last night, the flush completed very soon after
the update.
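That check is easy to automate. Here is a minimal sketch in Python; the database directory path in the example is a placeholder for wherever the TDB dataset lives:

```python
from pathlib import Path

def check_journals(db_dir):
    """Classify each TDB journal (*.jrnl) file in db_dir.

    A zero-length journal means all commits have been flushed;
    a non-empty one means there are outstanding commits."""
    results = {}
    for jrnl in Path(db_dir).glob("*.jrnl"):
        if jrnl.stat().st_size == 0:
            results[jrnl.name] = "empty (no outstanding commits)"
        else:
            results[jrnl.name] = "non-empty (unflushed commits)"
    return results

# Example (placeholder path):
# print(check_journals("/fuseki/DB"))
```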
> But the size locked down by unflushed commits does not change due to
> read load.
> Queries can use a lot of memory, and several in parallel could also
> cause an OOME.
It is possible that there happened to be a large number of parallel
queries. However, looking at the log files, I couldn't find any notable
peak.
> How many datasets are on this server?
Only one TDB dataset (no inference), with a jena-text index. Within this
dataset there are about thirty named graphs, and most queries are
limited to a single named graph.
> On AWS, we have seen virtualization hardware "go bad" (I can't explain
> it any better); we have only seen this on old, m1-generation hardware.
> A server, for no reason we can determine [*], simply starts having
> very high load and makes very slow progress, but is functionally fine.
> Since it is randomly going slow, queries build up, which can mean more
> memory in active use at any one point, so an OOME becomes possible.
> This is not a common occurrence.
> [*] We allowed for cycle stealing from co-resident VMs - the slowdown
> is on a 10x scale.
This is not AWS, but a VMware setup. It is possible that there was some
contention for resources from other VMs running on the same hardware
that could have caused queries to run slower and thus build up and use
lots of memory.
As first aid, I will increase the amount of memory from 6GB to 8GB, and
I am investigating ways to detect this situation from outside Fuseki
(which means I must first be able to replicate it - I will try running
Fuseki with a low amount of memory) so that I can give it a kick if it
happens again.
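A sketch of what such an external check could look like (the endpoint URL is a placeholder for this setup): it sends a trivial ASK query with a timeout, so a hung or memory-thrashing server shows up as a failed probe.

```python
import urllib.parse
import urllib.request

def fuseki_alive(endpoint, timeout=10):
    """Probe a SPARQL endpoint with a trivial ASK query.

    Returns False if the server errors out or does not answer
    within the timeout."""
    data = urllib.parse.urlencode({"query": "ASK {}"}).encode()
    try:
        with urllib.request.urlopen(endpoint, data=data, timeout=timeout) as resp:
            return resp.getcode() == 200
    except Exception:
        return False

# Example (placeholder URL); a cron job could restart Fuseki when this
# returns False:
# if not fuseki_alive("http://localhost:3030/ds/sparql"):
#     ...  # give it a kick
```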
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi