Hi Andy!

On 05/06/14 14:28, Andy Seaborne wrote:
> Osma,
>
> What are the long running queries?

Here's one (sorry for the messed up line breaks):

2014-06-04 11:26:42,705 INFO  Fuseki :: [14739587] Query =
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX text: <http://jena.apache.org/text#>
SELECT DISTINCT ?s ?label ?alabel ?hlabel ?graph (GROUP_CONCAT(?type) as ?types)
WHERE {
  GRAPH <http://www.yso.fi/onto/puho/> {
    { ?s text:query ('ampumavä*' 100000000) }
    { ?s rdf:type <http://www.w3.org/2004/02/skos/core#Concept> }
    { ?s skos:prefLabel ?label .
      ?s rdf:type ?type .
      ?s ?prop ?match .
      FILTER ( strstarts(lcase(str(?match)), 'ampumavä')
               && !(?match != ?label && strstarts(lcase(str(?label)), 'ampumavä'))
               && langMatches(lang(?label), lang(?match)) ) }
    FILTER NOT EXISTS { ?s owl:deprecated true }
  }
  BIND(IF(?prop = skos:altLabel, ?match, ?unbound) as ?alabel)
  BIND(IF(?prop = skos:hiddenLabel, ?match, ?unbound) as ?hlabel)
  VALUES (?prop) { (skos:prefLabel) (skos:altLabel) (skos:hiddenLabel) }
}
GROUP BY ?match ?s ?label ?alabel ?hlabel ?graph ?prop
ORDER BY lcase(str(?match)) lang(?match)
2014-06-04 11:39:06,670 WARN  Fuseki :: [14739587] RC = 500 : GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
2014-06-04 11:43:54,817 INFO Fuseki :: [14739587] 500 GC overhead limit exceeded (1,032.111 s)


This type of query normally takes less than a second; this time it took more than 15 minutes.
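For the record, the elapsed time is printed at the end of each Fuseki response log line (as in the "(1,032.111 s)" above), so slow requests can be spotted with a small awk sketch like this (the log file name and threshold are just examples):

```shell
# Print log lines whose trailing "(N.NNN s)" elapsed time exceeds a
# threshold (in seconds).  Reads the log on stdin, e.g.:
#   slow_requests 10 < fuseki.log
slow_requests() {
  awk -v limit="$1" '
    match($0, /\([0-9,]*[0-9]+\.[0-9]+ s\)$/) {
      # strip "(" and " s)" around the number, then drop thousands commas
      t = substr($0, RSTART + 1, RLENGTH - 4)
      gsub(",", "", t)
      if (t + 0 > limit) print
    }'
}
```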

> It is possible that updates are unflushed (a bit surprising given the
> length of time since the overnight updates) - you can check this by
> looking at the journal file: if it is zero length, there are no
> outstanding commits waiting to be flushed.

It's too late now to check what size the journal was when this happened yesterday, but I checked it for today, and all three *.jrnl files had a size of zero and a timestamp of 04:57, which was when the last updates finished. So at least last night, the flush completed very soon after the update.
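A quick shell sketch of that check, for anyone else wanting to script it (the database directory is an assumption - point it at your actual TDB dataset location):

```shell
# A zero-length *.jrnl file means no outstanding commits are waiting
# to be flushed; any non-empty journal is reported.
check_journals() {
  dir="$1"; dirty=0
  for jrnl in "$dir"/*.jrnl; do
    [ -e "$jrnl" ] || continue          # no journal files at all
    size=$(wc -c < "$jrnl")
    if [ "$size" -gt 0 ]; then
      echo "UNFLUSHED: $jrnl ($size bytes)"
      dirty=1
    fi
  done
  return $dirty
}

# Example (hypothetical path):
#   check_journals /var/lib/fuseki/databases/skos || echo "journal not empty"
```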

> But the size locked down by unflushed commits does not change due to
> read load.
>
> Queries can use a lot of memory, and several in parallel would also
> cause an OOME.

It is possible that there happened to be a large number of parallel queries. However, looking at the log files, I couldn't find any notable peak.

> How many datasets are on this server?

Only one TDB dataset (no inference), with a jena-text index. Within this dataset there are about thirty named graphs, and most queries are limited to a single named graph.

> On AWS, we have seen virtualization hardware "go bad" (I can't explain
> it any better; only seen on old, m1-generation hardware).  A server,
> for no reason we can determine [*], simply starts having very high
> load and makes very slow progress, but is functionally fine.  Because
> it is randomly going slow, queries build up, which can mean more
> memory in active use at any one point, so an OOME is possible.  This
> is not a common occurrence.
>
> [*] We allowed for cycle stealing from co-resident VMs - the slowdown
> is on a 10x scale.

This is not AWS, but a VMware setup. It is possible that there was some contention for resources from other VMs running on the same hardware, which could have caused queries to run slower and thus build up and use lots of memory.

As first aid, I will increase the amount of memory from 6GB to 8GB. I am also investigating ways to detect this situation from outside Fuseki (which means I must first be able to replicate it - I will try running Fuseki with a low amount of memory) so that I can give it a kick if it happens again.
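The "kick" could be as simple as a cron-driven probe along these lines - a sketch only, where the endpoint URL and the restart command are assumptions, not anything from our actual setup:

```shell
# Send a trivial ASK query with a hard timeout; if Fuseki does not
# answer in time, treat it as wedged and (as a placeholder here)
# restart the service.
ENDPOINT="${FUSEKI_ENDPOINT:-http://localhost:3030/ds/sparql}"

probe() {
  # curl exits non-zero on connection failure, HTTP errors (--fail),
  # or when --max-time expires
  curl -s --fail --max-time 10 \
       --data-urlencode 'query=ASK {}' \
       "$ENDPOINT" > /dev/null
}

watchdog() {
  if ! probe; then
    echo "Fuseki unresponsive at $ENDPOINT; kicking it" >&2
    # systemctl restart fuseki    # placeholder restart action
    return 1
  fi
}
```

Run from cron every minute or so; the 10-second timeout is a guess and should be set well above normal query latency.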

-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi
