We also use max=3 on the httpd front end to limit the impact of parallel
requests. Not perfect by any means but it does no harm.
Good idea, I could see if I can implement this kind of restriction in
Varnish which is in front of Fuseki.
Thanks again Andy, this turned out to be the key.
My current understanding of the situation: With lots of parallel
requests, Fuseki simply gets overwhelmed, eats too much memory and the
JVM GC starts thrashing. By default, there is either no limit on the
number of parallel requests in Fuseki or the limits are pretty high (I
couldn't tell which). Also the queue for incoming requests has no upper
limit, so even when the situation starts clearing up there may be a long
backlog of requests to process. I also found ARQ query timeouts
ineffective: I think they only cover the query execution phase, not the
waiting that also counts toward total request time. So if a query has
to wait in some queue for a long time before processing starts, it will
not time out regardless of the timeout setting.
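
To put rough numbers on the queueing effect described above, here is a
small back-of-the-envelope sketch in Java. It is not Fuseki or ARQ code:
the class and method names are made up for illustration, and the 2000 ms
service time and the burst of 100 simultaneous requests are assumed
values, not measurements. Only the 4 workers correspond to the 4 cores
mentioned in this thread.

```java
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of Fuseki, ARQ or Jetty.
public class QueueWaitSketch {

    /**
     * All requests arrive at t=0 and are served first-come first-served
     * by a fixed number of workers, each request taking serviceMs to run.
     * Returns, per request, the time spent queued before execution starts.
     */
    public static long[] queueWaitMs(int workers, long serviceMs, int requests) {
        // Min-heap holding the time at which each worker becomes free.
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int i = 0; i < workers; i++) {
            freeAt.add(0L);
        }
        long[] wait = new long[requests];
        for (int r = 0; r < requests; r++) {
            long start = freeAt.poll();     // earliest free worker takes the next request
            wait[r] = start;                // arrival was at t=0, so wait == start time
            freeAt.add(start + serviceMs);  // that worker is busy until then
        }
        return wait;
    }

    public static void main(String[] args) {
        long[] wait = queueWaitMs(4, 2000, 100);
        long last = wait[wait.length - 1];
        // Under these assumptions the last request is queued for 48000 ms
        // and then executes for only 2000 ms.
        System.out.println("last request: queued " + last + " ms, executing 2000 ms");
    }
}
```

Under these assumptions, a timeout that only starts counting when
execution begins would never fire for the last request, even though the
client waits about 50 seconds in total.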
Limiting parallel requests in Varnish using the .max_connections
setting, as I tried first, may not be such a good idea, because
Varnish fails fast when the limit is reached and returns an HTTP
error. It would be better to wait a while, in case the congestion is
just temporary.
I found out that Fuseki can use a custom Jetty configuration (using the
--jetty-config=jetty.xml parameter) where limits can be set on the
thread count and the queue size for waiting requests. I determined by
experimentation that a Jetty thread count close to the number of CPU
cores in the system (4) makes the most sense. At least for me, queries
are generally CPU bound, as the TDB database usually fits in disk cache.
Anything above that will generally just increase Fuseki memory
consumption with no improvement in performance.
My jetty.xml is attached. I have set the thread count to between 4 and 6
(minThreads and maxThreads) and the size of the request queue to 100
(never reached in my tests). In my stress tests, memory consumption
(Fuseki process RSS) peaked at 6.6 GB and I have set -Xmx to 8 GB, so
there is still some breathing space.
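
For anyone unfamiliar with the idea of a bounded request queue, here is
a JDK-only sketch of the general pattern. It uses
java.util.concurrent.ThreadPoolExecutor as a stand-in, so it is an
analogy rather than Jetty's QueuedThreadPool, whose behaviour when the
queue fills up may well differ. The class and method names are
hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; an analogy, not Jetty's thread pool.
public class BoundedQueueSketch {

    /**
     * Submits `submitted` tasks that all block until released, and returns
     * how many were accepted by a pool of `threads` workers backed by a
     * queue holding at most `queueSize` waiting tasks.
     */
    public static int countAccepted(int threads, int queueSize, int submitted) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize));
        int accepted = 0;
        try {
            for (int i = 0; i < submitted; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();  // simulate a long-running request
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                    accepted++;
                } catch (RejectedExecutionException e) {
                    // All workers are busy and the queue is full: the default
                    // AbortPolicy rejects the task instead of growing the backlog.
                }
            }
        } finally {
            release.countDown();  // let the blocked tasks finish
            pool.shutdown();
        }
        return accepted;
    }

    public static void main(String[] args) {
        // 4 running + 100 queued = 104 accepted out of 150 submitted.
        System.out.println(countAccepted(4, 100, 150));
    }
}
```

The point is only that the backlog can never exceed the queue size, so
memory use under a burst stays bounded.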
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi
<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN"
  "http://www.eclipse.org/jetty/configure.dtd">
<!--
  Reference: http://wiki.eclipse.org/Jetty/Reference/jetty.xml_syntax
             http://wiki.eclipse.org/Jetty/Reference/jetty.xml
-->
<Configure id="Fuseki" class="org.eclipse.jetty.server.Server">
  <Call name="addConnector">
    <Arg>
      <!-- org.eclipse.jetty.server.nio.BlockingChannelConnector -->
      <!-- org.eclipse.jetty.server.nio.SelectChannelConnector -->
      <New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
        <!-- BlockingChannelConnector specific:
             <Set name="useDirectBuffer">false</Set>
        -->
        <!-- Only listen to interface ...
             <Set name="host">localhost</Set>
        -->
        <Set name="port">3030</Set>
        <Set name="maxIdleTime">0</Set>
        <!-- All connectors -->
        <Set name="requestHeaderSize">65536</Set> <!-- 64*1024 -->
        <Set name="requestBufferSize">5242880</Set> <!-- 5*1024*1024 -->
        <Set name="responseBufferSize">5242880</Set> <!-- 5*1024*1024 -->
      </New>
    </Arg>
  </Call>
  <Set name="ThreadPool">
    <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
      <!-- specify a bounded queue -->
      <Arg>
        <New class="java.util.concurrent.ArrayBlockingQueue">
          <Arg type="int">100</Arg>
        </New>
      </Arg>
      <Set name="minThreads">4</Set>
      <Set name="maxThreads">6</Set>
      <Set name="detailedDump">false</Set>
    </New>
  </Set>
</Configure>