We also use max=3 on the httpd front end to limit the impact of parallel
requests.  Not perfect by any means but it does no harm.

Good idea, I could see if I can implement this kind of restriction in
Varnish which is in front of Fuseki.

Thanks again Andy, this turned out to be the key.

My current understanding of the situation: with lots of parallel requests, Fuseki simply gets overwhelmed, eats too much memory, and the JVM GC starts thrashing. By default, either there is no limit on the number of parallel requests in Fuseki or the limits are pretty high (I couldn't tell which). The queue for incoming requests also has no upper limit, so even when the situation starts clearing up there may be a long backlog of requests to process. I also found ARQ query timeouts ineffective: I think they only cover the query phase, not the waiting that also counts towards total request time. So if a query has to wait in some queue for a long time before processing starts, it will not time out regardless of the timeout setting.

Limiting parallel requests in Varnish using the .max_connections setting, as I tried first, is maybe not such a good idea, because Varnish will fail fast when the limit is reached and return with an HTTP error. It would be better to wait a while, in case the congestion is just temporary.

I found out that Fuseki can use a custom Jetty configuration (using the --jetty-config=jetty.xml parameter) where limits can be set on the thread count and on the queue size for waiting requests. I determined by experimentation that a Jetty thread count close to the number of CPU cores in the system (4) makes the most sense. At least for me, queries are generally CPU bound, as the TDB database usually fits in the disk cache. Anything above that will generally just increase Fuseki memory consumption with no improvement in performance.

My jetty.xml is attached. I have set the thread count to between 4 and 6 and the size of the request queue to 100 (never reached in my tests). In my stress tests, memory consumption (Fuseki process RSS) maxed at 6.6GB and I have set -Xmx to 8GB, so there is still some breathing space.
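
To illustrate why a bounded queue caps the backlog, here is a minimal sketch using plain java.util.concurrent. It is not Jetty's QueuedThreadPool and not how Fuseki handles requests internally; the class name BoundedPoolSketch and the method countRejected are made up for this example, and Jetty's behaviour when its own queue fills up may differ. The sketch only shows the general idea: with a fixed number of worker threads and a queue of fixed capacity, at most threads + capacity requests can be in flight, and anything beyond that is turned away instead of piling up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: a fixed-size pool with a bounded queue.
public class BoundedPoolSketch {

    /**
     * Submits `requests` tasks that all block until released, and returns
     * how many of them the pool refused because every worker was busy and
     * the queue was full.
     */
    public static int countRejected(int threads, int queueCapacity, int requests) {
        CountDownLatch release = new CountDownLatch(1);
        // Core size == max size, so the pool never grows beyond `threads`.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    // Each task simulates a long-running query.
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    // Default policy (AbortPolicy): queue full, request refused.
                    rejected++;
                }
            }
        } finally {
            release.countDown();   // let the simulated queries finish
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 4 workers + 100 queue slots accept 104 of 150 simultaneous requests.
        System.out.println(countRejected(4, 100, 150));
    }
}
```

With 4 threads and a queue of 100, a burst of 150 simultaneous long-running requests leaves 46 refused in this sketch. In my stress tests the queue limit of 100 was never reached, so this is about bounding the worst case rather than something that triggered in practice.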

-Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi
<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN"
"http://www.eclipse.org/jetty/configure.dtd">
 
<!-- 
  Reference: http://wiki.eclipse.org/Jetty/Reference/jetty.xml_syntax
  http://wiki.eclipse.org/Jetty/Reference/jetty.xml
-->

<Configure id="Fuseki" class="org.eclipse.jetty.server.Server">
  <Call name="addConnector">
    <Arg>
      <!-- org.eclipse.jetty.server.nio.BlockingChannelConnector -->
      <!-- org.eclipse.jetty.server.nio.SelectChannelConnector -->
      <New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
        <!-- BlockingChannelConnector specific:
             <Set name="useDirectBuffer">false</Set>
        -->
        <!-- Only listen to interface ...
        <Set name="host">localhost</Set>
        -->
        <Set name="port">3030</Set>
        <Set name="maxIdleTime">0</Set>
        <!-- All connectors -->
        <Set name="requestHeaderSize">65536</Set>       <!-- 64*1024 -->
        <Set name="requestBufferSize">5242880</Set>     <!-- 5*1024*1024 -->
        <Set name="responseBufferSize">5242880</Set>    <!-- 5*1024*1024 -->
      </New>
    </Arg>
  </Call>
  <Set name="ThreadPool">
    <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
      <!-- specify a bounded queue: at most 100 requests wait for a free thread -->
      <Arg>
        <New class="java.util.concurrent.ArrayBlockingQueue">
          <Arg type="int">100</Arg>
        </New>
      </Arg>
      <!-- thread count kept close to the number of CPU cores (4) -->
      <Set name="minThreads">4</Set>
      <Set name="maxThreads">6</Set>
      <Set name="detailedDump">false</Set>
    </New>
  </Set>
</Configure>
