Lindsay Martin created SOLR-6983:
------------------------------------

             Summary: SocketExceptions no longer trigger retries when 
processing distributed updates
                 Key: SOLR-6983
                 URL: https://issues.apache.org/jira/browse/SOLR-6983
             Project: Solr
          Issue Type: Bug
          Components: SolrCloud
    Affects Versions: 4.7
            Reporter: Lindsay Martin


Our production Solr cluster is frequently placing replicas into 
leader-initiated recovery whenever a "java.net.SocketException: Connection 
reset" is thrown when processing distributed updates.

This problem surfaced after upgrading from Solr 4.6.1 to Solr 4.10.2. In the 
old version, a retry was attempted whenever a SocketException was encountered 
when a leader was updating a replica. After the upgrade to Solr 4.10.2, this 
retry mechanism no longer occurs.

Here is an example stacktrace:
{noformat}
2015-01-11 09:38:00.913 [updateExecutor-1-thread-35734] ERROR 
org.apache.solr.update.StreamingSolrServers  – error
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:196)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at 
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
        at 
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
        at 
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
        at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
        at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
        at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
        at 
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
        at 
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
        at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
        at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
        at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
        at 
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
        at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
        at 
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
        at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
        at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
        at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2015-01-11 09:38:00.917 [qtp268575911-3616964] WARN  
org.apache.solr.update.processor.DistributedUpdateProcessor  – Error sending 
update
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:196)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at 
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
        at 
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
        at 
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
        at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
        at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
        at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
        at 
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
        at 
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
        at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
        at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
        at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
        at 
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
        at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
        at 
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
        at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
        at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
        at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2015-01-11 09:38:00.958 [qtp268575911-3616964] ERROR 
org.apache.solr.update.processor.DistributedUpdateProcessor  – Setting up to 
try to start recovery on replica http://solr26:8983/solr/listings/ after: 
java.net.SocketException: Connection reset
{noformat}

The underlying SocketException may be related to 
[SOLR-6931|https://issues.apache.org/jira/browse/SOLR-6931] and the disabling 
of the HttpClient stale connection check.

It appears that the retry logic was removed from the SolrCommandDistributor 
starting with 4.7 to address 
[SOLR-5509|https://issues.apache.org/jira/browse/SOLR-5509]. See 
https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/update/SolrCmdDistributor.java?r1=1546672&r2=1546164&pathrev=1546672

Is there any way the retry logic could be restored for Solr 4.10.x ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to