[jira] [Commented] (SOLR-6875) No data integrity between replicas

2015-06-12 Thread Erick Erickson (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14583657#comment-14583657 ]

Erick Erickson commented on SOLR-6875:
--

Do any of the logs on the leaders mention leader-initiated recovery? And how 
fast are you sending documents to Solr? I've seen situations where flooding 
Solr with too many updates can cause some wonky behavior; there are some 
inefficiencies in how leaders talk to replicas. See Tim Potter's blog here: 
http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
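One way to avoid flooding the cluster from the indexing side is to batch documents and cap how many batches are in flight at once. This is a minimal sketch of a generic client-side throttle, not a Solr or SolrJ feature: `ThrottledIndexer` and its `sender` callback are hypothetical names, and the sender stands in for whatever call actually posts a batch to Solr. The batch size and concurrency limit are assumptions to tune against your own cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/**
 * Client-side back-pressure for bulk indexing: documents are grouped into
 * fixed-size batches and at most maxInFlight batches are being sent at once.
 * The sender callback is a hypothetical stand-in for the real Solr call.
 */
final class ThrottledIndexer<T> implements AutoCloseable {
    private final int batchSize;
    private final Semaphore inFlight;          // one permit per batch allowed in flight
    private final ExecutorService pool;
    private final Consumer<List<T>> sender;    // hypothetical: posts one batch to Solr
    private List<T> buffer = new ArrayList<>();

    ThrottledIndexer(int batchSize, int maxInFlight, Consumer<List<T>> sender) {
        if (batchSize <= 0 || maxInFlight <= 0) {
            throw new IllegalArgumentException("batchSize and maxInFlight must be positive");
        }
        this.batchSize = batchSize;
        this.inFlight = new Semaphore(maxInFlight);
        this.pool = Executors.newFixedThreadPool(maxInFlight);
        this.sender = sender;
    }

    /** Buffers one document and sends the batch once it is full. Call from a single producer thread. */
    void add(T doc) throws InterruptedException {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Hands the buffered documents to a worker, blocking while maxInFlight batches are outstanding. */
    void flush() throws InterruptedException {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = buffer;
        buffer = new ArrayList<>();
        inFlight.acquire();                    // the producer waits here instead of piling up requests
        pool.execute(() -> {
            try {
                sender.accept(batch);
            } finally {
                inFlight.release();            // free the slot even if the send failed
            }
        });
    }

    /** Sends any remainder and waits (up to one minute) for outstanding batches. */
    @Override
    public void close() throws InterruptedException {
        flush();
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Exceptions thrown by `sender` are not retried here; they only reach the worker thread's default uncaught-exception handler, so a real indexer would need its own handling for failed batches.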

The symptom I saw was two-fold:
1. The leader forced the follower into recovery. No errors were reported on 
the follower, just a timeout on the leader.
2. There were a bazillion updates coming in as fast as possible, and there were 
a lot of outstanding threads on the leader from ConcurrentUpdateSolrServer.

Not saying this is your problem, but if you see something like this, it'd be 
good to know when tracking this down. If you don't have followers going down, 
then this isn't the issue.
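To check the leader logs for the symptoms described above, a simple line filter is enough. This is a minimal sketch assuming plain-text log files: the `Connection reset` and `Error sending update` strings are taken from the leader log quoted in this issue, while the exact wording Solr 4.10 uses for leader-initiated recovery is an assumption, so that pattern is deliberately loose. `RecoveryLogScan` is a hypothetical helper name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class RecoveryLogScan {
    private RecoveryLogScan() {}

    // Assumption: loose, case-insensitive match for however the Solr version words
    // leader-initiated recovery in its log messages.
    private static final Pattern LEADER_INITIATED_RECOVERY =
            Pattern.compile("leader[- ]initiated[- ]recovery", Pattern.CASE_INSENSITIVE);
    // Strings taken from the leader log quoted in this issue.
    private static final Pattern SEND_FAILURE =
            Pattern.compile("Connection reset|Error sending update");

    /** True if the line looks related to leader-to-replica send failures or recovery. */
    static boolean isSuspicious(String line) {
        return LEADER_INITIATED_RECOVERY.matcher(line).find()
                || SEND_FAILURE.matcher(line).find();
    }

    /** Filters already-loaded log lines, preserving their order. */
    static List<String> suspiciousLines(List<String> lines) {
        return lines.stream().filter(RecoveryLogScan::isSuspicious).collect(Collectors.toList());
    }

    /** Streams a log file line by line; ISO-8859-1 decodes any byte, so odd bytes never abort the scan. */
    static List<String> scanFile(Path log) throws IOException {
        try (Stream<String> lines = Files.lines(log, StandardCharsets.ISO_8859_1)) {
            return lines.filter(RecoveryLogScan::isSuspicious).collect(Collectors.toList());
        }
    }
}
```

For example, `RecoveryLogScan.scanFile(Paths.get("logs/solr.log"))` on each leader, where the path is a placeholder for the actual log location.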

 No data integrity between replicas
 --

 Key: SOLR-6875
 URL: https://issues.apache.org/jira/browse/SOLR-6875
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.10.2
 Environment: One replica is @ Linux solr1.devops.wegohealth.com 
 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 16:19:23 UTC 2013 x86_64 
 x86_64 x86_64 GNU/Linux
 Another replica is @ Linux solr2.devops.wegohealth.com 3.16.0-23-generic 
 #30-Ubuntu SMP Thu Oct 16 13:17:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
 Solr is running with the following options:
 * -Xms12G
 * -Xmx16G
 * -XX:+UseConcMarkSweepGC
 * -XX:+UseLargePages
 * -XX:+CMSParallelRemarkEnabled
 * -XX:+ParallelRefProcEnabled
 * -XX:+UseLargePages
 * -XX:+AggressiveOpts
 * -XX:CMSInitiatingOccupancyFraction=75
Reporter: Alexander S.
 Attachments: replica1.png, replica2.png


 Setup: SolrCloud with 2 shards, each with 2 replicas, 4 nodes in total.
 Indexing has stopped; one replica of a shard (Solr1) shows 45 574 039 docs, 
 and the other (Solr1.1) shows 45 574 038 docs.
 Solr1 is the leader, and these errors appeared in the logs:
 {code}
 ERROR - 2014-12-20 09:54:38.783; org.apache.solr.update.StreamingSolrServers$1; error
 java.net.SocketException: Connection reset
     at java.net.SocketInputStream.read(SocketInputStream.java:196)
     at java.net.SocketInputStream.read(SocketInputStream.java:122)
     at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
     at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
     at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
     at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
     at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
     at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
     at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
     at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
     at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
     at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
     at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
     at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
     at java.lang.Thread.run(Thread.java:744)
 WARN  - 2014-12-20 09:54:38.787; org.apache.solr.update.processor.DistributedUpdateProcessor; Error sending update
 java.net.SocketException: Connection reset
     at 

[jira] [Commented] (SOLR-6875) No data integrity between replicas

2015-01-11 Thread Alexander S. (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272877#comment-14272877 ]

Alexander S. commented on SOLR-6875:


Now we have 4 shards, each with 2 replicas (8 nodes in total), and the document counts look like this:
{noformat}
Shard 1:
  Replica 1: 14 486 089
  Replica 2: 14 496 445

Shard 2:
  Replica 1: 14 496 609
  Replica 2: 14 496 609

Shard 3:
  Replica 1: 14 492 812
  Replica 2: 14 492 812

Shard 4:
  Replica 1: 14 488 755
  Replica 2: 14 488 755
{noformat}

How could this happen? We didn't see anything like this before the upgrade from 
4.8.1 to 4.10.2. We also enabled checkIntegrityAtMerge; could that be the reason?
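A quick way to see which shards have diverged is to compare the per-replica counts directly. This is a minimal sketch assuming the counts were already collected from each replica core with distributed search turned off (typically `q=*:*&rows=0&distrib=false` sent to each core after a commit), so that each number reflects only that core's local index; `ReplicaCountCheck` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ReplicaCountCheck {
    private ReplicaCountCheck() {}

    /** Difference between the largest and smallest replica count (0 = replicas agree). */
    static long spread(long... counts) {
        if (counts.length == 0) {
            return 0L;                      // no replicas, nothing to compare
        }
        long min = counts[0];
        long max = counts[0];
        for (long c : counts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        return max - min;
    }

    /** Keeps only the shards whose replicas disagree, mapped to their spread, in input order. */
    static Map<String, Long> divergentShards(Map<String, long[]> countsByShard) {
        Map<String, Long> divergent = new LinkedHashMap<>();
        for (Map.Entry<String, long[]> e : countsByShard.entrySet()) {
            long s = spread(e.getValue());
            if (s != 0) {
                divergent.put(e.getKey(), s);
            }
        }
        return divergent;
    }
}
```

Fed the four-shard counts above, `divergentShards` reports only Shard 1, with a spread of 10 356 documents; for the original two-replica case (45 574 039 vs 45 574 038), `spread` returns 1.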
