Re: CorruptIndexException while shutting down one node in SolrCloud
Hi Erick,

Thanks for your advice that having openSearcher set to true is unnecessary for my case.

As for the CorruptIndexException issue, I think Solr should handle this case well too, because I always shut down Tomcat gracefully. I recently ran a couple of tests on it: while continuously posting update requests to Solr, I stopped one of the three Tomcat nodes in a single-shard cluster. The CorruptIndexException is easy to reproduce, regardless of whether the stopped node is the leader or a replica. So I think this is a bug in Solr.

Any idea how I can avoid this issue? For example, could I remove a node from ZooKeeper before stopping it?

Also, please let me know whether rebooting the Tomcat nodes is the only way to resolve the memory issue. If I could control the field cache size, the reboots would be unnecessary.

Below is the trace from the first CorruptIndexException after starting Tomcat:

2017-09-19 10:18:57,614 ERROR [RecoveryThread][RQ-Init] (SolrException.java:142) - SnapPull failed :org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
    at org.apache.solr.handler.SnapPuller.openNewSearcherAndUpdateCommitPoint(SnapPuller.java:673)
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:493)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:337)
    at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:163)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:447)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
Caused by: org.apache.lucene.index.CorruptIndexException: liveDocs.count()=10309577 info.docCount=15057819 info.getDelCount()=4748252 (filename=_4y65a_13g.del)
    at org.apache.lucene.codecs.lucene40.Lucene40LiveDocsFormat.readLiveDocs(Lucene40LiveDocsFormat.java:96)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:116)
    at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:144)
    at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:238)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:104)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:422)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:279)
    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1476)
    ... 7 more

Regards,
Geng, Wei
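For illustration, one way to implement the "remove the node before stopping it" step is the Collections API DELETEREPLICA action, which is available in Solr 4.10. A minimal Java sketch follows; the host, port, collection, shard, and replica (core_node) names are placeholders, not values from this thread:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Detach one replica from the collection before stopping its Tomcat,
// so the cluster stops routing updates to it during shutdown.
public class DetachReplica {
    public static void main(String[] args) throws Exception {
        // The replica parameter is the core_nodeN name from clusterstate.json.
        String url = "http://solr-host:8080/solr/admin/collections"
                + "?action=DELETEREPLICA"
                + "&collection=mycollection"
                + "&shard=shard1"
                + "&replica=core_node3";
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        int status = conn.getResponseCode();
        InputStream in = conn.getInputStream();
        try {
            while (in.read() != -1) { /* drain the response body */ }
        } finally {
            in.close();
        }
        // HTTP 200 means the core was unloaded and removed from the
        // cluster state; catalina.sh stop can then run safely.
        System.out.println("DELETEREPLICA returned HTTP " + status);
    }
}

The replica would then have to be re-added with ADDREPLICA (also available in 4.10) after the node comes back, so whether this is cheaper than just tolerating recovery is something a test like the one described above would have to show.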
CorruptIndexException while shutting down one node in SolrCloud
Hi team,

I am currently running Solr 4.10 in Tomcat, as a one-shard SolrCloud with 3 replicas and a 15 GB heap on each node. Because we have a large data volume and a high query load, we keep hitting frequent full GCs. We looked into this and found that much of the memory was being used by Solr as field cache. To avoid this, we started rebooting the Tomcat instances one by one on a schedule. We don't kill any process; we run the script "catalina.sh stop" to shut Tomcat down gracefully.

To keep messages from piling up, we receive messages from users all the time and send an update request to Solr as soon as a new message arrives. This means Solr may receive update requests during shutdown, and I think that is why we get the CorruptIndexException. Since we started these reboots, we always get a CorruptIndexException. The trace is as below:

2017-09-14 04:25:49,241 ERROR [commitScheduler-15-thread-1][R31609] (CommitTracker) - auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:607)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.index.CorruptIndexException: liveDocs.count()=33574 info.docCount=34156 info.getDelCount()=584 (filename=_1uvck_k.del)
    at org.apache.lucene.codecs.lucene40.Lucene40LiveDocsFormat.readLiveDocs(Lucene40LiveDocsFormat.java:96)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:116)
    at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:144)
    at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:282)
    at org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3271)
    at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3262)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:421)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:279)
    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1476)
    ... 10 more

As we shut Solr down gracefully, I think Solr should be robust enough to handle this case. Please give me some advice about why this happens and what we can do to avoid it.

P.S. Below is some of our solrConfig content: 6 true 1000

Regards,
Geng, Wei
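The XML markup of that solrconfig excerpt was stripped somewhere in the archiving, leaving only the bare values "6 true 1000". For orientation only, the commit settings in a solrconfig.xml generally have the shape below; the numeric values here are illustrative placeholders, not a reconstruction of the poster's settings (the follow-up message only confirms that openSearcher was set to true):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard-commit interval in ms; placeholder value -->
    <maxTime>60000</maxTime>
    <!-- opening a searcher on every hard commit is what the reply
         in this thread says is unnecessary for this use case -->
    <openSearcher>true</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- soft-commit interval in ms; placeholder value -->
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>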
Re: Too many ZooKeeper connections created when recreating the CloudSolrServer instance
Hi Walter, Shawn,

Thanks for your quick replies; the information you provided is really helpful. Now I know how to find the right way to resolve my issue.

Regards,
Geng, Wei
Re: Too many ZooKeeper connections created when recreating the CloudSolrServer instance
I don't mean that my ZooKeeper cluster is rebooting frequently; I just want to ensure my query service stays stable when the ZooKeeper cluster has an issue or is rebooted. I will do some testing to check whether there is a problem here; maybe the current ZooKeeper client already handles this case well. Hacking the client will always be the last choice.

Regards,
Geng, Wei
Re: Too many ZooKeeper connections created when recreating the CloudSolrServer instance
Hi Shawn,

Thanks for your detailed explanation. The reason I wanted to shut down the CloudSolrServer instance and create a new one is my concern about whether it can successfully reconnect to ZooKeeper if the ZooKeeper cluster has an issue and is rebooted. I will do the related testing with version 6.5.0, which is the version I want to upgrade to. If there is any issue, I will report it to you and your team as you suggested.

In any case, I will abandon the approach of shutting down/closing the CloudSolrServer instance and creating a new one. The alternative option is to manage the ZooKeeper connection myself by extending the ZkClientClusterStateProvider class.

Regards,
Geng, Wei
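As a sketch of the long-lived-client pattern this thread converges on, here is roughly what a shared singleton client could look like in SolrJ 6.5, assuming the CloudSolrClient.Builder API of that release; the ZooKeeper connect string and collection name are placeholders:

import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

// One CloudSolrClient for the whole application; query threads share it
// and never recreate it on transient failures.
public class QueryService {
    private final CloudSolrClient client;

    public QueryService() {
        client = new CloudSolrClient.Builder()
                .withZkHost("zk1:2181,zk2:2181,zk3:2181")
                .build();
        client.setDefaultCollection("mycollection");
    }

    public QueryResponse search(String q) throws SolrServerException, IOException {
        // The embedded ZooKeeper client reconnects after session expiry,
        // so a ZooKeeper reboot should not require replacing this instance.
        return client.query(new SolrQuery(q));
    }

    public void close() throws IOException {
        client.close(); // only at application shutdown
    }
}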
Too many ZooKeeper connections created when recreating the CloudSolrServer instance
Hi Community,

I use Solr (4.10.2) as an indexing tool, and I use a singleton CloudSolrServer instance to query Solr. When we hit an exception, for example the current Solr server not responding, I create a new CloudSolrServer instance and shut down the old one. We have many query threads that share the same CloudSolrServer instance.

This creates the following race: when thread A hits an exception, it creates a new CloudSolrServer instance and begins shutting down the current one; from the Solr code I know the first step of shutdown is to close the ZooKeeper connection. At the same time, thread B may still be querying through this instance, and the first step of a query is to check the ZooKeeper connection and create one if it does not exist. Thread A can then proceed with the shutdown, and the ZooKeeper connection created by thread B is left behind with nothing accessing it. Because of this, we may have more and more ZooKeeper connections open at the same time, until we cannot create a new one and get the exception below on the ZooKeeper server side:

2017-07-06 09:42:37,595 [myid:5] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:10199:NIOServerCnxnFactory@193] - Too many connections from /169.171.87.37 - max is 60

So I just want to know whether I am operating CloudSolrServer in the wrong way, and whether you have any suggestions on how to meet my requirement.

Regards,
Geng, Wei
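For illustration, one way to avoid the race described above is to publish the shared client through an AtomicReference and defer shutdown of the replaced instance. This is only a minimal sketch against the SolrJ 4.10 API; the collection name and grace period are arbitrary placeholders, and the delayed shutdown is a heuristic (a query holding the old reference for longer than the grace period would still fail):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Query threads always fetch the current instance through the
// AtomicReference; the thread that detects a failure swaps in a new
// instance and schedules the old one for shutdown after a grace period,
// so in-flight queries can drain first.
public class SolrClientHolder {
    private final AtomicReference<CloudSolrServer> ref;
    private final ScheduledExecutorService reaper =
            Executors.newSingleThreadScheduledExecutor();

    public SolrClientHolder(String zkHost) {
        ref = new AtomicReference<CloudSolrServer>(create(zkHost));
    }

    private static CloudSolrServer create(String zkHost) {
        CloudSolrServer server = new CloudSolrServer(zkHost);
        server.setDefaultCollection("mycollection"); // placeholder name
        server.connect();
        return server;
    }

    // Called by the query threads.
    public QueryResponse query(SolrQuery q) throws SolrServerException {
        return ref.get().query(q);
    }

    // Called by the thread that detected a failure.
    public void replace(String zkHost) {
        final CloudSolrServer old = ref.getAndSet(create(zkHost));
        reaper.schedule(new Runnable() {
            public void run() {
                old.shutdown(); // no thread should still be starting queries on it
            }
        }, 30, TimeUnit.SECONDS); // arbitrary grace period
    }
}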