Hello Everyone, I have a Solr Cloud cluster running 6.4.2 supporting a large Sitecore install. As part of our CI/CD pipeline we deploy many times a day to a blue/green set-up in AWS. In order to keep our Solr collections in sync across these deployments we create a temporary snapshot at the beginning of the deployment. Each pod refers to their indexes using collection aliases e.g.<collection>_green vs <collection>_blue. During the deployment, we can switch where these aliases are pointed so that essentially we "pause" an index by pointing the alias at a snapshot version of the collection. We can then "resume" indexing by pointing the collection alias back to the live index. This exposes a bug ( SOLR-11616 <https://issues.apache.org/jira/browse/SOLR-11616> ) that causes our deployments to break and get stuck until someone manually goes to recreate the affected collections. According to that ticket it was fixed and patched in Solr 7.2 and on. Unfortunately we cannot upgrade as Solr 7.x is not compatible with our application (Sitecore CMS).
It seems to occur randomly and I haven't been able to get a solid repro case yet. I did notice that forcing a leader election in the cloud can heal the problem (by restarting the current leader host). I wasn't able to figure out how to change the leader in a satisfactory way via the Solr API so I would just have to cycle all of my Solr Cloud nodes in order to automate a solution, which isn't ideal. Has anyone else wrestled with that bug or have a suggestion for how to minimize the impact? Thanks, Joe *Here are some of the error messages:* org.apache.solr.common.SolrException: Exception while restoring the backup index at org.apache.solr.handler.RestoreCore.doRestore(RestoreCore.java:130) at org.apache.solr.handler.admin.RestoreCoreOp.execute(RestoreCoreOp.java:65) at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:377) at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:379) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:165) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166) at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:664) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:445) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.server.Server.handle(Server.java:534) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.file.NoSuchFileException: /var/solr/data/itembuckets_commerce_products_web_index_snapshot_shard1_replica0/data/restore.20180627124334892/_10n.si at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) at java.nio.channels.FileChannel.open(FileChannel.java:287) at java.nio.channels.FileChannel.open(FileChannel.java:335) at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238) at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:192) at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:137) at org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:89) at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:355) at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:286) at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938) at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:125) at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:100) at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:240) at org.apache.solr.update.DefaultSolrCoreState.changeWriter(DefaultSolrCoreState.java:203) at org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:212) at org.apache.solr.update.DirectUpdateHandler2.newIndexWriter(DirectUpdateHandler2.java:686) at org.apache.solr.handler.RestoreCore.doRestore(RestoreCore.java:108) org.apache.solr.common.SolrException: Failed to backup core=sitecore_web_index_shard1_replica1 because java.nio.file.NoSuchFileException: /var/solr/data/sitecore_web_index_shard1_replica1/data/index/segments_268 at org.apache.solr.handler.admin.BackupCoreOp.execute(BackupCoreOp.java:80) at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:377) at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:379) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:165) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166) at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:664) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:445) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.server.Server.handle(Server.java:534) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.file.NoSuchFileException: /var/solr/data/sitecore_web_index_shard1_replica1/data/index/segments_268 at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) at java.nio.channels.FileChannel.open(FileChannel.java:287) at java.nio.channels.FileChannel.open(FileChannel.java:335) at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238) at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:192) at org.apache.lucene.store.Directory.copyFrom(Directory.java:177) at org.apache.solr.core.backup.repository.LocalFileSystemRepository.copyFileFrom(LocalFileSystemRepository.java:145) at org.apache.solr.handler.SnapShooter.createSnapshot(SnapShooter.java:218) at org.apache.solr.handler.SnapShooter.createSnapshot(SnapShooter.java:170) at org.apache.solr.handler.admin.BackupCoreOp.execute(BackupCoreOp.java:78) ... 33 more *I also saw these but I'm not sure if they are related:* it is unusual to create a collection without cores solrindexwriter was not closed prior to finalize() indicates a bug --possible resource leak -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html