Hi everyone,

We had an out-of-memory (OOM) event earlier this morning. It caused one of our shards to lose all its replicas, and its leader is still in a down state. We have restarted the Java process (Solr), but the core is still down. Logs below:
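Before forcing anything, it may help to see exactly what ZooKeeper records for the affected shard. Below is a minimal sketch that assumes the Collections API `CLUSTERSTATUS` action is reachable on one of the nodes. The base URL is an assumption derived from the node name `10.0.10.43:8983_solr` in the logs. The helper names `fetch_cluster_status` and `summarize_shard` are hypothetical. The sketch lists each replica's state and leader flag, and whether its node is in `live_nodes`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_cluster_status(base_url, collection, shard):
    """Hypothetical helper: call Collections API CLUSTERSTATUS for one shard, return parsed JSON."""
    params = urlencode({
        "action": "CLUSTERSTATUS",
        "collection": collection,
        "shard": shard,
        "wt": "json",
    })
    with urlopen(f"{base_url}/admin/collections?{params}", timeout=30) as resp:
        return json.load(resp)


def summarize_shard(status, collection, shard):
    """Hypothetical helper: one tuple per replica of the shard:
    (core_node, core, node_name, state, is_leader, node_is_live)."""
    cluster = status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    replicas = cluster["collections"][collection]["shards"][shard]["replicas"]
    rows = []
    for core_node, info in sorted(replicas.items()):
        rows.append((
            core_node,
            info.get("core"),
            info.get("node_name"),
            info.get("state"),
            # The leader flag, when present, is serialized as the string "true".
            info.get("leader") == "true",
            # A replica can only serve if its node is also registered as live.
            info.get("node_name") in live_nodes,
        ))
    return rows
```

Usage (assumed URL): `summarize_shard(fetch_cluster_status("http://10.0.10.43:8983/solr", "search-collection-2018-10-30", "shard2_5"), "search-collection-2018-10-30", "shard2_5")`. Checking `live_nodes` alongside each replica's `state` helps separate a node that is up but whose core is stuck in recovery from a node that never re-registered after the restart.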
```
2021-02-25 00:46:43.268 WARN (updateExecutor-3-thread-1-processing-n:10.0.10.43:8983_solr x:search-collection-2018-10-30_shard2_5_replica_n1480 c:search-collection-2018-10-30 s:shard2_5 r:core_node1481) [c:search-collection-2018-10-30 s:shard2_5 r:core_node1481 x:search-collection-2018-10-30_shard2_5_replica_n1480] o.a.s.c.RecoveryStrategy Stopping recovery for core=[search-collection-2018-10-30_shard2_5_replica_n1480] coreNodeName=[core_node1481]
2021-02-25 00:46:40.759 WARN (zkCallback-7-thread-2) [c:search-collection-2018-10-30 s:shard2_5 r:core_node1481 x:search-collection-2018-10-30_shard2_5_replica_n1480] o.a.s.c.RecoveryStrategy Stopping recovery for core=[search-collection-2018-10-30_shard2_5_replica_n1480] coreNodeName=[core_node1481]
2021-02-25 00:46:35.761 WARN (zkCallback-7-thread-2) [c:search-collection-2018-10-30 s:shard2_5 r:core_node1481 x:search-collection-2018-10-30_shard2_5_replica_n1480] o.a.s.c.RecoveryStrategy Stopping recovery for core=[search-collection-2018-10-30_shard2_5_replica_n1480] coreNodeName=[core_node1481]
2021-02-25 00:46:33.270 WARN (updateExecutor-3-thread-2-processing-n:10.0.10.43:8983_solr x:search-collection-2018-10-30_shard2_5_replica_n1480 c:search-collection-2018-10-30 s:shard2_5 r:core_node1481) [c:search-collection-2018-10-30 s:shard2_5 r:core_node1481 x:search-collection-2018-10-30_shard2_5_replica_n1480] o.a.s.c.RecoveryStrategy Stopping recovery for core=[search-collection-2018-10-30_shard2_5_replica_n1480] coreNodeName=[core_node1481]
2021-02-25 00:46:30.759 WARN (zkCallback-7-thread-2) [c:search-collection-2018-10-30 s:shard2_5 r:core_node1481 x:search-collection-2018-10-30_shard2_5_replica_n1480] o.a.s.c.RecoveryStrategy Stopping recovery for core=[search-collection-2018-10-30_shard2_5_replica_n1480] coreNodeName=[core_node1481]
2021-02-25 00:46:25.761 WARN (zkCallback-7-thread-2) [c:search-collection-2018-10-30 s:shard2_5 r:core_node1481 x:search-collection-2018-10-30_shard2_5_replica_n1480] o.a.s.c.RecoveryStrategy Stopping recovery for core=[search-collection-2018-10-30_shard2_5_replica_n1480] coreNodeName=[core_node1481]
2021-02-25 00:46:23.279 WARN (updateExecutor-3-thread-1-processing-n:10.0.10.43:8983_solr x:search-collection-2018-10-30_shard2_5_replica_n1480 c:search-collection-2018-10-30 s:shard2_5 r:core_node1481) [c:search-collection-2018-10-30 s:shard2_5 r:core_node1481 x:search-collection-2018-10-30_shard2_5_replica_n1480] o.a.s.c.RecoveryStrategy Stopping recovery for core=[search-collection-2018-10-30_shard2_5_replica_n1480] coreNodeName=[core_node1481]
```

Questions:

1. Is there anything we can do to force this core to go live?
2. If the core is unrecoverable, is there a way to clear it out so that we can reindex only that shard?

Any other advice would be great too :)

Ash

-- 
<https://www.canva.com/> Empowering the world to design