Re: All cores gone along with all solr configuration upon reboot

2020-08-24 Thread Erick Erickson
This is consistent with the data disappearing from Zookeeper due
to misconfiguration and/or some external process removing it when
you reboot.

So here’s what I’d do next:

Go ahead and reboot. You do _not_ need to start Solr to run the bin/solr
scripts, and among them is:

bin/solr zk ls -r / -z path_to_Zookeeper_ensemble

That should dump a listing of the ZK tree. Is it what you expect, or
does it mysteriously disappear when you reboot? If it disappears, you
must track down what it is about your environment that’s deleting the
ZK data on reboot.
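
For reference, a healthy SolrCloud root in Zookeeper usually has top-level
znodes along these lines (this is the typical Solr 8.x layout, so treat it
as a sketch rather than exactly what you’ll see):

/aliases.json
/clusterstate.json
/collections
/configs
/live_nodes
/overseer
/overseer_elect
/security.json

If /collections, /configs, and /security.json are there before the reboot
but gone afterwards, that confirms the data is being wiped on the Zookeeper
side rather than by Solr.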

When Solr comes back up, it’s saying “Look, there’s no collection
information in Zookeeper but there are replicas on disk. That must
mean someone deleted the collections while I was down, so I’ll
clean up”.

bin/solr zk -help

will show you a lot of Unix-like commands for poking around Zookeeper
without having to start Solr. There are also GUI tools out there
you can use.
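
A few examples of the sort of commands available (the Zookeeper address,
config name, and local paths below are placeholders, so adjust them to your
setup):

bin/solr zk ls -r /collections -z zk1:2181
bin/solr zk cp zk:/security.json ./security.json.bak -z zk1:2181
bin/solr zk downconfig -n my_config -d /some/backup/dir -z zk1:2181

The second and third are also a cheap way to snapshot security.json and your
configsets before the next reboot.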

Again, it’s a near certainty that
1> your ZK data is disappearing when you reboot
2> something external to Solr is doing it.

Best,
Erick



Re: All cores gone along with all solr configuration upon reboot

2020-08-23 Thread yaswanth kumar
Hi Erick,

Here is the latest error I captured, which seems to be what is actually
deleting the cores (I noticed that the core folders under the path
../solr/server/solr were deleted one by one when the server came back from
the reboot):

2020-08-24 04:41:27.424 ERROR (coreContainerWorkExecutor-2-thread-1-processing-n:9.70.170.51:8080_solr) [   ] o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on$
    at org.apache.solr.cloud.ZkController.checkStateInZk(ZkController.java:1875)

org.apache.solr.cloud.ZkController$NotInClusterStateException: coreNodeName core_node3 does not exist in shard shard1, ignore the exception if the replica was deleted
    at org.apache.solr.cloud.ZkController.checkStateInZk(ZkController.java:1875) ~[solr-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19$
    at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1774) ~[solr-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15$
    at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1238) ~[solr-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 201$
    at org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:756) ~[solr-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19$
    at org.apache.solr.core.CoreContainer$$Lambda$343/.call(Unknown Source) ~[?:?]
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202) ~[metrics-core-4.0.5.jar:4.0.5]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) ~[solr-solrj-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e$
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$142/.run(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]

For some reason Solr is not able to find the replica in the clusterstate,
and that is what triggers the delete activity. I'm not really sure why it
can't find it; it looks like the clusterstate is wiped out first, and then
the cores slowly get deleted one by one.

As you asked, I cross-checked the port numbers once again: I am using 2181
as the client port, and that is what I see for ZK_HOST on the Solr dashboard
screen. I'm not really sure how I can prevent this going forward. One thing
here is that I am using the Solr basic authentication plugin, if that makes
any difference.


Re: All cores gone along with all solr configuration upon reboot

2020-08-22 Thread Erick Erickson
Autopurge shouldn’t matter; that just cleans up old snapshots. That is, it
should be configured, but having it enabled or not should have no bearing on
your data disappearing.
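
For reference, typical autopurge settings in zoo.cfg look like the following
(the values are just examples; all they control is how many old snapshots and
transaction logs Zookeeper keeps around):

autopurge.snapRetainCount=3
autopurge.purgeInterval=24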

Also, are you absolutely certain that you are using your external ZK? Check the
port on the admin screen. 9983 is the default for embedded ZK.
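
A quick way to double-check (the host name below is a placeholder): look at
ZK_HOST in solr.in.sh and make sure it points at your external ensemble on
2181, e.g.

ZK_HOST="your-zk-host:2181"

and, assuming the standard four-letter-word commands are enabled on your
Zookeeper 3.4, something like

echo stat | nc your-zk-host 2181

will show which clients are connected and the current znode count, so you can
see whether that count collapses after a reboot.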

All that said, nothing in Solr just deletes all this. The fact that you only
saw this on reboot is highly suspicious; I suspect some external-to-Solr
process (anything from a startup script to restoring a disk image) is removing
that data.

Best,
Erick


Re: All cores gone along with all solr configuration upon reboot

2020-08-22 Thread yaswanth kumar
Thanks Erick for looking into this.

But as I said before, I confirmed that the Zookeeper paths were changed to a
local path rather than the /tmp default that comes with the package. Does
zoo.cfg need to have autopurge settings? I don't have them in my config.

Also, I made sure that the zoo.cfg inside Solr and my external Zookeeper's
zoo.cfg point to the same place and have the same configs, if that matters.

Sent from my iPhone


Re: All cores gone along with all solr configuration upon reboot

2020-08-22 Thread Erick Erickson
Sounds like you didn’t change the Zookeeper data dir. Zookeeper defaults to
putting its data in /tmp/zookeeper (see the Zookeeper config file), and of
course when you reboot it goes away.

I’ve always disliked this, but the Zookeeper folks did it that way. So if you
just copy zoo_sample.cfg to zoo.cfg, that’s what you get; it’s not under
Solr’s control.
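
A minimal zoo.cfg sketch with a permanent data directory (the path is just an
example; pick somewhere that survives a reboot on your box):

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181

Whatever was only in /tmp/zookeeper is already gone after the reboot, which is
why the recovery steps below start from your configsets in version control.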

As for how to recover, assuming you put your configsets in some kind of version 
control as we recommend:

0> set up Zookeeper to keep its data somewhere permanent. You may want to
archive snapshots upon occasion as well.

1> save away the data directory for _one_ replica from each shard of every 
collection somewhere. You should have a bunch of directories like 
SOLR_HOME/…./collection1_shard1_replica_n1/data.

2> recreate all your collections as new, leader-only collections with the
exact same number of shards, i.e. each shard with only a single replica.

3> shut down all your Solr instances

4> copy the data directories you saved in <1>. You _MUST_ copy to corresponding 
shards. The important bit is that a data directory from collection1_shard1 goes 
back to collection1_shard1. If you copy it back to collection1_shard2 Bad 
Things Happen. Actually, I’d delete the target data directories first and then 
copy.

5> restart your Solr instances and verify they look OK.

6> use the collections API ADDREPLICA to build out your collections (sketches of the API calls follow below).
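
Sketches of the API calls for 2> and 6> (the collection, shard, and node names,
the port, and the config name are all placeholders; add -u user:pass if basic
auth is enabled):

curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1&collection.configName=collection1"

curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=collection1&shard=shard1&node=solr2:8983_solr"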

Best,
Erick


All cores gone along with all solr configuration upon reboot

2020-08-21 Thread yaswanth kumar
Can someone help me with the issue below?

I have configured Solr 8.2 with one Zookeeper 3.4 node and 3 Solr nodes.

All the configs were pushed initially, and I also indexed all the data into
multiple collections with 3 replicas on each collection.

Now, as part of server maintenance, these Solr nodes were restarted, and once
they came back Solr was empty: all the collections were lost, and all the
collection-specific instance directories in the path /solr/server/solr were
deleted. The data folders are intact, nothing lost. I'm not really sure how to
recover from this situation.

I did make sure that zoo.cfg was properly configured (permanent paths for
Zookeeper data and logs instead of /tmp), as I am using an external Zookeeper
instead of the one that comes with Solr.

The Solr data path is NAS storage that is common to all three Solr nodes.

Another data point is that I enabled Solr basic authentication as well, if
that makes any difference. Even the clusterstate, schemas, and security.json
were all lost. I'm really looking for help in understanding how to prevent
this from happening again.

Sent from my iPhone