Re: Solr node not found in ZK live_nodes
It sometimes happens that one of the nodes goes down but is then registered as Leader/Active. Does the Cloud view show anything about this node (Recovering/Down/Recovery Failed, etc.), and are you able to query just this shard/node directly?

Susheel

On Wed, Dec 7, 2016 at 10:13 PM, Mark Miller wrote:
> That already happens. The ZK client itself will reconnect when it can and
> trigger everything to be set up as it is when the cluster first starts up,
> including a live node and leader election, etc.
>
> You may have hit a bug, or something else is missing from this conversation,
> but reconnecting after losing the ZK connection has been a basic feature
> from day one.
>
> Mark
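Querying one node directly, as suggested above, usually means turning off distributed search for that request. The sketch below only builds such a URL; the host name `s16.example.com`, the port, and the collection name `mycollection` are placeholders that do not come from this thread, while `distrib=false` is the standard Solr request parameter that keeps a query on the core that receives it.

```python
from urllib.parse import urlencode


def direct_query_url(base_url, collection, query="*:*", rows=0):
    """Build a select URL that is answered only by the node it is sent to.

    distrib=false stops Solr from fanning the request out to other shards.
    """
    params = urlencode(
        {"q": query, "rows": rows, "distrib": "false", "wt": "json"}
    )
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"


# Hypothetical host and collection names, for illustration only.
url = direct_query_url("http://s16.example.com:8983/solr", "mycollection")
print(url)
```

If the node answers this request but is still absent from /live_nodes, the Solr process itself is serving and only its registration is missing.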
Re: Solr node not found in ZK live_nodes
That already happens. The ZK client itself will reconnect when it can and trigger everything to be set up as it is when the cluster first starts up, including a live node and leader election, etc.

You may have hit a bug, or something else is missing from this conversation, but reconnecting after losing the ZK connection has been a basic feature from day one.

Mark

On Wed, Dec 7, 2016 at 12:34 AM Manohar Sripada wrote:
> Thanks Erick! Should I create a JIRA issue for this?
>
> Regarding the logs, I have changed the log level to WARN. That may be the
> reason I couldn't get anything from them.
>
> Thanks,
> Manohar

--
- Mark
about.me/markrmiller
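The reconnect behaviour described above can be pictured with a small in-memory model. This is a sketch under stated assumptions, not Solr's or ZooKeeper's code: `FakeZk` and `Node` are hypothetical classes invented for illustration. The one real fact it relies on is that entries under /live_nodes are ephemeral znodes, which ZooKeeper deletes when the owning session expires.

```python
class FakeZk:
    """In-memory stand-in for the small part of ZooKeeper used here."""

    def __init__(self):
        self.live_nodes = {}  # node name -> id of the session that owns it
        self._next_session = 1

    def new_session(self):
        sid = self._next_session
        self._next_session += 1
        return sid

    def create_ephemeral(self, session_id, node_name):
        self.live_nodes[node_name] = session_id

    def expire_session(self, session_id):
        # ZooKeeper removes every ephemeral znode owned by an expired session.
        self.live_nodes = {
            name: sid for name, sid in self.live_nodes.items() if sid != session_id
        }


class Node:
    """Hypothetical Solr node that registers itself on every new session."""

    def __init__(self, zk, name):
        self.zk = zk
        self.name = name
        self.session_id = None
        self.on_session_established()

    def on_session_established(self):
        # The same registration path serves first start-up and reconnect.
        self.session_id = self.zk.new_session()
        self.zk.create_ephemeral(self.session_id, self.name)


zk = FakeZk()
s16 = Node(zk, "s16:8983_solr")

zk.expire_session(s16.session_id)          # e.g. long GC pause or network glitch
gone = "s16:8983_solr" not in zk.live_nodes

s16.on_session_established()               # client reconnects with a new session
back = "s16:8983_solr" in zk.live_nodes

print(gone, back)
```

In this model the node stays missing only if the re-registration step never runs after the session expires, which matches the "you may have hit a bug" reading of the report.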
Re: Solr node not found in ZK live_nodes
Thanks Erick! Should I create a JIRA issue for this?

Regarding the logs, I have changed the log level to WARN. That may be the reason I couldn't get anything from them.

Thanks,
Manohar

On Tue, Dec 6, 2016 at 9:58 PM, Erick Erickson wrote:
> The most likely reason is that the Solr node in question was not
> reachable and was therefore removed from live_nodes, perhaps due to a
> temporary network glitch, a long GC pause, or the like. If you're rolling
> your logs over, it's quite possible that any illuminating messages were
> lost. The default 4M size for each log is quite low at INFO level...
>
> It does seem possible for a Solr node to periodically check its status
> and re-insert itself into live_nodes, go through recovery, and all that.
> So far most of that registration logic is baked into the startup code.
> What do others think? Worth a JIRA?
>
> Erick
Re: Solr node not found in ZK live_nodes
The most likely reason is that the Solr node in question was not reachable and was therefore removed from live_nodes, perhaps due to a temporary network glitch, a long GC pause, or the like. If you're rolling your logs over, it's quite possible that any illuminating messages were lost. The default 4M size for each log is quite low at INFO level...

It does seem possible for a Solr node to periodically check its status and re-insert itself into live_nodes, go through recovery, and all that. So far most of that registration logic is baked into the startup code. What do others think? Worth a JIRA?

Erick

On Tue, Dec 6, 2016 at 3:53 AM, Manohar Sripada wrote:
> We have a 16-node Solr (5.2.1) cluster and a 5-node ZooKeeper (3.4.6)
> ensemble.
>
> All the Solr nodes were registered in ZooKeeper (ls /live_nodes) when the
> setup was done 3 months ago. Suddenly, a few days ago, our searches started
> failing because one of the Solr nodes (call it s16) was not seen in
> ZooKeeper; i.e., when we ran "ls /live_nodes", the s16 Solr node was not
> found. However, the corresponding Solr process was up and running.
>
> To my surprise, I couldn't find any errors or warnings related to this in
> the Solr or ZooKeeper logs. I have a few questions:
>
> 1. Is there any reason why this registration with ZK was lost? I know the
>    logs should provide some information, but they didn't. Has anyone
>    encountered a similar issue, and if so, what could be the root cause?
> 2. Shouldn't Solr be clever enough to detect that its registration with ZK
>    was lost (for some reason) and try to re-register?
>
> PS: The issue was resolved by restarting the Solr node. However, I am
> curious to know why it happened in the first place.
>
> Thanks
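The periodic self-check floated above could look roughly like the following. This is only a sketch of the idea, not existing Solr behaviour: `ensure_registered` and its two callables are hypothetical names, a plain Python set stands in for /live_nodes, and the recovery and leader-election steps a real node would need are left out.

```python
def ensure_registered(node_name, list_live_nodes, register):
    """Re-register node_name if it is missing from the live-node listing.

    list_live_nodes: callable returning the current collection of live nodes.
    register: callable that (re)creates the entry for a node name.
    Returns True when a repair was made, False when nothing was needed.
    """
    if node_name in list_live_nodes():
        return False
    register(node_name)
    return True


# A set stands in for the /live_nodes listing in this illustration.
live = {"s1:8983_solr", "s2:8983_solr"}

repaired_first = ensure_registered("s16:8983_solr", lambda: live, live.add)
repaired_second = ensure_registered("s16:8983_solr", lambda: live, live.add)

print(repaired_first, repaired_second)
```

A timer would call `ensure_registered` at a fixed interval; the second call shows that the check is a no-op once the node is listed again.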
Solr node not found in ZK live_nodes
We have a 16-node Solr (5.2.1) cluster and a 5-node ZooKeeper (3.4.6) ensemble.

All the Solr nodes were registered in ZooKeeper (ls /live_nodes) when the setup was done 3 months ago. Suddenly, a few days ago, our searches started failing because one of the Solr nodes (call it s16) was not seen in ZooKeeper; i.e., when we ran "ls /live_nodes", the s16 Solr node was not found. However, the corresponding Solr process was up and running.

To my surprise, I couldn't find any errors or warnings related to this in the Solr or ZooKeeper logs. I have a few questions:

1. Is there any reason why this registration with ZK was lost? I know the logs should provide some information, but they didn't. Has anyone encountered a similar issue, and if so, what could be the root cause?
2. Shouldn't Solr be clever enough to detect that its registration with ZK was lost (for some reason) and try to re-register?

PS: The issue was resolved by restarting the Solr node. However, I am curious to know why it happened in the first place.

Thanks
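The manual "ls /live_nodes" check described in this message amounts to comparing the nodes you expect against the ones ZooKeeper lists. A minimal sketch of that comparison follows; the `sN:8983_solr` names are made up for illustration (Solr's live-node entries follow a `host:port_context` pattern, but the real host names and ports are not given in the thread), and the `live` list is simulated rather than read from ZooKeeper.

```python
def missing_nodes(expected, live):
    """Return the expected node names that are absent from the live listing."""
    return sorted(set(expected) - set(live))


# Hypothetical names for the 16 nodes; the real ones are not in the thread.
expected = [f"s{i}:8983_solr" for i in range(1, 17)]

# Simulated output of "ls /live_nodes" with s16 missing.
live = [name for name in expected if name != "s16:8983_solr"]

missing = missing_nodes(expected, live)
print(missing)  # ['s16:8983_solr']
```

Run on a schedule against the real listing, a check like this would flag a dropped node before searches start failing.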