Re: Solr node not found in ZK live_nodes

2016-12-08 Thread Susheel Kumar
It sometimes happens that one of the nodes goes down but then gets
registered as Leader/Active. Does the Cloud view show anything about this
node (Recovering/Down/Recovery Failed, etc.), and are you able to query
just this shard/node directly?
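
One way to run such a direct check, as a minimal sketch: Solr's distrib=false request parameter keeps a query on the core that receives it instead of fanning out across the collection. The host, port, and core name below are hypothetical placeholders, and the helper only builds the URL; actually fetching it is left out.

```python
from urllib.parse import urlencode


def direct_core_query_url(host, port, core, q="*:*"):
    """Build a select URL that queries a single core without distributing."""
    params = urlencode({"q": q, "rows": 0, "distrib": "false", "wt": "json"})
    return f"http://{host}:{port}/solr/{core}/select?{params}"


if __name__ == "__main__":
    # Hypothetical node and core names, for illustration only.
    print(direct_core_query_url("s16", 8983, "collection1_shard1_replica1"))
```

If the suspect node answers this request while distributed queries fail, that would suggest the core itself is healthy and the problem lies in how the cluster state sees it.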

Susheel


Re: Solr node not found in ZK live_nodes

2016-12-07 Thread Mark Miller
That already happens. The ZK client itself will reconnect when it can and
trigger everything to be set up as when the cluster first starts up,
including live node registration and leader election, etc.

You may have hit a bug, or something else is missing from this conversation,
but reconnecting after losing the ZK connection has been a basic feature from
day one.
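
To illustrate why a lost session removes a node from live_nodes and why reconnecting restores it, here is a toy in-memory model. It is not Solr or ZooKeeper code: ToyZooKeeper and ToyNode are hypothetical names, and the only real behavior modeled is that ephemeral znodes (which is what the /live_nodes entries are) disappear when the session that created them ends.

```python
class ToyZooKeeper:
    """In-memory stand-in for the ephemeral children of /live_nodes."""

    def __init__(self):
        self.live_nodes = {}  # node name -> owning session id

    def create_ephemeral(self, session_id, name):
        self.live_nodes[name] = session_id

    def expire_session(self, session_id):
        # Ephemeral znodes vanish together with the session that created them.
        self.live_nodes = {
            name: sid for name, sid in self.live_nodes.items() if sid != session_id
        }


class ToyNode:
    """A node that registers itself through the same path on every connect."""

    def __init__(self, zk, name):
        self.zk = zk
        self.name = name
        self.session_no = 0
        self.on_connected()

    def on_connected(self):
        # Called at startup and again after a reconnect: new session, new znode.
        self.session_no += 1
        self.zk.create_ephemeral((self.name, self.session_no), self.name)

    def on_session_expired(self):
        self.zk.expire_session((self.name, self.session_no))


if __name__ == "__main__":
    zk = ToyZooKeeper()
    node = ToyNode(zk, "s16")
    node.on_session_expired()
    print("after expiry:", sorted(zk.live_nodes))
    node.on_connected()
    print("after reconnect:", sorted(zk.live_nodes))
```

In this model a node that is running but missing from live_nodes corresponds to an expiry that was never followed by the reconnect callback, which is the situation that would point to a bug.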

Mark
-- 
- Mark
about.me/markrmiller


Re: Solr node not found in ZK live_nodes

2016-12-06 Thread Manohar Sripada
Thanks Erick! Should I create a JIRA issue for this?

Regarding the logs, I had changed the log level to WARN. That may be the
reason I couldn't get anything from them.

Thanks,
Manohar


Re: Solr node not found in ZK live_nodes

2016-12-06 Thread Erick Erickson
The most likely reason is that the Solr node in question
was not reachable, so it was removed from
live_nodes, perhaps due to a temporary network
glitch, a long GC pause, or the like. If you're rolling
your logs over, it's quite possible that any illuminating
messages were lost. The default 4M size for each
log is quite low at INFO level...

It does seem possible for a Solr node to periodically
check its status and re-insert itself into live_nodes,
go through recovery and all that. So far most of that
registration logic is baked into startup code. What
do others think? Worth a JIRA?
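
Until something like that exists inside Solr, an external status check could start from comparing the nodes you expect against the children of /live_nodes. A minimal sketch, assuming the live list has already been fetched (for example from the output of "ls /live_nodes"); the node names are hypothetical and follow the host:port_context pattern Solr uses for node names.

```python
def missing_live_nodes(expected, live):
    """Return the expected node names that are absent from live, sorted."""
    return sorted(set(expected) - set(live))


if __name__ == "__main__":
    # Hypothetical 16-node cluster in which s16 has dropped out of live_nodes.
    expected = [f"s{i}:8983_solr" for i in range(1, 17)]
    live = [name for name in expected if name != "s16:8983_solr"]
    print(missing_live_nodes(expected, live))  # ['s16:8983_solr']
```

Run on a schedule, a check like this would at least raise an alert when a node's process is up but its registration is gone.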

Erick


Solr node not found in ZK live_nodes

2016-12-06 Thread Manohar Sripada
We have a 16-node Solr (5.2.1) cluster and a 5-node ZooKeeper (3.4.6) ensemble.

All the Solr nodes were registered in ZooKeeper (ls /live_nodes) when the setup
was done 3 months back. Suddenly, a few days back, our search started failing
because one of the Solr nodes (call it s16) was not seen in ZooKeeper, i.e.,
when we checked "ls /live_nodes", the s16 Solr node was not found.
However, the corresponding Solr process was up and running.

To my surprise, I couldn't find any errors or warnings in the Solr or ZooKeeper
logs related to this. I have a few questions:

1. Is there any reason why this registration with ZK was lost? I know the logs
should provide some information, but they didn't. Has anyone encountered a
similar issue? If so, what could be the root cause?
2. Shouldn't Solr be clever enough to detect that the registration with ZK
was lost (for some reason) and try to re-register?

PS: The issue was resolved by restarting the Solr node. However, I am
curious to know why it happened in the first place.

Thanks