[jira] [Updated] (SOLR-8599) Errors in construction of SolrZooKeeper cause Solr to go into an inconsistent state

2016-04-21 Thread Anshum Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta updated SOLR-8599:
---
Fix Version/s: 5.5.1
   6.0
   master

> Errors in construction of SolrZooKeeper cause Solr to go into an inconsistent 
> state
> ---
>
> Key: SOLR-8599
> URL: https://issues.apache.org/jira/browse/SOLR-8599
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Keith Laban
>Assignee: Dennis Gove
> Fix For: master, 6.0, 5.5.1
>
> Attachments: SOLR-8599.patch, SOLR-8599.patch, SOLR-8599.patch, 
> SOLR-8599.patch
>
>
> We originally saw this happen due to a DNS exception (see stack trace below). 
> Although any exception thrown in the constructor of SolrZooKeeper or the 
> parent class, ZooKeeper, will cause DefaultConnectionStrategy to fail to 
> update the zookeeper client. Once it gets into this state, it will not try to 
> connect again until the process is restarted. The node itself will also 
> respond successfully to query requests, but not to update requests.
> Two things should be address here:
> 1) Fix the error handling and issue some number of retries
> 2) If we are stuck in a state like this stop responding to all requests 
> {code}
> 2016-01-23 13:49:20.222 ERROR ConnectionManager [main-EventThread] - 
> :java.net.UnknownHostException: HOSTNAME: unknown error
> at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
> at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
> at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
> at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
> at java.net.InetAddress.getAllByName(InetAddress.java:1192)
> at java.net.InetAddress.getAllByName(InetAddress.java:1126)
> at 
> org.apache.zookeeper.client.StaticHostProvider.(StaticHostProvider.java:61)
> at org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:445)
> at org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:380)
> at org.apache.solr.common.cloud.SolrZooKeeper.(SolrZooKeeper.java:41)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:132)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2016-01-23 13:49:20.222 INFO ConnectionManager [main-EventThread] - 
> Connected:false
> 2016-01-23 13:49:20.222 INFO ClientCnxn [main-EventThread] - EventThread shut 
> down
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8599) Errors in construction of SolrZooKeeper cause Solr to go into an inconsistent state

2016-03-10 Thread Keith Laban (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith Laban updated SOLR-8599:
--
Attachment: SOLR-8599.patch

uploaded a new patch against master addressing Uwes concerns. I see that this 
was already merged in master, but that seems to be the only branch its 
currently on so hopefully its ok to put this out there.

Instead of scheduling a thread to change the address I was able to mock the 
DefaultConnectionStrategy to throw an exception in the reconnect call the first 
time its called to verify that it is being retried. With this change I also 
changed the server address to be final again and removed the public setter 
method seeing how they were no longer needed

> Errors in construction of SolrZooKeeper cause Solr to go into an inconsistent 
> state
> ---
>
> Key: SOLR-8599
> URL: https://issues.apache.org/jira/browse/SOLR-8599
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Keith Laban
>Assignee: Dennis Gove
> Attachments: SOLR-8599.patch, SOLR-8599.patch, SOLR-8599.patch, 
> SOLR-8599.patch
>
>
> We originally saw this happen due to a DNS exception (see stack trace below). 
> Although any exception thrown in the constructor of SolrZooKeeper or the 
> parent class, ZooKeeper, will cause DefaultConnectionStrategy to fail to 
> update the zookeeper client. Once it gets into this state, it will not try to 
> connect again until the process is restarted. The node itself will also 
> respond successfully to query requests, but not to update requests.
> Two things should be address here:
> 1) Fix the error handling and issue some number of retries
> 2) If we are stuck in a state like this stop responding to all requests 
> {code}
> 2016-01-23 13:49:20.222 ERROR ConnectionManager [main-EventThread] - 
> :java.net.UnknownHostException: HOSTNAME: unknown error
> at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
> at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
> at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
> at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
> at java.net.InetAddress.getAllByName(InetAddress.java:1192)
> at java.net.InetAddress.getAllByName(InetAddress.java:1126)
> at 
> org.apache.zookeeper.client.StaticHostProvider.(StaticHostProvider.java:61)
> at org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:445)
> at org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:380)
> at org.apache.solr.common.cloud.SolrZooKeeper.(SolrZooKeeper.java:41)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:132)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2016-01-23 13:49:20.222 INFO ConnectionManager [main-EventThread] - 
> Connected:false
> 2016-01-23 13:49:20.222 INFO ClientCnxn [main-EventThread] - EventThread shut 
> down
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8599) Errors in construction of SolrZooKeeper cause Solr to go into an inconsistent state

2016-02-22 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8599:
--
Attachment: SOLR-8599.patch

New patch with slightly updated test. This now causes an exception due to an 
invalid url instead of a DNS failure. It exposes the same issue being fixed and 
the test shows that the patch does in fact resolve it.

> Errors in construction of SolrZooKeeper cause Solr to go into an inconsistent 
> state
> ---
>
> Key: SOLR-8599
> URL: https://issues.apache.org/jira/browse/SOLR-8599
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Keith Laban
>Assignee: Dennis Gove
> Attachments: SOLR-8599.patch, SOLR-8599.patch, SOLR-8599.patch
>
>
> We originally saw this happen due to a DNS exception (see stack trace below). 
> Although any exception thrown in the constructor of SolrZooKeeper or the 
> parent class, ZooKeeper, will cause DefaultConnectionStrategy to fail to 
> update the zookeeper client. Once it gets into this state, it will not try to 
> connect again until the process is restarted. The node itself will also 
> respond successfully to query requests, but not to update requests.
> Two things should be address here:
> 1) Fix the error handling and issue some number of retries
> 2) If we are stuck in a state like this stop responding to all requests 
> {code}
> 2016-01-23 13:49:20.222 ERROR ConnectionManager [main-EventThread] - 
> :java.net.UnknownHostException: HOSTNAME: unknown error
> at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
> at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
> at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
> at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
> at java.net.InetAddress.getAllByName(InetAddress.java:1192)
> at java.net.InetAddress.getAllByName(InetAddress.java:1126)
> at 
> org.apache.zookeeper.client.StaticHostProvider.(StaticHostProvider.java:61)
> at org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:445)
> at org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:380)
> at org.apache.solr.common.cloud.SolrZooKeeper.(SolrZooKeeper.java:41)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:132)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2016-01-23 13:49:20.222 INFO ConnectionManager [main-EventThread] - 
> Connected:false
> 2016-01-23 13:49:20.222 INFO ClientCnxn [main-EventThread] - EventThread shut 
> down
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8599) Errors in construction of SolrZooKeeper cause Solr to go into an inconsistent state

2016-02-11 Thread Keith Laban (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith Laban updated SOLR-8599:
--
Attachment: SOLR-8599.patch

added usage of DefaultSolrThreadFactory to test

> Errors in construction of SolrZooKeeper cause Solr to go into an inconsistent 
> state
> ---
>
> Key: SOLR-8599
> URL: https://issues.apache.org/jira/browse/SOLR-8599
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Keith Laban
> Attachments: SOLR-8599.patch, SOLR-8599.patch
>
>
> We originally saw this happen due to a DNS exception (see stack trace below). 
> Although any exception thrown in the constructor of SolrZooKeeper or the 
> parent class, ZooKeeper, will cause DefaultConnectionStrategy to fail to 
> update the zookeeper client. Once it gets into this state, it will not try to 
> connect again until the process is restarted. The node itself will also 
> respond successfully to query requests, but not to update requests.
> Two things should be address here:
> 1) Fix the error handling and issue some number of retries
> 2) If we are stuck in a state like this stop responding to all requests 
> {code}
> 2016-01-23 13:49:20.222 ERROR ConnectionManager [main-EventThread] - 
> :java.net.UnknownHostException: HOSTNAME: unknown error
> at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
> at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
> at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
> at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
> at java.net.InetAddress.getAllByName(InetAddress.java:1192)
> at java.net.InetAddress.getAllByName(InetAddress.java:1126)
> at 
> org.apache.zookeeper.client.StaticHostProvider.(StaticHostProvider.java:61)
> at org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:445)
> at org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:380)
> at org.apache.solr.common.cloud.SolrZooKeeper.(SolrZooKeeper.java:41)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:132)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2016-01-23 13:49:20.222 INFO ConnectionManager [main-EventThread] - 
> Connected:false
> 2016-01-23 13:49:20.222 INFO ClientCnxn [main-EventThread] - EventThread shut 
> down
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8599) Errors in construction of SolrZooKeeper cause Solr to go into an inconsistent state

2016-01-27 Thread Keith Laban (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith Laban updated SOLR-8599:
--
Attachment: SOLR-8599.patch

Added a patch to address issue #1. In this patch I had to remove {{final}} 
clause for {{ZkServerAddress}} and add a public setter method in order to test 
the issue

> Errors in construction of SolrZooKeeper cause Solr to go into an inconsistent 
> state
> ---
>
> Key: SOLR-8599
> URL: https://issues.apache.org/jira/browse/SOLR-8599
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Keith Laban
> Attachments: SOLR-8599.patch
>
>
> We originally saw this happen due to a DNS exception (see stack trace below). 
> Although any exception thrown in the constructor of SolrZooKeeper or the 
> parent class, ZooKeeper, will cause DefaultConnectionStrategy to fail to 
> update the zookeeper client. Once it gets into this state, it will not try to 
> connect again until the process is restarted. The node itself will also 
> respond successfully to query requests, but not to update requests.
> Two things should be address here:
> 1) Fix the error handling and issue some number of retries
> 2) If we are stuck in a state like this stop responding to all requests 
> {code}
> 2016-01-23 13:49:20.222 ERROR ConnectionManager [main-EventThread] - 
> :java.net.UnknownHostException: HOSTNAME: unknown error
> at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
> at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
> at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
> at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
> at java.net.InetAddress.getAllByName(InetAddress.java:1192)
> at java.net.InetAddress.getAllByName(InetAddress.java:1126)
> at 
> org.apache.zookeeper.client.StaticHostProvider.(StaticHostProvider.java:61)
> at org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:445)
> at org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:380)
> at org.apache.solr.common.cloud.SolrZooKeeper.(SolrZooKeeper.java:41)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:132)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2016-01-23 13:49:20.222 INFO ConnectionManager [main-EventThread] - 
> Connected:false
> 2016-01-23 13:49:20.222 INFO ClientCnxn [main-EventThread] - EventThread shut 
> down
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8599) Errors in construction of SolrZooKeeper cause Solr to go into an inconsistent state

2016-01-26 Thread Keith Laban (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith Laban updated SOLR-8599:
--
Summary: Errors in construction of SolrZooKeeper cause Solr to go into an 
inconsistent state  (was: Errors in construction of SolrZooKeeper cause  Solr 
to into inconsistent state)

> Errors in construction of SolrZooKeeper cause Solr to go into an inconsistent 
> state
> ---
>
> Key: SOLR-8599
> URL: https://issues.apache.org/jira/browse/SOLR-8599
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Keith Laban
>
> We originally saw this happen due to a DNS exception (see stack trace below). 
> Although any exception thrown in the constructor of SolrZooKeeper or the 
> parent class, ZooKeeper, will cause DefaultConnectionStrategy to fail to 
> update the zookeeper client. Once it gets into this state, it will not try to 
> connect again until the process is restarted. The node itself will also 
> respond successfully to query requests, but not to update requests.
> Two things should be address here:
> 1) Fix the error handling and issue some number of retries
> 2) If we are stuck in a state like this stop responding to all requests 
> {code}
> 2016-01-23 13:49:20.222 ERROR ConnectionManager [main-EventThread] - 
> :java.net.UnknownHostException: HOSTNAME: unknown error
> at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
> at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
> at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
> at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
> at java.net.InetAddress.getAllByName(InetAddress.java:1192)
> at java.net.InetAddress.getAllByName(InetAddress.java:1126)
> at 
> org.apache.zookeeper.client.StaticHostProvider.(StaticHostProvider.java:61)
> at org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:445)
> at org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:380)
> at org.apache.solr.common.cloud.SolrZooKeeper.(SolrZooKeeper.java:41)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:132)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2016-01-23 13:49:20.222 INFO ConnectionManager [main-EventThread] - 
> Connected:false
> 2016-01-23 13:49:20.222 INFO ClientCnxn [main-EventThread] - EventThread shut 
> down
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org